API
GraphCollection
- class collaboration_detection.datastructures.Vertex(graph: Graph, vertex_id: int, label: str, label_id: int, metadata: Dict[str, Any] | None = None, **kwargs)
Defines a vertex/node with a given label and incoming and outgoing edges.
- add_edge(other_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None) → Vertex
Add an outgoing edge to this vertex. The other vertex must be present in the same graph.
- Parameters:
other_vertex – The other vertex
directed – Whether the edge is directed
edge_metadata – The optional metadata for the edge
- Returns:
this vertex
- property as_str_rep: str
To string representation: the id of the vertex is printed, followed by its label_id as an integer (no metadata are saved).
‘v id label_id’
e.g.: ‘v 1 3’
- Returns:
A string representation of the vertex
- get_edge(other: Vertex) → Edge | None
Get the outgoing edge to the other Vertex, or None if there is no such edge.
- Parameters:
other – The other Vertex
- Returns:
An edge or None if there is no edge
- has_edge(other: Vertex) → bool
Check if there is an outgoing edge to the other Vertex
- Parameters:
other – The other Vertex
- Returns:
True if there is an edge, else False
- class collaboration_detection.datastructures.Edge(graph, from_vertex: Vertex, to_vertex: Vertex, metadata: Dict[str, Any] | None = None, **kwargs)
An edge represents a directed arc between two vertices.
- property as_edge_rep
Returns the edge representation of this edge (no metadata are saved).
- Returns:
A tuple of the source and sink vertex labels together with the edge type
- property as_str_rep: str
The string representation of the edge, using the ids of the vertices (no metadata are saved).
‘e source sink type’
e.g.:
‘e 1 2 1’
- Returns:
A string representation of the edge
- class collaboration_detection.datastructures.Graph(graph_collection: GraphCollection, graph_id: int, cluster_id: int | None = None, metadata: Dict[str, Any] | None = None, **kwargs)
A graph represents a set of vertices and a set of edges that connect the vertices.
- add_edge(from_vertex: Vertex, to_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None)
Add a new edge to this graph. Both vertices must be part of this graph.
- Parameters:
from_vertex – The source vertex
to_vertex – The sink vertex
directed – If False, the edge is created twice; in the second edge, the source and sink vertex are swapped.
edge_metadata – Optional metadata for the edge
- property as_adjacency_matrix: DataFrame
Create an adjacency matrix of this graph as a pandas DataFrame.
- Returns:
the adjacency matrix of the graph as Pandas DataFrame
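The shape of an adjacency matrix can be illustrated without pandas. The sketch below is a stand-alone assumption, not the library's code: rows are source labels, columns are sink labels, a cell holds the edge type, and 0 means "no edge". The real property returns a pandas DataFrame and may order or fill cells differently.

```python
from typing import Dict, List, Tuple

EdgeRep = Dict[Tuple[str, str], int]


def adjacency_matrix(edge_rep: EdgeRep) -> Tuple[List[str], List[List[int]]]:
    """Build a square matrix from an edge representation.

    Assumed convention: rows = source labels, columns = sink labels,
    cell = edge type, 0 = no edge.
    """
    # Collect every label that appears as a source or a sink, sorted for a stable order
    labels = sorted({label for edge in edge_rep for label in edge})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (source, sink), edge_type in edge_rep.items():
        matrix[index[source]][index[sink]] = edge_type
    return labels, matrix


labels, matrix = adjacency_matrix({("A", "B"): 2, ("B", "C"): 1})
for label, row in zip(labels, matrix):
    print(label, row)
```

Sorting the labels keeps the row and column order deterministic, which makes two matrices of the same graph directly comparable.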
- property as_edge_rep: Dict
Returns a dict of the graph with all edges. Each key is a tuple of two strings (the labels of the two connected vertices), and the value is the type of the edge (no metadata are saved).
e.g.
{('A','B'):2,('B','C'):1}
- Returns:
The edge representation of a graph
- property as_edge_tuple: List[tuple]
Returns all edges of the graph as a list of tuples. As in as_edge_rep, the vertices are identified by their label (string), and each edge carries its type.
- Returns:
The edges of the graph as a list of tuples
- property as_graph_map: Graph
Converts the graph into a graph map (no metadata are saved).
- Returns:
A graph map
- property as_networkx_digraph: DiGraph
Converts the graph into a networkx DiGraph object.
- Returns:
A DiGraph object of this graph
- property as_str_rep: str
Returns a string of the graph with all vertices and edges. (No metadata are saved)
t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
- Returns:
The string representation of the graph
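To make the ‘t / v / e’ layout and the directed=False behaviour of add_edge concrete, here is a minimal stand-alone sketch. MiniGraph is a hypothetical stand-in for Graph (vertices map vertex ids to label ids), and the default edge type of 1 is an assumption; only the line format follows the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MiniGraph:
    """Hypothetical stand-in for Graph: vertex ids map to label ids."""

    graph_id: int
    vertices: Dict[int, int] = field(default_factory=dict)  # vertex_id -> label_id
    edges: List[Tuple[int, int, int]] = field(default_factory=list)  # (source, sink, type)

    def add_edge(self, source: int, sink: int, edge_type: int = 1, directed: bool = True) -> None:
        # Both vertices must already belong to this graph
        if source not in self.vertices or sink not in self.vertices:
            raise ValueError("both vertices must be part of this graph")
        self.edges.append((source, sink, edge_type))
        if not directed:
            # An undirected edge is stored as a second edge with source and sink swapped
            self.edges.append((sink, source, edge_type))

    def as_str_rep(self) -> str:
        # Labels are written as label ids; metadata is not serialized
        lines = [f"t # {self.graph_id}"]
        lines += [f"v {vid} {label_id}" for vid, label_id in self.vertices.items()]
        lines += [f"e {src} {sink} {etype}" for src, sink, etype in self.edges]
        return "\n".join(lines)


g = MiniGraph(graph_id=0, vertices={0: 3, 1: 5})
g.add_edge(0, 1, directed=False)
print(g.as_str_rep())
```

Because the undirected edge becomes two directed lines, the output contains both ‘e 0 1 1’ and ‘e 1 0 1’.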
- get_vertex(label: str, *, force_create=False, vertex_id: int | None = None, metadata: Dict[str, Any] | None = None) → Vertex
Get the vertex with the given label. If the vertex does not exist, it will be created.
- Parameters:
label – The label of the requested vertex
force_create – Force create a new vertex, even if there is a vertex with the same label
vertex_id – If provided, the vertex with this id is returned if it exists; otherwise it is created with this id. If force_create is True and no vertex_id is provided, a new vertex id is created.
metadata – Optional metadata added to the vertex if the vertex is created. If the vertex already exists, this parameter is ignored.
- Returns:
The vertex with the given label
- get_vertex_by_id(vertex_id: int) → Vertex | None
Get the vertex with the given vertex_id. If the vertex does not exist, the function will return None.
- Parameters:
vertex_id – The id of the requested vertex
- Returns:
The vertex with the given vertex_id or None, if the vertex does not exist.
- is_subgraph_of(other: Graph) → bool
Check if the current graph is a subgraph of the other graph. A graph is a subgraph only if all its vertices and all its edges are also in the other graph.
- Parameters:
other – The other graph (the super graph)
- Returns:
True if the current graph is a subgraph of the other graph.
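The subgraph definition reduces to two set inclusions. The sketch below is a stand-alone illustration under the assumption that vertices are compared by label and edges by their (source label, sink label) pair, ignoring edge types and metadata; how the library actually matches vertices is not stated here.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]


def is_subgraph_of(
    vertices: FrozenSet[str],
    edges: FrozenSet[Edge],
    other_vertices: FrozenSet[str],
    other_edges: FrozenSet[Edge],
) -> bool:
    """Label-based subgraph test: every vertex and every edge must also occur in the other graph."""
    # Set inclusion (<=) checks both conditions of the definition
    return vertices <= other_vertices and edges <= other_edges


small = (frozenset({"A", "B"}), frozenset({("A", "B")}))
large = (frozenset({"A", "B", "C"}), frozenset({("A", "B"), ("B", "C")}))
print(is_subgraph_of(*small, *large))
print(is_subgraph_of(*large, *small))
```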
- to_dot_digraph(path: str | None = None, rankdir: str = 'TB') → Digraph
Create a Digraph (DOT) object of the graph.
- Parameters:
path – If set to a file path (.png/.svg), the digraph is saved to that path
rankdir – The rank direction of the graph, “TB” or “LR”
- Returns:
A Digraph
- class collaboration_detection.datastructures.GraphCollection(*, label_list: str | List[str] | None = None, collection_id: str | None = None)
A GraphCollection represents a collection of graphs. It holds these graphs in a list, where each graph has a unique ID. It further contains a shared mapping for the labels.
- as_edge_rep() → List[Dict]
Creates an edge representation of the graphs. The output is a list with one dict per graph. In each dict, a key is an edge given as a tuple of the labels of the two connected vertices, and the value is the type of the edge.
[{('A','B'):2,('B','C'):1},{('A','C'):1,('C','B'): 1},…]
- Returns:
a list of dicts (graphs as edge representation)
- as_networkx_digraphs() → List[DiGraph]
Creates a list of networkx.DiGraph objects.
- Returns:
A list of networkx.DiGraph objects of the graphs
- as_str_rep() → str
Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)
t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id
…
- Returns:
The string representation of the graphs
- as_str_rep_lines() → List[str]
Creates the string representation of the graph collection as a list of lines. Important: The labels are printed as integer values (the id of the label)
- Returns:
a list of strings (the string representation of the graphs)
- clean(label_list: str | List[str] | None = None)
Cleans the graph collection and optionally initializes the label mapping with the given label_list, which should contain the labels as strings.
- Parameters:
label_list – The list of labels, or the path to a pickle file.
- property collection_id
Get the collection id of this collection
- export(path: str)
Saves the graph collection as a pickle file.
- Parameters:
path – The path of the file
- filter(vertex_count_min: int | None = None, vertex_count_max: int | None = None, edge_count_min: int | None = None, edge_count_max: int | None = None, vertex_label_in: str | None = None, vertex_label_is: str | None = None, cluster_id: int | None = None, expected_metadata: Dict[str, Any] | None = None, filter_func: Callable[[Graph], bool] | None = None) → Iterator[Graph]
Creates an iterator over the graphs of this collection that pass the given filters.
- Parameters:
vertex_count_min – The graph must contain at least this many vertices
vertex_count_max – The graph must contain at most this many vertices
edge_count_min – The graph must contain at least this many edges
edge_count_max – The graph must contain at most this many edges
vertex_label_in – The graph must contain a vertex whose label matches a part of this value
vertex_label_is – The graph must contain a vertex whose label matches this value
cluster_id – The graph must belong to the given cluster
expected_metadata – The graph metadata must contain the values specified in expected_metadata
filter_func – A filter function that receives a graph and returns True or False
- Returns:
Iterator over the graphs
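The filtering behaviour can be sketched as a generator that skips a graph as soon as one given criterion fails. This is a stand-alone illustration, not the library's implementation: SimpleGraph is a hypothetical stand-in for Graph, only a subset of the parameters is modelled, and two readings are assumptions (criteria are combined with AND, and vertex_label_in means "some label contains the given text").

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for Graph, holding only what the filter inspects."""

    labels: List[str]
    edge_count: int
    cluster_id: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_graphs(
    graphs: Iterable[SimpleGraph],
    vertex_count_min: Optional[int] = None,
    edge_count_max: Optional[int] = None,
    vertex_label_in: Optional[str] = None,
    cluster_id: Optional[int] = None,
    expected_metadata: Optional[Dict[str, Any]] = None,
    filter_func: Optional[Callable[[SimpleGraph], bool]] = None,
) -> Iterator[SimpleGraph]:
    """Yield only the graphs that pass every given criterion (None means "not checked")."""
    for g in graphs:
        if vertex_count_min is not None and len(g.labels) < vertex_count_min:
            continue
        if edge_count_max is not None and g.edge_count > edge_count_max:
            continue
        # Assumed reading: some vertex label contains the given text
        if vertex_label_in is not None and not any(vertex_label_in in label for label in g.labels):
            continue
        if cluster_id is not None and g.cluster_id != cluster_id:
            continue
        if expected_metadata and any(g.metadata.get(k) != v for k, v in expected_metadata.items()):
            continue
        if filter_func is not None and not filter_func(g):
            continue
        yield g


graphs = [
    SimpleGraph(["start", "review"], 1, cluster_id=0),
    SimpleGraph(["start", "approve", "end"], 2, cluster_id=1, metadata={"team": "x"}),
]
result = list(filter_graphs(graphs, vertex_count_min=3, vertex_label_in="appr"))
print([g.labels for g in result])
```

Returning a generator mirrors the documented Iterator[Graph] return type: graphs are checked lazily, one at a time.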
- get_label(label_id: int) → str
Get the label for the given label id
- Parameters:
label_id – the id of the label
- Returns:
the label as string or None if the id is not present
- get_set_label_id(label: str) → int
Get the id of the given label. If the label is not present, it is created for this graph_collection.
- Parameters:
label – the label as string
- Returns:
the id of the label
- init_label_mapping(label_list: str | List[str] | None = None)
Initializes the label mapping with the given list of labels; the index of each label in the list is its label id. label_list can also be a path to a pickle file that was created with ‘save_label_mapping’.
- Parameters:
label_list – The list of labels, or the path to a pickle file.
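The label mapping described by init_label_mapping, get_label and get_set_label_id can be pictured as a list plus a reverse index. LabelMapping below is a hypothetical sketch of that idea (the label's id is its index in the list; unknown labels get the next free id); it does not cover loading from a pickle file.

```python
from typing import Dict, List, Optional


class LabelMapping:
    """Hypothetical sketch: a label's id is its index in the list."""

    def __init__(self, label_list: Optional[List[str]] = None) -> None:
        self._labels: List[str] = list(label_list or [])
        self._ids: Dict[str, int] = {label: i for i, label in enumerate(self._labels)}

    def get_label(self, label_id: int) -> Optional[str]:
        # None for unknown ids, mirroring the documented behaviour of get_label
        if 0 <= label_id < len(self._labels):
            return self._labels[label_id]
        return None

    def get_set_label_id(self, label: str) -> int:
        # Unknown labels are appended, so they receive the next free id
        if label not in self._ids:
            self._ids[label] = len(self._labels)
            self._labels.append(label)
        return self._ids[label]


mapping = LabelMapping(["start", "end"])
print(mapping.get_set_label_id("end"))
print(mapping.get_set_label_id("review"))
print(mapping.get_label(2))
```

Since only ids are written to the string representation, the same mapping must be available again when such a file is loaded.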
- static load(path: str) → GraphCollection
Loads a graph collection from a pickle file.
- Parameters:
path – The path of the file
- load_graph_from_heuristic_net(heu_net: HeuristicsNet, ignore_type=True, graph_metadata: Dict[str, Any] | None = None) → Graph
Loads a graph from a heuristic net.
- Parameters:
heu_net – The heuristic net
ignore_type – If True, the edge type is ignored and set to 1
graph_metadata – Additional graph metadata
- Returns:
the created graph
- load_graphs_from_edge_rep(graph_edge_rep: Dict[Tuple[str, str], int], ignore_type=True) → Graph
Loads a graph into the graph collection from an edge representation (one dict, as produced for each graph by as_edge_rep).
e.g. {('A','B'):2,('B','C'):1}
- Parameters:
graph_edge_rep – The edge representation object
ignore_type – If True, the edge type is ignored and set to 1
- Returns:
the created graph
- load_graphs_from_str_rep(graph_str_rep: str | List[str], graph_metadata: Dict[str, Any] | None = None)
Loads the graph collection from a string representation. Important: the label mapping must be initialized before the import, because the labels are not stored in the string representation.
e.g.
t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id
- Parameters:
graph_str_rep – The string representation
graph_metadata – Additional graph metadata
- Returns:
the created graph
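Why the label mapping has to exist before the import becomes clear when parsing the format by hand: ‘v’ lines only carry label ids. The parser below is a stand-alone sketch, not the library's loader; it returns plain dicts (a hypothetical layout) and resolves ids through a list whose index is the label id.

```python
from typing import List, Union


def parse_str_rep(str_rep: Union[str, List[str]], labels: List[str]) -> List[dict]:
    """Parse the 't # / v / e' format; labels[label_id] resolves vertex labels."""
    lines = str_rep.splitlines() if isinstance(str_rep, str) else str_rep
    graphs: List[dict] = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "t":
            # 't # graph_id' starts a new graph
            graphs.append({"graph_id": int(parts[2]), "vertices": {}, "edges": []})
        elif parts[0] == "v":
            vertex_id, label_id = int(parts[1]), int(parts[2])
            # The label itself is not in the text, so it must come from the label mapping
            graphs[-1]["vertices"][vertex_id] = labels[label_id]
        elif parts[0] == "e":
            source, sink, edge_type = (int(p) for p in parts[1:4])
            graphs[-1]["edges"].append((source, sink, edge_type))
    return graphs


text = "t # 0\nv 0 0\nv 1 2\ne 0 1 1\nt # 1\nv 0 1"
for graph in parse_str_rep(text, ["start", "end", "review"]):
    print(graph)
```

With a different label list, the same text would silently yield different labels, which is the failure mode the note above warns about.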
- load_graphs_from_str_rep_file(file)
Loads the graph collection from a string representation file. Important: The label_mapping must be initialized before starting the import.
e.g.
t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id
…
- Parameters:
file – The file of the string representation
- new_graph(*, graph_id: int | None = None, cluster_id: int | None = None, metadata: dict | None = None, **kwargs) → Graph
Creates a new graph in this graph collection. Unless graph_id is given, the graph gets a new unique id.
- Parameters:
graph_id – An id for the graph
cluster_id – An id for the cluster the graph belongs to
metadata – Optional additional metadata as dict.
- Returns:
a new Graph object
- save_label_mapping(file)
Save the label mapping to a pickle file
- Parameters:
file – the file or path of the pickle file
- save_str_rep(file)
Saves the string representation of the graph collection to a file.
- Parameters:
file – The path or the file object
GraphMining
- class collaboration_detection.graph_mining.graph_miner.GraphMiner(converter: EventLogConverter | None = None, **kwargs)
Create graphs from event logs. The input event logs are processed and converted using a custom event log converter. Furthermore, this class can be used to create an individual graph for each original trace.
- property converter: EventLogConverter
Get the current event log converter of this graph miner.
- get_graph(event_log: DataFrame, graph_collection: GraphCollection | None = None) → Tuple[GraphCollection, List[DataFrame] | DataFrame]
This function first converts the event log into a new event log and then creates a new graph. If the converter returns multiple sublogs, multiple graphs are created. The resulting graph(s) are stored in the GraphCollection.
- Parameters:
event_log – The event log
graph_collection – A graph collection. If no graph collection is provided a new GraphCollection is created.
- Returns:
A tuple of the GraphCollection with the mined graph(s) and the converted event log(s)
- get_graphs_for_traces(event_log: DataFrame) → Tuple[GraphCollection, List[DataFrame]]
This function first converts each trace into a new event log and then creates a new graph for each of them. The resulting graphs are stored in the GraphCollection.
- Parameters:
event_log – The event log
- Returns:
A tuple of the GraphCollection with the mined graphs and the converted per-trace event logs
- class collaboration_detection.graph_mining.DfgGraphMiner(converter: EventLogConverter | None = None, **kwargs)
A simple graph mining algorithm that converts an event log into a directly-follows graph (DFG). No additional kwargs are required.
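A directly-follows graph has an edge (a, b) whenever activity b immediately follows activity a in some trace. The stand-alone sketch below computes that relation with frequencies from traces given as lists of activity names; it illustrates the DFG concept, and whether DfgGraphMiner keeps the frequencies as edge types is not stated here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def directly_follows(traces: Iterable[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often activity a is directly followed by activity b across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        # zip pairs each event with its successor in the same trace
        counts.update(zip(trace, trace[1:]))
    return dict(counts)


log = [["register", "check", "pay"], ["register", "pay"], ["register", "check", "pay"]]
print(directly_follows(log))
```

The result has the same shape as the edge representation documented above (label pairs as keys, an integer as value).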
- class collaboration_detection.graph_mining.HeuristicGraphMiner(converter: EventLogConverter | None = None, **kwargs)
Discovers a heuristics net.
- The following kwargs parameters can be provided:
dependency_threshold: Dependency threshold (default: 0.5)
and_threshold: AND threshold (default: 0.65)
loop_two_threshold: Loop two threshold (default: 0.5)
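The dependency_threshold is conventionally compared against the Heuristics Miner dependency measure a ⇒ b = (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The sketch below applies that classic formula on its own; that HeuristicGraphMiner computes it exactly this way (for example through pm4py), and the use of >= for the comparison, are assumptions. and_threshold and loop_two_threshold are not modelled.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dependency_edges(
    traces: Iterable[List[str]], dependency_threshold: float = 0.5
) -> Dict[Tuple[str, str], float]:
    """Keep directly-follows pairs whose Heuristics Miner dependency value meets the threshold."""
    follows: Counter = Counter()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))
    edges: Dict[Tuple[str, str], float] = {}
    for (a, b), ab in follows.items():
        if a == b:
            value = ab / (ab + 1)  # length-one loop: |a>a| / (|a>a| + 1)
        else:
            ba = follows[(b, a)]  # Counter returns 0 for unseen pairs without inserting them
            value = (ab - ba) / (ab + ba + 1)
        if value >= dependency_threshold:
            edges[(a, b)] = round(value, 3)
    return edges


log = [["a", "b", "c"]] * 3 + [["a", "c", "b"]]
print(dependency_edges(log))
```

Here b → c is dropped: it is observed three times, but c → b also occurs once, which lowers its dependency value to 0.4.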
- class collaboration_detection.graph_mining.CollaborationProcessInstanceMiner(converter: EventLogConverter | None = None, **kwargs)
This algorithm discovers a collaboration process instance.
- The following kwargs parameters can be provided:
activity_classifier: List of attributes used to build the concept name of the activity nodes (default: [])
relation_attributes: Set of attribute names that are used as object nodes connected to the activities.
activity_delimiter: Delimiter of the activity classifier attributes.
Preprocessing Converter
- class collaboration_detection.preprocessing.event_log_converter.EventLogConverter
Transform an event log into a new event log where the traces and/or events (and/or their attributes) are preprocessed / converted based on the concrete implementation of the converter.
- abstractmethod convert_event(event: Series) → Series
Converts an event into a new event.
- Parameters:
event – The input event
- Returns:
The converted event
- abstractmethod convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
- abstractmethod convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
- abstractmethod is_sub_log_converter() → bool
Whether the converter returns multiple sub-logs.
- Returns:
True if convert_event_log returns multiple (sub-)logs
- iter_converted_traces(event_log: DataFrame) → Iterable[DataFrame]
Iterates over an event log and yields all converted traces.
- Parameters:
event_log – The input event log
- class collaboration_detection.preprocessing.event_log_converter.DefaultEventLogConverter
The DefaultEventLogConverter returns all traces and events unchanged. It can be used as a base class for custom converters.
- convert_event(event: Series) → Series
Converts an event into a new event.
- Parameters:
event – The input event
- Returns:
The converted event
- convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
- convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
- is_sub_log_converter() → bool
Whether the converter returns multiple sub-logs.
- Returns:
True if convert_event_log returns multiple (sub-)logs
- class collaboration_detection.preprocessing.event_log_converter.ActivityJoinerConverter(activity_classifier: List[str] | None = None, activity_delimiter=' / ')
The ActivityJoinerConverter sets the concept:name of every event by joining the values of the attributes listed in activity_classifier, separated by activity_delimiter. All other attributes are left untouched.
- convert_event_log(event_log: DataFrame) → DataFrame
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
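The joining step of the ActivityJoinerConverter can be shown on a single event. The sketch uses a plain dict instead of a pandas row, and join_activity is a hypothetical helper; what the converter does when activity_classifier is empty or None is not described above and is not modelled.

```python
from typing import Dict, List


def join_activity(
    event: Dict[str, str], activity_classifier: List[str], activity_delimiter: str = " / "
) -> Dict[str, str]:
    """Return a copy of the event whose concept:name joins the classifier attribute values."""
    converted = dict(event)  # all other attributes stay untouched
    converted["concept:name"] = activity_delimiter.join(
        str(event[attr]) for attr in activity_classifier
    )
    return converted


event = {"concept:name": "review", "org:resource": "alice", "lifecycle:transition": "complete"}
print(join_activity(event, ["concept:name", "org:resource"]))
```

Joining activity name and resource this way turns "review" done by alice and "review" done by bob into two distinct activity nodes.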
- class collaboration_detection.preprocessing.event_log_converter.TraceSplitConverter(split_attributes: List[str])
The TraceSplitConverter converts each trace of the event log into a new trace with modified case IDs. The new case ID of a trace is created by appending a unique identifier to its original case ID. The identifier is created by concatenating the values (the category codes) of the attributes listed in split_attributes. If split_attributes is empty, the case IDs are not modified.
- convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
- convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
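The case-ID rewriting of the TraceSplitConverter can be sketched with events as dicts. Several details are assumptions rather than documented behaviour: the hypothetical helper split_case_ids, the "_" separator, computing category codes over the whole list of events, and deriving codes from the sorted distinct values (the default ordering of pandas categoricals). The key "case:concept:name" is the usual XES/pm4py case-ID column.

```python
from typing import Dict, List


def split_case_ids(
    events: List[Dict[str, str]],
    split_attributes: List[str],
    case_key: str = "case:concept:name",
    separator: str = "_",  # hypothetical separator
) -> List[Dict[str, str]]:
    """Append an identifier built from the category codes of split_attributes to each case ID."""
    if not split_attributes:
        return [dict(e) for e in events]  # case IDs are not modified
    # Category code = position of the value among the sorted distinct values
    codes = {
        attr: {v: i for i, v in enumerate(sorted({e[attr] for e in events}))}
        for attr in split_attributes
    }
    converted = []
    for e in events:
        suffix = separator.join(str(codes[attr][e[attr]]) for attr in split_attributes)
        converted.append({**e, case_key: f"{e[case_key]}{separator}{suffix}"})
    return converted


events = [
    {"case:concept:name": "c1", "org:resource": "bob"},
    {"case:concept:name": "c1", "org:resource": "alice"},
    {"case:concept:name": "c1", "org:resource": "bob"},
]
print([e["case:concept:name"] for e in split_case_ids(events, ["org:resource"])])
```

One original case "c1" thus becomes two traces, one per resource, which is what allows per-participant subtraces to be mined separately.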
- class collaboration_detection.preprocessing.event_log_converter.CombinedConverter(converter_list: List[EventLogConverter])
The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.
- convert_event(event: Series) → Series
Converts an event into a new event.
- Parameters:
event – The input event
- Returns:
The converted event
- convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
- convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
- is_sub_log_converter() → bool
Whether the converter returns multiple sub-logs.
- Returns:
True if convert_event_log returns multiple (sub-)logs
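The CombinedConverter applies its converters in order, so the output of one feeds the next. The sketch below models converters as plain functions on a list of activity names (a simplification of convert_trace on DataFrames); combine, add_start and upper are hypothetical names.

```python
from typing import Callable, List

Trace = List[str]
TraceConverter = Callable[[Trace], Trace]


def combine(converters: List[TraceConverter]) -> TraceConverter:
    """Build one converter that applies the given converters in list order."""

    def combined(trace: Trace) -> Trace:
        for convert in converters:
            trace = convert(trace)  # the output of one converter feeds the next
        return trace

    return combined


def add_start(trace: Trace) -> Trace:
    return ["start"] + trace


def upper(trace: Trace) -> Trace:
    return [activity.upper() for activity in trace]


print(combine([add_start, upper])(["a", "b"]))
print(combine([upper, add_start])(["a", "b"]))
```

The two calls differ only in order and produce different results, which is why the order of converter_list matters.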
- class collaboration_detection.preprocessing.event_log_converter.SublogConverter(split_attributes: List[str], similarity_attributes: List[str], timedelta_s: int = 0)
Converts an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces, where traces are considered related if their events have overlapping timestamps and share common values for the attributes listed in similarity_attributes. The parameter timedelta_s defines by how many seconds events of two traces may be apart from each other.
- convert_event_log(event_log: DataFrame) → List[DataFrame]
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
- is_sub_log_converter() → bool
Whether the converter returns multiple sub-logs.
- Returns:
True if convert_event_log returns multiple (sub-)logs
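The grouping idea behind the SublogConverter can be sketched as connected components of a "related" relation, computed with union-find. This is an illustration under stated assumptions, not the library's algorithm: it starts from traces that are already split, summarises each trace by its first and last timestamp (in seconds), and treats two traces as sharing values when at least one similarity attribute has the same value in both. group_related_traces is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (first timestamp, last timestamp, similarity attribute values); timestamps in seconds
TraceInfo = Tuple[float, float, Dict[str, str]]


def group_related_traces(
    traces: Dict[str, TraceInfo], similarity_attributes: List[str], timedelta_s: float = 0
) -> List[Set[str]]:
    """Group traces into sublogs: connected components of the 'related' relation."""
    parent = {case: case for case in traces}

    def find(case: str) -> str:
        while parent[case] != case:
            parent[case] = parent[parent[case]]  # path halving
            case = parent[case]
        return case

    for a, b in combinations(traces, 2):
        start_a, end_a, attrs_a = traces[a]
        start_b, end_b, attrs_b = traces[b]
        # Time overlap, tolerating a gap of at most timedelta_s seconds
        overlaps = start_a <= end_b + timedelta_s and start_b <= end_a + timedelta_s
        shares_value = any(
            attr in attrs_a and attrs_a.get(attr) == attrs_b.get(attr)
            for attr in similarity_attributes
        )
        if overlaps and shares_value:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for case in traces:
        groups.setdefault(find(case), set()).add(case)
    return list(groups.values())


traces = {
    "t1": (0, 10, {"project": "p1"}),
    "t2": (12, 20, {"project": "p1"}),
    "t3": (5, 8, {"project": "p2"}),
}
print(group_related_traces(traces, ["project"], timedelta_s=0))
print(group_related_traces(traces, ["project"], timedelta_s=5))
```

Using connected components makes the relation transitive: if t1 relates to t2 and t2 to t4, all three land in one sublog even when t1 and t4 never overlap.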
- class collaboration_detection.preprocessing.event_log_converter.AddPseudoEventConverter(add_pseudo_start_event=True, add_pseudo_end_event=True, attributes_start_event: Dict[str, Any] | None = None, attributes_end_event: Dict[str, Any] | None = None)
This converter adds a pseudo-event at the beginning and/or end of the trace, depending on the values of add_pseudo_start_event and add_pseudo_end_event. The pseudo start event is timestamped one second before the first event, and the pseudo end event one second after the last event.
The new events are created using the attributes_start_event and attributes_end_event dictionaries and are added to the trace.
- convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
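The timestamp rule of the AddPseudoEventConverter (first event minus one second, last event plus one second) can be sketched on a time-ordered, non-empty trace of dict events. add_pseudo_events is hypothetical, and the default names "START" and "END" are placeholders assumed for illustration; the real converter takes its attributes from attributes_start_event and attributes_end_event.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def add_pseudo_events(
    trace: List[Event],
    add_start: bool = True,
    add_end: bool = True,
    start_attrs: Optional[Event] = None,
    end_attrs: Optional[Event] = None,
    time_key: str = "time:timestamp",
) -> List[Event]:
    """Wrap a non-empty, time-ordered trace with pseudo start and/or end events."""
    result = list(trace)
    if add_start:
        first = trace[0][time_key]
        result.insert(0, {**(start_attrs or {"concept:name": "START"}), time_key: first - timedelta(seconds=1)})
    if add_end:
        last = trace[-1][time_key]
        result.append({**(end_attrs or {"concept:name": "END"}), time_key: last + timedelta(seconds=1)})
    return result


trace = [
    {"concept:name": "a", "time:timestamp": datetime(2024, 1, 1, 9, 0, 0)},
    {"concept:name": "b", "time:timestamp": datetime(2024, 1, 1, 9, 30, 0)},
]
for event in add_pseudo_events(trace):
    print(event)
```

Shared start and end nodes give every trace a common entry and exit point, which is what makes the mined graphs comparable.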
- class collaboration_detection.preprocessing.event_log_converter.TraceMergerConverter(*, new_case_id='NEW_DEFAULT_CASE', case_id_provider: Callable[[], str] | None = None)
Merges all traces of the event log into a single trace by setting the CASE_CONCEPT_NAME to the value provided by the function case_id_provider or by a fixed value defined by new_case_id.
- convert_event_log(event_log: DataFrame) → DataFrame
Converts an event log into one or more new event log(s).
- Parameters:
event_log – The input event log
- Returns:
The converted event log(s)
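Merging all traces into one only requires overwriting the case ID of every event. In the sketch below, merge_traces is hypothetical, events are plain dicts, and calling case_id_provider once per log is an assumption about how the provider is used.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


def merge_traces(
    events: List[Event],
    new_case_id: str = "NEW_DEFAULT_CASE",
    case_id_provider: Optional[Callable[[], str]] = None,
    case_key: str = "case:concept:name",
) -> List[Event]:
    """Put every event into one trace by overwriting its case ID."""
    # Assumption: the provider is called once per log; otherwise the fixed value is used
    case_id = case_id_provider() if case_id_provider is not None else new_case_id
    return [{**e, case_key: case_id} for e in events]


events = [
    {"case:concept:name": "1", "concept:name": "a"},
    {"case:concept:name": "2", "concept:name": "b"},
]
print(merge_traces(events))
print(merge_traces(events, case_id_provider=lambda: "batch-42"))
```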
- class collaboration_detection.preprocessing.event_log_converter.AddCountAttributeInTraceConverter(columns_with_value_prefix: Dict[str, str], column_prefix='counted_')
The AddCountAttributeInTraceConverter adds count-based attributes to specified columns in a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) defines a dictionary specifying columns and their corresponding value prefixes. The value prefixes will be used to create a new attribute for each unique value in the specified columns. The parameter column_prefix defines a prefix that will be added to the newly created columns.
- convert_trace(trace: DataFrame) → DataFrame
Converts a trace into one (OR MORE) new trace(s).
- Parameters:
trace – The input trace
- Returns:
The converted trace or a list of traces
Frequent Subgraph Mining
Graph Set Clustering
- class collaboration_detection.clustering.clustering.Clustering
- static execute(graph_collection: GraphCollection, algorithm: Algorithm, distance_metric: DistanceMetric, number_of_clusters: int | None = None, cluster_division_selector_metric: ClusterDivisionSelectorMetric | None = None, cluster_representative_seeder: ClusterRepresentativeSeeder | None = None, cluster_dissimilarity: float | None = None, cluster_centroid_selector: ClusterCentroidSelector | None = None) → List[Cluster]
This function is the entry point into the clustering package.
- Parameters:
graph_collection – the graph collection to get the graphs from. The resulting clusters are updated in place.
algorithm – the clustering algorithm to use
distance_metric – the distance metric to calculate the distances between the graphs
number_of_clusters – the number of clusters, used as stopping criterion for the hierarchical algorithm
cluster_division_selector_metric – the selector for the cluster to be divided by the hierarchical algorithm
cluster_representative_seeder – the seed selector for the split clusters of the hierarchical algorithm
cluster_dissimilarity – the dissimilarity value used by the dissimilar cluster centroid selector of the density and partitioning algorithms
cluster_centroid_selector – the cluster centroid selector for the partitioning algorithm
- Returns:
Resulting clusters