API

GraphCollection

class collaboration_detection.datastructures.Vertex(graph: Graph, vertex_id: int, label: str, label_id: int, metadata: Dict[str, Any] | None = None, **kwargs)

Defines a vertex/node with a given label and incoming and outgoing edges.

add_edge(other_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None) Vertex

Add an outgoing edge to this vertex. The other vertex must be present in the same graph.

Parameters:
  • other_vertex – The other vertex

  • directed – Defines if the edge is directed or not

  • edge_metadata – The optional metadata for the edge

Returns:

this vertex

property as_str_rep: str

String representation of the vertex: the vertex id followed by the label_id as an integer. (No metadata are saved)

‘v id label’

e.g.: ‘v 1 3’

Returns:

A string representation of the vertex

get_edge(other: Vertex) Edge | None

Get an outgoing edge to the other Vertex, return None if there is no edge

Parameters:

other – The other Vertex

Returns:

An edge or None if there is no edge

has_edge(other: Vertex) bool

Check if there is an outgoing edge to the other Vertex

Parameters:

other – The other Vertex

Returns:

True if there is an edge, else False

property preceding_vertices: List[Vertex]

Return all preceding vertices (from incoming edges).

Returns:

All preceding vertices (from incoming edges)

property proceeding_vertices: List[Vertex]

Return all proceeding vertices (from outgoing edges).

Returns:

All proceeding vertices (from outgoing edges)
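The relationship between outgoing edges, has_edge, and the preceding/proceeding vertex lists can be sketched with a small stand-in class. MiniVertex below is hypothetical and written only for this illustration; it is not the library's Vertex and leaves out the graph, label ids, and metadata.

```python
from typing import Dict, List


class MiniVertex:
    """Hypothetical stand-in for Vertex; keeps only an id, a label and adjacency."""

    def __init__(self, vertex_id: int, label: str) -> None:
        self.vertex_id = vertex_id
        self.label = label
        self.outgoing: Dict[int, "MiniVertex"] = {}
        self.incoming: Dict[int, "MiniVertex"] = {}

    def add_edge(self, other: "MiniVertex") -> "MiniVertex":
        # Register the outgoing edge here and the incoming edge on the target.
        self.outgoing[other.vertex_id] = other
        other.incoming[self.vertex_id] = self
        return self  # returning the vertex itself allows chained calls

    def has_edge(self, other: "MiniVertex") -> bool:
        # Only outgoing edges count, mirroring the description above.
        return other.vertex_id in self.outgoing

    @property
    def preceding_vertices(self) -> List["MiniVertex"]:
        return list(self.incoming.values())

    @property
    def proceeding_vertices(self) -> List["MiniVertex"]:
        return list(self.outgoing.values())


a, b, c = MiniVertex(1, "A"), MiniVertex(2, "B"), MiniVertex(3, "C")
a.add_edge(b).add_edge(c)  # both edges start at a, because add_edge returns a
```

Because add_edge returns the calling vertex, the chained call adds two edges that both start at a, not a path a, b, c.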

class collaboration_detection.datastructures.Edge(graph, from_vertex: Vertex, to_vertex: Vertex, metadata: Dict[str, Any] | None = None, **kwargs)

An edge represents a directed arc between two vertices.

property as_edge_rep

Save as edge representation. (No metadata are saved)

Returns:

A tuple of the source vertex label and the sink vertex label, and the type

property as_str_rep: str

The str representation of the edge where the ids of the vertices are used. (No metadata are saved)

‘e source sink type’

e.g.:

‘e 1 2 1’

Returns:

A string representation of the edge

class collaboration_detection.datastructures.Graph(graph_collection: GraphCollection, graph_id: int, cluster_id: int | None = None, metadata: Dict[str, Any] | None = None, **kwargs)

A graph represents a set of vertices and a set of edges which connects the vertices.

add_edge(from_vertex: Vertex, to_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None)

Add a new edge in this graph. Both vertices must be part of this graph.

Parameters:
  • from_vertex – The source vertex

  • to_vertex – The sink vertex

  • directed – if False, the edge is created twice. In the second edge the source and sink vertex are swapped.

  • edge_metadata – Optional metadata for the edge
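The effect of directed=False can be illustrated on a plain dict in edge-representation form. The helper below is a hypothetical sketch, not the library's add_edge; the edge_type argument is an assumption made for the illustration.

```python
from typing import Dict, Tuple

EdgeRep = Dict[Tuple[str, str], int]


def add_edge(edges: EdgeRep, source: str, sink: str,
             directed: bool = True, edge_type: int = 1) -> None:
    """Store an edge; an undirected edge becomes two mirrored directed edges."""
    edges[(source, sink)] = edge_type
    if not directed:
        # Second edge with source and sink swapped.
        edges[(sink, source)] = edge_type


edges: EdgeRep = {}
add_edge(edges, "A", "B")                  # directed: one entry
add_edge(edges, "B", "C", directed=False)  # undirected: two entries
```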

property as_adjacency_matrix: DataFrame

Create an adjacency matrix of this graph as Pandas DataFrame.

Returns:

the adjacency matrix of the graph as Pandas DataFrame
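As a rough illustration of what an adjacency matrix over vertex labels looks like, the sketch below builds one from an edge representation using nested lists instead of a DataFrame. The function name, the sorted label order, and the choice to store the edge type in each cell are assumptions; the library's actual layout may differ.

```python
from typing import Dict, List, Tuple


def adjacency_matrix(edges: Dict[Tuple[str, str], int]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vertex labels and a matrix with matrix[i][j] = edge type."""
    labels = sorted({label for edge in edges for label in edge})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (source, sink), edge_type in edges.items():
        # Rows are sources, columns are sinks; 0 means "no edge".
        matrix[index[source]][index[sink]] = edge_type
    return labels, matrix


labels, matrix = adjacency_matrix({("A", "B"): 2, ("B", "C"): 1})
```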

property as_edge_rep: Dict

Returns a dict of the graph with all edges. The keys are tuples of strings (the two connected vertices). The vertices are identified by their label (string). The value is the type of the edge. (No metadata are saved)

e.g.

{(‘A’,’B’):2,(‘B’,’C’):1}

Returns:

The edge representation of a graph

property as_edge_tuple: List[tuple]

Returns a dict of the graph with all edges. The keys are tuples of strings (the two connected vertices). The vertices are identified by their label (string). The value is the type of the edge.

e.g.

{(‘A’,’B’):2,(‘B’,’C’):1}

Returns:

The edge representation of a graph

property as_graph_map: Graph

Converts the graph into a graph map. (No metadata are saved)

Returns:

A Graph Map

property as_networkx_digraph: DiGraph

Converts the graph into a Python networkx graph object.

Returns:

DiGraph object of this graph

property as_str_rep: str

Returns a string of the graph with all vertices and edges. (No metadata are saved)

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

Returns:

The string representation of the graph
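The line layout above can be sketched as a small formatter. This is a hypothetical helper for illustration only; it assumes one record per line separated by newlines, and it takes vertices as (vertex_id, label_id) pairs and edges as (source_id, sink_id, edge_type) triples.

```python
from typing import List, Tuple


def graph_to_str_rep(graph_id: int,
                     vertices: List[Tuple[int, int]],
                     edges: List[Tuple[int, int, int]]) -> str:
    """Format one graph as 't # id', then 'v' lines, then 'e' lines."""
    lines = [f"t # {graph_id}"]
    lines += [f"v {vertex_id} {label_id}" for vertex_id, label_id in vertices]
    lines += [f"e {source} {sink} {edge_type}" for source, sink, edge_type in edges]
    return "\n".join(lines)


text = graph_to_str_rep(0, [(1, 3), (2, 5)], [(1, 2, 1)])
```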

get_vertex(label: str, *, force_create=False, vertex_id: int | None = None, metadata: Dict[str, Any] | None = None) Vertex

Get the vertex with the given label. If the vertex does not exist, it will be created.

Parameters:
  • label – The label of the requested vertex

  • force_create – Force create a new vertex, even if there is a vertex with the same label

  • vertex_id – If provided, the vertex with this id is returned if it exists; otherwise it is created with this id. If force_create is True and no vertex_id is provided, a new vertex id will be created.

  • metadata – The optional metadata added to the vertex if the vertex is created. If the vertex already exists, this parameter will be ignored.

Returns:

The vertex with the given label
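The get-or-create behaviour described above can be sketched with a minimal stand-in. MiniGraph is hypothetical and only mirrors the documented rules; in particular, the way new ids are allocated (largest existing id plus one) is an assumption.

```python
from typing import Dict, Optional


class MiniGraph:
    """Hypothetical stand-in that maps vertex ids to labels."""

    def __init__(self) -> None:
        self.vertices: Dict[int, str] = {}

    def get_vertex(self, label: str, *, force_create: bool = False,
                   vertex_id: Optional[int] = None) -> int:
        if vertex_id is not None and vertex_id in self.vertices:
            return vertex_id  # an existing id is returned as-is
        if vertex_id is None and not force_create:
            for existing_id, existing_label in self.vertices.items():
                if existing_label == label:
                    return existing_id  # reuse the vertex with this label
        # Assumption: new ids are allocated as max(existing ids) + 1.
        new_id = vertex_id if vertex_id is not None else max(self.vertices, default=-1) + 1
        self.vertices[new_id] = label
        return new_id


g = MiniGraph()
first = g.get_vertex("A")                      # created
again = g.get_vertex("A")                      # reused
forced = g.get_vertex("A", force_create=True)  # second vertex with the same label
```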

get_vertex_by_id(vertex_id: int) Vertex | None

Get the vertex with the given vertex_id. If the vertex does not exist, the function will return None.

Parameters:

vertex_id – The id of the requested vertex

Returns:

The vertex with the given vertex_id or None, if the vertex does not exist.

is_subgraph_of(other: Graph) bool

Check if the current graph is a subgraph of the other graph. A graph is a subgraph, only if all its vertices are in the other graph and all its edges are also in the other graph.

Parameters:

other – The other graph (the super graph)

Returns:

True if the current graph is a subgraph of the other graph.
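The subgraph condition stated above (all vertices and all edges contained in the other graph) can be sketched on edge representations with set operations. This is an illustrative helper, not the library's implementation; it assumes edges are compared by their endpoints only, ignoring the edge type, and it cannot represent isolated vertices.

```python
from typing import Dict, Set, Tuple

EdgeRep = Dict[Tuple[str, str], int]


def vertices_of(edges: EdgeRep) -> Set[str]:
    """Collect every label that appears as a source or a sink."""
    return {label for edge in edges for label in edge}


def is_subgraph_of(small: EdgeRep, big: EdgeRep) -> bool:
    """True if every vertex and every edge of `small` also occurs in `big`."""
    # Assumption: edge types are ignored; only (source, sink) pairs are compared.
    return vertices_of(small) <= vertices_of(big) and set(small) <= set(big)


big = {("A", "B"): 2, ("B", "C"): 1, ("A", "C"): 1}
```

Note that edge direction matters here: ("B", "A") is not contained in a graph that only has ("A", "B").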

to_dot_digraph(path: str | None = None, rankdir: str = 'TB') Digraph

Create a digraph object of the graph.

Parameters:
  • path – If set to a file name (.png/.svg), the digraph is saved to the given path

  • rankdir – The rank direction of the graph, “TB” or “LR”

Returns:

a digraph

class collaboration_detection.datastructures.GraphCollection(*, label_list: str | List[str] | None = None, collection_id: str | None = None)

A GraphCollection represents a collection of graphs. It holds these graphs in a list, where each graph has a unique ID. It further contains a shared mapping for the labels.

as_edge_rep() List[Dict]

Creates an edge representation of the graphs. The output of this function is a list of graphs, each represented as a dict. Each edge is represented as a tuple of two strings, which are the two connected vertices. The value represents the type of the edge.

[{(‘A’,’B’):2,(‘B’,’C’):1},{(‘A’,’C’):1,(‘C’,’B’): 1},…]

Returns:

a list of dicts (graphs as edge representation)

as_networkx_digraphs() List[DiGraph]

Creates a list of networkx.DiGraph objects.

Returns:

A list of networkx.DiGraph objects of the graphs

as_str_rep() str

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Returns:

The string representation of the graphs

as_str_rep_lines() List[str]

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

Returns:

a list of strings (the string representation of the graphs)

clean(label_list: str | List[str] | None = None)

Clean the graph collection. Optionally initializes the label mapping with the given label_list. The label_list should contain strings of the labels.

Parameters:

label_list – The list of labels. Or path to the pickle file.

property collection_id

Get the collection id of this collection

export(path: str)

Saves the graph collection as pickle file.

Parameters:

path – the path of the file

filter(vertex_count_min: int | None = None, vertex_count_max: int | None = None, edge_count_min: int | None = None, edge_count_max: int | None = None, vertex_label_in: str | None = None, vertex_label_is: str | None = None, cluster_id: int | None = None, expected_metadata: Dict[str, Any] | None = None, filter_func: Callable[[Graph], bool] | None = None) Iterator[Graph]

Creates an iterator for this graph collection. It iterates over all graphs and applies the given filter.

Parameters:
  • vertex_count_min – The graph should contain at least x vertices

  • vertex_count_max – The graph should contain at maximum x vertices

  • edge_count_min – The graph should contain at least x edges

  • edge_count_max – The graph should contain at maximum x edges

  • vertex_label_in – The graph should contain a vertex whose label matches a part of this value

  • vertex_label_is – The graph should contain a vertex whose label matches this value

  • cluster_id – The graph should be part of the given cluster_id

  • expected_metadata – The graph should have the values of the specified expected_metadata

  • filter_func – A filter function that gets a graph and should return true or false

Returns:

Iterator over the graphs
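How such a lazy, multi-criteria filter can work is sketched below on graphs in edge representation. The function is hypothetical and covers only a few of the parameters; it assumes that all criteria that are not None are combined with a logical AND, which the description above does not state explicitly.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

EdgeRep = Dict[Tuple[str, str], int]


def filter_graphs(graphs: Iterable[EdgeRep],
                  edge_count_min: Optional[int] = None,
                  edge_count_max: Optional[int] = None,
                  vertex_label_is: Optional[str] = None,
                  filter_func: Optional[Callable[[EdgeRep], bool]] = None) -> Iterator[EdgeRep]:
    """Lazily yield the graphs that satisfy every criterion that is not None."""
    for graph in graphs:
        labels = {label for edge in graph for label in edge}
        if edge_count_min is not None and len(graph) < edge_count_min:
            continue
        if edge_count_max is not None and len(graph) > edge_count_max:
            continue
        if vertex_label_is is not None and vertex_label_is not in labels:
            continue
        if filter_func is not None and not filter_func(graph):
            continue
        yield graph


graphs = [
    {("A", "B"): 1},
    {("A", "B"): 1, ("B", "C"): 1},
    {("X", "Y"): 1, ("Y", "Z"): 1},
]
with_b = list(filter_graphs(graphs, edge_count_min=2, vertex_label_is="B"))
custom = list(filter_graphs(graphs, filter_func=lambda g: ("X", "Y") in g))
```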

get_label(label_id: int) str

Get the label for the given label id

Parameters:

label_id – the id of the label

Returns:

the label as string or None if the id is not present

get_set_label_id(label: str) int

Get the id of the given label. If the label is not present, it is created for this graph_collection.

Parameters:

label – the label as string

Returns:

the id of the label

init_label_mapping(label_list: str | List[str] | None = None)

Initializes the label mapping with the given list of labels. The index of a label in the list is its id in the label mapping. The label_list parameter can also be a path to a pickle file that was created with ‘save_label_mapping’.

Parameters:

label_list – The list of labels. Or path to the pickle file.
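The interplay of init_label_mapping, get_set_label_id, and get_label can be sketched with a list-backed mapping in which the list index is the label id. LabelMapping is a hypothetical class for illustration; loading from a pickle file is left out.

```python
from typing import Dict, List, Optional


class LabelMapping:
    """Hypothetical label mapping: the index of a label in the list is its id."""

    def __init__(self, label_list: Optional[List[str]] = None) -> None:
        self._labels: List[str] = list(label_list or [])
        self._ids: Dict[str, int] = {label: i for i, label in enumerate(self._labels)}

    def get_set_label_id(self, label: str) -> int:
        # Unknown labels are appended, so they receive the next free index.
        if label not in self._ids:
            self._ids[label] = len(self._labels)
            self._labels.append(label)
        return self._ids[label]

    def get_label(self, label_id: int) -> Optional[str]:
        # None signals that the id is not present.
        return self._labels[label_id] if 0 <= label_id < len(self._labels) else None


mapping = LabelMapping(["register", "check"])
new_id = mapping.get_set_label_id("approve")  # not present yet, so it is created
```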

static load(path: str) GraphCollection

Loads the graph collection from a pickle file

Parameters:

path – the path of the file

load_graph_from_heuristic_net(heu_net: HeuristicsNet, ignore_type=True, graph_metadata: Dict[str, Any] | None = None) Graph

Loads a graph from a heuristic net.

Parameters:
  • heu_net – The heuristic net

  • ignore_type – Ignores the edge type (sets to 1)

  • graph_metadata – Additional graph metadata

Returns:

the created graph

load_graphs_from_edge_rep(graph_edge_rep: Dict[Tuple[str, str], int], ignore_type=True) Graph

Loads the graph collection from an edge representation.

e.g. [{(‘A’,’B’):2,(‘B’,’C’):1},{(‘A’,’C’):1,(‘C’,’B’): 1},…]

Parameters:
  • graph_edge_rep – The edge representation object

  • ignore_type – Ignores the edge type (sets to 1)

Returns:

the created graph

load_graphs_from_str_rep(graph_str_rep: str | List[str], graph_metadata: Dict[str, Any] | None = None)

Loads the graph collection from a string representation. Important: The label_mapping must be initialized before starting the import! The labels are not stored in the string representation!

e.g.

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Parameters:
  • graph_str_rep – The string representation

  • graph_metadata – Additional graph metadata

Returns:

the created graph
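Reading the string representation back can be sketched with a small line parser. The function below is a hypothetical illustration, not the library's loader: it returns plain dicts keyed by graph id and keeps label ids as integers, since the labels themselves are not part of the format.

```python
from typing import Dict, List, Optional, Tuple

ParsedGraph = Dict[str, List[Tuple[int, ...]]]


def parse_str_rep(text: str) -> Dict[int, ParsedGraph]:
    """Parse 't # id', 'v id label_id' and 'e source sink type' lines."""
    graphs: Dict[int, ParsedGraph] = {}
    current: Optional[ParsedGraph] = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip empty lines
        if parts[0] == "t":
            # A 't' line starts a new graph: parts are ['t', '#', graph_id].
            current = {"vertices": [], "edges": []}
            graphs[int(parts[2])] = current
        elif parts[0] == "v" and current is not None:
            current["vertices"].append((int(parts[1]), int(parts[2])))
        elif parts[0] == "e" and current is not None:
            current["edges"].append((int(parts[1]), int(parts[2]), int(parts[3])))
    return graphs


parsed = parse_str_rep("t # 0\nv 1 3\nv 2 5\ne 1 2 1\nt # 1\nv 1 3\n")
```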

load_graphs_from_str_rep_file(file)

Loads the graph collection from a string representation file. Important: The label_mapping must be initialized before starting the import.

e.g.

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Parameters:

file – The file of the string representation

new_graph(*, graph_id: int | None = None, cluster_id: int | None = None, metadata: dict | None = None, **kwargs) Graph

Creates a new graph in this graph collection. The graph gets a new unique id.

Parameters:
  • graph_id – An id for the graph

  • cluster_id – An id for the cluster the graph belongs to

  • metadata – Optional additional metadata as dict.

Returns:

a new Graph object

save_label_mapping(file)

Save the label mapping in a pickle file

Parameters:

file – the file or path of the pickle file

save_str_rep(file)

Saves the string representation of the graph collection to a file.

Parameters:

file – The path or the file object


GraphMining

class collaboration_detection.graph_mining.graph_miner.GraphMiner(converter: EventLogConverter | None = None, **kwargs)

Creates graphs from event logs. The input event logs are processed and converted using a custom event log converter. Furthermore, this class can be used to create an individual graph for each original trace.

property converter: EventLogConverter

Get the current event log converter of this graph miner.

get_graph(event_log: DataFrame, graph_collection: GraphCollection | None = None) Tuple[GraphCollection, List[DataFrame] | DataFrame]

This function first converts the event log into a new event log and then creates a new graph from it. If the converter returns multiple sublogs, multiple graphs are created. The resulting graph(s) is/are stored inside the GraphCollection.

Parameters:
  • event_log – The event log

  • graph_collection – A graph collection. If no graph collection is provided a new GraphCollection is created.

Returns:

The GraphCollection with the mined graph(s)

get_graphs_for_traces(event_log: DataFrame) Tuple[GraphCollection, List[DataFrame]]

This function first converts each trace into a new event log and then creates a new graph for each of them. The resulting graphs are stored inside the GraphCollection.

Parameters:

event_log – The event log

Returns:

The GraphCollection with the mined graphs

iter_graphs_for_traces(event_log: DataFrame) Iterator[Tuple[Graph, DataFrame]]

Iterates over all traces, converts each of them into a new event log, and yields a graph for each of these traces.

Parameters:

event_log – The event log

class collaboration_detection.graph_mining.DfgGraphMiner(converter: EventLogConverter | None = None, **kwargs)

A simple graph mining algorithm that takes an event log and converts it into a Directly-Follows Graph (DFG). No additional kwargs are required.
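For readers unfamiliar with the term, a directly-follows graph has one edge (a, b) for every pair of activities where b occurs immediately after a in some trace, usually weighted by how often that happens. The sketch below computes it from traces given as plain lists of activity names; it is a generic illustration and not the code used by DfgGraphMiner.

```python
from collections import Counter
from typing import Dict, List, Tuple


def directly_follows(traces: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often activity b directly follows activity a across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        # zip pairs each event with its immediate successor.
        counts.update(zip(trace, trace[1:]))
    return dict(counts)


dfg = directly_follows([["A", "B", "C"], ["A", "C"], ["A", "B", "C"]])
```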

class collaboration_detection.graph_mining.HeuristicGraphMiner(converter: EventLogConverter | None = None, **kwargs)

Discovers a heuristics net.

The following kwargs parameters can be provided:
  • dependency_threshold: Dependency threshold (default: 0.5)

  • and_threshold: AND threshold (default: 0.65)

  • loop_two_threshold: Loop two threshold (default: 0.5)
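To give an idea of what a dependency threshold does, the sketch below uses the dependency measure from the classic Heuristics Miner, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), and keeps only edges at or above the threshold. Whether HeuristicGraphMiner applies exactly this formula and comparison is an assumption; the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Dfg = Dict[Tuple[str, str], int]


def dependency(a_then_b: int, b_then_a: int) -> float:
    """Classic Heuristics Miner dependency measure for the pair (a, b)."""
    return (a_then_b - b_then_a) / (a_then_b + b_then_a + 1)


def dependable_edges(dfg: Dfg, dependency_threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Keep the directly-follows edges whose dependency reaches the threshold."""
    kept = set()
    for (a, b), count in dfg.items():
        reverse = dfg.get((b, a), 0)
        # Assumption: the comparison is inclusive (>=).
        if dependency(count, reverse) >= dependency_threshold:
            kept.add((a, b))
    return kept


counts = {("A", "B"): 9, ("B", "A"): 1, ("B", "C"): 3, ("C", "D"): 1, ("D", "C"): 1}
kept = dependable_edges(counts)
```

Pairs that occur equally often in both directions, such as C and D here, get a dependency of 0 and are dropped.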

class collaboration_detection.graph_mining.CollaborationProcessInstanceMiner(converter: EventLogConverter | None = None, **kwargs)

This algorithm discovers a collaboration process instance.

The following kwargs parameters can be provided:
  • activity_classifier: List of attributes for the concept name of the activity nodes (default: [])

  • relation_attributes: Set of attribute names, which are used as object nodes that are connected to the activities.

  • activity_delimiter: Delimiter of the activity classifier attributes.

Preprocessing Converter

class collaboration_detection.preprocessing.event_log_converter.EventLogConverter

Transform an event log into a new event log where the traces and/or events (and/or their attributes) are preprocessed / converted based on the concrete implementation of the converter.

abstractmethod convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

abstractmethod convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

abstractmethod convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

abstractmethod is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

iter_converted_traces(event_log: DataFrame) Iterable[DataFrame]

Iterates over an event log and yields all converted traces.

Parameters:

event_log – The input event log

class collaboration_detection.preprocessing.event_log_converter.DefaultEventLogConverter

The DefaultEventLogConverter converts all traces and events as they are (no modifications). Can be used as base class for inheritance.

convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.ActivityJoinerConverter(activity_classifier: List[str] | None = None, activity_delimiter=' / ')

The ActivityJoinerConverter converts the concept:name of all events by joining the attributes provided by activity_classifier. All other attributes are untouched.

convert_event_log(event_log: DataFrame) DataFrame

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)
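The joining step of the ActivityJoinerConverter can be sketched on a single event represented as a dict. The helper is hypothetical; it assumes that an empty activity_classifier leaves concept:name unchanged and that attribute values are converted with str().

```python
from typing import Dict, List

Event = Dict[str, str]


def join_activity(event: Event, activity_classifier: List[str],
                  activity_delimiter: str = " / ") -> Event:
    """Return a copy of the event whose concept:name joins the classifier attributes."""
    converted = dict(event)  # all other attributes stay untouched
    if activity_classifier:
        converted["concept:name"] = activity_delimiter.join(
            str(event[attribute]) for attribute in activity_classifier
        )
    return converted


event = {"concept:name": "check", "org:resource": "Alice", "lifecycle": "start"}
joined = join_activity(event, ["concept:name", "org:resource"])
```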

class collaboration_detection.preprocessing.event_log_converter.TraceSplitConverter(split_attributes: List[str])

The TraceSplitConverter converts each trace of the event log into a new trace with modified case IDs. The new case IDs are created by appending a unique identifier to the original case ID for each trace. The unique identifier is created by concatenating the values (the category codes) for the attributes specified in the split_attributes list. If the split_attributes list is empty, the case IDs are not modified.

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces
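The case-ID rewriting of the TraceSplitConverter can be sketched per event as follows. This is a hypothetical illustration: it concatenates the raw attribute values instead of category codes, and the "_" separator and the case:concept:name key are assumptions.

```python
from typing import Dict, List

Event = Dict[str, str]


def split_case_id(event: Event, split_attributes: List[str],
                  case_key: str = "case:concept:name") -> Event:
    """Append the values of the split attributes to the case ID of the event."""
    converted = dict(event)
    if split_attributes:
        # Assumption: raw values joined by "_" stand in for the category codes.
        suffix = "_".join(str(event[attribute]) for attribute in split_attributes)
        converted[case_key] = f"{event[case_key]}_{suffix}"
    return converted


trace = [
    {"case:concept:name": "42", "department": "sales", "concept:name": "offer"},
    {"case:concept:name": "42", "department": "legal", "concept:name": "review"},
]
converted = [split_case_id(event, ["department"]) for event in trace]
```

Events of the same original trace end up in different traces as soon as their split attributes differ.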

class collaboration_detection.preprocessing.event_log_converter.CombinedConverter(converter_list: List[EventLogConverter])

The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.

convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs
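The "apply each converter in the given order" behaviour of the CombinedConverter can be sketched with plain functions on dict events. The combine helper and the two example converters are hypothetical and only illustrate that the order of the list matters.

```python
from functools import reduce
from typing import Callable, Dict, List

Event = Dict[str, str]
EventConverter = Callable[[Event], Event]


def combine(converters: List[EventConverter]) -> EventConverter:
    """Build one converter that feeds the output of each converter into the next."""
    def combined(event: Event) -> Event:
        return reduce(lambda current, convert: convert(current), converters, event)
    return combined


def upper_name(event: Event) -> Event:
    return {**event, "concept:name": event["concept:name"].upper()}


def add_suffix(event: Event) -> Event:
    return {**event, "concept:name": event["concept:name"] + "-done"}


first_upper = combine([upper_name, add_suffix])({"concept:name": "check"})
first_suffix = combine([add_suffix, upper_name])({"concept:name": "check"})
```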

class collaboration_detection.preprocessing.event_log_converter.SublogConverter(split_attributes: List[str], similarity_attributes: List[str], timedelta_s: int = 0)

Converts an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces, where traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the similarity_attributes list. The timedelta_s parameter defines how many seconds two events of two traces can be apart from each other.

convert_event_log(event_log: DataFrame) List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.AddPseudoEventConverter(add_pseudo_start_event=True, add_pseudo_end_event=True, attributes_start_event: Dict[str, Any] | None = None, attributes_end_event: Dict[str, Any] | None = None)

This converter adds a new pseudo-event at the beginning and/or end of the trace, depending on the values of the add_pseudo_start_event and add_pseudo_end_event parameters. The pseudo-start event is timestamped one second before the first event, and the pseudo-end event one second after the last event.

The new events are created using the attributes_start_event and attributes_end_event dictionaries and are added to the trace.

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces
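The timestamp handling of the pseudo-events can be sketched on a trace given as a list of dict events. The helper is a hypothetical illustration; it assumes the trace is already sorted by time and that the timestamp is stored under time:timestamp.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def add_pseudo_events(trace: List[Event],
                      add_pseudo_start_event: bool = True,
                      add_pseudo_end_event: bool = True,
                      attributes_start_event: Optional[Event] = None,
                      attributes_end_event: Optional[Event] = None,
                      time_key: str = "time:timestamp") -> List[Event]:
    """Return a copy of the trace with pseudo start/end events added."""
    result = list(trace)
    if not trace:
        return result  # nothing to anchor the pseudo-events to
    one_second = timedelta(seconds=1)
    if add_pseudo_start_event:
        # One second before the first event.
        start = {**(attributes_start_event or {}), time_key: trace[0][time_key] - one_second}
        result.insert(0, start)
    if add_pseudo_end_event:
        # One second after the last event.
        end = {**(attributes_end_event or {}), time_key: trace[-1][time_key] + one_second}
        result.append(end)
    return result


t0 = datetime(2024, 1, 1, 12, 0, 0)
trace = [
    {"concept:name": "A", "time:timestamp": t0},
    {"concept:name": "B", "time:timestamp": t0 + timedelta(seconds=30)},
]
extended = add_pseudo_events(
    trace,
    attributes_start_event={"concept:name": "START"},
    attributes_end_event={"concept:name": "END"},
)
```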

class collaboration_detection.preprocessing.event_log_converter.TraceMergerConverter(*, new_case_id='NEW_DEFAULT_CASE', case_id_provider: Callable[[], str] | None = None)

Merges all traces of the event log into a single trace by setting the CASE_CONCEPT_NAME to the value provided by the function case_id_provider or by a fixed value defined by new_case_id.

convert_event_log(event_log: DataFrame) DataFrame

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

class collaboration_detection.preprocessing.event_log_converter.AddCountAttributeInTraceConverter(columns_with_value_prefix: Dict[str, str], column_prefix='counted_')

The AddCountAttributeInTraceConverter adds count-based attributes to specified columns in a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) defines a dictionary specifying columns and their corresponding value prefixes. The value prefixes will be used to create a new attribute for each unique value in the specified columns. The parameter column_prefix defines a prefix that will be added to the newly created columns.

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces


Frequent Subgraph Mining


Graph Set Clustering

class collaboration_detection.clustering.clustering.Clustering

static execute(graph_collection: GraphCollection, algorithm: Algorithm, distance_metric: DistanceMetric, number_of_clusters: int | None = None, cluster_division_selector_metric: ClusterDivisionSelectorMetric | None = None, cluster_representative_seeder: ClusterRepresentativeSeeder | None = None, cluster_dissimilarity: float | None = None, cluster_centroid_selector: ClusterCentroidSelector | None = None) List[Cluster]

This function is the entry point into the clustering package.

Parameters:
  • graph_collection – the graph collection to get the graphs from. The resulting clusters are updated inplace.

  • algorithm – the clustering algorithm to use

  • distance_metric – the distance metric to calculate the distances between the graphs

  • number_of_clusters – the number of clusters as stopping criterion for the hierarchical algorithm

  • cluster_division_selector_metric – the selector for the cluster to be divided by the hierarchical algorithm

  • cluster_representative_seeder – the seed selector for the split clusters of the hierarchical algorithm

  • cluster_dissimilarity – the dissimilarity measure for the dissimilar cluster centroid selector for density and partitioning algorithm

  • cluster_centroid_selector – the selected cluster centroid selector for the partitioning algorithm

Returns:

Resulting clusters