API

GraphCollection

class collaboration_detection.datastructures.Vertex(graph: Graph, vertex_id: int, label: str, metadata: Dict[str, Any] | None = None, **kwargs)

Defines a vertex/node with a given label and incoming and outgoing edges.

add_edge(other_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None) Vertex

Add an outgoing edge to this vertex. The other vertex must be present in the same graph.

Parameters:
  • other_vertex – The target vertex of the new edge

  • directed – Whether the edge is directed

  • edge_metadata – The optional metadata for the edge

Returns:

this vertex

property as_str_rep: str

The string representation of the vertex: the id of the vertex followed by the label_id as an integer. (No metadata are saved)

‘v vertex_id label_id’

e.g.: ‘v 1 3’

Returns:

A string representation of the vertex

get_edge(other: Vertex) Edge | None

Get the outgoing edge to the other Vertex, or None if there is no such edge

Parameters:

other – The other Vertex

Returns:

An edge or None if there is no edge

property graph: Graph

The graph of the vertex

has_edge(other: Vertex) bool

Check if there is an outgoing edge to the other Vertex

Parameters:

other – The other Vertex

Returns:

True if there is an edge, else False

property label: str

The label of the vertex.

property label_id: int

The id of the label

property preceding_vertices: List[Vertex]

Return all preceding vertices (from incoming edges).

Returns:

All preceding vertices (from incoming edges)

property proceeding_vertices: List[Vertex]

Return all succeeding vertices, i.e. the sink vertices of the outgoing edges.

Returns:

All succeeding vertices (from outgoing edges)

property vertex_id: int

The id of the vertex

class collaboration_detection.datastructures.Edge(graph: Graph, from_vertex: Vertex, to_vertex: Vertex, metadata: Dict[str, Any] | None = None, **kwargs)

An edge represents a directed arc between two vertices.

property as_edge_rep

The edge representation of this edge. (No metadata are saved)

Returns:

A tuple of the source vertex label and the sink vertex label, together with the edge type

property as_str_rep: str

The string representation of the edge, using the ids of the vertices. (No metadata are saved)

‘e source sink type’

e.g.:

‘e 1 2 1’

Returns:

A string representation of the edge

property from_vertex: Vertex

The source vertex of this edge

property graph: Graph

The graph of the edge

property to_vertex: Vertex

The sink vertex of this edge

class collaboration_detection.datastructures.GraphSetCluster(cluster_id: int, representative: Optional[collaboration_detection.datastructures.graph_collection.Graph], graphs: Set[collaboration_detection.datastructures.graph_collection.Graph] = <factory>, attributes: Dict[str, Any] = <factory>)

zip_images(path: str, format: str = 'svg')

Exports the graphs of this cluster to a zip file (images of the graphs).

Parameters:
  • path – The base path where the zip file will be saved.

  • format – The format of the images (default: svg)

class collaboration_detection.datastructures.Graph(graph_collection: GraphCollection, graph_id: int, cluster_id: int | None = None, metadata: Dict[str, Any] | None = None, **kwargs)

A graph represents a set of vertices and a set of edges that connect the vertices.

add_edge(from_vertex: Vertex, to_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None)

Add a new edge to this graph. Both vertices must be part of this graph.

Parameters:
  • from_vertex – The source vertex

  • to_vertex – The sink vertex

  • directed – If False, two edges are created: one from from_vertex to to_vertex and a second one with source and sink vertex swapped.

  • edge_metadata – Optional metadata for the edge

property as_adjacency_matrix: DataFrame

Create an adjacency matrix of this graph as Pandas DataFrame.

Returns:

the adjacency matrix of the graph as Pandas DataFrame

property as_edge_rep: Dict

Returns a dict of the graph with all edges. The keys are tuples of strings (the labels of the two connected vertices), and the value is the type of the edge. (No metadata are saved)

e.g.

{('A','B'):2,('B','C'):1}

Returns:

The edge representation of a graph
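To show how this edge representation relates to as_adjacency_matrix, here is a minimal pure-Python sketch (no pandas) that turns such a dict into a label-indexed matrix. It assumes each cell holds the edge type and that a missing edge is 0; the helper edge_rep_to_matrix is hypothetical and not part of the library.

```python
from typing import Dict, List, Tuple

def edge_rep_to_matrix(edge_rep: Dict[Tuple[str, str], int]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper (not part of collaboration_detection).

    Rows are source labels, columns are sink labels. A cell holds the edge
    type, or 0 if there is no edge (an assumption of this sketch).
    """
    labels = sorted({label for edge in edge_rep for label in edge})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (source, sink), edge_type in edge_rep.items():
        matrix[index[source]][index[sink]] = edge_type
    return labels, matrix

labels, matrix = edge_rep_to_matrix({("A", "B"): 2, ("B", "C"): 1})
print(labels)  # ['A', 'B', 'C']
print(matrix)  # [[0, 2, 0], [0, 0, 1], [0, 0, 0]]
```

Vertices without any edge do not occur in an edge representation, so they cannot appear in this sketch's matrix.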

property as_edge_tuple: List[tuple]

Returns the edges of the graph as a list of tuples (instead of the dict returned by as_edge_rep). The vertices are defined by their label (string). (No metadata are saved)

Returns:

The edges of the graph as a list of tuples

property as_graph_map: Graph

Converts the graph into a graph map. (No metadata are saved)

Returns:

A Graph Map

property as_networkx_digraph: DiGraph

Converts the graph into a networkx DiGraph object.

Returns:

The DiGraph object of this graph

property as_str_rep: str

Returns a string of the graph with all vertices and edges. (No metadata are saved)

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

Returns:

The string representation of the graph
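A minimal sketch of producing this line format from plain Python data. It assumes vertices are given as a vertex_id → label_id dict and edges as (source_id, sink_id, edge_type) triples; to_str_rep is a hypothetical helper, and sorting the vertices by id is a choice made here for deterministic output, not documented behavior.

```python
from typing import Dict, List, Tuple

def to_str_rep(graph_id: int,
               vertices: Dict[int, int],
               edges: List[Tuple[int, int, int]]) -> List[str]:
    """Hypothetical serializer for the 't/v/e' line format.

    vertices maps vertex_id -> label_id; edges holds
    (source_id, sink_id, edge_type) triples. Metadata are not serialized.
    """
    lines = [f"t # {graph_id}"]
    # Sorted by vertex id: an assumption for reproducible output.
    for vertex_id, label_id in sorted(vertices.items()):
        lines.append(f"v {vertex_id} {label_id}")
    for source_id, sink_id, edge_type in edges:
        lines.append(f"e {source_id} {sink_id} {edge_type}")
    return lines

print("\n".join(to_str_rep(0, {0: 3, 1: 5}, [(0, 1, 1)])))
# t # 0
# v 0 3
# v 1 5
# e 0 1 1
```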

copy_into(graph_collection: GraphCollection, additional_metadata: Dict[str, Any] | None = None) Graph

Copy this graph into the given graph collection

Parameters:

graph_collection – The other graph collection

Returns:

The new graph on the other graph_collection

get_edges_by_metadata(**kwargs) List[Edge]

Get all edges with the given metadata attributes

Parameters:

kwargs – The attribute key/value pairs that the edge metadata must match

Returns:

The list of edges matching the provided metadata

get_vertex(label: str, *, force_create=False, vertex_id: int | None = None, metadata: Dict[str, Any] | None = None) Vertex

Get the vertex with the given label. If the vertex does not exist, it will be created.

Parameters:
  • label – The label of the requested vertex

  • force_create – Force create a new vertex, even if there is a vertex with the same label

  • vertex_id – If provided, the vertex with this id is returned if it exists; otherwise it is created with this id. If force_create is True and no vertex_id is provided, a new vertex id is created.

  • metadata – The optional metadata added to the vertex if the vertex is created. This parameter will be ignored if the vertex already exists.

Returns:

The vertex with the given label

get_vertex_by_id(vertex_id: int) Vertex | None

Get the vertex with the given vertex_id. If the vertex does not exist, the function will return None.

Parameters:

vertex_id – The id of the requested vertex

Returns:

The vertex with the given vertex_id or None, if the vertex does not exist.

get_vertices_by_metadata(**kwargs) Dict[int, Vertex]

Get all vertices with the given metadata attributes

Parameters:

kwargs – The attribute key/value pairs that the vertex metadata must match

Returns:

The dict of vertices matching the provided metadata (id: Vertex)

is_subgraph_of(other: Graph) bool

Check if the current graph is a subgraph of the other graph. A graph is a subgraph only if all its vertices and all its edges are also in the other graph.

Parameters:

other – The other graph (the super graph)

Returns:

True if the current graph is a subgraph of the other graph.
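The subgraph condition above can be sketched on edge representations. This is a minimal sketch under two assumptions not stated in the text: vertices are matched by their label, and edge types are ignored. The helper is_label_subgraph is hypothetical; because it works on label sets, it cannot distinguish two vertices that share a label (possible via force_create).

```python
from typing import Dict, Iterable, Tuple

EdgeRep = Dict[Tuple[str, str], int]

def is_label_subgraph(sub_vertices: Iterable[str], sub_edges: EdgeRep,
                      super_vertices: Iterable[str], super_edges: EdgeRep) -> bool:
    """Hypothetical label-based check (not the library's implementation).

    Every vertex label and every (source, sink) label pair of the candidate
    must also occur in the super graph. Edge types are ignored.
    """
    vertices_ok = set(sub_vertices) <= set(super_vertices)
    edges_ok = set(sub_edges) <= set(super_edges)
    return vertices_ok and edges_ok

print(is_label_subgraph(["A", "B"], {("A", "B"): 1},
                        ["A", "B", "C"], {("A", "B"): 1, ("B", "C"): 1}))  # True
```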

to_dot_digraph(path: str | None = None, rankdir: str = 'TB') Digraph

Create a digraph object of the graph.

Parameters:
  • path – If set to a file path (.png/.svg), the digraph is saved to this path

  • rankdir – The rank direction of the graph, “TB” or “LR”

Returns:

A digraph

class collaboration_detection.datastructures.GraphCollection(*, label_list: str | List[str] | None = None, collection_id: str | None = None)

A GraphCollection represents a collection of graphs. It holds these graphs in a list, where each graph has a unique ID. It further contains a shared mapping for the labels.

as_edge_rep() List[Dict]

Creates the edge representation of all graphs: a list with one dict per graph. Each key is a tuple of the labels (strings) of the two connected vertices, and the value is the type of the edge.

[{('A','B'):2,('B','C'):1},{('A','C'):1,('C','B'):1},…]

Returns:

a list of dicts (graphs as edge representation)

as_networkx_digraphs() List[DiGraph]

Creates a list of networkx.DiGraph objects.

Returns:

A list of networkx.DiGraph objects of the graphs

as_str_rep() str

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Returns:

The string representation of the graphs

as_str_rep_lines() List[str]

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

Returns:

a list of strings (the string representation of the graphs)

clean(label_list: str | List[str] | None = None)

Clean the graph collection and optionally initialize the label mapping with the given label_list. The label_list should contain the labels as strings.

Parameters:

label_list – The list of labels, or the path to a pickle file.

property collection_id

Get the collection id of this collection

export(path: str)

Saves the graph collection as a pickle file.

Parameters:

path – The path of the file

filter(vertex_count_min: int | None = None, vertex_count_max: int | None = None, edge_count_min: int | None = None, edge_count_max: int | None = None, vertex_label_in: str | None = None, vertex_label_is: str | None = None, cluster_id: int | None = None, expected_metadata: Dict[str, Any] | None = None, filter_func: Callable[[Graph], bool] | None = None) Iterator[Graph]

Creates an iterator for this graph collection. It iterates over all graphs and yields only those that pass the given filters.

Parameters:
  • vertex_count_min – The graph should contain at least vertex_count_min vertices

  • vertex_count_max – The graph should contain at most vertex_count_max vertices

  • edge_count_min – The graph should contain at least edge_count_min edges

  • edge_count_max – The graph should contain at most edge_count_max edges

  • vertex_label_in – The graph should contain a vertex whose label matches a part of this value

  • vertex_label_is – The graph should contain a vertex whose label matches this value

  • cluster_id – The graph should be part of the cluster with the given cluster_id

  • expected_metadata – The graph metadata should contain the key/value pairs of expected_metadata

  • filter_func – A filter function that receives a graph and returns True or False

Returns:

Iterator over the graphs
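The following minimal sketch shows one plausible reading of these filters over a hypothetical SketchGraph stand-in (not the library's Graph). It assumes that a criterion left at None is skipped and that all given criteria must hold at the same time. vertex_label_in is left out because the text does not pin down which string must contain which.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

@dataclass
class SketchGraph:
    """Hypothetical stand-in for Graph with only the fields the filter needs."""
    labels: List[str]
    edge_count: int
    cluster_id: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

def filter_graphs(graphs: Iterable[SketchGraph], *,
                  vertex_count_min: Optional[int] = None,
                  vertex_count_max: Optional[int] = None,
                  edge_count_min: Optional[int] = None,
                  edge_count_max: Optional[int] = None,
                  vertex_label_is: Optional[str] = None,
                  cluster_id: Optional[int] = None,
                  expected_metadata: Optional[Dict[str, Any]] = None,
                  filter_func: Optional[Callable[[SketchGraph], bool]] = None
                  ) -> Iterator[SketchGraph]:
    """Yield the graphs that satisfy every criterion that is not None (assumed AND)."""
    for graph in graphs:
        n_vertices = len(graph.labels)
        if vertex_count_min is not None and n_vertices < vertex_count_min:
            continue
        if vertex_count_max is not None and n_vertices > vertex_count_max:
            continue
        if edge_count_min is not None and graph.edge_count < edge_count_min:
            continue
        if edge_count_max is not None and graph.edge_count > edge_count_max:
            continue
        if vertex_label_is is not None and vertex_label_is not in graph.labels:
            continue
        if cluster_id is not None and graph.cluster_id != cluster_id:
            continue
        if expected_metadata and any(graph.metadata.get(key) != value
                                     for key, value in expected_metadata.items()):
            continue
        if filter_func is not None and not filter_func(graph):
            continue
        yield graph

graphs = [SketchGraph(["A", "B"], 1, cluster_id=0),
          SketchGraph(["A", "B", "C"], 3, cluster_id=1, metadata={"x": 1})]
print([len(g.labels) for g in filter_graphs(graphs, vertex_count_min=3)])  # [3]
```

Because the sketch is a generator, graphs are checked lazily as the caller iterates, which matches the Iterator[Graph] return type above.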

get_label(label_id: int) str

Get the label for the given label id

Parameters:

label_id – the id of the label

Returns:

the label as string or None if the id is not present

get_set_label_id(label: str) int

Get the id of the given label. If the label is not present, it is created for this graph_collection.

Parameters:

label – the label as string

Returns:

the id of the label
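The behavior of get_label and get_set_label_id, together with the convention from init_label_mapping that a label's index in the initial list is its id, can be sketched as a small bidirectional mapping. LabelMapping is a hypothetical class for illustration, not the library's internal structure.

```python
from typing import Dict, List, Optional

class LabelMapping:
    """Hypothetical sketch of a shared label mapping (label <-> integer id)."""

    def __init__(self, label_list: Optional[List[str]] = None) -> None:
        self._labels: List[str] = []
        self._ids: Dict[str, int] = {}
        # Without duplicates, the position of a label in the list becomes its id.
        for label in label_list or []:
            self.get_set_label_id(label)

    def get_set_label_id(self, label: str) -> int:
        # Return the existing id, or register the label under the next free id.
        if label not in self._ids:
            self._ids[label] = len(self._labels)
            self._labels.append(label)
        return self._ids[label]

    def get_label(self, label_id: int) -> Optional[str]:
        # None for ids that were never assigned.
        if 0 <= label_id < len(self._labels):
            return self._labels[label_id]
        return None

mapping = LabelMapping(["A", "B"])
print(mapping.get_set_label_id("C"), mapping.get_label(2))  # 2 C
```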

init_label_mapping(label_list: str | List[str] | None = None)

Init the label mapping with the given list of labels. The index of each label in the list is its label id. The param label_list can also be a path to a pickle file which was created with ‘save_label_mapping’.

Parameters:

label_list – The list of labels, or the path to a pickle file.

static load(path: str) GraphCollection

Loads the graph collection from a pickle file

Parameters:

path – The path of the file

load_graph_from_heuristic_net(heu_net: HeuristicsNet, ignore_type=True, graph_metadata: Dict[str, Any] | None = None) Graph

Loads a graph from a heuristic net.

Parameters:
  • heu_net – The heuristic net

  • ignore_type – If True, the edge type is ignored (set to 1)

  • graph_metadata – Additional graph metadata

Returns:

the created graph

load_graphs_from_edge_rep(graph_edge_rep: Dict[Tuple[str, str], int], ignore_type=True) Graph

Loads a graph from an edge representation into this graph collection.

e.g. {('A','B'):2,('B','C'):1}

Parameters:
  • graph_edge_rep – The edge representation object

  • ignore_type – If True, the edge type is ignored (set to 1)

Returns:

the created graph

load_graphs_from_str_rep(graph_str_rep: str | List[str], graph_metadata: Dict[str, Any] | None = None)

Loads the graph collection from a string representation. Important: The label_mapping must be initialized before starting the import! The labels are not stored in the string representation!

e.g.

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Parameters:
  • graph_str_rep – The string representation

  • graph_metadata – Additional graph metadata

Returns:

the created graph
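The note that the label mapping must be initialized first follows from the format itself: 'v' lines only carry label ids. This minimal parser sketch makes that explicit by taking a labels list in place of the label mapping (labels[label_id] is the label). parse_str_rep and its dict-based output are hypothetical, not the library's loader.

```python
from typing import Any, Dict, List, Optional

def parse_str_rep(lines: List[str], labels: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical parser for the 't/v/e' format (not the library's loader)."""
    graphs: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "t":  # 't # graph_id' starts a new graph
            current = {"graph_id": int(parts[2]), "vertices": {}, "edges": []}
            graphs.append(current)
        elif current is None:
            raise ValueError("vertex or edge line before the first 't' line")
        elif kind == "v":  # 'v vertex_id label_id': resolve the id via the mapping
            current["vertices"][int(parts[1])] = labels[int(parts[2])]
        elif kind == "e":  # 'e source_id sink_id edge_type'
            current["edges"].append((int(parts[1]), int(parts[2]), int(parts[3])))
    return graphs

print(parse_str_rep(["t # 0", "v 0 0", "v 1 1", "e 0 1 2"], ["A", "B"]))
# [{'graph_id': 0, 'vertices': {0: 'A', 1: 'B'}, 'edges': [(0, 1, 2)]}]
```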

load_graphs_from_str_rep_file(file)

Loads the graph collection from a string representation file. Important: The label_mapping must be initialized before starting the import.

e.g.

  • t # graph_id

  • v vertex_id vertex_attributes_id

  • v vertex_id vertex_attributes_id

  • e vertex_source_id vertex_sink_id edge_type

  • e vertex_source_id vertex_sink_id edge_type

  • t # graph_id

Parameters:

file – The file of the string representation

new_graph(*, graph_id: int | None = None, cluster_id: int | None = None, metadata: dict | None = None, **kwargs) Graph

Creates a new graph in this graph collection. The graph gets a new unique id.

Parameters:
  • graph_id – An id for the graph

  • cluster_id – An id for the cluster the graph belongs to

  • metadata – Optional additional metadata as dict.

Returns:

a new Graph object

save_label_mapping(file)

Save the label mapping in a pickle file

Parameters:

file – the file or path of the pickle file

save_str_rep(file)

Saves the string representation of the graph collection to a file.

Parameters:

file – The path or the file object

zip_images(path: str, format: str = 'svg')

Exports the graphs of a GraphCollection to a zip file (images of the graphs).

Parameters:
  • path – The base path where the zip file will be saved.

  • format – The format of the images (default: svg)

Neo4J

class collaboration_detection.datastructures.neo4jstorage.GraphCollectionRepository(uri: str, user: str, password: str, **kwargs)

delete_graph_collection(graph_collection: str | GraphCollection)

Deletes the graph collection from the repository, including all nodes of this graph collection.

Parameters:

graph_collection – The Graph Collection

download_graph_collection(collection_id: str, graph_id: int | None = None, cluster_id: int | None = None) GraphCollection

Download the Graph Collection with the given ID

get_collections() List[CollectionInfo]

Get an overview of all collections in the repository (without the graph data)

get_number_of_clusters(collection_id: str) int

Get the number of clusters of this graph collection

get_number_of_graphs(collection_id: str, cluster_id: int | None = None) int

Get the number of graphs in a graph collection or, if cluster_id is given, the number of graphs in that cluster

set_cluster_attributes(graph_collection: GraphCollection, cluster_id: int | None)

Saves the attributes for the given cluster (or all clusters)

Parameters:
  • graph_collection – The Graph Collection

  • cluster_id – Optional: The Cluster Id, if the attributes of a single cluster should be saved

set_clusters(graph_collection: GraphCollection, cluster_id: int | None = None)

Save all cluster information for the given graph_collection

set_graph_metadata(graph_collection: GraphCollection, graph_id: int | None)

Saves the metadata for the given graph (or all graphs)

Parameters:
  • graph_collection – The Graph Collection

  • graph_id – Optional: The Graph Id, if the metadata of a single graph should be saved

upload_graph_collection(graph_collection: GraphCollection)

Upload the graph collection. Attention: The existing graph collection will be deleted beforehand.

Parameters:

graph_collection – The Graph Collection


GraphMining

class collaboration_detection.graph_mining.graph_miner.GraphMiner(converter: EventLogConverter | None = None, **kwargs)

Create graphs from event logs. The input event logs are processed and converted using a custom event log converter. Furthermore, this class can be used to create an individual graph for each original trace.

property converter: EventLogConverter

Get the current event log converter of this graph miner.

get_graph(event_log: DataFrame, graph_collection: GraphCollection | None = None) Tuple[GraphCollection, List[DataFrame] | DataFrame]

This function first converts the event log into a new event log and then creates a new graph from it. If the converter returns multiple sublogs, multiple graphs are created. The resulting graph(s) are stored inside the GraphCollection.

Parameters:
  • event_log – The event log

  • graph_collection – A graph collection. If no graph collection is provided a new GraphCollection is created.

Returns:

The GraphCollection with the mined graph(s), together with the converted event log(s)

get_graphs_for_traces(event_log: DataFrame) Tuple[GraphCollection, List[DataFrame]]

This function first converts each trace into a new event log and then creates a new graph for each of them. The resulting graphs are stored inside the GraphCollection.

Parameters:

event_log – The event log

Returns:

The GraphCollection with the mined graphs, together with the converted event logs

iter_graphs_for_traces(event_log: DataFrame) Iterator[Tuple[Graph, DataFrame]]

Iterate over all traces, convert each of them into a new event log, and yield a graph (together with its converted event log) for each trace.

Parameters:

event_log – The event log

class collaboration_detection.graph_mining.DfgGraphMiner(converter: EventLogConverter | None = None, **kwargs)

A simple graph mining algorithm that takes an event log and converts it into a Directly-Follows Graph (DFG). No additional kwargs are required.
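The core of a Directly-Follows Graph is counting how often one activity is immediately followed by another within the same trace. This minimal sketch assumes traces are already ordered and reduced to lists of activity names; directly_follows is a hypothetical helper, and whether DfgGraphMiner stores such frequencies as edge types is not documented here.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def directly_follows(traces: Sequence[Sequence[str]]) -> Dict[Tuple[str, str], int]:
    """Hypothetical helper: count how often activity b directly follows a in a trace."""
    counts: Counter = Counter()
    for trace in traces:
        # Pair each activity with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return dict(counts)

print(directly_follows([["A", "B", "C"], ["A", "B"]]))
# {('A', 'B'): 2, ('B', 'C'): 1}
```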

class collaboration_detection.graph_mining.HeuristicGraphMiner(converter: EventLogConverter | None = None, **kwargs)

Discovers a heuristics net.

The following kwargs parameters can be provided:
  • dependency_threshold: Dependency threshold (default: 0.5)

  • and_threshold: AND threshold (default: 0.65)

  • loop_two_threshold: Loop two threshold (default: 0.5)
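For orientation, the dependency_threshold refers to the dependency measure of the Heuristics Miner. The sketch below uses the textbook formula on directly-follows counts; it is not taken from this library's source, and both the exact formula HeuristicGraphMiner applies and whether the threshold comparison is inclusive are assumptions.

```python
from typing import Dict, Tuple

def dependency(dfg: Dict[Tuple[str, str], int], a: str, b: str) -> float:
    """Textbook Heuristics Miner dependency measure a => b (assumed, not the library's code)."""
    ab = dfg.get((a, b), 0)
    if a == b:
        # Length-one loop: |a>a| / (|a>a| + 1)
        return ab / (ab + 1)
    ba = dfg.get((b, a), 0)
    # (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)
    return (ab - ba) / (ab + ba + 1)

dfg = {("A", "B"): 4, ("B", "A"): 1, ("B", "B"): 3}
dependency_threshold = 0.5  # the documented default
# Keep edges at or above the threshold (inclusive comparison is an assumption).
kept = [edge for edge in dfg if dependency(dfg, *edge) >= dependency_threshold]
print(kept)  # [('A', 'B'), ('B', 'B')]
```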

class collaboration_detection.graph_mining.CollaborationProcessInstanceMiner(converter: EventLogConverter | None = None, **kwargs)

This algorithm discovers a collaboration process instance.

The following kwargs parameters can be provided:
  • activity_classifier: List of additional attributes for the concept name of the activity nodes; default: []

  • relation_attributes: List of attribute names that are used as object nodes connected to the activities; default: []

  • activity_delimiter: Delimiter of the activity classifier attributes; default: “ – “

  • create_object_nodes: If true, create the object nodes; if false, only simulate the object nodes and create just the activity nodes; default: True

  • override_labels_of_relation_attributes: If true, the labels of the object nodes are overridden with the object type (e.g. org:resource, spm:sdid, ..); default: False

Preprocessing Converter

class collaboration_detection.preprocessing.event_log_converter.EventLogConverter

Transform an event log into a new event log where the traces and/or events (and/or their attributes) are preprocessed / converted based on the concrete implementation of the converter.

abstractmethod convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

abstractmethod convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

abstractmethod convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

abstractmethod is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

iter_converted_traces(event_log: DataFrame) Iterable[DataFrame]

Iterates over an event log and yields all converted traces.

Parameters:

event_log – The input event log

class collaboration_detection.preprocessing.event_log_converter.DefaultEventLogConverter

The DefaultEventLogConverter converts all traces and events as they are (no modifications). Can be used as base class for inheritance.

convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.ActivityJoinerConverter(activity_classifier: List[str] | None = None, activity_delimiter=' / ')

The ActivityJoinerConverter converts the concept:name of all events by joining the attributes provided by activity_classifier. All other attributes are untouched.

convert_event_log(event_log: DataFrame) DataFrame

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

class collaboration_detection.preprocessing.event_log_converter.TraceSplitConverter(split_attributes: List[str])

The trace split converter converts a trace from the event log into a new trace with modified case IDs. The new case IDs are created by appending a unique identifier to the original case ID of each trace. The unique identifier is created by concatenating the values (the category codes) of the attributes specified in the split_attributes list. If the split_attributes list is empty, the case IDs are not modified.
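The case-ID rewriting described above can be sketched on events stored as plain dicts. Several details are assumptions: the case column name "case:concept:name" (XES naming), "_" as separator, and category codes assigned as the index in the sorted unique values, which mirrors how pandas orders inferred categories. split_case_ids is a hypothetical helper, not the converter's code.

```python
from typing import Any, Dict, List

CASE_KEY = "case:concept:name"  # assumed case-id column (XES naming)

def split_case_ids(events: List[Dict[str, Any]], split_attributes: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical sketch: append category codes of split attributes to the case id."""
    if not split_attributes:
        return [dict(event) for event in events]  # case ids stay unchanged
    # Category code = index of the value among the sorted unique values (assumption).
    codes = {
        attr: {value: i for i, value in enumerate(sorted({e[attr] for e in events}))}
        for attr in split_attributes
    }
    result = []
    for event in events:
        suffix = "_".join(str(codes[attr][event[attr]]) for attr in split_attributes)
        new_event = dict(event)
        new_event[CASE_KEY] = f"{event[CASE_KEY]}_{suffix}"
        result.append(new_event)
    return result

events = [{CASE_KEY: "c1", "org:resource": "Bob"},
          {CASE_KEY: "c1", "org:resource": "Alice"}]
print([e[CASE_KEY] for e in split_case_ids(events, ["org:resource"])])  # ['c1_1', 'c1_0']
```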

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

class collaboration_detection.preprocessing.event_log_converter.CombinedConverter(converter_list: List[EventLogConverter])

The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.
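Applying converters "in the given order" is plain function composition, and the order matters. This minimal sketch models converters as functions on dict events instead of EventLogConverter subclasses; combine, upper_name and add_prefix are hypothetical, and how the real class handles sub-log converters that return lists is not shown.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
EventConverter = Callable[[Event], Event]

def combine(converters: List[EventConverter]) -> EventConverter:
    """Hypothetical sketch: return a converter applying each converter in list order."""
    def convert(event: Event) -> Event:
        for converter in converters:
            event = converter(event)
        return event
    return convert

def upper_name(event: Event) -> Event:
    return {**event, "concept:name": event["concept:name"].upper()}

def add_prefix(event: Event) -> Event:
    return {**event, "concept:name": "task: " + event["concept:name"]}

print(combine([upper_name, add_prefix])({"concept:name": "review"}))
# {'concept:name': 'task: REVIEW'}
print(combine([add_prefix, upper_name])({"concept:name": "review"}))
# {'concept:name': 'TASK: REVIEW'}
```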

convert_event(event: Series) Series

Converts an event into a new event.

Parameters:

event – The input event

Returns:

The converted event

convert_event_log(event_log: DataFrame) DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.SublogConverter(split_attributes: List[str], similarity_attributes: List[str], timedelta_s: int = 0)

Convert an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces, where traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the similarity_attributes list. The timedelta_s parameter defines how many seconds two events of two traces can be apart from each other.
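The relatedness rule can be sketched for two (sub)traces. This reading is an assumption: "overlapping timestamps" is interpreted as overlapping [first, last] time spans widened by timedelta_s, and "share common values" as at least one shared value of the similarity attributes. traces_related is hypothetical, and grouping related traces into sublogs (for example as connected components) is not shown.

```python
from datetime import datetime, timedelta
from typing import Sequence, Set

def traces_related(a_times: Sequence[datetime], b_times: Sequence[datetime],
                   a_values: Set[str], b_values: Set[str], timedelta_s: int = 0) -> bool:
    """Hypothetical relatedness test between two (sub)traces."""
    slack = timedelta(seconds=timedelta_s)
    # The [min, max] time spans overlap once widened by timedelta_s.
    overlap = (min(a_times) <= max(b_times) + slack
               and min(b_times) <= max(a_times) + slack)
    # At least one shared value of the similarity attributes.
    return overlap and bool(a_values & b_values)

t0 = datetime(2024, 1, 1, 10, 0, 0)
a = [t0, t0 + timedelta(minutes=5)]
b = [t0 + timedelta(minutes=6)]
print(traces_related(a, b, {"X"}, {"X", "Y"}))                  # False
print(traces_related(a, b, {"X"}, {"X", "Y"}, timedelta_s=60))  # True
```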

convert_event_log(event_log: DataFrame) List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

is_sub_log_converter() bool

Does the converter return multiple sub-logs?

Returns:

True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.AddPseudoEventConverter(add_pseudo_start_event=True, add_pseudo_end_event=True, attributes_start_event: Dict[str, Any] | None = None, attributes_end_event: Dict[str, Any] | None = None)

This converter adds a new pseudo-event at the beginning and/or end of the trace, depending on the values of add_pseudo_start_event and add_pseudo_end_event. The timestamp of the start event is one second before the first event, and the timestamp of the end event is one second after the last event.

The new events are created using the attributes_start_event and attributes_end_event dictionaries and are added to the trace.
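A minimal sketch of this behavior on events stored as dicts. The column names "case:concept:name" and "time:timestamp" (XES naming), sorting the trace by timestamp, and copying the case id into the pseudo-events are assumptions; add_pseudo_events is a hypothetical helper, not the converter's implementation.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

CASE_KEY = "case:concept:name"  # assumed XES-style column names
TIME_KEY = "time:timestamp"

def add_pseudo_events(trace: List[Dict[str, Any]],
                      attributes_start_event: Optional[Dict[str, Any]] = None,
                      attributes_end_event: Optional[Dict[str, Any]] = None,
                      add_start: bool = True, add_end: bool = True) -> List[Dict[str, Any]]:
    """Hypothetical sketch: add pseudo start/end events one second before/after the trace."""
    if not trace:
        return []
    events = sorted(trace, key=lambda e: e[TIME_KEY])
    case_id = events[0][CASE_KEY]  # pseudo-events belong to the same case (assumption)
    result = list(events)
    if add_start:
        result.insert(0, {CASE_KEY: case_id, **(attributes_start_event or {}),
                          TIME_KEY: events[0][TIME_KEY] - timedelta(seconds=1)})
    if add_end:
        result.append({CASE_KEY: case_id, **(attributes_end_event or {}),
                       TIME_KEY: events[-1][TIME_KEY] + timedelta(seconds=1)})
    return result

trace = [{CASE_KEY: "c1", "concept:name": "A", TIME_KEY: datetime(2024, 1, 1, 10)}]
out = add_pseudo_events(trace, {"concept:name": "START"}, {"concept:name": "END"})
print([e["concept:name"] for e in out])  # ['START', 'A', 'END']
```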

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces

class collaboration_detection.preprocessing.event_log_converter.TraceMergerConverter(*, new_case_id='NEW_DEFAULT_CASE', case_id_provider: Callable[[], str] | None = None)

Merges all traces of the event log into a single trace by setting the CASE_CONCEPT_NAME to the value provided by the function case_id_provider or by a fixed value defined by new_case_id.

convert_event_log(event_log: DataFrame) DataFrame

Converts an event log into one or more new event log(s).

Parameters:

event_log – The input event log

Returns:

The converted event log(s)

class collaboration_detection.preprocessing.event_log_converter.AddCountAttributeInTraceConverter(columns_with_value_prefix: Dict[str, str], column_prefix='counted_')

The AddCountAttributeInTraceConverter adds count-based attributes to specified columns in a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) defines a dictionary specifying columns and their corresponding value prefixes. The value prefixes will be used to create a new attribute for each unique value in the specified columns. The parameter column_prefix defines a prefix that will be added to the newly created columns.

convert_trace(trace: DataFrame) DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:

trace – The input trace

Returns:

The converted trace or a list of traces


Frequent Subgraph Mining

collaboration_detection.subgraph_mining.gspan(data: str | GraphCollection, result=None, sup=10, min_node=3, max_node=10, remove_data_graphs=True, tagging_sub_and_super_graphs=False, variant: GSpanVariant = GSpanVariant.GSPAN_JAVA)

Executes the gSpan algorithm using an external implementation (the Java implementation by default, see variant).

The jar file of the Java implementation should be provided at ../..bin/gSpan.Java-1.2.jar, or its path can be passed as the jar_file param. Download the jar from here: GitHub gSpan.Java. The data parameter must be provided. The output is stored in the Graph Collection.

Parameters:
  • data – File path (str) of the graph data set, or a GraphCollection

  • result – File path of the result file

  • sup – Minimum support

  • min_node – Minimum number of nodes for each sub-graph

  • max_node – Maximum number of nodes for each sub-graph

  • remove_data_graphs – Only relevant if data is a GraphCollection: remove all other graphs before adding the sub graphs

  • tagging_sub_and_super_graphs – Only relevant if data is a GraphCollection and remove_data_graphs is False: create the subgraph mapping between the data graphs and the created sub graphs

  • variant – Define the GSpanVariant (java or rust implementation)

collaboration_detection.subgraph_mining.rfsm(g_c: GraphCollection, min_node: int, max_node: int, support: int) GraphCollection

Execute relaxed frequent subgraph mining (FSM)

This function is the wrapper function to execute relaxed FSM with a given graph collection, the min/max number of nodes and the requested support.

Parameters:
  • g_c – The GraphCollection

  • min_node – Minimum number of nodes in the final patterns (int)

  • max_node – Maximum number of nodes in the final patterns (int)

  • support – Minimum support value for the final patterns (int)

Returns:

GraphCollection (patterns)


Graph Set Clustering

class collaboration_detection.clustering.clustering.Clustering

static execute(graph_collection: GraphCollection, algorithm: Algorithm, distance_metric: DistanceMetric, number_of_clusters: int | None = None, cluster_division_selector_metric: ClusterDivisionSelectorMetric | None = None, cluster_representative_seeder: ClusterRepresentativeSeeder | None = None, cluster_dissimilarity: float | None = None, cluster_centroid_selector: ClusterCentroidSelector | None = None) List[Cluster]

This function is the entry point into the clustering package.

Parameters:
  • graph_collection – the graph collection to get the graphs from. The resulting clusters are updated in place.

  • algorithm – the clustering algorithm to use

  • distance_metric – the distance metric to calculate the distances between the graphs

  • number_of_clusters – the number of clusters as stopping criterion for the hierarchical algorithm

  • cluster_division_selector_metric – the selector for the cluster to be divided by the hierarchical algorithm

  • cluster_representative_seeder – the seed selector for the split clusters of the hierarchical algorithm

  • cluster_dissimilarity – the dissimilarity measure for the dissimilar cluster centroid selector of the density and partitioning algorithms

  • cluster_centroid_selector – the selected cluster centroid selector for the partitioning algorithm

Returns:

Resulting clusters