API

GraphCollection

class collaboration_detection.datastructures.Vertex(graph: Graph, vertex_id: int, label: str, metadata: Dict[str, Any] | None = None, **kwargs)

Defines a vertex/node with a given label and incoming and outgoing edges.

add_edge(other_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None) → Vertex

Add an outgoing edge to this vertex. The other vertex must be present in the same graph.

Parameters:

other_vertex – The other vertex
directed – Defines if the edge is directed or not
edge_metadata – The optional metadata for the edge

Returns:

this vertex

property as_str_rep: str

String representation of the vertex: the vertex ID followed by the label_id as an integer. (No metadata are saved)

‘v id label’

e.g.: ‘v 1 3’

Returns:: A string representation of the vertex

get_edge(other: Vertex) → Edge | None

Get the outgoing edge to the other vertex, or None if there is no such edge.

Parameters:: other – The other Vertex
Returns:: An edge or None if there is no edge

property graph: Graph: The graph of the vertex

has_edge(other: Vertex) → bool

Check if there is an outgoing edge to the other Vertex

Parameters:: other – The other Vertex
Returns:: True if there is an edge, else False

property label: str: The label property.

property label_id: int: The id of the label

property preceding_vertices: List[Vertex]

Return all preceding vertices (from incoming edges).

Returns:: All preceding vertices (from incoming edges)

property proceeding_vertices: List[Vertex]

Return all succeeding vertices (the targets of outgoing edges).

Returns:: All succeeding vertices (the targets of outgoing edges)

property vertex_id: int: The id of the vertex

class collaboration_detection.datastructures.Edge(graph: Graph, from_vertex: Vertex, to_vertex: Vertex, metadata: Dict[str, Any] | None = None, **kwargs)

An edge represents a directed arc between two vertices.

property as_edge_rep

The edge representation of this edge. (No metadata are saved)

Returns:: A tuple of the source vertex label and the sink vertex label, together with the edge type

property as_str_rep: str

The str representation of the edge where the ids of the vertices are used. (No metadata are saved)

‘e source sink type’

e.g.:

‘e 1 2 1’

Returns:: A string representation of the edge

property from_vertex: Vertex: The source vertex of this edge

property graph: Graph: The graph of the edge

property to_vertex: Vertex: The sink vertex of this edge

class collaboration_detection.datastructures.GraphSetCluster(cluster_id: int, representative: Optional[collaboration_detection.datastructures.graph_collection.Graph], graphs: Set[collaboration_detection.datastructures.graph_collection.Graph] = <factory>, attributes: Dict[str, Any] = <factory>)

zip_images(path: str, format: str = 'svg')

Exports the graphs of a GraphCollection cluster to a zip file (images of the graphs).

Parameters:

path – The base path where the zip file will be saved.
format – The format of the images, Default: svg

class collaboration_detection.datastructures.Graph(graph_collection: GraphCollection, graph_id: int, cluster_id: int | None = None, metadata: Dict[str, Any] | None = None, **kwargs)

A graph represents a set of vertices and a set of edges that connect the vertices.

add_edge(from_vertex: Vertex, to_vertex: Vertex, directed=True, edge_metadata: Dict[str, Any] | None = None)

Add a new edge in this graph. Both vertices must be part of this graph.

Parameters:

from_vertex – The source vertex
to_vertex – The sink vertex
directed – If False, a second edge is created in which the source and sink vertex are swapped.
edge_metadata – Optional metadata for the edge
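A minimal stand-alone sketch of what directed=False means for the stored edges. It does not use the library: ToyGraph is a hypothetical class that keeps edges as (source, sink) label pairs, and it only mirrors the documented behavior of creating a second edge with source and sink swapped.

```python
from typing import Set, Tuple


class ToyGraph:
    """Hypothetical stand-in for Graph: stores edges as (source, sink) label pairs."""

    def __init__(self) -> None:
        self.edges: Set[Tuple[str, str]] = set()

    def add_edge(self, from_vertex: str, to_vertex: str, directed: bool = True) -> None:
        # A directed edge is stored once.
        self.edges.add((from_vertex, to_vertex))
        if not directed:
            # Undirected: additionally store the edge with source and sink swapped.
            self.edges.add((to_vertex, from_vertex))


g = ToyGraph()
g.add_edge("A", "B")                  # one edge: A -> B
g.add_edge("B", "C", directed=False)  # two edges: B -> C and C -> B
print(sorted(g.edges))  # [('A', 'B'), ('B', 'C'), ('C', 'B')]
```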

property as_adjacency_matrix: DataFrame

Create an adjacency matrix of this graph as a pandas DataFrame.

Returns:: The adjacency matrix of the graph as a pandas DataFrame

property as_edge_rep: Dict

Returns a dict with all edges of the graph. The keys are tuples of two strings (the labels of the two connected vertices), and each value is the type of the edge. (No metadata are saved)

e.g.

{(‘A’,’B’):2,(‘B’,’C’):1}

Returns:: The edge representation of a graph

property as_edge_tuple: List[tuple]

Returns all edges of the graph as a list of tuples. The vertices are identified by their label (string). (No metadata are saved)

Returns:: The edges of the graph as a list of tuples

property as_graph_map: Graph: Converts the graph into a graph map. (No metadata are saved) :return: A Graph Map

property as_networkx_digraph: DiGraph

Converts the graph into a networkx DiGraph object.

Returns:: The DiGraph object of this graph

property as_str_rep: str

Returns a string of the graph with all vertices and edges. (No metadata are saved)

t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…

Returns:: The string representation of the graph
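The following sketch produces text in the format above from plain Python data. The function to_str_rep and its inputs (a dict mapping vertex_id to label_id and a list of (source_id, sink_id, edge_type) tuples) are hypothetical; whether the library uses exactly these separators or adds a trailing newline is an assumption.

```python
from typing import Dict, List, Tuple


def to_str_rep(graph_id: int,
               vertices: Dict[int, int],
               edges: List[Tuple[int, int, int]]) -> str:
    """Build 't/v/e' lines: vertices maps vertex_id -> label_id, edges are (source, sink, type)."""
    lines = [f"t # {graph_id}"]
    # One 'v' line per vertex, ordered by vertex ID.
    lines += [f"v {vid} {label_id}" for vid, label_id in sorted(vertices.items())]
    # One 'e' line per edge, in the given order.
    lines += [f"e {src} {dst} {etype}" for src, dst, etype in edges]
    return "\n".join(lines)


print(to_str_rep(0, {0: 3, 1: 5}, [(0, 1, 1)]))
# t # 0
# v 0 3
# v 1 5
# e 0 1 1
```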

copy_into(graph_collection: GraphCollection, additional_metadata: Dict[str, Any] | None = None) → Graph

Copy this graph into a graph_collection

Parameters:: graph_collection – The other graph collection
Returns:: The new graph on the other graph_collection

get_edges_by_metadata(**kwargs) → List[Edge]

Get all edges with the given metadata attributes

Parameters:: kwargs – The key-value pairs that the edge metadata must match
Returns:: The list of edges matching the provided metadata

get_vertex(label: str, *, force_create=False, vertex_id: int | None = None, metadata: Dict[str, Any] | None = None) → Vertex

Get the vertex with the given label. If the vertex does not exist, it will be created.

Parameters:

label – The label of the requested vertex
force_create – Force create a new vertex, even if there is a vertex with the same label
vertex_id – If provided, the vertex with this ID is returned if it exists; otherwise, it is created with this ID. If force_create is True and no vertex_id is provided, a new vertex ID is created.
metadata – The optional metadata added to the vertex if the vertex is created. This parameter will be ignored if the vertex already exists.

Returns:

The vertex with the given label

get_vertex_by_id(vertex_id: int) → Vertex | None

Get the vertex with the given vertex_id. If the vertex does not exist, the function will return None.

Parameters:: vertex_id – The ID of the requested vertex
Returns:: The vertex with the given vertex_id or None, if the vertex does not exist.

get_vertices_by_metadata(**kwargs) → Dict[int, Vertex]

Get all vertices with the given metadata attributes

Parameters:: kwargs – The key-value pairs that the vertex metadata must match
Returns:: The dict of vertices matching the provided metadata (id: Vertex)

is_subgraph_of(other: Graph) → bool

Check if the current graph is a subgraph of the other graph. A graph is a subgraph only if all of its vertices and all of its edges are also in the other graph.

Parameters:: other – The other graph (the super graph)
Returns:: True if the current graph is a subgraph of the other graph.
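The definition above amounts to two set-inclusion checks. This sketch assumes, for simplicity, that vertices are identified by their labels and edges by (source label, sink label) pairs; the actual method may compare vertices differently.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (source label, sink label)


def is_subgraph_of(sub_vertices: Set[str], sub_edges: Set[Edge],
                   sup_vertices: Set[str], sup_edges: Set[Edge]) -> bool:
    # Every vertex and every edge of the candidate must occur in the super graph.
    return sub_vertices <= sup_vertices and sub_edges <= sup_edges


big_v, big_e = {"A", "B", "C"}, {("A", "B"), ("B", "C")}
print(is_subgraph_of({"A", "B"}, {("A", "B")}, big_v, big_e))  # True
print(is_subgraph_of({"A", "C"}, {("A", "C")}, big_v, big_e))  # False: edge A -> C is missing
```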

to_dot_digraph(path: str | None = None, rankdir: str = 'TB') → Digraph

Create a digraph object of the graph.

Parameters:

path – If set to a file path (.png/.svg), the digraph is saved to this path
rankdir – The rank direction of the graph, “TB” or “LR”

Returns:

A digraph

class collaboration_detection.datastructures.GraphCollection(*, label_list: str | List[str] | None = None, collection_id: str | None = None)

A GraphCollection represents a collection of graphs. It holds these graphs in a list, where each graph has a unique ID. It further contains a shared mapping for the labels.

as_edge_rep() → List[Dict]

Creates the edge representation of all graphs. The output is a list with one dict per graph. In each dict, the keys are tuples of two strings (the labels of the two connected vertices), and the value is the type of the edge.

[{(‘A’,’B’):2,(‘B’,’C’):1},{(‘A’,’C’):1,(‘C’,’B’): 1},…]

Returns:: a list of dicts (graphs as edge representation)

as_networkx_digraphs() → List[DiGraph]

Creates a list of networkx.DiGraph objects.

Returns:: A list of networkx.DiGraph objects of the graphs

as_str_rep() → str

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id
…

Returns:: The string representation of the graphs

as_str_rep_lines() → List[str]

Creates the string representation of the graph collection. Important: The labels are printed as integer values (the id of the label)

Returns:: a list of strings (the string representation of the graphs)

clean(label_list: str | List[str] | None = None)

Cleans the graph collection and optionally initializes the label mapping with the given label_list. The label_list should contain the labels as strings.

Parameters:: label_list – The list of labels. Or path to the pickle file.

property collection_id: Get the collection id of this collection

export(path: str)

Saves the graph collection as a pickle file.

Parameters:: path – the path of the file

Creates an iterator for this graph collection. It iterates over all graphs and applies the given filter.

Parameters:

vertex_count_min – The graph should contain at least x vertices
vertex_count_max – The graph should contain at most x vertices
edge_count_min – The graph should contain at least x edges
edge_count_max – The graph should contain at most x edges
vertex_label_in – The graph should contain a vertex whose label matches a part of this value
vertex_label_is – The graph should contain a vertex whose label matches this value
cluster_id – The graph should be part of the given cluster
expected_metadata – The graph metadata should contain the values of the specified expected_metadata
filter_func – A filter function that receives a graph and returns True (keep) or False (skip)

Returns:

Iterator over the graphs
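The filter semantics can be sketched with a plain generator. ToyGraph and iter_filtered are hypothetical names, only a subset of the documented parameters is shown, and combining the criteria with a logical AND (every given criterion must hold) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class ToyGraph:
    """Hypothetical stand-in for Graph (not part of the library)."""
    labels: List[str]                 # vertex labels
    edges: List[Tuple[str, str]]      # (source label, sink label)
    cluster_id: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def iter_filtered(graphs: Iterable[ToyGraph], *,
                  vertex_count_min: Optional[int] = None,
                  edge_count_max: Optional[int] = None,
                  vertex_label_is: Optional[str] = None,
                  cluster_id: Optional[int] = None,
                  expected_metadata: Optional[Dict[str, Any]] = None,
                  filter_func: Optional[Callable[[ToyGraph], bool]] = None) -> Iterator[ToyGraph]:
    """Yield only the graphs that satisfy every criterion that is not None."""
    for g in graphs:
        if vertex_count_min is not None and len(g.labels) < vertex_count_min:
            continue
        if edge_count_max is not None and len(g.edges) > edge_count_max:
            continue
        if vertex_label_is is not None and vertex_label_is not in g.labels:
            continue
        if cluster_id is not None and g.cluster_id != cluster_id:
            continue
        if expected_metadata and any(g.metadata.get(k) != v
                                     for k, v in expected_metadata.items()):
            continue
        if filter_func is not None and not filter_func(g):
            continue
        yield g


graphs = [
    ToyGraph(["A", "B"], [("A", "B")], cluster_id=1),
    ToyGraph(["A", "B", "C"], [("A", "B"), ("B", "C")], cluster_id=2,
             metadata={"source": "log1"}),
]
print([g.cluster_id for g in iter_filtered(graphs, vertex_count_min=3)])  # [2]
print([g.cluster_id for g in iter_filtered(graphs, edge_count_max=1)])    # [1]
```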

get_label(label_id: int) → str

Get the label for the given label id

Parameters:: label_id – the id of the label
Returns:: the label as string or None if the id is not present

get_set_label_id(label: str) → int

Get the id of the given label. If the label is not present, it is created for this graph_collection.

Parameters:: label – the label as string
Returns:: the id of the label

init_label_mapping(label_list: str | List[str] | None = None)

Initializes the label mapping with the given list of labels. The index of each label in the list is its label ID. The label_list parameter can also be a path to a pickle file that was created with ‘save_label_mapping’.

Parameters:: label_list – The list of labels. Or path to the pickle file.
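A minimal sketch of the label mapping described above: the index of a label in the list is its ID, and unknown labels are appended under the next free ID, as documented for get_set_label_id. The class LabelMapping is hypothetical, and the pickle-file variant of label_list is not covered.

```python
from typing import Dict, List, Optional


class LabelMapping:
    """Hypothetical sketch: the index of a label in the list is its label ID."""

    def __init__(self, label_list: Optional[List[str]] = None) -> None:
        self.labels: List[str] = list(label_list or [])
        self.ids: Dict[str, int] = {label: i for i, label in enumerate(self.labels)}

    def get_set_label_id(self, label: str) -> int:
        # Return the existing ID, or register the label under the next free ID.
        if label not in self.ids:
            self.ids[label] = len(self.labels)
            self.labels.append(label)
        return self.ids[label]

    def get_label(self, label_id: int) -> Optional[str]:
        # None for unknown IDs, as documented for get_label.
        return self.labels[label_id] if 0 <= label_id < len(self.labels) else None


m = LabelMapping(["A", "B"])
print(m.get_set_label_id("B"))         # 1 (existing label)
print(m.get_set_label_id("C"))         # 2 (newly created)
print(m.get_label(2), m.get_label(7))  # C None
```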

static load(path: str) → GraphCollection

Loads the graph collection from a pickle file

Parameters:: path – the path of the file

load_graph_from_heuristic_net(heu_net: HeuristicsNet, ignore_type=True, graph_metadata: Dict[str, Any] | None = None) → Graph

Loads a graph from a heuristic net.

Parameters:

heu_net – The heuristic net
ignore_type – If True, the edge type is ignored (every edge type is set to 1)
graph_metadata – Additional graph metadata

Returns:

the created graph

load_graphs_from_edge_rep(graph_edge_rep: Dict[Tuple[str, str], int], ignore_type=True) → Graph

Loads a graph into the graph collection from its edge representation (a dict that maps tuples of two vertex labels to the edge type).

e.g. {(‘A’,’B’):2,(‘B’,’C’):1}

Parameters:

graph_edge_rep – The edge representation object
ignore_type – If True, the edge type is ignored (every edge type is set to 1)

Returns:

the created graph

load_graphs_from_str_rep(graph_str_rep: str | List[str], graph_metadata: Dict[str, Any] | None = None)

Loads the graph collection from a string representation. Important: The label mapping must be initialized before starting the import, because the labels themselves are not stored in the string representation (only their IDs).

e.g.

t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id

Parameters:

graph_str_rep – The string representation
graph_metadata – Additional graph metadata

Returns:

the created graph
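Because only label IDs are stored, a reader of this format needs the label list to recover the label strings. The following parser is a hypothetical sketch (ParsedGraph and parse_str_rep are not library names) that resolves each label_id by indexing into a given list of labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ParsedGraph:
    graph_id: int
    vertices: Dict[int, str] = field(default_factory=dict)            # vertex_id -> label
    edges: List[Tuple[int, int, int]] = field(default_factory=list)   # (source, sink, type)


def parse_str_rep(text: str, labels: List[str]) -> List[ParsedGraph]:
    """Parse 't/v/e' lines; label IDs are resolved through the given label list."""
    graphs: List[ParsedGraph] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "t":
            # 't # graph_id' starts a new graph.
            graphs.append(ParsedGraph(int(parts[2])))
        elif parts[0] == "v":
            # Without the label list, only the numeric label ID would be known.
            graphs[-1].vertices[int(parts[1])] = labels[int(parts[2])]
        elif parts[0] == "e":
            graphs[-1].edges.append((int(parts[1]), int(parts[2]), int(parts[3])))
    return graphs


parsed = parse_str_rep("t # 0\nv 0 0\nv 1 1\ne 0 1 1\nt # 1\nv 0 1", ["A", "B"])
print(parsed[0].vertices, parsed[0].edges)  # {0: 'A', 1: 'B'} [(0, 1, 1)]
```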

load_graphs_from_str_rep_file(file)

Loads the graph collection from a string representation file. Important: The label_mapping must be initialized before starting the import.

e.g.

t # graph_id
v vertex_id vertex_attributes_id
v vertex_id vertex_attributes_id
…
e vertex_source_id vertex_sink_id edge_type
e vertex_source_id vertex_sink_id edge_type
…
t # graph_id
…

Parameters:: file – The file of the string representation

new_graph(*, graph_id: int | None = None, cluster_id: int | None = None, metadata: dict | None = None, **kwargs) → Graph

Creates a new graph in this graph collection. If no graph_id is provided, the graph gets a new unique ID.

Parameters:

graph_id – An id for the graph
cluster_id – An id for the cluster the graph belongs to
metadata – Optional additional metadata as dict.

Returns:

a new Graph object

save_label_mapping(file)

Save the label mapping in a pickle file

Parameters:: file – the file or path of the pickle file

save_str_rep(file)

Saves the string representation of the graph collection to a file.

Parameters:: file – The path or the file object

zip_images(path: str, format: str = 'svg')

Exports the graphs of a GraphCollection to a zip file (images of the graphs).

Parameters:

path – The base path where the zip file will be saved.
format – The format of the images, Default: svg

Neo4j

class collaboration_detection.datastructures.neo4jstorage.GraphCollectionRepository(uri: str, user: str, password: str, **kwargs)

delete_graph_collection(graph_collection: str | GraphCollection)

Deletes the graph collection and all of its nodes from the repository.

Parameters:: graph_collection – The Graph Collection (or its collection ID)

download_graph_collection(collection_id: str, graph_id: int | None = None, cluster_id: int | None = None) → GraphCollection: Download the Graph Collection with the given ID

get_collections() → List[CollectionInfo]: Get all collections (overview, not data) of the repository

get_number_of_clusters(collection_id: str) → int: Get the number of clusters of this graph collection

get_number_of_graphs(collection_id: str, cluster_id: int | None = None) → int: Get the size of a graph collection, optionally filtered by the cluster_id to get the size of a single cluster

set_cluster_attributes(graph_collection: GraphCollection, cluster_id: int | None)

Saves the attributes for the given cluster

Parameters:

graph_collection – The Graph Collection
cluster_id – Optional: The Cluster Id, if the attributes of a single cluster should be saved

set_clusters(graph_collection: GraphCollection, cluster_id: int | None = None): Save all cluster information for the given graph_collection

set_graph_metadata(graph_collection: GraphCollection, graph_id: int | None)

Saves the metadata for the given graph (or all graphs)

Parameters:

graph_collection – The Graph Collection
graph_id – Optional: The Graph Id, if the metadata of a single graph should be saved

upload_graph_collection(graph_collection: GraphCollection)

Uploads the graph collection. Attention: The existing graph collection is deleted beforehand.

Parameters:: graph_collection – The Graph Collection

GraphMining

class collaboration_detection.graph_mining.graph_miner.GraphMiner(converter: EventLogConverter | None = None, **kwargs)

Creates graphs from event logs. The input event logs are processed and converted using a custom event log converter. This class can also be used to create an individual graph for each original trace.

property converter: EventLogConverter: Get the current event log converter of this graph miner.

get_graph(event_log: DataFrame, graph_collection: GraphCollection | None = None) → Tuple[GraphCollection, List[DataFrame] | DataFrame]

This function first converts the event log into a new event log and then creates a new graph from it. If the converter returns multiple sub-logs, one graph is created per sub-log. The resulting graph(s) are stored in the GraphCollection.

Parameters:

event_log – The event log
graph_collection – A graph collection. If no graph collection is provided a new GraphCollection is created.

Returns:

The GraphCollection with the mined graph(s), together with the converted event log(s)

get_graphs_for_traces(event_log: DataFrame) → Tuple[GraphCollection, List[DataFrame]]

This function first converts each trace into a new event log and then creates a new graph for each of them. The resulting graphs are stored in the GraphCollection.

Parameters:: event_log – The event log
Returns:: The GraphCollection with the mined graphs, together with the converted traces

iter_graphs_for_traces(event_log: DataFrame) → Iterator[Tuple[Graph, DataFrame]]

Iterates over all traces, converts each of them into a new event log, and yields the resulting graph together with the converted trace.

Parameters:: event_log – The event log

class collaboration_detection.graph_mining.DfgGraphMiner(converter: EventLogConverter | None = None, **kwargs): A simple graph mining algorithm that converts an event log into a Directly-Follows Graph (DFG). No additional kwargs are required.
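The core of a directly-follows graph is a count of how often one activity is immediately followed by another within the same trace. This sketch represents traces as lists of activity names instead of a pandas event log and returns the counts keyed by (source, sink) pairs; whether DfgGraphMiner stores these counts as edge types is not stated here and is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def directly_follows(traces: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often activity a is directly followed by activity b across all traces."""
    counts: Counter = Counter()
    for trace in traces:
        # zip pairs each event with its immediate successor in the same trace.
        counts.update(zip(trace, trace[1:]))
    return dict(counts)


print(directly_follows([["A", "B", "C"], ["A", "B", "B"]]))
# {('A', 'B'): 2, ('B', 'C'): 1, ('B', 'B'): 1}
```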

class collaboration_detection.graph_mining.HeuristicGraphMiner(converter: EventLogConverter | None = None, **kwargs)

Discovers a heuristics net.

The following kwargs parameters can be provided:

dependency_threshold: Dependency threshold (default: 0.5)
and_threshold: AND threshold (default: 0.65)
loop_two_threshold: Loop two threshold (default: 0.5)
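For orientation: in the classic Heuristics Miner, the dependency threshold is applied to a dependency measure computed from directly-follows counts |a>b|, namely (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) for a ≠ b and |a>a| / (|a>a| + 1) for self-loops. The sketch below computes this measure from a dict of counts; whether HeuristicGraphMiner applies exactly this formula and comparison internally is an assumption.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[str, str], int]  # directly-follows counts |a>b|


def dependency(dfg: Counts, a: str, b: str) -> float:
    """Classic Heuristics Miner dependency measure a => b."""
    ab = dfg.get((a, b), 0)
    if a == b:
        # Length-one loop: |a>a| / (|a>a| + 1)
        return ab / (ab + 1)
    ba = dfg.get((b, a), 0)
    # Values close to 1 indicate a strong dependency from a to b.
    return (ab - ba) / (ab + ba + 1)


dfg = {("A", "B"): 9, ("B", "A"): 1, ("B", "C"): 3, ("C", "B"): 2}
dependency_threshold = 0.5  # documented default
for a, b in [("A", "B"), ("B", "C")]:
    value = dependency(dfg, a, b)
    print(a, b, round(value, 3), "kept" if value >= dependency_threshold else "dropped")
# A B 0.727 kept
# B C 0.167 dropped
```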

class collaboration_detection.graph_mining.CollaborationProcessInstanceMiner(converter: EventLogConverter | None = None, **kwargs)

This algorithm discovers a collaboration process instance.

The following kwargs parameters can be provided:

activity_classifier: List of additional attributes for the concept name of the activity nodes; default: []
relation_attributes: List of attribute names that are used as object nodes connected to the activities; default: []
activity_delimiter: Delimiter of the activity classifier attributes; default: “ – ”
create_object_nodes: If true, create the object nodes; if false, only simulate the object nodes and create just the activity nodes; default: True
override_labels_of_relation_attributes: If true, the labels of the object nodes are overridden with the object type (e.g. org:resource, spm:sdid, ...); default: False

Preprocessing Converter

class collaboration_detection.preprocessing.event_log_converter.EventLogConverter

Transform an event log into a new event log where the traces and/or events (and/or their attributes) are preprocessed / converted based on the concrete implementation of the converter.

abstractmethod convert_event(event: Series) → Series

Converts an event into a new event.

Parameters:: event – The input event
Returns:: The converted event

abstractmethod convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

abstractmethod convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

abstractmethod is_sub_log_converter() → bool

Does the converter return multiple sub-logs?

Returns:: True if convert_event_log returns multiple (sub-)logs

iter_converted_traces(event_log: DataFrame) → Iterable[DataFrame]

Iterates over an event log and yields all converted traces.

Parameters:: event_log – The input event log

class collaboration_detection.preprocessing.event_log_converter.DefaultEventLogConverter

The DefaultEventLogConverter converts all traces and events as they are (no modifications). Can be used as base class for inheritance.

convert_event(event: Series) → Series

Converts an event into a new event.

Parameters:: event – The input event
Returns:: The converted event

convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

is_sub_log_converter() → bool

Does the converter return multiple sub-logs?

Returns:: True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.ActivityJoinerConverter(activity_classifier: List[str] | None = None, activity_delimiter=' / ')

The ActivityJoinerConverter converts the concept:name of all events by joining the attributes provided by activity_classifier. All other attributes are untouched.

convert_event_log(event_log: DataFrame) → DataFrame

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

class collaboration_detection.preprocessing.event_log_converter.TraceSplitConverter(split_attributes: List[str])

The TraceSplitConverter converts each trace of the event log into a new trace with modified case IDs. The new case IDs are created by appending a unique identifier to the original case ID of each trace. The unique identifier is created by concatenating the values (the category codes) of the attributes specified in the split_attributes list. If the split_attributes list is empty, the case IDs are not modified.
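A sketch of the case-ID rule using plain dicts as events. The documented converter concatenates the category codes of the split attributes; this sketch uses the raw attribute values and an underscore separator instead, and the column name case:concept:name as well as the function split_case_ids are assumptions for illustration.

```python
from typing import Any, Dict, List

CASE_KEY = "case:concept:name"  # assumed case-ID column name


def split_case_ids(events: List[Dict[str, Any]], split_attributes: List[str],
                   sep: str = "_") -> List[Dict[str, Any]]:
    """Return copies of the events whose case IDs are extended by the split-attribute values."""
    if not split_attributes:
        return [dict(e) for e in events]  # empty list: case IDs are left unchanged
    converted = []
    for event in events:
        new_event = dict(event)
        suffix = sep.join(str(event[attr]) for attr in split_attributes)
        new_event[CASE_KEY] = f"{event[CASE_KEY]}{sep}{suffix}"
        converted.append(new_event)
    return converted


log = [
    {CASE_KEY: "c1", "concept:name": "A", "org:resource": "alice"},
    {CASE_KEY: "c1", "concept:name": "B", "org:resource": "bob"},
]
print([e[CASE_KEY] for e in split_case_ids(log, ["org:resource"])])  # ['c1_alice', 'c1_bob']
```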

convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

class collaboration_detection.preprocessing.event_log_converter.CombinedConverter(converter_list: List[EventLogConverter])

The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.
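Applying the converters in the given order is function composition: each converter receives the output of the previous one, so the order matters. The sketch below shows this for traces reduced to lists of activity names; the names combine, add_start and lowercase are hypothetical, and the sub-log case (a converter returning several logs) is not covered.

```python
from typing import Callable, List

Trace = List[str]  # hypothetical: a trace reduced to its activity names
TraceConverter = Callable[[Trace], Trace]


def combine(converters: List[TraceConverter]) -> TraceConverter:
    """Chain converters so that each one receives the output of the previous one."""
    def combined(trace: Trace) -> Trace:
        for convert in converters:
            trace = convert(trace)
        return trace
    return combined


def add_start(trace: Trace) -> Trace:
    return ["START"] + trace


def lowercase(trace: Trace) -> Trace:
    return [activity.lower() for activity in trace]


print(combine([add_start, lowercase])(["A", "B"]))  # ['start', 'a', 'b']
print(combine([lowercase, add_start])(["A", "B"]))  # ['START', 'a', 'b']
```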

convert_event(event: Series) → Series

Converts an event into a new event.

Parameters:: event – The input event
Returns:: The converted event

convert_event_log(event_log: DataFrame) → DataFrame | List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

is_sub_log_converter() → bool

Does the converter return multiple sub-logs?

Returns:: True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.SublogConverter(split_attributes: List[str], similarity_attributes: List[str], timedelta_s: int = 0)

Converts an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces, where traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the similarity_attributes list. The timedelta_s parameter defines how many seconds two events of two traces may be apart from each other.

convert_event_log(event_log: DataFrame) → List[DataFrame]

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

is_sub_log_converter() → bool

Does the converter return multiple sub-logs?

Returns:: True if convert_event_log returns multiple (sub-)logs

class collaboration_detection.preprocessing.event_log_converter.AddPseudoEventConverter(add_pseudo_start_event=True, add_pseudo_end_event=True, attributes_start_event: Dict[str, Any] | None = None, attributes_end_event: Dict[str, Any] | None = None)

This converter adds a new pseudo-event at the beginning and/or end of the trace, depending on the values of add_pseudo_start_event and add_pseudo_end_event. The timestamp of the start (end) pseudo-event is the timestamp of the first (last) event minus (plus) one second.

The new events are created using the attributes_start_event and attributes_end_event dictionaries and are added to the trace.
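A sketch of the timestamp rule using plain dicts as events. The column name time:timestamp, the fallback activity names START and END, and the function add_pseudo_events are assumptions; the real converter takes the attributes of the new events from attributes_start_event and attributes_end_event.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

TS = "time:timestamp"  # assumed timestamp column name


def add_pseudo_events(trace: List[Dict[str, Any]], *,
                      add_start: bool = True, add_end: bool = True,
                      start_attrs: Optional[Dict[str, Any]] = None,
                      end_attrs: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return a new list with pseudo-events around a non-empty, time-ordered trace."""
    result = list(trace)
    if add_start:
        # Copy the attributes and place the event one second before the first event.
        start = dict(start_attrs or {"concept:name": "START"})
        start[TS] = trace[0][TS] - timedelta(seconds=1)
        result.insert(0, start)
    if add_end:
        # Place the end event one second after the last event.
        end = dict(end_attrs or {"concept:name": "END"})
        end[TS] = trace[-1][TS] + timedelta(seconds=1)
        result.append(end)
    return result


trace = [
    {"concept:name": "A", TS: datetime(2024, 1, 1, 12, 0, 0)},
    {"concept:name": "B", TS: datetime(2024, 1, 1, 12, 5, 0)},
]
for event in add_pseudo_events(trace):
    print(event["concept:name"], event[TS].time())
# START 11:59:59
# A 12:00:00
# B 12:05:00
# END 12:05:01
```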

convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

class collaboration_detection.preprocessing.event_log_converter.TraceMergerConverter(*, new_case_id='NEW_DEFAULT_CASE', case_id_provider: Callable[[], str] | None = None)

Merges all traces of the event log into a single trace by setting the CASE_CONCEPT_NAME to the value provided by the function case_id_provider or by a fixed value defined by new_case_id.

convert_event_log(event_log: DataFrame) → DataFrame

Converts an event log into one or more new event log(s).

Parameters:: event_log – The input event log
Returns:: The converted event log(s)

class collaboration_detection.preprocessing.event_log_converter.AddCountAttributeInTraceConverter(columns_with_value_prefix: Dict[str, str], column_prefix='counted_')

The AddCountAttributeInTraceConverter adds count-based attributes to specified columns in a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) defines a dictionary specifying columns and their corresponding value prefixes. The value prefixes will be used to create a new attribute for each unique value in the specified columns. The parameter column_prefix defines a prefix that will be added to the newly created columns.

convert_trace(trace: DataFrame) → DataFrame

Converts a trace into one (OR MORE) new trace(s).

Parameters:: trace – The input trace
Returns:: The converted trace or a list of traces

Frequent Subgraph Mining

collaboration_detection.subgraph_mining.gspan(data: str | GraphCollection, result=None, sup=10, min_node=3, max_node=10, remove_data_graphs=True, tagging_sub_and_super_graphs=False, variant: GSpanVariant = GSpanVariant.GSPAN_JAVA)

Executes the gSpan algorithm, by default using a Java implementation (see variant).

The jar file of the implementation should be provided at ../..bin/gSpan.Java-1.2.jar, or its path can be passed as the jar_file parameter. The jar can be downloaded from GitHub (gSpan.Java). The data parameter must be provided. The output is stored in the GraphCollection.

Parameters:

data – File path (str) of the graph data set / or GraphCollection
result – File path of the result file
sup – Minimum support
min_node – Minimum number of nodes for each sub-graph
max_node – Maximum number of nodes for each sub-graph
remove_data_graphs – Only relevant, if type(data)==GraphCollection: Remove all other graphs before adding the sub graphs
tagging_sub_and_super_graphs – Only relevant, if type(data)==GraphCollection and remove_data_graphs==False: Create the subgraph mapping between the data graphs and the created sub graphs
variant – Define the GSpanVariant (java or rust implementation)

collaboration_detection.subgraph_mining.rfsm(g_c: GraphCollection, min_node: int, max_node: int, support: int) → GraphCollection

Executes relaxed frequent subgraph mining (relaxed FSM).

This wrapper function executes relaxed FSM on a given graph collection, using the minimum/maximum number of nodes and the requested support.

Parameters:

g_c – The input GraphCollection
min_node – Minimum number of nodes in the final patterns (int)
max_node – Maximum number of nodes in the final patterns (int)
support – Minimum support value for the final patterns (int)

Returns:

GraphCollection (patterns)

Graph Set Clustering

class collaboration_detection.clustering.clustering.Clustering

static execute(graph_collection: GraphCollection, algorithm: Algorithm, distance_metric: DistanceMetric, number_of_clusters: int | None = None, cluster_division_selector_metric: ClusterDivisionSelectorMetric | None = None, cluster_representative_seeder: ClusterRepresentativeSeeder | None = None, cluster_dissimilarity: float | None = None, cluster_centroid_selector: ClusterCentroidSelector | None = None) → List[Cluster]

This function is the entry point into the clustering package.

Parameters:

graph_collection – the graph collection to get the graphs from. The resulting clusters are updated in place.
algorithm – the clustering algorithm to use
distance_metric – the distance metric to calculate the distances between the graphs
number_of_clusters – the number of clusters as stopping criterion for the hierarchical algorithm
cluster_division_selector_metric – the selector for the cluster to be divided by the hierarchical algorithm
cluster_representative_seeder – the seed selector for the split clusters of the hierarchical algorithm
cluster_dissimilarity – the dissimilarity value for the dissimilar cluster centroid selector of the density-based and partitioning algorithms
cluster_centroid_selector – the selected cluster centroid selector for the partitioning algorithm

Returns:

Resulting clusters
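The clustering entry point combines a distance metric with an assignment strategy. As an illustration only, the sketch below uses a Jaccard distance on edge sets and assigns each graph to its nearest representative, similar to one step of a partitioning algorithm; neither jaccard_distance nor assign_to_representatives is part of the documented API, and the metrics actually available through DistanceMetric may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

EdgeSet = FrozenSet[Tuple[str, str]]  # a graph reduced to its (source, sink) label pairs


def jaccard_distance(a: EdgeSet, b: EdgeSet) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty graphs have distance 0."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def assign_to_representatives(graphs: List[EdgeSet], reps: List[EdgeSet]) -> Dict[int, List[int]]:
    """Put each graph (by index) into the cluster of its nearest representative."""
    clusters: Dict[int, List[int]] = {i: [] for i in range(len(reps))}
    for gi, g in enumerate(graphs):
        nearest = min(range(len(reps)), key=lambda ri: jaccard_distance(g, reps[ri]))
        clusters[nearest].append(gi)
    return clusters


g0 = frozenset({("A", "B"), ("B", "C")})
g1 = frozenset({("A", "B")})
g2 = frozenset({("X", "Y")})
print(assign_to_representatives([g0, g1, g2], reps=[g0, g2]))  # {0: [0, 1], 1: [2]}
```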