Frequent Subgraph Mining

After the Process Mining module has generated multiple process graphs, Frequent Subgraph Mining can be applied to them to extract recurring patterns. The code below shows how to generate the frequent subgraphs.

gSpan (Exact Frequent Subgraph Mining)

Exact frequent subgraphs are mined with the gSpan algorithm. This package requires the gSpan Java implementation.

from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import gspan

# 1. Create a graph collection (load an existing one or mine new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Run the gSpan algorithm
gspan(
    data=g_c,
    min_node=3,
    max_node=10,
    sup=20,
    remove_data_graphs=True
)
# 3. All subgraphs are now stored in the graph collection
g_c.graphs.items()
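
To make the support threshold more concrete, here is a minimal sketch of support counting, restricted to single-edge patterns. It is not this package's implementation and does not call gSpan. The toy graphs, the edge labels, and the helper names `edge_support` and `frequent_edges` are all made up for illustration. It also assumes that support is an absolute number of graphs; the `sup` parameter above may be interpreted differently (for example as a percentage) by the actual implementation.

```python
from collections import Counter

# Toy data (hypothetical): each graph is a set of directed, labelled edges.
graphs = [
    {("open", "edit"), ("edit", "review"), ("review", "merge")},
    {("open", "edit"), ("edit", "review")},
    {("open", "review"), ("review", "merge")},
]


def edge_support(graph_edge_sets):
    """Count, for every edge, the number of graphs that contain it."""
    counts = Counter()
    for edges in graph_edge_sets:
        # Each graph is a set, so an edge is counted at most once per graph.
        counts.update(edges)
    return counts


def frequent_edges(graph_edge_sets, min_support):
    """Return the edges contained in at least `min_support` graphs."""
    support = edge_support(graph_edge_sets)
    return {edge for edge, count in support.items() if count >= min_support}


if __name__ == "__main__":
    for edge in sorted(frequent_edges(graphs, min_support=2)):
        print(edge)
```

A full miner such as gSpan grows these frequent edges into larger candidate subgraphs and keeps only those that still meet the threshold, which is presumably where limits such as `min_node` and `max_node` come into play.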

Relaxed Frequent Subgraph Mining

For fuzzier pattern matching, a relaxed algorithm can be called instead:

from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import rfsm

# 1. Create a graph collection (load an existing one or mine new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Run the rfsm algorithm
g_c = rfsm(
    g_c,
    min_node=3,
    max_node=10,
    support=20
)
# 3. All subgraphs are now stored in the graph collection
g_c.graphs.items()