Frequent Subgraph Mining

After generating multiple process graphs with the Process Mining module, the resulting graphs can be processed with Frequent Subgraph Mining to extract patterns, i.e. subgraphs that occur frequently across the graphs. The sections below show the code for each available algorithm.

Collaboration Instance Subgraph Mining

This miner applies only to patterns created by the Collaboration Process Instance Miner.

tbd…

gSpan (Exact Frequent Subgraph Mining)

Exact frequent subgraphs are mined with the gSpan algorithm. This package requires the gSpan Java implementation.

from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import gspan

# 1. Create a graph collection (by importing one or by mining new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Execute the gspan algorithm
gspan(
    data=g_c,
    min_node=3,
    max_node=10,
    sup=20,
    remove_data_graphs=True
)
# 3. All sub-graphs are now stored in the graph collection
g_c.graphs.items()
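
To make the parameters above more concrete, the following is a minimal, self-contained sketch of what a support threshold and node bounds typically mean in frequent subgraph mining. It is not the gSpan algorithm and does not use this package: it assumes that node labels are unique within each graph, so that subgraph containment reduces to edge-set inclusion, and it assumes that `sup` is an absolute graph count (the text above does not say whether it is a count or a percentage). The names `node_count`, `support`, and `is_frequent` are hypothetical helpers introduced only for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]      # directed edge between two labeled nodes
Graph = FrozenSet[Edge]     # a graph represented as a set of labeled edges


def node_count(graph: Graph) -> int:
    """Number of distinct node labels appearing in the graph's edges."""
    return len({node for edge in graph for node in edge})


def support(pattern: Graph, graphs: Iterable[Graph]) -> int:
    """Count the graphs that contain every edge of the pattern.

    Assumption: node labels are unique per graph, so plain set
    inclusion stands in for a real subgraph isomorphism test.
    """
    return sum(1 for g in graphs if pattern <= g)


def is_frequent(pattern: Graph, graphs: Iterable[Graph],
                min_node: int, max_node: int, sup: int) -> bool:
    """Check the size bounds and the (assumed absolute) support threshold."""
    within_bounds = min_node <= node_count(pattern) <= max_node
    return within_bounds and support(pattern, graphs) >= sup


# Three toy process graphs (illustrative data only)
graphs = [
    frozenset({("open", "review"), ("review", "merge")}),
    frozenset({("open", "review"), ("review", "close")}),
    frozenset({("open", "review"), ("review", "merge"), ("merge", "deploy")}),
]
pattern = frozenset({("open", "review"), ("review", "merge")})

print(support(pattern, graphs))                                     # 2
print(is_frequent(pattern, graphs, min_node=3, max_node=10, sup=2))  # True
```

Under these assumptions, raising `sup` to 3 makes the same pattern infrequent, because only two of the three toy graphs contain both of its edges.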

Relaxed Frequent Subgraph Mining

For fuzzier pattern matching, a relaxed algorithm can be called:

from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import rfsm

# 1. Create a graph collection (by importing one or by mining new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Execute the rfsm algorithm
g_c = rfsm(
    g_c,
    min_node=3,
    max_node=10,
    support=20
)
# 3. All sub-graphs are now stored in the graph collection
g_c.graphs.items()