# Frequent Subgraph Mining

After generating multiple process graphs with the [Process Mining](processMining.html) module, you can apply **Frequent Subgraph Mining** to them to extract recurring patterns (the frequent subgraphs).
The sections below describe the available miners and show how to call them.


## Collaboration Instance Subgraph Mining
This miner applies only to patterns created by the Collaboration Process Instance Miner.

tbd...


## gSpan (Exact Frequent Subgraph Mining)
This miner uses the [gSpan](https://sites.cs.ucsb.edu/~xyan/software/gSpan.htm) algorithm for exact frequent subgraph mining. It requires
the [gSpan Java implementation](https://github.com/joleaf/gSpan.Java).

```python
from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import gspan

# 1. Create a graph collection (load an existing one or mine new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Execute the gspan algorithm
gspan(
    data=g_c,
    min_node=3,
    max_node=10,
    sup=20,
    remove_data_graphs=True
)
# 3. All subgraphs are now stored in the graph collection
g_c.graphs.items()
```
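
To make the `min_node`, `max_node`, and `sup` parameters more concrete, the following is a minimal, self-contained sketch of what "frequent" means for a candidate pattern. It does not use this package or gSpan itself. It assumes that `sup` is an absolute number of graphs (the package may interpret it differently), and it treats a graph as a set of labeled edges, which only works because every node label in the toy data is unique. The helper names `node_count`, `support`, and `is_frequent` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Toy representation (assumption): an edge is a pair of unique node labels,
# and a graph is a frozen set of such edges.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def node_count(graph: Graph) -> int:
    """Number of distinct node labels touched by the graph's edges."""
    return len({label for edge in graph for label in edge})


def support(pattern: Graph, graphs: Dict[str, Graph]) -> int:
    """Count the data graphs that contain every edge of the pattern."""
    return sum(1 for g in graphs.values() if pattern <= g)


def is_frequent(pattern: Graph, graphs: Dict[str, Graph],
                min_node: int, max_node: int, sup: int) -> bool:
    """A pattern qualifies if its size is within bounds and its support reaches `sup`."""
    size_ok = min_node <= node_count(pattern) <= max_node
    return size_ok and support(pattern, graphs) >= sup


# Three small data graphs and one candidate pattern.
graphs = {
    "g1": frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    "g2": frozenset({("A", "B"), ("B", "C")}),
    "g3": frozenset({("A", "B"), ("C", "D")}),
}
pattern = frozenset({("A", "B"), ("B", "C")})

print(support(pattern, graphs))                                      # 2
print(is_frequent(pattern, graphs, min_node=3, max_node=10, sup=2))  # True
```

The real algorithm has to solve subgraph isomorphism and enumerate candidates efficiently. The sketch only illustrates the thresholds that the call above passes in.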

## Relaxed Frequent Subgraph Mining
To find patterns with less strict (fuzzier) matching, call the relaxed algorithm:
```python
from collaboration_detection.datastructures import GraphCollection
from collaboration_detection.subgraph_mining import rfsm

# 1. Create a graph collection (load an existing one or mine new graphs)
g_c = GraphCollection().load("graphs.db")
# 2. Execute the rfsm algorithm
g_c = rfsm(
    g_c,
    min_node=3,
    max_node=10,
    support=20
)
# 3. All subgraphs are now stored in the graph collection
g_c.graphs.items()
```
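
The difference between exact and relaxed matching can be illustrated with a small, self-contained sketch. This is not a description of how `rfsm` works internally. It shows one possible notion of relaxation, assumed here purely for illustration: a pattern still counts as present in a graph if at most `max_missing` of its edges are absent. The parameter `max_missing` and the helper names are hypothetical and are not part of this package.

```python
from typing import FrozenSet, Iterable, Tuple

# Toy representation (assumption): edges are pairs of unique node labels.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def matches_relaxed(pattern: Graph, graph: Graph, max_missing: int) -> bool:
    """True if at most `max_missing` pattern edges are absent from the graph."""
    return len(pattern - graph) <= max_missing


def relaxed_support(pattern: Graph, graphs: Iterable[Graph], max_missing: int) -> int:
    """Count the graphs that match the pattern under the given tolerance."""
    return sum(1 for g in graphs if matches_relaxed(pattern, g, max_missing))


graphs = [
    frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("C", "D")}),
]
pattern = frozenset({("A", "B"), ("B", "C"), ("C", "D")})

exact = relaxed_support(pattern, graphs, max_missing=0)    # exact matching
relaxed = relaxed_support(pattern, graphs, max_missing=1)  # tolerate one missing edge
print(exact, relaxed)  # 1 3
```

With a tolerance of zero the pattern is supported by one graph only. Allowing one missing edge raises its support to three, which is why a relaxed miner can surface patterns that an exact miner would discard at the same support threshold.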