# Process Mining

## Mining Process Graphs

The first step is to mine (process) graphs from event logs. For this purpose, the [PM4Py](https://pm4py.fit.fraunhofer.de/) library is used to read the input log and perform the discovery task. Directly Follows Graphs (DFGs) can be generated with the following code:

```python
from pm4py.objects.log.importer.xes import importer as xes_importer

from collaboration_detection.graph_mining import DfgGraphMiner
from collaboration_detection.graph_mining import HeuristicGraphMiner

event_log = xes_importer.apply('myEventLog.xes.gz')
miner = DfgGraphMiner()  # with default EventLogConverter

# Get a single graph
graph_collection, converted_event_log = miner.get_graph(event_log)

# Get one graph per trace
graph_collection, converted_event_logs = miner.get_graphs_for_traces(event_log)

# Or use the heuristic miner
miner = HeuristicGraphMiner()
graph_collection, converted_event_logs = miner.get_graphs_for_traces(event_log)
```

The resulting graph(s) are stored in a [Graph Collection](graphCollection.html). To iterate over all generated graphs of an event log, the following code can be used:

```python
from pm4py.objects.log.importer.xes import importer as xes_importer

from collaboration_detection.graph_mining import DfgGraphMiner, HeuristicGraphMiner

event_log = xes_importer.apply('myEventLog.xes.gz')
miner = DfgGraphMiner()
# miner = HeuristicGraphMiner()  # use the heuristic miner instead

for graph, trace_log in miner.iter_graphs_for_traces(event_log):
    ...
```

### Collaboration Process Instance Miner

The **Collaboration Process Instance Miner** creates a collaboration process instance graph in which activity nodes represent events, labeled with their concept name and optionally enriched with additional attributes. In addition, object nodes represent the context of the activities and are created from the events' attributes.
The miner creates edges between consecutive activities and connects activities to related object nodes (e.g., resources, social documents).

#### Key Features

- **Activity Nodes**: Represent events with labels based on the activity classifier (default: `concept:name`)
- **Object Nodes**: Represent related entities (e.g., `org:resource`, `spm:sdid`) that activities interact with
- **Follow Relations**: Edges between consecutive activities in a trace
- **Object Relations**: Edges between activities and related object nodes

#### Parameters

The miner accepts the following parameters via `kwargs`:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `activity_classifier` | List[str] | `[]` | Additional attributes to include in the activity label (`concept:name` is always included) |
| `relation_attributes` | List[str] | `[]` | Attribute names used to create object nodes connected to activities |
| `activity_delimiter` | str | `" -- "` | Delimiter used to join activity classifier attributes |
| `create_object_nodes` | bool | `True` | If `True`, creates actual object nodes; if `False`, simulates object nodes (only activity nodes are created) |
| `override_labels_of_relation_attributes` | bool or Set[str] | `False` | If `True`, overrides object node labels with their type; if a set is provided, only those attributes are overridden |
| `use_object_type_in_label` | bool | `True` | If `True`, object node labels include the type prefix (e.g., `org:resource: John`) |

#### Example Usage

Basic usage with default settings:

```python
from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.preprocessing.event_log_converter import TraceMergerConverter

miner = CollaborationProcessInstanceMiner(
    converter=TraceMergerConverter(),
    relation_attributes=["org:resource"]
)
graph_collection, converted_log = miner.get_graph(event_log)
```

Advanced usage with custom configuration:

```python
from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.preprocessing.event_log_converter import (
    CombinedConverter,
    SublogConverter,
    TraceMergerConverter,
    AddPseudoEventConverter,
    AddCountAttributeInTraceConverter
)

miner = CollaborationProcessInstanceMiner(
    converter=CombinedConverter([
        SublogConverter(
            split_attributes=['spm:sdid'],
            similarity_attributes=['org:resource']
        ),
        TraceMergerConverter(),
        AddPseudoEventConverter(),
        AddCountAttributeInTraceConverter(
            columns_with_value_prefix={'org:resource': 'res_', 'spm:sdid': 'sd_'},
            column_prefix='counted_'
        )
    ]),
    activity_classifier=['counted_org:resource'],
    relation_attributes=['counted_spm:sdid'],
    use_object_type_in_label=False,
    create_object_nodes=True
)
graph_collection, converted_log = miner.get_graph(event_log)
```

#### Post-processing: Override Object Node Labels

The `override_labels_of_object_nodes` function allows post-processing of object node labels:

```python
from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.graph_mining.collaboration_process_instance_miner import (
    override_labels_of_object_nodes
)
from collaboration_detection.preprocessing.event_log_converter import TraceMergerConverter

miner = CollaborationProcessInstanceMiner(
    converter=TraceMergerConverter(),
    relation_attributes=["org:resource"]
)
graph_collection, converted_log = miner.get_graph(event_log)

# Override all object node labels with their type
override_labels_of_object_nodes(graph_collection, object_node_types={"org:resource"})
```

This modifies the graph in place, replacing object node labels (e.g., `org:resource: John Smith`) with just the type (e.g., `org:resource`). The same effect can also be achieved at mining time by passing a set of attributes to the `override_labels_of_relation_attributes` parameter.

## Preprocessing

Before applying the miner, the event log can be preprocessed with an Event Log Converter. More details on the preprocessing modules can be found [here](preprocessing.html).
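To build intuition for what a splitting converter such as the `SublogConverter` does, the following plain-Python sketch (illustrative only, not the library implementation; the event data is made up) partitions events into sublogs by a split attribute:

```python
# Illustrative sketch only -- not the library implementation.
# Splitting an event log into sublogs by a split attribute (here 'spm:sdid'),
# so that each sublog contains the events of one social document.
from collections import defaultdict

events = [
    {"concept:name": "create", "spm:sdid": "sd_1", "org:resource": "alice"},
    {"concept:name": "edit",   "spm:sdid": "sd_2", "org:resource": "bob"},
    {"concept:name": "close",  "spm:sdid": "sd_1", "org:resource": "alice"},
]

sublogs = defaultdict(list)
for event in events:
    sublogs[event["spm:sdid"]].append(event)

# 'sublogs' now maps each split-attribute value to its own event list,
# e.g. two events for 'sd_1' and one for 'sd_2'.
```

The actual converters additionally handle similarity attributes and trace boundaries; the library usage is shown in the example below.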
```python
from collaboration_detection.graph_mining import DfgGraphMiner
from collaboration_detection.preprocessing.event_log_converter import (
    DefaultEventLogConverter,
    ActivityJoinerConverter,
    SublogConverter,
    CombinedConverter,
    AddPseudoEventConverter
)

miner = DfgGraphMiner(DefaultEventLogConverter())

# With a custom converter
miner = DfgGraphMiner(ActivityJoinerConverter(activity_classifier=['concept:name', 'other:attribute']))

# With the sublog converter
converter = SublogConverter(
    split_attributes=[],
    similarity_attributes=['attr1']
)
miner = DfgGraphMiner(converter)

# With multiple converters: the event log is first split into multiple sublogs
# (e.g., each trace becomes one log). The AddPseudoEventConverter then adds
# artificial start and end events to each trace, and the result is used as
# input for the process miner.
converter = CombinedConverter([
    SublogConverter(
        split_attributes=['concept:name'],
        similarity_attributes=['org:resource']
    ),
    AddPseudoEventConverter()
])
miner = DfgGraphMiner(converter)
```
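For intuition, the effect of joining several classifier attributes into a single activity label, as done by the `ActivityJoinerConverter` with the documented default delimiter `" -- "`, can be sketched in plain Python (the event data here is invented for illustration):

```python
# Illustrative sketch only -- not the library implementation.
# Joining classifier attributes into one activity label with the
# default " -- " delimiter.
DELIMITER = " -- "

event = {"concept:name": "approve request", "org:resource": "alice"}
activity_classifier = ["concept:name", "org:resource"]

label = DELIMITER.join(str(event[attr]) for attr in activity_classifier)
# label == "approve request -- alice"
```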