# 2. Preprocessing & Process Discovery Configuration

Preprocessing and process model generation run automatically after the [Log Import](1_Import_Event_Log.md), using a default configuration. They can be triggered again with a new configuration, which is defined in this step. After the configuration form is submitted, the log is preprocessed again. The resulting [Process Models](3_Process_Models.md) can be viewed in the third step.

![img.png](img/pm_config.gif)

## Configuration

This form defines custom parameters for the preprocessing and the process discovery algorithm.

### Preprocessing

#### Split attributes

The `Split Attributes` are used to split each original trace into multiple traces, which are then used to create the sub-logs. For instance, a trace with three social documents (`spm:sdid`) is split into three traces if the attribute `spm:sdid` is used as split attribute.

#### Similarity attributes

The `Similarity Attributes` define the event attributes that are used to combine traces into sub-logs. Traces that (1) have at least two events with equal attribute values and (2) overlap in their timeframe are merged into sub-logs.

#### Timestamp overlap delta

This delta is added to the timestamps of events. It extends the timeframe within which traces count as overlapping when they are combined into sub-logs.

#### Minimum Number of Resources

Defines the minimum number of resources `n`. All sub-logs with fewer than `n` resources are dropped.

#### Artificial start and end activities

An artificial start and end event can be added to each trace in a created sub-log. E.g., the case `A->B->C` becomes `Start->A->B->C->End`. This behaviour can be configured.

### Process discovery

Currently, the *Collaboration Process Instance Miner* (default), the *DFG-Miner*, or the *Heuristic-Miner* can be selected for the process instance model discovery.
For the Heuristic Miner, the parameters `Dependency Threshold`, `And Threshold`, and `Loop Two Threshold` can be defined [1].

[1] For details see _Weijters, A. & Aalst, Wil & Medeiros, Alves. (2006). Process Mining with the Heuristics Miner-algorithm._

---

> **_NOTE:_** The *Collaboration Pattern Detection Framework* uses the [pm4py](https://pm4py.fit.fraunhofer.de/)
> library for the discovery algorithms.

---

#### Collaboration Process Instance Miner

The *Collaboration Process Instance Miner* is based on the DFG miner and adds the following features:

- Categorical attributes can be used as an additional part of a composed activity. The category is converted into an incremented label. E.g., if the *Resource* attribute is used, the first *Resource* becomes "Resource 1", etc. This feature ensures that the subsequent pattern recognition step can recognize patterns based on the labels (i.e., it is irrelevant whether user "A" or user "B" starts a discussion).
- For each sub-log (process instance), the corresponding cases are merged into a single long trace (sorted by timestamp). This feature ensures that the handoff between different social documents (`spm:sdid`) is visible.
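
The two features above can be illustrated with a minimal sketch in plain Python. It is not the framework's implementation: the event layout (dicts with `activity`, `resource`, and `time` keys), the function names, and the composed label format `activity (Resource n)` are assumptions made for this example only.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sub-log: two cases (one per social document), events as plain dicts.
sub_log = {
    "doc-1": [
        {"activity": "create", "resource": "alice", "time": datetime(2023, 1, 1, 9, 0)},
        {"activity": "comment", "resource": "bob", "time": datetime(2023, 1, 1, 9, 30)},
    ],
    "doc-2": [
        {"activity": "create", "resource": "bob", "time": datetime(2023, 1, 1, 9, 10)},
        {"activity": "comment", "resource": "alice", "time": datetime(2023, 1, 1, 9, 40)},
    ],
}


def merge_cases(cases):
    """Merge all cases of a sub-log into one long trace, sorted by timestamp."""
    events = [event for trace in cases.values() for event in trace]
    return sorted(events, key=lambda event: event["time"])


def composed_activities(trace, attribute="resource", label="Resource"):
    """Append an incremented label to each activity name.

    Each distinct attribute value receives a number in order of first appearance.
    """
    numbers = {}
    composed = []
    for event in trace:
        value = event[attribute]
        if value not in numbers:
            numbers[value] = len(numbers) + 1
        composed.append(f"{event['activity']} ({label} {numbers[value]})")
    return composed


def directly_follows(activities):
    """Count how often one activity is directly followed by another."""
    return Counter(zip(activities, activities[1:]))


merged = merge_cases(sub_log)
activities = composed_activities(merged)
dfg = directly_follows(activities)

for (source, target), count in dfg.items():
    print(f"{source} -> {target}: {count}")
```

Because the cases are merged before the directly-follows relation is counted, the edge from `create (Resource 1)` in the first document to `create (Resource 2)` in the second document shows up in the graph. The same edge would appear if the two users were swapped, since only the incremented labels are used.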