2. Preprocessing & Process Discovery Configuration

Preprocessing and process model generation are executed directly after the Log Import with a default configuration. Both can be triggered again with a new configuration, which is defined in this step. After the configuration form is submitted, the log is preprocessed again. The resulting process models can be viewed in the third step.

Configuration

This form defines custom parameters for the preprocessing and the process discovery algorithm.

Preprocessing

Split attributes

The Split Attributes are used to split each original trace into multiple traces, which are then used to create the sub-logs. For instance, a trace with three social documents (spm:sdid) is split into three traces if the attribute spm:sdid is used as the split attribute.
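The splitting step can be illustrated with a minimal sketch. It assumes that a trace is a plain list of event dictionaries; the function name split_trace and the sample events are hypothetical and are not part of the framework.

```python
from collections import defaultdict


def split_trace(trace, split_attribute):
    """Split one trace into several traces, one per distinct attribute value."""
    groups = defaultdict(list)
    for event in trace:
        # Assumption: events without the split attribute are skipped.
        if split_attribute in event:
            groups[event[split_attribute]].append(event)
    return list(groups.values())


# Hypothetical trace that touches three social documents.
trace = [
    {"concept:name": "create", "spm:sdid": "doc1"},
    {"concept:name": "comment", "spm:sdid": "doc2"},
    {"concept:name": "edit", "spm:sdid": "doc1"},
    {"concept:name": "create", "spm:sdid": "doc3"},
]

sub_traces = split_trace(trace, "spm:sdid")
print(len(sub_traces))  # 3
```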

Similarity attributes

The Similarity Attributes define the event attributes that are used to combine traces into sub-logs. Traces that (1) have at least two events with equal attribute values and (2) overlap in their timeframes are merged into sub-logs.
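One possible reading of the two conditions is sketched below. It assumes that two traces are "similar" when an event in one trace and an event in the other carry the same attribute value, and that linked traces are grouped transitively. The helper names, the attribute org:resource, and the sample data are hypothetical; the framework's actual implementation may differ.

```python
from datetime import datetime
from itertools import combinations


def timeframe(trace):
    stamps = [event["time:timestamp"] for event in trace]
    return min(stamps), max(stamps)


def similar(trace_a, trace_b, attribute):
    # Condition (1): the traces share at least one attribute value.
    values_a = {e[attribute] for e in trace_a if attribute in e}
    values_b = {e[attribute] for e in trace_b if attribute in e}
    return bool(values_a & values_b)


def overlapping(trace_a, trace_b):
    # Condition (2): the timeframes of the two traces intersect.
    start_a, end_a = timeframe(trace_a)
    start_b, end_b = timeframe(trace_b)
    return start_a <= end_b and start_b <= end_a


def build_sub_logs(traces, attribute):
    # Union-find over trace indices: linked traces end up in one sub-log.
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(traces)), 2):
        if similar(traces[i], traces[j], attribute) and overlapping(traces[i], traces[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, trace in enumerate(traces):
        groups.setdefault(find(i), []).append(trace)
    return list(groups.values())


def ts(hour):
    return datetime(2024, 1, 1, hour)


traces = [
    [{"org:resource": "alice", "time:timestamp": ts(9)},
     {"org:resource": "bob", "time:timestamp": ts(11)}],
    [{"org:resource": "bob", "time:timestamp": ts(10)},
     {"org:resource": "bob", "time:timestamp": ts(12)}],
    [{"org:resource": "carol", "time:timestamp": ts(10)},
     {"org:resource": "carol", "time:timestamp": ts(11)}],
]

sub_logs = build_sub_logs(traces, "org:resource")
print([len(sub_log) for sub_log in sub_logs])  # [2, 1]
```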

Timestamp overlap delta

This delta is added to the timestamps of events, which extends the timeframe within which traces are considered overlapping when they are combined into sub-logs.
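The effect of the delta can be sketched as follows. The sketch assumes that the timeframe of every trace is widened by the delta on both ends before the overlap check. Whether the framework widens both ends or only one is not stated here, so this is an illustration only. The names widen and overlap are hypothetical.

```python
from datetime import datetime, timedelta


def widen(frame, delta):
    """Extend a (start, end) timeframe by delta on both ends (assumption)."""
    start, end = frame
    return start - delta, end + delta


def overlap(frame_a, frame_b):
    return frame_a[0] <= frame_b[1] and frame_b[0] <= frame_a[1]


# Two hypothetical traces that are 20 minutes apart.
frame_a = (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0))
frame_b = (datetime(2024, 1, 1, 10, 20), datetime(2024, 1, 1, 11, 0))
delta = timedelta(minutes=15)

print(overlap(frame_a, frame_b))  # False
print(overlap(widen(frame_a, delta), widen(frame_b, delta)))  # True
```

With a delta of zero the two traces stay in separate sub-logs. With a delta of 15 minutes they become candidates for merging.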

Minimum Number of Resources

Defines the minimum number of resources n. All sub-logs with fewer than n resources are dropped.
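A minimal sketch of this filter is shown below. It assumes that a sub-log is a list of traces, that each event stores its resource under the key org:resource, and that resources are counted as distinct values per sub-log. The function names and the sample data are hypothetical.

```python
def count_resources(sub_log, resource_key="org:resource"):
    """Count the distinct resources that occur in a sub-log."""
    return len({
        event[resource_key]
        for trace in sub_log
        for event in trace
        if resource_key in event
    })


def drop_small_sub_logs(sub_logs, min_resources):
    # Keep only sub-logs with at least min_resources distinct resources.
    return [s for s in sub_logs if count_resources(s) >= min_resources]


sub_logs = [
    [[{"org:resource": "alice"}, {"org:resource": "bob"}]],   # 2 resources
    [[{"org:resource": "carol"}], [{"org:resource": "carol"}]],  # 1 resource
]

kept = drop_small_sub_logs(sub_logs, min_resources=2)
print(len(kept))  # 1
```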

Artificial start and end activities

An artificial start and end event can be added to each trace of a created sub-log. E.g., the case A->B->C becomes Start->A->B->C->End. This behaviour can be configured.
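The transformation from the example can be written as a short sketch. It assumes that a trace is reduced to a list of activity names. The function name and the labels Start and End as defaults are taken from the example above and are otherwise hypothetical.

```python
def add_artificial_start_end(trace, start_label="Start", end_label="End"):
    """Return a copy of the trace wrapped in artificial start and end activities."""
    return [start_label] + list(trace) + [end_label]


case = ["A", "B", "C"]
print("->".join(add_artificial_start_end(case)))  # Start->A->B->C->End
```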

Process discovery

Currently, the Collaboration Process Instance Miner (default), the DFG Miner, or the Heuristic Miner can be selected for the discovery of the process instance models. For the Heuristic Miner, the parameters Dependency Threshold, And Threshold, and Loop Two Threshold can be defined [1].

[1] For details, see Weijters, A., van der Aalst, W., & Alves de Medeiros, A. (2006). Process Mining with the Heuristics Miner-algorithm.
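To give an intuition for the Dependency Threshold, the following sketch computes the dependency measure described by Weijters et al. in plain Python and keeps only the relations that reach the threshold. It is not the pm4py implementation. The function names and the sample traces are hypothetical, and the And Threshold and Loop Two Threshold are not covered.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:
        # Length-one loop: |a>a| / (|a>a| + 1)
        return ab / (ab + 1)
    # (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_edges(traces, dependency_threshold):
    counts = directly_follows_counts(traces)
    measures = {pair: dependency(counts, *pair) for pair in list(counts)}
    return {pair: m for pair, m in measures.items() if m >= dependency_threshold}


# Nine traces A,B,C and one deviating trace A,C,B.
traces = [["A", "B", "C"]] * 9 + [["A", "C", "B"]]
edges = dependency_edges(traces, dependency_threshold=0.6)
print(sorted(edges))  # [('A', 'B'), ('B', 'C')]
```

The rare relations A->C and C->B fall below the threshold and are left out of the model. A lower threshold would keep more of this infrequent behaviour.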

NOTE: The Collaboration Pattern Detection Framework uses the pm4py library for the discovery algorithms.

Collaboration Process Instance Miner

The Collaboration Process Instance Miner is based on the DFG miner with additional features:

Categorical attributes can be used as an additional part of a composed activity. For this purpose, each category value is converted into an incremented label. E.g., if the Resource attribute is used, the first resource becomes "Resource 1", the second becomes "Resource 2", etc. This feature ensures that the subsequent pattern recognition step can recognize patterns based on the labels (i.e., it is irrelevant whether user "A" or user "B" starts a discussion).
For each sub-log (process instance), the corresponding cases are merged into a single long trace (sorted by timestamp). This feature ensures that the handoff between different social documents (spm:sdid) is visible.
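Both features can be sketched together in plain Python. The sketch assumes that events are dictionaries with the keys concept:name, org:resource, and time:timestamp, that labels are assigned in order of first appearance within the merged trace, and that a composed activity is written as "activity (label)". The function names and the label format are hypothetical and may differ from the framework.

```python
from collections import Counter


def merge_cases(sub_log):
    """Merge all cases of a sub-log into one long trace, sorted by timestamp."""
    events = (event for case in sub_log for event in case)
    return sorted(events, key=lambda event: event["time:timestamp"])


def incremented_labels(events, attribute, prefix):
    """Map each category value to 'prefix 1', 'prefix 2', ... by first appearance."""
    mapping = {}
    labels = []
    for event in events:
        value = event[attribute]
        if value not in mapping:
            mapping[value] = f"{prefix} {len(mapping) + 1}"
        labels.append(mapping[value])
    return labels


def composed_dfg(sub_log, attribute="org:resource", prefix="Resource"):
    merged = merge_cases(sub_log)
    labels = incremented_labels(merged, attribute, prefix)
    activities = [
        f'{event["concept:name"]} ({label})'
        for event, label in zip(merged, labels)
    ]
    # Directly-follows counts over the single merged trace.
    return Counter(zip(activities, activities[1:]))


# Hypothetical sub-log with two social documents (cases); integer timestamps for brevity.
sub_log = [
    [{"concept:name": "post", "org:resource": "alice", "time:timestamp": 1},
     {"concept:name": "edit", "org:resource": "alice", "time:timestamp": 3}],
    [{"concept:name": "comment", "org:resource": "bob", "time:timestamp": 2}],
]

dfg = composed_dfg(sub_log)
for (source, target), count in dfg.items():
    print(source, "->", target, count)
```

Because the cases are merged before the directly-follows relation is computed, the edge from "post (Resource 1)" to "comment (Resource 2)" crosses two social documents. This edge would not exist if each case were mined separately.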