Process Mining

Mining Process Graphs

The first step is to mine process graphs from event logs. The PM4Py library is used to read the input log and to perform the discovery task.

Directly Follows Graphs (DFGs) can be generated with the following code:

from pm4py.objects.log.importer.xes import importer as xes_importer
from collaboration_detection.graph_mining import DfgGraphMiner
from collaboration_detection.graph_mining import HeuristicGraphMiner

event_log = xes_importer.apply('myEventLog.xes.gz')
miner = DfgGraphMiner()  # with default EventLogConverter
# Get a single graph
graph_collection, converted_event_log = miner.get_graph(event_log)
# Get all graphs for all traces individually
graph_collection, converted_event_logs = miner.get_graphs_for_traces(event_log)
# or using the heuristic miner
miner = HeuristicGraphMiner()
graph_collection, converted_event_logs = miner.get_graphs_for_traces(event_log)

The resulting graph(s) are stored in a Graph Collection.
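To make concrete what a Directly Follows Graph captures, the following is a minimal sketch that is independent of PM4Py and of `DfgGraphMiner`. It assumes a simplified log in which each trace is just a list of activity names; the function name `directly_follows_counts` and the toy data are illustrative only and are not part of the library.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity `a` is immediately followed by activity `b`."""
    counts = Counter()
    for trace in traces:
        # zip pairs each activity with its direct successor in the same trace
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy log: each trace is a list of activity names
toy_traces = [
    ["create", "edit", "comment"],
    ["create", "comment"],
    ["create", "edit", "edit", "comment"],
]

dfg = directly_follows_counts(toy_traces)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```

Each key of `dfg` corresponds to one edge of the graph, and the count is the edge frequency. Mining per trace, as `get_graphs_for_traces` does, would amount to calling such a function once per trace instead of once for the whole log.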

To iterate over all generated graphs of an event log, the following code can be used:

from pm4py.objects.log.importer.xes import importer as xes_importer
from collaboration_detection.graph_mining import DfgGraphMiner, HeuristicGraphMiner

event_log = xes_importer.apply('myEventLog.xes.gz')
miner = DfgGraphMiner()
# miner = HeuristicGraphMiner()  # alternatively, use the heuristic miner
for graph, converted_log in miner.iter_graphs_for_traces(event_log):
    ...

Collaboration Process Instance Miner

The Collaboration Process Instance Miner creates a collaboration process instance graph. Activity nodes represent events, labeled with their concept name and optionally enriched with additional attributes. Object nodes represent the context of the activities and are created from the events' attributes. The miner creates edges between consecutive activities and connects each activity to its related object nodes (e.g., resources, social documents).

Key Features

  • Activity Nodes: Represent events with labels based on the activity classifier (default: concept:name)

  • Object Nodes: Represent related entities (e.g., org:resource, spm:sdid) that activities interact with

  • Follow Relations: Edges between consecutive activities in a trace

  • Object Relations: Edges between activities and related object nodes
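The four features above can be sketched in plain Python. This is an illustration of the graph structure only, not the miner's implementation: the dictionary-based node and edge representation, the function name `build_instance_graph`, and the edge kinds `follows` and `relates_to` are assumptions made for this example.

```python
def build_instance_graph(trace, relation_attributes):
    """Build activity nodes, object nodes, follow edges and object edges."""
    nodes = {}  # node id -> {"kind": ..., "label": ...}
    edges = []  # (source id, target id, edge kind)
    previous = None
    for index, event in enumerate(trace):
        # One activity node per event, labeled with the concept name
        activity_id = f"a{index}"
        nodes[activity_id] = {"kind": "activity", "label": event["concept:name"]}
        # Follow relation between consecutive activities of the trace
        if previous is not None:
            edges.append((previous, activity_id, "follows"))
        previous = activity_id
        # Object relation from the activity to each related object node
        for attribute in relation_attributes:
            if attribute not in event:
                continue
            object_id = f"{attribute}: {event[attribute]}"
            # Object nodes are shared: the same value maps to the same node
            nodes.setdefault(object_id, {"kind": "object", "label": object_id})
            edges.append((activity_id, object_id, "relates_to"))
    return nodes, edges


# Hypothetical trace with a resource and a social document id per event
toy_trace = [
    {"concept:name": "create page", "org:resource": "alice", "spm:sdid": "doc1"},
    {"concept:name": "edit page", "org:resource": "bob", "spm:sdid": "doc1"},
    {"concept:name": "comment", "org:resource": "alice", "spm:sdid": "doc1"},
]

nodes, edges = build_instance_graph(toy_trace, ["org:resource", "spm:sdid"])
print(len(nodes), "nodes,", len(edges), "edges")
```

Because object nodes are keyed by attribute and value, the two events performed by `alice` connect to the same object node, which is what makes shared context visible in the graph.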

Parameters

The miner accepts the following parameters via kwargs:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| activity_classifier | List[str] | [] | Additional attributes to include in the activity label (concept:name is always included) |
| relation_attributes | List[str] | [] | Attribute names used to create object nodes connected to activities |
| activity_delimiter | str | " -- " | Delimiter used to join activity classifier attributes |
| create_object_nodes | bool | True | If True, creates actual object nodes; if False, simulates object nodes (only activity nodes are created) |
| override_labels_of_relation_attributes | bool or Set[str] | False | If True, overrides object node labels with their type; if a set is provided, only the labels of those attributes are overridden |
| use_object_type_in_label | bool | True | If True, object node labels include the type prefix (e.g., org:resource: John) |
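How the label-related parameters interact can be sketched as follows. The helper names `activity_label` and `object_label` are hypothetical and only mirror the documented behavior of `activity_classifier`, `activity_delimiter` and `use_object_type_in_label`. In particular, silently skipping an attribute that is missing from an event is an assumption of this sketch, not confirmed behavior of the miner.

```python
def activity_label(event, activity_classifier=(), activity_delimiter=" -- "):
    """Join concept:name and the additional classifier attributes."""
    parts = [str(event["concept:name"])]  # concept:name is always included
    # Assumption: attributes missing from the event are skipped
    parts += [str(event[name]) for name in activity_classifier if name in event]
    return activity_delimiter.join(parts)


def object_label(attribute, value, use_object_type_in_label=True):
    """Build an object node label, optionally prefixed with its type."""
    if use_object_type_in_label:
        return f"{attribute}: {value}"
    return str(value)


event = {"concept:name": "edit page", "org:resource": "John"}

print(activity_label(event))                          # edit page
print(activity_label(event, ["org:resource"]))        # edit page -- John
print(object_label("org:resource", "John"))           # org:resource: John
print(object_label("org:resource", "John", False))    # John
```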

Example Usage

Basic usage with default settings:

from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.preprocessing.event_log_converter import TraceMergerConverter

miner = CollaborationProcessInstanceMiner(
    converter=TraceMergerConverter(),
    relation_attributes=["org:resource"]
)
graph_collection, converted_log = miner.get_graph(event_log)

Advanced usage with custom configuration:

from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.preprocessing.event_log_converter import (
    CombinedConverter, SublogConverter, TraceMergerConverter,
    AddPseudoEventConverter, AddCountAttributeInTraceConverter
)

miner = CollaborationProcessInstanceMiner(
    converter=CombinedConverter([
        SublogConverter(
            split_attributes=['spm:sdid'],
            similarity_attributes=['org:resource']
        ),
        TraceMergerConverter(),
        AddPseudoEventConverter(),
        AddCountAttributeInTraceConverter(
            columns_with_value_prefix={'org:resource': 'res_', 'spm:sdid': 'sd_'},
            column_prefix='counted_'
        )
    ]),
    activity_classifier=['counted_org:resource'],
    relation_attributes=['counted_spm:sdid'],
    use_object_type_in_label=False,
    create_object_nodes=True
)

graph_collection, converted_log = miner.get_graph(event_log)

Post-processing: Override Object Node Labels

The override_labels_of_object_nodes function allows post-processing of object node labels:

from collaboration_detection.graph_mining import CollaborationProcessInstanceMiner
from collaboration_detection.preprocessing.event_log_converter import TraceMergerConverter
from collaboration_detection.graph_mining.collaboration_process_instance_miner import (
    override_labels_of_object_nodes
)

miner = CollaborationProcessInstanceMiner(
    converter=TraceMergerConverter(),
    relation_attributes=["org:resource"]
)

graph_collection, converted_log = miner.get_graph(event_log)

# Override all object node labels with their type
override_labels_of_object_nodes(graph_collection, object_node_types={"org:resource"})

This modifies the graph in place, replacing object node labels (e.g., org:resource: John Smith) with just the type (e.g., org:resource).

The same override can be applied during mining by passing True or a set of attributes to the override_labels_of_relation_attributes parameter.
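The effect of the label override can be illustrated with a small stand-alone sketch. It does not use the library's Graph Collection: the node dictionaries, the `type` field, and the function name `override_object_labels` are assumptions made for this example.

```python
def override_object_labels(nodes, object_node_types=None):
    """Replace object node labels with their type, modifying `nodes` in place.

    If `object_node_types` is None, all object nodes are overridden;
    otherwise only nodes whose type is in the given set.
    """
    for node in nodes.values():
        if node["kind"] != "object":
            continue  # activity nodes keep their labels
        if object_node_types is None or node["type"] in object_node_types:
            node["label"] = node["type"]


# Hypothetical nodes: one activity and two object nodes of different types
toy_nodes = {
    "a0": {"kind": "activity", "type": None, "label": "edit page"},
    "o0": {"kind": "object", "type": "org:resource", "label": "org:resource: John Smith"},
    "o1": {"kind": "object", "type": "spm:sdid", "label": "spm:sdid: doc1"},
}

override_object_labels(toy_nodes, object_node_types={"org:resource"})
for node_id, node in toy_nodes.items():
    print(node_id, "->", node["label"])
```

Only the `org:resource` node loses its concrete value; the `spm:sdid` node and the activity node are left unchanged.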

Preprocessing

Before applying the miner, the event log can be preprocessed with an Event Log Converter. More details can be found in the documentation of the preprocessing modules.

from collaboration_detection.graph_mining import DfgGraphMiner
from collaboration_detection.preprocessing.event_log_converter import (
    DefaultEventLogConverter, ActivityJoinerConverter, SublogConverter,
    CombinedConverter, AddPseudoEventConverter
)

miner = DfgGraphMiner(DefaultEventLogConverter())
# with custom converter
miner = DfgGraphMiner(ActivityJoinerConverter(activity_classifier=['concept:name', 'other:attribute']))
# with the sublog converter
converter = SublogConverter(
    split_attributes=[],
    similarity_attributes=['attr1']
)
miner = DfgGraphMiner(converter)
# with multiple converters
# The event log is first split into multiple sublogs (e.g., each trace becomes one log).
# The AddPseudoEventConverter then adds artificial start and end events to each trace.
# The resulting sublogs are used as input for the process miner.
converter = CombinedConverter([
    SublogConverter(
        split_attributes=['concept:name'],
        similarity_attributes=['org:resource']
    ),
    AddPseudoEventConverter()
])
miner = DfgGraphMiner(converter)
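The idea of chaining converters can be sketched without the library. The code below assumes a simplified log model (a log is a list of traces, a trace is a list of event dictionaries) and treats each conversion step as a plain function. The names `split_per_trace`, `add_pseudo_events` and `combine`, as well as the pseudo event labels `__START__` and `__END__`, are hypothetical; the actual converters may split logs differently and use other labels.

```python
def split_per_trace(logs):
    """Turn every trace of every input log into its own single-trace sublog."""
    return [[trace] for log in logs for trace in log]


def add_pseudo_events(logs, start="__START__", end="__END__"):
    """Wrap every trace in an artificial start and end event."""
    return [
        [[{"concept:name": start}, *trace, {"concept:name": end}] for trace in log]
        for log in logs
    ]


def combine(steps):
    """Chain conversion steps so that each consumes the previous step's output."""
    def run(logs):
        for step in steps:
            logs = step(logs)
        return logs
    return run


# Hypothetical log with two traces
toy_log = [
    [{"concept:name": "create"}, {"concept:name": "edit"}],
    [{"concept:name": "create"}],
]

pipeline = combine([split_per_trace, add_pseudo_events])
sublogs = pipeline([toy_log])
for sublog in sublogs:
    print([[event["concept:name"] for event in trace] for trace in sublog])
```

The order of the steps matters in this sketch: splitting first means every sublog receives its own start and end event.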