Preprocessing

Event Log Converter

With the preprocessing module, event logs can be converted. Each converter transforms an event log into one or more event logs. Depending on the individual converter, a single event (or attribute), a complete trace, or the complete event log is converted. The following subsections introduce each of the currently available converters. A converter can be created and passed to the Process Miner.

Activity Joiner

The ActivityJoinerConverter converts the concept:name of all events by joining the attributes provided by activity_classifier. All other attributes are untouched.

Example

# log = [{'concept:name': "A", 'org:resource': "a",...},...]
converter = ActivityJoinerConverter(
    activity_classifier=['concept:name', 'org:resource'],
    activity_delimiter=" (/) "
)
new_log = converter.convert_event_log(log)
# new_log = [{'concept:name': "A (/) a", 'org:resource': "a",...},...]
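
The joining step can be pictured with a small standalone sketch. It does not use the library: join_activity is a hypothetical helper, events are assumed to be plain dictionaries, and activity_delimiter is assumed to be placed between the joined values.

```python
from typing import Dict, List


def join_activity(event: Dict[str, str], classifier: List[str], delimiter: str) -> Dict[str, str]:
    """Return a copy of the event whose concept:name is the joined classifier values."""
    joined = delimiter.join(str(event[attr]) for attr in classifier)
    # Copy the event so that all other attributes stay untouched.
    return {**event, 'concept:name': joined}


event = {'concept:name': "A", 'org:resource': "a"}
joined_event = join_activity(event, ['concept:name', 'org:resource'], " (/) ")
print(joined_event)
```

Under these assumptions only concept:name changes; the input event itself is left unmodified.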

Trace Split

The TraceSplitConverter converts each trace of the event log into a new trace with a modified case ID. The new case ID is created by appending a unique identifier to the original case ID. The identifier is built by concatenating the category codes of the values of the attributes listed in split_attributes. If the split_attributes list is empty, the case IDs are not modified.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},...]
converter = TraceSplitConverter(
    split_attributes=['concept:name', 'org:resource']
)
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1_0_0', 'concept:name': "A", 'org:resource': "a",...},...]
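
How the suffix is derived can be sketched without the library. The helper split_case_ids below is hypothetical. It assumes that a category code is the position of a value among the sorted unique values of its attribute across the whole log, and that codes are appended with an underscore, as the 'c1_0_0' case ID in the example suggests.

```python
from typing import Dict, List


def split_case_ids(log: List[Dict[str, str]], split_attributes: List[str]) -> List[Dict[str, str]]:
    """Append one category code per split attribute to each event's case ID."""
    # Category code: index of a value among the sorted unique values of its attribute.
    codes = {
        attr: {value: code for code, value in enumerate(sorted({e[attr] for e in log}))}
        for attr in split_attributes
    }
    new_log = []
    for event in log:
        suffix = ''.join(f"_{codes[attr][event[attr]]}" for attr in split_attributes)
        new_log.append({**event, 'case:concept:name': event['case:concept:name'] + suffix})
    return new_log


log = [
    {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a"},
    {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a"},
    {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b"},
]
new_ids = [e['case:concept:name'] for e in split_case_ids(log, ['concept:name', 'org:resource'])]
print(new_ids)
```

With an empty split_attributes list the suffix is empty, so the case IDs stay as they are.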

Sublog Converter

The SublogConverter converts an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces, where traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the similarity_attributes list. The timedelta parameter defines the maximum number of seconds by which two events of two traces may be apart from each other.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = SublogConverter(
    split_attributes=['concept:name'],
    similarity_attributes=['org:resource']
)
new_logs = converter.convert_event_log(log)
# new_logs = [[{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "a",...},
#              {'case:concept:name': 'c1_1', 'concept:name': "B", 'org:resource': "a",...}],
#             [{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "b",...},...],
#             ...]
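
The notion of related traces can be illustrated with a rough sketch. It is one possible reading of the description, not the converter's actual logic: are_related is a hypothetical helper, events are assumed to carry a datetime under the XES key time:timestamp, and two traces are treated as related as soon as one pair of events lies within the allowed gap and agrees on all similarity attributes.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Event = Dict[str, Any]


def are_related(trace_a: List[Event], trace_b: List[Event],
                similarity_attributes: List[str], max_gap: timedelta) -> bool:
    """True if some pair of events is close in time and shares all similarity values."""
    for a in trace_a:
        for b in trace_b:
            close = abs(a['time:timestamp'] - b['time:timestamp']) <= max_gap
            same = all(a[attr] == b[attr] for attr in similarity_attributes)
            if close and same:
                return True
    return False


t0 = datetime(2024, 1, 1, 12, 0, 0)
trace_a = [{'concept:name': "A", 'org:resource': "a", 'time:timestamp': t0}]
trace_b = [{'concept:name': "B", 'org:resource': "a", 'time:timestamp': t0 + timedelta(seconds=5)}]
trace_c = [{'concept:name': "A", 'org:resource': "b", 'time:timestamp': t0 + timedelta(seconds=5)}]

print(are_related(trace_a, trace_b, ['org:resource'], timedelta(seconds=10)))
print(are_related(trace_a, trace_c, ['org:resource'], timedelta(seconds=10)))
```

In this sketch, trace_a and trace_b would end up in the same sublog, while trace_c would not join them because its resource differs.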

Add Pseudo Event

The AddPseudoEventConverter adds a new pseudo-event at the beginning and/or end of each trace, depending on the values of the add_pseudo_start_event and add_pseudo_end_event variables. The pseudo start event is timestamped one second before the first event of the trace, and the pseudo end event one second after the last event.

The new events are created using the attributes_start_event and attributes_end_event dictionaries and are added to the trace.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = AddPseudoEventConverter()
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "Start"},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},
#            {'case:concept:name': 'c1', 'concept:name': "End"},
#            ...]
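
The timestamp rule for the pseudo-events can be sketched for a single trace. The helper add_pseudo_events is hypothetical and independent of the library. It assumes a non-empty trace, datetime values under time:timestamp, and the event names "Start" and "End" taken from the example output.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Event = Dict[str, Any]


def add_pseudo_events(trace: List[Event], add_start: bool = True, add_end: bool = True) -> List[Event]:
    """Return a copy of a non-empty trace framed by pseudo start/end events."""
    ordered = sorted(trace, key=lambda e: e['time:timestamp'])
    case_id = ordered[0]['case:concept:name']
    one_second = timedelta(seconds=1)
    result = list(ordered)
    if add_start:
        # One second before the first event of the trace.
        result.insert(0, {'case:concept:name': case_id, 'concept:name': "Start",
                          'time:timestamp': ordered[0]['time:timestamp'] - one_second})
    if add_end:
        # One second after the last event of the trace.
        result.append({'case:concept:name': case_id, 'concept:name': "End",
                       'time:timestamp': ordered[-1]['time:timestamp'] + one_second})
    return result


t0 = datetime(2024, 1, 1, 12, 0, 0)
trace = [
    {'case:concept:name': 'c1', 'concept:name': "A", 'time:timestamp': t0},
    {'case:concept:name': 'c1', 'concept:name': "B", 'time:timestamp': t0 + timedelta(seconds=30)},
]
framed = add_pseudo_events(trace)
print([e['concept:name'] for e in framed])
```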

Trace Merger Converter

The TraceMergerConverter merges all traces of the event log into a single trace by setting the case ID (CASE_CONCEPT_NAME, the case:concept:name attribute) of every event either to the value returned by the function case_id_provider or to the fixed value defined by new_case_id.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},...]
converter = TraceMergerConverter(case_id_provider=lambda: "test")
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "b",...},...]
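
The merge itself amounts to overwriting the case ID of every event, as the following standalone sketch shows. merge_traces is a hypothetical helper. The default value "merged" and the choice to call case_id_provider only once per log are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Optional


def merge_traces(log: List[Dict[str, str]],
                 case_id_provider: Optional[Callable[[], str]] = None,
                 new_case_id: str = "merged") -> List[Dict[str, str]]:
    """Return a copy of the log in which every event carries the same case ID."""
    # Assumption: the provider is asked once, so all events share one ID.
    case_id = case_id_provider() if case_id_provider is not None else new_case_id
    return [{**event, 'case:concept:name': case_id} for event in log]


log = [
    {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a"},
    {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a"},
    {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b"},
]
merged = merge_traces(log, case_id_provider=lambda: "test")
print({e['case:concept:name'] for e in merged})
```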

Add Count Attribute In Trace

The AddCountAttributeInTraceConverter adds count-based attributes for specified columns of a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) is a dictionary that maps each column to a value prefix. The value prefix is used to create a new attribute value for each unique value in the specified column. The parameter column_prefix defines a prefix that is added to the names of the newly created columns.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        ...
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a",...},...]
converter = AddCountAttributeInTraceConverter(columns_with_value_prefix={'org:resource': 'res_'},
                                              column_prefix='counted_')
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            ...
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_2",...},...]
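
The numbering in the example output is consistent with the following reading, sketched here without the library: within each trace, the unique values of a column are numbered in order of first appearance, starting at 1. This is an interpretation of the example rather than confirmed behaviour, and add_count_attribute is a hypothetical helper working on plain dictionaries instead of a DataFrame.

```python
from typing import Dict, List, Tuple


def add_count_attribute(log: List[Dict[str, str]],
                        columns_with_value_prefix: Dict[str, str],
                        column_prefix: str) -> List[Dict[str, str]]:
    """Number the unique values of each column per trace, by first appearance."""
    # (case ID, column) -> {value: running number within that trace}
    counters: Dict[Tuple[str, str], Dict[str, int]] = {}
    new_log = []
    for event in log:
        new_event = dict(event)
        for column, value_prefix in columns_with_value_prefix.items():
            seen = counters.setdefault((event['case:concept:name'], column), {})
            # A value seen before keeps its number; a new value gets the next one.
            number = seen.setdefault(event[column], len(seen) + 1)
            new_event[column_prefix + column] = f"{value_prefix}{number}"
        new_log.append(new_event)
    return new_log


log = [
    {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a"},
    {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a"},
    {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b"},
    {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a"},
]
counted = add_count_attribute(log, {'org:resource': 'res_'}, 'counted_')
print([e['counted_org:resource'] for e in counted])
```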

CombinedConverter

The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.

Example

converter = CombinedConverter([
    SublogConverter(
        split_attributes=['concept:name'],
        similarity_attributes=['org:resource']
    ),
    TraceMergerConverter()
])
new_logs = converter.convert_event_log(log)
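
Applying converters in the given order can be sketched as a simple chain. The classes Pipeline and AppendSuffix below are hypothetical stand-ins, not part of the library, and the sketch only covers converters that return a single event log. How the CombinedConverter handles converters that return several logs is not shown here.

```python
from typing import Dict, List

EventLog = List[Dict[str, str]]


class AppendSuffix:
    """Toy converter: appends a suffix to every concept:name."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def convert_event_log(self, log: EventLog) -> EventLog:
        return [{**e, 'concept:name': e['concept:name'] + self.suffix} for e in log]


class Pipeline:
    """Applies each converter to the output of the previous one."""

    def __init__(self, converters: List[AppendSuffix]) -> None:
        self.converters = list(converters)

    def convert_event_log(self, log: EventLog) -> EventLog:
        for converter in self.converters:
            log = converter.convert_event_log(log)
        return log


log = [{'case:concept:name': 'c1', 'concept:name': "A"}]
result = Pipeline([AppendSuffix("-x"), AppendSuffix("-y")]).convert_event_log(log)
print(result[0]['concept:name'])
```

Swapping the two converters changes the result, which is why the order of the list matters.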