Preprocessing

Event Log Converter

The preprocessing module converts event logs. Each converter transforms an event log into one or more event logs. Depending on the individual converter, a single event (or attribute), a complete trace, or the complete event log is converted. The following subsections introduce the currently available converters. A converter can be created and passed to the Process Miner.

Activity Joiner

The ActivityJoinerConverter rewrites the concept:name of every event by joining the values of the attributes listed in activity_classifier, separated by activity_delimiter. All other attributes are left untouched.

Example

# log = [{'concept:name': "A", 'org:resource': "a",...},...]
converter = ActivityJoinerConverter(
    activity_classifier=['concept:name', 'org:resource'],
    activity_delimiter=" (/) "
)
new_log = converter.convert_event_log(log)
# new_log = [{'concept:name': "A (/) a", 'org:resource': "a",...},...]
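
The joining step can be illustrated without the library. The following is a minimal sketch, not the ActivityJoinerConverter implementation: join_activity is a hypothetical helper, and events are assumed to be plain dictionaries.

```python
from typing import Dict, List


def join_activity(event: Dict[str, str], classifier: List[str], delimiter: str) -> Dict[str, str]:
    """Return a copy of the event whose concept:name is the joined classifier values."""
    joined = delimiter.join(str(event[key]) for key in classifier)
    # Build a new dict so the input event and all other attributes stay untouched.
    return {**event, "concept:name": joined}


event = {"concept:name": "A", "org:resource": "a"}
converted = join_activity(event, ["concept:name", "org:resource"], " (/) ")
print(converted["concept:name"])  # A (/) a
```

Because only concept:name is overwritten, the original org:resource value remains available for later converters.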

Trace Split

The trace split converter converts each trace of the event log into new traces with modified case IDs. The new case IDs are created by appending a unique identifier to the original case ID. The unique identifier is created by concatenating the values (the category codes) of the attributes specified in the split_attributes list. If the split_attributes list is empty, the case IDs are not modified.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},...]
converter = TraceSplitConverter(
    split_attributes=['concept:name', 'org:resource']
)
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1_0_0', 'concept:name': "A", 'org:resource': "a",...},...]
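
How the case ID suffix is assembled can be sketched in plain Python. This is an illustration under assumptions, not the TraceSplitConverter implementation: category_codes and split_case_ids are hypothetical helpers, the codes are assigned in sorted order of the distinct values, and "_" is taken as the separator from the example above.

```python
from typing import Dict, List


def category_codes(values: List[str]) -> Dict[str, int]:
    """Map each distinct value to an integer code (sorted order is an assumption)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def split_case_ids(log: List[Dict[str, str]], split_attributes: List[str]) -> List[Dict[str, str]]:
    """Append one category code per split attribute to every case ID."""
    codes = {attr: category_codes([e[attr] for e in log]) for attr in split_attributes}
    new_log = []
    for event in log:
        suffix = [str(codes[attr][event[attr]]) for attr in split_attributes]
        # With no split attributes the suffix is empty and the case ID is unchanged.
        case_id = "_".join([event["case:concept:name"], *suffix])
        new_log.append({**event, "case:concept:name": case_id})
    return new_log


log = [
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "B", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "b"},
]
new_log = split_case_ids(log, ["concept:name", "org:resource"])
print([e["case:concept:name"] for e in new_log])  # ['c1_0_0', 'c1_1_0', 'c1_0_1']
```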

Sublog Converter

Converts an event log into a list of sublogs. The split_attributes list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs. Each sublog contains a group of related (sub)traces. Traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the similarity_attributes list. The timedelta parameter defines how many seconds two events of two traces can be apart from each other.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = SublogConverter(
    split_attributes=['concept:name'],
    similarity_attributes=['org:resource']
)
new_logs = converter.convert_event_log(log)
# new_logs = [[{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "a",...},
#              {'case:concept:name': 'c1_1', 'concept:name': "B", 'org:resource': "a",...},
#             ],
#             [{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "b",...},...],
#             ...]
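
One possible reading of the relatedness rule is sketched below. It is an interpretation, not the SublogConverter implementation: are_related is a hypothetical helper, events are assumed to carry a time:timestamp attribute, and two traces count as related when at least one pair of events lies within the allowed gap and agrees on every similarity attribute.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Event = Dict[str, Any]


def are_related(trace_a: List[Event], trace_b: List[Event],
                similarity_attributes: List[str], max_gap: timedelta) -> bool:
    """True if some event pair is close in time and shares all similarity values."""
    for a in trace_a:
        for b in trace_b:
            close = abs(a["time:timestamp"] - b["time:timestamp"]) <= max_gap
            same = all(a[attr] == b[attr] for attr in similarity_attributes)
            if close and same:
                return True
    return False


t0 = datetime(2024, 1, 1, 12, 0, 0)
trace_a = [{"time:timestamp": t0, "org:resource": "a"}]
trace_b = [{"time:timestamp": t0 + timedelta(seconds=5), "org:resource": "a"}]
trace_c = [{"time:timestamp": t0 + timedelta(seconds=5), "org:resource": "b"}]

print(are_related(trace_a, trace_b, ["org:resource"], timedelta(seconds=10)))  # True
print(are_related(trace_a, trace_c, ["org:resource"], timedelta(seconds=10)))  # False
```

Grouping traces into sublogs would then amount to collecting traces that are connected through this relation.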

Add Pseudo Event

This converter adds a pseudo-event at the beginning and/or the end of each trace, depending on the values of the add_pseudo_start_event and add_pseudo_end_event parameters. The pseudo start event is timestamped one second before the first event of the trace, and the pseudo end event one second after the last event.

The new events are created from the attributes_start_event and attributes_end_event dictionaries and are added to the trace.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = AddPseudoEventConverter()
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "Start"},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},
#            {'case:concept:name': 'c1', 'concept:name': "End"},
#            ...]
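
The timestamp rule for the pseudo events can be sketched as follows. This is an illustration, not the AddPseudoEventConverter implementation: add_pseudo_events is a hypothetical helper, and it assumes a non-empty trace whose events carry a time:timestamp attribute.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Event = Dict[str, Any]


def add_pseudo_events(
    trace: List[Event],
    attributes_start_event: Event,
    attributes_end_event: Event,
    add_pseudo_start_event: bool = True,
    add_pseudo_end_event: bool = True,
) -> List[Event]:
    """Return a new, time-ordered trace with optional pseudo events added."""
    ordered = sorted(trace, key=lambda e: e["time:timestamp"])
    case_id = ordered[0]["case:concept:name"]
    one_second = timedelta(seconds=1)
    result = list(ordered)
    if add_pseudo_start_event:
        # One second before the first real event.
        result.insert(0, {
            **attributes_start_event,
            "case:concept:name": case_id,
            "time:timestamp": ordered[0]["time:timestamp"] - one_second,
        })
    if add_pseudo_end_event:
        # One second after the last real event.
        result.append({
            **attributes_end_event,
            "case:concept:name": case_id,
            "time:timestamp": ordered[-1]["time:timestamp"] + one_second,
        })
    return result


t0 = datetime(2024, 1, 1, 12, 0, 0)
trace = [
    {"case:concept:name": "c1", "concept:name": "A", "time:timestamp": t0},
    {"case:concept:name": "c1", "concept:name": "B", "time:timestamp": t0 + timedelta(seconds=30)},
]
new_trace = add_pseudo_events(trace, {"concept:name": "Start"}, {"concept:name": "End"})
print([e["concept:name"] for e in new_trace])  # ['Start', 'A', 'B', 'End']
```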

Trace Merger Converter

Merges all traces of the event log into a single trace by setting the case ID (CASE_CONCEPT_NAME, i.e. case:concept:name) of every event either to the value returned by the function case_id_provider or to the fixed value defined by new_case_id.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},...]
converter = TraceMergerConverter(case_id_provider=lambda: "test")
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "b",...},...]
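
The choice between the two sources of the new case ID can be sketched like this. It is a minimal illustration, not the TraceMergerConverter implementation: merge_traces is a hypothetical helper, the default value "merged" is made up for the example, and giving the provider precedence over the fixed value is an assumption.

```python
from typing import Callable, Dict, List, Optional


def merge_traces(
    log: List[Dict[str, str]],
    case_id_provider: Optional[Callable[[], str]] = None,
    new_case_id: str = "merged",  # hypothetical default, not taken from the library
) -> List[Dict[str, str]]:
    """Give every event the same case ID so the log forms a single trace."""
    # Assumption: the provider, if given, takes precedence over the fixed value.
    case_id = case_id_provider() if case_id_provider is not None else new_case_id
    return [{**event, "case:concept:name": case_id} for event in log]


log = [
    {"case:concept:name": "c1", "concept:name": "A"},
    {"case:concept:name": "c1", "concept:name": "B"},
    {"case:concept:name": "c2", "concept:name": "A"},
]
merged = merge_traces(log, case_id_provider=lambda: "test")
print({e["case:concept:name"] for e in merged})  # {'test'}
```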

Add Count Attribute In Trace

The AddCountAttributeInTraceConverter adds count-based attributes for specified columns of a trace DataFrame. The parameter columns_with_value_prefix (Dict[str, str]) maps each column to its value prefix. The value prefix is used to build the new attribute value for each unique value in the specified column. The parameter column_prefix defines a prefix that is added to the names of the newly created columns.

Example

# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        ...
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a",...},...]
converter = AddCountAttributeInTraceConverter(columns_with_value_prefix={'org:resource': 'res_'},
                                              column_prefix='counted_')
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            ...
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_2",...},...]
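
The output above suggests that, within each trace, the distinct values of a column are numbered in order of first appearance. The sketch below reproduces that behaviour under this reading only. It is not the AddCountAttributeInTraceConverter implementation, and add_count_attributes is a hypothetical helper working on plain dictionaries instead of a DataFrame.

```python
from typing import Dict, List, Tuple


def add_count_attributes(
    log: List[Dict[str, str]],
    columns_with_value_prefix: Dict[str, str],
    column_prefix: str,
) -> List[Dict[str, str]]:
    """Number the distinct values of each column per trace, in order of first appearance."""
    seen: Dict[Tuple[str, str], Dict[str, int]] = {}  # (case ID, column) -> value -> number
    new_log = []
    for event in log:
        new_event = dict(event)
        for column, value_prefix in columns_with_value_prefix.items():
            numbers = seen.setdefault((event["case:concept:name"], column), {})
            # A value seen before keeps its number; a new value gets the next one.
            number = numbers.setdefault(event[column], len(numbers) + 1)
            new_event[column_prefix + column] = f"{value_prefix}{number}"
        new_log.append(new_event)
    return new_log


log = [
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c2", "concept:name": "A", "org:resource": "b"},
    {"case:concept:name": "c2", "concept:name": "A", "org:resource": "a"},
]
counted = add_count_attributes(log, {"org:resource": "res_"}, "counted_")
print([e["counted_org:resource"] for e in counted])  # ['res_1', 'res_1', 'res_1', 'res_2']
```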

CombinedConverter

The CombinedConverter combines multiple event log converters into a single converter. The convert_event_log, convert_trace, and convert_event methods apply the corresponding method of each converter in the given order to the input data.

Example

converter = CombinedConverter([
    SublogConverter(
        split_attributes=['concept:name'],
        similarity_attributes=['org:resource']
    ),
    TraceMergerConverter()
])
new_logs = converter.convert_event_log(log)
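
The chaining idea can be sketched with plain functions. This is a minimal illustration, not the CombinedConverter implementation: combine and the two example steps are hypothetical, and each step is assumed to take one log and return one log.

```python
from typing import Callable, Dict, List

Log = List[Dict[str, str]]
Converter = Callable[[Log], Log]


def combine(converters: List[Converter]) -> Converter:
    """Build one converter that applies the given converters in list order."""
    def convert_event_log(log: Log) -> Log:
        for convert in converters:
            # The output of one step is the input of the next.
            log = convert(log)
        return log
    return convert_event_log


def upper_case_activities(log: Log) -> Log:
    return [{**e, "concept:name": e["concept:name"].upper()} for e in log]


def merge_into_one_case(log: Log) -> Log:
    return [{**e, "case:concept:name": "merged"} for e in log]


pipeline = combine([upper_case_activities, merge_into_one_case])
result = pipeline([
    {"case:concept:name": "c1", "concept:name": "a"},
    {"case:concept:name": "c2", "concept:name": "b"},
])
print(result)
# [{'case:concept:name': 'merged', 'concept:name': 'A'}, {'case:concept:name': 'merged', 'concept:name': 'B'}]
```

Since each step consumes the output of the previous one, the order of the list determines the result.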