# Preprocessing

## Event Log Converter

The *preprocessing* module converts event logs. Each converter transforms an event log into one or more event logs. Depending on the individual converter, a single event (or attribute), a complete trace, or the complete event log is converted. The following subsections introduce each of the currently available converters. A converter can be created and passed to the [Process Miner](processMining.html).

### Activity Joiner

The ActivityJoinerConverter converts the `concept:name` of all events by joining the attributes provided by `activity_classifier`. All other attributes are left untouched.

#### Example

```python
# log = [{'concept:name': "A", 'org:resource': "a",...},...]
converter = ActivityJoinerConverter(
    activity_classifier=['concept:name', 'org:resource'],
    activity_delimiter=" (/) "
)
new_log = converter.convert_event_log(log)
# new_log = [{'concept:name': "A (/) a", 'org:resource': "a",...},...]
```

### Trace Split

The TraceSplitConverter converts each trace of the event log into a new trace with modified case IDs. The new case IDs are created by appending a unique identifier to the original case ID of each trace. The identifier is built by concatenating the values (the category codes) of the attributes specified in the `split_attributes` list. If the `split_attributes` list is empty, the case IDs are not modified.

#### Example

```python
# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},...]
converter = TraceSplitConverter(
    split_attributes=['concept:name', 'org:resource']
)
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1_0_0', 'concept:name': "A", 'org:resource': "a",...},...]
```

### Sublog Converter

The SublogConverter converts an event log into a list of sublogs. The `split_attributes` list is used to split each existing trace into multiple subtraces, which are then used to create the sublogs.
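The splitting step can be illustrated in plain Python. The sketch below is not the library's implementation: `split_case_ids` is a hypothetical helper, and it assumes that category codes are obtained by numbering the sorted distinct values of each attribute from 0 and that the codes are joined to the case ID with underscores, as the case IDs in the examples suggest.

```python
from typing import Dict, List

Event = Dict[str, str]


def split_case_ids(log: List[Event], split_attributes: List[str],
                   case_key: str = "case:concept:name") -> List[Event]:
    """Return a copy of the log with category codes appended to each case ID."""
    if not split_attributes:
        # Nothing to split on: case IDs stay as they are.
        return [dict(event) for event in log]
    # One code table per attribute: sorted distinct values, numbered from 0
    # (an assumption made for this sketch).
    codes = {
        attr: {value: code for code, value in
               enumerate(sorted({event[attr] for event in log}))}
        for attr in split_attributes
    }
    new_log = []
    for event in log:
        suffix = "_".join(str(codes[attr][event[attr]])
                          for attr in split_attributes)
        new_event = dict(event)
        new_event[case_key] = f"{event[case_key]}_{suffix}"
        new_log.append(new_event)
    return new_log


log = [
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "B", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "b"},
]
split_log = split_case_ids(log, ["concept:name"])
print([event["case:concept:name"] for event in split_log])
# prints ['c1_0', 'c1_1', 'c1_0']
```

Events that share the same values for the split attributes end up with the same new case ID, so they form one subtrace.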
Each sublog contains a group of related (sub)traces. Traces are considered related if their events have overlapping timestamps and share common values for the attributes specified in the `similarity_attributes` list. The `timedelta` parameter defines how many seconds the events of two traces may be apart from each other.

#### Example

```python
# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = SublogConverter(
    split_attributes=['concept:name'],
    similarity_attributes=['org:resource']
)
new_logs = converter.convert_event_log(log)
# new_logs = [[{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "a",...},
#              {'case:concept:name': 'c1_1', 'concept:name': "B", 'org:resource': "a",...},
#             ],
#             [{'case:concept:name': 'c1_0', 'concept:name': "A", 'org:resource': "b",...}...],
#             ...]
```

### Add Pseudo Event

This converter adds a pseudo-event at the beginning and/or end of each trace, depending on the values of the `add_pseudo_start_event` and `add_pseudo_end_event` variables. The pseudo start event is timestamped one second before the first event of the trace, and the pseudo end event one second after the last event. The new events are created using the `attributes_start_event` and `attributes_end_event` dictionaries and are added to the trace.

#### Example

```python
# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},...]
converter = AddPseudoEventConverter()
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "Start"},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "b",...},
#            {'case:concept:name': 'c1', 'concept:name': "End"},
#            ...]
```

### Trace Merger Converter

The TraceMergerConverter merges all traces of the event log into a single trace by setting the `CASE_CONCEPT_NAME` to the value returned by the function `case_id_provider` or to a fixed value defined by `new_case_id`.

#### Example

```python
# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "B", 'org:resource': "a",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},...]
converter = TraceMergerConverter(case_id_provider=lambda: "test")
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "B", 'org:resource': "a",...},
#            {'case:concept:name': 'test', 'concept:name': "A", 'org:resource': "b",...},...]
```

### Add Count Attribute In Trace

The AddCountAttributeInTraceConverter adds count-based attributes for specified columns in a trace DataFrame. The parameter `columns_with_value_prefix` (`Dict[str, str]`) is a dictionary that maps each column to its value prefix. The value prefixes are used to create a new attribute for each unique value in the specified columns. The parameter `column_prefix` defines a prefix that is added to the names of the newly created columns.
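One possible reading of this behaviour, written in plain Python rather than with a DataFrame, is sketched below. It is an interpretation, not the library's code: `add_count_attributes` is a hypothetical helper, and it assumes that each distinct value of a column is numbered by its order of first appearance within its trace, starting at 1.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]


def add_count_attributes(log: List[Event],
                         columns_with_value_prefix: Dict[str, str],
                         column_prefix: str,
                         case_key: str = "case:concept:name") -> List[Event]:
    """Number the distinct values of each column per trace (assumed semantics)."""
    # (case ID, column) -> {value: number assigned on first appearance}
    seen: Dict[Tuple[str, str], Dict[str, int]] = {}
    new_log = []
    for event in log:
        new_event = dict(event)
        for column, value_prefix in columns_with_value_prefix.items():
            numbers = seen.setdefault((event[case_key], column), {})
            # A value seen before keeps its number; a new value gets the next one.
            number = numbers.setdefault(event[column], len(numbers) + 1)
            new_event[f"{column_prefix}{column}"] = f"{value_prefix}{number}"
        new_log.append(new_event)
    return new_log


log = [
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c1", "concept:name": "A", "org:resource": "a"},
    {"case:concept:name": "c2", "concept:name": "A", "org:resource": "b"},
    {"case:concept:name": "c2", "concept:name": "A", "org:resource": "a"},
]
counted = add_count_attributes(log, {"org:resource": "res_"}, "counted_")
print([event["counted_org:resource"] for event in counted])
# prints ['res_1', 'res_1', 'res_1', 'res_2']
```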
#### Example

```python
# log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a",...},
#        ...
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b",...},
#        {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a",...},...]
converter = AddCountAttributeInTraceConverter(
    columns_with_value_prefix={'org:resource': 'res_'},
    column_prefix='counted_'
)
new_log = converter.convert_event_log(log)
# new_log = [{'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c1', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_1",...},
#            ...
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "b", 'counted_org:resource': "res_1",...},
#            {'case:concept:name': 'c2', 'concept:name': "A", 'org:resource': "a", 'counted_org:resource': "res_2",...},...]
```

### CombinedConverter

The CombinedConverter combines multiple event log converters into a single converter. The `convert_event_log`, `convert_trace`, and `convert_event` methods each apply the corresponding method of every converter, in the given order, to the input data.

#### Example

```python
converter = CombinedConverter([
    SublogConverter(
        split_attributes=['concept:name'],
        similarity_attributes=['org:resource']
    ),
    TraceMergerConverter()
])
new_logs = converter.convert_event_log(log)
```
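The idea of applying converters in order can be sketched with a few toy classes. Everything here is hypothetical and only illustrates the chaining pattern: `UppercaseActivity`, `FixedCaseId`, and `ChainedConverter` are not part of the library, and the sketch only covers converters that return a single log, not ones that return several logs such as SublogConverter.

```python
from typing import Dict, List, Sequence

Event = Dict[str, str]
Log = List[Event]


class UppercaseActivity:
    """Toy converter (hypothetical): upper-cases every activity name."""

    def convert_event_log(self, log: Log) -> Log:
        return [{**e, "concept:name": e["concept:name"].upper()} for e in log]


class FixedCaseId:
    """Toy converter (hypothetical): assigns one case ID to all events."""

    def __init__(self, new_case_id: str) -> None:
        self.new_case_id = new_case_id

    def convert_event_log(self, log: Log) -> Log:
        return [{**e, "case:concept:name": self.new_case_id} for e in log]


class ChainedConverter:
    """Feeds the output of each converter into the next one, in list order."""

    def __init__(self, converters: Sequence) -> None:
        self.converters = list(converters)

    def convert_event_log(self, log: Log) -> Log:
        for converter in self.converters:
            log = converter.convert_event_log(log)
        return log


log = [
    {"case:concept:name": "c1", "concept:name": "a"},
    {"case:concept:name": "c2", "concept:name": "b"},
]
chained = ChainedConverter([UppercaseActivity(), FixedCaseId("merged")])
result = chained.convert_event_log(log)
print(result)
# prints [{'case:concept:name': 'merged', 'concept:name': 'A'},
#         {'case:concept:name': 'merged', 'concept:name': 'B'}]
```

Because each converter only sees the output of its predecessor, the order of the list matters whenever one converter changes an attribute that a later one reads.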