Process Mining - Step by Step

The process mining methodology consists of defined steps that lead from data acquisition to process optimization. It combines deterministic algorithms with machine learning to gain insights.

Process mining is the analysis of recurring business processes based on event logs (information system logs). It is mainly used to analyse non-trivial processes with a complex hierarchy, but it can also be used to optimize processes that consist of only a few steps.

In general, a process mining analysis consists of the following steps:

  1. Discovery
  2. Conformance checking
  3. Enhancement
  4. Monitoring

Process Mining Methodology


Discovery

The purpose of the first process mining step is to gather all data capturing the actual process flow, evaluate its completeness and quality, and calculate basic metrics and indicators characterizing the current process.

In the first step, automatic process discovery and exploratory analysis are performed on the business transactions recorded in the event logs of the information systems. At a minimum, each log record must contain the following fields:

  • Process ID (identifies the process instance, often called the case ID);
  • Process name (event or activity name);
  • Time stamp.
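As a minimal sketch of what such a log can look like, the example below stores the three fields as tuples and groups them into one ordered sequence of events per process instance. The sample rows, the field order, and the function name `build_traces` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log rows: (process ID, event name, time stamp).
EVENT_LOG = [
    ("order-1", "Create order", "2024-01-02 09:00"),
    ("order-2", "Create order", "2024-01-02 09:05"),
    ("order-1", "Approve order", "2024-01-02 10:30"),
    ("order-2", "Reject order", "2024-01-02 11:00"),
    ("order-1", "Ship order", "2024-01-03 08:15"),
]


def build_traces(rows):
    """Group events by process ID and order each group by time stamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in rows:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_case[case_id].append((parsed, activity))
    # Sorting the (time stamp, event name) pairs yields the chronological flow.
    return {
        case_id: [activity for _, activity in sorted(events)]
        for case_id, events in by_case.items()
    }


traces = build_traces(EVENT_LOG)
for case_id, activities in traces.items():
    print(case_id, " -> ".join(activities))
```

Each resulting sequence is the actual path one process instance took, which is the raw material for every later step.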

The data is then collected and consolidated, and target metrics are calculated. Both activities are described in detail below.

Data collection and consolidation:

  • Data preparation (ETL);
  • Merging from different sources, connecting directories, etc;
  • Consolidation of data across all logs;
  • Input validation;
  • Data validation.
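The consolidation steps above could be sketched roughly as follows. The two source systems, their column names, and the validation rules are all hypothetical; a real ETL pipeline would typically also deduplicate records and join reference directories.

```python
from datetime import datetime

# Hypothetical exports from two systems that use different column names.
CRM_ROWS = [
    {"ticket": "T1", "step": "Open", "time": "2024-03-01T08:00:00"},
    {"ticket": "T1", "step": "Assign", "time": "2024-03-01T08:20:00"},
]
ERP_ROWS = [
    {"case": "T1", "event": "Close", "ts": "2024-03-01T12:00:00"},
    {"case": "", "event": "Close", "ts": "2024-03-01T12:05:00"},  # no process ID
    {"case": "T2", "event": "Open", "ts": "not a date"},  # unparseable time stamp
]


def normalize(row, case_key, activity_key, time_key):
    """Map a source row onto one common schema."""
    return {
        "case_id": row[case_key],
        "activity": row[activity_key],
        "timestamp": row[time_key],
    }


def validate(event):
    """Return a rejection reason, or None if the event is usable."""
    if not event["case_id"] or not event["activity"]:
        return "missing field"
    try:
        event["timestamp"] = datetime.fromisoformat(event["timestamp"])
    except ValueError:
        return "bad timestamp"
    return None


def consolidate():
    """Merge both sources, then split the events into clean and rejected."""
    merged = [normalize(r, "ticket", "step", "time") for r in CRM_ROWS]
    merged += [normalize(r, "case", "event", "ts") for r in ERP_ROWS]
    clean, rejected = [], []
    for event in merged:
        reason = validate(event)
        if reason is None:
            clean.append(event)
        else:
            rejected.append((event, reason))
    clean.sort(key=lambda e: (e["case_id"], e["timestamp"]))
    return clean, rejected


clean, rejected = consolidate()
print(len(clean), "usable events,", len(rejected), "rejected")
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to assess the completeness and quality of the log.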

Calculation of characteristics and metrics:

  • Time metrics - identifying net work time or work effort - actual duration of events, downtime, rework, overtime, etc.;
  • Graph values - identifying loops (rework), "ping-pongs" between steps, redundant process links;
  • Productivity metrics - calculation of workload per employee, average productivity value, average wait time;
  • Cost metrics - calculation of cost for each process step;
  • Loss calculation - calculation of time lost for each process;
  • Competitive metrics - benchmarking - comparing performance of employees, stores, etc;
  • Performance metrics - determining the business benefits of process optimization.
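A few of the time and graph metrics listed above can be illustrated for a single process instance. This is a simplified sketch: the sample case and the function names are assumptions, and the elapsed time between two events is used as is, without separating net work time from waiting time.

```python
from datetime import datetime

# Hypothetical process instance: (event name, time stamp).
CASE = [
    ("Receive", "2024-05-01 09:00"),
    ("Review", "2024-05-01 09:30"),
    ("Correct", "2024-05-01 11:00"),
    ("Review", "2024-05-01 11:45"),
    ("Approve", "2024-05-01 12:00"),
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def cycle_time_minutes(case):
    """Total elapsed time from the first to the last event."""
    return (parse(case[-1][1]) - parse(case[0][1])).total_seconds() / 60


def transition_minutes(case):
    """Elapsed minutes for every pair of consecutive events."""
    return [
        ((a, b), (parse(tb) - parse(ta)).total_seconds() / 60)
        for (a, ta), (b, tb) in zip(case, case[1:])
    ]


def rework_activities(case):
    """Events that occur more than once in the same instance (loops)."""
    seen, repeated = set(), []
    for activity, _ in case:
        if activity in seen and activity not in repeated:
            repeated.append(activity)
        seen.add(activity)
    return repeated


def ping_pongs(case):
    """Patterns A -> B -> A, reported as the pair (A, B)."""
    acts = [activity for activity, _ in case]
    return [(x, y) for x, y, z in zip(acts, acts[1:], acts[2:]) if x == z and x != y]


print("cycle time (min):", cycle_time_minutes(CASE))
print("rework:", rework_activities(CASE))
print("ping-pongs:", ping_pongs(CASE))
```

Cost and productivity metrics follow the same pattern once each event also carries a resource and a cost rate, which this example does not assume.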

Conformance checking

In this phase, the aim is to determine the consistency of the actual process with the benchmark process, to identify critical deviations that hinder the planned process, and to carry out a kind of target/actual comparison.

The real process is compared with the benchmark (reference) process defined in the company's regulations. First, the real process is reconstructed using the following steps:

  • Determining the actual rather than the "ideal" flow of activities;
  • Creation of a process word;
  • Detection of the repetitive and standard processes;
  • Determination of the percentage of processes that correspond to the reference path;
  • Detection of "happy paths" - sequences of events that most often lead to the desired result;
  • Search for behavioral patterns: cycles, "ping pong", etc.
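The share of conforming processes and the "happy path" can be sketched as below. The traces, the reference path, and the desired final event are hypothetical, and exact sequence equality is a deliberate simplification: established conformance checking techniques such as token replay or alignments also quantify how far a trace deviates.

```python
from collections import Counter

# Hypothetical reference path and desired outcome.
REFERENCE_PATH = ("Create", "Approve", "Ship")
DESIRED_END = "Ship"

# Hypothetical reconstructed flows, one per process instance.
TRACES = {
    "c1": ("Create", "Approve", "Ship"),
    "c2": ("Create", "Approve", "Ship"),
    "c3": ("Create", "Approve", "Rework", "Approve", "Ship"),
    "c4": ("Create", "Reject"),
}


def variant_counts(traces):
    """How often each distinct sequence of events occurs."""
    return Counter(traces.values())


def conformance_rate(traces, reference):
    """Fraction of instances that follow the reference path exactly."""
    matching = sum(1 for trace in traces.values() if trace == reference)
    return matching / len(traces)


def happy_path(traces, desired_end):
    """Most frequent sequence among instances that reach the desired result."""
    successful = Counter(t for t in traces.values() if t[-1] == desired_end)
    return successful.most_common(1)[0][0] if successful else None


def deviating_cases(traces, reference):
    """Instances whose flow differs from the reference path."""
    return {case_id: t for case_id, t in traces.items() if t != reference}


print("conformance rate:", conformance_rate(TRACES, REFERENCE_PATH))
print("happy path:", happy_path(TRACES, DESIRED_END))
print("deviating:", sorted(deviating_cases(TRACES, REFERENCE_PATH)))
```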


Enhancement

This step builds on the results of the conformance checking phase. Based on the results of the analysis, the processes are redesigned and the proposed improvements are pre-tested.

Making conclusions:

  • Visualization of the results, creation of dashboards;
  • Creation of expert recommendations;
  • Predicting the impact of improvements;
  • Business process optimization.

Redesign and modeling:

  • Proposing improvements;
  • Testing improvements against mathematical models.


Monitoring

The goal of the last phase is to monitor the correct operation of the updated process and to check whether the intended results have been achieved.

In the last step, a regular monitoring of the processes is established, providing feedback to the interested users on the correct implementation of the procedures:

  • Monitoring the KPIs for the process with a certain frequency;
  • Verifying the compliance of the process with internal regulations;
  • Monitoring and alerting on incorrect transitions: errors, process deficiencies, fraud;
  • Identification of resource-intensive processes;
  • Notifying process owners of deviations through various communication channels.
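A monitoring rule of this kind might look like the following sketch. The table of allowed transitions, the cycle-time threshold, and the `notify` helper are hypothetical; in practice the notification would go to e-mail, chat, or a ticketing system rather than to standard output.

```python
# Hypothetical regulation: which event may directly follow which.
ALLOWED = {
    "Create": {"Approve", "Reject"},
    "Approve": {"Ship"},
    "Reject": set(),
    "Ship": set(),
}
MAX_CYCLE_HOURS = 48.0  # hypothetical KPI threshold


def check_case(case_id, activities, cycle_hours):
    """Return alert messages for one process instance."""
    alerts = []
    for current, following in zip(activities, activities[1:]):
        if following not in ALLOWED.get(current, set()):
            alerts.append(f"{case_id}: unexpected transition {current} -> {following}")
    if cycle_hours > MAX_CYCLE_HOURS:
        alerts.append(
            f"{case_id}: cycle time {cycle_hours:.1f} h exceeds {MAX_CYCLE_HOURS:.1f} h"
        )
    return alerts


def notify(alerts, send=print):
    """Deliver each alert through the given channel (print by default)."""
    for message in alerts:
        send(message)


notify(check_case("c1", ["Create", "Approve", "Ship"], 10.0))
notify(check_case("c2", ["Create", "Ship"], 60.0))
```

Running such a check on a fixed schedule corresponds to the regular KPI monitoring described above.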

Process Mining & Machine Learning

Machine learning is often used in process mining. It makes it possible not only to see the facts and identify the "process word", but also to understand the problem at a deeper level. The following machine learning techniques are most commonly used for process mining:

  • Association rule mining - automatic identification of basic and specific process flows.
  • Robust methods - automatic detection of variations in time, cost, and frequency. They enable detection of abrupt changes that are averaged out when data is aggregated over a long period of time.
  • Time series analysis - prediction of process cycle times and allowable deviations. Allows you to assess the limits of process variability and the need to react.
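To make two of these techniques more concrete, the sketch below flags abrupt changes with a robust statistic (the median absolute deviation) and produces a naive moving-average forecast of the next cycle time. The sample values, the threshold of 3.5 (a commonly cited cutoff for this modified z-score), and the window size are assumptions; this is one possible choice, not the method of any particular tool.

```python
from statistics import mean, median

# Hypothetical daily cycle times in hours; one day contains an abrupt spike.
CYCLE_HOURS = [10, 11, 9, 10, 12, 10, 30, 11, 10, 9]


def mad_outliers(values, threshold=3.5):
    """Indexes whose modified z-score (based on median and MAD) exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


def moving_average_forecast(values, window=3):
    """Naive forecast: the mean of the last `window` observations."""
    recent = values[-window:]
    return sum(recent) / len(recent)


print("mean over the period:", mean(CYCLE_HOURS))
print("outlier indexes:", mad_outliers(CYCLE_HOURS))
print("next-day forecast:", moving_average_forecast(CYCLE_HOURS))
```

The mean over the whole period barely hints at the spike, while the median-based score isolates the affected day. The moving average is only a baseline; dedicated time series models would normally replace it.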