Process Mining: Data science in Action

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software, the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.

About The Course

Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational process (in organizations and systems). Example applications include analyzing treatment processes in hospitals, improving customer service processes in a multinational, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. What all of these applications have in common is that dynamic behavior needs to be related to process models. Hence, we refer to this as "data science in action".

The course explains the key analysis techniques in process mining. Participants will learn various process discovery algorithms. These can be used to automatically learn process models from raw event data. Various other process analysis techniques that use event data will be presented. Moreover, the course will provide easy-to-use software, real-life data sets, and practical skills to directly apply the theory in a variety of application domains.

Frequently Asked Questions

Will I earn a Statement of Accomplishment for this course?
Students who successfully complete the class will receive a Statement of Accomplishment signed by the instructor.

What resources will I need for this class?
To watch the lectures: a desktop, laptop, or tablet, a good internet connection, the reading material that we will provide, and your curiosity. For the tools and the peer assignment, a desktop or laptop is required, as the tools do not work on Android or Apple tablets.

Do I need specific software?
Yes, besides standard software such as an internet browser, we use specific tools. Please see the “Software” section on this page.

Do I need a scientific background?
A basic understanding of logic, sets, and statistics (at the undergraduate level) is assumed.

How do I ask questions?
There will be an on-line discussion forum in which students can ask questions and receive answers. While the scale of an on-line class means that often the fastest (and best!) answer comes from another student, the course staff will monitor the discussions for accuracy and to address questions where the student community particularly wants to hear from the staff.

Why do you offer this course for free?
Eindhoven University of Technology is a young and very ambitious technical university that wants to expand its international profile and communicate some of its many core expertise areas to the rest of the world. We are committed to providing students the space for obtaining a thorough and multifaceted education. This MOOC offers us the possibility to share our knowledge globally.

Data Scientist: The Sexiest Job of the 21st Century?
Hal Varian, the chief economist at Google, said in 2009: "The sexy job in the next 10 years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s?" Later, the article "Data Scientist: The Sexiest Job of the 21st Century" triggered a discussion on the emerging need for data scientists. This was picked up by several media outlets, and an analysis of job vacancies indeed shows a rapidly growing demand for data scientists. The recent attention to Big Data illustrates the importance of data science.

Is process mining the same as data mining?
Traditional data mining approaches are not process-centric. Input for data mining is typically a set of records and the output is a decision tree, a collection of clusters, or frequent patterns. Process mining starts from events and the output is related to an end-to-end process model. Data mining tools can be used to support particular decisions in a larger process. However, they cannot be used for process discovery, conformance checking, and other forms of process analysis. The course also introduces basic data mining approaches and relates these to process mining to show differences and commonalities.
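
To make "starting from events" concrete, here is a minimal sketch in Python of one basic building block behind many process discovery techniques: counting how often one activity directly follows another within a case. The event log, the activity names, and the function name `directly_follows` are invented for illustration; this is not how ProM or Disco implement discovery.

```python
from collections import Counter

# Hypothetical event log: for each case, the activities in order of occurrence.
event_log = {
    "case1": ["register", "examine", "decide", "pay"],
    "case2": ["register", "examine", "decide", "reject"],
    "case3": ["register", "decide", "pay"],
}

def directly_follows(log):
    """Count how often activity a is directly followed by activity b in a case."""
    counts = Counter()
    for trace in log.values():
        # Pair every activity with its immediate successor in the same case.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

dfg = directly_follows(event_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be read as a graph whose arcs describe the observed end-to-end flow, which is a different kind of output from a decision tree or a set of clusters computed over independent records.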

What kind of software will be used?
The course uses ProM, an open-source process mining framework (see www.processmining.org), and Disco, a commercial process mining tool from Fluxicon (see www.fluxicon.com). Disco is an easy-to-use tool that course participants can use free of charge. With Disco, it is very easy to convert raw data into an event log suitable for process mining and to quickly create process models showing the bottlenecks in a process. ProM is a more advanced tool that provides hundreds of different types of analysis. All process mining techniques discussed in the course are supported by ProM.
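
As a rough illustration of what converting raw data into an event log and spotting a bottleneck involves, the Python sketch below groups raw records by case, orders them by timestamp, and averages the waiting time between consecutive activities. The records, the column layout (case id, activity, timestamp), and the function names are assumptions made for this example; it does not reflect the internals of Disco or ProM.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records: (case id, activity, timestamp), in arbitrary order.
rows = [
    ("order2", "ship", "2024-01-03 09:00"),
    ("order1", "create", "2024-01-01 09:00"),
    ("order1", "approve", "2024-01-01 10:00"),
    ("order2", "create", "2024-01-01 12:00"),
    ("order1", "ship", "2024-01-02 10:00"),
    ("order2", "approve", "2024-01-01 15:00"),
]

def to_event_log(records):
    """Group records by case and sort each case's events by timestamp."""
    log = defaultdict(list)
    for case_id, activity, stamp in records:
        log[case_id].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M"), activity))
    return {case: sorted(events) for case, events in log.items()}

def mean_waiting_hours(log):
    """Average time in hours between each pair of consecutive activities."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of hours, count]
    for events in log.values():
        for (t1, a), (t2, b) in zip(events, events[1:]):
            entry = totals[(a, b)]
            entry[0] += (t2 - t1).total_seconds() / 3600
            entry[1] += 1
    return {pair: total / n for pair, (total, n) in totals.items()}

log = to_event_log(rows)
waits = mean_waiting_hours(log)
slowest = max(waits, key=waits.get)  # the transition with the longest mean wait
print(slowest, waits[slowest])
```

In this toy data the step from approval to shipping has the longest average wait, which is the kind of bottleneck a process mining tool highlights on a discovered process map.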

What kind of datasets will be used?
The course will provide several data sets ranging from simple synthetic event logs to complex and large real-life event logs, e.g., treatment data of a hospital, incident logs from a car manufacturer, loan application logs from an insurance company, and event logs from a bank. The simple event logs are used to explain and illustrate the techniques. The complex event logs are used to provide insights into the challenges real-life data science projects are facing.

Is process mining only suitable for the analysis of business processes?
No. Although many of the examples will come from business processes, one can also find processes in software and all kinds of devices. Process mining can, for example, also be used to understand why and when machines and software products fail. Through the internet of things, more and more devices will be connected to the internet, thus significantly extending the reach of process mining. Process mining can be used for the analysis of any behavior, including behavior at the level of machines and hardware/software systems.

Can I apply process mining to my own data?
Event data is everywhere, as is illustrated by the many examples in this course. Participants are encouraged to apply the software to data sets from their own environment, e.g., data taken from social media (Twitter and Facebook) or from enterprise information systems (e.g., SAP).


Recommended Background

A basic understanding of logic, sets, and statistics (at the undergraduate level) is assumed. Basic computer skills are required to use the software provided with the course (but no programming experience is needed). Participants are also expected to have an interest in process modeling and data mining, but no specific prior knowledge is assumed, as these concepts are introduced in the course.