top of page
Writer's pictureJames Newman

Path of the Data 4 - Process Mining Data Model Configuration Walkthrough

Load extracted and transformed data into a polished process mining data model optimized for fast analysis.


This final video in our Path of the Data tutorial explains critcal stesps like definining table relationships, resolving duplicate keyerrors, and assigning the case table.


Watch below and read the trascript to cement the lesson in your mind.



Hi my name is James Newman. In this fourth video of our Path of the Data course, we'll be covering data models - this would be the load portion of the ETL pipeline. In Celonis, the load also defines the joins between the tables so the PQL in Studio can automatically do that for you.

 

We're in our AppDev environment, our Aardvark data pool. Right now there are 0 data model loads and this blue data model we're going to be talking about, as well as the output. You can say the load portion defines the joins and foreign keys.

 

It builds off the data storage, creates a polished query engine linked to Studio. So the query engine is the process data model in memory - once it's loaded, PQL can make queries much faster.

 

To make a data model, select the tables - I created a video_case table, it's just the case key to act as our case table. The difference between the event log and case table: the activity table is the N, the case table is one. So it's a snowflake data structure, all one-to-many data connections.

 

For the case table it has one row corresponding to N rows in the activity table. We're on the activity table definition - first is case key, our primary key. Next is activity, timestamp is event time.

 

Last is sorting, optional, but useful if activities have the same timestamp - it then puts them in order. We'll skip that for now. Click finish and we're brought into the configuration UI. The data model has been created successfully - video_test has this A button denoting it's the activity table.

 

Some layout things: all tables show up in this dotted box. You'll have a missing foreign keys error if any table is left there at the end. Event log auto merge is for when you have multiple event logs from merged activity tables - we'll cover that later.

 

You can view table relationships as a graph or list. I prefer graph. Add tables with this button or new foreign key to connect tables - that brings up settings to define the join. Note it lists the columns and has a Swap Tables button - that refers to the one and the many tables.

 

So video_test is actually our event log - see it has activity, case key, event time. Video_case only has case key - it should be our one table. Swap them, click case key to connect, click Save. This defines the joins between activity and case tables.

 

It's set up now, we need to load it. Go to Data Loads - right now no model load is in progress so click Load. This will take a few seconds, we only have 33 activity rows. The load length is directly proportional to table size.

 

Here's a common error - duplicates on both sides of the key relationship. Instead of one-to-many, it's many-to-many. Easy fix in this case, our case table select just needs a DISTINCT before selecting the case keys. Rerun that, then reload the model.

 

It loaded successfully now, no errors! See load time and details like rows loaded per table. Last things before Studio - Calendar defines working times and days, used in cycle time calculations in components. Name mappings allow human readable mapping of coded values.

 

One last key step - set the case table assignment. Without this, it would make a third table when you only need a two-table model. Hopefully this was helpful in loading a data model from creation through joins and loading. Let us know any questions!


Would you like regular tips? Click on the image below to receive a weekly tip so that you can get the most value from your process mining tools.  



6 views0 comments
bottom of page