
Managing Data Creep in Process Mining

James Newman

Storage can be a tricky problem, especially after going live on a process mining project, when the live data feed steadily drives consumption up.



Continuous data pulls can be enormously valuable, but they become a massive pain when they start to creep larger and larger. Each new idea seems to require far more data than initially expected, and you can quickly find yourself looking down the barrel of a large storage expansion fee. Data creep in process mining doesn't have to happen to you.

 

Doculabs' five-pronged approach to beginning a continuous cycle of data awareness:

 

1. Document a list of tables outside of the tool and maintain its accuracy

Each new table gets added to this list first, where its pros and cons can be weighed before it adds more weight to your storage.

 

2. Survey your data usage from extraction to dashboard

While this is the largest and most daunting task on the list, it is also the most important. Why pay to store data that your users never see? Let that data stay in the source system until it becomes useful again. This is an important data hygiene task we liken to spring cleaning.

 

3. Be mindful of creating new tables

The most common tool on the market, Celonis, will limit your extractions if they would exceed your storage allowance. However, creating new tables inside the tool can sneak past that limit warning and potentially lock down your instance. This can be mitigated by running a SELECT COUNT check before executing the insert or create statement, as sketched below. While it could slow you down a day or two, executing slightly slower is preferable to a shutdown over the weekend.
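A minimal sketch of that guard, using hypothetical table and column names (activities_raw, cases_raw, case_id) rather than anything from a specific Celonis data model:

```sql
-- Dry-run guard: estimate how many rows the new table would hold
-- before materializing it. Table and column names are illustrative.
SELECT COUNT(*) AS projected_rows
FROM activities_raw AS a
JOIN cases_raw AS c
  ON c.case_id = a.case_id
WHERE a.event_time >= DATE '2023-01-01';

-- Only if projected_rows (times your expected bytes per row) fits within
-- the remaining storage budget do you run the real statement:
-- CREATE TABLE activities_enriched AS
-- SELECT a.*, c.case_type
-- FROM activities_raw AS a
-- JOIN cases_raw AS c ON c.case_id = a.case_id
-- WHERE a.event_time >= DATE '2023-01-01';
```

If the projected count comes back surprisingly large, that is your cue to revisit the join or the date filter before the table ever lands in storage.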

 

4. Do the math

This is our writer's least favorite task on the list, but it can be critical to evaluating a poor data situation. Boiling the issue all the way down to the number of bytes, columns, and widgets gives you clearer insight into what is taking the most data. It can be done to flag potential issues before extracting data, but it is also useful for identifying columns that aren't delivering enough value for the amount of data they consume. A quick worked example follows below.
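As a rough illustration of the arithmetic, with every number below made up for the example:

```sql
-- Back-of-the-envelope sizing with made-up numbers: a 20,000,000-row
-- activity table with 12 columns averaging roughly 10 bytes per value.
SELECT 20000000 * 12 * 10 / 1000000000.0 AS estimated_gb;   -- ~2.4 GB

-- The same arithmetic per column shows which ones pull their weight:
-- a free-text comment column averaging 200 bytes across those rows
-- costs about 4 GB on its own.
SELECT 20000000 * 200 / 1000000000.0 AS comment_column_gb;  -- ~4.0 GB
```

A single wide column that nobody filters or charts on can easily outweigh the rest of the table put together.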

 

5. Don't be afraid to remove data

While this fear can be hard to overcome, understanding the flow of your data and using it to your advantage is greatly rewarding. A cost-benefit analysis of keeping data in the cloud versus moving it offline does not always land in the cloud's corner, so run that analysis before committing to a data strategy; a rough comparison is sketched below. Keep the right data in the right location for maximum utilization.
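A back-of-the-envelope version of that cost-benefit check, using purely illustrative rates rather than any vendor's actual pricing:

```sql
-- Purely illustrative comparison: 500 GB of rarely-queried history kept
-- hot in the mining platform versus archived to cheaper offline storage.
-- The per-GB rates are assumptions for the example, not real quotes.
SELECT 500 * 0.25 AS monthly_cost_hot_usd,      -- assumed $0.25 per GB per month
       500 * 0.02 AS monthly_cost_archive_usd;  -- assumed $0.02 per GB per month
```

If the hot copy only exists to feed a dashboard nobody opens, that gap in the two numbers is the argument for moving it.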

 

We hope these tips from ProcessMiningIQ help you on your process mining journey!



