MobyDQ is a tool that helps data engineering teams automate data quality checks on their data pipelines, capture data quality issues, and trigger alerts when anomalies occur, regardless of the data sources they use.

[Figure: Data Pipeline]

This tool was inspired by an internal project developed at Ubisoft Entertainment to measure and improve the data quality of its Enterprise Data Platform. This open source version, however, has been reworked to improve its design and to remove technical dependencies on commercial software.

Getting Started

Skip the bla bla and run your data quality indicators by following the Getting Started page.

Measuring Data Quality

"A considerable amount of data quality research involves investigating and describing various categories of desirable attributes of data. These dimensions commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified and there is little agreement in their nature [...]" (source: Wikipedia)

Given this lack of consensus, MobyDQ provides a toolbox for data engineering teams to design data quality indicators that answer the following questions:

  • Is all the necessary data present in the system?
  • Is the data available at the time needed for its usage?
  • Is the data compliant with validation or business rules?
  • Does the data reflect real world objects?

These questions can be answered using the following types of indicators:

| Indicator type | Description |
|---|---|
| Anomaly detection | Machine learning algorithms that detect outlying values (work in progress). |
| Completeness | Difference, in percent, between a measure computed in the source system and the same measure computed in the target system. |
| Freshness | Difference, in minutes, between the current timestamp and the last updated timestamp in the target system. |
| Latency | Difference, in minutes, between the last updated timestamp in the source system and the last updated timestamp in the target system. |
| Validity | Any measure computed in the target system that does not comply with validation or business rules. |
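The completeness, freshness, latency and validity definitions above reduce to simple arithmetic. The following is a minimal Python sketch of that arithmetic, not MobyDQ's actual implementation. The function names, the choice of the source value as the denominator of the percentage, the row-level rule predicate used for validity, and the absolute-value alert threshold are all hypothetical assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional


def completeness_pct(source_value: float, target_value: float) -> float:
    """Percentage difference of the target measure relative to the source measure.

    Assumption: the source system is the reference, so it is the denominator.
    """
    if source_value == 0:
        raise ValueError("source measure is zero; percentage difference is undefined")
    return (target_value - source_value) / source_value * 100.0


def _minutes(delta: timedelta) -> float:
    # Convert a timedelta to a (possibly fractional) number of minutes.
    return delta.total_seconds() / 60.0


def freshness_minutes(target_last_updated: datetime, now: Optional[datetime] = None) -> float:
    """Minutes between the current timestamp and the target's last update.

    Timestamps are assumed to be timezone-aware (UTC here).
    """
    now = now or datetime.now(timezone.utc)
    return _minutes(now - target_last_updated)


def latency_minutes(source_last_updated: datetime, target_last_updated: datetime) -> float:
    """Minutes between the source's last update and the target's last update."""
    return _minutes(source_last_updated - target_last_updated)


def validity_violations(rows: Iterable[dict], rule: Callable[[dict], bool]) -> int:
    """Count target rows that fail a (hypothetical) validation or business rule."""
    return sum(1 for row in rows if not rule(row))


def is_alert(value: float, threshold: float) -> bool:
    """Hypothetical alerting policy: alert when |value| exceeds the threshold."""
    return abs(value) > threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    target_updated = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
    source_updated = datetime(2024, 1, 1, 11, 45, tzinfo=timezone.utc)

    completeness = completeness_pct(source_value=1000, target_value=950)
    freshness = freshness_minutes(target_updated, now=now)
    latency = latency_minutes(source_updated, target_updated)
    rows = [{"amount": 5}, {"amount": -1}, {"amount": 0}]
    invalid = validity_violations(rows, rule=lambda r: r["amount"] >= 0)

    print(f"completeness: {completeness:.1f}%  alert={is_alert(completeness, 2.0)}")
    print(f"freshness: {freshness:.0f} min, latency: {latency:.0f} min, invalid rows: {invalid}")
```

Keeping each indicator as a pure function of already-computed measures separates the arithmetic from the queries that fetch source and target values, which is where the per-data-source differences would live.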