MobyDQ is a tool that lets data engineering teams automate data quality checks on their data pipelines, capture data quality issues, and trigger alerts when anomalies occur, regardless of the data sources they use.
This tool was inspired by an internal project developed at Ubisoft Entertainment to measure and improve the data quality of its Enterprise Data Platform. However, this open source version has been reworked to improve its design and remove technical dependencies on commercial software.
Getting Started
Skip the bla bla and run your data quality indicators by following the Getting Started page.
Measuring Data Quality
A considerable amount of data quality research involves investigating and describing various categories of desirable attributes of data. These dimensions commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified and there is little agreement in their nature [...] source: Wikipedia
Given this lack of consensus, MobyDQ provides a toolbox that data engineering teams can use to design data quality indicators that answer the following questions:
- Is all the necessary data present in the system?
- Is the data available at the time needed for its usage?
- Is the data compliant with validation or business rules?
- Does the data reflect real world objects?
These questions can be answered using the following types of indicators:
Indicator Type | Description |
---|---|
Anomaly detection | Machine learning algorithms to detect outlying values. **Work in progress**. |
Completeness | Difference in percentage between a measure computed in the source system and the same measure computed in the target system. |
Freshness | Difference in minutes between the current timestamp and the last updated timestamp in the target system. |
Latency | Difference in minutes between the last updated timestamp in the source system and the last updated timestamp in the target system. |
Validity | Any measure computed in the target system that does not comply with validation or business rules. |
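
To make the arithmetic behind the completeness, freshness and latency definitions concrete, here is a minimal Python sketch. It is not MobyDQ's implementation: the function names, the sample values, and the sign convention for completeness (target minus source, relative to the source) are assumptions made for illustration only.

```python
from datetime import datetime, timezone


def completeness(source_measure: float, target_measure: float) -> float:
    """Percentage difference of the target measure relative to the source.

    Assumed convention: a negative result means the target is missing data.
    """
    if source_measure == 0:
        raise ValueError("source measure must be non-zero")
    return (target_measure - source_measure) * 100 / source_measure


def freshness(target_last_updated: datetime, now: datetime) -> float:
    """Minutes elapsed between the target's last update and `now`."""
    return (now - target_last_updated).total_seconds() / 60


def latency(source_last_updated: datetime, target_last_updated: datetime) -> float:
    """Minutes between the source's last update and the target's last update."""
    return (target_last_updated - source_last_updated).total_seconds() / 60


# Hypothetical sample values, for illustration only.
source_rows, target_rows = 1000, 950
source_ts = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
target_ts = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)

print(completeness(source_rows, target_rows))  # -5.0
print(freshness(target_ts, now))               # 30.0
print(latency(source_ts, target_ts))           # 30.0
```

In practice the measures and timestamps would come from queries run against the source and target systems, and an alert would fire when a result crosses a threshold chosen by the team.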