Enterango has developed an innovative travel planning and management tool for meetings and events that requires large amounts of data. Some of that data is not readily available and has to be integrated from different sources. These sources have heterogeneous structures, may contain errors, are often incomplete, and can be updated at any time. To create a sustainable and efficient data integration and cleaning process, Enterango teams up with the Information and Software Engineering team at TU Wien to apply the latest research insights to this real-world data problem.

The Verification and Semantics for Data Quality Improvements (VaSQua) project describes a new data integration and workflow tracking process.

First, data from different sources is integrated into a knowledge graph. The datasets are enriched with metadata describing the history, origin, and processing of the data, among other features. The data and its metadata are then used to clean and correct errors automatically; whatever cannot be resolved automatically is corrected manually by a human. The data integration process is novel in that, following the latest research, the additional metadata is used to reduce the human effort required during data cleaning.
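The process described above can be illustrated with a minimal sketch. It is not the VaSQua implementation: the `Statement` type, the source names, the sample values, and the "trusted source" rule are all hypothetical, chosen only to show how provenance metadata could let some conflicts be resolved automatically while the rest are queued for a human.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """One fact plus provenance metadata (all field names are illustrative)."""
    entity: str
    attribute: str
    value: str
    source: str      # origin of the data
    retrieved: date  # history of the data; kept as metadata, unused by the rule below


def integrate(statements):
    """Group statements from all sources by (entity, attribute), graph-style."""
    graph = {}
    for s in statements:
        graph.setdefault((s.entity, s.attribute), []).append(s)
    return graph


def clean(graph, trusted_sources):
    """Resolve what the metadata allows; queue the rest for manual correction."""
    resolved, review_queue = {}, []
    for key, candidates in graph.items():
        values = {s.value for s in candidates}
        if len(values) == 1:
            # All sources agree, nothing to clean.
            resolved[key] = candidates[0].value
            continue
        # Assumed rule: accept the value if the trusted sources agree on one.
        trusted = [s for s in candidates if s.source in trusted_sources]
        if len({s.value for s in trusted}) == 1:
            resolved[key] = trusted[0].value
        else:
            review_queue.append(key)
    return resolved, review_queue


# Hypothetical sample data from three hypothetical sources.
statements = [
    Statement("hotel:1", "city", "Vienna", "source_a", date(2019, 1, 10)),
    Statement("hotel:1", "city", "Wien", "source_b", date(2019, 3, 2)),
    Statement("hotel:1", "rooms", "120", "source_a", date(2019, 1, 10)),
    Statement("hotel:1", "rooms", "120", "source_b", date(2019, 3, 2)),
    Statement("hotel:2", "city", "Graz", "source_b", date(2019, 3, 2)),
    Statement("hotel:2", "city", "Linz", "source_c", date(2019, 4, 5)),
]

resolved, review_queue = clean(integrate(statements), trusted_sources={"source_a"})
print(resolved)
print(review_queue)
```

In this sketch the conflict on the first hotel's city is settled by the provenance rule, whereas the conflict on the second hotel has no trusted source behind either value and is left for manual correction. A real knowledge graph would typically use an RDF store and richer provenance than a single source label.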

Call: From Science to Products 2019