Numerous digital documents, databases, webpages, sensor data streams and many other heterogeneous data sources are available through the WWW and information clouds. Using and exploiting all this information efficiently in querying, retrieval and analysis tasks is a real IT challenge.
DDCM investigates how structured data (databases), semi-structured data (webpages) and unstructured data (digitized text and multimedia) can be integrated, possibly only temporarily, into a single, consistent structure that is better suited for further querying, processing or analysis. We have developed a unique software framework for the integration, cleaning and analysis of data contained in multiple data sources. Related to this work is the development of metasearch engines for text, XML and HTML documents. A metasearch engine searches over multiple data sources and then combines the results into a single list.
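
The combining step of a metasearch engine can be illustrated with a small sketch. The text does not say how DDCM merges results, so the example below uses reciprocal rank fusion, a common generic technique, purely as an assumption. The function name, the constant `k = 60` and the document identifiers are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one list.

    Each document scores 1 / (k + rank) for every list it appears in;
    the scores are summed and documents are sorted by total score.
    This is an illustrative choice, not the method used by DDCM.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists returned by three separate sources.
text_hits = ["doc_a", "doc_b", "doc_c"]
xml_hits = ["doc_b", "doc_d"]
html_hits = ["doc_b", "doc_a", "doc_e"]

merged = reciprocal_rank_fusion([text_hits, xml_hits, html_hits])
print(merged)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c', 'doc_e']
```

Rank-based fusion is convenient here because it needs only the position of each result, not the relevance scores of the individual sources, which are usually not comparable across heterogeneous search back ends.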