Commit
1 parent 0d82b28, commit 8d914fa
Showing 1 changed file with 12 additions and 8 deletions.
@@ -1,23 +1,27 @@
 ---
-layout: default
+layout: track
+track-id: 2
 title: Data Integration
 leader: Asterios Katsifodimos
 phd: George Siachamis
 ---

 # Track 2: Data Integration

 The data integration track recognizes the importance of data for almost any application of artificial intelligence at ING.
 ING is a data-rich organization. Its data lake constitutes a federation of different data storage types. The relationships between the many different data sources evolve over time, and are hard to predict and manage.

-The goal of this track is to use semantics-based data matching to recognize such data relationships automatically. In particular, machine learning will be applied for the purpose of meta-data matching, automated schema discovery, schema evolution, and schema alignment. The results can be used to support data engineers to make data integration decisions by means of dataset exploration, discovery, and integration recommendation.
+The goal of this track is to use semantics-based data matching to recognize such data relationships automatically. In particular, we apply machine learning for the purpose of meta-data matching, automated schema discovery, schema evolution, and schema alignment. The results can be used to support data engineers to make data integration decisions by means of dataset exploration, discovery, and integration recommendation.

-The context in which this will take place is the ING cloud infra-service platform, which continuously collects operational data from a large range of private cloud services operated by ING across layers.
+The context is the ING cloud infra-service platform, which continuously collects operational data from a large range of private cloud services operated by ING across layers.

-Related work:
+## Selected publications

-- Hennie Huijgens, Eric Greuter, Jerry Brons, Evert A. van Doorn, Ioannis Papadopoulos, Francisco Morales Martinez, Mauricio Finavaro Aniche, Otto Visser, Arie van Deursen: Factors affecting cloud infra-service development lead times: a case study at ING. ICSE (SEIP) 2019: 233-242
+1. G. Siachamis, K. Psarakis, M. Fragkoulis, Odysseas Papapetrou, A. van Deursen, A Katsifodimos (2023), Adaptive Distributed Streaming Similarity Joins, Marcelo Pasin (Eds.), In DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems p.25-36 ([preprint](https://research.tudelft.nl/en/publications/adaptive-distributed-streaming-similarity-joins)).

-- Daniel Vliegenthart, Sepideh Mesbah, Christoph Lofi, Akiko Aizawa, Alessandro Bozzon: Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. TPDL 2019: 3-17
+1. George Siachamis, Job Kanis, Wybe Koper, Kyriakos Psarakis, Marios Fragkoulis, Arie Van Deursen, Asterios Katsifodimos (2023), Towards Evaluating Stream Processing Autoscalers, In Proceedings - 2023 IEEE 39th International Conference on Data Engineering Workshops, ICDEW 2023 p.95-99, Institute of Electrical and Electronics Engineers (IEEE) ([preprint](https://research.tudelft.nl/en/publications/towards-evaluating-stream-processing-autoscalers)).

-**Track leader:** {{ page.leader }}
+1. G. Siachamis, G.J.P.M. Houben, A. van Deursen, A Katsifodimos (2021), Integrating Massive Data Streams, Philip A. Bernstein , Tilmann Rabl (Eds.), In Proceedings of the VLDB 2021 PhD Workshop Volume 2971, CEUR-WS ([preprint](https://research.tudelft.nl/en/publications/integrating-massive-data-streams)).

+1. Christos Koutras, Kyriakos Psarakis, George Siachamis, Andra Ionescu, Marios Fragkoulis, Angela Bonifati, Asterios Katsifodimos (2021), Valentine in Action: Matching Tabular Data at Scale, In Proceedings of the VLDB Endowment Volume 14 p.2871–2874 ([preprint](https://research.tudelft.nl/en/publications/valentine-in-action-matching-tabular-data-at-scale) and [dataset](https://delftdata.github.io/valentine/)).

+1. Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, Asterios Katsifodimos (2021), Valentine: Evaluating Matching Techniques for Dataset Discovery, In Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021 p.468-479, IEEE ([preprint](https://research.tudelft.nl/en/publications/valentine-evaluating-matching-techniques-for-dataset-discovery) and [dataset](https://delftdata.github.io/valentine/)).