Update data integration track
avandeursen committed Sep 30, 2023
1 parent 0d82b28 commit 8d914fa
Showing 1 changed file with 12 additions and 8 deletions.
20 changes: 12 additions & 8 deletions _tracks/02_data_integration.md
@@ -1,23 +1,27 @@
---
layout: default
layout: track
track-id: 2
title: Data Integration
leader: Asterios Katsifodimos
phd: George Siachamis
---

# Track 2: Data Integration

The data integration track recognizes the importance of data for almost any application of artificial intelligence at ING.
ING is a data-rich organization. Its data lake constitutes a federation of different data storage types. The relationships between the many different data sources evolve over time, and are hard to predict and manage.

The goal of this track is to use semantics-based data matching to recognize such data relationships automatically. In particular, machine learning will be applied for the purpose of meta-data matching, automated schema discovery, schema evolution, and schema alignment. The results can be used to support data engineers to make data integration decisions by means of dataset exploration, discovery, and integration recommendation.
The goal of this track is to use semantics-based data matching to recognize such data relationships automatically. In particular, we apply machine learning for the purpose of meta-data matching, automated schema discovery, schema evolution, and schema alignment. The results can be used to support data engineers to make data integration decisions by means of dataset exploration, discovery, and integration recommendation.

The context in which this will take place is the ING cloud infra-service platform, which continuously collects operational data from a large range of private cloud services operated by ING across layers.
The context is the ING cloud infra-service platform, which continuously collects operational data from a large range of private cloud services operated by ING across layers.

Related work:
## Selected publications

- Hennie Huijgens, Eric Greuter, Jerry Brons, Evert A. van Doorn, Ioannis Papadopoulos, Francisco Morales Martinez, Mauricio Finavaro Aniche, Otto Visser, Arie van Deursen: Factors affecting cloud infra-service development lead times: a case study at ING. ICSE (SEIP) 2019: 233-242
1. G. Siachamis, K. Psarakis, M. Fragkoulis, Odysseas Papapetrou, A. van Deursen, A. Katsifodimos (2023), Adaptive Distributed Streaming Similarity Joins, Marcelo Pasin (Eds.), In DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems p.25-36 ([preprint](https://research.tudelft.nl/en/publications/adaptive-distributed-streaming-similarity-joins)).

- Daniel Vliegenthart, Sepideh Mesbah, Christoph Lofi, Akiko Aizawa, Alessandro Bozzon: Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. TPDL 2019: 3-17
1. George Siachamis, Job Kanis, Wybe Koper, Kyriakos Psarakis, Marios Fragkoulis, Arie Van Deursen, Asterios Katsifodimos (2023), Towards Evaluating Stream Processing Autoscalers, In Proceedings - 2023 IEEE 39th International Conference on Data Engineering Workshops, ICDEW 2023 p.95-99, Institute of Electrical and Electronics Engineers (IEEE) ([preprint](https://research.tudelft.nl/en/publications/towards-evaluating-stream-processing-autoscalers)).

**Track leader:** {{ page.leader }}
1. G. Siachamis, G.J.P.M. Houben, A. van Deursen, A. Katsifodimos (2021), Integrating Massive Data Streams, Philip A. Bernstein, Tilmann Rabl (Eds.), In Proceedings of the VLDB 2021 PhD Workshop Volume 2971, CEUR-WS ([preprint](https://research.tudelft.nl/en/publications/integrating-massive-data-streams)).

1. Christos Koutras, Kyriakos Psarakis, George Siachamis, Andra Ionescu, Marios Fragkoulis, Angela Bonifati, Asterios Katsifodimos (2021), Valentine in Action: Matching Tabular Data at Scale, In Proceedings of the VLDB Endowment Volume 14 p.2871–2874 ([preprint](https://research.tudelft.nl/en/publications/valentine-in-action-matching-tabular-data-at-scale) and [dataset](https://delftdata.github.io/valentine/)).

1. Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, Asterios Katsifodimos (2021), Valentine: Evaluating Matching Techniques for Dataset Discovery, In Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021 p.468-479, IEEE ([preprint](https://research.tudelft.nl/en/publications/valentine-evaluating-matching-techniques-for-dataset-discovery) and [dataset](https://delftdata.github.io/valentine/)).
