This project demonstrates the ingestion of data in diverse formats (both structured and unstructured) from a variety of source types, including relational databases, file systems, and SFTP locations. Azure Synapse Analytics is used to move this data into an Azure Storage account, specifically Azure Data Lake Storage Gen2.
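Once a pipeline run lands files in the lake, serverless SQL can query them in place, which makes for a quick sanity check on the ingested data. The sketch below is illustrative only: the storage account, container, and file names are assumptions, not objects from this project.

```sql
-- Minimal sketch: inspect a CSV file that a pipeline has landed in the lake.
-- The storage account (datalakedemo), container (raw), and file name are
-- hypothetical placeholders; substitute the paths used in your environment.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://datalakedemo.dfs.core.windows.net/raw/customers.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS landed_rows;
```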
The project also illustrates how files ingested into the data lake can be processed with the features Azure Synapse Analytics offers. Concretely, the code in this project creates pipelines that copy data from an on-premises SQL Server table and a local CSV file into the data lake, then uses serverless SQL to cleanse and transform that data into a state fit for serverless data lakehouse views.
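As a sketch of what that cleansing step can look like, the view below reads raw CSV through OPENROWSET, applies light type casting and trimming, and exposes only the conformed shape to downstream consumers. All object names, columns, and paths here are assumptions made for the example; the project's actual transformations live in its own serverless SQL scripts.

```sql
-- Illustrative sketch of a cleansing view over raw CSV data in the lake.
-- Column names, types, and the file path are assumed for this example.
CREATE VIEW dbo.customers_clean AS
SELECT
    CAST(src.customer_id AS INT)      AS customer_id,
    TRIM(src.customer_name)           AS customer_name,
    TRY_CAST(src.signup_date AS DATE) AS signup_date
FROM OPENROWSET(
    BULK 'https://datalakedemo.dfs.core.windows.net/raw/customers.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) WITH (
    customer_id   VARCHAR(20),
    customer_name VARCHAR(200),
    signup_date   VARCHAR(30)
) AS src;
```

A downstream consumer can then simply `SELECT * FROM dbo.customers_clean`: the cleansing logic is applied at query time rather than baked into extra copies of the data.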
The overarching goal is to emulate a scenario common in real-world organizational data management: data from disparate sources, previously scattered across silos, is methodically organized, enabling stronger analytics by aligning it with a physical data model designed around specific business requirements.