Skip to content

4. Extract‐Transform‐Load (ETL) in Power BI

Silvia Aragon edited this page Mar 1, 2024 · 7 revisions

The first step is to consolidate all the data into a single Power BI (.pbix) file. Currently, the data needs ot be extracted from four different sources (see 4.1), cleaned (see 4.2) and loaded.

The diagram below gives a quick overview of the ETL pipeline, as well as the types of transformations performed on the data.

ETL_final

NOTE: the credentials needed to access and/or download the data are not available in this repository.

4.1. Importing Data into Power BI

The first phase focuses on data loading and preparation. Each table was uploaded from a different source. The tables downloaded from Azure required specific credentials provided by AiCore (unavailable in this repository).

To import data into Power BI, go to Get Data menu and find the appropriate option.

Table Type Imported from
Orders Facts Azure SQL Database
Customers Dimension Combining 3 .csv files, stored in a local folder
Products Dimension Local .csv file
Stores Dimension Azure Blob Storage

The figure below shows how to import data from Azure SQL Databse and Azure Blob Storage. get_azure_data_final

The figure below shows how to import data from a local .csv file or multiple files saved in a local folder. get_folder_data

4.2. Data Cleaning: Transforming Data in Power Query Editor

After importing all tables to Power BI, we need to transform the data in the Power Query Editor. Common transformations to all tables include:

  • Fix column naming. Throughout this process, it is important to make sure that all tables have consistent and comprehensive naming, and that their format matched the convention: for example, column names should be written as "Full Name" instead of "full_name" or "full-name".
  • Ensure data types are correct and accurate.

The specific transformations performed on each table are specified in the ETL diagram above and in the sections below in more detail.

Orders table: Transformations

Action Column/Condition Reason
Delete column Card number Ensure data privacy
Delete rows If value is null Ensure data integrity
Split datetime column Order date Split into a date column and a time column
Split datetime column Shipping date Split into a date column and a time column
  • To split datetime columns, go to Split Column > by delimiter, and chose a blank space as the delimiter. Alternatively, duplicate the column, and choose the format as 'date' or 'time' in the data type option.
power_query_split_col_by_delimiter
  • To change data type, click on the icon to the left on the column name and select the desired data type.
power_query_change_dtype

Products table: Transformations

Action Column/Condition Reason Method
Remove duplicates Product code Ensure each product code is unique See below
Split column weight (e.g. values: 10g) Into weight values column and units column Column from Examples
Fix datatype weight values column To decimal n/a
Calculated column New column name: Weight Kilograms Calculated from Weight Values and Weight Units columns, to transform all values to kg DAX (see below)
  • To remove duplicates, click on 'Remove Rows' > 'Remove Duplicates' in the Power Query Home pane:
power_query_remove_rows_with_nulls
  • The weight column contains values in either in kg, g or mL. Those in mL will be approximated to g using the density of water (1 g = 1 mL). Before transforming, the data is in the format e.g. 800g, 400mL, 0.1kg. this involves multiple steps:
      1. In Power Query, create 'Column from Examples' based on the weight column, containing only the numerical values.
power_query_col_from_examples
- 2. In Power Query, create 'Column from Examples' based on the weight column, containing only the units as strings e.g. 'g'.
- 3. Load the data to Power BI (exit Power Query). 
- 4. Create final weight column, containing only weights in kg (as floats): to do this go to `Data View` > `New Calculated Column`. The DAX formula used for this transformation is: 
    ```
    Weight Kilograms = IF([Weight Units]="kg", [Clean Weight Values], [Clean Weight Values]/1000)
    ``` 
    The desired output should look like the column highlighted in the image below:
weight_column

Customers table: Transformations

The main transformation performed on this table was combining the columns First Name and Last name into a new column called Full name. This can be done in Power Query by selecting both columns of interest and going to Add Column > Merge Columns in the Transform pane. power_query_merge_columns