Delta Lake Explorer is a Streamlit application that allows users to explore Delta Lake tables on Azure Data Lake Storage using DuckDB. The application provides a code editor for writing SQL queries, a sidebar for configuring settings, and a result viewer for displaying query results.
- Code Editor: Write and execute SQL queries.
- Query Parsing: Automatically parse and transform queries to use `delta_scan`.
- Query Timing: Display the time taken to execute queries.
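The query timing feature can be pictured with a small wrapper around whatever function runs the query. This is a minimal sketch, not the application's actual code: `timed` is a hypothetical helper, and the lambda stands in for a real query call.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(run_query: Callable[[], T]) -> Tuple[T, float]:
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run_query()
    return result, time.perf_counter() - start


# Stand-in for a real query call; any zero-argument callable works here.
rows, seconds = timed(lambda: [(1,), (2,)])
print(f"{len(rows)} rows in {seconds:.4f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which makes it a better fit for measuring short durations than wall-clock time.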
- Clone the repository:
git clone https://github.com/mrjsj/delta-lake-explorer.git
cd delta-lake-explorer
- Create a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows, use .venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Rename the `.streamlit/secrets-template.toml` to `.streamlit/secrets.toml`.
- Fill in the following values in `.streamlit/secrets.toml`:
  - `STORAGE_ACCOUNT_NAME`: The name of your Azure storage account.
  - `DELTA_LAKE_ROOT_PATH`: The root path up until the delta lake catalog. This includes the container name and the path to the delta lake catalog. E.g., if the full delta table path is `abfss://container/path/to/catalog/layer/table`, then the root path is `container/path/to`. If the delta lake catalog is at the root of the storage account, then the root path is an empty string.
- Choose a way to authenticate to Azure. You can use a service principal or an Azure CLI login. In either case, make sure that your service principal or your personal user has at least the Storage Blob Data Reader role on the storage account.
  - If you choose a service principal, fill in the following values in `.streamlit/secrets.toml`:
    - `AZURE_TENANT_ID`: The tenant ID of your Azure AD.
    - `AZURE_CLIENT_ID`: The client ID of your service principal.
    - `AZURE_CLIENT_SECRET`: The client secret of your service principal.
  - If you choose Azure CLI login, run `az login` before running the application.
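The choice between the two authentication methods can be sketched as a function that inspects the configured values. This is an illustration only, and the application may wire up credentials differently. It assumes authentication goes through a secret of DuckDB's `azure` extension, using the `service_principal` provider when all three service principal values are set and the `credential_chain` provider with the `cli` chain otherwise. The helper `azure_secret_sql` and the secret name `azure_auth` are hypothetical.

```python
from typing import Mapping


def _quote(value: str) -> str:
    """Render a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def azure_secret_sql(secrets: Mapping[str, str]) -> str:
    """Build a DuckDB CREATE SECRET statement from the configured values."""
    account = _quote(secrets["STORAGE_ACCOUNT_NAME"])
    sp_keys = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    if all(secrets.get(key) for key in sp_keys):
        # All three service principal values are present: use them.
        return (
            "CREATE SECRET azure_auth (TYPE azure, PROVIDER service_principal, "
            f"TENANT_ID {_quote(secrets['AZURE_TENANT_ID'])}, "
            f"CLIENT_ID {_quote(secrets['AZURE_CLIENT_ID'])}, "
            f"CLIENT_SECRET {_quote(secrets['AZURE_CLIENT_SECRET'])}, "
            f"ACCOUNT_NAME {account});"
        )
    # Otherwise rely on the token from a prior `az login`.
    return (
        "CREATE SECRET azure_auth (TYPE azure, PROVIDER credential_chain, "
        f"CHAIN 'cli', ACCOUNT_NAME {account});"
    )


# Placeholder account name; with no service principal values, the CLI chain is used.
print(azure_secret_sql({"STORAGE_ACCOUNT_NAME": "mystorageaccount"}))
```

The function only builds the statement. Executing it would require a DuckDB connection with the `azure` extension loaded.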
Run the Streamlit application:
streamlit run main.py
Query using DuckDB syntax. Tables must be referenced as `catalog.schema.table`, e.g.:
SELECT * FROM catalog.schema.table;
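To make the link between a `catalog.schema.table` reference and `delta_scan` concrete, here is a minimal sketch of one way such a rewrite could work. It is not the application's actual parser. It assumes a simple regular expression over `FROM` and `JOIN` clauses, and it builds the path as `abfss://` plus `DELTA_LAKE_ROOT_PATH` plus the three name parts, mirroring the path example given for that setting. The function name `to_delta_scan` is hypothetical.

```python
import re

# Matches FROM/JOIN followed by a three-part name such as catalog.schema.table.
_TABLE_REF = re.compile(
    r"\b(FROM|JOIN)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)\.([A-Za-z_]\w*)",
    re.IGNORECASE,
)


def to_delta_scan(sql: str, root_path: str) -> str:
    """Replace three-part table references with delta_scan('<path>') calls."""

    def replace(match: re.Match) -> str:
        keyword, catalog, schema, table = match.groups()
        # Drop empty segments so an empty root path does not yield a double slash.
        segments = [root_path.strip("/"), catalog, schema, table]
        path = "abfss://" + "/".join(s for s in segments if s)
        return f"{keyword} delta_scan('{path}')"

    return _TABLE_REF.sub(replace, sql)


print(to_delta_scan("SELECT * FROM catalog.schema.table;", "container/path/to"))
```

A regular expression keeps the sketch short, but it does not handle quoted identifiers or three-part names that appear inside string literals. A real SQL parser would be more robust for those cases.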
For more information on DuckDB syntax, see the DuckDB documentation.
This project is licensed under the MIT License. See the LICENSE file for details.