BQ Metadata

BQ Metadata is a project created for the following purposes:

Keep datasets, tables and views metadata history.
Create datasets, tables and views from gitlab with config files and trigger the code with CI/CD.
Manage all of the developments about BQ datasets, tables and views.

How is it work?

Project has docker image, it contains miniconda , gcloud to run some google cli commands and our scripts for validation and deployment pipelines.
We have importer code to import your existing structure. It will create all of the folders for your project. You should run:

python importer.py -p your_project_name

We add configurations to gitlab-ci.yml.
We have 2 variables we defined as a Gitlab CI-CD variable here:
- SERVICE_ACCOUNT : Google Service Account. Your project will work with using this service acccount.
- BQMETADATA_RUN_PIPELINE : It should be True if you want the pipeline to be triggered. You can set a different value if you want to disable it.
We copied SERVICE_ACCOUNT to app/credentials.json in our image. cat "$SERVICE_ACCOUNT" > /app/credentials.json
Activate gcloud with

    - gcloud auth activate-service-account --key-file=/app/credentials.json
    - project_id=($(jq -r '.project_id' /app/credentials.json))
    - gcloud config set project "${project_id[@]}"

Create config file to create dataset/table/view. The folder directory structure sould be like:

bq_metadata
  └───project_1
  │   └───dataset_1
  │   │   │   dataset_1.json
  │   │   └───tables
  │   │       │   table_1.json
  │   │       │   table_2.json
  │   │       │   ...
  │   │   └───views
  │   │       │   view_1.sql
  │   │       │   view_2.sql
  │   │       │   ...
  │   └───dataset_2
  │   |   │   dataset_2.json
  │   │   │   ...

💥 When you open Merge Request pipeline will;

Run entry.sh to deploy your configs. We get files that you created, updated or deleted with :

ADDED_FILE_LIST=($(git diff --no-commit-id --name-only --diff-filter=A -M100% HEAD^ HEAD 'bq_metadata/**/*.json' 'bq_metadata/**/*.sql'))
DELETED_FILE_LIST=($(git diff --no-commit-id --name-only --diff-filter=D -M100% HEAD^ HEAD 'bq_metadata/**/*.json' 'bq_metadata/**/*.sql'))
UPDATED_FILE_LIST=($(git diff --no-commit-id --name-only --diff-filter=M HEAD^ HEAD 'bq_metadata/**/*.json' 'bq_metadata/**/*.sql'))

In the continuation of the code, we run the code in the following order:

Create step
- Dataset
- Table
- View
Update step
- Dataset
- Table
- View
Delete step
- View
- Table
- Dataset

Examples

1️⃣ Dataset:

bq_metadata
  └───project_1
  │   └───dataset_1
  │   │   │   dataset_1.json

{
    "location": "EU",
    "projectId": "project-1",
    "datasetId": "dataset_1",
    "properties": {
        "description": "Test-dataset",
        "default_table_expiration_ms": 86400000,
        "default_partition_expiration_ms": 93600,
        "labels": {
            "type": "test"
        }
    }
}

2️⃣ Tables

bq_metadata
  └───project-1
  │   └───dataset_1
  │   │   └───tables
  │   │       │   table_1.json

{
    "projectId": "project-1",
    "datasetId": "dataset_1",
    "tableId": "table_1",
    "schema": {
        "fields": [
            {
                "description": "Test description",
                "fields": [
                    {
                        "description": "Test nested",
                        "mode": "NULLABLE",
                        "name": "test_nested",
                        "type": "STRING"
                    },
                    {
                        "description": "Test nested2",
                        "fields": [
                            {
                                "description": "Test nested3",
                                "mode": "NULLABLE",
                                "name": "test_nested_3",
                                "type": "FLOAT"
                            }
                        ],
                        "mode": "REPEATED",
                        "name": "test_nested_2",
                        "type": "RECORD"
                    }
                ],
                "mode": "NULLABLE",
                "name": "full_name",
                "type": "RECORD"
            },
            {
                "mode": "REQUIRED",
                "name": "age",
                "type": "INTEGER"
            },
            {
                "mode": "NULLABLE",
                "name": "date_column",
                "type": "TIMESTAMP"
            }
        ]
    },
    "labels": {
        "color": "green",
        "type": "test_label"
    },
    "timePartitioning": {
        "field": "date_column",
        "type": "DAY",
        "expirationDays": 30000
    }
}

3️⃣ Views

bq_metadata
  └───project-1
  │   └───dataset_1
  │   │   └───views
  │   │       │   view_1.sql

create or replace view `project-1.dataset_1.view_1` 
as
select * from `project-1.dataset_1.test_table_1`;

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
bq_metadata/project-1		bq_metadata/project-1
scripts		scripts
utils		utils
Dockerfile		Dockerfile
README.md		README.md
credentials.json		credentials.json
environment.yaml		environment.yaml
gitlab-ci.yml		gitlab-ci.yml
importer.py		importer.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BQ Metadata

How is it work?

Examples

About

Releases

Packages

Languages

erkanekser/bq_metadata

Folders and files

Latest commit

History

Repository files navigation

BQ Metadata

How is it work?

Examples

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages