chore(query): improve project set #16326

Dousir9 · 2024-08-25T18:19:59Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Improve the performance of flatten, json_each, json_array_elements, and jq.

Performance Test (Databend Cloud XSMALL)

select sum(id), sum(LENGTH(value)) from t, lateral flatten(input => c);

Query Duration: 7.6s → 3.1s (about 2.45x faster)
Snowflake XSMALL: 2.6s
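
As a quick check on how the figures above relate, the sketch below derives the speedup ratio, the share of wall-clock time saved, and the remaining gap to the Snowflake run from the three reported durations. The variable and function names are illustrative and not part of the PR.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return before_s / after_s


# Durations reported in this PR description, in seconds.
before_s, after_s, snowflake_s = 7.6, 3.1, 2.6

ratio = speedup(before_s, after_s)                # old time divided by new time
time_saved_pct = (1 - after_s / before_s) * 100   # share of wall-clock time removed
gap_to_snowflake = after_s / snowflake_s          # new Databend time relative to Snowflake

print(f"{ratio:.2f}x faster, {time_saved_pct:.0f}% less time, "
      f"{gap_to_snowflake:.2f}x the Snowflake duration")
```

With these inputs the ratio is about 2.45, which corresponds to roughly 59% less query time.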

Full Test Script

Prepare json data

import json
import csv
import os

def create_json_data(num_entries):
    data = []
    for i in range(num_entries):
        entry = {
            "id": i + 1,
            "name": f"name_{i + 1}",
            "age": 20 + (i % 10),
            "email": f"user{i + 1}@example.com",
            "address": {
                "street": f"street_{i + 1}",
                "city": f"city_{i % 5}",
                "postal_code": f"{10000 + i}"
            }
        }
        data.append(entry)
    return data

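def save_as_variant_csv(data, filename):
    # Write next to this script; abspath guards against an empty dirname
    # when the script is launched by bare filename.
    dir_path = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(dir_path, filename), "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for index, entry in enumerate(data):
            # Column 1 is the row id, column 2 is the JSON text loaded as VARIANT.
            writer.writerow([index, json.dumps(entry)])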

if __name__ == "__main__":
    num_entries = 1000000
    filename = "dataset.csv"

    json_data = create_json_data(num_entries)
    save_as_variant_csv(json_data, filename)

Create table

create or replace table t(id int, c variant);
COPY INTO t FROM 'fs:////Users/xujinkai/Desktop/MyTest/databend/json' files = ('dataset.csv')  file_format = (type = CSV);

Test

select sum(id), sum(LENGTH(value)) from t, lateral flatten(input => c);
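
To make the shape of the test query concrete, here is a minimal Python sketch of what a lateral flatten does to each row. It assumes flatten in its default, non-recursive mode emits one output row per top-level key of the variant. The helpers `make_entry` and `flatten_top_level` are hypothetical, and the JSON text length is only a stand-in for `LENGTH(value)`, so the length total may not match what Databend returns.

```python
import json


def make_entry(i):
    """Build one record with the same shape as the data-preparation script."""
    return {
        "id": i + 1,
        "name": f"name_{i + 1}",
        "age": 20 + (i % 10),
        "email": f"user{i + 1}@example.com",
        "address": {
            "street": f"street_{i + 1}",
            "city": f"city_{i % 5}",
            "postal_code": f"{10000 + i}",
        },
    }


def flatten_top_level(variant):
    """Yield (key, value) for each top-level entry; nested values stay intact."""
    if isinstance(variant, dict):
        yield from variant.items()
    elif isinstance(variant, list):
        yield from enumerate(variant)


# Table t(id, c): id is the CSV row index, c is the JSON document.
table = [(i, make_entry(i)) for i in range(3)]

# "from t, lateral flatten(input => c)": each row joins with its own flattened entries.
joined = [
    (row_id, key, value)
    for row_id, c in table
    for key, value in flatten_top_level(c)
]

sum_id = sum(row_id for row_id, _, _ in joined)
sum_len = sum(len(json.dumps(value)) for _, _, value in joined)  # stand-in for LENGTH(value)
print(len(joined), sum_id, sum_len)
```

Each input row has five top-level keys, so the join multiplies the row count by five. That per-row expansion is the work the set-returning function transform performs for every one of the 1,000,000 rows in the benchmark.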

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Covered by existing tests

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

github-actions · 2024-08-26T15:03:04Z

Docker Image for PR

tag: pr-16326-88125a4-1724684469

note: this image tag is only available for internal use,
please check the internal doc for more details.

src/query/service/src/pipelines/processors/transforms/transform_srf.rs

chore(query): improve project set

fe5f04f

github-actions bot added the pr-chore label Aug 25, 2024

Dousir9 added 5 commits August 26, 2024 02:20

chore(code): merge main

3b069ce

chore(code): make lint

c3cdff8

chore(query): fix project set

33b4d74

chore(query): improve wrap_nullable

3d9f551

chore(query): fix project set flatten

9ca9ddf

Dousir9 added the ci-cloud label Aug 26, 2024

Dousir9 marked this pull request as ready for review August 26, 2024 15:51

Dousir9 requested review from b41sh, sundy-li and xudong963 August 26, 2024 15:51

b41sh reviewed Aug 27, 2024

src/query/service/src/pipelines/processors/transforms/transform_srf.rs (resolved)

chore(query): refine push_null

2253550

b41sh approved these changes Aug 27, 2024

Dousir9 added this pull request to the merge queue Aug 27, 2024

BohuTANG removed this pull request from the merge queue due to a manual request Aug 27, 2024

BohuTANG merged commit a055124 into databendlabs:main Aug 27, 2024
71 checks passed

andylokandy mentioned this pull request Oct 15, 2024

feat: implement StringColumn using StringViewArray #16610

Merged

11 tasks

andylokandy mentioned this pull request Oct 25, 2024

refactor: refine cast variant to map #16691

Merged

11 tasks
