WIP Read as arrow #1831

Draft · willdealtry wants to merge 18 commits into master
Conversation

willdealtry
Collaborator

WIP read dataframe as Arrow arrays
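A minimal sketch of the intended usage (the OutputFormat enum and its import path are assumptions here; the API is still WIP):

import pyarrow as pa
from arcticdb import Arctic
from arcticdb import OutputFormat  # hypothetical import path for the WIP enum

ac = Arctic("lmdb://./arctic_db")  # local LMDB store, for illustration
lib = ac.get_library("prices", create_if_missing=True)

# Read a symbol back as a pyarrow.Table instead of a pandas DataFrame
arrow_table = lib.read("Symbol", output_format=OutputFormat.ARROW).data
assert isinstance(arrow_table, pa.Table)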

@wuxianliang

Dear friend, great job! I have saved all stock data in ArcticDB, and I would like to read the data from SSD directly into an Arrow table. Then we can query it with DuckDB, LanceDB, or even KuzuDB in memory. As I understand it, we could eventually use ArcticDB zero-copy like this?

import arcticdb
import duckdb
import lancedb
import kuzudb

......

arrow_table = lib.read("Symbol", output_format=OutputFormat.ARROW).data

duckdb.sql("SELECT * FROM arrow_table")
lancedb.create_table("Symbol", arrow_table, schema=schema)
kuzudb.execute("COPY Symbol FROM arrow_table")
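For the DuckDB step specifically, a runnable sketch (the ArcticDB read is replaced with an in-memory table here, since the ARROW output format is still in progress; DuckDB's replacement scan picks up a pyarrow.Table by its local variable name):

import duckdb
import pyarrow as pa

# Stand-in for lib.read("Symbol", output_format=OutputFormat.ARROW).data
arrow_table = pa.table({
    "price": [10.0, 10.5, 11.0],
    "volume": [100, 250, 175],
})

# DuckDB scans the pyarrow.Table referenced by name in the local scope
print(duckdb.sql("SELECT avg(price) AS avg_price FROM arrow_table").fetchall())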

@willdealtry
Collaborator Author

> Dear friend, great job! I have saved all stock data in ArcticDB, and I would like to read the data from SSD directly into an Arrow table. Then we can query it with DuckDB, LanceDB, or even KuzuDB in memory. As I understand it, we could eventually use ArcticDB zero-copy like this?

Yes, that's exactly right. I'm very pleased to hear that you're excited about this piece of work!

@wuxianliang

wuxianliang commented Nov 1, 2024

Does the read_batch method support reading as Arrow too? Sometimes I want to analyze all symbols in a date range.

symbols = library.list_symbols()
batch_results = library.read_batch(symbols, date_range=date_range, output_format=OutputFormat.ARROW)

So every batch_results[i].data is an Arrow table? Then I let Claude 3.5 Sonnet code the rest.

import pyarrow as pa

# Import path may vary by ArcticDB version; VersionedItem is what read_batch
# returns on success (DataError on failure).
from arcticdb.version_store.library import VersionedItem


def fast_concat_arrow_tables(batch_results):
    """
    Fast concatenation of Arrow tables from batch results.

    Parameters
    ----------
    batch_results : List[Union[VersionedItem, DataError]]
        Results from an ArcticDB read_batch operation with
        output_format=OutputFormat.ARROW.

    Returns
    -------
    pyarrow.Table
        Concatenated table with an added symbol column.
    """
    # 1. Pre-allocate the list with a known size for better memory efficiency
    tables = [None] * len(batch_results)

    # 2. Add a symbol column to each table in one pass
    for i, result in enumerate(batch_results):
        if isinstance(result, VersionedItem):
            # With OutputFormat.ARROW, .data is already a pyarrow.Table,
            # so no conversion step is needed
            table = result.data
            # Create the symbol array once per table
            symbol_array = pa.array([result.symbol] * len(table))
            # Store the table with an appended symbol column
            tables[i] = table.append_column('symbol', symbol_array)

    # 3. Filter out failed reads (None slots) and concatenate all tables
    # at once; pa.concat_tables requires the per-symbol schemas to match
    tables = [t for t in tables if t is not None]
    return pa.concat_tables(tables)
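Once concatenated, the appended symbol column makes per-symbol slicing cheap; a small follow-on sketch (the symbol name is illustrative):

import pyarrow.compute as pc

combined = fast_concat_arrow_tables(batch_results)

# Select one symbol's rows from the combined table
one_symbol = combined.filter(pc.equal(combined["symbol"], "AAPL"))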
