Coming Soon
ccoffey edited this page Jun 29, 2011 · 4 revisions
This section outlines the features I am currently working on.
The first feature is joining data sets, which will work much like SQL joins. I expect the API to look something like this:
```python
# Import sql4csv and create some data sets.
from novacode import sql4csv
ds_0 = sql4csv('ds_0.csv')
ds_1 = sql4csv('ds_1.csv')
# Join the two data sets on their common age field.
ds_2 = ds_0.join(ds_1, '#0.age = #1.age')
# Query the joined data set as normal.
ds_2.query('select * where $age > 8')
```
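To make the intended join semantics concrete, here is a minimal sketch in plain Python (standard library only) of an inner equi-join on a shared column, using the hash-join technique. It is not how sql4csv implements, or will implement, joins: `read_rows`, `hash_join`, the `#0.`/`#1.` column prefixes (borrowed from the `'#0.age = #1.age'` notation) and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_rows(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def hash_join(left, right, left_key, right_key):
    """Inner equi-join: merge rows where left[left_key] == right[right_key].

    Columns are prefixed with '#0.' and '#1.' (a hypothetical naming scheme)
    so same-named columns from the two inputs do not overwrite each other.
    """
    # Build phase: index the right-hand rows by their join key.
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)

    # Probe phase: look up each left-hand row's key in the index.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            merged = {f"#0.{k}": v for k, v in lrow.items()}
            merged.update({f"#1.{k}": v for k, v in rrow.items()})
            joined.append(merged)
    return joined


ds_0 = read_rows("name,age\nann,9\nbob,7\n")
ds_1 = read_rows("age,school\n9,north\n7,south\n9,east\n")

ds_2 = hash_join(ds_0, ds_1, "age", "age")
# Rough equivalent of 'select * where $age > 8'; CSV values are strings.
result = [row for row in ds_2 if int(row["#0.age"]) > 8]
for row in result:
    print(row)
```

A hash join reads the right-hand data set once to build the index and then probes it once per left-hand row, so it avoids comparing every pair of rows as a nested-loop join would.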
The second feature is parallel execution. Currently sql4csv iterates through a .csv file row by row, but the processing of each row is completely independent of every other row. Executing queries in parallel should therefore give a significant speed-up, particularly on large files.
This feature will be optional. Below is an example of what it should look like:
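```python
# Import sql4csv and create some data sets.
from novacode import sql4csv
ds_small = sql4csv('ds_small.csv')
ds_large = sql4csv('ds_large.csv')
# An example query; any sql4csv query string would do.
some_query = 'select * where $age > 8'
# Run the query sequentially on the small file and in parallel on the large one.
ds_small.query(some_query)
ds_large.parallel_query(some_query)
```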
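Because each row is handled independently, one way to parallelize a row-by-row filter in Python is to split the rows into chunks and hand them to a pool of worker processes. The sketch below shows that idea with the standard library's `concurrent.futures.ProcessPoolExecutor`. It is an assumption about how something like `parallel_query` might work, not its actual implementation; `matches`, `filter_chunk`, `chunked`, `parallel_filter` and the `workers`/`chunk_size` parameters are hypothetical. Processes are used rather than threads because CPython's global interpreter lock stops threads from executing Python code in parallel.

```python
import csv
import io
from concurrent.futures import ProcessPoolExecutor


def matches(row):
    """Hypothetical per-row predicate, standing in for 'where $age > 8'."""
    return int(row["age"]) > 8


def filter_chunk(rows):
    """Apply the predicate to one chunk; no state is shared between rows."""
    return [row for row in rows if matches(row)]


def chunked(rows, size):
    """Split a list of rows into consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_filter(rows, workers=4, chunk_size=1000):
    """Filter rows across worker processes, preserving the original row order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so chunks stay in sequence.
        results = pool.map(filter_chunk, chunked(rows, chunk_size))
        return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    # Synthetic data: 10,000 rows with ages cycling through 0..19.
    text = "name,age\n" + "".join(f"p{i},{i % 20}\n" for i in range(10_000))
    rows = list(csv.DictReader(io.StringIO(text)))
    sequential = filter_chunk(rows)
    parallel = parallel_filter(rows)
    assert parallel == sequential
    print(len(parallel))  # prints 5500
```

Starting processes and pickling rows between them has a cost, so this only pays off once the per-chunk work outweighs that overhead, which fits with keeping parallel execution optional for small files such as `ds_small.csv`.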
I will post experimental results here soon, comparing different queries run on files of different sizes.