Writing dates with toArrow seems broken #213

Closed
bmesuere opened this issue Jun 15, 2021 · 2 comments · Fixed by #214
Labels
bug Something isn't working


bmesuere (Contributor) commented Jun 15, 2021

When processing a CSV file with 8M rows containing Date objects, I noticed that the dates in many rows (from a certain point past the halfway mark of the dataset onward) went missing after writing the Arrow buffer to a file and reading it back. I managed to construct a minimal example with only 10 rows:

import * as aq from "arquero";

// Ten timestamp strings: one at 19:32 followed by nine at 19:33.
let data = aq.table({
  time: [
    "2021-06-14 19:32:00", "2021-06-14 19:33:00", "2021-06-14 19:33:00",
    "2021-06-14 19:33:00", "2021-06-14 19:33:00", "2021-06-14 19:33:00",
    "2021-06-14 19:33:00", "2021-06-14 19:33:00", "2021-06-14 19:33:00",
    "2021-06-14 19:33:00"
  ]
});

// Parse the strings into Date objects (Node.js reads this format as local time).
data = data.select("time").derive({ time: aq.escape(d => new Date(d.time)) });
data.print();

// Round-trip the table through Apache Arrow and print it again.
const data2 = aq.fromArrow(data.toArrow());
data2.print();

The first print() call produces the correct output:

Table: 1 col x 10 rows. Showing 10 rows.
┌─────────┬──────────────────────────┐
│ (index) │           time           │
├─────────┼──────────────────────────┤
│    0    │ 2021-06-14T17:32:00.000Z │
│    1    │ 2021-06-14T17:33:00.000Z │
│    2    │ 2021-06-14T17:33:00.000Z │
│    3    │ 2021-06-14T17:33:00.000Z │
│    4    │ 2021-06-14T17:33:00.000Z │
│    5    │ 2021-06-14T17:33:00.000Z │
│    6    │ 2021-06-14T17:33:00.000Z │
│    7    │ 2021-06-14T17:33:00.000Z │
│    8    │ 2021-06-14T17:33:00.000Z │
│    9    │ 2021-06-14T17:33:00.000Z │
└─────────┴──────────────────────────┘

The second print() call shows Invalid Date in the last two rows (indices 8 and 9):

Table: 1 col x 10 rows. Showing 10 rows.
┌─────────┬──────────────────────────┐
│ (index) │           time           │
├─────────┼──────────────────────────┤
│    0    │ 2021-06-14T17:32:00.000Z │
│    1    │ 2021-06-14T17:33:00.000Z │
│    2    │ 2021-06-14T17:33:00.000Z │
│    3    │ 2021-06-14T17:33:00.000Z │
│    4    │ 2021-06-14T17:33:00.000Z │
│    5    │ 2021-06-14T17:33:00.000Z │
│    6    │ 2021-06-14T17:33:00.000Z │
│    7    │ 2021-06-14T17:33:00.000Z │
│    8    │       Invalid Date       │
│    9    │       Invalid Date       │
└─────────┴──────────────────────────┘
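Spotting broken rows by eye does not scale to the 8M-row file from the report. A minimal sketch for comparing the Date column before and after the round trip follows; findDateMismatches is a hypothetical helper (not part of Arquero), and the before/after arrays are synthetic stand-ins that mirror the two printed tables.

```javascript
// Hypothetical helper (not part of Arquero): compare two arrays of Date
// values element by element and return the indices that differ.
// Two entries count as equal when they have the same epoch-millisecond
// value, or when both are invalid or missing (getTime() would be NaN).
function findDateMismatches(before, after) {
  const mismatches = [];
  const n = Math.max(before.length, after.length);
  for (let i = 0; i < n; ++i) {
    const a = before[i] instanceof Date ? before[i].getTime() : NaN;
    const b = after[i] instanceof Date ? after[i].getTime() : NaN;
    const bothInvalid = Number.isNaN(a) && Number.isNaN(b);
    if (!bothInvalid && a !== b) mismatches.push(i);
  }
  return mismatches;
}

// Synthetic data mirroring the printed tables: ten timestamps, with the
// last two replaced by Invalid Date after the (simulated) round trip.
const before = [
  "2021-06-14T17:32:00.000Z",
  ...Array(9).fill("2021-06-14T17:33:00.000Z")
].map(s => new Date(s));
const after = before.map((d, i) => (i >= 8 ? new Date(NaN) : new Date(d.getTime())));

console.log(findDateMismatches(before, after)); // logs [ 8, 9 ]
```

To apply it to the repro itself, one could extract plain arrays with data.objects().map(d => d.time) and data2.objects().map(d => d.time). Going by the printed output, the helper would be expected to flag rows 8 and 9, and on a larger file the smallest reported index would show where the corruption starts.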

I am using Node.js 16.3.

There is of course always the possibility I made an embarrassing mistake.

Edit: updated to an even more minimal example.

bmesuere changed the title from "Writing dates with toArrowBuffer seems broken" to "Writing dates with toArrow seems broken" on Jun 15, 2021
jheer added the bug (Something isn't working) label on Jun 15, 2021
jheer (Member) commented Jun 15, 2021

No, the embarrassing mistake is on this end :) Fix coming soon.

bmesuere (Contributor, Author) commented

That was quick, thanks! 🙏
