Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-comma separator for CSV input #4238

Closed
philrz opened this issue Dec 2, 2022 · 2 comments · Fixed by #4325 or #4333
Closed

Allow non-comma separator for CSV input #4238

philrz opened this issue Dec 2, 2022 · 2 comments · Fixed by #4325 or #4333

Comments

@philrz
Copy link
Contributor

philrz commented Dec 2, 2022

At the time this issue is being filed, Zed is at commit 78d587d.

A community zync user inquired about this functionality:

Do you know how to import semi-columns based CSV files in Zui (and zq, Zed)? In the French language .csv text files we commonly use ';' as a field separator instead of ',' because we use ',' as a decimal separator for floating point numbers.

It looks like the Go CSV library Zed currently uses has options that would allow the selection of alternate separators, but Zed/Zui don't yet offer ways to invoke such options.

I also asked the user the context in which they primarily imagined using the functionality, since I expect deciding how it's invoked may be more work than leveraging the option. The response:

Zui app mostly, as this problem mostly arises when trying to transfer Excel files into Zui using CSV.

@philrz philrz linked a pull request Jan 24, 2023 that will close this issue
@philrz
Copy link
Contributor Author

philrz commented Jan 24, 2023

I've verified with Zed commit 4aa010c that this now working in the CLI tooling. As test data I'm using the "Excel CSV" links at the Italian fuel prices site.

$ wget "https://dgsaie.mise.gov.it/open_data_export.php?export-id=1&export-type=csvx"

$ head -2 open_data_export.php\?export-id\=1\&export-type\=csvx 
DATA_RILEVAZIONE;BENZINA;GASOLIO_AUTO;GPL;GASOLIO_RISCALDAMENTO;O.C._FLUIDO_BTZ;O.C._DENSO_BTZ
2005-01-03;1.115,75;1.018,28;552,5;948,5;553,25;229,52

$ zq -version
Version: v1.4.0-26-g4aa010ca

If we attempt to read this data with the CSV reader defaults, it fails.

$ zq -Z -i csv 'head 1' open_data_export.php\?export-id\=1\&export-type\=csvx
open_data_export.php?export-id=1&export-type=csvx: record on line 2: wrong number of fields

However if we make use of the new option, it works.

$ zq -Z -i csv -csv.delim \; 'head 1' open_data_export.php\?export-id\=1\&export-type\=csvx
{
    "DATA_RILEVAZIONE": "2005-01-03",
    BENZINA: "1.115,75",
    GASOLIO_AUTO: "1.018,28",
    GPL: "552,5",
    GASOLIO_RISCALDAMENTO: "948,5",
    "O.C._FLUIDO_BTZ": "553,25",
    "O.C._DENSO_BTZ": "229,52"
}

There's some follow-on work that would make this more broadly usable.

  1. This reader option would need to be made available via the Zed API in order for the Brim/Zui app to be able to make use of it as a Preferences setting.

  2. Once it's made available via the API, we can open an issue to make it available in Brim/Zui via Preferences.

  3. I couldn't use the get <uri> [format <format>] variation of the from operator to access the URL directly in my Zed query because, while format csv is available, it doesn't seem like there's a way with format to invoke these kinds of reader options.

nwt added a commit that referenced this issue Jan 25, 2023
The csv.delim query parameter to the load API (POST
/pool/{pool}/branch/{branch}) specifies the field delimiter for CSV
input, allowing use of a character other than ",".

For #4238.
@nwt nwt linked a pull request Jan 25, 2023 that will close this issue
@nwt nwt closed this as completed in #4333 Jan 26, 2023
nwt added a commit that referenced this issue Jan 26, 2023
The csv.delim query parameter to the load API (POST
/pool/{pool}/branch/{branch}) specifies the field delimiter for CSV
input, allowing use of a character other than ",".

For #4238.
@philrz
Copy link
Contributor Author

philrz commented Jan 27, 2023

I've verified with Zed commit 3ca4b8a and the same test data as the previous comment that this is now working over the Zed API as well. Note that with this particular example of a semicolon I had to URL escape it.

$ wget "https://dgsaie.mise.gov.it/open_data_export.php?export-id=1&export-type=csvx"

$ curl http://localhost:9867/version
{"version":"v1.4.0-31-g3ca4b8ac"}

$ zed create fuel
pool created: fuel 2Kt4Ai7vocE5XXsntap3buXDkn7

$ curl -H "Content-Type: text/csv" --data-binary @open_data_export.php\?export-id\=1\&export-type\=csvx http://localhost:9867/pool/fuel/branch/main?csv.delim=%3B
{commit:0x1060d0f81a96360e75df4cd0a53a174b342ed2ba(=ksuid.KSUID),warnings:[]([string])}(=api.CommitResponse)

$ zed query -Z 'from fuel | head 1'
{
    "DATA_RILEVAZIONE": "2023-01-23",
    BENZINA: "1.829,47",
    GASOLIO_AUTO: "1.875,1",
    GPL: "797,85",
    GASOLIO_RISCALDAMENTO: "1.674,7",
    "O.C._FLUIDO_BTZ": "1.129,73",
    "O.C._DENSO_BTZ": "645,22"
}

For the other loose ends cited above, these issues remain open:

Thanks @dianetc and @nwt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant