Allow specification of input format when the user knows it or auto-detect fails #2498

Closed
philrz opened this issue Aug 23, 2022 · 2 comments · Fixed by #2601
@philrz
Contributor

philrz commented Aug 23, 2022

In #1913, @jameskerr remarked:

I'd love a dialog that asks the user which format their data is in to pass to zed load. I've got a csv that it thinks it's zson so it fails with a strange error.

A possible workflow is:

Import the file

  1. If there is an auto detect error, prompt the user for the type of file
  2. It would also be nice to give a summary of what the auto detect detected.

Since users may sometimes know their format in advance, I'd also recommend we offer them the ability to explicitly specify the format on the first try. For instance, brimdata/super#3865 describes the perils of trying to send certain JSON through the auto-detector. If a user works with this kind of JSON a lot, they'd surely appreciate a way to avoid the failure round-trip.

In conclusion, maybe the "prompt" in @jameskerr's list above could be some kind of pull-down list of the supported formats, with "Auto-detect" as the default. Then if the auto-detect fails, the same list could be offered again.
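A minimal sketch of that flow, assuming a loader callback and a format chooser that stand in for the real load call and the pull-down. All names here (`loadWithFallback`, `tryLoad`, `chooseFormat`, the `"auto"` value) are hypothetical and not taken from the Brim codebase, and a real loader would be asynchronous; it is kept synchronous here only to keep the control flow easy to read.

```typescript
// Hypothetical types and names for illustration only.
type LoadResult =
  | { ok: true; format: string }
  | { ok: false; format: string; error: string };

// tryLoad stands in for whatever actually sends the file to the lake.
// chooseFormat stands in for the pull-down shown after a failure;
// it returns null if the user cancels instead of picking a format.
function loadWithFallback(
  tryLoad: (format: string) => LoadResult,
  chooseFormat: (previous: LoadResult) => string | null,
  initial: string = "auto", // "Auto-detect" is the default first attempt
): LoadResult {
  let result = tryLoad(initial);
  while (!result.ok) {
    const next = chooseFormat(result);
    if (next === null) break; // user gave up; report the last failure
    result = tryLoad(next);
  }
  return result;
}
```

A user who already knows the format would pass it as `initial` and skip the failed auto-detect round-trip entirely.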

Also, the "summary of what the auto detect detected" exists today as something like this:

[image: screenshot of the current auto-detect failure summary]

Looking at this the way a new user might, I admit it could be confusing: it's not immediately obvious that each line explains why auto-detect rejected the input as that format.
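One way the summary could be made more self-explanatory is sketched below. This is only an illustration: the `DetectFailure` shape, the function name, and the wording of the header are assumptions, not what Brim or the Zed service actually returns.

```typescript
// Hypothetical shape: one entry per format reader that rejected the input.
interface DetectFailure {
  format: string;
  reason: string;
}

// Builds a message whose header says outright that each line is the
// reason a particular format's reader rejected the input.
function summarizeDetectFailures(failures: DetectFailure[]): string {
  const header =
    "Auto-detect tried each supported format and none matched. " +
    "Reason each format was rejected:";
  const lines = failures.map(({ format, reason }) => `  ${format}: ${reason}`);
  return [header, ...lines, "Select a format explicitly to retry."].join("\n");
}
```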

@jameskerr
Member

Cool, thank you

@philrz
Contributor Author

philrz commented Dec 8, 2022

Verified in Brim commit 9c92c19.

The attached video uses the JSON test data nfcapd.json.gz from brimdata/super#3865. As shown, when I drag and drop this file into the New Pool window with default settings, the auto-detect load fails with a message showing why each format's reader could not read the file, including the JSON reader's "buffer exceeded max size" error that we know from brimdata/super#3865. I can then explicitly select JSON instead of Auto-detect from the pull-down, and the load succeeds.

Verify.mp4

Per the changes from #2601, the pull-down now has options for all the formats that are loadable via the Zed API, which currently consists of:

  • CSV
  • JSON (which also handles NDJSON)
  • Line (which is handy for turning a newline-delimited text file into one non-record value per line)
  • Parquet
  • Zeek (that is, the TSV format)
  • ZJSON
  • ZNG
  • ZSON

ZST will be added when brimdata/super#4251 is completed.
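For illustration, the pull-down contents described above could be modeled as a small option table with "Auto-detect" first. The labels follow the list in this comment; the lowercase `value` strings and the names `FORMAT_OPTIONS` and `explicitFormat` are assumptions made for this sketch, not identifiers from #2601.

```typescript
// Hypothetical option table; labels mirror the list above, values are assumed.
interface FormatOption {
  label: string;
  value: string;
}

const FORMAT_OPTIONS: FormatOption[] = [
  { label: "Auto-detect", value: "auto" }, // default selection
  { label: "CSV", value: "csv" },
  { label: "JSON", value: "json" }, // also covers NDJSON
  { label: "Line", value: "line" },
  { label: "Parquet", value: "parquet" },
  { label: "Zeek", value: "zeek" }, // the TSV format
  { label: "ZJSON", value: "zjson" },
  { label: "ZNG", value: "zng" },
  { label: "ZSON", value: "zson" },
];

// Returns the format to request explicitly, or undefined when the user
// left the pull-down on "Auto-detect" (or the label is unknown), so the
// caller can omit the format and let the service detect it.
function explicitFormat(label: string): string | undefined {
  const option = FORMAT_OPTIONS.find((o) => o.label === label);
  if (option === undefined || option.value === "auto") return undefined;
  return option.value;
}
```

Keeping the list in one table means a later addition such as ZST would be a one-line change.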

Thanks @jameskerr!
