Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of import attempt that mixes valid and invalid data #1790

Closed
philrz opened this issue Aug 23, 2021 · 1 comment
Closed

Handling of import attempt that mixes valid and invalid data #1790

philrz opened this issue Aug 23, 2021 · 1 comment

Comments

@philrz
Copy link
Contributor

philrz commented Aug 23, 2021

As noted by @jameskerr in #1786 (comment), the Brim app has changed its behavior when a user attempts to import a mix of valid and invalid data, and we should clean this up.

The first video looks back at the behavior as of the last GA Brim release v0.24.0. Here a combination of a valid Zeek weird log and invalid macOS /etc/passwd are simultaneously dragged into the app.

v0.24.0.mp4

Here we observe:

  1. The valid weird records are successfully imported into a newly-created Space.
  2. The invalid /etc/passwd data is rejected.
  3. The user is shown the warnings returned by the Zed backend from the failed attempt to auto-detect the bad input, but there's no indication of which filename triggered the warning, so this information is admittedly a bit lacking.

The second video captures the current behavior as of Brim commit 185344d, with the same pair of files being dragged into the app.

Repro.mp4

Here we observe:

  1. The entire import operation is aborted, such that the Pool is deleted and even the valid records are not imported.
  2. As in v0.24.0, the error message lacks reference to named files, making it difficult to make use of this error info if multiple invalid files were being imported.

In #1786 (comment), @jameskerr made his proposal of what should be done:

  1. If all files fail, display an error, delete the pool
  2. If some files fail, display a warning for those that failed, keep the pool

...and to this I'd add my own suggestion:

  1. Add filename references to the warnings
@philrz philrz added this to the Data MVP1 milestone Aug 24, 2021
@philrz philrz removed this from the ETL Lake milestone Oct 25, 2022
@philrz
Copy link
Contributor Author

philrz commented Apr 19, 2024

Verified in Zui commit f1aef59.

Our approach to mixing valid and invalid data changed with the addition of the Preview & Load workflow in #2834. As shown in the attached video, now the use of zq for preview and shaping gives an early error about invalid data before the final load operation is attempted. The error only mentions the first invalid input file encountered, but this gives the user the ability to review the problem and see if it could be corrected (e.g., select an explicit input format if it's an auto-detect failure) or drop it from the import list entirely. In this case I intentionally used two copies of the /etc/passwd data to show that the file referenced in the error output changed after I dropped the first one and then after dropping the second we see just the valid data that remains for potential shaping and load.

Verify.mp4

This approach takes the place of the previously-envisioned idea of quietly importing just valid data and reporting invalid data after the fact. It seems a good improvement since it gives the user more up-front control and eliminates the potential for a partial load being undesirable and therefore requiring cleanup of the pool after the fact.

@philrz philrz closed this as completed Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant