Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: ignore extra columns in gbsketch input CSV #188

Merged
merged 9 commits into from
Jan 28, 2025
Merged

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Jan 19, 2025

Fixes #187

TODO:

  • add test for extra columns

ctb added 3 commits January 19, 2025 09:00
…sketch into

ignore_extra_columns
merge is necessary, # especially if it merges an updated upstream into a
topic branch. # # Lines starting with '#' will be ignored, and an empty
message aborts # the commit. make test
@ctb ctb changed the title WIP: ignore extra columns MRG: ignore extra columns in gbsketch input CSV Jan 25, 2025
@ctb
Copy link
Contributor Author

ctb commented Jan 25, 2025

Ready for review & merge @bluegenes !!

src/utils/mod.rs Outdated
Comment on lines 127 to 140
for h in expected_header.iter() {
if !header.iter().any(|e| *h == e) {
return Err(anyhow!(
"Missing column name '{}' in CSV file. Columns should be: {:?}",
h,
expected_header
));
}
}

for h in header.iter() {
if !expected_header.iter().any(|e| h == *e) {
eprintln!("WARNING: extra column '{}' in CSV file. Ignoring.", h);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried this accepts input CSVs that contain all columns, but not in the right order. The input code currently uses column indexes, not column names, to read the CSV. This has been on my long list to improve, will make an issue for it now.

For this PR, we could just check for the full expected header first, then allow extra columns? Lmk if you want me to take this on.

@ctb
Copy link
Contributor Author

ctb commented Jan 28, 2025

oh, I see... hmm. I think I have a fix that will let you merge this and then do an improved column-name-based one in the future... let me try.

Check out 5c1946a and let me know what you think!

Copy link
Collaborator

@bluegenes bluegenes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks! 🎉

@ctb ctb merged commit 77eb8b9 into main Jan 28, 2025
1 check passed
@ctb ctb deleted the ignore_extra_columns branch January 28, 2025 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

could gbsketch ignore additional columns rather than complain?
2 participants