Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support filtering glob paths on known extensions #495

Merged
merged 4 commits into from
Jul 3, 2020
Merged

Support filtering glob paths on known extensions #495

merged 4 commits into from
Jul 3, 2020

Conversation

Artanu
Copy link
Collaborator

@Artanu Artanu commented Jul 1, 2020

The current implementation doesn't filter files (it does filter folders though!). What this means is that if a (glob) path contains an unsupported file path, eventually an unsupported file format error will be thrown. By pre-filtering on known image file extensions, we attempt to prevent having a user specify very complex and specific globs.

This implementation re-uses "DetermineEncodingFormat.by_extension(ext)" to check whether an extension is known. A disadvantage of this approach is that the above trait methods does slightly more than checking just whether the extension is supported (it also constructs an encoding format). Additionally, now this trait method will be run twice. Once here, and once when determining the format to save it to disk. Perhaps it can be cached in the future?

The implementation handles both the extensions supported directly by sic, as well as the extensions as defined by the image crate fallback decider.

An alternative idea to the current implementation (this PR) could add specific only known image extensions glob patterns to globwalk iterator to disallow any unsupported image extensions through globwalk. This would limit to filter the walkdir iterator once during globwalk iteration, instead of twice (during globwalk iteration and the in this PR implemented post globwalk filter). I went with the current implementation because I wasn't sure yet how to merge such patterns with the user given pattern.

closes #494

Artanu added 2 commits July 1, 2020 03:33
The current implementation doesn't filter files (it does filter folders though!). What this means is that if a (glob) path contains an unsupported file path, eventually an unsupported file format error will be thrown. By pre-filtering on known image file extensions, we attempt to prevent having a user specify very complex and specific globs.

This implementation re-uses "DetermineEncodingFormat.by_extension(ext)" to check whether an extension is known. A disadvantage of this approach is that the above trait methods does slightly more than checking just whether the extension is supported (it also constructs an encoding format). Additionally, now this trait method will be run twice. Once here, and once when determining the format to save it to disk. Perhaps it can be cached in the future?

The implementation handles both the extensions supported directly by sic, as well as the extensions as defined by the image crate fallback decider.
…ttern re-use

After this, we can start using GlobWalker::from_patterns and add our custom filters to the patterns
foresterre
foresterre previously approved these changes Jul 2, 2020
Cargo.toml Outdated Show resolved Hide resolved
src/cli/app.rs Outdated Show resolved Hide resolved
use globwalk::GlobWalkerBuilder;
use std::path::PathBuf;

/// Function to determine the base directory and (remaining) pattern.
/// Create a generic glob builder
pub fn glob_builder_base<PAT: AsRef<str>>(
Copy link
Owner

@foresterre foresterre Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised something here: we can probably combine this implementation with CommonDir specified elsewhere (they both have the goal of finding the greatest common base directory; here so the smallest amount of paths have to be matched against, and there to find the top level for mirroring the directory structure). For this PR, that doesn't need to be included however, since it was already separately available prior to this PR.

@foresterre foresterre merged commit 82afa32 into foresterre:master Jul 3, 2020
@foresterre
Copy link
Owner

Thanks again for the PR btw!

@foresterre foresterre added this to the 0.13.0 milestone Aug 1, 2020
@foresterre foresterre mentioned this pull request Aug 7, 2020
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add --skip-unsupported-extension to skip unwanted paths matching a glob
2 participants