Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore broken symlinks for LocalFileSystem object store #2195

Merged
merged 2 commits into from
Jul 28, 2022

Conversation

jccampagne
Copy link
Contributor

Which issue does this PR close?

Closes #2174.

Rationale for this change

convert_walkdir_result now ignores broken symlinks.

Resolution follows suggestions in:

What changes are included in this PR?

convert_walkdir_result now tests if a DirEntry is a symlink and if it is broken or not, function ignores broken symlinks (return Ok(None).

Are there any user-facing changes?

No.

@github-actions github-actions bot added the object-store Object Store Interface label Jul 27, 2022
@alamb
Copy link
Contributor

alamb commented Jul 27, 2022

Thanks @jccampagne ❤️

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jccampagne -- I found the code easy to follow and it made sense to me

I think some might say it is not "idomatic" Rust in that it doesn't use the more functional map / ok type style. I spent some time rewriting to use that style in order to provide some more useful feedback. What I came up with is below but actually think it is harder to understand what is happening than your version 🤷

Thanks again

/// Convert walkdir results and converts not-found errors into `None`.
/// Convert broken symlinks to `None`.
fn convert_walkdir_result(
    res: std::result::Result<walkdir::DirEntry, walkdir::Error>,
) -> Result<Option<walkdir::DirEntry>> {
    match res {
        Ok(entry) => Ok(map_broken_symlink_to_none(entry)),
        Err(walkdir_err) => match walkdir_err.io_error() {
            Some(io_err) => match io_err.kind() {
                io::ErrorKind::NotFound => Ok(None),
                _ => Err(Error::UnableToWalkDir {
                    source: walkdir_err,
                }
                .into()),
            },
            None => Err(Error::UnableToWalkDir {
                source: walkdir_err,
            }
            .into()),
        },
    }
}



fn map_broken_symlink_to_none(entry: walkdir::DirEntry) -> Option<walkdir::DirEntry> {
    // To check for broken symlink: call symlink_metadata() - it does not traverse symlinks);
    // if ok: check if entry is symlink; and try to read it by calling metadata().
    symlink_metadata(entry.path())
        .map(|attr| {
            if attr.is_symlink() {
                metadata(entry.path())
                    // keep entry if metadata returned successfully
                    .map(|_| entry)
                    // discard error
                    .ok()
            }
            else {
                Some(entry)
            }
        })
        // discard the error from symlink etadata
        .ok()
        .flatten()
}

cc @tustvold

@@ -990,8 +1015,6 @@ mod tests {
}

#[tokio::test]
#[cfg(target_os = "linux")]
// macos has some magic in its root '/.VolumeIcon.icns"' which causes this test to fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually a red-herring, mac os doesn't always have this file. In particular in CI. I think it would be great if we could have an explicit test of a broken link so that we have some test coverage of this feature

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment -- the test fails with with this error on (macos) CI. I disabled the check on mac so the CI would pass....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, I had understood that this couldn't be relied on, that's good that we can see the issue and get coverage of it in CI.

However, I still think an explicit symlink test would be worthwhile, especially as I would not be surprised if Windows does something exciting, etc...

// symlink is valid
Ok(Some(entry))
}
Err(_) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to check the ErrorKind is NotFound

Ok(entry) => {
// To check for broken symlink: call symlink_metadata() - it does not traverse symlinks);
// if ok: check if entry is symlink; and try to read it by calling metadata().
match symlink_metadata(entry.path()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should invert the order here, to optimise for the common case of non-broken symlinks.

i.e. call std::fs::metadata if this fails with ErrorKind::NotFound, then call symlink_metadata to check if it is a broken symlink we should just ignore?

@tustvold
Copy link
Contributor

I think some might say it is not "idomatic" Rust in that it doesn't use the more functional map / ok type style

I think using structured matches as this PR does is idiomatic Rust and I'm happy with it, nice work 👍. FWIW I don't think code needs to be written in a functional style to be considered idiomatic, it is frequently harder to understand, and very often the borrow checker prevents it anyway.

@tustvold tustvold merged commit b8e034c into apache:master Jul 28, 2022
@ursabot
Copy link

ursabot commented Jul 28, 2022

Benchmark runs are scheduled for baseline = cc96687 and contender = b8e034c. b8e034c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jccampagne
Copy link
Contributor Author

Thanks for the review and comments.
I'll try to add a specific test for broken links; instead of relying on MacOS's idiosyncrasies.

@jccampagne jccampagne deleted the fix-broken-symlinks-ignore branch July 28, 2022 10:29
@tustvold
Copy link
Contributor

tustvold commented Jul 28, 2022

@jccampagne so I started writing some tests for this behaviour, and actually symlinks behave in very unexpected ways in general - as they get canonicalised as part of path resolution. I'm currently coding something up which will simply ignore symlinks, as I don't think we can support them in a meaningful way without significant engineering effort that I think would be hard to justify

@alamb
Copy link
Contributor

alamb commented Jul 28, 2022

Filed #2206 to track writing tests

@tustvold
Copy link
Contributor

Created #2207, please let me know what you both think 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test local::tests::test_list_root fails on main on macos
4 participants