Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cleanliness checks more efficient #14

Closed
jwodder opened this issue Jan 2, 2024 · 3 comments · Fixed by #15
Closed

Make cleanliness checks more efficient #14

jwodder opened this issue Jan 2, 2024 · 3 comments · Fixed by #15
Labels
performance Efficient use of time and space

Comments

@jwodder
Copy link
Member

jwodder commented Jan 2, 2024

Currently, when finishing up syncing of a Dandiset dataset, the code checks whether any changes have yet to be committed using DataLad's status() method, which (as I understand it) does more work than needed for our purposes. Could we achieve the same results (including checking of submodules) but more efficiently using git diff --cached --quiet or by seeing whether git status --porcelain outputs anything?

There are also some spots where the code checks for cleanliness using ds.repo.dirty. What's the difference? Why should repo.dirty be preferred over status() or vice versa?

CC @yarikoptic

@jwodder jwodder added performance Efficient use of time and space needs research More information is required labels Jan 2, 2024
@yarikoptic
Copy link
Member

fwiw here is a code of that repo.dirty -- has a comment on a potential difference from dataset.status which should not be relevant to us here (adjusted branches mode). Altogether I think you can use ds.repo.dirty indeed and it might be providing some speedup over full dataset.status

    def dirty(self) -> bool:
        """Is the repository dirty?
    
        Note: This provides a quick answer when you simply want to know if
        there are any untracked changes or modifications in this repository or
        its submodules. For finer-grained control and more detailed reporting,
        use status() instead.
        """
        stdout = self.call_git(
            ["status", "--porcelain",
             # Ensure the result isn't influenced by status.showUntrackedFiles.
             "--untracked-files=normal",
             # Ensure the result isn't influenced by diff.ignoreSubmodules.
             "--ignore-submodules=none"])
        if bool(stdout.strip()):
            # The quick `git status`-based check can give a different answer
            # than `datalad status` for submodules on an adjusted branch.
            st = self.diffstatus(fr="HEAD" if self.get_hexsha() else None,
                                 to=None, untracked="normal")
            return any(r.get("state") != "clean" for r in st.values())
        return False

@jwodder
Copy link
Member Author

jwodder commented Jan 3, 2024

@yarikoptic And what exactly does the call to self.diffstatus() in that code do? Is it the most efficient way to do whatever it does when just limiting to use-cases for backups2datalad?

@yarikoptic
Copy link
Member

well -- code is here : https://github.com/datalad/datalad/blob/HEAD/datalad/support/gitrepo.py#L3086 and your guess might be even better than mine ;) overall -- I do not think it is pertinent to our case really -- can't tell from that function description ... looking at datalad/datalad@4e18d1b which introduced that call -- just again talks about adjusted branches which we can forget about here since I do not think we coded with that in mind anywhere else. So -- direct call to git status --porcelain ... would be most efficient I guess for you.

@jwodder jwodder removed the needs research More information is required label Jan 4, 2024
yarikoptic added a commit that referenced this issue Jan 4, 2024
Make cleanliness checks more efficient
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Efficient use of time and space
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants