Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cleanliness checks more efficient #15

Merged
merged 1 commit into from
Jan 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 10 additions & 9 deletions src/backups2datalad/adataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
from datetime import datetime
from enum import Enum
from functools import partial
from operator import attrgetter
from pathlib import Path
import re
import subprocess
Expand Down Expand Up @@ -98,15 +97,17 @@ async def ensure_installed(
return True

async def is_dirty(self) -> bool:
return await anyio.to_thread.run_sync(attrgetter("dirty"), self.ds.repo)

async def is_unclean(self) -> bool:
def _unclean() -> bool:
return any(
r["state"] != "clean" for r in self.ds.status(result_renderer=None)
return (
await self.read_git(
"status",
"--porcelain",
# Forcibly use default values for these options in case they
# were overridden by user's gitconfig:
"--untracked-files=normal",
"--ignore-submodules=none",
)

return await anyio.to_thread.run_sync(_unclean)
!= ""
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would indeed be more efficient but not yet "most efficient" for the case of having heavy changes: would it result in early termination of that git status --porcelain ... on receiving any output, which already would be indicative of being dirty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I'm not sure if changing the code to terminating git status early would result in git cleaning everything up properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet it should be fine since I believe it is typically used as git status --porcelain --untracked-files=normal --ignore-submodules=none | grep -q . which results in similar early closure of the pipe etc.

But anyways -- probably most of our cases are not that heavy on the diff here so would not really be anyhow notably different. Let's proceed.


async def get_repo_config(self, key: str) -> str | None:
try:
Expand Down
2 changes: 1 addition & 1 deletion src/backups2datalad/asyncer.py
Original file line number Diff line number Diff line change
Expand Up @@ -616,7 +616,7 @@ async def async_assets(
ts = timestamp
await ds.set_assets_state(AssetsState(timestamp=ts))
manager.log.debug("Checking whether repository is dirty ...")
if await ds.is_unclean():
if await ds.is_dirty():
manager.log.info("Committing changes")
await ds.commit(
message=dm.report.get_commit_message(),
Expand Down
2 changes: 1 addition & 1 deletion src/backups2datalad/datasetter.py
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ async def sync_dataset(
await syncer.dump_asset_metadata()
assert syncer.report is not None
manager.log.debug("Checking whether repository is dirty ...")
if await ds.is_unclean():
if await ds.is_dirty():
manager.log.info("Committing changes")
await ds.commit(
message=syncer.get_commit_message(),
Expand Down