Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tree#count_recursive #464

Merged
merged 16 commits into from
Mar 9, 2015
Merged

Conversation

adelcambre
Copy link
Contributor

This adds a new count_recursive method to Rugged tree objects. This counts all blobs in the tree recursively, with an optional limit to bail early. This allows asking things like "Are there more than 1 million files in this repo?" without a roundtrip into ruby for each file.

struct rugged_treecount_cb_payload
{
unsigned int count;
int limit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, need to do that.

else if (TYPE(rbLimit) != T_FIXNUM)
rb_raise(rb_eTypeError, "limit must be a Fixnum");
else
payload.limit = FIX2INT(rbLimit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could maybe be simplified to something like:

if (!NIL_P(rbLimit)) {
  Check_Type(rbLimit, T_FIXNUM);
  payload.limit = FIX2INT(rbLimit);
}

Check_Type will raise an ArgumentError for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool. I'll switch it to this. Didn't know about Check_Type.

++(payload->count);
return 1;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately in the libgit2 git_tree_walk API, there is no maximum depth check and the implementation is recursive, so there are some theoretical DOS vectors here. I think probably the best bet is to fix it at the libgit2 level and either reimplement git_tree_walk using the git_iterator_for_tree API or simply add a depth check internal to the API. (It would be nice to reflect the depth as a parameter to the callback, but that would be a breaking API change, so probably not worth it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. Rather than just counting blob, I could count all tree entries (including subtrees). This means that providing a limit will give protection from at least one more DOS vector. Would that cover the case you're thinking of?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regrettably, I think not. The issue is actually hard to compensate for except at the libgit2 level (without introducing a waste of time scanning path data) - namely that you can create a 100,000 level deep tree and not hit your walk limit, but burn up a huge amount of stack space internal to libgit2. Let me look over there a bit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw MRI cannot stack overflow because it protects the end of the stack in the VM and will gracefully raise a StackOverflow exception into Ruby-land if a C extension overflows... So I'm not sure I'd consider this an attack vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thank you @vmg for bringing actual knowledge to the table. 😽

struct rugged_treecount_cb_payload payload;
VALUE rbLimit;

if (rb_scan_args(argc, argv, "01", &rbLimit) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the if condition and change this to just:

rb_scan_args(argc, argv, "01", &rbLimit);

rb_scan_args will set rbLimit to nil if it was not passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use caps for class and module names, so rbLimit should be rb_limit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arthurschreiber Coulda sworn I tried that, but that's way better. I'll switch to that.

@vmg Still getting used to the C-ext conventions. Thanks.


error = git_tree_walk(tree, GIT_TREEWALK_PRE, &rugged__treecount_cb, (void *)&payload);

if(error && giterr_last()->klass == GITERR_CALLBACK) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add a whitespace after if.

@vmg
Copy link
Member

vmg commented Mar 6, 2015

Neat stuff. Props for your first Rugged PR!

@adelcambre
Copy link
Contributor Author

first Rugged PR

Heh! I think this is my 3rd or 4th!

@adelcambre
Copy link
Contributor Author

Anything else need done here? Am I good to go ahead and merge myself?

static VALUE rb_git_tree_entrycount_recursive(int argc, VALUE* argv, VALUE self)
{
git_tree *tree;
Data_Get_Struct(self, git_tree, tree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to move this statement below other variable declarations, as this generates a warning. (See the travis build logs).

@arthurschreiber
Copy link
Member

Except for the last few nitpicks, this looks great! 👍

@adelcambre
Copy link
Contributor Author

@arthurschreiber Cool, thanks for the feedback! Got those all updated.

vmg pushed a commit that referenced this pull request Mar 9, 2015
@vmg vmg merged commit 4d676f6 into libgit2:master Mar 9, 2015
@vmg
Copy link
Member

vmg commented Mar 9, 2015

@adelcambre adelcambre deleted the tree-count-recursive branch March 9, 2015 16:56
@adelcambre
Copy link
Contributor Author

@vmg 😻 Do you mind pushing up a new beta version of the gem so I can get it merged on our end?

@vmg
Copy link
Member

vmg commented Mar 9, 2015

@adelcambre: For the betas we usualy just SHA-vendor them in GitRPC instead of pushing to Rubygems

@arthurschreiber
Copy link
Member

I really need to kick out a new "stable" release with libgit2 updated to 0.22 so we can start development on 0.23... 😞

@vmg
Copy link
Member

vmg commented Mar 9, 2015

@arthurschreiber: can you start by pushing a new beta to Rubygems with these changes? Looks like @adelcambre needs a release to vendor in GitRPC.

@arthurschreiber
Copy link
Member

Sure, I'll cut a release later today.

@arthurschreiber
Copy link
Member

Yo @adelcambre - I just released 0.22.1b1. Hope that helps. ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants