This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

Filtering Commits Based on Files They Interact With #562

Open
zachgersh opened this issue Aug 24, 2017 · 9 comments

@zachgersh
Contributor

Hey All,

I am essentially trying to replicate this git log command in Go and can get most of the way there. The command looks something like this:

git log -- some-magical-path

Git understands that this means only the commits that touched some-magical-path and no others. With the library, I only seem to be able to iterate over all of the commits and then ask about their files. When I interrogate a commit for its files, it gives me all of the files referenced by the tree (even those the commit did not touch). Does anyone have a good idea of how this could be accomplished?

Cheers!

@orirawlings orirawlings changed the title Filtering Commits Based on FIles They Interact With Filtering Commits Based on Files They Interact With Aug 25, 2017
@orirawlings
Contributor

orirawlings commented Aug 25, 2017

Hey @zachgersh,

You are correct that there is currently no explicit support in the library for this, as there is with the standard git tooling.

One way to implement something like this is shown below. No promises that this is a particularly fast algorithm. The basic idea is that we keep track of the version of the path at each commit (by remembering the hash of the path contents). We can then compare the path hashes between a commit and each of its parents to detect whether the path has changed.

I believe the standard git tooling would also include a log entry if either the file permissions for the path changed or the file contents changed. This code example does not support changes to file permissions of the path, so that would be an exercise for the reader. 😛

package main

import (
	"fmt"
	"os"
	"strings"

	"gopkg.in/src-d/go-git.v4"
	. "gopkg.in/src-d/go-git.v4/_examples"
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

// Print log for commits with changes to certain file path
func main() {
	CheckArgs("<repoDir> <path>")
	repoDir := os.Args[1]
	path := os.Args[2]

	// We open the repository at given directory
	r, err := git.PlainOpen(repoDir)
	CheckIfError(err)

	Info(fmt.Sprintf("git log -- %s", path))

	// ... retrieves the branch pointed to by HEAD
	ref, err := r.Head()
	CheckIfError(err)

	// ... retrieves the commit history
	cIter, err := r.Log(&git.LogOptions{From: ref.Hash()})
	CheckIfError(err)

	// ... just iterates over the commits, printing each one that changed the path
	err = cIter.ForEach(filterByChangesToPath(r, path, func(c *object.Commit) error {
		fmt.Println(c)
		return nil
	}))
	CheckIfError(err)
}

type memo map[plumbing.Hash]plumbing.Hash

// filterByChangesToPath provides a CommitIter callback that only invokes 
// a delegate callback for commits that include changes to the content of path.
func filterByChangesToPath(r *git.Repository, path string, callback func(*object.Commit) error) func(*object.Commit) error {
	m := make(memo)
	return func(c *object.Commit) error {
		if err := ensure(m, c, path); err != nil {
			return err
		}
		if c.NumParents() == 0 && !m[c.Hash].IsZero() {
			// c is a root commit containing the path
			return callback(c)
		}
		// Compare the path in c with the path in each of its parents
		for _, p := range c.ParentHashes {
			if _, ok := m[p]; !ok {
				pc, err := r.CommitObject(p)
				if err != nil {
					return err
				}
				if err := ensure(m, pc, path); err != nil {
					return err
				}
			}
			if m[p] != m[c.Hash] {
				// contents at path are different from parent
				return callback(c)
			}
		}
		return nil
	}
}

// ensure our memoization includes a mapping from commit hash 
// to the hash of path contents.
func ensure(m memo, c *object.Commit, path string) error {
	if _, ok := m[c.Hash]; !ok {
		t, err := c.Tree()
		if err != nil {
			return err
		}
		te, err := t.FindEntry(path)
		if err == object.ErrDirectoryNotFound {
			m[c.Hash] = plumbing.ZeroHash
			return nil
		} else if err != nil {
			if !strings.ContainsRune(path, '/') {
				// path is in root directory of project, but not found in this commit
				m[c.Hash] = plumbing.ZeroHash
				return nil
			}
			return err
		}
		m[c.Hash] = te.Hash
	}
	return nil
}
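
On the file-permission caveat mentioned before the listing, one possible extension is to memoize the entry's mode together with its hash and compare both. The sketch below is deliberately library-free: `entryState` and `changed` are hypothetical names, the string hash is a stand-in for `plumbing.Hash`, and the mode values are the usual Git file modes. In the listing above, such a pair would presumably be filled from the `Hash` and `Mode` fields of the tree entry returned by `FindEntry`.

```go
package main

import "fmt"

// entryState is a hypothetical record of what a commit stores at a path.
// The zero value (Present == false) means the path is absent from the commit.
type entryState struct {
	Present bool
	Hash    string // stand-in for plumbing.Hash
	Mode    uint32 // Git file mode, e.g. 0100644 or 0100755
}

// changed reports whether the path differs between a commit and one of its
// parents, counting content changes, permission changes, additions and
// deletions alike.
func changed(commit, parent entryState) bool {
	return commit != parent
}

func main() {
	parent := entryState{Present: true, Hash: "abc123", Mode: 0100644}
	chmodOnly := entryState{Present: true, Hash: "abc123", Mode: 0100755}

	fmt.Println(changed(chmodOnly, parent)) // true: same content, new mode
	fmt.Println(changed(parent, parent))    // false: nothing changed
}
```

With this shape, the memo would map a commit hash to an `entryState` rather than to a bare hash, and the comparison in the parent loop would stay a single inequality check.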

@orirawlings
Contributor

@mcuadros has there been any discussion about adding this type of functionality to the library? Do we want to label this issue as "enhancement"?

@zachgersh
Contributor Author

@orirawlings - thanks for writing this up, I am going to give it a go. It would actually seem that References, which was previously made private, does pretty much the same thing? I wonder if we could just expose that again.

Somewhat related to #343 - which was looking to pattern match on a partial path. Git log totally supports this behavior though it wasn't clear in the example above :D

@zachgersh
Contributor Author

This code works perfectly btw. Really appreciate this @orirawlings - hugely helped me with a project I am working on.

@orirawlings
Contributor

I'll add the enhancement label on this one. I think it might be feasible to extend some of the example code and work it into the library API.

@ilius
Contributor

ilius commented Mar 19, 2018

Since repo.Log returns an object.CommitIter interface, I think it would be nice to have a function that takes a CommitIter and a set of files, filters the commits, and returns a new CommitIter.

Something like

func FilterCommitsByFilePath(iter object.CommitIter, files map[string]bool, exclude bool) (object.CommitIter, error) {

This can also be used to exclude some files, by giving exclude = false as well as false values to those files in the files map.
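
The wrapping idea can be sketched without the library. Everything below is an assumption made for illustration: `commit`, `commitIter`, `sliceIter`, `filterIter` and `filterByFiles` are hypothetical stand-ins, not go-git types, and the `exclude` flag from the proposed signature is left out to keep the sketch short.

```go
package main

import (
	"fmt"
	"io"
)

// commit is a hypothetical stand-in for object.Commit; Files lists the
// paths the commit changed.
type commit struct {
	ID    string
	Files []string
}

// commitIter is a reduced, hypothetical iterator interface: Next returns
// io.EOF once the history is exhausted.
type commitIter interface {
	Next() (*commit, error)
}

// sliceIter iterates over a fixed slice of commits.
type sliceIter struct {
	commits []*commit
	pos     int
}

func (s *sliceIter) Next() (*commit, error) {
	if s.pos >= len(s.commits) {
		return nil, io.EOF
	}
	c := s.commits[s.pos]
	s.pos++
	return c, nil
}

// filterIter wraps another iterator and skips commits rejected by keep.
type filterIter struct {
	inner commitIter
	keep  func(*commit) bool
}

func (f *filterIter) Next() (*commit, error) {
	for {
		c, err := f.inner.Next()
		if err != nil {
			return nil, err // includes io.EOF
		}
		if f.keep(c) {
			return c, nil
		}
	}
}

// filterByFiles returns an iterator that yields only the commits touching
// at least one path whose value in files is true.
func filterByFiles(it commitIter, files map[string]bool) commitIter {
	return &filterIter{inner: it, keep: func(c *commit) bool {
		for _, f := range c.Files {
			if files[f] {
				return true
			}
		}
		return false
	}}
}

// collect drains an iterator and returns the IDs seen before the first
// error, which is io.EOF for the iterators in this sketch.
func collect(it commitIter) []string {
	var ids []string
	for {
		c, err := it.Next()
		if err != nil {
			return ids
		}
		ids = append(ids, c.ID)
	}
}

func main() {
	history := &sliceIter{commits: []*commit{
		{ID: "c3", Files: []string{"README.md"}},
		{ID: "c2", Files: []string{"main.go", "go.mod"}},
		{ID: "c1", Files: []string{"main.go"}},
	}}
	it := filterByFiles(history, map[string]bool{"main.go": true})
	fmt.Println(collect(it)) // [c2 c1]
}
```

Because the filter is itself an iterator, it composes with any other wrapper and never needs to load the whole history up front.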

@marians

marians commented Nov 29, 2019

Is this a duplicate of #826, which was closed by #979?

@ilius
Contributor

ilius commented Nov 29, 2019

I don't think it is, since some-magical-path can be the path of a directory (the parent of many files).
I wish the LogOptions.FileName field were called Path to allow that.

Or maybe even PathFilter func(string) bool, to allow a pattern / regexp / glob on the file or directory path.
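
To illustrate what a predicate of type func(string) bool could express, here is a small standard-library sketch with one predicate for "anything under a directory" and one for a glob on the file name. The helper names `underDir` and `matchGlob` are hypothetical and not part of go-git.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// underDir returns a predicate that accepts dir itself and any path below it.
func underDir(dir string) func(string) bool {
	dir = strings.TrimSuffix(dir, "/")
	return func(p string) bool {
		return p == dir || strings.HasPrefix(p, dir+"/")
	}
}

// matchGlob returns a predicate that matches the base name of a path
// against a glob pattern; a malformed pattern matches nothing.
func matchGlob(pattern string) func(string) bool {
	return func(p string) bool {
		ok, err := path.Match(pattern, path.Base(p))
		return err == nil && ok
	}
}

func main() {
	inDocs := underDir("docs")
	isGo := matchGlob("*.go")

	fmt.Println(inDocs("docs/guide/intro.md")) // true
	fmt.Println(inDocs("docs-old/intro.md"))   // false
	fmt.Println(isGo("cmd/tool/main.go"))      // true
	fmt.Println(isGo("README.md"))             // false
}
```

Appending "/" before the prefix check is what keeps docs-old from matching docs, which a plain prefix test would get wrong.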

@ilius
Contributor

ilius commented Mar 9, 2020

This should be safe to close, since this PR is merged and PathFilter func(string) bool has been added.
