Add support of Git mailmap files #303
PyDriller does not support [Git `.mailmap` files](https://git-scm.com/docs/gitmailmap). That is, authors and committers, which are represented as `Developer` objects, are always assigned the name and email values recorded in the respective commits.

I would like PyDriller to support `.mailmap` files directly. So far, I have worked around the missing mailmap support with external scripts that map users with multiple email addresses or slightly different names to canonical data. I believe other users may be interested in this feature as well.

Currently, PyDriller does not support `.mailmap` files because it relies on GitPython to obtain names and emails for `Developer`s, and GitPython does not support `.mailmap` files either; see [here](gitpython-developers/GitPython#764).

Since the GitPython maintainers state that it is in maintenance mode, I am contributing `.mailmap` support to PyDriller rather than to the underlying dependency. I also believe this functionality is better suited to PyDriller, a library for analyzing Git repositories, where analyses can "deduplicate" physical authors into logical authors.

Since parsing `.mailmap` files is not straightforward, and since I do not want to introduce an algorithm that might produce results that differ from Git's, I decided to wrap the `git check-mailmap` [CLI command](https://git-scm.com/docs/git-check-mailmap/2.31.0). Most of this is done in the new file [pydriller/utils/mailmap.py](utils/mailmap.py). To make the feature work, I had to extend the [`Developer`](pydriller/domain/developer.py) class slightly.
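The idea of delegating all `.mailmap` interpretation to Git can be sketched roughly as follows. This is a minimal sketch, assuming `git` is on the `PATH` and `repo_path` points at a non-bare working tree whose top-level `.mailmap` Git should read. `check_mailmap`, `commits_per_author`, and `Identity` are hypothetical names for illustration, not the code in `pydriller/utils/mailmap.py`. All contacts are piped through `git check-mailmap --stdin` in a single process call, and the Python side only parses Git's `Name <email>` output lines.

```python
import re
import subprocess
from collections import Counter
from typing import List, Tuple

# Output lines of `git check-mailmap` look like "Name <email>" or "<email>".
_CONTACT_RE = re.compile(r"^(?:(?P<name>.*) )?<(?P<email>[^>]*)>$")

# Hypothetical alias: a (name, email) pair.
Identity = Tuple[str, str]


def check_mailmap(repo_path: str, contacts: List[Identity]) -> List[Identity]:
    """Resolve (name, email) pairs to canonical identities via `git check-mailmap`.

    Hypothetical helper for illustration; not PyDriller's actual API.
    Git itself reads the repository's .mailmap, so no mailmap parsing happens here.
    """
    if not contacts:
        return []
    # One "Name <email>" contact per line; --stdin lets Git map them all in one call.
    stdin = "".join(f"{name} <{email}>\n" for name, email in contacts)
    result = subprocess.run(
        ["git", "check-mailmap", "--stdin"],
        cwd=repo_path,
        input=stdin,
        capture_output=True,
        text=True,
        check=True,
    )
    resolved = []
    # Git prints exactly one output line per input contact, in input order.
    for line in result.stdout.splitlines():
        match = _CONTACT_RE.match(line)
        if match is None:
            raise ValueError(f"unexpected check-mailmap output: {line!r}")
        resolved.append((match.group("name") or "", match.group("email")))
    return resolved


def commits_per_author(repo_path: str, authors: List[Identity]) -> "Counter[Identity]":
    """Count commits per canonical (logical) author, given one (name, email) per commit."""
    return Counter(check_mailmap(repo_path, authors))
```

For example, with the `.mailmap` line `Jane Doe <jane@example.com> <jdoe@old.example.com>`, the contacts `J. Doe <jdoe@old.example.com>` and `Jane Doe <jane@example.com>` both resolve to `("Jane Doe", "jane@example.com")`, so `commits_per_author` counts them as one logical author. Contacts without a matching entry are returned unchanged. Batching via `--stdin` avoids spawning one Git process per commit.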
Together with the feature implementation, I provide a set of tests. They rely on an example repository, which I adapted from [here](https://github.com/ContentMine/mailmap). I needed a small repository with a `.mailmap` file and a few commits, some of which were made by users with differing name/email values. I did not have the time to craft one from scratch, so I hope it is okay to include that repository among the test repositories. I believe this is not a problem, since both the tests and the commit message explicitly state the source of the data.
I believe I have figured out what mypy was complaining about and committed the corresponding changes.
TestPulse report, test execution: 🥳 Congrats, all your tests have passed!
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (additional details and impacted files):

| | master | #303 | +/- |
|---|---:|---:|---:|
| Coverage | 97.27% | 97.38% | +0.10% |
| Files | 17 | 18 | +1 |
| Lines | 1102 | 1146 | +44 |
| Hits | 1072 | 1116 | +44 |
| Misses | 30 | 30 | |
Since empty abstract methods in abstract classes cannot be tested, I excluded them from the Codecov report, as described in the coverage.py documentation: https://coverage.readthedocs.io/en/latest/excluding.html#advanced-exclusion
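As a rough illustration of excluding abstract methods, the simplest mechanism in coverage.py is its default `# pragma: no cover` marker. Placed on a `def` line, it excludes the whole function from measurement. This is only a sketch: the linked "advanced exclusion" section instead configures regexes (for example `@(abc\.)?abstractmethod` under the `exclude_also` setting), and this comment does not say which variant the PR uses. `ContactMapper` and `IdentityMapper` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class ContactMapper(ABC):
    """Hypothetical abstract interface for mapping a contact to a canonical one."""

    @abstractmethod
    def map_contact(self, name: str, email: str) -> Tuple[str, str]:  # pragma: no cover
        # Subclasses must override this; the pragma on the def line makes
        # coverage.py exclude the entire (never executed) function body.
        ...


class IdentityMapper(ContactMapper):
    """Concrete subclass that returns the contact unchanged (fully testable)."""

    def map_contact(self, name: str, email: str) -> Tuple[str, str]:
        return (name, email)
```

The abstract base cannot be instantiated, so only the concrete subclass needs tests; the excluded abstract body no longer counts as a miss.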
Amazing PR! Thanks for adding many tests as well!