feature request: allow use of .mailmap #764
Comments
As this library is in |
Thanks for the feedback @Byron. The easiest workaround seems to be to read in the |
PyDriller does not support [Git `.mailmap` files](https://git-scm.com/docs/gitmailmap). That is, authors and committers, which are represented as `Developer` objects, always get assigned the name and email values stated in the respective commits. I would like PyDriller to support `.mailmap` files directly. So far, I have worked around the missing mailmap support by letting my external scripts map users with multiple email addresses or slightly different names to canonical data. I believe there might be more users than me who could be interested in this feature.

Currently, PyDriller does not support `.mailmap` files since it relies on GitPython to receive names and emails for `Developer`s. GitPython does not support `.mailmap` files either, see [here](gitpython-developers/GitPython#764). Since they state that GitPython is in maintenance mode, I contribute support of `.mailmap` files to PyDriller and not to the underlying dependency. I believe that this functionality is also better suited to PyDriller, a library with which users want to run analyses of Git repositories, where analyses allow for "deduplication" of physical authors into logical authors.

Since parsing of `.mailmap` files is not straightforward and since I do not want to introduce an algorithm that might produce results dissimilar to Git, I decided to wrap the `git check-mailmap` [CLI command](https://git-scm.com/docs/git-check-mailmap/2.31.0). Most of this is done in the new file [pydriller/utils/mailmap.py](utils/mailmap.py). To make the feature work, I had to extend the [`Developer`](pydriller/domain/developer.py) class slightly.
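
For readers who want to try the same idea outside PyDriller, here is a minimal sketch of wrapping `git check-mailmap` from Python. It is not the code from `pydriller/utils/mailmap.py`; the function names `parse_contact` and `canonical_identity` are hypothetical, and the sketch assumes that `git` is on the `PATH` and that `git check-mailmap` prints the resolved contact as `Name <email>` on a single line.

```python
import re
import subprocess
from typing import Tuple

# Matches a contact of the form "Name <email>"; the name part may be empty.
_CONTACT_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]*)>$")


def parse_contact(line: str) -> Tuple[str, str]:
    """Split a 'Name <email>' string into a (name, email) tuple."""
    match = _CONTACT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'Name <email>' contact: {line!r}")
    return match.group("name"), match.group("email")


def canonical_identity(repo_path: str, name: str, email: str) -> Tuple[str, str]:
    """Ask Git to resolve a contact through the repository's .mailmap.

    Hypothetical helper: falls back to the unmodified name and email
    when git is unavailable or the command fails.
    """
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "check-mailmap", f"{name} <{email}>"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return name, email
    return parse_contact(result.stdout)
```

Calling `canonical_identity(".", "Jane Doe", "jane@example.com")` inside a repository would return whatever Git itself resolves for that contact, so the result stays consistent with `git log --use-mailmap`. Spawning one process per lookup is slow on large histories, so caching results per `(name, email)` pair is probably worthwhile.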
A `.mailmap` file is used to map commit author info where committers use multiple names or email addresses to a single unique name/email. It is usually located in the root of a repo. See e.g. https://stackoverflow.com/questions/13777171/configuring-git-log-to-use-mailmap-by-default

I cannot find any mention of the feature in this repo; `GitPython` seems to ignore the file. It would be quite useful to allow using that file. Maybe replacing the author name and email in `raw`, or adding a second `row-mailmapped` column for example.