feature request: allow use of .mailmap #764

Open
rgommers opened this issue Jun 1, 2018 · 2 comments

Comments


rgommers commented Jun 1, 2018

A .mailmap file is used to map commit author info where committers use multiple names or email addresses to a single unique name/email. It is usually located in the root of a repo. See e.g. https://stackoverflow.com/questions/13777171/configuring-git-log-to-use-mailmap-by-default

I cannot find any mention of this feature in this repo; GitPython seems to ignore the file. It would be quite useful to support it, for example by replacing the author name and email in the raw data, or by adding a second, mailmapped column next to the raw one.

Member

Byron commented Jun 5, 2018

As this library is in maintenance mode, essentially everything has to be contributed. Most of the time, this is only bugfixes.
However, it should be possible to work around this by calling the git program directly in these cases.
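
For readers who want to try this route, here is a minimal sketch that shells out to `git log`. It relies on the `%aN` and `%aE` format placeholders, which are the mailmap-aware variants of `%an` and `%ae`. The function names `parse_author_lines` and `mailmapped_authors` are made up for this illustration and are not part of GitPython.

```python
import subprocess


def parse_author_lines(output):
    """Split 'name<TAB>email' lines into (name, email) tuples."""
    return [tuple(line.split("\t", 1)) for line in output.splitlines() if line]


def mailmapped_authors(repo_path="."):
    """Return (name, email) for every commit, with .mailmap applied by git itself.

    Hypothetical helper: requires the git executable on PATH.
    """
    # %x09 emits a tab, which cannot appear in an email and is unlikely in a name.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%aN%x09%aE"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_author_lines(result.stdout)
```

With GitPython already in use, the same arguments can presumably be passed through `repo.git.log("--format=%aN%x09%aE")`, which forwards them to the git program.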

Author

rgommers commented Jun 6, 2018

Thanks for the feedback @Byron. The easiest workaround seems to be to read in the .mailmap file separately into a dictionary, and fix the author names with that after extracting them from the 'raw' commit data returned by Repo.
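
A rough sketch of that dictionary approach, for anyone landing here later. It is a simplified parser, not a reimplementation of git's own matching: it only handles the four documented line forms and comments that start a line, and it lowercases names and emails for lookup. `parse_mailmap` and `canonical_author` are hypothetical names.

```python
import re

# Matches one "Name <email>" group; the name part may be empty.
_ENTRY = re.compile(r"\s*([^<>]*?)\s*<([^<>]*)>")


def parse_mailmap(text):
    """Map (commit_name or None, commit_email) -> (proper_name or None, proper_email or None)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        groups = _ENTRY.findall(line)
        if len(groups) == 1:
            # Form: "Proper Name <commit@email>"
            proper_name, commit_email = groups[0]
            mapping[(None, commit_email.lower())] = (proper_name or None, None)
        elif len(groups) == 2:
            # Forms with a proper email, optionally with proper and commit names.
            (proper_name, proper_email), (commit_name, commit_email) = groups
            key = (commit_name.lower() or None, commit_email.lower())
            mapping[key] = (proper_name or None, proper_email)
    return mapping


def canonical_author(name, email, mapping):
    """Return the mapped (name, email), falling back to the raw values."""
    # Prefer an entry that matches both name and email, then email only.
    for key in ((name.lower(), email.lower()), (None, email.lower())):
        if key in mapping:
            proper_name, proper_email = mapping[key]
            return proper_name or name, proper_email or email
    return name, email
```

Usage would be along the lines of `mapping = parse_mailmap(open(".mailmap").read())`, followed by `canonical_author(raw_name, raw_email, mapping)` for each author extracted from the commits.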

HelgeCPH added a commit to HelgeCPH/pydriller that referenced this issue Oct 11, 2024
PyDriller does not support
[Git `.mailmap` files](https://git-scm.com/docs/gitmailmap). That is, authors
and committers, which are represented as `Developer` objects, always get assigned
the name and email values stated in the respective commits.

I would like PyDriller to support `.mailmap` files directly. So far,
I have worked around the missing mailmap support by letting my external scripts map
users with multiple email addresses or slightly different names to canonical
data. I believe other users might be interested in
this feature as well.

Currently, PyDriller does not support `.mailmap` files since it relies on
GitPython to obtain names and emails for `Developer`s. GitPython does not
support `.mailmap` files either, see
[here](gitpython-developers/GitPython#764).
Since they state that GitPython is in maintenance mode, I contribute support for
`.mailmap` files to PyDriller and not to the underlying dependency. I believe
that this functionality is also better suited to PyDriller, a library with
which users want to run analyses of Git repositories, where such analyses allow for
"deduplication" of physical authors into logical authors.

Since parsing of `.mailmap` files is not straightforward and since I do not
want to introduce an algorithm that might produce results dissimilar to Git,
I decided to wrap the `git check-mailmap`
[CLI command](https://git-scm.com/docs/git-check-mailmap/2.31.0).
Most of this is done in the new file
[pydriller/utils/mailmap.py](utils/mailmap.py). To make the feature work, I had
to extend the [`Developer`](pydriller/domain/developer.py) class slightly.
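
To illustrate the general idea of wrapping `git check-mailmap`, here is a minimal sketch. It is not the code from `pydriller/utils/mailmap.py`; the names `parse_contact` and `check_mailmap` are hypothetical, and the sketch assumes the command prints one `Name <email>` line per contact given on the command line.

```python
import re
import subprocess

# "Name <email>", where the name part may be empty.
_CONTACT = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^<>]*)>$")


def parse_contact(line):
    """Split a 'Name <email>' line into (name, email)."""
    match = _CONTACT.match(line.strip())
    if match is None:
        raise ValueError(f"not a contact line: {line!r}")
    return match.group("name"), match.group("email")


def check_mailmap(name, email, repo_path="."):
    """Ask git for the canonical (name, email) of one contact.

    Hypothetical helper: requires the git executable on PATH.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "check-mailmap", f"{name} <{email}>"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_contact(result.stdout)
```

Spawning one process per developer can be slow on large histories, so caching results per `(name, email)` pair is probably worthwhile.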