Convert git commit summary to valid UTF8. #28356

Merged: 1 commit, Dec 5, 2023

darrinsmart
Contributor

The summary string ends up in the database, and (at least) MySQL and PostgreSQL require valid UTF-8 strings.

Fixes #28178

GiteaBot added the lgtm/need 2 label Dec 5, 2023
pull-request-size bot added the size/XS label Dec 5, 2023
GiteaBot added the lgtm/need 1 label and removed the lgtm/need 2 label Dec 5, 2023
GiteaBot added the lgtm/done label and removed the lgtm/need 1 label Dec 5, 2023
@lunny
Member

lunny commented Dec 5, 2023

Why is validateUTF8 better than trying to convert them into UTF-8?

@wxiaoguang
Contributor

> Why is validateUTF8 better than trying to convert them into UTF-8?

Because you don't know what the original encoding was, so how could you "convert" it?

@lunny
Member

lunny commented Dec 5, 2023

>> Why is validateUTF8 better than trying to convert them into UTF-8?

> Because you don't know what the original encoding was, so how could you "convert" it?

We can detect the encoding and convert it, as we do for git files, if the content is long enough.

@wxiaoguang
Contributor

wxiaoguang commented Dec 5, 2023

>>> Why is validateUTF8 better than trying to convert them into UTF-8?

>> Because you don't know what the original encoding was, so how could you "convert" it?

> We can detect the encoding and convert it, as we do for git files, if the content is long enough.

If you mean guessing the content encoding with something like the chardet package, I think it's feasible but not necessary. The commit will ultimately be displayed by many clients, and many git clients also assume UTF-8 (I doubt that many clients do encoding guessing). And since the commit message is usually very short, I'm not sure the guessing algorithm would be accurate enough.

It's also OK to do the encoding guessing if most people prefer that; I am fine with either approach.

@wxiaoguang
Contributor

wxiaoguang commented Dec 5, 2023

Some references: according to the official git documentation, using a non-UTF-8 encoding is discouraged, and IMO the Gitea server itself doesn't (and should never) use the encoding config option.

Actually, git expects the "commit message encoding" to be properly stored in the commit object, so that the message can still be displayed as UTF-8 on output:

https://git-scm.com/docs/git-commit-tree

image
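
To illustrate the point about the stored encoding, the sketch below reads the optional "encoding" header from a raw commit object and uses it to produce UTF-8. It is not Gitea's implementation: commitEncoding, latin1ToUTF8 and messageAsUTF8 are hypothetical helpers, only ISO-8859-1 is handled, and every other case falls back to replacing invalid bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// commitEncoding returns the value of the optional "encoding" header of a
// raw commit object, or "" when the header is absent. Headers end at the
// first blank line; everything after it is the commit message.
func commitEncoding(raw string) string {
	headers, _, _ := strings.Cut(raw, "\n\n")
	for _, line := range strings.Split(headers, "\n") {
		if strings.HasPrefix(line, "encoding ") {
			return strings.TrimSpace(strings.TrimPrefix(line, "encoding "))
		}
	}
	return ""
}

// latin1ToUTF8 converts ISO-8859-1 bytes to UTF-8. Each Latin-1 byte maps
// to the Unicode code point with the same numeric value.
func latin1ToUTF8(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		b.WriteRune(rune(s[i]))
	}
	return b.String()
}

// messageAsUTF8 returns the commit message as valid UTF-8, converting it
// when the recorded encoding is one this sketch understands.
func messageAsUTF8(raw string) string {
	_, msg, _ := strings.Cut(raw, "\n\n")
	switch strings.ToUpper(commitEncoding(raw)) {
	case "ISO-8859-1", "LATIN1":
		return latin1ToUTF8(msg)
	default:
		// No header (UTF-8 is assumed) or an encoding not handled here:
		// replace any invalid bytes instead of guessing.
		return strings.ToValidUTF8(msg, "?")
	}
}

func main() {
	raw := "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n" +
		"author A U Thor <author@example.com> 1701734400 +0000\n" +
		"committer A U Thor <author@example.com> 1701734400 +0000\n" +
		"encoding ISO-8859-1\n" +
		"\n" +
		"Fix caf\xe9 menu\n"
	fmt.Printf("%q\n", messageAsUTF8(raw)) // "Fix café menu\n"
}
```

When the header is missing and the bytes are not valid UTF-8, the original encoding cannot be recovered from the commit object alone, which is where the choice between guessing and replacing invalid bytes comes in.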

lunny added the backport/v1.21 and reviewed/wait-merge labels Dec 5, 2023
lunny added this to the 1.22.0 milestone Dec 5, 2023
@lunny lunny merged commit 38a93a0 into go-gitea:main Dec 5, 2023
25 checks passed
GiteaBot removed the reviewed/wait-merge label Dec 5, 2023
zjjhot added a commit to zjjhot/gitea that referenced this pull request Dec 5, 2023
* giteaofficial/main:
  Convert git commit summary to valid UTF8. (go-gitea#28356)
  Fix RPM/Debian signature key creation (go-gitea#28352)
  Refactor template empty checks (go-gitea#28351)
GiteaBot pushed a commit to GiteaBot/gitea that referenced this pull request Dec 5, 2023
The summary string ends up in the database, and (at least) MySQL &
PostgreSQL require valid UTF8 strings.

Fixes go-gitea#28178

Co-authored-by: Darrin Smart <darrin@filmlight.ltd.uk>
GiteaBot added the backport/done label Dec 5, 2023
lunny pushed a commit that referenced this pull request Dec 5, 2023
Backport #28356 by @darrinsmart

The summary string ends up in the database, and (at least) MySQL &
PostgreSQL require valid UTF8 strings.

Fixes #28178

Co-authored-by: darrinsmart <darrin@djs.to>
Co-authored-by: Darrin Smart <darrin@filmlight.ltd.uk>
fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this pull request Jan 17, 2024
silverwind pushed a commit to silverwind/gitea that referenced this pull request Feb 20, 2024
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Mar 4, 2024
Labels
backport/done: All backports for this PR have been created
backport/v1.21: This PR should be backported to Gitea 1.21
lgtm/done: This PR has enough approvals to get merged. There are no important open reservations anymore.
size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synchronizing repository branches fails
5 participants