Some common characters should not be treated as ambiguous Unicode characters #20999
At least these Chinese characters are normal and should not trigger a warning.
If your locale is zh-CN, these will not be shown as ambiguous. Further, if you look carefully, it's only when the characters (e.g.
I think what we should do is show ambiguous-character warnings only on source code and not on rendered files, as per VSCode.
Only if you're using them in Chinese text; otherwise they're very definitely ambiguous. The difficulty is always deciding what is "Chinese" or "English" text. The algorithm, which is the same as that used in VSCode, does this on a word-by-word basis. One could try to come up with some heuristic to determine whether a paragraph as a whole is in some language or other, but that's almost certainly still an open problem in NLP. The best solution, I suspect, is just to drop the warning on rendered pages, default to unescaped, and allow people to escape if they are interested.
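To illustrate why a word-by-word check struggles with mixed-language text, here is a minimal sketch in Go. It is not the Gitea or VSCode implementation: the `confusables` table, the function names, and the rule "flag a confusable only when the same word also contains ASCII letters or digits" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// confusables is a tiny hypothetical sample, not a real confusables table.
// Each key is a non-ASCII rune that resembles the ASCII rune it maps to.
var confusables = map[rune]rune{
	'\u0430': 'a', // CYRILLIC SMALL LETTER A
	'\u043e': 'o', // CYRILLIC SMALL LETTER O
	'\uff0c': ',', // FULLWIDTH COMMA
	'\uff1a': ':', // FULLWIDTH COLON
}

// ambiguousInWord returns the confusable runes found in word, but only if the
// word also contains ASCII letters or digits (an assumed heuristic).
func ambiguousInWord(word string) []rune {
	hasASCII := false
	var hits []rune
	for _, r := range word {
		if r <= unicode.MaxASCII && (unicode.IsLetter(r) || unicode.IsDigit(r)) {
			hasASCII = true
			continue
		}
		if _, ok := confusables[r]; ok {
			hits = append(hits, r)
		}
	}
	if !hasASCII {
		return nil
	}
	return hits
}

// ambiguousInText splits text on whitespace and checks each word on its own.
func ambiguousInText(text string) []rune {
	var all []rune
	for _, word := range strings.Fields(text) {
		all = append(all, ambiguousInWord(word)...)
	}
	return all
}

func main() {
	// Latin letters mixed with a Cyrillic "a": flagged.
	fmt.Printf("%q\n", ambiguousInWord("p\u0430ypal"))
	// Pure Chinese text with a fullwidth comma: not flagged.
	fmt.Printf("%q\n", ambiguousInWord("\u4f60\u597d\uff0c\u4e16\u754c"))
	// A product name followed by Chinese punctuation: flagged, although harmless.
	fmt.Printf("%q\n", ambiguousInText("Gitea\uff0c\u4f60\u597d"))
}
```

The last call shows the weakness discussed above: ordinary Chinese prose that mentions a Latin-script name still trips a word-level rule, because Chinese text has no spaces to separate the name from the punctuation.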
It still shows. https://gitea.com/xorm/xorm/src/branch/master/README_CN.md?lang=zh-CN
Even Gitea's own README is marked as containing "ambiguous Unicode characters" because of the pronunciation characters. I think it's silly that we even mark these, and I think the set of matched characters should be reduced to the absolute minimum that may actually be malicious, e.g. non-standard whitespace, text reversal and such.
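A reduced character set like the one suggested here could look roughly like the following Go sketch. The function names and the exact choice of characters are assumptions, not an existing Gitea API; only the standard `unicode` package is used.

```go
package main

import (
	"fmt"
	"unicode"
)

// isRisky reports whether r belongs to a deliberately small, assumed set of
// characters that can hide or reorder text: bidirectional controls,
// zero-width characters, and whitespace outside ASCII.
func isRisky(r rune) bool {
	switch {
	case unicode.Is(unicode.Bidi_Control, r):
		// Includes the embedding, override and isolate controls.
		return true
	case r == '\u200b' || r == '\u200c' || r == '\u200d' || r == '\u2060' || r == '\ufeff':
		// Zero-width space, non-joiner, joiner, word joiner, and BOM.
		return true
	case r > unicode.MaxASCII && unicode.IsSpace(r):
		// Non-ASCII whitespace such as NO-BREAK SPACE.
		return true
	}
	return false
}

// riskyRunes collects every risky rune in s, in order of appearance.
func riskyRunes(s string) []rune {
	var found []rune
	for _, r := range s {
		if isRisky(r) {
			found = append(found, r)
		}
	}
	return found
}

func main() {
	fmt.Printf("%U\n", riskyRunes("abc\u202edef"))          // contains a right-to-left override
	fmt.Printf("%U\n", riskyRunes("G\u01d0t\u012b\uff0c"))  // accented letters and a fullwidth comma
}
```

Under this rule, accented pronunciation letters and fullwidth punctuation pass untouched. Note that U+3000 IDEOGRAPHIC SPACE counts as non-ASCII whitespace here, so a real list would still need a decision on whether CJK text should be exempt from that case.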
I think the only answer is to not render the warning on rendered pages. This is a simple 3 line diff:
diff --git a/templates/repo/view_file.tmpl b/templates/repo/view_file.tmpl
index 0fe0a1319..9d82cc018 100644
--- a/templates/repo/view_file.tmpl
+++ b/templates/repo/view_file.tmpl
@@ -58,7 +58,9 @@
</div>
</h4>
<div class="ui attached table unstackable segment">
- {{template "repo/unicode_escape_prompt" dict "EscapeStatus" .EscapeStatus "root" $}}
+ {{if not (or .IsMarkup .IsRenderedHTML)}}
+ {{template "repo/unicode_escape_prompt" dict "EscapeStatus" .EscapeStatus "root" $}}
+ {{end}}
<div class="file-view{{if .IsMarkup}} markup {{.MarkupType}}{{else if .IsRenderedHTML}} plain-text{{else if .IsTextSource}} code-view{{end}}">
{{if .IsMarkup}}
{{if .FileContent}}{{.FileContent | Safe}}{{end}}
OK, well, that's a bug. Looking at ambiguous_gen, the locale exceptions there are looking for zh-hant/zh-hans, so we'll need to add a mapping from zh-CN to zh-hans.
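A minimal sketch of such a mapping in Go, assuming (as the linked PR description states) that zh-CN maps to zh-hans and other zh variants map to zh-hant. The function name `ambiguityLocale`, the pass-through for zh-hans, and the lowercasing of other locales are assumptions for illustration, not the actual Gitea code.

```go
package main

import (
	"fmt"
	"strings"
)

// ambiguityLocale maps a UI locale name to the key assumed to be used by the
// per-locale exception tables. Hypothetical helper, not Gitea's API.
func ambiguityLocale(locale string) string {
	l := strings.ToLower(locale)
	switch {
	case l == "zh-cn" || l == "zh-hans":
		// Simplified Chinese.
		return "zh-hans"
	case l == "zh" || strings.HasPrefix(l, "zh-"):
		// Every other Chinese variant falls back to Traditional Chinese.
		return "zh-hant"
	}
	// Non-Chinese locales are returned unchanged apart from lowercasing.
	return l
}

func main() {
	for _, loc := range []string{"zh-CN", "zh-TW", "zh-HK", "en-US"} {
		fmt.Printf("%s -> %s\n", loc, ambiguityLocale(loc))
	}
}
```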
The real sensitivity of ambiguous characters is in source code - therefore warning about them in rendered pages causes too many warnings. Therefore simply remove the warning on rendered pages. The escape button will remain available and it is present on the view source page. Fix go-gitea#20999 Signed-off-by: Andrew Thornton <art27@cantab.net>
Although there are per-locale fallbacks for ambiguity the locale names for Chinese do not quite match our locales. This PR simply maps zh-CN on to zh-hans and other zh variants on to zh-hant. Ref go-gitea#20999 Signed-off-by: Andrew Thornton <art27@cantab.net>
…2016) Backport go-gitea#22016 The real sensitivity of ambiguous characters is in source code - therefore warning about them in rendered pages causes too many warnings. Therefore simply remove the warning on rendered pages. The escape button will remain available and it is present on the view source page. Fix go-gitea#20999 Signed-off-by: Andrew Thornton <art27@cantab.net>
…se (go-gitea#22019) Backport go-gitea#22019 Although there are per-locale fallbacks for ambiguity the locale names for Chinese do not quite match our locales. This PR simply maps zh-CN on to zh-hans and other zh variants on to zh-hant. Ref go-gitea#20999 Signed-off-by: Andrew Thornton <art27@cantab.net>
…se (#22019) (#22030) Backport #22019 Although there are per-locale fallbacks for ambiguity the locale names for Chinese do not quite match our locales. This PR simply maps zh-CN on to zh-hans and other zh variants on to zh-hant. Ref #20999 Signed-off-by: Andrew Thornton <art27@cantab.net> Co-authored-by: Lauris BH <lauris@nix.lv>
Gitea 1.18-dev
https://try.gitea.io/wxiaoguang/test/src/branch/master/test-chars.md
Some common characters should not be treated as ambiguous Unicode characters.
Many CJK punctuation marks are quite common in daily usage; they should not be marked as ambiguous characters. Otherwise the misleading warning appears on every page that contains CJK text.