
go/doc: ToHTML should support headings in non-roman scripts #7349

Closed
gopherbot opened this issue Feb 18, 2014 · 15 comments

Comments

@gopherbot
Contributor

by hiroki.ingk:

I would like GoDoc to support generating headings for non-alphabetic languages.

I tried generating GoDoc from comments written in Japanese. However, go/doc.ToHTML()
only treats a line as a heading if it begins with an uppercase alphabetic letter and
contains no punctuation, so every Japanese line is rendered as a normal paragraph. I would
like comments in Japanese and other non-alphabetic languages to be recognized as headings
properly.

http://golang.org/src/pkg/go/doc/comment.go?s=6479:6541#L248
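To make the problem concrete, here is a minimal sketch of the rule as described above: a line qualifies only if it starts with an uppercase letter, ends with a letter or digit, and contains none of a set of ASCII punctuation characters. The function name `isHeading` and the exact punctuation list are assumptions for illustration; this is a simplification, not the actual go/doc source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isHeading is a simplified, hypothetical version of the heading rule
// described in this issue. It is not the real go/doc implementation.
func isHeading(line string) bool {
	line = strings.TrimSpace(line)
	if line == "" {
		return false
	}
	// The first rune must be an uppercase letter. Japanese kana and kanji
	// are letters (category Lo) but have no case, so they fail here.
	first, _ := utf8.DecodeRuneInString(line)
	if !unicode.IsLetter(first) || !unicode.IsUpper(first) {
		return false
	}
	// The last rune must be a letter or a digit.
	last, _ := utf8.DecodeLastRuneInString(line)
	if !unicode.IsLetter(last) && !unicode.IsDigit(last) {
		return false
	}
	// Reject lines containing any of these ASCII punctuation characters
	// (an illustrative list, assumed for this sketch).
	return !strings.ContainsAny(line, ",.;:!?+*/=()[]{}_^&~%#@<>\"\\")
}

func main() {
	fmt.Println(isHeading("Formatting verbs"))
	fmt.Println(isHeading("書式指定の動詞")) // Japanese for "Formatting verbs"
}
```

Running it prints `true` and then `false`: the Japanese line is rejected solely because its first rune has no uppercase form.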

There are a few possible ways to solve this:

  1. Include a language tag somewhere in the comment and let ToHTML() apply the heading rules for that language.

    For example, a GoDoc comment would begin like this:

    /*
        {"language", "ja"}
        fmtパッケージは、C言語のprintfおよびscanfに似た機能を持つ、
        フォーマットのためのI/Oを実装します。フォーマットに用いる「動詞」は
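        C言語から派生されていますが、より簡素なものになっています。

    (means "Package fmt implements formatted I/O with functions analogous to C's printf and
    scanf. The format 'verbs' are derived from C's but are simpler.")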

    Although we would need to define a rule for the language tag, the heading detection itself would stay simple.

  2. Make ToHTML() check all supported languages universally.

    For example, if we support Japanese, Chinese, and Korean as non-alphabetic languages, the checks for all three languages plus the current alphabetic check will run on every line.

    This requires no change to the current comment format, but performance will degrade as the number of supported languages increases.
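Option 2 can be sketched with the standard `unicode` range tables: accept a first rune that is either an uppercase letter (the current behavior) or belongs to one of the listed CJK scripts. The `scripts` list and the function name `startsLikeHeading` are hypothetical. Each added script is one more table lookup per candidate line, which is the performance cost the proposal mentions.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// scripts lists the caseless scripts a hypothetical ToHTML would accept as
// the first rune of a heading, in addition to uppercase letters.
var scripts = []*unicode.RangeTable{
	unicode.Han,      // Chinese characters and Japanese kanji
	unicode.Hiragana, // Japanese
	unicode.Katakana, // Japanese
	unicode.Hangul,   // Korean
}

// startsLikeHeading reports whether the first rune of line could start a
// heading under the "check all supported languages" approach.
func startsLikeHeading(line string) bool {
	r, _ := utf8.DecodeRuneInString(line)
	if unicode.IsUpper(r) {
		return true // current rule for alphabetic scripts
	}
	// One extra range-table lookup per supported script.
	return unicode.IsOneOf(scripts, r)
}

func main() {
	// "概要" (Japanese/Chinese) and "개요" (Korean) both mean "Overview".
	for _, s := range []string{"Overview", "概要", "개요", "overview"} {
		fmt.Println(s, startsLikeHeading(s))
	}
}
```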
@gopherbot
Contributor Author

Comment 1 by hiroki.ingk:

I have also implemented a function for solution 2 above. If you don't mind, I would like
to share the source via the code review tool for your reference.

@gopherbot
Contributor Author

Comment 2 by fuzxxl:

Another, more general approach would be to support headings the way Markdown does.
Markdown supports two styles of headings, both of which are easy to read in source code:

    # This is a first level heading
    ## This is a second level heading
    ...
    ###### repeat until sixth level

    This is a first level heading
    =============================
    This is a second level heading
    ------------------------------

(The dashes and equals signs are meant to be as wide as the heading text above them; this
issue tracker displays them in a proportional font.)
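For reference, the two Markdown heading styles above can be recognized with a few lines of Go. This is a hypothetical sketch (`atxLevel` and `setextLevel` are illustrative names) that is simpler than the full CommonMark rules; it only shows the shape of the two checks.

```go
package main

import (
	"fmt"
	"strings"
)

// atxLevel returns the heading level (1-6) of a line such as "## Title",
// or 0 if the line is not an ATX-style ("#") heading.
func atxLevel(line string) int {
	n := 0
	for n < len(line) && line[n] == '#' {
		n++
	}
	// Require 1-6 hashes followed by a space.
	if n == 0 || n > 6 || n >= len(line) || line[n] != ' ' {
		return 0
	}
	return n
}

// setextLevel returns 1 for an "=====" underline, 2 for a "-----" underline,
// and 0 otherwise. The underline applies to the line directly above it.
func setextLevel(underline string) int {
	u := strings.TrimSpace(underline)
	switch {
	case u == "":
		return 0
	case strings.Trim(u, "=") == "":
		return 1
	case strings.Trim(u, "-") == "":
		return 2
	}
	return 0
}

func main() {
	fmt.Println(atxLevel("# This is a first level heading"))
	fmt.Println(atxLevel("###### repeat until sixth level"))
	fmt.Println(setextLevel("============================="))
	fmt.Println(setextLevel("------------------------------"))
}
```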

@cznic
Contributor

cznic commented Feb 21, 2014

Comment 3:

I would very much appreciate it if comments stayed free of annotations, as they are now:
neither language tags nor rendering hints.
On the first issue: I'm not a native English speaker either, but I firmly believe
programmers must know English and write documentation (of which comments are a part) in
English, for the very same reason that MDs know and use Latin.
As for Markdown and friends: godoc comments read _and look_ good unprocessed. I prefer to
keep this nice property.

@gopherbot
Contributor Author

Comment 4 by sophomoric.periods:

Supporting Markdown seems a good approach if we can keep backward compatibility. I mean we
would have to support Markdown and the current convention simultaneously, and it may be
difficult for them to coexist.
If we look at other languages, Java, for example, which is considered one of the most
popular languages (http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html),
has Japanese API documentation (http://docs.oracle.com/javase/jp/7/api/). There is no
reason to neglect localization if we want to make a language more popular.
I am currently contributing to a project that translates the official Go website into
Japanese (http://godocjp.herokuapp.com/) to spread the language to more Japanese
developers. Language support in godoc is required to complete that project.

@adg
Contributor

adg commented Feb 23, 2014

Comment 6:

It is a specific non-goal to introduce any kind of markup to godoc formats.
They should be regular comments that are rendered nicely by godoc. The onus is on godoc
to interpret and present the documentation properly.
The one exception is indentation for pre-formatted text.
So let's stop talking about markdown.
As for the original issue, godoc should be smart enough to understand other scripts.
This might be hard to do correctly without the packages from the go.text repository, but
I'm no expert.

Labels changed: added release-none, repo-main.

Status changed to HelpWanted.

@vdobler
Contributor

vdobler commented Feb 26, 2014

Comment 7:

Markdown or language tags are clearly out of scope.
But relaxing http://golang.org/src/pkg/go/doc/comment.go?s=4263:4296#L176
to not insist on an uppercase letter is pretty non-invasive. All we might
get are more false positives. 
Does anybody have a large and representative corpus of Go packages
to gather some statistics? If dropping the uppercase requirement for the first letter
doesn't produce any false positives on 10^6 common Go packages, there is
no real need to enforce it.
(My first proposal for headings was to require _two_ blank lines
before a potential heading. I still like this, as the vertical
space looks good in the source and signals that something new
is coming. But it was considered a step too far in the
direction of formatting syntax.)
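The statistics asked for above boil down to comparing the strict and relaxed first-rune checks over a corpus of candidate heading lines and reviewing the lines whose classification changes. A minimal sketch, assuming the candidate lines (already isolated between blank lines and passing the other heading checks) have been extracted elsewhere; `strictFirst`, `relaxedFirst`, and `newlyAccepted` are hypothetical names.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// strictFirst is the current requirement: an uppercase first letter.
func strictFirst(r rune) bool { return unicode.IsLetter(r) && unicode.IsUpper(r) }

// relaxedFirst drops the uppercase requirement: any letter will do.
func relaxedFirst(r rune) bool { return unicode.IsLetter(r) }

// newlyAccepted returns the candidates that pass the relaxed check but fail
// the strict one. In an English corpus, each of these is a potential new
// false positive to review by hand.
func newlyAccepted(candidates []string) []string {
	var out []string
	for _, line := range candidates {
		r, _ := utf8.DecodeRuneInString(line)
		if relaxedFirst(r) && !strictFirst(r) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	corpus := []string{"Introduction", "see also the examples", "概要", "2. Usage"}
	changed := newlyAccepted(corpus)
	fmt.Printf("%d of %d candidates change: %q\n", len(changed), len(corpus), changed)
}
```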

@gopherbot
Copy link
Contributor Author

Comment 8 by hiroki.ingk:

I think dropping the uppercase condition is a reasonable option if it has little impact
on existing documents.
In that case, the line at http://golang.org/src/pkg/go/doc/comment.go?s=4458:4506#L186
should be changed as follows:
    if !unicode.IsLetter(r) {
With this change, a line can become a heading when its first character is in Unicode
category L and its last character is in category L or N, which works for Japanese text.
However, the check at http://golang.org/src/pkg/go/doc/comment.go?s=4712:4781#L197 has no
effect on Japanese, because none of the characters in its list is Japanese punctuation.
For example, the following Japanese line contains a Japanese punctuation mark (the
ideographic comma 、), yet it will still be judged a heading:
    これは、ヘッダです 
    (means "This is a header")
A few options can be considered for this problem:
    1. Add the punctuation of each language to the list at http://golang.org/src/pkg/go/doc/comment.go?s=4712:4781#L197 . This would be arduous, because we would need to cover many languages.
    2. Change that statement to use unicode.IsPunct so that punctuation in any script is detected.
    3. Keep the current implementation and accept this small discrepancy.
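Option 2 can be sketched by classifying each rune with `unicode.IsPunct` instead of matching a fixed ASCII list, so that Japanese marks such as the ideographic comma 、 (U+3001, category Po) are caught. This is a hypothetical sketch: the helper name `hasDisallowedPunct` is illustrative, and because `IsPunct` also matches the hyphen and apostrophe that ordinary English headings use, the sketch exempts those two explicitly (an assumption, not the go/doc rule). ASCII characters such as `+`, `=`, and `<` are Unicode symbols (category S) rather than punctuation, so the sketch also checks `unicode.IsSymbol`.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasDisallowedPunct reports whether line contains punctuation (Unicode
// category P) or symbols (category S) in any script. The hyphen and
// apostrophe are exempted; this exemption list is an assumption.
func hasDisallowedPunct(line string) bool {
	for _, r := range line {
		if r == '-' || r == '\'' {
			continue
		}
		if unicode.IsPunct(r) || unicode.IsSymbol(r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasDisallowedPunct("これは、ヘッダです")) // contains U+3001 IDEOGRAPHIC COMMA
	fmt.Println(hasDisallowedPunct("これはヘッダです"))  // same sentence without the comma
	fmt.Println(hasDisallowedPunct("Go's built-in types"))
}
```

This prints `true`, `false`, `false`: only the line with the ideographic comma is rejected.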

@gopherbot
Copy link
Contributor Author

Comment 9 by sophomoric.periods:

Sorry, the following would be a reasonable solution for the uppercase issue at line
http://golang.org/src/pkg/go/doc/comment.go?s=4458:4506#L186 . I believe it will not
have any impact on existing documents.
    if !unicode.IsLetter(r) || unicode.IsLower(r) {
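The effect of this condition can be checked rune by rune. In this minimal sketch the condition is wrapped in a hypothetical helper `rejectFirst`: a rune is rejected if it is not a letter or if it is a lowercase letter, so caseless letters such as kana and kanji pass while lowercase Latin letters are still rejected as before.

```go
package main

import (
	"fmt"
	"unicode"
)

// rejectFirst applies the proposed first-rune condition: it returns true
// (reject) for non-letters and for lowercase letters. Uppercase letters and
// letters without case, such as Japanese kana and kanji, are accepted.
func rejectFirst(r rune) bool {
	return !unicode.IsLetter(r) || unicode.IsLower(r)
}

func main() {
	for _, r := range []rune{'P', 'p', '1', 'こ', '漢', 'カ'} {
		fmt.Printf("%c rejected=%v\n", r, rejectFirst(r))
	}
}
```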

@evankroske

Comment 11:

I'll take this one. Expect a CL soon.

@gopherbot
Contributor Author

Comment 12:

CL https://golang.org/cl/121040043 mentions this issue.

@evankroske

Comment 13:

I've realized that we don't have enough test data in languages other than English to
ensure that our changes don't make ToHTML worse. We need to fix that before we can fix
ToHTML.

@rsc
Contributor

rsc commented Sep 10, 2021

See #48305.

@seankhliao
Member

I believe this is covered by the new structured headings (# heading).


8 participants
@rsc @evankroske @vdobler @cznic @adg @gopherbot @seankhliao and others