questions re. encoding of Linggawangi.xml #10

Open
arlogriffiths opened this issue Feb 18, 2020 · 8 comments

@arlogriffiths
Contributor

@danbalogh I have just pushed a version of the file with some questions addressed to you. Please answer here.

@danbalogh
Collaborator

  1. supplying anusvāras for standardisation: no conclusion reached; we should return to this at some point, hopefully after more input from the Markup list. Provisionally I suggest either
    A. no markup (just keep the original reading); or
    B. choice with orig and reg on the whole word.
    For a long-term solution, I am now inclined toward <supplied reason="subaudible">, analogous to supplied avagrahas. But my inclination keeps changing. Gabby on Markup seems set against introducing a new value for supplied.

@danbalogh
Collaborator

  2. inscribed susuku for susuk· ku. I'm not sure we need a special rule for the non-writing of gemination. While a large number of special rules makes for more consistency, it also makes the EG that much harder to digest and keep in mind. I'd suggest this be dealt with as per A or B under 1 above: just ignore it, or orig and reg the whole pair of words. If you definitely prefer to supply just the implied , and to supply it as (rather than k), then I agree it should come after susu. But then again, perhaps the inscribed ku is in fact "shorthand" for kku ...

@danbalogh
Collaborator

  3. add "entry" to the list of permitted values for @unit in <citedRange>
    Wouldn't "item" be suitable? Now suggested for "a number in an anthology", but if we discard the idea of displaying that as №, it could be applied to such entries. But on second thoughts, perhaps indeed better to keep "entry" distinct. Display as "s. v."? Give me a final word and I'll add it to the EG.
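    For illustration only, here is a minimal Python sketch of how a display step might turn @unit="entry" into "s. v.". The function name `render_cited_range`, the `UNIT_LABELS` table, the "page" label, and the sample headword are all assumptions made for this example; none of them comes from the EG or from the project's actual transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical display labels per @unit value. Only "entry" -> "s. v." is
# proposed in this thread; "page" -> "p." is a placeholder for illustration.
UNIT_LABELS = {
    "page": "p.",
    "entry": "s. v.",
}


def render_cited_range(xml_fragment: str) -> str:
    """Return a display string for a single <citedRange> element."""
    elem = ET.fromstring(xml_fragment)
    label = UNIT_LABELS.get(elem.get("unit", ""), "")
    text = (elem.text or "").strip()
    # Units without a known label fall back to the bare content.
    return f"{label} {text}".strip()


# The headword below is an invented sample value.
print(render_cited_range('<citedRange unit="entry">susuk</citedRange>'))
```

    Keeping "entry" separate from "item" means each value can carry its own label in such a table, whatever is eventually decided about №.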

@danbalogh
Collaborator

danbalogh commented Feb 18, 2020

  4. Markup for quotes that are not citations from a publication.
    I have no preference and no previous experience. It may be best to get @ajaniak's opinion. I think <quote> may be fine, but by the TEI Guidelines, that is normally for "Quotations from other works". Perhaps <q> is better suited for this purpose, but that would introduce yet another markup element that we do not use otherwise (unless we adopt it for italicisation, which I hope we don't.) At any rate, we should indeed agree on some form of markup, which has the advantage of automatically producing the correct quotation marks.

@arlogriffiths
Contributor Author

  1. let's go back to option B then, which is what Aditia had in his initial encoding of Linggawangi.xml. (But, @danbalogh : I remain in favor of a long-term solution allowing mark-up to be limited to the character or characters supplied. Could you add a stub to EG to mark where any future rule regarding such cases will be presented, and what options are on the table?)
  2. @aditiagunawan : please mark up like this <choice><orig>susuku</orig><reg>susuk ku</reg></choice>. (@danbalogh : agreed? this means enclosing two words in a single <choice>.)
  3. Yes, let's add @unit="entry" to EG and state explicitly that it is intended to be displayed with s.v.
  4. @danbalogh : Let us provisionally add to the EG that <quote> is to be used also to achieve quotation marks around translations of words or phrases, but that if the encoder insists he can override our transformation by typing precisely the unicode signs for the desired quotation marks („...” “...” ‘...’ «...»). Then please add a comment to the new bit of EG contents asking @ajaniak to express her opinion.
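
To make point 4 concrete, a rough Python sketch of the intended behaviour: text inside <quote> gets quotation marks from the transformation, while quotation marks typed directly by the encoder pass through unchanged. The function `render_quotes`, its default marks, the <p> wrapper, and the sample gloss are assumptions for illustration and do not describe the project's real stylesheet.

```python
import xml.etree.ElementTree as ET


def render_quotes(xml_fragment: str, open_mark: str = "“", close_mark: str = "”") -> str:
    """Flatten a fragment to plain text, wrapping each <quote> in quotation marks."""

    def walk(elem: ET.Element) -> str:
        parts = [elem.text or ""]
        for child in elem:
            parts.append(walk(child))
            parts.append(child.tail or "")
        inner = "".join(parts)
        if elem.tag == "quote":
            return f"{open_mark}{inner}{close_mark}"
        return inner

    return walk(ET.fromstring(xml_fragment))


# Marked up with <quote>: the transformation supplies the marks.
print(render_quotes("<p>glossed as <quote>gloss</quote>.</p>"))

# Marks typed literally by the encoder: nothing is added or changed.
print(render_quotes("<p>glossed as ‘gloss’.</p>"))
```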

@danbalogh
Collaborator

  1. The EG stubs are done. See added text in red under §6.1/Good practice in normalisation, §6.2/Editorial deletion, and §6.2/Editorial addition.
    Note that there is also the option of just flagging on the entire word.

  2. I think your suggestion is preferable even if we adopt a method for adding individual characters as normalisation. There will need to be some indication of this in the EG. In your suggestion to Aditia, is <reg>susuk ku</reg> deliberate or is it a typo for <reg>susuk· ku</reg>? I cannot give you a qualified opinion on which would be better, but it seems to me that the latter would be more apt.
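
As a small illustration of what enclosing both words in a single <choice> implies for display, the Python sketch below picks either the <orig> or the <reg> reading from the fragment suggested to Aditia. The function `render_choice` and the <ab> wrapper are assumptions for this example. The <reg> content follows the suggestion as written (susuk ku); whether it should be susuk· ku instead is still open.

```python
import xml.etree.ElementTree as ET

# Fragment as suggested in this thread, wrapped in <ab> for parsing (assumed wrapper).
SAMPLE = "<ab><choice><orig>susuku</orig><reg>susuk ku</reg></choice></ab>"


def render_choice(xml_fragment: str, prefer: str = "reg") -> str:
    """Flatten a fragment to text, keeping only the preferred child of each <choice>."""

    def walk(elem: ET.Element) -> str:
        if elem.tag == "choice":
            chosen = elem.find(prefer)
            return walk(chosen) if chosen is not None else ""
        parts = [elem.text or ""]
        for child in elem:
            parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(ET.fromstring(xml_fragment))


print(render_choice(SAMPLE, prefer="orig"))  # diplomatic reading
print(render_choice(SAMPLE, prefer="reg"))   # normalised reading
```

Because the whole pair of words sits inside one <choice>, a display that switches between readings swaps the pair as a unit and never has to splice a single supplied character into the inscribed form.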

Given what Gabby said on Markup about how most people view normalisation, we should still consider how extensively we want to normalise. Flagging may be preferable in most cases, and it is in fact what the EG says at the moment under Good practice in normalisation, which does not explicitly recommend normalisation for any phenomenon, and instead groups non-standard features into three classes, with the following recommended strategies:

  1. ignore unless deemed important on a case-by-case basis
  2. ignore or flag depending on corpus, but do not normalise without good reason
  3. flag and optionally normalise

Arlo, please add some thoughts in comments to the EG, and we should probably have a Skype discussion one of these days.

@danbalogh
Collaborator

3 and 4 are done.

@danbalogh
Collaborator

Arlo, is there anything in this issue that still needs action from me?
