implement automatic interlinear for strong-versions #12
@DavidIB I'm putting this Slack conversation up here so that we can pick up where we left off. In between, you posted:
What is the status of this? Is there a new ruleset coming up that we can use to detect the source-text type, so that we can match translation-word+strong => source-word+morph/lemma/etc.?
I agree that this idea is not foolproof, though it is worth looking at to see how many problem cases actually exist - I suspect there aren't too many.
Despite these two problems, I think your idea is a VERY good one.
This should work for all words except articles and pronouns - which in any case we don't want to tag in an automatic process.
Thanks @DavidIB. Would you be willing to do some more research on the "To see if this is a common problem, we need to look ..." issues, so that we have a more detailed understanding of the problem space before implementing it? Apart from this, as far as I understand there is still the problem of different source texts, right?
First idea that comes to mind:
Would that work?
Yep, but I'd suggest something more generic, cos I'm thinking that we could add a vocabulary+morphology feature to untagged texts - like in the web STEPBible. That is, if we knew which text the translation is based on, we could give the original + a context-sensitive gloss + morphology for every word in the verse, so readers could figure out for themselves which word translates which.
Currently I save the sourcetype anyway for every verse that was matched by a v11n rule. So if you completed the ruleset to capture every verse in every Bible (with nothing in the action column), every Bible imported into bibleengine would be completely sourcetype-tagged :-) We could make the ruleset smaller by defining default values (at testament, book or verse level). Would a ruleset like this be possible?
I'm wondering: the verse numbering and the source text used are not necessarily the same, right? Something to look out for?
Also: would assigning a source type to an equivalence translation even make sense?
These are great ideas. If I understand you correctly you are suggesting:
Since we have a rule system and an implementation in place, yes, this would be the idea. However, as I asked above: are v11n-source and text-source (in the sense of the actual source words) compatible? I can imagine translators using a specific text-source in a verse while numbering the verse according to a versification scheme that differs from that text-source.
As you mentioned, there might be verse-level exceptions, but most verses within a translation probably have the same source type. So our rule schema should support defining a rule for a testament or a book (or chapter?) as well as for individual verses. The most specific matching rule for a verse will then be chosen.
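The "most specific matching rule wins" resolution could look roughly like this. It is a minimal sketch under stated assumptions: `Rule`, its four scope fields and `resolve_source_type` are hypothetical names, not bibleengine's actual rule schema, and the example rules are invented purely to show the resolution order. The optional `default` is a fallback returned when no rule matches at all.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """One source-type rule. A scope field left as None matches any value.

    Hypothetical schema for illustration, not bibleengine's real rule format.
    """
    source_type: str
    testament: Optional[str] = None  # e.g. "OT" / "NT"
    book: Optional[str] = None
    chapter: Optional[int] = None
    verse: Optional[int] = None

    def matches(self, testament: str, book: str, chapter: int, verse: int) -> bool:
        pairs = ((self.testament, testament), (self.book, book),
                 (self.chapter, chapter), (self.verse, verse))
        return all(want is None or want == have for want, have in pairs)

    def specificity(self) -> int:
        # Depth of the finest level the rule pins down:
        # 0 = everything, 1 = testament, 2 = book, 3 = chapter, 4 = verse.
        levels = (self.testament, self.book, self.chapter, self.verse)
        return max((i + 1 for i, x in enumerate(levels) if x is not None), default=0)


def resolve_source_type(rules: Iterable[Rule], testament: str, book: str,
                        chapter: int, verse: int,
                        default: Optional[str] = None) -> Optional[str]:
    """Source type of the most specific matching rule, or `default` if none match.

    Ties go to whichever rule comes first in `rules`.
    """
    matching = [r for r in rules if r.matches(testament, book, chapter, verse)]
    if not matching:
        return default
    return max(matching, key=lambda r: r.specificity()).source_type


# Invented example rules, only to show the resolution order.
EXAMPLE_RULES = [
    Rule("Greek", testament="NT"),
    Rule("Greek+Latin", book="Mark"),
    Rule("Latin", book="Mark", chapter=16, verse=9),
]
```

With these made-up rules, Mark 16:9 resolves to "Latin", every other verse of Mark to "Greek+Latin", and other NT verses to "Greek"; an OT verse matches nothing and gets `default`. Ranking by the deepest pinned level (rather than counting fields) keeps a book rule more specific than a testament rule even when the book rule omits the testament.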
Do you mean that verse boundaries can shift between translations by a few words or a sentence? So by comparing verse lengths you would be able to tell which original words are in a verse?
Questions that come to my mind concerning the rules:
Well, I'm just saying that IF v11n and text source are compatible categories, then it might make sense to consider that - you would "just" need to complete the dataset. However, I don't think I'm competent to answer that - and I have the feeling (as mentioned above) that there are differences and it might be better to separate the whole issue.
@DavidIB I updated the comment above, so please look at the current version on GitHub.
So, is my understanding correct that you confirmed we have two distinct types of source: v11n-source-type (Standard, Hebrew, Latin, Greek) and text-source (I guess this would be variant families / Greek texts like NA, Tyndale, Majority, etc.)? Concerning the v11n source: if there is "Greek+Latin" in the ruleset, did I understand you right that I should choose "Latin" in that case, since it has higher priority (and not the one mentioned first)?
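If that reading is right, picking one value out of a combined entry such as "Greek+Latin" is a lookup in a priority list rather than taking the first component. A minimal sketch follows; the thread only establishes that Latin outranks Greek, so the full order in `PRIORITY` and the name `pick_v11n_source` are assumptions for illustration.

```python
from typing import Sequence

# Assumed priority order, highest first. The thread only says Latin outranks Greek;
# the positions of Hebrew and Standard here are guesses for illustration.
PRIORITY: Sequence[str] = ("Latin", "Greek", "Hebrew", "Standard")


def pick_v11n_source(combined: str, priority: Sequence[str] = PRIORITY) -> str:
    """Return the highest-priority component of a value like "Greek+Latin"."""
    parts = [p.strip() for p in combined.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty source type")
    unknown = [p for p in parts if p not in priority]
    if unknown:
        raise ValueError(f"unknown source type(s): {unknown}")
    # Lower index in `priority` means higher priority, regardless of written order.
    return min(parts, key=priority.index)
```

Under this assumption `pick_v11n_source("Greek+Latin")` and `pick_v11n_source("Latin+Greek")` both give "Latin"; unknown names fail loudly instead of being silently skipped.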
So the conclusion for now is that you will - whenever you find the time - create a ruleset that will enable us to determine the source text for any given verse in any translation. Correct? Something like this?
We could / should also define a default text (NA?) that is assumed when no rules match, which will also reduce the number of rules.
But they may follow certain variants in a verse that can't be identified by one of those two (or three) texts, right?
But we might want to display the corresponding Greek text in the app. Would that be a problem?
There's no problem displaying portions of NA. The words themselves aren't copyright (after all, they are ancient) but the exact choice of which words over the whole text, and the apparatus, are copyright.
@david Instone-Brewer you mentioned in private chat that we can be sure that in any translation the order of instances of the same Strong's number will be the same as in the original. I'm still not convinced by this - but since you are the pro I'll give it the benefit of the doubt. If we work with that assumption, I wonder if we can go a different route altogether concerning morphology and Strong's numbers: what about having one original version as the single source of truth (compiled from TANTT + TOTHT)? That way we wouldn't need to save morphology with each version, and we wouldn't need to save a Strong's index with each version. If we then updated the original, that would automatically update all translations that have Strong's numbers.
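Under the ordering assumption, mapping a translation's Strong's tags onto the original's words (and therefore onto its morphology) reduces to counting occurrences: the k-th occurrence of a Strong's number in the translation is paired with the k-th occurrence of that number in the original. A minimal sketch, assuming the original verse is a list of (Strong's number, morphology code) pairs; the function names are hypothetical, and the example tags for David's "Lord Jesus" sentence are illustrative, not taken from TANTT.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Original verse: (Strong's number, morphology code) per word.
OriginalVerse = Sequence[Tuple[str, str]]


def align_by_occurrence(original: OriginalVerse,
                        translation_strongs: Sequence[Optional[str]]) -> List[Optional[int]]:
    """Map each translation word to an index in `original`, or None.

    The k-th occurrence of a Strong's number in the translation is paired with
    the k-th occurrence of the same number in the original. Untagged words
    (None) and surplus occurrences get None.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, (strong, _morph) in enumerate(original):
        positions[strong].append(i)

    used: Dict[str, int] = defaultdict(int)  # occurrences consumed per Strong's number
    result: List[Optional[int]] = []
    for strong in translation_strongs:
        if strong is None:
            result.append(None)
            continue
        k = used[strong]
        used[strong] += 1
        hits = positions.get(strong, [])
        result.append(hits[k] if k < len(hits) else None)
    return result


def morphology_for_translation(original: OriginalVerse,
                               translation_strongs: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Look up morphology from the single original instead of storing it per version."""
    return [None if i is None else original[i][1]
            for i in align_by_occurrence(original, translation_strongs)]


# Illustrative tags for "He said to Jesus, 'Lord Jesus, help me'".
EXAMPLE_ORIGINAL = [("G3004", "V-AAI-3S"), ("G2424", "N-DSM"),
                    ("G2962", "N-VSM"), ("G2424", "N-VSM"), ("G0997", "V-AAM-2S")]
EXAMPLE_TRANSLATION = [None, "G3004", None, "G2424", "G2962", "G2424", "G0997", None]
```

Here the first "Jesus" maps to original index 1 and the second to index 3, so each picks up its own morphology. If a variant adds or drops one occurrence of a repeated word, every later occurrence of that number shifts by one, which is exactly the failure mode raised about variants in this thread.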
David Instone-Brewer [3 months ago]
@chris Metz Perhaps I should add a few caveats about my assertion that we can assume the same order in text and translation.
I'm referring to words with the same Strongs number in a text and in its translation.
So, if a verse uses the same word twice, we'd expect the translation to use them in the same order as the text. For a silly example: "He said to Jesus, 'Lord Jesus, help me'"
Now, a translation COULD have "'Lord Jesus, help me', he said to Jesus." but it would be strange. And this is an extreme example, where the two words are very close to each other, in phrases that could be swapped round.
BUT I haven't tested this idea, so I don't know how well it works in practice.
So I too am allowing it the benefit of the doubt - though I think it is a fair bet.
The big exception is when there is a variant - when part of a verse is missing in some MSS. This DOES result in some identical words getting mixed up - though in practice this only affects words such as "the" and "his" - ie words that are likely to occur frequently in a single verse.
This means it is fairly important to identify the Greek text behind a translation. This kind of problem doesn't occur in the Hebrew OT.
David Instone-Brewer [3 months ago]
I do like your idea of having a single OT+NT text, but there are some BIG problems wrt morphology, because of variants. The NT texts not only have different wording, but very often different morphology for the same words.
Dan Bennett [3 months ago]
Oooh, tricky
Chris Metz [3 months ago]
I see. I had a longer look at the TANTT dataset. If we take the list of Strong's numbers in a translation for a given verse, we should be able to infer the type of text that was used from the information in the dataset, right? Can we assume that a translation follows only one text type within a verse? I think I came across an example where they didn't, though I'm not sure.
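One way to do that inference per verse is to compare the multiset of Strong's numbers tagged in the translation against the Strong's numbers of each candidate text and rank the candidates. A minimal sketch, assuming the per-verse Strong's lists for each text type have already been extracted (from TANTT or elsewhere; that step and its format are not shown). `infer_text_type` is a hypothetical name and the scoring is one simple choice, not an established method: words the translation has but a text lacks count most heavily, because untagged articles and pronouns mean a translation will normally show fewer tagged words than the text.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def infer_text_type(translation_strongs: Iterable[str],
                    candidates: Mapping[str, Iterable[str]]) -> List[Tuple[str, int, int]]:
    """Rank candidate source texts for one verse, best match first.

    Each entry is (text name, missing, unused):
      missing -- tagged translation words the text cannot account for
      unused  -- text words that do not show up in the translation's tags
    """
    tr = Counter(translation_strongs)
    ranking = []
    for name, strongs in candidates.items():
        src = Counter(strongs)
        missing = sum((tr - src).values())  # Counter subtraction keeps positive counts only
        unused = sum((src - tr).values())
        ranking.append((name, missing, unused))
    # Fewest missing words first; untagged articles/pronouns make `unused` noisier.
    ranking.sort(key=lambda entry: (entry[1], entry[2]))
    return ranking


# Placeholder Strong's numbers for one verse in two hypothetical text types.
EXAMPLE_CANDIDATES = {"TR": ["s1", "s2", "s3", "s4"], "NA": ["s1", "s2", "s3"]}
```

A translation tagged `s1 s2 s3 s4` ranks "TR" first (nothing missing), while one tagged only `s1 s2` ranks "NA" first (fewer unused words). Equal scores mean the verse doesn't discriminate between the candidates, which is also where the "one text type per verse" assumption can't be checked.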
David Instone-Brewer [3 months ago]
Bibles tend to fall into two camps: those that use the so-called Textus Receptus (ie the best text available to the KJV translators) and those that follow modern texts (ie NA, SBLGNT or THGNT which all more-or-less agree about the original text). Translations also take some extra decisions about whether to include things like the end of Mark and the forgiven adulteress in John 8. A few perversely use the modern texts but also fill in the so-called 'missing verses' which were duplications in older MSS.
So it would be fairly easy to construct some rules to figure out which of these options a translation was using.
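A rough sketch of such rules, assuming we already have (a) the set of verse references a translation contains and (b) a per-verse guess of the underlying text, for instance from a Strong's comparison. The marker passages are real examples (the longer ending of Mark, John 7:53-8:11, and verses such as Matt 17:21, Matt 18:11 and Acts 8:37 that are not in the main text of NA27/28), but the list is deliberately short and illustrative; the reference format, function name and output keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Ref = Tuple[str, int, int]  # (book, chapter, verse) - hypothetical reference format

# Illustrative marker verses only; a real ruleset would need a fuller, checked list.
LONGER_ENDING_OF_MARK: Ref = ("Mark", 16, 9)   # first verse of Mark 16:9-20
PERICOPE_ADULTERAE: Ref = ("John", 8, 3)       # a verse inside John 7:53-8:11
# A small sample of verses not printed in the main text of NA27/28
MISSING_VERSES: Set[Ref] = {("Matt", 17, 21), ("Matt", 18, 11), ("Acts", 8, 37)}


def describe_translation(present: Set[Ref], base_text_votes: Iterable[str]) -> Dict[str, object]:
    """Summarise which textual options a translation appears to follow.

    present         -- verse references for which the translation has text
    base_text_votes -- per-verse guesses of the underlying text, "TR" or "modern"
    """
    votes = list(base_text_votes)
    # Majority vote over verses; "unknown" if there is no evidence at all.
    base = Counter(votes).most_common(1)[0][0] if votes else "unknown"
    fills_missing = bool(MISSING_VERSES & present)
    return {
        "base_text": base,
        "has_longer_ending_of_mark": LONGER_ENDING_OF_MARK in present,
        "has_john_7_53_8_11": PERICOPE_ADULTERAE in present,
        # Modern base text but with the 'missing verses' filled in
        "modern_with_missing_verses": base == "modern" and fills_missing,
    }
```

Verse presence alone can't separate a TR-based Bible from a modern-text one that fills in the missing verses, which is why the sketch also takes a wording-based vote as input.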