Bible parsing error #209

Open
seraphx2 opened this issue Jul 7, 2023 · 8 comments

@seraphx2

seraphx2 commented Jul 7, 2023

The Bible file I am using is from here https://ebible.org/Scriptures/details.php?id=eng-kjv2006
The file I am using is the Crosswire Sword module entry
Here is my app. It's really small:

import {
  BeDatabaseCreator,
  V11nImporter,
  SwordImporter,
  OsisImporter,
} from "@bible-engine/importers";

const args = process.argv;
const importer = args[2];
const dataFile = args[3];

const creator = new BeDatabaseCreator({
  type: "mysql",
  host: "127.0.0.1",
  port: 3306,
  username: "bibleengine",
  password: "<password>",
  database: "bibleengine",
  dropSchema: true,
});

creator.addImporter(V11nImporter);

if (importer === "osis")
  creator.addImporter(OsisImporter, {
    sourcePath: `D:/bible-importer/osis/${dataFile}`,
  });

if (importer === "sword")
  creator.addImporter(SwordImporter, {
    sourcePath: `D:/bible-importer/sword/${dataFile}`,
    skip: {
      crossRefs: false,
      notes: true,
      strongs: false,
    },
    logLevel: "verbose",
  });

creator.createDatabase();

I am getting this error when trying to run an import on a sword file:

running importer: Versification Rules
ignored 1769 unsupported or invalid rules from source types: English+Greek,Greek2,Latin,Greek3,English+Latin2,Greek,GreekIntegrated,GreekUndivided,Hebrew+Latin,English,Latin2,English+Latin,Latin=,Latin+Bulgarian,Latin+Greek,English +Latin,Bulgarian (thereof 388 rules for non ap books from source types: Greek2,Latin,Greek) - set DEBUG=true to see details
running importer: SwordImporter
running importer: OSIS
version:  # Sword module configuration fil
SwordImporter failed OsisParseError: text outside of paragraph: "In the " in Gen 1:1 # Sword module configuration fil

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)  
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

OsisParseError: text outside of paragraph: "In the " in Gen 1:1 # Sword module configuration fil

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)

Node.js v18.16.1
@danbenn
Member

danbenn commented Jul 9, 2023

Hi @seraphx2, sorry for the late reply. You'll need to use the plaintext flag. We use this for translations like the KJV, which often don't have paragraphs or other page-level formatting. Let us know how it goes!

@seraphx2
Author

seraphx2 commented Jul 9, 2023

> Hi @seraphx2, sorry for the late reply. You'll need to use the plaintext flag. We use this for translations like the KJV, which often don't have paragraphs or other page-level formatting. Let us know how it goes!

Sorry if this should be obvious, but I'm not seeing where to specify that flag.

@seraphx2
Author

seraphx2 commented Jul 12, 2023

I added versionMeta to the SwordImporter config and am still getting the same error
(though I'm not sure whether it is even set up correctly, or how to tell whether the values are right):

creator.addImporter(SwordImporter, {
  versionMeta: {
    uid: "ENGKJV",
    title: "King James Version 2006",
    isPlaintext: true,
    hasStrongs: true,
  },
  sourcePath: `D:/bible-importer/sword/${dataFile}`,
  skip: {
    crossRefs: false,
    notes: true,
    strongs: false,
  },
  logLevel: "verbose",
});
ignored 1769 unsupported or invalid rules from source types: English+Greek,Greek2,Latin,Greek3,English+Latin2,Greek,GreekIntegrated,GreekUndivided,Hebrew+Latin,English,Latin2,English+Latin,Latin=,Latin+Bulgarian,Latin+Greek,English +Latin,Bulgarian (thereof 388 rules for non ap books from source types: Greek2,Latin,Greek) - set DEBUG=true to see details
running importer: SwordImporter
version:  ENGKJV
running importer: OSIS
version:  ENGKJV
SwordImporter failed OsisParseError: text outside of paragraph: "In the " in Gen 1:1 ENGKJV

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

OsisParseError: text outside of paragraph: "In the " in Gen 1:1 ENGKJV

container stack:
  root

    at OsisImporter.parseTextNode (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:1097:23)
    at xmlStream.ontext (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:59:22)
    at emit (D:\git\bible-importer\node_modules\sax\lib\sax.js:624:35)
    at closeText (D:\git\bible-importer\node_modules\sax\lib\sax.js:634:26)
    at emitNode (D:\git\bible-importer\node_modules\sax\lib\sax.js:628:26)
    at newTag (D:\git\bible-importer\node_modules\sax\lib\sax.js:691:5)
    at SAXParser.write (D:\git\bible-importer\node_modules\sax\lib\sax.js:1276:13)
    at D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:83:23
    at new Promise (<anonymous>)
    at OsisImporter.getContextFromXml (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\index.js:49:26)

Node.js v18.16.1

@seraphx2
Author

@danbenn any ideas on what I'm doing wrong?

@seraphx2
Author

Hello? Anyone?

@danbenn
Member

danbenn commented Jul 21, 2023

Hi @seraphx2, apologies for the late reply. Can you try adding this flag to your code?

autoGenMissingParagraphs: true,

So with your current setup, that would be:

creator.addImporter(SwordImporter, {
  versionMeta: {
    uid: "ENGKJV",
    title: "King James Version 2006",
    isPlaintext: true,
    hasStrongs: true,
  },
  sourcePath: `D:/bible-importer/sword/${dataFile}`,
  skip: {
    crossRefs: false,
    notes: true,
    strongs: false,
  },
  logLevel: "verbose",
  autoGenMissingParagraphs: true,
});

This is what I'm seeing in the source code that causes this error, bible/osis/index.ts:

        if (!stackHasParagraph(context, currentContainer)) {
            if (this.context.hasParagraphsInSourceText && !context.autoGenMissingParagraphs) {
                throw new OsisParseError(`text outside of paragraph: "${text}"`, context);
            }
            if (!this.context.hasParagraphsInSourceText || context.autoGenMissingParagraphs) {
                currentContainer = startNewParagraph(context);
            }
        }

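If the autoGenMissingParagraphs flag isn't specified and the importer expects paragraphs in the source text, it treats text outside of a paragraph as a sign that the source file is corrupted and throws this error. In this case, to the best of my knowledge, the KJV genuinely doesn't have paragraphs, so we want the importer to insert them.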
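For readers who want to reason about which branch a given configuration ends up in, here is a minimal sketch of the check quoted above, reduced to a pure function. The names `ParagraphFlags`, `ParagraphAction` and `decideParagraphAction` are hypothetical and not part of @bible-engine/importers; only the two conditions are taken from the quoted snippet. How `isPlaintext` relates to `hasParagraphsInSourceText` is not shown in this thread, so the sketch takes that value as a plain input.

```typescript
// Hypothetical names for illustration; not the library's API.
type ParagraphAction = "none" | "throw" | "startNewParagraph";

interface ParagraphFlags {
  // True when a paragraph container is already open on the stack.
  stackHasParagraph: boolean;
  // True when the importer expects paragraph markup in the source.
  hasParagraphsInSourceText: boolean;
  // The importer option suggested in this comment.
  autoGenMissingParagraphs?: boolean;
}

// Mirrors the two `if` statements in the quoted snippet. The second
// condition is the logical negation of the first, so it is the fallthrough.
function decideParagraphAction(flags: ParagraphFlags): ParagraphAction {
  if (flags.stackHasParagraph) {
    return "none"; // text is already inside a paragraph
  }
  if (flags.hasParagraphsInSourceText && !flags.autoGenMissingParagraphs) {
    return "throw"; // "text outside of paragraph" error
  }
  return "startNewParagraph"; // paragraph is generated automatically
}
```

Under these assumptions, the error can only occur when paragraphs are expected and the flag is unset. Setting `autoGenMissingParagraphs: true` always selects the paragraph-generating branch.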

Let us know how it goes!

@seraphx2
Author

lol. Now this error:

SwordImporter failed OsisParseError: unclean container stack while closing "translationChange" group. Found "paragraph" in Josh 15:1 KJV

container stack:
  root
    translationChange

    at validateGroup (D:\git\bible-importer\node_modules\@bible-engine\importers\lib\bible\osis\functions\validators.functions.js:8:15)

@chriswep
Collaborator

chriswep commented Dec 4, 2023

For whoever might come across this in the future: a lot of Bible source files out there (especially OSIS, which Sword is based on) are of very poor quality and contain lots of structural errors. Bible renderers often work around those issues; however, our importer needs to be stricter in order to translate the source file into a well-defined format.
I had to manually correct most of the OSIS files I came across in the wild. That's why the error message is very specific about the type of error and its location in the source file. However, editing the source doesn't work with the Sword format. So you could either use a Sword-to-OSIS converter and import the OSIS directly or, in the case of bible.org, download the USFM format, which is usually well defined. I have successfully used this USFM-to-OSIS converter multiple times to import sources from bible.org without issues: https://github.com/adyeths/u2o (download USFM, convert to OSIS, use the OSIS importer to import into BibleEngine).
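When correcting an OSIS file by hand, it can help to pull the error type and verse reference out of the importer's log line so you know where to look in the source. The following is a minimal sketch that assumes the message shape seen in the logs in this thread (`OsisParseError: <message> in <book> <chapter>:<verse> <version>`). `parseOsisErrorLocation` and `OsisErrorLocation` are hypothetical helpers, not part of @bible-engine/importers, and other error messages may not follow this shape.

```typescript
// Hypothetical helper for illustration; not part of the importer package.
interface OsisErrorLocation {
  message: string; // the error description, e.g. 'text outside of paragraph: "In the "'
  book: string;    // book abbreviation as printed in the log, e.g. "Gen"
  chapter: number;
  verse: number;
}

// Extracts the description and verse reference from one log line.
// Returns null when the line does not match the assumed shape.
function parseOsisErrorLocation(line: string): OsisErrorLocation | null {
  // The greedy (.+) backtracks to the last " in <book> <chapter>:<verse>".
  const match = /OsisParseError: (.+) in (\w+) (\d+):(\d+)/.exec(line);
  if (match === null) {
    return null;
  }
  const [, message = "", book = "", chapter = "", verse = ""] = match;
  return { message, book, chapter: Number(chapter), verse: Number(verse) };
}
```

For the second error reported above, this yields book "Josh", chapter 15, verse 1, which is the place to inspect in the converted OSIS file.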
