Add SG pages for new core fns; many other noticeList fixes #19

Merged
merged 1 commit into from
Sep 8, 2020
19 changes: 10 additions & 9 deletions README.md
@@ -12,6 +12,8 @@ The basic functions return an object containing two lists:
1. successList: a list of strings giving an overview of what checks have been made,
1. noticeList: a list of objects with fields that can be filtered, sorted, combined, and then displayed as error or warning messages.

Note that the object may also contain other relevant fields such as `checkedFileCount`, `checkedFilenames`, `checkedFilenameExtensions`, `checkedFilesizes`, `checkedRepoNames`, and `elapsedSeconds`.

There are three sample notice processing functions that show how to:

1. Divide the noticeList into a list of errors and a list of warnings,
@@ -34,7 +36,7 @@ In addition, there are Styleguidist pages viewable at https://unfoldingword.gith

This code is designed to thoroughly check various types of Bible-related content data files. This includes:

1. Unified Standard Format Marker (USFM) Bible content files, including original language Bibles and Bible translations aligned by word/phrase to the original words/phrases
1. [Unified Standard Format Marker](ubsicap.github.io/usfm/) (USFM) Bible content files, including original language Bibles and Bible translations aligned by word/phrase to the original words/phrases
1. Translation Notes (TN) tables in Tab-Separated Values (TSV) files
1. Markdown files (and markdown fields in TSV files)
1. Plain-text files
@@ -57,21 +59,20 @@ There are two compulsory fields in all of these notice objects:
All of the following fields may be missing or undefined, i.e., they're all optional:

1. `bookID`: The 3-character UPPERCASE [book identifier](http://ubsicap.github.io/usfm/identification/books.html) or [OBS](https://www.openbiblestories.org/) (if relevant)
1. `C`: The chapter number or story number (if relevant)
1. `V`: The verse number or frame number (if relevant)
1. `filename`: filename string (if available)
1. `C`: The chapter number or OBS story number (if relevant)
1. `V`: The verse number or OBS frame number (if relevant)
1. `repoName`: repository name (if available)
1. `filename`: filename string (if available)
1. `lineNumber`: A one-based line number in the file (if available)
1. `characterIndex`: A zero-based integer character index which indicates the position of the error in the given text (line or field) (if available)
1. `extract`: An extract (if available) of the checked text which indicates the area containing the problem. Where helpful, some character substitutions have already been made, for example, if the notice is about spaces, it is generally helpful to display spaces as a visible character in an attempt to best highlight the issue to the user. (The length of the extract defaults to ten characters, but is settable as an option.)
1. `location`: A string indicating the context of the notice, e.g., "in line 17 of 'someBook.usfm'". (Still not completely sure what should be in this string now that we have added optional `filename`, `repoName`, `lineNumber` fields.)

1. `location`: A string indicating the context of the notice, e.g., "in line 17 of 'someBook.usfm'". (Still not completely sure what should be left in this string now that we have added optional `filename`, `repoName`, `lineNumber` fields.)

Keeping our notices in this format, rather than the simplicity of just saving an array of single strings, allows the above *notice components* to be processed at a higher level, e.g., to allow user-controlled filtering, sorting, etc. The default is to funnel them all through the supplied `processNoticesToErrorsWarnings` function (in core/notice-processing-functions.js) which does the following:

1. Removes excess repeated errors. For example, if there's a systematic error in a file, say with unneeded leading spaces in every field, rather than returning with hundreds of errors, only the first several errors will be returned, followed by an "errors suppressed" message. (The number of each error displayed is settable as an option -- zero means display all errors with no suppression.)
1. Separates notices into error and warning lists based on the priority number. (The switch-over point is settable as an option.)
1. Optionally drops the lowest priority notices.
1. Optionally drops the lowest priority notices and/or certain given notice types (by priority number).

There is a second version of the function which splits into `Severe`, `Medium`, and `Low` priority lists instead. And a third version that leaves them as notices, but allows for a Bright red...Dull red colour gradient instead.
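
As a rough illustration of the three steps listed above, the following sketch shows how a caller might process a `noticeList`. It is not the code of `processNoticesToErrorsWarnings`: the function name `sketchProcessNotices` and the option names `maxRepeats`, `errorThreshold`, and `cutoff` are hypothetical, chosen only for this example, and the real option names and default values may differ.

```javascript
// Hypothetical sketch only -- not the library's implementation.
// Assumes each notice has at least the two compulsory fields: priority and message.
function sketchProcessNotices(noticeList, options = {}) {
  // All three option names and their defaults are assumptions for this example
  const { maxRepeats = 3, errorThreshold = 700, cutoff = 0 } = options;
  const counts = new Map(); // how often each (priority, message) pair has been seen
  const errors = [];
  const warnings = [];

  for (const notice of noticeList) {
    // Step 3: optionally drop the lowest priority notices
    if (notice.priority < cutoff) continue;

    // Step 2: choose the list based on the priority number
    const target = notice.priority >= errorThreshold ? errors : warnings;

    // Step 1: suppress excess repeats (maxRepeats of zero means no suppression)
    const key = `${notice.priority}:${notice.message}`;
    const seen = (counts.get(key) || 0) + 1;
    counts.set(key, seen);
    if (maxRepeats > 0 && seen > maxRepeats) {
      if (seen === maxRepeats + 1) // add a single marker the first time we suppress
        target.push({ priority: notice.priority, message: `${notice.message} (further similar notices suppressed)` });
      continue;
    }
    target.push(notice);
  }
  return { errors, warnings };
}
```

A caller would pass the `noticeList` from any check result, then render the two returned lists however it likes.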

@@ -80,7 +81,6 @@ However, the user is, of course, free to create their own alternative version of
Still unfinished (in rough priority order):

1. Finish adding lineNumber, fileName, repoName as separate optional notice fields
1. Consider fetching TA and TW as zip files when checking links to those resources
1. Standardise parameters according to best practice (i.e., dereferencing, etc.)
1. Document the API with (JsDoc)
1. Checking of general markdown and naked links (esp. in plain text and markdown files)
@@ -97,8 +97,9 @@ Still unfinished (in rough priority order):

Known bugs:

1. At the moment, the relevant `repoName`, `filename`, and `lineNumber` information is not yet all properly added to the notice objects -- also the `location` field may still contain overlapping information
1. The line number in the USFM Grammar check doesn't account for blank lines, so the real line number may be larger. (This is a bug in the BCS library.)
1. Work on removing false alarms is not yet completed
1. Work on removing false alarms for end-users is not yet completed
1. Work on checking links (esp. naked links) is not yet completed.

## Functionality and Limitations
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "uw-content-validation",
"description": "Functions for Checking Door43.org Scriptural Content/Resources.",
"version": "0.8.7",
"version": "0.8.8",
"private": false,
"homepage": "https://unfoldingword.github.io/uw-content-validation/",
"repository": {
7 changes: 3 additions & 4 deletions src/core/BCS-usfm-grammar-check.js
@@ -99,12 +99,11 @@ export function checkUSFMGrammar(bookID, strictnessString, filename, givenText,

Returns a result object containing a successList and a noticeList
*/
console.log(`checkUSFMGrammar(${givenText.length.toLocaleString()} chars, '${location}')…`);
console.log(`checkUSFMGrammar(${givenText.length.toLocaleString()} chars, '${givenLocation}')…`);
console.assert(strictnessString === 'strict' || strictnessString === 'relaxed', `Unexpected strictnessString='${strictnessString}'`);

let ourLocation = givenLocation;
if (ourLocation && ourLocation[0] !== ' ') ourLocation = ` ${ourLocation}`;
if (filename) ourLocation = ` in ${filename}${ourLocation}`;


const cugResult = { successList: [], noticeList: [] };
@@ -146,7 +145,7 @@ export function checkUSFMGrammar(bookID, strictnessString, filename, givenText,

if (!grammarCheckResult.isValidUSFM)
addNotice6to7({priority:944, message:`USFM3 Grammar Check (${strictnessString} mode) doesn't pass`,
location:ourLocation});
filename, location:ourLocation});

// We only get one error if it fails
if (grammarCheckResult.error && grammarCheckResult.priority)
@@ -155,7 +154,7 @@ export function checkUSFMGrammar(bookID, strictnessString, filename, givenText,
// Display these warnings but with a lowish priority
for (const warningString of grammarCheckResult.warnings)
addNotice6to7({priority:101, message:`USFMGrammar: ${warningString}`,
location:ourLocation});
filename, location:ourLocation});

addSuccessMessage(`Checked USFM Grammar (${strictnessString} mode) ${grammarCheckResult.isValidUSFM ? "without errors" : " (but the USFM DIDN'T validate)"}`);
// console.log(` checkUSFMGrammar returning with ${result.successList.length.toLocaleString()} success(es) and ${result.noticeList.length.toLocaleString()} notice(s).`);
2 changes: 1 addition & 1 deletion src/core/annotation-row-check.md
@@ -2,7 +2,7 @@

This function checks one tab-separated line for typical formatting errors.

It returns a list of success messages and a list of notice components. (There is always a priority number in the range 0..999 and the main message string, as well as other helpful details as relevant.)
It returns a list of success messages and a list of notice components. (There is always a priority number in the range 0..999 and the main message string, as well as other details to help locate the error as available.)

These raw notice components can then be filtered and/or sorted as required by the calling program, and then divided into a list of errors and a list of warnings or whatever as desired.

11 changes: 5 additions & 6 deletions src/core/annotation-table-check.js
@@ -84,7 +84,6 @@ async function CheckAnnotationRows(annotationType, bookID, tableText, givenLocat
let numVersesThisChapter = 0;
for (let n= 0; n < lines.length; n++) {
// console.log(`CheckAnnotationRows checking line ${n}: ${JSON.stringify(lines[n])}`);
let inString = ` in line ${(n + 1).toLocaleString()}${ourLocation}`;
if (n === 0) {
if (lines[0] === EXPECTED_TN_HEADING_LINE)
addSuccessMessage(`Checked TSV header ${ourLocation}`);
@@ -97,19 +96,19 @@ async function CheckAnnotationRows(annotationType, bookID, tableText, givenLocat
if (fields.length === NUM_EXPECTED_TN_FIELDS) {
const [reference, fieldID, tags, _support_reference, _quote, _occurrence, _annotation] = fields;
const [C, V] = reference.split(':')
const withString = ` with ID '${fieldID}'${inString}`;
const withString = ` with ID '${fieldID}'${ourLocation}`;
// let CV_withString = ` ${C}:${V}${withString}`;
// let atString = ` at ${Annotation} ${C}:${V} (${fieldID})${inString}`;

// Use the row check to do most basic checks
const firstResult = await checkAnnotationTSVDataRow(annotationType, lines[n], bookID,C,V, withString, optionalCheckingOptions);
// Choose only ONE of the following
// This is the fast way of append the results from this field
result.noticeList = result.noticeList.concat(firstResult.noticeList);
// result.noticeList = result.noticeList.concat(firstResult.noticeList);
// If we need to put everything through addNoticeCV8, e.g., for debugging or filtering
// process results line by line
// for (const noticeEntry of firstResult.noticeList)
// addNoticeCV8({priority:noticeEntry.priority, noticeEntry[1], noticeEntry[2], noticeEntry[3], noticeEntry[4], noticeEntry[5], noticeEntry[6], noticeEntry[7]);
for (const noticeEntry of firstResult.noticeList)
addNoticeCV8({ ...noticeEntry, lineNumber:n+1});

// So here we only have to check against the previous and next fields for out-of-order problems
if (C) {
@@ -176,7 +175,7 @@ async function CheckAnnotationRows(annotationType, bookID, tableText, givenLocat
// console.log(` Line ${n}: Has ${fields.length} field(s) instead of ${NUM_EXPECTED_TN_FIELDS}: ${EXPECTED_TN_HEADING_LINE.replace(/\t/g, ', ')}`);
// else
if (n !== lines.length - 1) // it's not the last line
addNoticeCV8({priority:988, message:`Wrong number of tabbed fields (expected ${NUM_EXPECTED_TN_FIELDS})`, extract:`Found ${fields.length} field${fields.length===1?'':'s'}`, lineNumber:n+1, location:inString});
addNoticeCV8({priority:988, message:`Wrong number of tabbed fields (expected ${NUM_EXPECTED_TN_FIELDS})`, extract:`Found ${fields.length} field${fields.length===1?'':'s'}`, lineNumber:n+1, location:ourLocation});
}
}
addSuccessMessage(`Checked all ${(lines.length - 1).toLocaleString()} data line${lines.length - 1 === 1 ? '' : 's'}${ourLocation}.`);
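
The loop above replaces the bulk `concat` with a per-notice call that copies each row-level notice and adds a one-based `lineNumber`. A minimal standalone sketch of that object-spread pattern follows. The helper `addLineNumbers` and the sample notice are invented for illustration; the actual code passes each copied notice to `addNoticeCV8` rather than returning a new array.

```javascript
// Hypothetical helper illustrating the object-spread pattern used in the loop.
// zeroBasedIndex corresponds to the loop variable n in CheckAnnotationRows.
function addLineNumbers(rowNoticeList, zeroBasedIndex) {
  // Spread copies every existing field; lineNumber is added (or overridden) last
  return rowNoticeList.map(noticeEntry => ({ ...noticeEntry, lineNumber: zeroBasedIndex + 1 }));
}

// Sample data invented for this example
const rowNotices = [{ priority: 988, message: 'Sample message', location: ' in sample' }];
const withLines = addLineNumbers(rowNotices, 4);
console.log(withLines[0].lineNumber); // 5
```

Because the spread creates a new object, the notice returned by the row check is left unmodified.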
2 changes: 1 addition & 1 deletion src/core/annotation-table-check.md
@@ -2,7 +2,7 @@

This function checks the given block of annotation (TSV) table lines for typical formatting errors.

It returns a list of success messages and a list of notice components. (There is always a priority number in the range 0..999 and the main message string, as well as other helpful details as relevant.)
It returns a list of success messages and a list of notice components. (There is always a priority number in the range 0..999 and the main message string, as well as other details to help locate the error as available.)

These raw notice components can then be filtered and/or sorted as required by the calling program, and then divided into a list of errors and a list of warnings or whatever as desired.

1 change: 1 addition & 0 deletions src/core/book-package-check.js
@@ -278,6 +278,7 @@ export async function checkFile(filename, fileContent, givenLocation, checkingOp
};
// end of checkFile()


/*
checkTQbook
*/
61 changes: 61 additions & 0 deletions src/core/book-package-check.md
@@ -0,0 +1,61 @@
## Door43 Book Package Check Sandbox

This function checks the Door43 Book Package (i.e., a Bible book or Open Bible Stories) for the specified language by loading and checking files from several interconnected Door43 Content Service (DCS) repositories.

It returns a list of success messages and a list of notice components. (There is always a priority number in the range 0..999 and the main message string, as well as other details to help locate the error as available.)

These raw notice components can then be filtered and/or sorted as required by the calling program, and then divided into a list of errors and a list of warnings or whatever as desired.

The code below requests some info and then checks the single specified Bible book in several repos. This is convenient to see all these check results collected into one place.

See a list of valid book identifiers [here](http://ubsicap.github.io/usfm/identification/books.html), although only `GEN` to `REV` from that list are useful here.

Note that `OBS` can also be entered here as a *pseudo book identifier* in order to check an **Open Bible Stories** repo.

`Book Package Check` calls `checkBookPackage()` which then calls `checkFile()` for the book file in each repo (or calls `checkRepo()` for **OBS**).

**Warning**: Some book packages contain many files and/or very large files, and downloading them all and then checking them might slow down your browser -- maybe even causing pop-up messages asking to confirm that you want to keep waiting.

**Note**: This demonstration uses cached values of files stored inside the local browser. This makes reruns of the checks much faster, but it won't notice if you have updated the files on Door43. If you want to clear the local caches, use the `Clear Cache` function.

```js
import React, { useState, useEffect } from 'react';
import { checkBookPackage } from './book-package-check';
import { RenderRawResults } from '../demos/RenderProcessedResults';

// You can put your own data into the following fields:
const data = {
  username: 'unfoldingWord',
  language_code : 'en',
  bookID : 'RUT',
  givenLocation : 'that was supplied',
  checkingOptions: {},
}

function CheckBookPackage(props) {
  const { username, language_code, bookID, givenLocation, checkingOptions } = props.data;

  const [results, setResults] = useState(null);

  // We need the following construction because checkBookPackage is an ASYNC function
  useEffect(() => {
    // Use an IIFE (Immediately Invoked Function Expression)
    //  e.g., see https://medium.com/javascript-in-plain-english/https-medium-com-javascript-in-plain-english-stop-feeling-iffy-about-using-an-iife-7b0292aba174
    (async () => {
      // Display our "waiting" message
      setResults(<p style={{ color: 'magenta' }}>Waiting for <b>{username}</b> {language_code} <b>{bookID}</b> check results…</p>);
      const rawResults = await checkBookPackage(username, language_code, bookID, setResults, checkingOptions);
      setResults(
        <div>
          <b>Checked</b> Door43 {username} {language_code} {bookID}<br/><br/>
          <RenderRawResults results={rawResults} />
        </div>
      );
    })(); // end of async part in unnamedFunction
  }, []); // end of useEffect part

  return results;
} // end of CheckBookPackage function

<CheckBookPackage data={data}/>
```
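
The page text says the raw notice components can be filtered and/or sorted by the calling program. As a sketch of what that might look like outside the demo, the function below keeps only notices at or above a given priority and returns the highest-priority ones first. It assumes only that the result object has a `noticeList` of objects with `priority` and `message` fields; the name `topNotices` and the sample data are hypothetical.

```javascript
// Hypothetical post-processing helper; not part of the library.
function topNotices(rawResults, minimumPriority, limit) {
  return rawResults.noticeList
    .filter(notice => notice.priority >= minimumPriority) // filter() returns a new array
    .sort((a, b) => b.priority - a.priority)              // so sorting leaves noticeList untouched
    .slice(0, limit);
}

// Sample result object invented for this example
const sampleResults = {
  successList: ['Checked sample'],
  noticeList: [
    { priority: 101, message: 'Low-priority sample' },
    { priority: 944, message: 'High-priority sample' },
    { priority: 20, message: 'Very low-priority sample' },
  ],
};
const selected = topNotices(sampleResults, 100, 5);
console.log(selected.map(notice => notice.priority)); // priorities in descending order: 944, 101
```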