Commit

Merge branch 'main' of github.com:pkiraly/qa-catalogue
pkiraly committed Jun 13, 2024
2 parents 08a8413 + 4bb0361 commit c3e17f0
Showing 43 changed files with 271 additions and 143 deletions.
16 changes: 3 additions & 13 deletions .github/workflows/avram.yml
@@ -2,9 +2,7 @@ name: Validate Avram Schemas

on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main, develop ]

jobs:
build:
@@ -14,15 +12,7 @@ jobs:
- uses: actions/setup-node@v4
with:
node-version: 20
- run: npm install -g avram@0.6.6 ajv ajv-formats # checking Avram 0.9.6
- run: npm install
- name: Validate Avram Schema files
run: |
avram -s marc-schema/marc-schema.json
avram -s src/main/resources/marc-schema.json
avram -s src/main/resources/marc/authority-schema.avram.json
avram -s src/main/resources/pica-schema.json
avram -s src/main/resources/pica/avram-k10plus-title.json
avram -s src/main/resources/unimarc/avram-unimarc.json
avram -s src/test/resources/pica/schema/pica-schema-extra.json
avram -s src/test/resources/pica/schema/pica-schema.json
avram -s src/test/resources/unimarc/avram-unimarc.json
run: ./avram-schemas/validate-schemas

3 changes: 3 additions & 0 deletions .gitignore
@@ -56,3 +56,6 @@ setdir.sh

# used as Docker volume
web-config

# Node packages
node_modules
17 changes: 6 additions & 11 deletions Dockerfile
@@ -59,12 +59,10 @@ RUN cd /opt \
&& unzip qa-catalogue-${QA_CATALOGUE_VERSION}-release.zip \
&& rm qa-catalogue-${QA_CATALOGUE_VERSION}-release.zip \
&& mv qa-catalogue-${QA_CATALOGUE_VERSION} qa-catalogue \
&& mv /opt/qa-catalogue/setdir.sh.template /opt/qa-catalogue/setdir.sh \
&& mkdir -p /opt/qa-catalogue/marc \
&& sed -i.bak 's,BASE_INPUT_DIR=./input,BASE_INPUT_DIR=/opt/qa-catalogue/marc,' /opt/qa-catalogue/setdir.sh \
&& sed -i.bak 's,BASE_OUTPUT_DIR=./output,BASE_OUTPUT_DIR=/opt/qa-catalogue/marc/_output,' /opt/qa-catalogue/setdir.sh \
# install web application
&& apt-get update \
&& mkdir -p /opt/qa-catalogue/input /opt/qa-catalogue/output

# install web application
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
locales \
apache2 \
@@ -77,9 +75,7 @@ RUN cd /opt \
unzip \
composer \
gettext \
&& locale-gen en_GB.UTF-8 \
&& locale-gen de_DE.UTF-8 \
&& locale-gen pt_BR.UTF-8 \
&& locale-gen en_GB.UTF-8 && locale-gen de_DE.UTF-8 && locale-gen pt_BR.UTF-8 \
&& apt-get --assume-yes autoremove \
&& rm -rf /var/lib/apt/lists/* \
&& cd /var/www/html/ \
@@ -94,12 +90,11 @@ RUN cd /opt \
&& ls -la \
&& unzip -q master.zip \
&& rm master.zip \
# && mv qa-catalogue-web-0.4 qa-catalogue \
&& mv qa-catalogue-web-${QA_CATALOGUE_WEB_VERSION} qa-catalogue \
&& cd qa-catalogue \
&& composer install \
&& mkdir config \
&& echo dir=/opt/qa-catalogue/marc/_output > /var/www/html/qa-catalogue/configuration.cnf \
&& echo dir=/opt/qa-catalogue/output > /var/www/html/qa-catalogue/configuration.cnf \
&& echo include=config/configuration.cnf >> /var/www/html/qa-catalogue/configuration.cnf \
# && cp /var/www/html/qa-catalogue/configuration.js.template /var/www/html/qa-catalogue/configuration.js \
&& touch /var/www/html/qa-catalogue/selected-facets.js \
76 changes: 42 additions & 34 deletions README.md
@@ -103,26 +103,32 @@ default. Files of each catalogue are in a subdirectory of theses base directorie

### With Docker

An Docker image bundling qa-catalogue with all of its dependencies and the web
interface [qa-catalogue-web] is made available in Docker Hub. To use
qa-catalogue via Docker first run the image in a new container (download may
take some time):
*More detailed instructions on how to use qa-catalogue with Docker can be found [in the wiki](https://github.com/pkiraly/qa-catalogue/wiki/Docker).*

```bash
docker compose up -d
```
A Docker image bundling qa-catalogue with all of its dependencies and the web
interface [qa-catalogue-web] is made available:

- continuously via GitHub as [`ghcr.io/pkiraly/qa-catalogue`](https://github.com/pkiraly/qa-catalogue/pkgs/container/qa-catalogue)

You can configure the container *before* running this command with the
following environment variables:
- and for releases via Docker Hub as [`pkiraly/metadata-qa-marc`](https://hub.docker.com/r/pkiraly/metadata-qa-marc)

- `INPUT`: Base directory to put your bibliographic record files in subdirectory
`qa-catalogue`. Set to `./input` by default, so record files are expected to
be in `input/qa-catalogue`.
To download, configure and start an image in a new container, the file
[docker-compose.yml](docker-compose.yml) must be in the current directory. It
can be configured with the following environment variables:

- `IMAGE`: which Docker image to download and run. By default the most recent
image from Docker Hub is used. For instance if you have locally
[build the Docker image](#appendix-vi-build-docker-image), then set
`IMAGE=metadata-qa-marc`.
- `IMAGE`: which Docker image to download and run. By default the latest
image from Docker Hub is used (`pkiraly/metadata-qa-marc`). Alternatives include:

- `IMAGE=ghcr.io/pkiraly/qa-catalogue:main` for the most recent image from GitHub packages
- `IMAGE=metadata-qa-marc` if you have [built the Docker image](#appendix-vi-build-docker-image) locally

- `CONTAINER`: the name of the docker container. Default: `metadata-qa-marc`.

- `INPUT`: Base directory to put your bibliographic record files in subdirectories of.
Set to `./input` by default, so record files are expected to be in `input/$NAME`.

- `OUTPUT`: Base directory to put the results of qa-catalogue in subdirectories of.
Set to `./output` by default, so files are put in `output/$NAME`.

- `WEBCONFIG`: directory to expose configuration of [qa-catalogue-web]. Set to
`./web-config` by default. If using non-default configuration for data analysis
@@ -135,18 +141,17 @@ following environment variables:

- `SOLRPORT`: port to expose Solr to. Default: `8983`.

- `CONTAINER`: the name of the docker container. Default: `metadata-qa-marc`.

Environment variables can be set on the command line or put in a local file `.env`, e.g.:

```bash
WEBPORT=9000 docker compose up -d
```
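The `WEBPORT=9000` example sets a variable on the command line. The same settings can instead be kept in a `.env` file in the current directory, alongside `docker-compose.yml`, which Docker Compose reads when interpolating variables. The values below are only illustrations drawn from the options listed above, not recommended defaults.

```bash
# .env in the same directory as docker-compose.yml (illustrative values)
IMAGE=ghcr.io/pkiraly/qa-catalogue:main
INPUT=./input
OUTPUT=./output
WEBPORT=9000
```

With this file in place, a plain `docker compose up -d` picks up the values without prefixing the command.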

When the application has been started this way, run analyses with script
`./docker/qa-catalogue` the same ways as script `./qa-catalogue` is called when
not using Docker (see [usage](#usage) for details). The following example uses
parameters for Gent university library catalogue:
[`./docker/qa-catalogue`](docker/qa-catalogue) the same way as script
`./qa-catalogue` is called when not using Docker (see [usage](#usage) for
details). The following example uses parameters for the Gent university library
catalogue:

```bash
./docker/qa-catalogue \
@@ -159,8 +164,8 @@ parameters for Gent university library catalogue:
[qa-catalogue-web]: https://github.com/pkiraly/qa-catalogue-web

Now you can reach the web interface ([qa-catalogue-web]) at
<http://localhost:80/metadata-qa> (or at another port as configured with
environment variables). To further modify appearance of the interface,
<http://localhost:80/> (or at another port as configured with
environment variable `WEBPORT`). To further modify appearance of the interface,
create [templates](https://github.com/pkiraly/qa-catalogue-web/?tab=readme-ov-file#customization)
in your `WEBCONFIG` directory and/or create a file `configuration.cnf` in
this directory to extend [UI configuration](https://github.com/pkiraly/qa-catalogue-web/?tab=readme-ov-file#configuration) without having to restart the Docker container.
@@ -337,15 +342,15 @@ library specific configuration file:
| | `-c`/`--catalogue` | display name of the catalogue | `$NAME` |
| `NAME` | `-n`/`--name` | name of the catalogue | qa-catalogue |
| `BASE_INPUT_DIR` | `-d`/`--input` | parent directory of input file directories | `./input` |
| | `-d`/`--input-dir` | subdirectory of input directory to read files from | |
| `INPUT_DIR` | `-d`/`--input-dir` | subdirectory of input directory to read files from | |
| `BASE_OUTPUT_DIR` | `-o`/`--output` | parent output directory | `./output` |
| `MASK` | `-m`/`--mask` | a file mask which input files to process, e.g. `*.mrc` | `*` |
| `TYPE_PARAMS` | `-p`/`--params` | parameters to pass to individual tasks (see below) | |
| `SCHEMA` | `-s`/`--schema` | record schema | `MARC21` |
| `UPDATE` | `-u`/`--update` | optional date of input files | |
| `VERSION` | `-v`/`--version` | optional version number/date of the catalogue to compare changes | |
| `WEB_CONFIG` | `-w`/`--web-config` | update the specified configuration file of qa-catalogue-web | |
| | `-f`/`--config-file`| a configuration file with catalogue specific environment variables | |
| | `-f`/`--env-file`| configuration file to load environment variables from (default: `.env`) | |

## Detailed instructions

Expand Down Expand Up @@ -2035,14 +2040,13 @@ The MARC JSON file is a JSON serialization of binary MARC file. See more the
Some background info: [MARC21 structure in JSON](http://pkiraly.github.io/2018/01/28/marc21-in-json/).

Usage:

```bash
java -cp $JAR de.gwdg.metadataqa.marc.cli.utils.MappingToJson [options] > marc-schema
```
with script:
```bash
catalogues/[catalogue].sh export-schema-files
java -cp $JAR de.gwdg.metadataqa.marc.cli.utils.MappingToJson [options] > avram-schema.json
```

or

```bash
./qa-catalogue --params="[options]" export-schema-files
```
@@ -2123,9 +2127,13 @@ An example output:
```

The script version generates 3 files, with different details:
* `marc-schema/marc-schema.json`
* `marc-schema/marc-schema-with-solr.json`
* `marc-schema/marc-schema-with-solr-and-extensions.json`
* `avram-schemas/marc-schema.json`
* `avram-schemas/marc-schema-with-solr.json`
* `avram-schemas/marc-schema-with-solr-and-extensions.json`

To validate these files, install the Avram reference implementation for Node with `npm ci` and run:

./avram-schemas/validate-schemas

### to HTML

@@ -2471,7 +2479,7 @@ docker compose -f docker/build.yml build app
The `docker compose build` command has multiple `--build-arg` arguments to override defaults:

- `QA_CATALOGUE_VERSION`: the QA catalogue version (default: `0.7.0`, current development version is `0.8.0-SNAPSHOT`)
- `QA_CATALOGUE_WEB_VERSION`: it might be a released version such as `0.7.0` (current default), or `main` to use the
- `QA_CATALOGUE_WEB_VERSION`: it might be a released version such as `0.7.0`, or `main` (default) to use the
main branch, or `develop` to use the develop branch.
- `SOLR_VERSION`: the Apache Solr version you would like to use (default: `8.11.1`)
- `SOLR_INSTALL_SOURCE`: if its value is `remote` docker will download it from http://archive.apache.org/.
11 changes: 11 additions & 0 deletions avram-schemas/README.md
@@ -0,0 +1,11 @@
To generate these files, run

```bash
./qa-catalogue export-schema-files
```

To validate the files, run

```bash
./avram-schemas/validate-schemas
```
@@ -13661,7 +13661,7 @@
"b": {
"label": "Source of stock number/acquisition",
"repeatable": false,
"solr": "037b_AcquisitionSourcelabel"
"solr": "037b_AcquisitionSource_label"
},
"c": {
"label": "Terms of availability",
@@ -14854,7 +14854,7 @@
"d": {
"label": "Populated place name",
"repeatable": true,
"solr": "052d_GeographicClassificationlabel"
"solr": "052d_GeographicClassification_label"
},
"0": {
"label": "Authority record control number or standard number",
@@ -13056,7 +13056,7 @@
"b": {
"label": "Source of stock number/acquisition",
"repeatable": false,
"solr": "037b_AcquisitionSourcelabel"
"solr": "037b_AcquisitionSource_label"
},
"c": {
"label": "Terms of availability",
@@ -13902,7 +13902,7 @@
"d": {
"label": "Populated place name",
"repeatable": true,
"solr": "052d_GeographicClassificationlabel"
"solr": "052d_GeographicClassification_label"
},
"0": {
"label": "Authority record control number or standard number",
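The two schema hunks rename `solr` values such as `037b_AcquisitionSourcelabel` to `037b_AcquisitionSource_label`. To look for other values with the same missing separator, a grep along these lines could help. `find_unseparated_labels` is a hypothetical helper, not part of qa-catalogue; it assumes the one-space `"solr": "..."` layout visible in the diff and will also flag any legitimate name that simply ends in `label` without an underscore.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print file:line:text for every "solr" value that ends
# in "label" without a preceding "_" (e.g. 037b_AcquisitionSourcelabel).
# Assumes the '"solr": "..."' layout with one space after the colon.
# grep exits with status 1 when nothing matches.
find_unseparated_labels() {
  grep -HnE '"solr": "[^"]*[^_"]label"' "$@"
}
```

For example, `find_unseparated_labels avram-schemas/marc-schema.json src/main/resources/marc-schema.json` would list any remaining candidates in those two files.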
File renamed without changes.
15 changes: 15 additions & 0 deletions avram-schemas/validate-schemas
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

validate() {
npm run --silent avram -- -s $@
}

validate avram-schemas/marc-schema.json
validate src/main/resources/marc-schema.json
validate src/main/resources/marc/authority-schema.avram.json
validate src/main/resources/pica-schema.json
validate src/main/resources/pica/avram-k10plus-title.json
validate src/main/resources/unimarc/avram-unimarc.json
validate src/test/resources/pica/schema/pica-schema-extra.json
validate src/test/resources/pica/schema/pica-schema.json
validate src/test/resources/unimarc/avram-unimarc.json
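As written, `validate-schemas` exits with the status of its last `validate` call only, so an invalid schema earlier in the list would not fail the CI step that runs it; it also expands `$@` unquoted. A variant that checks every file and fails if any of them is invalid could look like the sketch below. `validate_all` and the `AVRAM_CMD` override are hypothetical names introduced for illustration; the default command is copied from the script.

```bash
#!/usr/bin/env bash
# Sketch: validate each schema file, report every failure on stderr, and
# return non-zero if at least one file is invalid.
# AVRAM_CMD is a hypothetical override. It is expanded unquoted on purpose so
# that the default "npm run --silent avram -- -s" splits into separate words.
validate_all() {
  local failed=0 file
  for file in "$@"; do
    if ! ${AVRAM_CMD:-npm run --silent avram -- -s} "$file"; then
      echo "invalid schema: $file" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The body of `validate-schemas` would then be a single `validate_all` call listing the same nine paths, so the script's exit status reflects every schema rather than only the last one.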
1 change: 0 additions & 1 deletion catalogues/K10plus.sh
@@ -5,7 +5,6 @@
. ./setdir.sh

NAME=K10plus
MARC_DIR=${BASE_INPUT_DIR}/K10plus
TYPE_PARAMS="--marcxml --emptyLargeCollectors --fixAlma"
MASK=od-full_bsz-tit_0??.xml.gz

1 change: 0 additions & 1 deletion catalogues/bnpl.sh
@@ -7,7 +7,6 @@
NAME=bnpl
# TYPE_PARAMS="--marcVersion GENT"
TYPE_PARAMS=" --emptyLargeCollectors --indexWithTokenizedField"
MARC_DIR=${BASE_INPUT_DIR}/bnpl
MASK=bibs-all.marc.gz

. ./common-script
1 change: 0 additions & 1 deletion catalogues/bnpt.sh
@@ -12,7 +12,6 @@ TYPE_PARAMS="$TYPE_PARAMS --solrForScoresUrl http://localhost:8983/solr/bnpt_val
TYPE_PARAMS="$TYPE_PARAMS --indexWithTokenizedField"
TYPE_PARAMS="$TYPE_PARAMS --indexFieldCounts"

MARC_DIR=${BASE_INPUT_DIR}/bnpt
MASK=bibliographics_*.xml

. ./common-script
1 change: 0 additions & 1 deletion catalogues/bnr.sh
@@ -6,7 +6,6 @@
NAME=bnr
# TYPE_PARAMS="--marcVersion GENT"
TYPE_PARAMS=" --emptyLargeCollectors"
MARC_DIR=${BASE_INPUT_DIR}/bnr
MASK=bnr.*.mrc

. ./common-script
1 change: 0 additions & 1 deletion catalogues/clb.sh
@@ -5,7 +5,6 @@
. ./setdir.sh

NAME=clb
MARC_DIR=${BASE_INPUT_DIR}/clb
TYPE_PARAMS="--marcxml --emptyLargeCollectors --marcVersion NKCR"
MASK=ucloall.xml.gz

1 change: 0 additions & 1 deletion catalogues/columbia.sh
@@ -3,7 +3,6 @@
. ./setdir.sh

NAME=columbia
MARC_DIR=${BASE_INPUT_DIR}/columbia
MASK=*.mrc

. ./common-script
1 change: 0 additions & 1 deletion catalogues/ddb.sh
@@ -7,7 +7,6 @@
# TYPE_PARAMS='--ignorableFields A02,AQN,BGT,BUF,CFI,CNF,DGM,DRT,EST,EXP,FFP,FIN,LAS,LCS,LDO,LEO,LET,MIS,MNI,MPX,NEG,NID,OBJ,OHC,ONS,ONX,PLR,RSC,SRC,SSD,TOC,UNO,VIT,WII --ignorableRecords STA$a=SUPPRESSED'
TYPE_PARAMS='--emptyLargeCollectors --marcxml'
NAME=ddb
MARC_DIR=${BASE_INPUT_DIR}/ddb
MASK=all.xml.gz

. ./common-script
1 change: 0 additions & 1 deletion catalogues/firenze.sh
@@ -6,7 +6,6 @@

NAME=firenze
TYPE_PARAMS="--emptyLargeCollectors"
MARC_DIR=${BASE_INPUT_DIR}/firenze
MASK=firenze.*.mrc.gz

. ./common-script
6 changes: 0 additions & 6 deletions catalogues/kb.sh
@@ -9,9 +9,3 @@ TYPE_PARAMS="--marcxml --emptyLargeCollectors --indexWithTokenizedField"
MASK=kb-marc*.xml.gz

. ./common-script

if [[ "$1" != "help" ]]; then
echo "DONE"
fi

exit 0
1 change: 0 additions & 1 deletion catalogues/libris.sh
@@ -5,7 +5,6 @@
. ./setdir.sh

NAME=libris
MARC_DIR=${BASE_INPUT_DIR}/libris
TYPE_PARAMS="--emptyLargeCollectors --marcxml --indexWithTokenizedField"
MASK=sw-?.xml.gz

1 change: 0 additions & 1 deletion catalogues/mek.sh
@@ -5,7 +5,6 @@
. ./setdir.sh

NAME=mek
MARC_DIR=${BASE_INPUT_DIR}/mek
TYPE_PARAMS="--emptyLargeCollectors --defaultEncoding MARC8"
MASK=MEKmind.mrc

1 change: 0 additions & 1 deletion catalogues/michigan.sh
@@ -3,7 +3,6 @@
. ./setdir.sh

NAME=michigan
MARC_DIR=${BASE_INPUT_DIR}/michigan
MASK=*.marc

. ./common-script
1 change: 0 additions & 1 deletion catalogues/mokka.sh
@@ -7,7 +7,6 @@
NAME=mokka
TYPE_PARAMS="--marcxml"
# TYPE_PARAMS="--marcVersion SZTE"
MARC_DIR=${BASE_INPUT_DIR}/mokka
MASK=all.xml

. ./common-script