pp2dat

Process possibly dirty internal PICA+ import format to clean Normalized PICA+ format

This script reads PICA+ as serialized by CBS internal C function write_pp and writes normalized PICA+ (.dat).

CBS is a prorietary database management system by OCLC and PICA+ is its internal record format. More information about PICA+ can be found in the German Handbook Einführung in die Verarbeitung von PICA-Daten. This script is needed because the internal serialization of PICA+ is neither defined openly nor strictly validated. In contrast normalized PICA+ has a short definition (one record per line with line separator 0A, fields end with 1E, subfields indicator 1F) and it is supported by open source tools such as pica-rs and QA Catalogue.

The output of pp2dat is ensured to be syntactically valid normalized PICA+. The following is checked:

basic record structure with fields and subfields
field tags must match regular expression ^[012][0-9][0-9][A-Z@]$
fields must not be empty
subfield indicators must match regular expression ^[0-9a-zA-Z]$
field 003@ must have exactely one subfield 0 and its value (PPN) must be two to twelve characters with the right check digit.
maximum record length 1 MiB (restricted by implementation, could be increased if needed)

Invalid input lines are ignored and reported to STDERR with line number, and PPN if known.

The script also collects fields 101@ and prepends field 001@ with a list of ILNs unless the record already contains field 001@.

Usage

Run make to compile and test the C script. Then use ./pp2dat to as filter (read from STDIN, write to STDOUT), e.g.

cat pica.file | ./pp2dat >pica.dat 2>warnings.log

Development

There is a unit test in directory test, automatically run at build. Directly run with:

make test

LICENSE

This script and its source code is made available under MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pp2dat

Usage

Development

LICENSE

About

Languages

License

gbv/pp2dat

Folders and files

Latest commit

History

Repository files navigation

pp2dat

Usage

Development

LICENSE

About

Resources

License

Stars

Watchers

Forks

Languages