Application usage
Database preservation toolkit converts from a source database format to a destination database format. The format may be a database management system, a preservation format or even plain text.
To retrieve from a source, the application uses an import module.
To write to a destination, the application uses an export module.
It is the pair composed of an import module and an export module that provides the conversion functionality. There are different modules which can be used and even configured to provide a conversion between database formats.
The command line application takes a series of arguments, that can be provided in any order. These define the application's behavior.
General usage: java [properties] -jar dbptk-app-x.y.z.jar [plugin] <importModule> [import module options] <exportModule> [export module options]
The general usage command is a template and cannot be run as is. The following substitutions must be made:
- java is the java command; the full path may also be used
- [properties] may be omitted or replaced with special configurations that influence the conversion (see the description of properties further down)
- -jar dbptk-app-x.y.z.jar tells java to execute the dbptk-app-x.y.z.jar file (the file name must be adjusted to match the one you have)
- [plugin] is optional, and should be replaced with plugin configurations (if any)
- <importModule> should be replaced with the import module specification, e.g. -i mysql or --import=postgresql
- <exportModule> should be replaced with the export module specification, e.g. -e mysql or --export=postgresql
- [import module options] should be replaced with parameters to specify the behavior of the import module, e.g. --import-username=username --import-password="p4ssw0rd" (to specify source database username and password)
- [export module options] should be replaced with parameters to specify the behavior of the export module, e.g. --export-file=filename.siard --export-compress --export-pretty-xml (to specify the SIARD-2 export module behavior)
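As a sketch of how the placeholders can be filled in, the snippet below assembles a command that reads from a MySQL database and writes a SIARD-2 file. The function name, host, database name (mydatabase) and credentials are made-up values for illustration, and the jar file name still has to be adjusted. The function only prints the command (a dry run) so it can be reviewed before running it.

```shell
# Prints a MySQL -> SIARD-2 conversion command instead of executing it.
# All values below are hypothetical; adjust them to your environment.
dbptk_mysql_to_siard2() {
  echo java -jar dbptk-app-x.y.z.jar \
    --import=mysql \
    --import-hostname=localhost \
    --import-database=mydatabase \
    --import-username=username \
    --import-password="p4ssw0rd" \
    --export=siard-2 \
    --export-file=filename.siard \
    --export-compress \
    --export-pretty-xml
}

dbptk_mysql_to_siard2
```

Removing the echo in the function body would run the conversion instead of printing it.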
Parameters have two interchangeable formats: a long format for readability (e.g. --import-hostname=localhost) and a short format that is faster to type (e.g. -ih localhost). The two differ only in the length of the parameter name and in the number of dashes (two for the long format, one for the short format). In either format, the parameter name may be separated from its value by a space or by an equals sign.
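To illustrate the two formats side by side, the following sketch prints the same PostgreSQL to SIARD-2 conversion twice, once with long names and once with the short names listed in the module reference. The function names, database name (mydb) and credentials are hypothetical, and both functions only print the command.

```shell
# Long format: two dashes and a descriptive name per parameter.
long_form() {
  echo java -jar dbptk-app-x.y.z.jar \
    --import=postgresql --import-hostname=localhost --import-database=mydb \
    --import-username=username --import-password=p4ssw0rd \
    --export=siard-2 --export-file=filename.siard
}

# Short format: one dash and an abbreviated name per parameter.
short_form() {
  echo java -jar dbptk-app-x.y.z.jar \
    -i postgresql -ih localhost -idb mydb \
    -iu username -ip p4ssw0rd \
    -e siard-2 -ef filename.siard
}

long_form
short_form
```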
-p, --plugin=plugin.jar (optional) the file containing a plugin module. Several plugins can be specified, separated by a semi-colon (;)
Specify the import module with: -i <module>, --import=module
Import module: jdbc
-id, --import-driver=value (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ic, --import-connection=value (required) the connection url to use in the connection
Import module: microsoft-access
-if, --import-file=value (required) path to the Microsoft Access file
Import module: microsoft-sql-server
-is, --import-server-name=value (required) the name (host name) of the server
-idb, --import-database=value (required) the name of the database we'll be accessing
-iu, --import-username=value (required) the name of the user to use in the connection
-ip, --import-password=value (required) the password of the user to use in the connection
-il, --import-use-integrated-login (optional) use Windows login; by default the SQL Server login is used
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-iin, --import-instance-name=value (optional) the name of the instance
-ipn, --import-port-number=value (optional) the port number of the server instance, default is 1433
Import module: mysql
-ih, --import-hostname=value (required) the hostname of the MySQL server
-idb, --import-database=value (required) the name of the MySQL database
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ipn, --import-port-number=value (optional) the port on which the MySQL server is listening
Import module: oracle
-is, --import-server-name=value (required) the name (or IP address) of the Oracle server
-idb, --import-database=value (required) the name of the database to use in the connection
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ipn, --import-port-number=value (required) the port on which the Oracle server is listening
-ial, --import-accept-license (optional) declare that you accept OTN License Agreement, which is necessary to use this module
Import module: postgresql
-ih, --import-hostname=value (required) the name of the PostgreSQL server host (e.g. localhost)
-idb, --import-database=value (required) the name of the database to connect to
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-ipn, --import-port-number=value (optional) the port on which the PostgreSQL server is listening, default is 5432
Import module: siard-1
-if, --import-file=value (required) Path to SIARD1 archive file
Import module: siard-2
-if, --import-file=value (required) Path to SIARD2 archive file
Import module: siard-dk
-if, --import-folder=value (required) Path to (the first) SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.1 Any additional parts of the archive (eg. with suffixes .2 .3 etc) referenced in the tableIndex.xml will also be processed.
-ias, --import-as-schema=value (required) Name of the database schema to use when importing the SIARDDK archive. Suggested values: PostgreSQL:'public', MySQL:'<name of database>', MSSQL:'dbo'
Specify the export module with: -e <module>, --export=module
Export module: solr
-eh, --export-hostname=value (optional) Solr Cloud server hostname or address
-ep, --export-port=value (optional) Solr Cloud server port
-ezh, --export-zookeeper-hostname=value (optional) Zookeeper server hostname or address
-ezp, --export-zookeeper-port=value (optional) Zookeeper server port
Export module: jdbc
-ed, --export-driver=value (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ec, --export-connection=value (required) the connection url to use in the connection
Export module: list-tables
-ef, --export-file=value (required) Path to output file that can be read by SIARD2 export module
Export module: microsoft-sql-server
-es, --export-server-name=value (required) the name (host name) of the server
-edb, --export-database=value (required) the name of the database we'll be accessing
-eu, --export-username=value (required) the name of the user to use in the connection
-ep, --export-password=value (required) the password of the user to use in the connection
-el, --export-use-integrated-login (optional) use Windows login; by default the SQL Server login is used
-ede, --export-disable-encryption (optional) use to turn off encryption in the connection
-ein, --export-instance-name=value (optional) the name of the instance
-epn, --export-port-number=value (optional) the port number of the server instance, default is 1433
Export module: mysql
-eh, --export-hostname=value (required) the hostname of the MySQL server
-edb, --export-database=value (required) the name of the MySQL database
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-epn, --export-port-number=value (optional) the port on which the MySQL server is listening
Export module: oracle
-es, --export-server-name=value (required) the name (or IP address) of the Oracle server
-edb, --export-database=value (required) the name of the database to use in the connection
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-epn, --export-port-number=value (required) the port on which the Oracle server is listening
-eal, --export-accept-license (optional) declare that you accept OTN License Agreement, which is necessary to use this module
-esc, --export-source-schema=value (optional) the name of the source schema to export to the Oracle database. A schema with this name must exist in the Oracle database and it must be the default tablespace for the specified user. If omitted, the name of the first schema will be used
Export module: postgresql
-eh, --export-hostname=value (required) the name of the PostgreSQL server host (e.g. localhost)
-edb, --export-database=value (required) the name of the database to connect to
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-ede, --export-disable-encryption (optional) use to turn off encryption in the connection
-epn, --export-port-number=value (optional) the port on which the PostgreSQL server is listening, default is 5432
Export module: siard-1
-ef, --export-file=value (required) Path to SIARD1 archive file
-ec, --export-compress (optional) use to compress the SIARD1 archive file with deflate method
-ep, --export-pretty-xml (optional) write human-readable XML
-etf, --export-table-filter=value (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-emd, --export-meta-description[=value] (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value] (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value] (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value] (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value] (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value] (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.
Export module: siard-2
-ef, --export-file=value (required) Path to SIARD2 archive file
-ec, --export-compress (optional) use to compress the SIARD2 archive file with deflate method
-ep, --export-pretty-xml (optional) write human-readable XML
-etf, --export-table-filter=value (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-eel, --export-external-lobs (optional) Saves any LOBs outside the siard file.
-eelpf, --export-external-lobs-per-folder=value (optional) The maximum number of files present in an external LOB folder. Default: 1000 files.
-eelfs, --export-external-lobs-folder-size=value (optional) Divide LOBs across multiple external folders with (approximately) the specified maximum size (in Megabytes). Default: do not divide.
-emd, --export-meta-description[=value] (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value] (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value] (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value] (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value] (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value] (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.
Export module: siard-dk
-ef, --export-folder=value (required) Path to SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.[1-9][0-9]
-etf, --export-table-filter=value (optional) file with the list of tables that should be exported (this file can be created by the list-tables export module).
-eai, --export-archiveIndex=value (optional) Path to archiveIndex.xml input file
-eci, --export-contextDocumentationIndex=value (optional) Path to contextDocumentationIndex.xml input file
-ecf, --export-contextDocumentationFolder=value (optional) Path to contextDocumentation folder which should contain the context documentation for the archive
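The list-tables export module and the --export-table-filter option can be combined into a two-step workflow. The sketch below prints both commands; the function names, the file name tables.txt, the database name and the credentials are hypothetical. Whether the table list can be edited by hand between the two steps depends on its file format, which is not described here, so treat that part as an assumption.

```shell
# Hypothetical name for the file that holds the table list.
TABLE_LIST="tables.txt"

# Step 1: print a command that writes the list of tables to $TABLE_LIST.
list_tables_command() {
  echo java -jar dbptk-app-x.y.z.jar \
    -i mysql -ih localhost -idb mydb -iu username -ip p4ssw0rd \
    -e list-tables -ef "$TABLE_LIST"
}

# Step 2: print a command that exports only the tables named in $TABLE_LIST.
filtered_export_command() {
  echo java -jar dbptk-app-x.y.z.jar \
    -i mysql -ih localhost -idb mydb -iu username -ip p4ssw0rd \
    -e siard-2 -ef filename.siard -etf "$TABLE_LIST"
}

list_tables_command
filtered_export_command
```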
Several properties are available to modify specific conversion behaviour. You can consider them as knobs that can be turned to fine-tune the conversion.
The properties have a format like part1.part2.part3, with multiple lower-case parts separated by dots. All properties have a corresponding environment variable, like PART1_PART2_PART3 (corresponding to the previous example), with the same parts in upper-case and separated by underscores.
Properties are added to the command line like this:
... -Dpart1.part2.part3=value -Danother.property=othervalue ...
Note: on Windows, each property and value pair must be enclosed in double quotes ("), for example ... "-Dpart1.part2.part3=value" ...
If both the environment variable and the property are set, the property is used.
For simplicity, only the properties will be described, and the environment variables can be derived from those by using upper-cased letters and replacing the dots with underscores (as described above).
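The naming rule can be expressed as a small shell helper. This is only an illustration of the rule described above; the function name property_to_env is made up and is not part of the toolkit.

```shell
# Derives the environment variable name from a property name:
# dots become underscores and letters become upper-case.
property_to_env() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

property_to_env part1.part2.part3   # prints PART1_PART2_PART3
```

The resulting name can then be set in the usual way for the shell in use, for example with export PART1_PART2_PART3=value before starting java.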
The following properties control the number of rows that are retrieved from the database and stored in memory at once.
- dbptk.jdbc.fetchsize.default (Integer) - the first fetch size to try (default: 0, which means "use the default value suggested/calculated by the driver")
- dbptk.jdbc.fetchsize.small (Integer) - the second fetch size to try, in case the first one caused an issue (default: 10)
- dbptk.jdbc.fetchsize.minimum (Integer) - the last fetch size to try, in case the second one also caused an issue. This is the last try before giving up on fetching information from this table (default: 1)
Setting dbptk.jdbc.fetchsize.default to 1 fetches one row at a time, using minimal memory during the conversion but taking longer to convert the database.
For more details check https://github.com/keeps/db-preservation-toolkit/pull/292
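As a sketch of how a fetch size could be passed on the command line, the helper below places the property between java and -jar, following the general usage pattern. The function name with_fetchsize and the connection values are hypothetical, and the function only prints the resulting command.

```shell
# Prints a command that sets the first fetch size to try.
# $1 is the fetch size; the remaining arguments are the module options.
with_fetchsize() {
  size="$1"
  shift
  echo java "-Ddbptk.jdbc.fetchsize.default=$size" -jar dbptk-app-x.y.z.jar "$@"
}

# One row at a time: minimal memory use, slower conversion.
with_fetchsize 1 -i mysql -ih localhost -idb mydb -iu username -ip p4ssw0rd \
  -e siard-2 -ef filename.siard
```

The same setting could instead be supplied through the corresponding environment variable, DBPTK_JDBC_FETCHSIZE_DEFAULT, which follows from the naming rule for properties.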
Copyright © 2019 by KEEP SOLUTIONS