Application usage
Database preservation toolkit converts from a source database format to a destination database format. The format may be a database management system or a preservation format.
To retrieve from a source, the application uses an import module.
To write to a destination, the application uses an export module.
To perform any intermediate actions, the application may use one or more filter modules.
It is the pair composed of an import module and an export module that provides the conversion functionality. There are different modules which can be used and even configured to provide a conversion between database formats.
The command line application takes a series of arguments that can be provided in any order. These define the application's behavior.
java [properties] -jar dbptk-app-x.y.z.jar migrate <importModule> [import module options] <exportModule> [export module options] [<filterModule(s)> [filter module options]]
The general use command is generic and cannot be used as is. Here is a list of modifications that must be carried out:
- java is the java command; the full path may also be used
- [properties] may be omitted or replaced with special configurations that influence the conversion (more details)
- -jar dbptk-app-x.y.z.jar tells java to execute the dbptk-app-x.y.z.jar file (the file name must be adjusted to match the one you have)
- <importModule> should be replaced with the import module specification, e.g. -i mysql or --import=postgresql
- <exportModule> should be replaced with the export module specification, e.g. -e mysql or --export=postgresql
- <filterModule(s)> should be replaced with a list of filter module specifications separated by ',' with no spaces, e.g. -f external-lobs or --filter=external-lobs,external-lobs
- [import module options] should be replaced with parameters to specify the behavior of the import module, e.g. --import-username=username --import-password="p4ssw0rd" (to specify source database username and password)
- [export module options] should be replaced with parameters to specify the behavior of the export module, e.g. --export-file=filename.siard --export-compress --export-pretty-xml (to specify the SIARD-2 export module behavior)
- [filter module options] should be replaced with parameters to specify the behavior of the filter module(s). As there can be multiple filters declared, these parameters should contain the index of the filter they refer to in the list (even if this list is composed of only one element), e.g. --filter1-dir /home/user/ --filter1-disable-print-header (to specify the inventory filter module behavior)
Parameters have two interchangeable formats: a longer format for readability (e.g. --import-hostname=localhost) and a short format which is faster to type (e.g. -ih localhost). The only differences are the length of the parameter name and the number of dashes used; either a space or an equal sign may separate a parameter from its value.
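As a minimal sketch of the two formats, the script below assembles the same MySQL to SIARD-2 conversion twice, once with long parameter names and once with short ones, and prints both commands instead of running them. The option names come from the module listings on this page; the jar version, host, database name (mydb), credentials and output file are placeholder assumptions.

```shell
# Placeholder values: adjust the jar name, host, database, credentials and output file.
JAR="dbptk-app-x.y.z.jar"

# Long format, easier to read.
set -- java -jar "$JAR" migrate \
  --import=mysql --import-hostname=localhost --import-database=mydb \
  --import-username=username --import-password=p4ssw0rd \
  --export=siard-2 --export-file=mydb.siard
LONG_CMD="$*"

# Short format, faster to type; it names the same modules and options.
set -- java -jar "$JAR" migrate \
  -i mysql -ih localhost -idb mydb -iu username -ip p4ssw0rd \
  -e siard-2 -ef mydb.siard
SHORT_CMD="$*"

# Print both commands for review instead of executing them.
printf '%s\n' "$LONG_CMD" "$SHORT_CMD"
```

Either printed line can be pasted into a terminal once the placeholders have been replaced with real values.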
Specify the import module with: -i <module>, --import=module
Import module: jdbc
-id, --import-driver=value (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ic, --import-connection=value (required) the connection url to use in the connection
Note: In order to use this module you need a JDBC driver. Please refer to this documentation on how to import your own driver.
Import module: microsoft-access
-if, --import-file=value (required) path to the Microsoft Access file
-ip, --import-password=value (optional) password to the Microsoft Access file
Import module: microsoft-sql-server
-is, --import-server-name=value (required) the name (host name) of the server
-idb, --import-database=value (required) the name of the database we'll be accessing
-iu, --import-username=value (required) the name of the user to use in the connection
-ip, --import-password=value (required) the password of the user to use in the connection
-il, --import-use-integrated-login (optional) use Windows login; by default the SQL Server login is used
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-iin, --import-instance-name=value (optional) the name of the instance
-ipn, --import-port-number=value (optional) the port number of the server instance, default is 1433
Import module: mysql
-ih, --import-hostname=value (required) the hostname of the MySQL server
-idb, --import-database=value (required) the name of the MySQL database
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ipn, --import-port-number=value (optional) the port on which the MySQL server is listening, default is 3306
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
Import module: oracle
-is, --import-server-name=value (required) the name (or IP address) of the Oracle server
-idb, --import-database=value (required) the name of the database to use in the connection
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ipn, --import-port-number=value (required) the port on which the Oracle server is listening, default is 1521
-ial, --import-accept-license (optional) declare that you accept OTN License Agreement, which is necessary to use this module
Import module: postgresql
-ih, --import-hostname=value (required) the name of the PostgreSQL server host (e.g. localhost)
-idb, --import-database=value (required) the name of the database to connect to
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-ipn, --import-port-number=value (optional) the port on which the PostgreSQL server is listening, default is 5432
Import module: sybase
-ih, --import-hostname=value (required) the name (host name) of the server
-idb, --import-database=value (required) the name of the database to use in the connection
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-ipn, --import-port-number=value (optional) the port on which the Sybase server is listening, default is 2638
Note: In order to use this module you need to use the proprietary driver. Please refer to this documentation on how to import your own driver.
Import module: progress-openedge
-ih, --import-hostname=value (required) the name (host name) of the server
-idb, --import-database=value (required) the name of the database to use in the connection
-iu, --import-username=value (required) the name of the user to use in connection
-ip, --import-password=value (required) the password of the user to use in connection
-ide, --import-disable-encryption (optional) use to turn off encryption in the connection
-ipn, --import-port-number=value (optional) the port on which the Progress OpenEdge server is listening, default is 20931
Note: In order to use this module you need to use the proprietary driver. Please refer to this documentation on how to import your own driver.
Import module: siard-1
-if, --import-file=value (required) Path to SIARD1 archive file
Import module: siard-2
-if, --import-file=value (required) Path to SIARD2 archive file
Import module: siard-dk
-if, --import-folder=value (required) Path to (the first) SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.1 Any additional parts of the archive (e.g. with suffixes .2 .3 etc.) referenced in the tableIndex.xml will also be processed.
-ias, --import-as-schema=value (required) Name of the database schema to use when importing the SIARDDK archive. Suggested values: PostgreSQL:'public', MySQL:'<name of database>', MSSQL:'dbo'
Import module: import-config
-if, --import-file=value (required) path to the import configuration file to be read by the SIARD export module
-ip, --import-parameters=value (required) pair of parameters to be resolved in the YAML configuration file. To define a pair use this syntax: key:value;key:value;
Specify the export module with: -e <module>, --export=module
Export module: jdbc
-ed, --export-driver=value (required) the name of the JDBC driver class. For more info about this refer to the website or the README file
-ec, --export-connection=value (required) the connection url to use in the connection
Export module: microsoft-sql-server
-es, --export-server-name=value (required) the name (host name) of the server
-edb, --export-database=value (required) the name of the database we'll be accessing
-eu, --export-username=value (required) the name of the user to use in the connection
-ep, --export-password=value (required) the password of the user to use in the connection
-el, --export-use-integrated-login (optional) use Windows login; by default the SQL Server login is used
-ede, --export-disable-encryption (optional) use to turn off encryption in the connection
-ein, --export-instance-name=value (optional) the name of the instance
-epn, --export-port-number=value (optional) the port number of the server instance, default is 1433
Export module: mysql
-eh, --export-hostname=value (required) the hostname of the MySQL server
-edb, --export-database=value (required) the name of the MySQL database
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-epn, --export-port-number=value (optional) the port on which the MySQL server is listening
Export module: oracle
-es, --export-server-name=value (required) the name (or IP address) of the Oracle server
-edb, --export-database=value (required) the name of the database to use in the connection
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-epn, --export-port-number=value (required) the port on which the Oracle server is listening
-eal, --export-accept-license (optional) declare that you accept OTN License Agreement, which is necessary to use this module
-esc, --export-source-schema=value (optional) the name of the source schema to export to the Oracle database. A schema with this name must exist in the Oracle database and it must be the default tablespace for the specified user. If omitted, the name of the first schema will be used
Export module: postgresql
-eh, --export-hostname=value (required) the name of the PostgreSQL server host (e.g. localhost)
-edb, --export-database=value (required) the name of the database to connect to
-eu, --export-username=value (required) the name of the user to use in connection
-ep, --export-password=value (required) the password of the user to use in connection
-ede, --export-disable-encryption (optional) use to turn off encryption in the connection
-epn, --export-port-number=value (optional) the port on which the PostgreSQL server is listening, default is 5432
Export module: siard-1
-ef, --export-file=value (required) Path to SIARD1 archive file
-ec, --export-compress (optional) use to compress the SIARD1 archive file with deflate method
-ep, --export-pretty-xml (optional) write human-readable XML
-emd, --export-meta-description[=value] (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value] (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value] (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value] (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value] (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value] (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.
Export module: siard-2
-ef, --export-file=value (required) Path to SIARD2 archive file
-ec, --export-compress (optional) use to compress the SIARD2 archive file with deflate method
-ep, --export-pretty-xml (optional) write human-readable XML
-eel, --export-external-lobs (optional) Saves any LOBs outside the siard file.
-eelpf, --export-external-lobs-per-folder=value (optional) The maximum number of files present in an external LOB folder. Default: 1000 files.
-eelfs, --export-external-lobs-folder-size=value (optional) Divide LOBs across multiple external folders with (approximately) the specified maximum size (in Megabytes). Default: do not divide.
-emd, --export-meta-description[=value] (optional) SIARD descriptive metadata field: Description of database meaning and content as a whole.
-ema, --export-meta-archiver[=value] (optional) SIARD descriptive metadata field: Name of the person who carried out the archiving of the database.
-emac, --export-meta-archiver-contact[=value] (optional) SIARD descriptive metadata field: Contact details (telephone, email) of the person who carried out the archiving of the database.
-emdo, --export-meta-data-owner[=value] (optional) SIARD descriptive metadata field: Owner of the data in the database. The person or institution that, at the time of archiving, has the right to grant usage rights for the data and is responsible for compliance with legal obligations such as data protection guidelines.
-emdot, --export-meta-data-origin-timespan[=value] (optional) SIARD descriptive metadata field: Origination period of the data in the database (approximate indication in text form).
-emcm, --export-meta-client-machine[=value] (optional) SIARD descriptive metadata field: DNS name of the (client) computer on which the archiving was carried out.
-ed, --export-digest (optional) The message digest algorithm for the type of integrity information. Default: SHA-256
-efc, --export-font-case (optional) Define the font case used for the message digest. Supported font cases are: upper case and lower case. Default: lowercase
Export module: siard-dk
-ef, --export-folder=value (required) Path to SIARDDK archive folder. Archive folder name must match the expression AVID.[A-ZÆØÅ]{2,4}.[1-9][0-9]*.[1-9][0-9]
-eai, --export-archiveIndex=value (optional) Path to archiveIndex.xml input file
-eci, --export-contextDocumentationIndex=value (optional) Path to contextDocumentationIndex.xml input file
-ecf, --export-contextDocumentationFolder=value (optional) Path to contextDocumentation folder which should contain the context documentation for the archive
Export module: import-config
-ef, --export-file=value (required) path to the import configuration file
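The export modules can also target a database management system rather than a preservation format. The sketch below assembles a command that loads a SIARD-2 archive into PostgreSQL and prints it for review. The option names are taken from the siard-2 import module and postgresql export module listings above; the archive name, database name (restored_db) and credentials are placeholder assumptions.

```shell
# Placeholder values: adjust the jar name, archive file, host, database and credentials.
JAR="dbptk-app-x.y.z.jar"

# Read from a SIARD-2 archive and write into a PostgreSQL database.
set -- java -jar "$JAR" migrate \
  --import=siard-2 --import-file=archive.siard \
  --export=postgresql --export-hostname=localhost --export-database=restored_db \
  --export-username=username --export-password=p4ssw0rd --export-port-number=5432
CMD="$*"

# Print the command for review instead of executing it.
printf '%s\n' "$CMD"
```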
Specify the filter module(s) with: -f <module(s)>, --filter=module(s)
Filter module: external-lobs
- Refer to External LOBs Filter Module
Filter module: merkle-tree
- Refer to Merkle Tree Filter Module
Filter module: inventory
- Refer to Inventory Filter Module
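A filter is declared after the import and export modules, and its options carry the index of the filter in the list. The sketch below adds the inventory filter to a MySQL to SIARD-2 conversion and prints the resulting command. The two filter1 options are copied from the example earlier on this page and are not a complete list; the Inventory Filter Module page remains the reference. The jar version, database name (mydb), credentials and paths are placeholder assumptions.

```shell
# Placeholder values: adjust the jar name, host, database, credentials and paths.
JAR="dbptk-app-x.y.z.jar"

# The inventory filter is the first (and only) filter, so its options use the filter1 prefix.
set -- java -jar "$JAR" migrate \
  --import=mysql --import-hostname=localhost --import-database=mydb \
  --import-username=username --import-password=p4ssw0rd \
  --export=siard-2 --export-file=mydb.siard \
  --filter=inventory --filter1-dir /home/user/ --filter1-disable-print-header
CMD="$*"

# Print the command for review instead of executing it.
printf '%s\n' "$CMD"
```

If a second filter were listed after a comma, its options would use the filter2 prefix, following the indexing rule described above.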
Several properties are available to modify specific conversion behaviour. You can consider them as knobs that can be turned to fine-tune the conversion.
The properties have a format like part1.part2.part3, with multiple lower-case parts separated by dots. All properties have a corresponding environment variable, like PART1_PART2_PART3 (corresponding to the previous example), with the same parts in upper-case and separated by underscores.
Properties are added to the command line like this:
... -Dpart1.part2.part3=value -Danother.property=othervalue ...
Note: on Windows, each property and value pair must be enclosed in double quotes ("), for example: ... "-Dpart1.part2.part3=value" ...
If both the environment variable and the property are set, the property is used.
For simplicity, only the properties will be described, and the environment variables can be derived from those by using uppercased letters and replacing the dots with underscores (as described above).
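The naming rule can be sketched as a small helper that upper-cases a property name and replaces the dots with underscores. The function name property_to_env is a hypothetical helper for illustration and is not part of the toolkit.

```shell
# Derive the environment variable name from a property name:
# upper-case the letters and replace each dot with an underscore.
property_to_env() {
  printf '%s\n' "$1" | tr 'a-z.' 'A-Z_'
}

ENV_NAME="$(property_to_env dbptk.jdbc.fetchsize.default)"
printf '%s\n' "$ENV_NAME"   # prints DBPTK_JDBC_FETCHSIZE_DEFAULT
```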
Controls the number of rows that are retrieved from the database and stored in memory at once.
- dbptk.jdbc.fetchsize.default (Integer) - the first fetch size to try (default: 0, which means "use the default value suggested/calculated by the driver")
- dbptk.jdbc.fetchsize.small (Integer) - the second fetch size to try, in case the first one caused an issue (default: 10)
- dbptk.jdbc.fetchsize.minimum (Integer) - the last fetch size to try, in case the second one also caused an issue. This is the last try before giving up on fetching information from this table (default: 1)
Setting dbptk.jdbc.fetchsize.default to 1 fetches one row at a time, using minimal memory during the conversion but taking longer to convert the database.
For more details check https://github.com/keeps/db-preservation-toolkit/pull/292
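As a sketch of how the fetch size could be set, the script below builds two variants of the same conversion: one passing the property to java before -jar, the other setting the derived environment variable for a single invocation. Both are printed rather than executed. The module options, database name (mydb) and credentials are placeholder assumptions.

```shell
# Placeholder values: adjust the jar name and the module options.
JAR="dbptk-app-x.y.z.jar"
ARGS="migrate --import=mysql --import-hostname=localhost --import-database=mydb --import-username=username --import-password=p4ssw0rd --export=siard-2 --export-file=mydb.siard"

# Variant 1: pass the property to java, before -jar.
PROPERTY_CMD="java -Ddbptk.jdbc.fetchsize.default=1 -jar $JAR $ARGS"

# Variant 2: set the corresponding environment variable for this one invocation.
ENV_CMD="DBPTK_JDBC_FETCHSIZE_DEFAULT=1 java -jar $JAR $ARGS"

# Print both commands for review instead of executing them.
printf '%s\n' "$PROPERTY_CMD" "$ENV_CMD"
```

If both were supplied at once, the property would take precedence, as stated above.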
Controls how much LOB data is prefetched and kept in memory for each row retrieved from the database.
- dbptk.jdbc.oracle.lobPrefetchSize (Integer) - how much of the LOB data is fetched the first time it is requested (default: 4000)
For more details check https://github.com/keeps/db-preservation-toolkit/issues/437
Controls the range searched for an open port by defining its minimum and maximum values.
- dbptk.ssh.port.findmin (Integer) - the minimum value to include (default: 1024)
- dbptk.ssh.port.findmax (Integer) - the maximum value to include (default: 49151)
Controls the directory where the off-heap file is saved (depending on the size of the SIARD file, this off-heap file can grow substantially).
- dbptk.memory.dir (String) - the directory path for the off-heap file storage (default: a hidden folder named dbptk under your $HOME directory)
Controls how Java handles timestamp fields. Thanks to @ateras
- user.timezone=GMT - tells Java not to do any unexpected conversions when handling the timestamp fields
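Several properties can be combined in one invocation by listing each -D pair before -jar. The sketch below fixes the timezone and moves the off-heap storage to another directory for a PostgreSQL to SIARD-2 conversion, then prints the command. The directory /data/dbptk-offheap, the database name (mydb) and the credentials are placeholder assumptions.

```shell
# Placeholder values: adjust the jar name, directory, host, database and credentials.
JAR="dbptk-app-x.y.z.jar"

# Each property is a separate -D pair placed before -jar.
PROPS="-Duser.timezone=GMT -Ddbptk.memory.dir=/data/dbptk-offheap"

CMD="java $PROPS -jar $JAR migrate --import=postgresql --import-hostname=localhost --import-database=mydb --import-username=username --import-password=p4ssw0rd --export=siard-2 --export-file=mydb.siard"

# Print the command for review instead of executing it.
printf '%s\n' "$CMD"
```

On Windows, each -D pair would additionally be wrapped in double quotes, as noted earlier.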
Copyright © 2019 by KEEP SOLUTIONS
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher, addressed “Attention: Permissions Coordinator,” at the address below.
KEEP SOLUTIONS, LDA.
Rua Rosalvo de Almeida, nº 5
4710-429 Braga, Portugal
W www.keep.pt E info@keep.pt