Bulk Importing
❗ | Warning: This page is going to be retired, and this documentation will be maintained at https://github.com/OpenConceptLab/ocl-docs/blob/main/docs/source/oclapi/technical/bulkimporting.md |
---|
OCL exposes a method for submitting a bulk import file to the OCL API that is processed asynchronously on the server. A bulk import file may include creates, updates, or deletes for multiple owners and repositories. This approach is significantly more efficient than using individual REST API calls to modify or create one resource at a time. A bulk import file is processed using the credentials provided in the bulk import request.
Note that OCL also provides two Django management commands for running imports directly on the OCL server by a system administrator (see Server-Side Bulk Imports). Use of the Django management commands is deprecated, however, and we do not guarantee support moving forward. The bulk import API method presented here will replace these Django management commands in the future.
A bulk import script is a JSON lines file, where each line is an OCL-formatted JSON resource. The syntax of each resource is the same as described elsewhere in the OCL documentation, with three modifications:
- Each resource must include a `type` attribute specifying a valid resource type, e.g. `Concept`, `Source`, `Organization`
- For all resources other than orgs and users, each resource must define an owner and, if applicable, a repository. These are defined using one or more of these attributes: `owner`, `owner_type`, `source`, `collection`.
- Each resource may optionally provide processing directives. Currently supported processing directives are:
  - `__action`: Four action types are currently supported, and two more are listed but not yet implemented:
    - `CREATE_OR_UPDATE` (default) - By default, the bulk importer will attempt to update a resource if it already exists; otherwise it will try to create a new resource.
    - `CREATE` - The bulk importer will attempt to create a new resource regardless of whether it already exists
    - `UPDATE` - The bulk importer will attempt to update a resource regardless of whether it exists
    - `DELETE` - The bulk importer will attempt to delete a resource
    - `SKIP` (not currently implemented) - The bulk importer will skip the resource
    - `DELETE_IF_EXISTS` (not currently implemented) - The bulk importer will attempt to delete a resource if it confirms that it exists
  - `__cascade`: For resources of type `Reference`, it is possible to specify whether and how mappings are cascaded:
    - `None` (default) - No cascading will occur. Only the ...
    - `sourcemappings` - Mappings stored in the same source whose `from_concept` matches a concept that is being added to a collection will also be added
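The rules above can be checked before a script is submitted. The following is a minimal sketch, not part of OCL itself: `check_resource` is a hypothetical helper, and it assumes that org and user resources are identified by the `type` values `Organization` and `User`. It checks only the `type` attribute, the `owner` attribute, and the `__action` directive.

```python
import json

# Action names listed above as currently implemented.
SUPPORTED_ACTIONS = {"CREATE_OR_UPDATE", "CREATE", "UPDATE", "DELETE"}
# Assumption: these `type` values identify org and user resources,
# which do not need an owner.
OWNERLESS_TYPES = {"Organization", "User"}


def check_resource(line: str) -> list[str]:
    """Return a list of problems found in one line of a bulk import script."""
    resource = json.loads(line)
    problems = []
    resource_type = resource.get("type")
    if not resource_type:
        problems.append("missing 'type'")
    elif resource_type not in OWNERLESS_TYPES and not resource.get("owner"):
        problems.append("missing 'owner'")
    # The importer defaults to CREATE_OR_UPDATE when no action is given.
    action = resource.get("__action", "CREATE_OR_UPDATE")
    if action not in SUPPORTED_ACTIONS:
        problems.append(f"unsupported __action: {action}")
    return problems
```

An empty list means the line passed these basic checks; the server may still reject it for other reasons, such as missing permissions.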
The following bulk import script would create an organization, a source, and a concept:
{"type": "Organization", "id": "MyOrg", "name": "My Demo Organization"}
{"type": "Source", "id": "MyTestSource", "short_code": "MyTestSource", "name": "My Test Source", "full_name": "My Test Source", "owner": "MyOrg", "owner_type": "Organization", "description": "Using this source just for testing purposes", "source_type": "Dictionary", "public_access": "View", "default_locale": "en", "supported_locales": "en", "custom_validation_schema": "None"}
{"type": "Concept", "retired": false, "datatype": "None", "concept_class": "Disaggregate", "source": "MyTestSource", "extras": null, "descriptions": null, "owner": "MyOrg", "owner_type": "Organization", "external_id": "HSpL3hSBx6F", "id": "HSpL3hSBx6F", "names": [{"locale": "en", "locale_preferred": true, "external_id": null, "name": "50+, Male, Negative", "name_type": "Fully Specified"}]}
The bulk importer processes a bulk import script using the credentials provided in the bulk import request (e.g. the `Authorization` request header). All actions taken by the bulk importer use these credentials, meaning that the user must have the required permissions for each action. This includes GET requests that the bulk importer submits to determine whether resources already exist in OCL.
- Post a JSON bulk import file for asynchronous processing in the standard queue. The standard queue has multiple workers processing in parallel, and therefore bulk imports may not be processed in the order that they are submitted.
POST /manage/bulkimport/
- POST Request Parameters:
  - `test_mode` - default=`false`; set to `true` to only run a test import (NOT CURRENTLY SUPPORTED)
  - `update_if_exists` - default=`true`; set to `false` to skip updating resources that already exist
- Adds a JSON bulk import file for asynchronous processing in a user-assigned queue. User-assigned queues process bulk import files using only one worker, which guarantees that files are processed in the order in which they are submitted.
POST /manage/bulkimport/:queue/
- `:queue` - User-assigned queue mnemonic
- POST Request Parameters:
  - `test_mode` - default=`false`; set to `true` to only run a test import (NOT CURRENTLY SUPPORTED)
  - `update_if_exists` - default=`true`; set to `false` to skip updating resources that already exist
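The two POST endpoints above could be called from Python roughly as follows. This is a sketch under several assumptions that this page does not confirm: the host in `BASE_URL` is a placeholder, the request parameters are passed in the query string, the credentials use a token-style `Authorization` header, and the body is sent with a JSON content type. `build_bulk_import_request` is a hypothetical helper that only builds the request and does not send it.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: placeholder host; substitute your own OCL API server.
BASE_URL = "https://api.example.org"


def build_bulk_import_request(script, token, queue=None, update_if_exists=True):
    """Build (but do not send) a POST request for a bulk import script."""
    path = "/manage/bulkimport/"
    if queue:
        # A user-assigned queue processes its imports in submission order.
        path += f"{queue}/"
    # Assumption: request parameters are passed in the query string.
    query = urlencode({"update_if_exists": str(update_if_exists).lower()})
    return urllib.request.Request(
        f"{BASE_URL}{path}?{query}",
        data=script.encode("utf-8"),
        method="POST",
        headers={
            # Assumption: token-style Authorization header.
            "Authorization": f"Token {token}",
            # Assumption: the JSON lines body is sent as JSON text.
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the import. Omitting `queue` targets the standard queue, where processing order is not guaranteed.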
GET /manage/bulkimport/
- GET Request Parameters:
  - Root user only:
    - `username` - optionally filter by username; for root, bulk imports for all users are returned by default
GET /manage/bulkimport/:queue/
- GET Request Parameters:
  - Root user only:
    - `username` - optionally filter by username; for root, bulk imports for all users are returned by default
GET /manage/bulkimport/?task=:taskid[&result=:format]
- GET Request Parameters:
  - `task` (Required for GET request) - Task ID of a previously submitted bulk import request
  - `result` (Optional) - default="summary"; format of the results to be returned. Options are:
    - `summary` -- one line of plain text (see `OclImportResults.get_detailed_summary()`)
Processed 348 of 348 -- 346 NEW (200:39, 201:307); 1 UPDATE (200:1); 1 DELETE (200:1)
    - `report` -- longer report of plain text (see `OclImportResults.display_report()`)
REPORT OF IMPORT RESULTS:
/orgs/DATIM-MOH-BW-FY19/collections/HTS-TST-N-MOH-HllvX50cXC0/:
NEW 200:
[{"message": "Added the latest versions of concept to the collection. Future updates will not be added automatically.",
"added": true, "expression":
...
    - `json` -- full results object serialized to JSON (see `OclImportResults.to_json()`)
{
"count": 348,
"elapsed_seconds": 94.10947012901306,
"total_lines": 348,
"num_skipped": 0,
"results": {
"/orgs/DATIM-MOH-BW-FY19/collections/HTS-TST-N-MOH-HllvX50cXC0/": {
"NEW": {
"200": [
{
"obj_type": "Reference",
"text": "{\"data\": {\"expressions\": [\"/orgs/DATIM-MOH-BW-FY19/sources/DATIM-Alignment-Indicators/mappings/MAP-DATIM-HAS-OPTION-HTS_TST_N_MOH-HllvX50cXC0/\", \"/orgs/PEPFAR/sources/DATIM-MOH-FY19/concepts/HTS_TST_N_MOH/\", \"/orgs/PEPFAR/sources/DATIM-MOH-FY19/concepts/HllvX50cXC0/\", \"/orgs/DATIM-MOH-BW-FY19/sources/
- Notes:
- Bulk imports are processed on behalf of the requesting user. Each line in an import file is processed separately and the user must have appropriate permissions for each line to be processed successfully.
- The payload for a POST should contain the resources to be created, one per line. Multiple resource types may be mixed in the same file.
POST /manage/bulkimport/
{"type": "Source", "id": "JonTestSource", "short_code": "JonTestSource", "name": "Jon test source", "full_name": "Jon test source", "owner": "paynejd", "owner_type": "User", "description": "", "source_type": "Indicator Registry", "public_access": "View", "default_locale": "en", "supported_locales": "en", "custom_validation_schema": "None"}
{"retired": false, "datatype": "None", "type": "Concept", "concept_class": "Disaggregate", "source": "JonTestSource", "extras": null, "descriptions": null, "owner": "paynejd", "owner_type": "User", "external_id": "HSpL3hSBx6F", "id": "HSpL3hSBx6F", "names": [{"locale": "en", "locale_preferred": true, "external_id": null, "name": "50+, Male, Negative", "name_type": "Fully Specified"}]}
- In response you will receive a JSON object with `status` and `task` attributes. The `task` attribute contains the UUID of the asynchronous task, which you can use to query the status of the task or, when processing is complete, the results of the import. For example:
{
"status": "PENDING",
"task": "2344a457-cfdf-4985-ae0f-b2797d33a1a2"
}
GET /manage/bulkimport/?task=2344a457-cfdf-4985-ae0f-b2797d33a1a2&result=json
- 200: Import finished processing (though there may have been an error)
- If the import is complete, the response will contain the results of the import in the requested format (summary, report, or JSON)
- 200 (in the future this will be 202):
  - PENDING: If the bulk import is queued and processing has not yet begun, the response will have a status of `PENDING` (the same as above)
  - STARTED: If the bulk import is being processed and is not yet complete, the response will have a status of `STARTED`
- 404: The import task ID was not found for the requesting user
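Because a finished import and an in-progress import both currently return 200, a client has to inspect the response body to tell them apart. The following is a minimal polling sketch, assuming that results are requested with `result=json` and that an in-progress response carries a `status` of `PENDING` or `STARTED`. `wait_for_import` and its `fetch` argument are hypothetical: `fetch(task_id)` stands for caller-supplied code that performs the GET request and returns the parsed JSON body.

```python
import time

IN_PROGRESS = {"PENDING", "STARTED"}


def wait_for_import(fetch, task_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the import is no longer PENDING or STARTED.

    `fetch(task_id)` is a caller-supplied (hypothetical) function that performs
    GET /manage/bulkimport/?task=<task_id>&result=json and returns the
    parsed JSON body as a dict.
    """
    for attempt in range(max_attempts):
        body = fetch(task_id)
        if body.get("status") not in IN_PROGRESS:
            return body  # finished: the body holds the import results
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"import {task_id} still in progress after {max_attempts} polls")
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without a server.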