tap-mongodb

tap-mongodb is a Singer tap for extracting data from a MongoDB or AWS DocumentDB database. The tap can extract records from the database directly (incremental replication mode, the default) or extract change events from the database's Change Streams API (log-based replication mode).

Built with the Meltano Tap SDK for Singer Taps.

Installation

Install from GitHub:

pipx install git+https://github.com/MeltanoLabs/tap-mongodb.git@main

Configuration

Accepted Config Options

| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| database | string | True | - | Database from which records will be extracted. |
| mongodb_connection_string | password | False | - | MongoDB connection string. See the MongoDB documentation for the specification. The username and password included in this string must be url-encoded; the tap will not url-encode them. |
| documentdb_credential_json_string | password | False | - | JSON string with keys 'username', 'password', 'engine', 'host', 'port', 'dbClusterIdentifier' or 'dbName', 'ssl'. See the AWS documentation for an example and the structure. The password from this JSON object will be url-encoded by the tap before opening the database connection. The intent of this setting is to enable management of an AWS DocumentDB database credential via AWS SecretsManager. |
| documentdb_credential_json_extra_options | string | False | - | JSON string containing key-value pairs which will be added to the connection string options when using documentdb_credential_json_string. For example, when set to the string {"tls":"true","tlsCAFile":"my-ca-bundle.pem"}, the options tls=true&tlsCAFile=my-ca-bundle.pem will be passed to the MongoClient. |
| datetime_conversion | string | False | datetime | Value passed to the MongoClient 'datetime_conversion' parameter. See the documentation at https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes for details. The default value is 'datetime', which will throw a bson.errors.InvalidBson error if a document contains a date outside the range of datetime.MINYEAR (year 1) to datetime.MAXYEAR (9999). |
| prefix | string | False | '' | An optional prefix which will be added to the name of each stream. |
| filter_collections | string[] | False | [] | Collections to discover (default: all). Filtering is case-insensitive. Useful for improving catalog discovery performance. |
| start_date | date_iso8601 | False | 1970-01-01 | Start date, used for incremental replication only. In log-based replication mode, this setting is ignored. |
| add_record_metadata | boolean | False | False | When true, _sdc metadata fields will be added to records produced by the tap. |
| allow_modify_change_streams | boolean | False | False | In AWS DocumentDB (unlike MongoDB), change streams must be enabled specifically (see the AWS documentation). If the tap attempts to open a change stream against a collection on which change streams have not been enabled, an OperationFailure error will be raised. If this property is set to True, when this error is seen, the tap will execute an admin command to enable change streams and then retry the read operation. Note: this may incur new costs in AWS DocumentDB. |
| operation_types | list(string) | False | create,delete,insert,replace,update | List of MongoDB change stream operation types to include in tap output. The default behavior is to limit to document-level operation types. See the full list of operation types in the MongoDB documentation. Note that the list of allowed_values for this property includes some values not available to all MongoDB versions. |
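
Because the tap does not url-encode the credentials in mongodb_connection_string, they have to be encoded before the string is placed in the config. The following is a minimal sketch using only the Python standard library. The helper names, user name, password, and host are hypothetical, and the second helper only illustrates how a documentdb_credential_json_extra_options value maps to the tls=true&tlsCAFile=my-ca-bundle.pem form described in the table; it is not the tap's own implementation.

```python
import json
from urllib.parse import quote_plus, urlencode


def build_connection_string(username: str, password: str, host: str, port: int = 27017) -> str:
    """Return a MongoDB URI with url-encoded credentials (hypothetical helper)."""
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"


def extra_options_to_query(extra_options_json: str) -> str:
    """Turn a JSON object of options into a key=value&key=value query string."""
    return urlencode(json.loads(extra_options_json))


# Hypothetical credentials containing characters that are reserved in a URI.
uri = build_connection_string("app_user", "p@ss:w/rd", "localhost")
print(uri)

# The example value from the documentdb_credential_json_extra_options row.
options = extra_options_to_query('{"tls":"true","tlsCAFile":"my-ca-bundle.pem"}')
print(options)
```

The resulting uri value is what would go into mongodb_connection_string; the raw password would be misparsed because of the @, : and / characters.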

Configure using environment variables

When --config=ENV is provided, this Singer tap automatically imports environment variables from the .env file in the working directory. A config value is then picked up whenever a matching environment variable is set, either in the terminal context or in the .env file.
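
As a rough illustration, the sketch below renders a .env file from a few settings. It assumes the usual Meltano SDK naming convention, in which the variable name is the plugin name plus the setting name, upper-cased with dashes replaced by underscores (for example TAP_MONGODB_DATABASE). That convention is not stated in this README, so treat the names as an assumption and verify them against the SDK documentation. The helper names and values are hypothetical.

```python
from __future__ import annotations


def env_var_name(setting: str, plugin_name: str = "tap-mongodb") -> str:
    """Map a setting name to an environment variable name (assumed SDK convention)."""
    prefix = plugin_name.upper().replace("-", "_")
    return f"{prefix}_{setting.upper().replace('-', '_')}"


def render_dotenv(settings: dict[str, str]) -> str:
    """Render settings as KEY=value lines suitable for a .env file."""
    return "\n".join(f"{env_var_name(key)}={value}" for key, value in settings.items())


# Hypothetical values for two of the settings listed in the table above.
dotenv_text = render_dotenv({"database": "test_database", "prefix": "mongo_"})
print(dotenv_text)
```

With the output saved as .env in the working directory, the tap would then be invoked with --config=ENV.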

Source Configuration

MongoDB

AWS DocumentDB

Source Authentication and Authorization

Incremental replication mode

In order to run tap-mongodb in incremental replication mode, the credential used must have read privileges to the collections from which you wish to extract records. If your credential has the readAnyDatabase@admin permission, for example, or read@test_database (where test_database is the database setting in the tap's configuration), that should be sufficient.

Collection-level read permissions are untested but are expected to work as well:

privileges: [
    {resource: {db: "test_database", collection: "TestOrders"}, actions: ["find"]}
]

The above collection-level read permission should allow the tap to extract from the test_database.TestOrders collection in incremental replication mode.
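
One way to grant such a privilege is through a custom role. The sketch below only builds the document for MongoDB's createRole command. The role name and the helper are hypothetical, and, like the collection-level permission itself, this has not been tested against the tap.

```python
import json


def read_role_command(role_name: str, database: str, collection: str, actions=("find",)) -> dict:
    """Build a createRole command document granting actions on one collection."""
    return {
        "createRole": role_name,
        "privileges": [
            {
                "resource": {"db": database, "collection": collection},
                "actions": list(actions),
            }
        ],
        "roles": [],  # no inherited roles
    }


# Hypothetical role name; database and collection come from the example above.
command = read_role_command("tapReadTestOrders", "test_database", "TestOrders")
print(json.dumps(command, indent=2))

# Assumption: with pymongo installed and an administrative connection available,
# the document could be sent with something like
#   client["test_database"].command(command)
```

The role would then be granted to the user whose credential appears in the tap's configuration.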

Log-based replication

Log-based replication mode extracts records via the database's Change Streams API. MongoDB and AWS DocumentDB have different permission requirements for running tap-mongodb in this mode.

In MongoDB, the credential must have both the find and changeStream privilege actions on a collection in order to use tap-mongodb in log-based replication mode. The readAnyDatabase@admin built-in role provides this for all databases, while read@test_database provides the necessary access for all collections in the test_database database.

Usage

You can easily run tap-mongodb by itself or in a pipeline using Meltano.

Executing the Tap Directly

tap-mongodb --version
tap-mongodb --help
tap-mongodb --config CONFIG --discover > ./catalog.json
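
The discovery step can also be scripted. The sketch below writes a small config file and assembles the same --discover invocation. It assumes CONFIG is the path to a JSON file of settings; the helper names, the connection string, and the database name are hypothetical, and the command is only printed here rather than executed.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_config(directory: str, settings: dict) -> Path:
    """Write the tap settings to config.json and return its path."""
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(settings, indent=2))
    return path


def discover_command(config_path: Path) -> list:
    """Build the argument list for catalog discovery."""
    return ["tap-mongodb", "--config", str(config_path), "--discover"]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical settings for a local, unauthenticated MongoDB instance.
    config_path = write_config(
        tmp,
        {
            "database": "test_database",
            "mongodb_connection_string": "mongodb://localhost:27017/",
        },
    )
    command = discover_command(config_path)
    saved = json.loads(config_path.read_text())
    print(shlex.join(command))
    # To actually run discovery (requires tap-mongodb on PATH), the list could be
    # passed to subprocess.run with stdout redirected to catalog.json.
```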

Developer Resources

Follow these instructions to contribute to this project.

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the tap_mongodb/tests subfolder and then run:

poetry run pytest

You can also test the tap-mongodb CLI directly using poetry run:

poetry run tap-mongodb --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-mongodb
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-mongodb --version
# OR run a test `elt` pipeline:
meltano run tap-mongodb target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.