Home
The Coyote DX project is a toolkit for developers to read and write data across different sources and formats. It is designed to provide a quick (5 minutes or less) route to basic extraction, transformation and loading (ETL) of data.
The goal has evolved into creating a data exchange utility along the lines of build tools such as Maven, Gradle and Ant, where a configuration file (pom.xml, build.gradle and build.xml respectively) is written and then run (with mvn, gradle and ant respectively). The result is data read from one system and written to another with a simple, easy-to-maintain script.
Coyote DX is intended to bring together basic tools for reading, writing and marshaling data between systems, primarily reading data from one system and writing it into another.
This toolkit was created during the prototyping of interfaces between multiple back-office systems and SaaS instances, where CSV files and database records needed to be exchanged with web services. The SaaS instances could only support web service interactions, while the back-office systems only supported flat file (CSV, JSON) or database (JDBC) access. This toolkit allowed the development team to read data records in one format and write them in another, enabling bi-directional data flow through a simple command-line interface that was easily deployed to scheduling systems such as Windows Task Scheduler, cron and at.
Once we fully understood the exchange and modeled the database, CSV files, and web service endpoints, everything was turned over to the application development teams, who developed the production integration components with the corporate-standard ETL products. It was then discovered that Coyote data transfer jobs took less time to create, deploy and operate than the corporate integration tools, with the same and often better reliability.
This is not an ETL framework. It is a collection of Java code which reads and writes data in different formats in a shared operational context using a standardized life cycle. Unlike frameworks, which require the developer to write code in a particular way, Coyote DX only requires the operator to edit a configuration file (in JSON) and call the job processor with that configuration.
The primary goal of this project is to create a feature-rich JAR which can be used from the command line like this:
java -jar CoyoteDX.jar extract_my_data.json
where we can just edit the extract_my_data.json file and re-run the job with the changes.
No "coding" is necessary, the internal transform engine can be used as it is to read from a source and write to a target, all controlled by the simple configuration file. The configuration file controls what tools are called from the library at particular points in the transformation engine run cycle.
There are many useful tools in the Coyote DX library already, with more being written weekly. These tools read data, perform tasks and write data, and can be used for a variety of applications.
This toolkit has many classes which can be called in different contexts. If you want a component to read CSV files or write fixed-length records, just include Coyote DX in your classpath and use the classes in your code. The toolkit is small so it can be included with little burden on your project.
Unlike a traditional framework, the Coyote DX toolkit does not "dictate the architecture" of your application. It provides atomic functionality which can be called from within any application.
This toolkit does not require a server or any other enabling technologies, which makes it ideal for prototyping, testing, troubleshooting and development activities. It is single-threaded, which makes it easy to troubleshoot and debug. It is lightweight in that it only needs a few small libraries (included in the FatJar). It has been run from laptops during design meetings to test basic assumptions and profile data from the command line, as well as on team servers (via at, cron, and Windows Task Scheduler) to perform data migration tasks and implement full integrations with back-office and external systems.
Most of the codebase comes from projects in the utility industry, which uses embedded systems running in the field for the collection of grid telemetry. Notoriously limited in resources, field devices cannot support large generic libraries with their equally large footprints and assumptions of copious memory, processing, storage, and network bandwidth. This code is designed from the bottom up, with an eye to efficiency and eliminating bloat. So go ahead, run it on your Raspberry Pi; you will have plenty of room for your own code and data. Coyote is a good choice for your embedded system data mediation.
Coyote is designed to support 12 Factor applications, enabling the support of modern, scalable, and cloud-ready systems. The configuration files make use of templating which pulls configuration data from system properties, environment variables, and the command line. Each job is expected to be run as its own stateless process, so it is easy to add more processes to your workflow and scale your integrations out as well as up. Logging can be directed to Standard Output (STDOUT) and Standard Error (STDERR) for easy collection by process containers such as Cloud Foundry, OpenShift, and Heroku. Transform engines launch within a second or two and shut down just as quickly (depending on the in-flight processing of records), so scaling, blue/green deployments and crash recovery are quick. The concept of backing services is designed into the toolkit, so one configuration can be used across many different instances of databases, for example, by simply supplying a different URL or connection string as a system property or environment variable. This isolates your data mediation code from data transport and storage details. Your configuration does not have to change when running in development, testing, or production environments.
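As a sketch of that backing-services idea, a writer might take its connection string from an environment variable through a template placeholder; the component name, attribute keys and placeholder syntax below are assumptions, and the actual template syntax is documented on the Templates page:

{
  "Job" : {
    "Writer" : { "class" : "JdbcWriter", "target" : "[#$DATABASE_URL#]", "table" : "contacts" }
  }
}

The same mechanism can substitute ports, hostnames or credentials at runtime, which is what allows one configuration to move unchanged between development, testing and production.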
For example, it is easy to stand up a CRUD (Create, Read, Update, Delete) ReST API via web services for an endpoint with a Coyote configuration. As load increases, your container manager can auto-scale, adding additional instances by calling the application with a new IP address and port. This allows the load balancer to start sharing the request load with the new instance on that new port. Because your configurations use templates, the new IP and port values can be easily substituted at runtime, allowing your web service to scale horizontally without human intervention. Because your logs are directed to STDERR and STDOUT, your container manager can forward those (log) event streams to your log manager of choice, such as Sumo Logic, Papertrail, Coralogix, LogDNA, and others.
The toolkit natively supports IoT concepts and principles. As mentioned above, Coyote was born in field telemetry for the utility industry and used to enable distributed computing years before the concept of IoT was widely accepted. Coyote was created from the codebase developed to implement parallelized distributed computing in resource-limited environments. This means that this toolkit is vectored toward operating as multiple instances, each running in parallel within small runtime environments. These tools are designed to facilitate the exchange of data in the most efficient and manageable manner possible.
It is possible, and encouraged, to write your own Readers, Writers, Tasks, Transforms, Validators and Listeners to meet your needs and have the toolkit (actually the Transform Engine) call them as part of your job. As long as they are in your classpath and implement the appropriate interfaces, the engine will use them as you specify in the configuration. For example, your transformation logic may be far too complex for simple, generic tools to handle. In this case, it is possible to encapsulate all your business logic in a transform and place it in the transform section of your configuration. All the other tools can still be used, and your custom logic is all that needs to be developed and maintained. This greatly reduces development costs and keeps your team focused on the problem domain.
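For instance, a custom transform could be referenced by its fully qualified class name in the transform section of the configuration; the class name and attribute below are hypothetical, and the surrounding layout is only illustrative:

{
  "Job" : {
    "Reader" : { "class" : "CsvReader", "source" : "orders.csv" },
    "Transform" : { "class" : "com.example.dx.OrderRules", "region" : "EMEA" },
    "Writer" : { "class" : "JsonWriter", "target" : "orders.json" }
  }
}

As long as com.example.dx.OrderRules is on the classpath and implements the appropriate transform interface, the engine calls it during the transform phase of the run cycle just like any built-in component.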
There are other projects which extend this toolkit, contributing tools and components of their own. The CoyoteFT project offers support for SFTP, FTP, FTPS and SCP, allowing you to send and receive files as part of your pre- and post-processing Tasks. Using CoyoteFT, it is a simple matter to download a file from one system, transform it and upload it to another in a different format.
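A rough sketch of that download-transform-upload pattern follows; the task names, section keys, attributes and hosts are hypothetical, and the actual components are described in the CoyoteFT documentation:

{
  "Job" : {
    "Pretask" : { "class" : "Retrieve", "source" : "sftp://source.example.com/outbox/orders.csv" },
    "Reader" : { "class" : "CsvReader", "source" : "orders.csv" },
    "Writer" : { "class" : "JsonWriter", "target" : "orders.json" },
    "Posttask" : { "class" : "Publish", "target" : "ftp://dest.example.com/inbox/orders.json" }
  }
}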
The CoyoteWS project contributes Readers, Writers and other components to interact with other systems via SOAP and ReST web services. True to the toolkit goals, these components can be used without writing code and included in existing Coyote transfer job configurations, but the project also contributes useful classes which can be called from your own Java code to make using web services easier.
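As an illustration only, a job might pull records from a ReST endpoint and write them to a flat file; the component names, attributes and URL below are hypothetical:

{
  "Job" : {
    "Reader" : { "class" : "WebServiceReader", "source" : "https://api.example.com/v1/contacts" },
    "Writer" : { "class" : "CsvWriter", "target" : "contacts.csv" }
  }
}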
Another Coyote DX project is CoyoteSN, which uses the SNapi Java library to provide Readers and Writers for ServiceNow data. With CoyoteSN, it is possible to easily and quickly exchange data between a ServiceNow instance and almost any back-office system with little or no coding. While the MID server is required for many ServiceNow products, Coyote can provide organizations with a faster, more efficient option by writing data directly to Import Sets. Other departments can write their own Coyote data transfer jobs which write directly into Import Sets, giving complete control over data ingestion. Coyote data transfer jobs can also easily read from the Table API, allowing other teams to retrieve data directly from tables and views while leveraging the ServiceNow security model.
Like CoyoteSN, other product-specific projects are expected to be written to make integrating with systems and technologies as easy as adding a reader or writer to a Coyote DX configuration file.
There is even a user interface in the works, CoyoteUI, which provides a web interface into a running Coyote DX instance to manage long-running jobs or services. Keeping true to the toolkit's modularity, you can optionally contribute "Responders" to the installation and host your own HTTP pages and processing (or your own web services if desired). Again, CoyoteUI is not required and can be added to an existing installation only if that instance needs the capability. The primary use case is a single instance containing CoyoteUI which exchanges data and commands with other Coyote DX instances running in the same subnet.
Bottom line: deploy only the components you need for the task at hand; if you need more, it is easy to add other components to those instances as necessary.
- Concepts
- Features
- Transform Engine
- Quick Start
- Configuration
- Secrets Vault
- Readers
  - List of Readers
  - Custom Readers
- Writers
  - List of Writers
  - Custom Writers
- Filters
  - Accept
  - Reject
  - Custom Filters
- Tasks
  - List of Tasks
  - Custom Tasks
- Validators
  - List of Validators
  - Custom Validators
- Listeners
  - List of Listeners
  - Custom Listeners
- Transforms
  - List of Transforms
  - Custom Transforms
- Mappers
- Context
- Databases
- Templates
- Logging
- Encryption
- Usage
- Expressions
- Examples