Add Glossary of Terms to Understanding Airbyte #6235

Merged 3 commits on Sep 21, 2021
Binary file added docs/.gitbook/assets/glossary_dag_example.png
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -208,6 +208,7 @@
* [Technical Stack](understanding-airbyte/tech-stack.md)
* [Change Data Capture \(CDC\)](understanding-airbyte/cdc.md)
* [Namespaces](understanding-airbyte/namespaces.md)
* [Glossary of Terms](understanding-airbyte/glossary.md)
* [API documentation](api-documentation.md)
* [Project Overview](project-overview/README.md)
* [Roadmap](project-overview/roadmap.md)
54 changes: 54 additions & 0 deletions docs/understanding-airbyte/glossary.md
@@ -0,0 +1,54 @@
# Glossary of Terms

### Airbyte CDK

The Airbyte CDK (Connector Development Kit) lets you build connectors for Sources or Destinations. If a connector for your source or destination doesn't exist yet, the CDK makes building one much easier: it generates the tests and files you need, so you only have to write the connector-specific code. We created a Python CDK, which you can check out [here](../connector-development/cdk-python), and the Faros AI team created a JavaScript/TypeScript one, which you can check out [here](../connector-development/cdk-faros-js).

### DAG

DAG stands for **Directed Acyclic Graph**. It's a term from graph theory for a set of nodes connected by one-way edges in which no path ever loops back to a node it has already visited. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is well suited to representing workflows, and it is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states.
![](../.gitbook/assets/glossary_dag_example.png)
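
As a rough illustration, the diagram above can be written down as a small graph and ordered with Python's standard-library `graphlib` module (Python 3.9+). This is only a sketch of the general idea, not how Airbyte or Airflow represent workflows internally, and the `predecessors` name is made up for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a node to the set of nodes that must come before it.
# The edges mirror the diagram: A -> B -> D and A -> C -> E.
predecessors = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B"},
    "E": {"C"},
}

# static_order() yields the nodes so that every node appears after its
# predecessors. It raises graphlib.CycleError if the graph contains a loop.
order = list(TopologicalSorter(predecessors).static_order())
print(order)  # "A" always comes first; B precedes D and C precedes E
```

Because the graph has no cycles, a valid execution order always exists, which is what makes DAGs a good fit for scheduling workflow steps.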

### ETL/ELT
Review comment (Contributor): Can we put this in alphabetical order?

Reply (Contributor Author): Yep, done.

Stands for **E**xtract, **T**ransform, and **L**oad and **E**xtract, **L**oad, and **T**ransform, respectively.

**Extract**: Retrieve data from a [source](../integrations/sources), which can be an application, a database, or almost anything else that holds data.

**Load**: Move data to your [destination](../integrations/destinations).

**Transform**: Clean up the data. In Airbyte this is referred to as [normalization](./basic-normalization.md), and it involves [deduplication](./connections/incremental-deduped-history.md), changing data types and formats, and more.

### Full Refresh Sync

A **Full Refresh Sync** attempts to retrieve all data from the source every time a sync runs. It has two modes, **Overwrite** and **Append**: **Overwrite** deletes the data in the destination before running the sync, and **Append** leaves the existing data in place.
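
The difference between the two modes can be sketched with plain Python lists standing in for the source and destination. The function name `full_refresh` and the `mode` strings are hypothetical and chosen only for this illustration; they are not part of Airbyte's API.

```python
def full_refresh(source_rows, destination_rows, mode):
    """Return the destination contents after one full refresh sync."""
    if mode == "overwrite":
        # Existing destination data is deleted before the sync runs.
        return list(source_rows)
    if mode == "append":
        # Existing destination data is kept; all source rows are added again.
        return list(destination_rows) + list(source_rows)
    raise ValueError(f"unknown mode: {mode}")


source = [{"id": 1}, {"id": 2}]
destination = [{"id": 1}]  # left over from an earlier sync

overwritten = full_refresh(source, destination, "overwrite")
appended = full_refresh(source, destination, "append")
print(len(overwritten), len(appended))
```

Note that in this sketch the append mode ends up with the row `{"id": 1}` twice, because every sync copies all source rows again.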

### Incremental Sync

An **Incremental Sync** retrieves only new data from the source when a sync occurs, except for the first sync, which always attempts to retrieve all the data. If the [destination supports it](https://discuss.airbyte.io/t/what-destinations-support-the-incremental-deduped-sync-mode/89), you can have your data deduplicated: if you sync an updated version of a record you've already synced, the old record is removed.
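
A minimal sketch of the idea, assuming each record has an `id` field that identifies it and an `updated_at` field that grows with every change. Both field names, the `cursor` value, and the function itself are assumptions made for this example and do not describe Airbyte's actual implementation.

```python
def incremental_sync(source_rows, destination_rows, cursor, dedupe=False):
    """Copy rows newer than `cursor`; return (destination, new_cursor)."""
    # On the first sync the cursor is None, so every row is retrieved.
    new_rows = [
        row for row in source_rows
        if cursor is None or row["updated_at"] > cursor
    ]
    result = list(destination_rows) + new_rows
    if dedupe:
        latest = {}
        for row in result:
            # Later rows replace earlier rows that share the same id.
            latest[row["id"]] = row
        result = list(latest.values())
    new_cursor = max((row["updated_at"] for row in new_rows), default=cursor)
    return result, new_cursor


# First sync: everything is retrieved.
source = [{"id": 1, "name": "a", "updated_at": 1}]
destination, cursor = incremental_sync(source, [], None, dedupe=True)

# The record changes in the source, then a second sync runs.
source = [{"id": 1, "name": "b", "updated_at": 2}]
destination, cursor = incremental_sync(source, destination, cursor, dedupe=True)
print(destination, cursor)
```

With `dedupe=False`, the destination in this sketch would instead keep both versions of record 1.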

### Raw Tables

Airbyte creates tables with the prefix `_airbyte_raw_`. These tables contain your replicated data, but the prefix indicates that the data is not normalized. If you select basic normalization, Airbyte will create renamed versions without the prefix.

## Advanced Terms

### AirbyteCatalog
{% hint style="info" %}
This is only relevant for individuals who want to create a connector.
{% endhint %}

This refers to how you define the data that you can retrieve from a Source. For example, if you want to retrieve information from an API, the available data needs to be defined clearly so that Airbyte knows which endpoints are supported and what the objects returned by each stream look like. This is represented as a schema that Airbyte can interpret. Learn more [here](./beginners-guide-to-catalog.md).
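
To give a feel for what such a schema looks like, here is a hedged sketch of a catalog with a single stream, written as a Python dictionary and printed as JSON. The `users` stream and its fields are invented for illustration, and the key names (`streams`, `name`, `json_schema`, `supported_sync_modes`) reflect our understanding of the catalog format; treat the linked guide as the authoritative reference.

```python
import json

# A hypothetical catalog describing one stream called "users".
catalog = {
    "streams": [
        {
            "name": "users",
            # JSON Schema describing the shape of each record in the stream.
            "json_schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "email": {"type": "string"},
                },
            },
            # Sync modes this stream can be read with.
            "supported_sync_modes": ["full_refresh", "incremental"],
        }
    ]
}

print(json.dumps(catalog, indent=2))
```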

### Airbyte Specification
{% hint style="info" %}
This is only relevant for individuals who want to create a connector.
{% endhint %}

This refers to the functions that a Source or Destination must implement to successfully retrieve data and load it, respectively. Implementing these functions using the Airbyte Specification makes a Source or Destination work correctly. Learn more [here](./airbyte-specification.md).

### Temporal
{% hint style="info" %}
This is only relevant for individuals who want to learn about or contribute to our underlying platform.
{% endhint %}

[Temporal](https://temporal.io/) is a development kit that lets you create workflows, parallelize them, and handle failures/retries gracefully. We use it to reliably schedule each step of the ELT process, and a Temporal service is always deployed with each Airbyte installation.