Add Glossary of Terms to Understanding Airbyte #6235
Merged
@@ -0,0 +1,54 @@
# Glossary of Terms

### Airbyte CDK

The Airbyte CDK (Connector Development Kit) allows you to create connectors for Sources or Destinations. If the source or destination you need doesn't exist yet, the CDK makes building it a lot easier: it generates the tests and files you need, so all you have to write is the code specific to your source or destination. We created one in Python, which you can check out [here](../connector-development/cdk-python), and the Faros AI team created a JavaScript/TypeScript one, which you can check out [here](../connector-development/cdk-faros-js).
### DAG

DAG stands for **Directed Acyclic Graph**, a term from mathematical graph theory. It describes a process whose steps are connected in one direction and can never loop back on themselves (the graph contains no cycles). For example, in the following diagram you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows, and it is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states.
![](../.gitbook/assets/glossary_dag_example.png)


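To make the definition concrete, here is a minimal Python sketch (standard library only, Python 3.9+) that models the example diagram as a graph. It assumes the edges are A to B, B to D, A to C, and C to E, as described above. It uses `graphlib.TopologicalSorter` to find an order in which the steps can run. It then shows that adding an edge back to A makes the graph cyclic, so no such order exists. The step names are simply the letters from the diagram.

```python
from graphlib import CycleError, TopologicalSorter

# The example diagram as "node -> set of nodes that must run before it".
# Edges assumed from the description: A -> B -> D and A -> C -> E.
workflow = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B"},
    "E": {"C"},
}


def execution_order(graph):
    """Return one valid order to run the steps; raises CycleError if the graph loops."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)  # A always precedes B and C; B precedes D; C precedes E

# Making A depend on E closes a loop (A -> C -> E -> A), so this is no longer a DAG.
cyclic = dict(workflow, A={"E"})
try:
    execution_order(cyclic)
except CycleError:
    print("not a DAG: the graph contains a cycle")
```

A valid execution order like this exists only because the graph is acyclic. An orchestrator can follow any such order, or run independent branches (here B and C) in parallel.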
### ETL/ELT

ETL and ELT stand for **E**xtract, **T**ransform, and **L**oad and **E**xtract, **L**oad, and **T**ransform, respectively. The difference is the order of the last two steps. ETL transforms the data before loading it into the destination. ELT loads the raw data first and transforms it afterwards, inside the destination.

**Extract**: Retrieve data from a [source](../integrations/sources), which can be an application, a database, or almost anything else.

**Load**: Move data to your [destination](../integrations/destinations).

**Transform**: Clean up the data. In Airbyte this is referred to as [normalization](./basic-normalization.md), and it involves [deduplication](./connections/incremental-deduped-history.md), changing data types and formats, and more.


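The sketch below contrasts the two orderings in plain Python. It is a toy model built on stated assumptions, not how Airbyte is implemented:

- The source is a hard-coded list.
- The destination is a dictionary of tables.
- The hypothetical `transform` step stands in for normalization. It casts `id` to an integer, trims names, and deduplicates on `id`.

```python
def extract():
    """Hypothetical source: in practice this could be an API or a database."""
    return [
        {"id": "1", "name": " Ada "},
        {"id": "2", "name": "Grace"},
        {"id": "1", "name": " Ada "},  # duplicate of the first record
    ]


def transform(records):
    """Cast types, trim strings, and keep one record per id (the last one seen)."""
    by_id = {}
    for record in records:
        by_id[int(record["id"])] = {"id": int(record["id"]), "name": record["name"].strip()}
    return list(by_id.values())


def etl(destination):
    # ETL: transform in the pipeline, then load only the cleaned data.
    destination["users"] = transform(extract())


def elt(destination):
    # ELT: load the raw data as-is, then transform it where it now lives.
    destination["raw_users"] = extract()
    destination["users"] = transform(destination["raw_users"])


etl_warehouse, elt_warehouse = {}, {}
etl(etl_warehouse)
elt(elt_warehouse)
print(etl_warehouse)
print(elt_warehouse)
```

Both pipelines end up with the same `users` table. The ELT version also keeps the untransformed data in the destination, which is the idea behind the raw tables described later in this glossary.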
### Full Refresh Sync

A **Full Refresh Sync** retrieves all data from the source every time a sync runs. There are two variants: **Overwrite** and **Append**. **Overwrite** deletes the existing data in the destination before writing the newly synced data. **Append** keeps the existing data and adds the new data to it, so each sync adds another full copy of the source.


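Here is a minimal sketch of the two variants. The hypothetical function below models the destination as a Python list, and the `mode` strings are illustrative, not Airbyte setting names. It shows why **Append** accumulates one full copy per sync while **Overwrite** always mirrors the latest state of the source.

```python
def full_refresh_sync(source_rows, destination, mode):
    """Copy every source row into `destination` (a list) using the given mode."""
    if mode == "overwrite":
        destination.clear()  # drop what the previous sync wrote
    elif mode != "append":
        raise ValueError(f"unknown mode: {mode!r}")
    destination.extend(source_rows)  # always write the full source


source = [{"id": 1}, {"id": 2}]

overwritten, appended = [], []
for _ in range(2):  # run two syncs against an unchanged source
    full_refresh_sync(source, overwritten, "overwrite")
    full_refresh_sync(source, appended, "append")

print(len(overwritten))  # 2
print(len(appended))     # 4
```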
### Incremental Sync

An **Incremental Sync** retrieves only new data from the source on each sync, although the first sync always attempts to retrieve all the data. If the [destination supports it](https://discuss.airbyte.io/t/what-destinations-support-the-incremental-deduped-sync-mode/89), you can also have your data deduplicated. Simply put, this means that if you sync an updated version of a record you've already synced, the old version is removed, so only the latest one remains.


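One common way to decide what counts as "new data" is a cursor. The sync remembers the highest value of some field it has seen, and on the next run it only takes records beyond that value. The sketch below assumes that approach, with a hypothetical `updated_at` cursor field and `id` primary key. It also keeps a deduplicated view keyed by `id` to mirror the behavior described above. It is an illustration, not Airbyte's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncrementalSync:
    """Toy incremental sync; the `updated_at` cursor and `id` key are assumptions."""

    cursor: Optional[int] = None                 # highest updated_at synced so far
    deduped: dict = field(default_factory=dict)  # id -> latest version of the record

    def run(self, source_rows):
        # With no cursor yet (the first sync), every row counts as new.
        new_rows = [
            row for row in source_rows
            if self.cursor is None or row["updated_at"] > self.cursor
        ]
        for row in new_rows:
            self.deduped[row["id"]] = row  # a newer version replaces the old one
        if new_rows:
            self.cursor = max(row["updated_at"] for row in new_rows)
        return new_rows


sync = IncrementalSync()
first = sync.run([
    {"id": 1, "updated_at": 1, "status": "open"},
    {"id": 2, "updated_at": 2, "status": "open"},
])
second = sync.run([
    {"id": 1, "updated_at": 3, "status": "closed"},  # record 1 was updated
    {"id": 2, "updated_at": 2, "status": "open"},    # unchanged, so skipped
])
print(len(first), len(second))    # 2 1
print(sync.deduped[1]["status"])  # closed
```

The strict `>` comparison keeps the sketch simple. However, it would skip a new record that happens to share the exact cursor value of the previous sync, so a real connector has to decide how to handle that edge case.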
### Raw Tables

Airbyte writes tables whose names start with the prefix `_airbyte_raw_`. These contain your replicated data, and the prefix indicates that the data hasn't been normalized. If you select basic normalization, Airbyte also creates normalized versions of these tables, named without the prefix.


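The naming convention can be illustrated with a few small helpers. This is a sketch and the helpers are hypothetical. In particular, `normalize_rows` assumes, for illustration only, that each raw row holds its record as a JSON string. Real normalization expands raw data into typed columns with considerably more logic.

```python
import json

RAW_PREFIX = "_airbyte_raw_"


def raw_table_name(stream):
    """Name of the unnormalized table for a stream."""
    return RAW_PREFIX + stream


def normalized_table_name(raw_table):
    """Name of the normalized table: the raw name without the prefix."""
    if not raw_table.startswith(RAW_PREFIX):
        raise ValueError(f"{raw_table!r} is not a raw table")
    return raw_table[len(RAW_PREFIX):]


def normalize_rows(raw_rows):
    """Hypothetical: parse each raw JSON blob into a dict of columns."""
    return [json.loads(blob) for blob in raw_rows]


raw = raw_table_name("users")
print(raw)                         # _airbyte_raw_users
print(normalized_table_name(raw))  # users
print(normalize_rows(['{"id": 1, "name": "Ada"}']))
```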
## Advanced Terms

### AirbyteCatalog
{% hint style="info" %}
This is only relevant for individuals who want to create a connector.
{% endhint %}

The AirbyteCatalog describes the data that you can retrieve from a Source. For example, if you want to retrieve information from an API, the data you can receive needs to be defined clearly. That way Airbyte knows which endpoints are supported and what the objects returned by each stream look like. This definition takes the form of a schema that Airbyte can interpret. Learn more [here](./beginners-guide-to-catalog.md).


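For a rough idea of the shape, the Python sketch below builds a catalog-like dictionary with one stream whose records are described by a JSON Schema, then checks records against it. The field names (`streams`, `name`, `json_schema`, `supported_sync_modes`) are modeled on the Airbyte protocol's catalog. Treat them as an assumption and check the linked guide for the authoritative structure. The `users` stream and the `matches_stream` helper are hypothetical. The helper checks only top-level property types, a tiny subset of JSON Schema.

```python
# Catalog-like structure; field names are modeled on the Airbyte protocol
# (verify against the catalog guide). The "users" stream is hypothetical.
catalog = {
    "streams": [
        {
            "name": "users",
            "json_schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "email": {"type": "string"},
                },
            },
            "supported_sync_modes": ["full_refresh", "incremental"],
        }
    ]
}

# Python types accepted for each JSON Schema type name.
JSON_TYPES = {"integer": int, "number": (int, float), "string": str,
              "boolean": bool, "object": dict, "array": list}


def matches_stream(record, stream_name, catalog):
    """Check top-level property types only (a tiny subset of JSON Schema)."""
    stream = next(s for s in catalog["streams"] if s["name"] == stream_name)
    properties = stream["json_schema"]["properties"]
    for key, value in record.items():
        if key not in properties:
            continue  # undeclared properties are allowed, as in JSON Schema by default
        expected = properties[key]["type"]
        # bool is a subclass of int in Python, but JSON treats them as distinct.
        if expected in ("integer", "number") and isinstance(value, bool):
            return False
        if not isinstance(value, JSON_TYPES[expected]):
            return False
    return True


print(matches_stream({"id": 7, "email": "ada@example.com"}, "users", catalog))  # True
print(matches_stream({"id": "7"}, "users", catalog))                            # False
```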
### Airbyte Specification
{% hint style="info" %}
This is only relevant for individuals who want to create a connector.
{% endhint %}

The Airbyte Specification defines the functions that a Source must implement to retrieve data and that a Destination must implement to load it. A connector that implements these functions as the specification describes will work correctly with Airbyte. Learn more [here](./airbyte-specification.md).


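The Python sketch below shows the idea with abstract base classes. It assumes the operation names from the linked specification: `spec`, `check`, `discover`, and `read` for a Source, and `spec`, `check`, and `write` for a Destination. The signatures are simplified and the classes are hypothetical; they are not the Airbyte CDK's classes. Because the methods are abstract, Python refuses to instantiate a connector that leaves any of them out, which mirrors the "must implement" requirement.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Mapping

Config = Mapping[str, Any]


class Source(ABC):
    """Hypothetical, simplified interface for a Source connector."""

    @abstractmethod
    def spec(self) -> dict:
        """Describe the configuration this connector expects."""

    @abstractmethod
    def check(self, config: Config) -> bool:
        """Return True if the source can be reached with this config."""

    @abstractmethod
    def discover(self, config: Config) -> dict:
        """Return the catalog of streams this source provides."""

    @abstractmethod
    def read(self, config: Config, catalog: dict) -> Iterable[dict]:
        """Yield records for the streams in the catalog."""


class Destination(ABC):
    """Hypothetical, simplified interface for a Destination connector."""

    @abstractmethod
    def spec(self) -> dict:
        """Describe the configuration this connector expects."""

    @abstractmethod
    def check(self, config: Config) -> bool:
        """Return True if the destination can be written to with this config."""

    @abstractmethod
    def write(self, config: Config, catalog: dict, records: Iterable[dict]) -> int:
        """Load the records and return how many were written."""


class StaticSource(Source):
    """Toy source that serves a fixed list of records as one stream."""

    def __init__(self, rows):
        self.rows = rows

    def spec(self):
        return {"type": "object", "properties": {}}

    def check(self, config):
        return True

    def discover(self, config):
        return {"streams": [{"name": "rows", "json_schema": {"type": "object"}}]}

    def read(self, config, catalog):
        yield from self.rows


class ListDestination(Destination):
    """Toy destination that stores records in a Python list."""

    def __init__(self):
        self.stored = []

    def spec(self):
        return {"type": "object", "properties": {}}

    def check(self, config):
        return True

    def write(self, config, catalog, records):
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before


source = StaticSource([{"id": 1}, {"id": 2}])
destination = ListDestination()
catalog = source.discover({})
written = destination.write({}, catalog, source.read({}, catalog))
print(written)  # 2
```

A subclass that implements, say, only `spec` raises `TypeError` when instantiated. That is Python's way of enforcing the same contract the specification describes.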
### Temporal
{% hint style="info" %}
This is only relevant for individuals who want to learn about or contribute to our underlying platform.
{% endhint %}

[Temporal](https://temporal.io/) is a development kit that lets you create workflows, parallelize them, and handle failures and retries gracefully. We use it to reliably schedule each step of the ELT process, and a Temporal service is always deployed with each Airbyte installation.


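To illustrate the kind of problem Temporal handles, here is a plain-Python sketch of running ELT steps in order and retrying a step that fails. It deliberately does not use Temporal's SDK, and `run_workflow` and the step functions are hypothetical. Temporal provides much more than this, such as persisting workflow state so a workflow can resume after a process restart.

```python
import time


def run_workflow(steps, max_attempts=3, backoff_seconds=0.0):
    """Run steps in order, retrying a failing step up to max_attempts times."""
    log = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                step()
            except Exception as exc:
                log.append(f"{step.__name__}: attempt {attempt} failed ({exc})")
                if attempt == max_attempts:
                    raise  # give up and surface the error
                time.sleep(backoff_seconds * 2 ** (attempt - 1))  # exponential backoff
            else:
                log.append(f"{step.__name__}: ok on attempt {attempt}")
                break
    return log


load_calls = 0


def extract():
    pass


def load():
    global load_calls
    load_calls += 1
    if load_calls == 1:  # simulate a transient outage on the first try
        raise ConnectionError("destination unavailable")


def transform():
    pass


log_lines = run_workflow([extract, load, transform])
for line in log_lines:
    print(line)
```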
Can we put this in alphabetical order?
Yep done.