
Apache Gobblin

Apache Gobblin is a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems.

Capabilities

Ingestion and export of data from a variety of sources and sinks into and out of the data lake. Gobblin is optimized and designed for ELT patterns with inline transformations on ingest (small t).
Data Organization within the lake (e.g. compaction, partitioning, deduplication).
Lifecycle Management of data within the lake (e.g. data retention).
Compliance Management of data across the ecosystem (e.g. fine-grained data deletions).

Highlights

Battle-tested at scale: Runs in production at petabyte scale at companies such as LinkedIn, PayPal, and Verizon.
Feature-rich: Supports task partitioning, state management for incremental processing, atomic data publishing, data quality checking, job scheduling, fault tolerance, and more.
Supports stream and batch execution modes.
Control Plane (Gobblin-as-a-service) supports programmatic triggering and orchestration of data plane operations.

Common Patterns used in production

Stream / batch ingestion from Kafka into the data lake (HDFS, S3, ADLS)
Bulk-loading serving stores from the data lake (e.g. HDFS -> Couchbase)
Data sync across a federated data lake (HDFS <-> HDFS, HDFS <-> S3, S3 <-> ADLS)
Integrating external vendor APIs (e.g. Salesforce, Dynamics) with data stores (HDFS, Couchbase, etc.)
Enforcing data retention policies and GDPR deletion on HDFS / ADLS

Apache Gobblin is NOT

A general-purpose data transformation engine like Spark or Flink. Gobblin can delegate complex data processing tasks to Spark, Hive, etc.
A data storage system like Apache Kafka or HDFS. Gobblin integrates with these systems as sources or sinks.
A general-purpose workflow execution system like Airflow, Azkaban, Dagster, or Luigi.

Requirements

Java >= 1.8

If building the distribution with tests turned on:

Maven version 3.5.3
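To confirm the Java requirement before building, a small shell helper can parse a Java version string. This is a minimal sketch, not something shipped with Gobblin: the function names java_major_version, java_version_ok and installed_java_version are hypothetical, and it assumes `java -version` prints the version in double quotes, as OpenJDK builds do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Gobblin) for checking the "Java >= 1.8" requirement.

# Print the major version for a Java version string.
# Handles the legacy "1.x" scheme (1.8.0_292 -> 8) and the scheme used
# since Java 9 (11.0.2 -> 11, 17 -> 17, 17-ea -> 17).
java_major_version() {
  local version="$1"
  local first rest
  first="${version%%.*}"            # text before the first dot
  if [ "$first" = "1" ]; then
    rest="${version#1.}"            # drop the leading "1."
    echo "${rest%%[._]*}"           # keep digits up to the next "." or "_"
  else
    echo "${first%%[-+_]*}"         # strip suffixes such as "-ea"
  fi
}

# Succeed (exit status 0) if the version string is Java 8 or newer.
java_version_ok() {
  local major
  major="$(java_major_version "$1")"
  [ -n "$major" ] && [ "$major" -ge 8 ] 2>/dev/null
}

# Print the version reported by the `java` on PATH.
# `java -version` writes to stderr, e.g.: openjdk version "11.0.2" 2019-01-15
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}
```

For example, `java_version_ok "$(installed_java_version)" && echo "Java version OK"` checks whichever `java` is first on the PATH.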

Instructions to download gradle wrapper

If you are going to build Gobblin from the source distribution, run one of the following commands to download gradle-wrapper.jar from the Gobblin Git repository into the gradle/wrapper directory (replace ${GOBBLIN_VERSION} in the URL with the version you downloaded, or set GOBBLIN_VERSION as a shell variable).

wget --no-check-certificate -P gradle/wrapper https://github.com/apache/gobblin/raw/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar

(or)

curl --insecure -L https://github.com/apache/gobblin/raw/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar > gradle/wrapper/gradle-wrapper.jar

Alternatively, you can download it manually from: https://github.com/apache/gobblin/blob/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar

Make sure that you download it to the gradle/wrapper directory.
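Because `curl` without `--fail` saves whatever body the server returns, a mistyped version can leave an error page saved as gradle-wrapper.jar. A JAR is a ZIP archive, so a quick sanity check is that the file starts with the two bytes PK (the ZIP local file header signature). The sketch below is a hypothetical helper, not part of Gobblin, and it only checks the signature, not the JAR's integrity.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not part of Gobblin) for a downloaded gradle-wrapper.jar.

# Succeed if the file exists, is non-empty, and starts with the ZIP signature "PK".
is_zip_file() {
  local file="$1"
  [ -s "$file" ] || return 1
  [ "$(head -c 2 "$file")" = "PK" ]
}

# Check the wrapper JAR; defaults to the path used in the download instructions.
check_gradle_wrapper() {
  local jar="${1:-gradle/wrapper/gradle-wrapper.jar}"
  if is_zip_file "$jar"; then
    echo "OK: $jar looks like a JAR"
  else
    echo "ERROR: $jar is missing or is not a JAR; check GOBBLIN_VERSION and re-download" >&2
    return 1
  fi
}
```

Running `check_gradle_wrapper` with no argument from the root of the extracted source checks gradle/wrapper/gradle-wrapper.jar.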

Instructions to run Apache RAT (Release Audit Tool)

Extract the archive file to your local directory.
Run ./gradlew rat. The report is generated at build/rat/rat-report.html.

Instructions to build the distribution

Extract the archive file to your local directory.
Skip tests and build the distribution: run ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain. The distribution is created in the build/gobblin-distribution/distributions directory.
Or, run tests and build the distribution (requires Maven): run ./gradlew build. The distribution is created in the build/gobblin-distribution/distributions directory.
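The two build variants differ only in which Gradle tasks are excluded. The sketch below picks the full build when Maven is available and falls back to the test-skipping build otherwise. It assumes a Bash shell and treats `mvn` being on the PATH as a proxy for "Maven is installed"; the function names are hypothetical, while the Gradle arguments are copied from the instructions above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Gobblin) that print the build command to run.

# Print the ./gradlew command line. Argument: 1 = run tests (default), 0 = skip them.
gobblin_build_cmd() {
  local run_tests="${1:-1}"
  if [ "$run_tests" = "1" ]; then
    # Full build with tests and checks; the README notes this requires Maven.
    echo "./gradlew build"
  else
    # Skip FindBugs, tests, RAT and Checkstyle.
    echo "./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain"
  fi
}

# Choose the full build only when `mvn` is on the PATH.
gobblin_default_build_cmd() {
  if command -v mvn >/dev/null 2>&1; then
    gobblin_build_cmd 1
  else
    gobblin_build_cmd 0
  fi
}
```

For example, `cmd="$(gobblin_default_build_cmd)"; echo "Running: $cmd"; $cmd` relies on word splitting of the unquoted `$cmd`, which is safe here because none of the arguments contain spaces or glob characters. Either way, the output lands in build/gobblin-distribution/distributions.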
