Commit

Added an explicit call out about the data and metadata flow difference between the gateway and the clients.
iddoavn authored Nov 26, 2024
1 parent 7f58e03 commit 9511458
Showing 2 changed files with 9 additions and 0 deletions.
Binary file added docs/assets/img/s3gatewayvsclientdataflow.png
9 changes: 9 additions & 0 deletions docs/understand/architecture.md
@@ -112,6 +112,15 @@ Using [lakeFSFileSystem][hadoopfs] increases Spark ETL jobs performance by execu
and all data operations directly through the same underlying object store that lakeFS uses.


## How lakeFS Clients and Gateway Handle Metadata and Data Access


When you use the Python client, lakectl, or the lakeFS Spark client, the client communicates with the lakeFS server to retrieve metadata. For example, it may query lakeFS to find out which version of a file is needed, or to track changes in branches and commits. No object data is transferred in this exchange; only metadata about data locations and versions is passed.
Once the client knows the exact data location from the lakeFS metadata, it accesses the data directly in the underlying object storage (potentially using presigned URLs), without routing it through lakeFS. For instance, if data is stored in S3, the Spark client retrieves the S3 paths from lakeFS and then reads from and writes to those paths in S3 directly, without involving lakeFS in the data transfer.
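
The two-step flow described above can be illustrated with a small in-memory sketch. It does not use the lakeFS SDK: `MetadataServer`, `ObjectStore`, `read_via_client`, and the example addresses are hypothetical stand-ins, assumed here only to show that the metadata lookup and the data read go to different systems.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for the underlying object store (for example, S3)."""
    blobs: dict = field(default_factory=dict)
    reads: int = 0  # number of data reads served by the store

    def get(self, physical_address: str) -> bytes:
        self.reads += 1
        return self.blobs[physical_address]


@dataclass
class MetadataServer:
    """Hypothetical stand-in for the lakeFS server: it returns locations, never data."""
    index: dict = field(default_factory=dict)  # (ref, path) -> physical address
    lookups: int = 0  # number of metadata lookups served

    def stat(self, ref: str, path: str) -> str:
        self.lookups += 1
        return self.index[(ref, path)]


def read_via_client(server: MetadataServer, store: ObjectStore, ref: str, path: str) -> bytes:
    # Step 1: ask the metadata server where this version of the object lives.
    physical_address = server.stat(ref, path)
    # Step 2: read the bytes straight from the object store, bypassing the server.
    return store.get(physical_address)


if __name__ == "__main__":
    store = ObjectStore(blobs={
        "s3://example-bucket/data/aaa": b"rows for main",
        "s3://example-bucket/data/bbb": b"rows for dev",
    })
    server = MetadataServer(index={
        ("main", "table/part-0.parquet"): "s3://example-bucket/data/aaa",
        ("dev", "table/part-0.parquet"): "s3://example-bucket/data/bbb",
    })
    # The same logical path resolves to different physical objects per branch.
    print(read_via_client(server, store, "main", "table/part-0.parquet"))
    print(read_via_client(server, store, "dev", "table/part-0.parquet"))
```

In this sketch each read costs one metadata lookup and one object-store read, and the object contents never pass through `MetadataServer`.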

<img src="{{ site.baseurl }}/assets/img/s3gatewayvsclientdataflow.png" alt="lakeFS Clients vs Gateway Data Flow" width="500px"/>


[data-quality-gates]: {% link understand/use_cases/cicd_for_data.md %}#using-hooks-as-data-quality-gates
[dynamodb-permissions]: {% link howto/deploy/aws.md %}#grant-dynamodb-permissions-to-lakefs
[roadmap]: {% link project/index.md %}#roadmap