Support Nessie catalog #19

Fokko · 2023-10-02T09:46:53Z

Feature Request / Improvement

PyIceberg has added support for the Glue catalog. We need support for the Nessie catalog too, just like the Hive, Glue, and REST catalogs.

Migrated from apache/iceberg#6414

zeddit · 2023-10-19T08:14:07Z

Looking forward to this feature so that we can conduct testing.

seunggs · 2024-02-09T17:36:01Z

Any update on supporting nessie catalog?

ajantha-bhat · 2024-02-10T01:29:57Z

@jbonofre might take it up after the Java 1.5.0 release.

fraibacas · 2024-03-18T17:24:35Z

@ajantha-bhat Any rough idea about when this will be available? thanks!

RobPrat · 2024-03-19T16:35:11Z

I would also like to know whether this is expected to be worked on soon; I'd find it very useful. Thx!

alonahmias · 2024-06-03T17:54:19Z

Hi, we would like to contribute to this issue. Is that possible?

Fokko · 2024-06-03T18:12:42Z

It looks like Nessie has announced REST catalog support. This would make the native Nessie integration redundant.

dimas-b · 2024-06-03T18:22:02Z

ATM, Nessie has Iceberg REST API on main, but it's not released yet.

chayalipy · 2024-06-03T18:44:28Z

Is there a release date?

dimas-b · 2024-06-03T22:25:23Z

It might be best to talk about Nessie releases in the project's Zulip chat (the join link is on projectnessie.org) :)

dimas-b · 2024-06-21T13:43:10Z

Nessie 0.90.2 and later support the Iceberg REST Catalog API.

jbonofre · 2024-06-21T13:46:26Z

I think this issue can be considered fixed, thanks to Nessie's support for the REST Catalog API.

Fokko · 2024-06-21T13:54:55Z

@dimas-b Thanks for the update here, and I agree with @jbonofre, let's close this issue!

cee-shubham · 2024-09-27T05:56:13Z

I want to create Iceberg tables using PyIceberg and store them in MinIO. For this I have created Docker containers for three services: nessie, minio, and dremio.
Earlier I was using PySpark and was able to create tables with this code:
import pyspark
from pyspark.sql import SparkSession
import os

# DEFINE SENSITIVE VARIABLES
NESSIE_URI = "http://nessie:19120/api/v1"
MINIO_ACCESS_KEY = "my_access_key"
MINIO_SECRET_KEY = "my_secret_access_key"

conf = (
    pyspark.SparkConf()
    .setAppName('app_name')
    # Packages
    .set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.67.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
    # SQL extensions
    .set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions')
    # Catalog configuration
    .set('spark.sql.catalog.nessie', 'org.apache.iceberg.spark.SparkCatalog')
    .set('spark.sql.catalog.nessie.uri', NESSIE_URI)
    .set('spark.sql.catalog.nessie.ref', 'main')
    .set('spark.sql.catalog.nessie.authentication.type', 'NONE')
    .set('spark.sql.catalog.nessie.catalog-impl', 'org.apache.iceberg.nessie.NessieCatalog')
    .set('spark.sql.catalog.nessie.warehouse', 's3a://warehouse')
    .set('spark.sql.catalog.nessie.s3.endpoint', 'http://minio:9000')
    .set('spark.sql.catalog.nessie.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    # MinIO credentials
    .set('spark.hadoop.fs.s3a.access.key', MINIO_ACCESS_KEY)
    .set('spark.hadoop.fs.s3a.secret.key', MINIO_SECRET_KEY)
)

# START SPARK SESSION
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print("Spark Running")

# LOAD A CSV INTO AN SQL VIEW
csv_df = spark.read.format("csv").option("header", "true").load("../datasets/df_open_2023.csv")
csv_df.createOrReplaceTempView("csv_open_2023")

# CREATE AN ICEBERG TABLE FROM THE SQL VIEW
spark.sql("CREATE TABLE IF NOT EXISTS nessie.df_open_2023 USING iceberg AS SELECT * FROM csv_open_2023").show()

# QUERY THE ICEBERG TABLE
spark.sql("SELECT * FROM nessie.df_open_2023 limit 10").show()

Please tell me how to do it with pyiceberg

XN137 · 2024-09-27T07:50:02Z

> Please tell me how to do it with pyiceberg

Generally speaking, you use the REST catalog. These docs may help:
https://py.iceberg.apache.org/configuration/#rest-catalog
https://kevinjqliu.substack.com/i/147257480/connect-to-the-rest-catalog

For running the Nessie server:
https://projectnessie.org/guides/iceberg-rest/
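As a rough illustration of what the Spark steps above could look like once a REST catalog is loaded, here is a minimal sketch. It assumes PyIceberg 0.7 or later (for `create_namespace_if_not_exists` and `create_table_if_not_exists`) and a `pyarrow.Table` as input. The helper name `write_table` and the namespace `demo` are made up for this example and do not come from the thread.

```python
from typing import Any


def write_table(catalog: Any, namespace: str, table_name: str,
                data: Any, preview_rows: int = 10) -> Any:
    """Create namespace and table if missing, append `data`, return a preview.

    `catalog` is expected to be a PyIceberg catalog and `data` a pyarrow.Table.
    """
    identifier = f"{namespace}.{table_name}"
    catalog.create_namespace_if_not_exists(namespace)
    # The Iceberg schema is derived from the Arrow schema of the data.
    table = catalog.create_table_if_not_exists(identifier, schema=data.schema)
    # Note: unlike the Spark CTAS, this appends on every call,
    # even when the table already existed.
    table.append(data)
    # Rough equivalent of SELECT * ... LIMIT 10.
    return table.scan(limit=preview_rows).to_arrow()


# Example usage (not run here; needs pyiceberg, pyarrow, and a reachable catalog):
#   import pyarrow.csv as pc
#   data = pc.read_csv("../datasets/df_open_2023.csv")
#   print(write_table(catalog, "demo", "df_open_2023", data))
```

One difference to keep in mind: `pyarrow.csv.read_csv` infers column types, whereas the Spark reader above loads every column as a string unless schema inference is enabled.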

XN137 · 2024-09-27T10:22:03Z

The RestCatalog class seems to live in pyiceberg.catalog.rest:

iceberg-python/pyiceberg/catalog/rest.py, line 248 in c30e43a:

class RestCatalog(Catalog):

However, according to https://py.iceberg.apache.org/api/
one is now supposed to use something like:

from pyiceberg.catalog import load_catalog

catalog = load_catalog("rest", <optional_config_dict>)

cee-shubham · 2024-10-03T09:07:23Z

I encountered an issue while using the load_catalog() method, where it shows the following error:
load_catalog() takes from 0 to 1 positional arguments but 2 were given.

To address this, I attempted to use load_rest("rest", <config_dict>), but I encountered a validation issue in the ConfigResponse model while working with the RestCatalog from PyIceberg. It seems that the defaults and overrides fields are required in the ConfigResponse model, but the Nessie REST API is not responding with these fields as expected.

Even after passing them explicitly in the response, I am still getting a validation error.
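The positional-argument error above is consistent with how `load_catalog` is declared in PyIceberg: the catalog name is the only positional parameter, and properties are collected as keyword arguments, so a config dict has to be unpacked with `**`. The sketch below reproduces this with a stand-in function of the same parameter shape, so it runs without PyIceberg installed. `load_catalog_like` is hypothetical, and the `uri` value is an assumption: Nessie's guide appears to serve the Iceberg REST API under `/iceberg` rather than under the `/api/v1` path used in the Spark config.

```python
from typing import Optional


def load_catalog_like(name: Optional[str] = None, **properties: str) -> dict:
    """Hypothetical stand-in with the same parameter shape as
    pyiceberg.catalog.load_catalog(name=None, **properties)."""
    return {"name": name, **properties}


# Assumed values for illustration only.
config = {"type": "rest", "uri": "http://nessie:19120/iceberg"}

error = ""
try:
    # Passing the dict positionally supplies a second positional argument.
    load_catalog_like("rest", config)
except TypeError as exc:
    error = str(exc)
    print(error)

# Unpacking the dict turns each entry into a keyword argument.
loaded = load_catalog_like("rest", **config)
print(loaded)
```

With the real function, the equivalent call would be `load_catalog("rest", **config)`.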

sean-pasabi · 2024-10-15T17:43:50Z

@cee-shubham I am having a similar issue. If someone has managed to load a Nessie catalog using PyIceberg's RestCatalog, an example would be greatly appreciated.

edgarrmondragon · 2024-10-16T01:31:47Z

@sean-pasabi I was able to get pyiceberg working with REST catalog exposed by Nessie, at least as a proof of concept: https://github.com/edgarrmondragon/-learn-iceberg-nessie

sean-pasabi · 2024-10-16T08:05:47Z

@edgarrmondragon I have a similar .pyiceberg.yaml, but without the token. I am using MinIO, which requires additional work to add some sort of OAuth2 flow, and I would be surprised if that was the issue. Can your example run without the token, or is it required?

cee-shubham · 2024-10-16T10:50:13Z

@edgarrmondragon I have followed your code, and while the namespace and table were successfully created and are visible in the MinIO bucket, I encountered an error when appending data to the table. The error is related to AWS access permissions, specifically an "ACCESS_DENIED" issue during a HeadObject operation. Below is the relevant error message:
OSError: When getting information for key 'demo2/taxi_dataset_f684e603-b914-4f6b-91db-b9f86a2846b3' in bucket 'demobucket': AWS Error ACCESS_DENIED during HeadObject operation: No response body.
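One possible cause of an `ACCESS_DENIED` like this, not confirmed anywhere in this thread, is that the PyIceberg client writes data files itself and had no MinIO endpoint or credentials configured. The sketch below collects the documented PyIceberg `s3.*` FileIO properties into a dict that can be passed to `load_catalog`. The helper name `minio_rest_properties`, the default region, and the `/iceberg` URI are assumptions for illustration; the endpoint and key placeholders are taken from the Spark example earlier in the thread.

```python
def minio_rest_properties(uri: str, s3_endpoint: str, access_key: str,
                          secret_key: str, region: str = "us-east-1") -> dict[str, str]:
    """Build REST catalog properties with explicit S3 (MinIO) client settings."""
    return {
        "type": "rest",
        "uri": uri,
        # Client-side FileIO settings used when PyIceberg reads or writes files.
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "s3.region": region,
    }


props = minio_rest_properties(
    uri="http://nessie:19120/iceberg",   # assumed Nessie Iceberg REST endpoint
    s3_endpoint="http://minio:9000",
    access_key="my_access_key",
    secret_key="my_secret_access_key",
)

# With PyIceberg installed and the services running (not executed here):
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("nessie", **props)
```

The catalog server may also return its own S3 settings in the config response, so it is worth checking that the credentials the server is configured with can read and write the target bucket.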

sean-pasabi · 2024-10-16T12:48:05Z

Hey @cee-shubham, did you mean @edgarrmondragon, because I haven't given any code?

Fokko mentioned this issue Oct 2, 2023: Python: Support Nessie catalog apache/iceberg#6414 (Closed)

kevinjqliu mentioned this issue May 14, 2024: PyIceberg Near-Term Roadmap #736 (Open, 39 tasks)

Fokko closed this as completed Jun 21, 2024
