check if table exist #406

djouallah · 2024-02-10T11:26:17Z

Feature Request / Improvement

It would be nice to have an API to quickly check whether a table exists, or alternatively to create a table only if it does not exist.
Currently I am doing this:

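from pyiceberg.exceptions import TableAlreadyExistsError

# `catalog` is a loaded catalog and `price` is a PyArrow table
try:
    catalog.create_table("aemo.price", schema=price.schema)
except TableAlreadyExistsError:
    pass  # the table already exists; leave it as it is
table = catalog.load_table("aemo.price")
table.append(price)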


Fokko · 2024-02-13T11:02:04Z

I'm hesitant about adding this. I'd rather add CREATE OR REPLACE semantics.

The following logic will avoid fetching the table when not needed:

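from pyiceberg.exceptions import TableAlreadyExistsError

try:
    # create_table returns the new table, so no extra load is needed
    table = catalog.create_table("aemo.price", schema=price.schema)
except TableAlreadyExistsError:
    # Only fetch the table when it already exists
    table = catalog.load_table("aemo.price")
table.append(price)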

I think in your case CREATE OR REPLACE is more suitable. Otherwise you might append the data twice, right?
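To make the "append twice" concern concrete: with replace semantics, rerunning the job starts from an empty table instead of stacking a second copy of the data. The following is a minimal sketch, not a native CREATE OR REPLACE. It assumes PyIceberg's `NoSuchTableError` and `TableAlreadyExistsError` (with stand-ins if PyIceberg is not installed), and `InMemoryCatalog` and `create_or_replace_table` are hypothetical names. It emulates replacement with a drop followed by a create.

```python
try:
    from pyiceberg.exceptions import NoSuchTableError, TableAlreadyExistsError
except ImportError:  # stand-ins so the sketch runs without PyIceberg installed
    class NoSuchTableError(Exception):
        pass

    class TableAlreadyExistsError(Exception):
        pass


class InMemoryCatalog:
    """Hypothetical toy catalog exposing only the calls this sketch needs."""

    def __init__(self):
        self.tables = {}

    def create_table(self, identifier, schema):
        if identifier in self.tables:
            raise TableAlreadyExistsError(identifier)
        self.tables[identifier] = {"schema": schema, "rows": []}
        return self.tables[identifier]

    def drop_table(self, identifier):
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        del self.tables[identifier]


def create_or_replace_table(catalog, identifier, schema):
    """Approximate CREATE OR REPLACE as drop + create.

    These are two separate calls, so this is not atomic, and the old
    table's data and history are discarded rather than kept.
    """
    try:
        catalog.drop_table(identifier)
    except NoSuchTableError:
        pass  # nothing to replace on the first run
    return catalog.create_table(identifier, schema=schema)


catalog = InMemoryCatalog()
first = create_or_replace_table(catalog, "aemo.price", schema="v1")
first["rows"].append("batch-1")
# A rerun replaces the table, so the earlier batch is not duplicated
second = create_or_replace_table(catalog, "aemo.price", schema="v2")
```

A real catalog-side replace could avoid the window between the two calls. This approximation only illustrates the resulting state.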

djouallah · 2024-02-13T11:09:05Z

I don't want to replace the table. If the table exists, leave it as it is; otherwise, create a new one. It should never be replaced.

Actually, this is what I use with Delta:

if spark.catalog.tableExists("scada"):
    ...  # remainder of the snippet not captured here

Fokko · 2024-02-13T12:35:33Z

I like the table_exists method 👍

sungwy · 2024-02-13T15:28:33Z

I think the table_exists function that @djouallah proposed and the PR that @hussein-awala is working on to support CREATE TABLE IF NOT EXISTS serve different purposes, and that we should support both in PyIceberg:

table_exists:

Important if we just want to check that a table exists in a namespace. I'd argue this is the same as calling list_tables and checking whether the table is in the returned list, so it isn't as critical to implement as CREATE TABLE IF NOT EXISTS.
It is, however, very simple to implement, and we could just support it.

CREATE TABLE IF NOT EXISTS

Allows users to deploy an idempotent table-creation statement to production, so that the same code creates the table on the first run and skips the creation on every later run, without requiring a code change.
This semantic differs from calling table_exists and then create_table, because CREATE TABLE IF NOT EXISTS is a single call to the catalog. With table_exists + create_table, the two calls are made separately and sequentially, so a concurrent process could create the table in between. create_table would then fail even though table_exists returned False for this process.
An alternative is simply to ask users to catch TableAlreadyExistsError in their code when calling create_table.
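The race between a separate check and create can be reproduced with a toy catalog. This is a sketch under stated assumptions: the exceptions are PyIceberg's `TableAlreadyExistsError` and `NoSuchTableError` (with stand-ins if PyIceberg is absent), while `RacyCatalog`, its `table_exists` method, `check_then_create` and `create_or_load` are hypothetical names. The "concurrent process" is simulated by creating the table inside the existence check.

```python
try:
    from pyiceberg.exceptions import NoSuchTableError, TableAlreadyExistsError
except ImportError:  # stand-ins so the sketch runs without PyIceberg installed
    class NoSuchTableError(Exception):
        pass

    class TableAlreadyExistsError(Exception):
        pass


class InMemoryCatalog:
    """Hypothetical toy catalog with create_table, load_table and table_exists."""

    def __init__(self):
        self.tables = {}

    def create_table(self, identifier, schema):
        if identifier in self.tables:
            raise TableAlreadyExistsError(identifier)
        self.tables[identifier] = {"schema": schema}
        return self.tables[identifier]

    def load_table(self, identifier):
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        return self.tables[identifier]

    def table_exists(self, identifier):
        return identifier in self.tables


class RacyCatalog(InMemoryCatalog):
    """Simulates another process creating the table right after our check."""

    def table_exists(self, identifier):
        exists = super().table_exists(identifier)
        if not exists:
            self.tables[identifier] = {"schema": "created by another process"}
        return exists


def check_then_create(catalog, identifier, schema):
    # Two separate calls: the table can appear between the check and the create
    if catalog.table_exists(identifier):
        return catalog.load_table(identifier)
    return catalog.create_table(identifier, schema=schema)


def create_or_load(catalog, identifier, schema):
    # Let the catalog reject the duplicate, then fall back to loading it
    try:
        return catalog.create_table(identifier, schema=schema)
    except TableAlreadyExistsError:
        return catalog.load_table(identifier)


racy = RacyCatalog()
try:
    check_then_create(racy, "aemo.price", schema="v1")
    race_failed = False
except TableAlreadyExistsError:
    race_failed = True  # the check said "absent", yet the create collided

table = create_or_load(racy, "aemo.price", schema="v1")
```

The try/except variant never trusts an earlier answer about existence; it reacts to whatever the catalog reports at creation time.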

Gowthami03B · 2024-02-15T22:40:09Z

Can I take a stab at the table_exists method proposed here? @Fokko @djouallah @syun64

Fokko · 2024-02-16T11:04:19Z

@Gowthami03B That would be great! 👍

One note here:

> important if we just want to check that a table exists in a namespace. I'd argue this is the same as calling list_tables and checking if the table exists in the returned list, and hence isn't as critical to implement as 'CREATE TABLE IF NOT EXISTS'

I think it would make more sense to do an actual load_table instead of calling list_tables, mostly because there is a discussion on the REST spec about adding pagination. Calling list_tables would then require many consecutive requests to build up the full list, which is not very performant. load_table does fetch the table metadata, but I think that's acceptable.
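The cost difference can be illustrated with a toy model that counts requests. This is a sketch under assumptions: PyIceberg's `list_tables` takes a namespace and returns identifier tuples, and `load_table` raises `NoSuchTableError` for a missing table (stand-in provided if PyIceberg is absent). `CountingCatalog`, the page size of 100 and the two helper functions are hypothetical; pagination was only under discussion for the REST spec.

```python
import math

try:
    from pyiceberg.exceptions import NoSuchTableError
except ImportError:  # stand-in so the sketch runs without PyIceberg installed
    class NoSuchTableError(Exception):
        pass


class CountingCatalog:
    """Hypothetical toy catalog that counts requests, assuming a paginated listing."""

    def __init__(self, identifiers, page_size=100):
        self.tables = {tuple(name.split(".")): {"metadata": name} for name in identifiers}
        self.page_size = page_size
        self.requests = 0

    def list_tables(self, namespace):
        ns = tuple(namespace.split("."))
        matches = [ident for ident in self.tables if ident[:-1] == ns]
        # One request per page needed to build the full list (at least one)
        self.requests += max(1, math.ceil(len(matches) / self.page_size))
        return matches

    def load_table(self, identifier):
        self.requests += 1  # a single request that also returns the metadata
        key = tuple(identifier.split("."))
        if key not in self.tables:
            raise NoSuchTableError(identifier)
        return self.tables[key]


def exists_via_list_tables(catalog, identifier):
    namespace, _, _ = identifier.rpartition(".")
    return tuple(identifier.split(".")) in catalog.list_tables(namespace)


def exists_via_load_table(catalog, identifier):
    try:
        catalog.load_table(identifier)
        return True
    except NoSuchTableError:
        return False


names = [f"aemo.table_{i}" for i in range(1000)]
listing = CountingCatalog(names)
loading = CountingCatalog(names)
```

With 1,000 tables and 100 per page, the listing approach needs 10 requests, while the load approach needs one per lookup regardless of namespace size.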

Thanks @syun64 for summarizing the options.

> An alternative is just to ask users to try and catch TableAlreadyExistsError in their code when calling create_table

I believe this is the most Pythonic way of doing it, but I agree that we could mirror SQL's CREATE TABLE IF NOT EXISTS. How about adding this to the Catalog(ABC) itself? That way we don't have to add this logic to each of the implementations:

from pyiceberg.catalog import load_catalog

catalog = load_catalog('default')
catalog.create_table_if_not_exists('schema.table', schema=...)

Thoughts?
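Defining the method once on an abstract base class could look roughly like this. `BaseCatalog` and `DictCatalog` are hypothetical stand-ins for PyIceberg's `Catalog` and a concrete implementation. The exception types are assumed to be PyIceberg's `TableAlreadyExistsError` and `NoSuchTableError`, with stand-ins if PyIceberg is not installed.

```python
from abc import ABC, abstractmethod

try:
    from pyiceberg.exceptions import NoSuchTableError, TableAlreadyExistsError
except ImportError:  # stand-ins so the sketch runs without PyIceberg installed
    class NoSuchTableError(Exception):
        pass

    class TableAlreadyExistsError(Exception):
        pass


class BaseCatalog(ABC):
    """Hypothetical minimal base class, not PyIceberg's actual Catalog."""

    @abstractmethod
    def create_table(self, identifier, schema):
        """Create a table or raise TableAlreadyExistsError."""

    @abstractmethod
    def load_table(self, identifier):
        """Return a table or raise NoSuchTableError."""

    def create_table_if_not_exists(self, identifier, schema):
        # Defined once here, so every concrete catalog inherits it
        try:
            return self.create_table(identifier, schema=schema)
        except TableAlreadyExistsError:
            return self.load_table(identifier)


class DictCatalog(BaseCatalog):
    """Hypothetical implementation: only the two abstract methods are written."""

    def __init__(self):
        self.tables = {}

    def create_table(self, identifier, schema):
        if identifier in self.tables:
            raise TableAlreadyExistsError(identifier)
        self.tables[identifier] = {"schema": schema}
        return self.tables[identifier]

    def load_table(self, identifier):
        if identifier not in self.tables:
            raise NoSuchTableError(identifier)
        return self.tables[identifier]


catalog = DictCatalog()
created = catalog.create_table_if_not_exists("schema.table", schema="v1")
reused = catalog.create_table_if_not_exists("schema.table", schema="v2")
```

The second call leaves the existing table untouched and returns it, which matches the "leave it as it is" behavior requested earlier in the thread.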

hussein-awala mentioned this issue on Feb 12, 2024: Feat: Implement create_table_if_not_exists #415 (Merged)

sungwy assigned Gowthami03B on Feb 15, 2024

Fokko closed this as completed in #415 on Feb 20, 2024

Fokko mentioned this issue on Mar 9, 2024: Add table_exists method to the Catalog #507 (Closed)

Gowthami03B removed their assignment on Mar 11, 2024

kevinjqliu mentioned this issue on May 14, 2024: PyIceberg Near-Term Roadmap #736 (Open, 39 tasks)
