Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(snowflake): add blog post showing insertion into snowflake from postgres #8426

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Feb 22, 2024

Add a blog post showing data movement from postgres -> snowflake

@cpcloud cpcloud added this to the 9.0 milestone Feb 22, 2024
@cpcloud cpcloud added docs Documentation related issues or PRs postgres The PostgreSQL backend snowflake The Snowflake backend labels Feb 22, 2024
@cpcloud
Copy link
Member Author

cpcloud commented Feb 22, 2024

cc @IndexSeek

@cpcloud cpcloud added the docs-preview Add this label to trigger a docs preview label Feb 22, 2024
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Feb 22, 2024
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Feb 22, 2024

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nits, nothing blocking.

docs/posts/into-snowflake/index.qmd Outdated Show resolved Hide resolved
docs/posts/into-snowflake/index.qmd Outdated Show resolved Hide resolved
docs/posts/into-snowflake/index.qmd Outdated Show resolved Hide resolved
Copy link
Contributor

@ncclementi ncclementi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small comments.

docs/posts/into-snowflake/index.qmd Outdated Show resolved Hide resolved
@cpcloud cpcloud added the docs-preview Add this label to trigger a docs preview label Feb 22, 2024
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Feb 22, 2024
Copy link
Contributor

@ncclementi ncclementi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Feb 22, 2024

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! @cpcloud I'll leave merging this to you in case you'd like to wait for @IndexSeek to chime in.

@sfc-gh-twhite
Copy link

This looks awesome! The one thing I think we could do to help with the column casing requirements of Snowflake would be to use the rename method to apply an ALL_CAPS operation so that it plays nicer with Snowflake.

https://docs.snowflake.com/en/sql-reference/identifiers-syntax#migrating-from-databases-that-treat-double-quoted-identifiers-as-case-insensitive

Maybe somewhere in the section:

We can compute the average RBI per year per team and relabel the columns as Snowflake would resolve them with quoted identifiers.

pg_expr = pg_batting.group_by(("year_id", "team_id")).agg(avg_rbi=_.rbi.mean()).rename("ALL_CAPS")

This would change the outputs in the remaining rich tables.

This can often cause a lot of challenges for people that load with case-insensitive columns. Being able to fix it all like that is so nice. I think that would really drive it home and enable folks to use it!

@cpcloud
Copy link
Member Author

cpcloud commented Feb 23, 2024

@sfc-gh-twhite Happy to adjust!

@IndexSeek
Copy link
Member

@sfc-gh-twhite Happy to adjust!

I think it will be good to go with that and very complete in that case. 👍

I'm fooling around with Quarto, I had heard of it before, I have to say, it's pretty cool!

@lostmygithubaccount
Copy link
Member

is there any rush to merge this in? if so, whenever is fine, but if not might want to wait until early next week to avoid blog saturation and posting on a Friday. or merge today/tomorrow but promote next week

@lostmygithubaccount lostmygithubaccount added the docs-preview Add this label to trigger a docs preview label Feb 23, 2024
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Feb 23, 2024
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Feb 23, 2024

Copy link
Member

@lostmygithubaccount lostmygithubaccount left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, great stuff!

Comment on lines +25 to +26
We'll connect to a postgres database running locally in a container. You
should be able to swap in your own connection details as needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be worth mentioning just up postgres if they want to try this themselves? of course in practice they'll have their own database somewhere

Comment on lines 60 to 61
1. Set the `SNOWFLAKE_URL` environment variable to your Snowflake connection
string, which will look like `snowflake://user:pass@account/database/schema?warehouse=my_warehouse`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't really matter but looks like an extra space on the second line

also, given how long this code tip is, perhaps consider a callout block above?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it's more that this doesn't look good when hovered over -- the long code tip below is fine -- because of the long URI here you have to scroll right while hovering over

image

```{python}
pg_batting = pg_con.create_table(
"batting",
ibis.examples.Batting.fetch().to_pandas(), # <1>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pandas instead of pyarrow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psycopg2 doesn't seem to be buying what we're selling with to_pyarrow() :)

I'll open an issue about it.


### Insert the computed results into Snowflake

Because all of our backends implement the `to_pyarrow()` method, we can

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "of our" -> "Ibis"


## Conclusion

In this post we show how easy it is to move data from one backend into Snowflake using Ibis.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the blog post because I think this is an important use case to move data from transactional data stores into an OLAP system. The blog provides an effective way for the batch move. There is a subset of users who will need to keep the OLTP and OLAP in sync with minimal latency, which will require CDC support (we don't have the support yet). It'll be a good idea to allude to this capability and get feedback from the community for this need. Here is a good read to keep MySQL and Iceberg in sync. https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/build-real-time-data-lake-tutorial.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear what you're saying, and I do think this particular need is worth soliciting feedback about but I'd rather not bring in CDC to this blog post. The post is meant to be very focused on a specific use case and show case the simplicity of movement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completely reasonable

@cpcloud cpcloud added the docs-preview Add this label to trigger a docs preview label Feb 23, 2024
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Feb 23, 2024
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Feb 23, 2024

@cpcloud cpcloud enabled auto-merge (squash) March 6, 2024 14:11
@cpcloud cpcloud merged commit 3a8c7cc into ibis-project:main Mar 6, 2024
15 checks passed
@cpcloud cpcloud deleted the into-snowflake branch March 6, 2024 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blog Posts for the Ibis blog. docs Documentation related issues or PRs postgres The PostgreSQL backend snowflake The Snowflake backend
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

7 participants