docs(snowflake): add blog post showing insertion into snowflake from postgres #8426

cpcloud · 2024-02-22T16:22:25Z

Add a blog post showing data movement from postgres -> snowflake

cpcloud · 2024-02-22T16:22:58Z

cc @IndexSeek

ibis-docs-bot · 2024-02-22T16:44:28Z

Docs preview: https://pr-8426-16684618c287f22b74b0e721e12e4e4545021c10--ibis-quarto.netlify.app

gforsyth

small nits, nothing blocking.

docs/posts/into-snowflake/index.qmd

ncclementi

Just a couple of small comments.

docs/posts/into-snowflake/index.qmd

ncclementi

LGTM! 🚀

ibis-docs-bot · 2024-02-22T17:42:05Z

Docs preview: https://pr-8426-10d861844f71bb8aa23c7a8a963216c3c81bafef--ibis-quarto.netlify.app

gforsyth

Looks good! @cpcloud I'll leave merging this to you in case you'd like to wait for @IndexSeek to chime in.

sfc-gh-twhite · 2024-02-22T21:41:13Z

This looks awesome! The one thing I think we could do to help with the column casing requirements of Snowflake would be to use the rename method to apply an ALL_CAPS operation so that it plays nicer with Snowflake.

https://docs.snowflake.com/en/sql-reference/identifiers-syntax#migrating-from-databases-that-treat-double-quoted-identifiers-as-case-insensitive

Maybe somewhere in the section:

We can compute the average RBI per year per team and relabel the columns as Snowflake would resolve them with quoted identifiers.

pg_expr = pg_batting.group_by(("year_id", "team_id")).agg(avg_rbi=_.rbi.mean()).rename("ALL_CAPS")

This would change the outputs in the remaining rich tables.

This can often cause a lot of challenges for people that load with case-insensitive columns. Being able to fix it all like that is so nice. I think that would really drive it home and enable folks to use it!
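To make the casing issue concrete, here is a minimal sketch in plain Python (it is not Snowflake or Ibis code). It assumes the identifier rules from the Snowflake docs linked above: unquoted identifiers are folded to upper case, while double-quoted identifiers keep their case. The helper names `resolve_identifier` and `all_caps` are hypothetical, and `all_caps` is only a stand-in for what the suggested `rename("ALL_CAPS")` step is expected to do to column names.

```python
def resolve_identifier(identifier: str) -> str:
    """Return the name that would be looked up for an identifier in SQL text.

    Assumption: unquoted identifiers fold to upper case, and double-quoted
    identifiers are taken verbatim (case-sensitive).
    """
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        # Strip the surrounding quotes and unescape doubled quotes.
        return identifier[1:-1].replace('""', '"')
    return identifier.upper()


def all_caps(columns: list[str]) -> list[str]:
    """Hypothetical stand-in for the renaming step: upper-case every name."""
    return [name.upper() for name in columns]


# Column names as they would be stored if loaded with quoted lower-case names.
stored_lower = ["year_id", "team_id", "avg_rbi"]
# The same columns after the upper-casing step.
stored_upper = all_caps(stored_lower)

# A hand-written query such as `SELECT avg_rbi FROM t` uses an unquoted name.
wanted = resolve_identifier("avg_rbi")

found_in_lower = wanted in stored_lower  # unquoted lookup misses lower-case columns
found_in_upper = wanted in stored_upper  # and finds the upper-cased ones
```

With lower-case stored names, every column reference in hand-written SQL would need double quotes; after upper-casing, unquoted references resolve as expected.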

cpcloud · 2024-02-23T00:11:42Z

@sfc-gh-twhite Happy to adjust!

IndexSeek · 2024-02-23T00:22:08Z

> @sfc-gh-twhite Happy to adjust!

I think it will be good to go with that and very complete in that case. 👍

I'm fooling around with Quarto. I had heard of it before, and I have to say, it's pretty cool!

lostmygithubaccount · 2024-02-23T00:42:19Z

is there any rush to merge this in? if so, whenever is fine, but if not might want to wait until early next week to avoid blog saturation and posting on a Friday. or merge today/tomorrow but promote next week

ibis-docs-bot · 2024-02-23T00:56:45Z

Docs preview: https://pr-8426-a83c0b5e96d47ddaea9d8f98df6ca2431738d3bc--ibis-quarto.netlify.app

lostmygithubaccount

minor comments, great stuff!

lostmygithubaccount · 2024-02-23T01:05:51Z

docs/posts/into-snowflake/index.qmd

+We'll connect to a postgres database running locally in a container. You
+should be able to swap in your own connection details as needed.


would it be worth mentioning `just up postgres` if they want to try this themselves? of course in practice they'll have their own database somewhere

lostmygithubaccount · 2024-02-23T01:06:57Z

docs/posts/into-snowflake/index.qmd

+1. Set the `SNOWFLAKE_URL` environment variable to your Snowflake connection
+   string, which will look like `snowflake://user:pass@account/database/schema?warehouse=my_warehouse`.


doesn't really matter but looks like an extra space on the second line

also, given how long this code tip is, perhaps consider a callout block above?

actually it's more that this doesn't look good when hovered over -- the long code tip below is fine -- because of the long URI here you have to scroll right while hovering over
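For anyone checking their own connection string, a small sketch using only the Python standard library can split the URL into the parts named in the tip. The function name `describe_snowflake_url` and the returned keys are hypothetical; the URL shape is the one shown in the post, and this is not meant to mirror how Ibis itself parses it.

```python
import os
from urllib.parse import parse_qs, urlsplit


def describe_snowflake_url(url: str) -> dict:
    """Split a `snowflake://` URL into its named parts (password omitted)."""
    parts = urlsplit(url)
    # The path is expected to look like "/database/schema".
    database, _, schema = parts.path.lstrip("/").partition("/")
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "account": parts.hostname,  # note: urlsplit lower-cases the host
        "database": database,
        "schema": schema,
        "warehouse": query.get("warehouse", [None])[0],
    }


# Placeholder URL copied from the post; fall back to it if the variable is unset.
example = "snowflake://user:pass@account/database/schema?warehouse=my_warehouse"
info = describe_snowflake_url(os.environ.get("SNOWFLAKE_URL", example))
```

Printing `info` before connecting is a quick way to confirm that the database, schema, and warehouse are the ones intended, without echoing the password.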

lostmygithubaccount · 2024-02-23T01:07:13Z

docs/posts/into-snowflake/index.qmd

+```{python}
+pg_batting = pg_con.create_table(
+    "batting",
+    ibis.examples.Batting.fetch().to_pandas(),  # <1>


why pandas instead of pyarrow?

psycopg2 doesn't seem to be buying what we're selling with to_pyarrow() :)

I'll open an issue about it.

lostmygithubaccount · 2024-02-23T01:08:12Z

docs/posts/into-snowflake/index.qmd

+
+### Insert the computed results into Snowflake
+
+Because all of our backends implement the `to_pyarrow()` method, we can


nit: "of our" -> "Ibis"

zhenzhongxu · 2024-02-23T06:35:40Z

docs/posts/into-snowflake/index.qmd

+
+## Conclusion
+
+In this post we show how easy it is to move data from one backend into Snowflake using Ibis.


I love the blog post: moving data from transactional data stores into an OLAP system is an important use case, and the post shows an effective way to do it as a batch move. A subset of users will need to keep the OLTP and OLAP systems in sync with minimal latency, which requires CDC support (we don't have that yet). It would be a good idea to allude to this capability and get feedback from the community on the need. Here is a good read on keeping MySQL and Iceberg in sync: https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/build-real-time-data-lake-tutorial.html

I hear what you're saying, and I do think this particular need is worth soliciting feedback about, but I'd rather not bring CDC into this blog post. The post is meant to be very focused on a specific use case and to showcase the simplicity of movement.

completely reasonable

ibis-docs-bot · 2024-02-23T16:03:30Z

Docs preview: https://pr-8426-78a2b95cd13979225e58842b07488da33f42a71d--ibis-quarto.netlify.app

cpcloud added this to the 9.0 milestone Feb 22, 2024

cpcloud added the docs, postgres, and snowflake labels Feb 22, 2024

cpcloud requested a review from lostmygithubaccount February 22, 2024 16:22

cpcloud force-pushed the into-snowflake branch from 845a058 to 1668461 Compare February 22, 2024 16:25

cpcloud added the docs-preview label Feb 22, 2024

ibis-docs-bot bot removed the docs-preview label Feb 22, 2024

gforsyth reviewed Feb 22, 2024

cpcloud force-pushed the into-snowflake branch from 9ff31d8 to c168505 Compare February 22, 2024 17:10

ncclementi reviewed Feb 22, 2024

ncclementi reviewed Feb 22, 2024

cpcloud requested review from gforsyth and ncclementi February 22, 2024 17:23

cpcloud added the docs-preview label Feb 22, 2024

ibis-docs-bot bot removed the docs-preview label Feb 22, 2024

ncclementi approved these changes Feb 22, 2024

gforsyth approved these changes Feb 22, 2024

cpcloud force-pushed the into-snowflake branch from 10d8618 to a83c0b5 Compare February 23, 2024 00:17

lostmygithubaccount added the docs-preview label Feb 23, 2024

ibis-docs-bot bot removed the docs-preview label Feb 23, 2024

lostmygithubaccount approved these changes Feb 23, 2024

zhenzhongxu reviewed Feb 23, 2024

cpcloud force-pushed the into-snowflake branch from a83c0b5 to 78a2b95 Compare February 23, 2024 15:46

cpcloud added the docs-preview label Feb 23, 2024

ibis-docs-bot bot removed the docs-preview label Feb 23, 2024

lostmygithubaccount mentioned this pull request Feb 25, 2024

feat(duckdb/pyspark): load data from duckdb into pyspark #8440

Closed

1 task

lostmygithubaccount added the blog label Feb 26, 2024

lostmygithubaccount requested changes Mar 6, 2024

docs(snowflake): add blog post showing insertion into snowflake from postgres (b59f63b)

cpcloud force-pushed the into-snowflake branch from 78a2b95 to b59f63b Compare March 6, 2024 14:04

cpcloud requested a review from lostmygithubaccount March 6, 2024 14:11

cpcloud enabled auto-merge (squash) March 6, 2024 14:11

lostmygithubaccount approved these changes Mar 6, 2024

cpcloud merged commit 3a8c7cc into ibis-project:main Mar 6, 2024
15 checks passed

cpcloud deleted the into-snowflake branch March 6, 2024 14:18
