docs(snowflake): add blog post showing insertion into snowflake from postgres #8426
Conversation
cc @IndexSeek
Force-pushed from 845a058 to 1668461
small nits, nothing blocking.
Force-pushed from 9ff31d8 to c168505
Just a couple of small comments.
LGTM! 🚀
Looks good! @cpcloud I'll leave merging this to you in case you'd like to wait for @IndexSeek to chime in.
This looks awesome! The one thing I think we could do to help with the column casing requirements of Snowflake would be to use the … maybe somewhere in the section:
This would change the outputs in the remaining rich tables. Casing can often cause a lot of challenges for people who load with case-insensitive columns. Being able to fix it all like that is so nice. I think that would really drive it home and enable folks to use it!
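If the suggestion is about renaming columns to Snowflake's uppercase convention, a minimal sketch of one way to do it with Ibis follows; the `"ALL_CAPS"` format string and the in-memory table are assumptions for illustration, not taken from the post:

```python
import ibis

# Hypothetical table standing in for the post's batting data.
t = ibis.memtable({"player_id": ["aardsda01"], "year_id": [2004]})

# Bulk-rename every column to upper case so case-insensitive
# Snowflake loads line up; "ALL_CAPS" is an Ibis rename format string.
t_upper = t.rename("ALL_CAPS")
```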
@sfc-gh-twhite Happy to adjust!
Force-pushed from 10d8618 to a83c0b5
I think it will be good to go with that, and very complete in that case. 👍 I'm fooling around with Quarto; I'd heard of it before, and I have to say, it's pretty cool!
Is there any rush to merge this in? If so, whenever is fine, but if not we might want to wait until early next week to avoid blog saturation and posting on a Friday. Or merge today/tomorrow but promote next week.
minor comments, great stuff!
We'll connect to a postgres database running locally in a container. You should be able to swap in your own connection details as needed.
Would it be worth mentioning `just up postgres` if they want to try this themselves? Of course, in practice they'll have their own database somewhere.
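For readers trying this locally, a connection sketch; the host, port, and credentials below are assumptions matching a typical local container, not the post's actual settings:

```python
import ibis

# Connect to a Postgres instance running locally in a container,
# e.g. one brought up with `just up postgres` in the Ibis repo.
# Swap in your own host, credentials, and database name.
pg_con = ibis.postgres.connect(
    host="localhost",
    port=5432,
    user="postgres",
    password="postgres",
    database="postgres",
)
```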
docs/posts/into-snowflake/index.qmd
Outdated
1. Set the `SNOWFLAKE_URL` environment variable to your Snowflake connection string, which will look like `snowflake://user:pass@account/database/schema?warehouse=my_warehouse`.
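For reference, the placeholder connection string breaks down into parts that can be inspected with the standard library before handing the full string to `ibis.connect`; the values below are the docs' placeholders, not a real account:

```python
from urllib.parse import parse_qs, urlparse

# Placeholder Snowflake connection string from the docs.
url = "snowflake://user:pass@account/database/schema?warehouse=my_warehouse"

parts = urlparse(url)
print(parts.scheme)           # snowflake
print(parts.hostname)         # account
print(parts.path)             # /database/schema
print(parse_qs(parts.query))  # {'warehouse': ['my_warehouse']}
```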
Doesn't really matter, but it looks like there's an extra space on the second line.
Also, given how long this code tip is, perhaps consider a callout block above?
```{python}
pg_batting = pg_con.create_table(
    "batting",
    ibis.examples.Batting.fetch().to_pandas(),  # <1>
)
```
why pandas instead of pyarrow?
psycopg2 doesn't seem to be buying what we're selling with `to_pyarrow()` :)
I'll open an issue about it.
docs/posts/into-snowflake/index.qmd
Outdated
### Insert the computed results into Snowflake

Because all of our backends implement the `to_pyarrow()` method, we can
nit: "of our" -> "Ibis"
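A sketch of the pattern the quoted sentence describes; the `sf_con` and `pg_batting` names are assumed from the post's earlier sections, not shown here:

```python
# Pull the computed result out of Postgres as a pyarrow Table
# and hand it straight to the Snowflake backend.
batting_arrow = pg_batting.to_pyarrow()
sf_batting = sf_con.create_table("batting", batting_arrow)
```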
## Conclusion

In this post we show how easy it is to move data from one backend into Snowflake using Ibis.
I love the blog post, because I think moving data from transactional data stores into an OLAP system is an important use case, and the post provides an effective way to do the batch move. There is a subset of users who will need to keep the OLTP and OLAP systems in sync with minimal latency, which will require CDC support (we don't have that support yet). It'd be a good idea to allude to this capability and get feedback from the community about the need. Here is a good read on keeping MySQL and Iceberg in sync: https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/build-real-time-data-lake-tutorial.html
I hear what you're saying, and I do think this particular need is worth soliciting feedback about, but I'd rather not bring CDC into this blog post. The post is meant to be very focused on a specific use case and showcase the simplicity of movement.
completely reasonable
Force-pushed from a83c0b5 to 78a2b95
Add a blog post showing data movement from postgres -> snowflake