docs: a user/organization testimonials or similar page #7341
Comments
cc-ing a few folks that might be interested in contributing
Sure. Give me a template to fill in and guidelines on length, etc., and I'll fill it in.
Most of my work recently has been in the context of using Ibis to build the BigQuery DataFrames interface. Happy to highlight that work if you feel it'd be useful, though it's quite a bit different from the typical target user, I'd think.
My story is PySpark -> trying a bunch of stuff -> Ibis, which offers lazy computation.

Our company has implemented an OLAP platform with its persistence layer on HDFS and Presto as the query engine. The platform is geared towards agile analysis, and its table structure is based on an event-driven model. As we delve deeper into machine learning modeling, we often need to transform this event-based structure into wide tables for feature construction.

Between 2019 and 2020, I worked on a similar OLAP platform during my tenure at Tencent. I developed some generic analysis model tools, and at that time the query engine was Impala. My approach was to concatenate SQL strings dynamically, which made code encapsulation, modularization, and future maintenance difficult.

To get better encapsulation and to decouple different parts of the logic, I was initially inclined to use PySpark. However, when PySpark connects to Presto via JDBC and we use the dataframe interface, the aggregation operations run on Spark. This doesn't harness the full power of Presto, and performance is slow. If we use Spark's SQL interface instead, the aggregation is processed on Presto, but then we lose the original reason for using Spark: better code encapsulation and decoupling of different processes.

The dataframe interface of Ibis and its lazy computation align perfectly with my needs. In fact, back in 2019 I was on the hunt for such a tool. Sadly, I didn't come across Ibis at that time, and I even considered building one myself.
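To make the "lazy computation" point concrete, here is a minimal, stdlib-only sketch of the idea. It is not Ibis's API: `LazyQuery`, `count_events`, the `events` table, and its columns are all hypothetical names for illustration, and SQLite stands in for Presto. The chained calls only record what to compute; the event-to-wide-table aggregation runs inside the engine once `execute()` is called.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical toy expression: records steps, runs nothing until execute()."""

    table: str
    key: Optional[str] = None
    features: Tuple[Tuple[str, str], ...] = ()  # (output column, event type)

    def group_by(self, key):
        # Returns a new expression; no query is sent to the engine here.
        return LazyQuery(self.table, key, self.features)

    def count_events(self, name, event_type):
        # Adds one wide-table column: the number of events of a given type.
        return LazyQuery(self.table, self.key, self.features + ((name, event_type),))

    def to_sql(self):
        if self.key is None:
            raise ValueError("call group_by() before compiling")
        # Identifiers are trusted in this sketch; event types are bound as parameters.
        cols = [self.key] + [
            f"SUM(CASE WHEN event_type = ? THEN 1 ELSE 0 END) AS {name}"
            for name, _ in self.features
        ]
        sql = (
            f"SELECT {', '.join(cols)} FROM {self.table} "
            f"GROUP BY {self.key} ORDER BY {self.key}"
        )
        params = [event_type for _, event_type in self.features]
        return sql, params

    def execute(self, conn):
        # Only here does the engine do any work, including the aggregation.
        sql, params = self.to_sql()
        return conn.execute(sql, params).fetchall()


# Stand-in engine with an event-style table (one row per event).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "click"), (1, "purchase"), (2, "click")],
)

# Build the wide feature table step by step; each step is reusable and composable.
features = (
    LazyQuery("events")
    .group_by("user_id")
    .count_events("n_clicks", "click")
    .count_events("n_purchases", "purchase")
)

sql, params = features.to_sql()
rows = features.execute(conn)
print(sql)
print(rows)
```

The logic stays encapsulated in Python objects rather than hand-concatenated SQL strings, while the aggregation is still pushed down to the query engine.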
fyi have opened a PR for this: #7897
## Description of changes

Add a user testimonials page. Three recent examples; we can add more.

## Issues closed

Closes #7341
Please describe the issue
On the front page, on a new page, or somewhere else, consider adding user testimonials from GitHub and other places in the community explaining why they've chosen Ibis. Many have a similar story: pandas -> trying a bunch of stuff -> Ibis, and being happy with it.