Skip to content

Commit

Permalink
docs(blog): adds to file sneak peek blog
Browse files Browse the repository at this point in the history
  • Loading branch information
ksuarez1423 authored and cpcloud committed Mar 9, 2023
1 parent 5a8ffe9 commit 128194f
Show file tree
Hide file tree
Showing 2 changed files with 57 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
* [Backend Operations Matrix](backends/support_matrix.md)
* [Releases](release_notes.md)
* Blog
* [Ibis Sneak Peek: Writing to Files](blog/ibis-to-file.md)
* [Ibis Sneak Peek: Examples](blog/ibis-examples.md)
* [Maximizing Productivity with Selectors](blog/selectors.md)
* [Ibis + DuckDB + Substrait](blog/ibis_substrait_to_duckdb.md)
Expand Down
56 changes: 56 additions & 0 deletions docs/blog/ibis-to-file.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Ibis Sneak Peek: Writing to Files

**by Kae Suarez**

Ibis 5.0 is coming soon and will offer new functionality and fixes to users. To enhance clarity around this process, we’re sharing a sneak peek into what we’re working on.

In Ibis 4.0, we added the ability to read CSVs and Parquet via the Ibis interface. We felt this was important because, well, the ability to read files is simply necessary, be it on a local scale, legacy data, data not yet in a database, and so on. However, for a user, the natural next question was “can I go ahead and write when I’m done?” The answer was no. We didn’t like that, especially since we do care about file-based use cases.

So, we’ve gone ahead and fixed that for Ibis 5.0.

## Files in, Files out

Before we can write a file, we need data — so let’s read in a file, to start this off:

```python
t = ibis.read_csv(
"https://storage.googleapis.com/ibis-examples/data/penguins.csv.gz"
)
```

Of course, we could just write out, but let’s do an operation first — how about using selectors, which you can read more about [here](https://ibis-project.org/blog/selectors/)? Self-promotion aside, here’s an operation:

```python
expr = (
t.group_by("species")
.mutate(s.across(s.numeric() & ~s.c("year"), (_ - _.mean()) / _.std()))
)
```

Now, finally, time to do the exciting part:

```python
expr.to_parquet("normalized.parquet")
```

Like many things in Ibis, this is as simple and plain-looking as it is important. Being able to create files from Ibis instead of redirecting into other libraries first enables operation at larger scales and fewer steps. Where desired, you can address a backend directly to use its native export functionality — we want to make sure you have the flexibility to use Ibis or the backend as you see fit.

## Wrapping Up

Ibis is an interface tool for analytical engines that can reach scales far beyond a laptop. Files are important to Ibis because:

- Ibis also supports local execution, where files are the standard unit of data — we want to support all our users.
- Files are useful for moving between platforms, and long-term storage that isn’t tied to a particular backend.
- Files can move more easily between our backends than database files, so we think this adds some convenience for the multi-backend use case.

We’re excited to release this functionality in Ibis 5.0.

Interested in Ibis? Docs are available on this very website, at:

- [Ibis Docs](https://ibis-project.org/)

and the repo is always at:

- [Ibis GitHub](https://github.com/ibis-project/ibis)

Please feel free to reach out on GitHub!

0 comments on commit 128194f

Please sign in to comment.