From 1527655b9cbf455ecc82757d19fa78184415db3a Mon Sep 17 00:00:00 2001 From: Phillip Cloud <417981+cpcloud@users.noreply.github.com> Date: Thu, 31 Aug 2023 09:39:09 -0400 Subject: [PATCH] docs(blog): add tags to blog posts --- .../posts/ibis-examples/index/execute-results/html.json | 4 ++-- .../posts/selectors/index/execute-results/html.json | 8 +++++--- docs/posts/campaign-finance/index.qmd | 4 ++++ docs/posts/ci-analysis/index.qmd | 4 ++++ docs/posts/ffill-and-bfill-using-ibis/index.qmd | 2 ++ docs/posts/ibis-analytics/index.qmd | 4 +++- docs/posts/ibis-examples/index.qmd | 2 ++ docs/posts/ibis-to-file/index.qmd | 3 +++ docs/posts/ibis_substrait_to_duckdb/index.qmd | 3 +++ docs/posts/selectors/index.qmd | 3 +++ docs/posts/torch/index.qmd | 4 ++++ 11 files changed, 35 insertions(+), 6 deletions(-) diff --git a/docs/_freeze/posts/ibis-examples/index/execute-results/html.json b/docs/_freeze/posts/ibis-examples/index/execute-results/html.json index 7c7a6dce63c0..7f240b300b33 100644 --- a/docs/_freeze/posts/ibis-examples/index/execute-results/html.json +++ b/docs/_freeze/posts/ibis-examples/index/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "8b6f5258426c33dcdd642ed7e29b303b", + "hash": "bf567943230271e22d981edfc997116e", "result": { - "markdown": "---\ntitle: \"Ibis sneak peek: examples\"\nauthor: Kae Suarez\ndate: 2023-03-08\ncategories:\n - blog\n---\n\nIbis has been moving quickly to provide a powerful but easy-to-use interface for interacting with analytical engines. However, as we’re approaching the 5.0 release of Ibis, we’ve realized that moving from not knowing Ibis to writing a first expression is not trivial.\n\nAs is, in our tutorial structure, work must be done on the user’s part — though we do provide the commands — to download a SQLite database onto disk, which can only be used with said backend. We feel that this put too much emphasis on a single backend, and added too much effort into picking the right backend for the first tutorial. 
We want minimal steps between users and learning the Ibis API.\n\nThis is why we’ve added the `ibis.examples` module.\n\n## Getting Started with Examples\n\nThis module offers in-Ibis access to multiple small tables (the largest is around only 30k rows), which are downloaded when requested and immediately read into the backend upon completion. We worked to keep pulling in examples simple, so it looks like this:\n\n::: {#eec00f31 .cell execution_count=1}\n``` {.python .cell-code}\nimport ibis\nimport ibis.examples as ex\n\nibis.options.interactive = True\n\nt = ex.penguins.fetch()\nt\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.118.71813750male  2007 │\n│ Adelie Torgersen39.517.41863800female2007 │\n│ Adelie Torgersen40.318.01953250female2007 │\n│ Adelie TorgersennannanNULLNULLNULL2007 │\n│ Adelie Torgersen36.719.31933450female2007 │\n│ Adelie Torgersen39.320.61903650male  2007 │\n│ Adelie Torgersen38.917.81813625female2007 │\n│ Adelie Torgersen39.219.61954675male  2007 │\n│ Adelie Torgersen34.118.11933475NULL2007 │\n│ Adelie Torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nAnother advantage of this new method is that we were able to register all of them so you can tab-complete, as you can see here:\n\n![Tab Complete](./tab_complete.png)\n\nOnce you’ve retrieved an example table, you can get straight to learning and experimenting, instead of struggling with just getting the data itself.\n\nIn the future, our tutorials will use the _examples_ module to to help speed up learning of the Ibis framework.\n\nInterested in Ibis? Docs are available on this very website, at:\n\n- [Ibis Docs](https://ibis-project.org/)\n\nand the repo is always at:\n\n- [Ibis GitHub](https://github.com/ibis-project/ibis)\n\nPlease feel free to reach out on GitHub!\n\n", + "markdown": "---\ntitle: \"Ibis sneak peek: examples\"\nauthor: Kae Suarez\ndate: 2023-03-08\ncategories:\n - blog\n - new feature\n - sneak peek\n---\n\nIbis has been moving quickly to provide a powerful but easy-to-use interface for interacting with analytical engines. However, as we’re approaching the 5.0 release of Ibis, we’ve realized that moving from not knowing Ibis to writing a first expression is not trivial.\n\nAs is, in our tutorial structure, work must be done on the user’s part — though we do provide the commands — to download a SQLite database onto disk, which can only be used with said backend. We feel that this put too much emphasis on a single backend, and added too much effort into picking the right backend for the first tutorial. We want minimal steps between users and learning the Ibis API.\n\nThis is why we’ve added the `ibis.examples` module.\n\n## Getting Started with Examples\n\nThis module offers in-Ibis access to multiple small tables (the largest is around only 30k rows), which are downloaded when requested and immediately read into the backend upon completion. 
We worked to keep pulling in examples simple, so it looks like this:\n\n::: {#f1971251 .cell execution_count=1}\n``` {.python .cell-code}\nimport ibis\nimport ibis.examples as ex\n\nibis.options.interactive = True\n\nt = ex.penguins.fetch()\nt\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.118.71813750male  2007 │\n│ Adelie Torgersen39.517.41863800female2007 │\n│ Adelie Torgersen40.318.01953250female2007 │\n│ Adelie TorgersennannanNULLNULLNULL2007 │\n│ Adelie Torgersen36.719.31933450female2007 │\n│ Adelie Torgersen39.320.61903650male  2007 │\n│ Adelie Torgersen38.917.81813625female2007 │\n│ Adelie Torgersen39.219.61954675male  2007 │\n│ Adelie Torgersen34.118.11933475NULL2007 │\n│ Adelie Torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nAnother advantage of this new method is that we were able to register all of them so you can tab-complete, as you can see here:\n\n![Tab Complete](./tab_complete.png)\n\nOnce you’ve retrieved an example table, you can get straight to learning and experimenting, instead of struggling with just getting the data itself.\n\nIn the future, our tutorials will use the _examples_ module to to help speed up learning of the Ibis framework.\n\nInterested in Ibis? Docs are available on this very website, at:\n\n- [Ibis Docs](https://ibis-project.org/)\n\nand the repo is always at:\n\n- [Ibis GitHub](https://github.com/ibis-project/ibis)\n\nPlease feel free to reach out on GitHub!\n\n", "supporting": [ "index_files" ], diff --git a/docs/_freeze/posts/selectors/index/execute-results/html.json b/docs/_freeze/posts/selectors/index/execute-results/html.json index 40446b2874b4..3f88ec9b722d 100644 --- a/docs/_freeze/posts/selectors/index/execute-results/html.json +++ b/docs/_freeze/posts/selectors/index/execute-results/html.json @@ -1,8 +1,10 @@ { - "hash": "912e7803cf9a47bd95646d5c669a05fb", + "hash": "332826e0cfba3cf42ba1c48417205cc4", "result": { - "markdown": "---\ntitle: \"Maximizing productivity with selectors\"\nauthor: Phillip Cloud\ndate: 2023-02-27\ncategories:\n - blog\n---\n\nBefore Ibis 5.0 it's been challenging to concisely express whole-table\noperations with ibis. Happily this is no longer the case in ibis 5.0.\n\nLet's jump right in!\n\nWe'll look at selectors examples using the [`palmerpenguins` data\nset](https://allisonhorst.github.io/palmerpenguins/) with the [DuckDB\nbackend](https://ibis-project.org/backends/DuckDB/).\n\n## Setup\n\n::: {#7ea414d5 .cell execution_count=1}\n``` {.python .cell-code}\nfrom ibis.interactive import *\n\nt = ex.penguins.fetch()\nt\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.118.71813750male  2007 │\n│ Adelie Torgersen39.517.41863800female2007 │\n│ Adelie Torgersen40.318.01953250female2007 │\n│ Adelie TorgersennannanNULLNULLNULL2007 │\n│ Adelie Torgersen36.719.31933450female2007 │\n│ Adelie Torgersen39.320.61903650male  2007 │\n│ Adelie Torgersen38.917.81813625female2007 │\n│ Adelie Torgersen39.219.61954675male  2007 │\n│ Adelie Torgersen34.118.11933475NULL2007 │\n│ Adelie Torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n## Examples\n\n### Normalization\n\nLet's say you want to compute the\n[z-score](https://en.wikipedia.org/wiki/Standard_score) of every numeric column\nand replace the existing data with that normalized value. Here's how you'd do\nthat with selectors:\n\n::: {#98645d71 .cell execution_count=2}\n``` {.python .cell-code}\nt.mutate(s.across(s.numeric(), (_ - _.mean()) / _.std()))\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year      ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩\n│ stringstringfloat64float64float64float64stringfloat64   │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────────┤\n│ Adelie Torgersen-0.8832050.784300-1.416272-0.563317male  -1.257484 │\n│ Adelie Torgersen-0.8099390.126003-1.060696-0.500969female-1.257484 │\n│ Adelie Torgersen-0.6634080.429833-0.420660-1.186793female-1.257484 │\n│ Adelie TorgersennannannannanNULL-1.257484 │\n│ Adelie Torgersen-1.3227991.088129-0.562890-0.937403female-1.257484 │\n│ Adelie Torgersen-0.8465721.746426-0.776236-0.688012male  -1.257484 │\n│ Adelie Torgersen-0.9198370.328556-1.416272-0.719186female-1.257484 │\n│ Adelie Torgersen-0.8648881.240044-0.4206600.590115male  -1.257484 │\n│ Adelie Torgersen-1.7990250.480471-0.562890-0.906229NULL-1.257484 │\n│ Adelie Torgersen-0.3520291.543873-0.7762360.060160NULL-1.257484 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────────┘\n
\n```\n:::\n:::\n\n\n### What's Up With the `year` Column?\n\nWhoops, looks like we included `year` in our normalization because it's an\n`int64` column (and therefore numeric) but normalizing the year doesn't make\nsense.\n\nWe can exclude `year` from the normalization using another selector:\n\n::: {#8beef7e9 .cell execution_count=3}\n``` {.python .cell-code}\nt.mutate(s.across(s.numeric() & ~s.c(\"year\"), (_ - _.mean()) / _.std()))\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen-0.8832050.784300-1.416272-0.563317male  2007 │\n│ Adelie Torgersen-0.8099390.126003-1.060696-0.500969female2007 │\n│ Adelie Torgersen-0.6634080.429833-0.420660-1.186793female2007 │\n│ Adelie TorgersennannannannanNULL2007 │\n│ Adelie Torgersen-1.3227991.088129-0.562890-0.937403female2007 │\n│ Adelie Torgersen-0.8465721.746426-0.776236-0.688012male  2007 │\n│ Adelie Torgersen-0.9198370.328556-1.416272-0.719186female2007 │\n│ Adelie Torgersen-0.8648881.240044-0.4206600.590115male  2007 │\n│ Adelie Torgersen-1.7990250.480471-0.562890-0.906229NULL2007 │\n│ Adelie Torgersen-0.3520291.543873-0.7762360.060160NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n`c` is short for \"column\" and the `~` means \"negate\". Combining those we get \"not the year column\"!\n\nPretty neat right?\n\n### Composable Group By\n\nThe power of this approach comes in when you want the grouped version. Perhaps\nwe think some of these columns vary by species.\n\nWith selectors, all you need to do is slap a `.group_by(\"species\")` onto `t`:\n\n::: {#4a01494a .cell execution_count=4}\n``` {.python .cell-code}\nt.group_by(\"species\").mutate(\n s.across(s.numeric() & ~s.c(\"year\"), (_ - _.mean()) / _.std())\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen0.791697-1.2709970.160007-0.001444female2008 │\n│ Adelie Biscoe   1.467524-0.0381030.9245960.816322male  2009 │\n│ Adelie Dream    0.3786920.619441-0.9104182.070231male  2007 │\n│ Adelie Dream    -0.860324-0.284681-1.216254-1.200835female2007 │\n│ Adelie Torgersen-0.7852320.7838270.465843-0.546622female2007 │\n│ Adelie Torgersen0.1909621.8523350.007089-0.110480male  2007 │\n│ Adelie Torgersen0.040778-0.449067-1.369172-0.164997female2007 │\n│ Adelie Torgersen0.1534161.0304050.7716782.124749male  2007 │\n│ Adelie Torgersen-1.761426-0.2024890.465843-0.492104NULL2007 │\n│ Adelie Torgersen1.2047021.5235630.0070891.197947NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nSince ibis translates this into a run-of-the-mill selection as if you had\ncalled `select` or `mutate` without selectors, nothing special is needed for a\nbackend to work with these new constructs.\n\nLet's look at some more examples.\n\n### Min-max Normalization\n\nGrouped min/max normalization? Easy:\n\n::: {#fa8fb365 .cell execution_count=5}\n``` {.python .cell-code}\nt.group_by(\"species\").mutate(\n s.across(s.numeric() & ~s.c(\"year\"), (_ - _.min()) / (_.max() - _.min()))\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen0.6330940.2166670.5000000.441558female2008 │\n│ Adelie Biscoe   0.7625900.4666670.6315790.636364male  2009 │\n│ Adelie Dream    0.5539570.6000000.3157890.935065male  2007 │\n│ Adelie Dream    0.3165470.4166670.2631580.155844female2007 │\n│ Adelie Torgersen0.3309350.6333330.5526320.311688female2007 │\n│ Adelie Torgersen0.5179860.8500000.4736840.415584male  2007 │\n│ Adelie Torgersen0.4892090.3833330.2368420.402597female2007 │\n│ Adelie Torgersen0.5107910.6833330.6052630.948052male  2007 │\n│ Adelie Torgersen0.1438850.4333330.5526320.324675NULL2007 │\n│ Adelie Torgersen0.7122300.7833330.4736840.727273NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n### Casting and Munging\n\nHow about casting every column whose name ends with any of the strings `\"mm\"`\nor `\"g\"` to a `float32`? No problem!\n\n::: {#3ee1a042 .cell execution_count=6}\n``` {.python .cell-code}\nt.mutate(s.across(s.endswith((\"mm\", \"g\")), _.cast(\"float32\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat32float32float32float32stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.09999818.700001181.03750.0male  2007 │\n│ Adelie Torgersen39.50000017.400000186.03800.0female2007 │\n│ Adelie Torgersen40.29999918.000000195.03250.0female2007 │\n│ Adelie TorgersennannannannanNULL2007 │\n│ Adelie Torgersen36.70000119.299999193.03450.0female2007 │\n│ Adelie Torgersen39.29999920.600000190.03650.0male  2007 │\n│ Adelie Torgersen38.90000217.799999181.03625.0female2007 │\n│ Adelie Torgersen39.20000119.600000195.04675.0male  2007 │\n│ Adelie Torgersen34.09999818.100000193.03475.0NULL2007 │\n│ Adelie Torgersen42.00000020.200001190.04250.0NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nWe can make all string columns have the same case too!\n\n::: {#fe2957b8 .cell execution_count=7}\n``` {.python .cell-code}\nt.mutate(s.across(s.of_type(\"string\"), _.lower()))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ adelie torgersen39.118.71813750male  2007 │\n│ adelie torgersen39.517.41863800female2007 │\n│ adelie torgersen40.318.01953250female2007 │\n│ adelie torgersennannanNULLNULLNULL2007 │\n│ adelie torgersen36.719.31933450female2007 │\n│ adelie torgersen39.320.61903650male  2007 │\n│ adelie torgersen38.917.81813625female2007 │\n│ adelie torgersen39.219.61954675male  2007 │\n│ adelie torgersen34.118.11933475NULL2007 │\n│ adelie torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n### Multiple Computations per Column\n\nWhat if I want to compute multiple things? Heck yeah!\n\n::: {#c6a0de34 .cell execution_count=8}\n``` {.python .cell-code}\nt.group_by(\"sex\").mutate(\n s.across(\n s.numeric() & ~s.c(\"year\"),\n dict(centered=_ - _.mean(), zscore=(_ - _.mean()) / _.std()),\n )\n).select(\"sex\", s.endswith((\"_centered\", \"_zscore\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓\n┃ sex     bill_length_mm_centered  bill_depth_mm_centered  flipper_length_mm_centered  body_mass_g_centered  bill_length_mm_zscore  bill_depth_mm_zscore  flipper_length_mm_zscore  body_mass_g_zscore ┃\n┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩\n│ stringfloat64float64float64float64float64float64float64float64            │\n├────────┼─────────────────────────┼────────────────────────┼────────────────────────────┼──────────────────────┼───────────────────────┼──────────────────────┼──────────────────────────┼────────────────────┤\n│ female4.10303-1.92545511.636364937.7272730.836760-1.0722700.9308511.407635 │\n│ female1.20303-2.42545510.636364712.7272730.245342-1.3507160.8508561.069885 │\n│ female-8.096970.674545-12.363636-462.272727-1.6512710.375649-0.989030-0.693924 │\n│ female-5.896970.874545-10.363636-562.272727-1.2026100.487027-0.829039-0.844035 │\n│ female-0.996971.174545-15.363636-662.272727-0.2033190.654095-1.229015-0.994147 │\n│ female-5.496971.374545-12.363636-162.272727-1.1210350.765473-0.989030-0.243590 │\n│ female-3.396972.574545-2.363636-412.272727-0.6927681.433743-0.189079-0.618868 │\n│ female-7.696971.974545-13.363636-537.272727-1.5696971.099608-1.069025-0.806507 │\n│ female-4.296971.874545-23.363636-462.272727-0.8763111.043919-1.868975-0.693924 │\n│ female-6.196972.774545-8.363636-62.272727-1.2637911.545122-0.669049-0.093478 │\n│  │\n└────────┴─────────────────────────┴────────────────────────┴────────────────────────────┴──────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────┘\n
\n```\n:::\n:::\n\n\nDon't like the naming convention?\n\nPass a function to make your own name!\n\n::: {#1b283810 .cell execution_count=9}\n``` {.python .cell-code}\nt.select(s.startswith(\"bill\")).mutate(\n s.across(\n s.all(),\n dict(x=_ - _.mean(), y=_.max()),\n names=lambda col, fn: f\"{col}_{fn}_improved\",\n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ bill_length_mm  bill_depth_mm  bill_length_mm_x_improved  bill_depth_mm_x_improved  bill_length_mm_y_improved  bill_depth_mm_y_improved ┃\n┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ float64float64float64float64float64float64                  │\n├────────────────┼───────────────┼───────────────────────────┼──────────────────────────┼───────────────────────────┼──────────────────────────┤\n│           39.118.7-4.821931.5488359.621.5 │\n│           39.517.4-4.421930.2488359.621.5 │\n│           40.318.0-3.621930.8488359.621.5 │\n│            nannannannan59.621.5 │\n│           36.719.3-7.221932.1488359.621.5 │\n│           39.320.6-4.621933.4488359.621.5 │\n│           38.917.8-5.021930.6488359.621.5 │\n│           39.219.6-4.721932.4488359.621.5 │\n│           34.118.1-9.821930.9488359.621.5 │\n│           42.020.2-1.921933.0488359.621.5 │\n│               │\n└────────────────┴───────────────┴───────────────────────────┴──────────────────────────┴───────────────────────────┴──────────────────────────┘\n
\n```\n:::\n:::\n\n\nDon't like lambda functions? We support a format string too!\n\n::: {#5dfcfe9e .cell execution_count=10}\n``` {.python .cell-code}\nt.select(s.startswith(\"bill\")).mutate(\n s.across(\n s.all(),\n func=dict(x=_ - _.mean(), y=_.max()),\n names=\"{col}_{fn}_improved\",\n )\n).head(2)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ bill_length_mm  bill_depth_mm  bill_length_mm_x_improved  bill_depth_mm_x_improved  bill_length_mm_y_improved  bill_depth_mm_y_improved ┃\n┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ float64float64float64float64float64float64                  │\n├────────────────┼───────────────┼───────────────────────────┼──────────────────────────┼───────────────────────────┼──────────────────────────┤\n│           39.118.7-4.821931.5488359.621.5 │\n│           39.517.4-4.421930.2488359.621.5 │\n└────────────────┴───────────────┴───────────────────────────┴──────────────────────────┴───────────────────────────┴──────────────────────────┘\n
\n```\n:::\n:::\n\n\n### Working with other Ibis APIs\n\nWe've seen lots of mutate use, but selectors also work with `.agg`:\n\n::: {#27630eb6 .cell execution_count=11}\n``` {.python .cell-code}\nt.group_by(\"year\").agg(s.across(s.numeric() & ~s.c(\"year\"), _.mean())).order_by(\"year\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
┏━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ year   bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃\n┡━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ int64float64float64float64float64     │\n├───────┼────────────────┼───────────────┼───────────────────┼─────────────┤\n│  200743.74036717.427523196.8807344124.541284 │\n│  200843.54122816.914035202.7982464266.666667 │\n│  200944.45294117.125210202.8067234210.294118 │\n└───────┴────────────────┴───────────────┴───────────────────┴─────────────┘\n
\n```\n:::\n:::\n\n\nNaturally, selectors work in grouping keys too, for even more convenience:\n\n::: {#49f1501d .cell execution_count=12}\n``` {.python .cell-code}\nt.group_by(~s.numeric() | s.c(\"year\")).mutate(\n s.across(s.numeric() & ~s.c(\"year\"), dict(centered=_ - _.mean(), std=_.std()))\n).select(\"species\", s.endswith((\"_centered\", \"_std\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n┃ species  bill_length_mm_centered  bill_depth_mm_centered  flipper_length_mm_centered  body_mass_g_centered  bill_length_mm_std  bill_depth_mm_std  flipper_length_mm_std  body_mass_g_std ┃\n┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n│ stringfloat64float64float64float64float64float64float64float64         │\n├─────────┼─────────────────────────┼────────────────────────┼────────────────────────────┼──────────────────────┼────────────────────┼───────────────────┼───────────────────────┼─────────────────┤\n│ Adelie -1.460.400000-1.600000-170.0000001.3277800.6819092.302173189.076704 │\n│ Adelie -0.96-0.2000003.400000180.0000001.3277800.6819092.302173189.076704 │\n│ Adelie -0.36-1.100000-1.60000030.0000001.3277800.6819092.302173189.076704 │\n│ Adelie 1.440.3000001.400000-220.0000001.3277800.6819092.302173189.076704 │\n│ Adelie 1.340.600000-1.600000180.0000001.3277800.6819092.302173189.076704 │\n│ Gentoo 1.000.93529411.117647147.0588243.0567550.6707664.973459349.763576 │\n│ Gentoo 1.00-0.164706-0.882353147.0588243.0567550.6707664.973459349.763576 │\n│ Gentoo -1.40-0.864706-3.882353-152.9411763.0567550.6707664.973459349.763576 │\n│ Gentoo -2.30-0.0647060.117647-352.9411763.0567550.6707664.973459349.763576 │\n│ Gentoo -2.200.035294-3.882353-402.9411763.0567550.6707664.973459349.763576 │\n│  │\n└─────────┴─────────────────────────┴────────────────────────┴────────────────────────────┴──────────────────────┴────────────────────┴───────────────────┴───────────────────────┴─────────────────┘\n
\n```\n:::\n:::\n\n\n### Filtering Selectors\n\nYou can also express complex filters more concisely.\n\nLet's say we only want to keep rows where all the bill size z-score related\ncolumns' absolute values are greater than 2.\n\n::: {#d62a82d1 .cell execution_count=13}\n``` {.python .cell-code}\nt.drop(\"year\").group_by(\"species\").mutate(\n s.across(s.numeric(), dict(zscore=(_ - _.mean()) / _.std()))\n).filter(s.if_all(s.startswith(\"bill\") & s.endswith(\"_zscore\"), _.abs() > 2))\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     bill_length_mm_zscore  bill_depth_mm_zscore  flipper_length_mm_zscore  body_mass_g_zscore ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩\n│ stringstringfloat64float64int64int64stringfloat64float64float64float64            │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────────────────────┼──────────────────────┼──────────────────────────┼────────────────────┤\n│ Adelie Torgersen46.021.51944200male  2.7065392.5920710.6187601.088911 │\n│ Adelie Dream    32.115.51883050female-2.512345-2.339505-0.298747-1.418906 │\n│ Gentoo Biscoe   55.917.02285600male  2.7240462.0565081.6673941.039411 │\n│ Gentoo Biscoe   59.617.02306050male  3.9246212.0565081.9757991.932062 │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────┘\n
\n```\n:::\n:::\n\n\n### Bonus: Generated SQL\n\nThe SQL for that last expression is pretty gnarly:\n\n::: {#118b13de .cell execution_count=14}\n``` {.python .cell-code}\nibis.to_sql(\n t.drop(\"year\")\n .group_by(\"species\")\n .mutate(s.across(s.numeric(), dict(zscore=(_ - _.mean()) / _.std())))\n .filter(s.if_all(s.startswith(\"bill\") & s.endswith(\"_zscore\"), _.abs() > 2))\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```sql\nWITH t0 AS (\n SELECT\n t2.species AS species,\n t2.island AS island,\n t2.bill_length_mm AS bill_length_mm,\n t2.bill_depth_mm AS bill_depth_mm,\n t2.flipper_length_mm AS flipper_length_mm,\n t2.body_mass_g AS body_mass_g,\n t2.sex AS sex\n FROM main._ibis_examples_penguins_zkpeihb5b5aw7al5hilxgkenfe AS t2\n), t1 AS (\n SELECT\n t0.species AS species,\n t0.island AS island,\n t0.bill_length_mm AS bill_length_mm,\n t0.bill_depth_mm AS bill_depth_mm,\n t0.flipper_length_mm AS flipper_length_mm,\n t0.body_mass_g AS body_mass_g,\n t0.sex AS sex,\n (\n t0.bill_length_mm - AVG(t0.bill_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.bill_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bill_length_mm_zscore,\n (\n t0.bill_depth_mm - AVG(t0.bill_depth_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.bill_depth_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bill_depth_mm_zscore,\n (\n t0.flipper_length_mm - AVG(t0.flipper_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.flipper_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS flipper_length_mm_zscore,\n (\n t0.body_mass_g - AVG(t0.body_mass_g) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.body_mass_g) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS body_mass_g_zscore\n FROM t0\n)\nSELECT\n t1.species,\n t1.island,\n t1.bill_length_mm,\n t1.bill_depth_mm,\n t1.flipper_length_mm,\n t1.body_mass_g,\n t1.sex,\n t1.bill_length_mm_zscore,\n t1.bill_depth_mm_zscore,\n t1.flipper_length_mm_zscore,\n t1.body_mass_g_zscore\nFROM t1\nWHERE\n ABS(t1.bill_length_mm_zscore) > CAST(2 AS TINYINT)\n AND ABS(t1.bill_depth_mm_zscore) > CAST(2 AS TINYINT)\n```\n:::\n:::\n\n\nGood thing you didn't have to write that by hand!\n\n## Summary\n\nThis blog post illustrates the ability to apply computations to many columns at\nonce and the power of ibis as a composable, expressive library for analytics.\n\n- [Get involved!](https://ibis-project.org/community/contribute/)\n- [Report issues!](https://github.com/ibis-project/ibis/issues/new/choose)\n\n", - "supporting": ["index_files"], + "markdown": "---\ntitle: \"Maximizing productivity with selectors\"\nauthor: Phillip Cloud\ndate: 2023-02-27\ncategories:\n - blog\n - new feature\n - productivity\n - duckdb\n---\n\nBefore Ibis 5.0 it's been challenging to concisely express whole-table\noperations with ibis. Happily this is no longer the case in ibis 5.0.\n\nLet's jump right in!\n\nWe'll look at selectors examples using the [`palmerpenguins` data\nset](https://allisonhorst.github.io/palmerpenguins/) with the [DuckDB\nbackend](https://ibis-project.org/backends/DuckDB/).\n\n## Setup\n\n::: {#bc7b7606 .cell execution_count=1}\n``` {.python .cell-code}\nfrom ibis.interactive import *\n\nt = ex.penguins.fetch()\nt\n```\n\n::: {.cell-output .cell-output-display execution_count=1}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.118.71813750male  2007 │\n│ Adelie Torgersen39.517.41863800female2007 │\n│ Adelie Torgersen40.318.01953250female2007 │\n│ Adelie TorgersennannanNULLNULLNULL2007 │\n│ Adelie Torgersen36.719.31933450female2007 │\n│ Adelie Torgersen39.320.61903650male  2007 │\n│ Adelie Torgersen38.917.81813625female2007 │\n│ Adelie Torgersen39.219.61954675male  2007 │\n│ Adelie Torgersen34.118.11933475NULL2007 │\n│ Adelie Torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n## Examples\n\n### Normalization\n\nLet's say you want to compute the\n[z-score](https://en.wikipedia.org/wiki/Standard_score) of every numeric column\nand replace the existing data with that normalized value. Here's how you'd do\nthat with selectors:\n\n::: {#475b3449 .cell execution_count=2}\n``` {.python .cell-code}\nt.mutate(s.across(s.numeric(), (_ - _.mean()) / _.std()))\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year      ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩\n│ stringstringfloat64float64float64float64stringfloat64   │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────────┤\n│ Adelie Torgersen-0.8832050.784300-1.416272-0.563317male  -1.257484 │\n│ Adelie Torgersen-0.8099390.126003-1.060696-0.500969female-1.257484 │\n│ Adelie Torgersen-0.6634080.429833-0.420660-1.186793female-1.257484 │\n│ Adelie TorgersennannannannanNULL-1.257484 │\n│ Adelie Torgersen-1.3227991.088129-0.562890-0.937403female-1.257484 │\n│ Adelie Torgersen-0.8465721.746426-0.776236-0.688012male  -1.257484 │\n│ Adelie Torgersen-0.9198370.328556-1.416272-0.719186female-1.257484 │\n│ Adelie Torgersen-0.8648881.240044-0.4206600.590115male  -1.257484 │\n│ Adelie Torgersen-1.7990250.480471-0.562890-0.906229NULL-1.257484 │\n│ Adelie Torgersen-0.3520291.543873-0.7762360.060160NULL-1.257484 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────────┘\n
\n```\n:::\n:::\n\n\n### What's Up With the `year` Column?\n\nWhoops, looks like we included `year` in our normalization because it's an\n`int64` column (and therefore numeric) but normalizing the year doesn't make\nsense.\n\nWe can exclude `year` from the normalization using another selector:\n\n::: {#01f0665f .cell execution_count=3}\n``` {.python .cell-code}\nt.mutate(s.across(s.numeric() & ~s.c(\"year\"), (_ - _.mean()) / _.std()))\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen-0.8832050.784300-1.416272-0.563317male  2007 │\n│ Adelie Torgersen-0.8099390.126003-1.060696-0.500969female2007 │\n│ Adelie Torgersen-0.6634080.429833-0.420660-1.186793female2007 │\n│ Adelie TorgersennannannannanNULL2007 │\n│ Adelie Torgersen-1.3227991.088129-0.562890-0.937403female2007 │\n│ Adelie Torgersen-0.8465721.746426-0.776236-0.688012male  2007 │\n│ Adelie Torgersen-0.9198370.328556-1.416272-0.719186female2007 │\n│ Adelie Torgersen-0.8648881.240044-0.4206600.590115male  2007 │\n│ Adelie Torgersen-1.7990250.480471-0.562890-0.906229NULL2007 │\n│ Adelie Torgersen-0.3520291.543873-0.7762360.060160NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n`c` is short for \"column\" and the `~` means \"negate\". Combining those we get \"not the year column\"!\n\nPretty neat right?\n\n### Composable Group By\n\nThe power of this approach comes in when you want the grouped version. Perhaps\nwe think some of these columns vary by species.\n\nWith selectors, all you need to do is slap a `.group_by(\"species\")` onto `t`:\n\n::: {#dc862f0a .cell execution_count=4}\n``` {.python .cell-code}\nt.group_by(\"species\").mutate(\n s.across(s.numeric() & ~s.c(\"year\"), (_ - _.mean()) / _.std())\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen0.791697-1.2709970.160007-0.001444female2008 │\n│ Adelie Biscoe   1.467524-0.0381030.9245960.816322male  2009 │\n│ Adelie Dream    0.3786920.619441-0.9104182.070231male  2007 │\n│ Adelie Dream    -0.860324-0.284681-1.216254-1.200835female2007 │\n│ Adelie Torgersen-0.7852320.7838270.465843-0.546622female2007 │\n│ Adelie Torgersen0.1909621.8523350.007089-0.110480male  2007 │\n│ Adelie Torgersen0.040778-0.449067-1.369172-0.164997female2007 │\n│ Adelie Torgersen0.1534161.0304050.7716782.124749male  2007 │\n│ Adelie Torgersen-1.761426-0.2024890.465843-0.492104NULL2007 │\n│ Adelie Torgersen1.2047021.5235630.0070891.197947NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nSince ibis translates this into a run-of-the-mill selection as if you had\ncalled `select` or `mutate` without selectors, nothing special is needed for a\nbackend to work with these new constructs.\n\nLet's look at some more examples.\n\n### Min-max Normalization\n\nGrouped min/max normalization? Easy:\n\n::: {#da876bd8 .cell execution_count=5}\n``` {.python .cell-code}\nt.group_by(\"species\").mutate(\n s.across(s.numeric() & ~s.c(\"year\"), (_ - _.min()) / (_.max() - _.min()))\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64float64float64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen0.6330940.2166670.5000000.441558female2008 │\n│ Adelie Biscoe   0.7625900.4666670.6315790.636364male  2009 │\n│ Adelie Dream    0.5539570.6000000.3157890.935065male  2007 │\n│ Adelie Dream    0.3165470.4166670.2631580.155844female2007 │\n│ Adelie Torgersen0.3309350.6333330.5526320.311688female2007 │\n│ Adelie Torgersen0.5179860.8500000.4736840.415584male  2007 │\n│ Adelie Torgersen0.4892090.3833330.2368420.402597female2007 │\n│ Adelie Torgersen0.5107910.6833330.6052630.948052male  2007 │\n│ Adelie Torgersen0.1438850.4333330.5526320.324675NULL2007 │\n│ Adelie Torgersen0.7122300.7833330.4736840.727273NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n### Casting and Munging\n\nHow about casting every column whose name ends with any of the strings `\"mm\"`\nor `\"g\"` to a `float32`? No problem!\n\n::: {#c48e4798 .cell execution_count=6}\n``` {.python .cell-code}\nt.mutate(s.across(s.endswith((\"mm\", \"g\")), _.cast(\"float32\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat32float32float32float32stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ Adelie Torgersen39.09999818.700001181.03750.0male  2007 │\n│ Adelie Torgersen39.50000017.400000186.03800.0female2007 │\n│ Adelie Torgersen40.29999918.000000195.03250.0female2007 │\n│ Adelie TorgersennannannannanNULL2007 │\n│ Adelie Torgersen36.70000119.299999193.03450.0female2007 │\n│ Adelie Torgersen39.29999920.600000190.03650.0male  2007 │\n│ Adelie Torgersen38.90000217.799999181.03625.0female2007 │\n│ Adelie Torgersen39.20000119.600000195.04675.0male  2007 │\n│ Adelie Torgersen34.09999818.100000193.03475.0NULL2007 │\n│ Adelie Torgersen42.00000020.200001190.04250.0NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\nWe can make all string columns have the same case too!\n\n::: {#46d8930e .cell execution_count=7}\n``` {.python .cell-code}\nt.mutate(s.across(s.of_type(\"string\"), _.lower()))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩\n│ stringstringfloat64float64int64int64stringint64 │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤\n│ adelie torgersen39.118.71813750male  2007 │\n│ adelie torgersen39.517.41863800female2007 │\n│ adelie torgersen40.318.01953250female2007 │\n│ adelie torgersennannanNULLNULLNULL2007 │\n│ adelie torgersen36.719.31933450female2007 │\n│ adelie torgersen39.320.61903650male  2007 │\n│ adelie torgersen38.917.81813625female2007 │\n│ adelie torgersen39.219.61954675male  2007 │\n│ adelie torgersen34.118.11933475NULL2007 │\n│ adelie torgersen42.020.21904250NULL2007 │\n│  │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘\n
\n```\n:::\n:::\n\n\n### Multiple Computations per Column\n\nWhat if I want to compute multiple things? Heck yeah!\n\n::: {#f1208079 .cell execution_count=8}\n``` {.python .cell-code}\nt.group_by(\"sex\").mutate(\n s.across(\n s.numeric() & ~s.c(\"year\"),\n dict(centered=_ - _.mean(), zscore=(_ - _.mean()) / _.std()),\n )\n).select(\"sex\", s.endswith((\"_centered\", \"_zscore\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓\n┃ sex     bill_length_mm_centered  bill_depth_mm_centered  flipper_length_mm_centered  body_mass_g_centered  bill_length_mm_zscore  bill_depth_mm_zscore  flipper_length_mm_zscore  body_mass_g_zscore ┃\n┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩\n│ stringfloat64float64float64float64float64float64float64float64            │\n├────────┼─────────────────────────┼────────────────────────┼────────────────────────────┼──────────────────────┼───────────────────────┼──────────────────────┼──────────────────────────┼────────────────────┤\n│ female4.10303-1.92545511.636364937.7272730.836760-1.0722700.9308511.407635 │\n│ female1.20303-2.42545510.636364712.7272730.245342-1.3507160.8508561.069885 │\n│ female-8.096970.674545-12.363636-462.272727-1.6512710.375649-0.989030-0.693924 │\n│ female-5.896970.874545-10.363636-562.272727-1.2026100.487027-0.829039-0.844035 │\n│ female-0.996971.174545-15.363636-662.272727-0.2033190.654095-1.229015-0.994147 │\n│ female-5.496971.374545-12.363636-162.272727-1.1210350.765473-0.989030-0.243590 │\n│ female-3.396972.574545-2.363636-412.272727-0.6927681.433743-0.189079-0.618868 │\n│ female-7.696971.974545-13.363636-537.272727-1.5696971.099608-1.069025-0.806507 │\n│ female-4.296971.874545-23.363636-462.272727-0.8763111.043919-1.868975-0.693924 │\n│ female-6.196972.774545-8.363636-62.272727-1.2637911.545122-0.669049-0.093478 │\n│  │\n└────────┴─────────────────────────┴────────────────────────┴────────────────────────────┴──────────────────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────┘\n
\n```\n:::\n:::\n\n\nDon't like the naming convention?\n\nPass a function to make your own name!\n\n::: {#ca542835 .cell execution_count=9}\n``` {.python .cell-code}\nt.select(s.startswith(\"bill\")).mutate(\n s.across(\n s.all(),\n dict(x=_ - _.mean(), y=_.max()),\n names=lambda col, fn: f\"{col}_{fn}_improved\",\n )\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ bill_length_mm  bill_depth_mm  bill_length_mm_x_improved  bill_depth_mm_x_improved  bill_length_mm_y_improved  bill_depth_mm_y_improved ┃\n┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ float64float64float64float64float64float64                  │\n├────────────────┼───────────────┼───────────────────────────┼──────────────────────────┼───────────────────────────┼──────────────────────────┤\n│           39.118.7-4.821931.5488359.621.5 │\n│           39.517.4-4.421930.2488359.621.5 │\n│           40.318.0-3.621930.8488359.621.5 │\n│            nannannannan59.621.5 │\n│           36.719.3-7.221932.1488359.621.5 │\n│           39.320.6-4.621933.4488359.621.5 │\n│           38.917.8-5.021930.6488359.621.5 │\n│           39.219.6-4.721932.4488359.621.5 │\n│           34.118.1-9.821930.9488359.621.5 │\n│           42.020.2-1.921933.0488359.621.5 │\n│               │\n└────────────────┴───────────────┴───────────────────────────┴──────────────────────────┴───────────────────────────┴──────────────────────────┘\n
\n```\n:::\n:::\n\n\nDon't like lambda functions? We support a format string too!\n\n::: {#aeedd50b .cell execution_count=10}\n``` {.python .cell-code}\nt.select(s.startswith(\"bill\")).mutate(\n s.across(\n s.all(),\n func=dict(x=_ - _.mean(), y=_.max()),\n names=\"{col}_{fn}_improved\",\n )\n).head(2)\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ bill_length_mm  bill_depth_mm  bill_length_mm_x_improved  bill_depth_mm_x_improved  bill_length_mm_y_improved  bill_depth_mm_y_improved ┃\n┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ float64float64float64float64float64float64                  │\n├────────────────┼───────────────┼───────────────────────────┼──────────────────────────┼───────────────────────────┼──────────────────────────┤\n│           39.118.7-4.821931.5488359.621.5 │\n│           39.517.4-4.421930.2488359.621.5 │\n└────────────────┴───────────────┴───────────────────────────┴──────────────────────────┴───────────────────────────┴──────────────────────────┘\n
\n```\n:::\n:::\n\n\n### Working with other Ibis APIs\n\nWe've seen lots of mutate use, but selectors also work with `.agg`:\n\n::: {#6b1950e6 .cell execution_count=11}\n``` {.python .cell-code}\nt.group_by(\"year\").agg(s.across(s.numeric() & ~s.c(\"year\"), _.mean())).order_by(\"year\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
┏━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n┃ year   bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃\n┡━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n│ int64float64float64float64float64     │\n├───────┼────────────────┼───────────────┼───────────────────┼─────────────┤\n│  200743.74036717.427523196.8807344124.541284 │\n│  200843.54122816.914035202.7982464266.666667 │\n│  200944.45294117.125210202.8067234210.294118 │\n└───────┴────────────────┴───────────────┴───────────────────┴─────────────┘\n
\n```\n:::\n:::\n\n\nNaturally, selectors work in grouping keys too, for even more convenience:\n\n::: {#da973a3f .cell execution_count=12}\n``` {.python .cell-code}\nt.group_by(~s.numeric() | s.c(\"year\")).mutate(\n s.across(s.numeric() & ~s.c(\"year\"), dict(centered=_ - _.mean(), std=_.std()))\n).select(\"species\", s.endswith((\"_centered\", \"_std\")))\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n┃ species  bill_length_mm_centered  bill_depth_mm_centered  flipper_length_mm_centered  body_mass_g_centered  bill_length_mm_std  bill_depth_mm_std  flipper_length_mm_std  body_mass_g_std ┃\n┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n│ stringfloat64float64float64float64float64float64float64float64         │\n├─────────┼─────────────────────────┼────────────────────────┼────────────────────────────┼──────────────────────┼────────────────────┼───────────────────┼───────────────────────┼─────────────────┤\n│ Adelie -1.460.400000-1.600000-170.0000001.3277800.6819092.302173189.076704 │\n│ Adelie -0.96-0.2000003.400000180.0000001.3277800.6819092.302173189.076704 │\n│ Adelie -0.36-1.100000-1.60000030.0000001.3277800.6819092.302173189.076704 │\n│ Adelie 1.440.3000001.400000-220.0000001.3277800.6819092.302173189.076704 │\n│ Adelie 1.340.600000-1.600000180.0000001.3277800.6819092.302173189.076704 │\n│ Gentoo 1.000.93529411.117647147.0588243.0567550.6707664.973459349.763576 │\n│ Gentoo 1.00-0.164706-0.882353147.0588243.0567550.6707664.973459349.763576 │\n│ Gentoo -1.40-0.864706-3.882353-152.9411763.0567550.6707664.973459349.763576 │\n│ Gentoo -2.30-0.0647060.117647-352.9411763.0567550.6707664.973459349.763576 │\n│ Gentoo -2.200.035294-3.882353-402.9411763.0567550.6707664.973459349.763576 │\n│  │\n└─────────┴─────────────────────────┴────────────────────────┴────────────────────────────┴──────────────────────┴────────────────────┴───────────────────┴───────────────────────┴─────────────────┘\n
\n```\n:::\n:::\n\n\n### Filtering Selectors\n\nYou can also express complex filters more concisely.\n\nLet's say we only want to keep rows where all the bill size z-score related\ncolumns' absolute values are greater than 2.\n\n::: {#a9bc15e3 .cell execution_count=13}\n``` {.python .cell-code}\nt.drop(\"year\").group_by(\"species\").mutate(\n s.across(s.numeric(), dict(zscore=(_ - _.mean()) / _.std()))\n).filter(s.if_all(s.startswith(\"bill\") & s.endswith(\"_zscore\"), _.abs() > 2))\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓\n┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     bill_length_mm_zscore  bill_depth_mm_zscore  flipper_length_mm_zscore  body_mass_g_zscore ┃\n┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩\n│ stringstringfloat64float64int64int64stringfloat64float64float64float64            │\n├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────────────────────┼──────────────────────┼──────────────────────────┼────────────────────┤\n│ Adelie Torgersen46.021.51944200male  2.7065392.5920710.6187601.088911 │\n│ Adelie Dream    32.115.51883050female-2.512345-2.339505-0.298747-1.418906 │\n│ Gentoo Biscoe   55.917.02285600male  2.7240462.0565081.6673941.039411 │\n│ Gentoo Biscoe   59.617.02306050male  3.9246212.0565081.9757991.932062 │\n└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────┘\n
\n```\n:::\n:::\n\n\n### Bonus: Generated SQL\n\nThe SQL for that last expression is pretty gnarly:\n\n::: {#fdd76c58 .cell execution_count=14}\n``` {.python .cell-code}\nibis.to_sql(\n t.drop(\"year\")\n .group_by(\"species\")\n .mutate(s.across(s.numeric(), dict(zscore=(_ - _.mean()) / _.std())))\n .filter(s.if_all(s.startswith(\"bill\") & s.endswith(\"_zscore\"), _.abs() > 2))\n)\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```sql\nWITH t0 AS (\n SELECT\n t2.species AS species,\n t2.island AS island,\n t2.bill_length_mm AS bill_length_mm,\n t2.bill_depth_mm AS bill_depth_mm,\n t2.flipper_length_mm AS flipper_length_mm,\n t2.body_mass_g AS body_mass_g,\n t2.sex AS sex\n FROM main._ibis_examples_penguins_wyios4y4nbfd5oqs53lzomk2ee AS t2\n), t1 AS (\n SELECT\n t0.species AS species,\n t0.island AS island,\n t0.bill_length_mm AS bill_length_mm,\n t0.bill_depth_mm AS bill_depth_mm,\n t0.flipper_length_mm AS flipper_length_mm,\n t0.body_mass_g AS body_mass_g,\n t0.sex AS sex,\n (\n t0.bill_length_mm - AVG(t0.bill_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.bill_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bill_length_mm_zscore,\n (\n t0.bill_depth_mm - AVG(t0.bill_depth_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.bill_depth_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bill_depth_mm_zscore,\n (\n t0.flipper_length_mm - AVG(t0.flipper_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.flipper_length_mm) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS flipper_length_mm_zscore,\n (\n t0.body_mass_g - AVG(t0.body_mass_g) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING)\n ) / STDDEV_SAMP(t0.body_mass_g) OVER (PARTITION BY t0.species ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS body_mass_g_zscore\n FROM t0\n)\nSELECT\n t1.species,\n t1.island,\n t1.bill_length_mm,\n t1.bill_depth_mm,\n t1.flipper_length_mm,\n t1.body_mass_g,\n t1.sex,\n t1.bill_length_mm_zscore,\n t1.bill_depth_mm_zscore,\n t1.flipper_length_mm_zscore,\n t1.body_mass_g_zscore\nFROM t1\nWHERE\n ABS(t1.bill_length_mm_zscore) > CAST(2 AS TINYINT)\n AND ABS(t1.bill_depth_mm_zscore) > CAST(2 AS TINYINT)\n```\n:::\n:::\n\n\nGood thing you didn't have to write that by hand!\n\n## Summary\n\nThis blog post illustrates the ability to apply computations to many columns at\nonce and the power of ibis as a composable, expressive library for analytics.\n\n- [Get involved!](https://ibis-project.org/community/contribute/)\n- [Report issues!](https://github.com/ibis-project/ibis/issues/new/choose)\n\n", + "supporting": [ + "index_files" + ], "filters": [], "includes": { "include-in-header": [ diff --git a/docs/posts/campaign-finance/index.qmd b/docs/posts/campaign-finance/index.qmd index 6c0d46916940..35558b9566fe 100644 --- a/docs/posts/campaign-finance/index.qmd +++ b/docs/posts/campaign-finance/index.qmd @@ -4,6 +4,10 @@ author: "Nick Crews" date: "2023-03-24" categories: - blog + - data engineering + - case study + - duckdb + - performance --- Hi! My name is [Nick Crews](https://www.linkedin.com/in/nicholas-b-crews/), and I'm a data engineer that looks at public campaign finance data. 
diff --git a/docs/posts/ci-analysis/index.qmd b/docs/posts/ci-analysis/index.qmd
index 7c734185fe4c..271161cecaa1 100644
--- a/docs/posts/ci-analysis/index.qmd
+++ b/docs/posts/ci-analysis/index.qmd
@@ -4,6 +4,10 @@ author: "Phillip Cloud"
 date: "2023-01-09"
 categories:
     - blog
+    - bigquery
+    - continuous integration
+    - data engineering
+    - dogfood
 ---
 
 ## Summary
diff --git a/docs/posts/ffill-and-bfill-using-ibis/index.qmd b/docs/posts/ffill-and-bfill-using-ibis/index.qmd
index b25e52eddbd3..c623cfea4e80 100644
--- a/docs/posts/ffill-and-bfill-using-ibis/index.qmd
+++ b/docs/posts/ffill-and-bfill-using-ibis/index.qmd
@@ -4,6 +4,8 @@ author: Patrick Clarke
 date: 2022-09-09
 categories:
     - blog
+    - window functions
+    - time series
 ---
 
 Suppose you have a table of data mapping events and dates to values, and that this data contains gaps in values.
diff --git a/docs/posts/ibis-analytics/index.qmd b/docs/posts/ibis-analytics/index.qmd
index d110064b0ed7..5cf5d1754b78 100644
--- a/docs/posts/ibis-analytics/index.qmd
+++ b/docs/posts/ibis-analytics/index.qmd
@@ -6,7 +6,9 @@ image: "thumbnail.png"
 code-annotations: below
 categories:
     - blog
-    - demo
+    - duckdb
+    - case study
+    - dogfood
 draft: true
 ---
 
diff --git a/docs/posts/ibis-examples/index.qmd b/docs/posts/ibis-examples/index.qmd
index cdb574dcef2a..f27d54bbbfb1 100644
--- a/docs/posts/ibis-examples/index.qmd
+++ b/docs/posts/ibis-examples/index.qmd
@@ -4,6 +4,8 @@ author: Kae Suarez
 date: 2023-03-08
 categories:
     - blog
+    - new feature
+    - sneak peek
 ---
 
 Ibis has been moving quickly to provide a powerful but easy-to-use interface for interacting with analytical engines. However, as we’re approaching the 5.0 release of Ibis, we’ve realized that moving from not knowing Ibis to writing a first expression is not trivial.
diff --git a/docs/posts/ibis-to-file/index.qmd b/docs/posts/ibis-to-file/index.qmd
index 71346c540960..5737e0ebf7e2 100644
--- a/docs/posts/ibis-to-file/index.qmd
+++ b/docs/posts/ibis-to-file/index.qmd
@@ -4,6 +4,9 @@ author: Kae Suarez
 date: 2023-03-09
 categories:
     - blog
+    - io
+    - new feature
+    - sneak peek
 ---
 
 Ibis 5.0 is coming soon and will offer new functionality and fixes to users. To enhance clarity around this process, we’re sharing a sneak peek into what we’re working on.
diff --git a/docs/posts/ibis_substrait_to_duckdb/index.qmd b/docs/posts/ibis_substrait_to_duckdb/index.qmd
index 7b1fac3522bd..fcd8d2fd71bf 100644
--- a/docs/posts/ibis_substrait_to_duckdb/index.qmd
+++ b/docs/posts/ibis_substrait_to_duckdb/index.qmd
@@ -4,6 +4,9 @@ author: Gil Forsyth
 date: 2023-02-01
 categories:
     - blog
+    - substrait
+    - ecosystem
+    - duckdb
 ---
 
 Ibis strives to provide a consistent interface for interacting with a multitude
diff --git a/docs/posts/selectors/index.qmd b/docs/posts/selectors/index.qmd
index bfae465b8061..6e57967315ab 100644
--- a/docs/posts/selectors/index.qmd
+++ b/docs/posts/selectors/index.qmd
@@ -4,6 +4,9 @@ author: Phillip Cloud
 date: 2023-02-27
 categories:
     - blog
+    - new feature
+    - productivity
+    - duckdb
 ---
 
 Before Ibis 5.0 it's been challenging to concisely express whole-table
diff --git a/docs/posts/torch/index.qmd b/docs/posts/torch/index.qmd
index 2e706028b47f..ba23299d4e8c 100644
--- a/docs/posts/torch/index.qmd
+++ b/docs/posts/torch/index.qmd
@@ -4,6 +4,10 @@ author: "Phillip Cloud"
 date: "2023-06-27"
 categories:
     - blog
+    - case study
+    - machine learning
+    - ecosystem
+    - new feature
 ---
 
 In this blog post we show how to leverage ecosystem tools to build an end-to-end ML pipeline using Ibis, DuckDB and PyTorch.