Add column API mvp (#100)
* Add namespace stub

* Add super naive column fn

* Add some simple column fns

* Add typeof function for column

* Save work on column exploration doc

* Upgrade to latest clay version

* Save scratch work in column.clj

* Polishing up existing column fns

* Added some docstrings
* Reorganized a little

* Move column ns into own domain tablecloth.column.api

* Add tests for `tablecloth.column.api/column`

* Add tests for `zeros` and `ones`

* Use api template to write public api

* Write tests against `tablecloth.column.api.column` ns

* Add column exploration html

* Add `typeof?` function to check datatype of column els

* Use buffer when creating zeros & ones columns

* Use `dtype` alias in ns

* Add comment to code snippet generating column api

* Fix comment syntax

* Use `tech.v3.datatype/const-reader` for `zeros` and `ones` function

* Update type interface to use type hierarchy in tablecloth.api.utils (#76)

* Add ->general-types function

* Add a general type :logical

* Use type hierarchy in tablecloth.api.utils for `typeof` functions

* Add column dev branch to pr workflow

* Add tests for typeof

* Fix tests for typeof

* Return the concrete type from `typeof`

* Simplify `concrete-types` fn

* Optimize ->general-types by using static lookup

* Adjust fns listing types

* We decided that, by default, "type" refers to the "concrete"
type, not the general type.
* So `types` now returns the set of concrete types, and `general-types`
returns the general types.

* Revert "Adjust fns listing types"

This reverts commit d93e34f.

* Fix `typeof` test to test for concrete types

* Reorganize `typeof?` tests

* Reword docstring for `typeof?` slightly

* Update column api template and add missing `typeof?`

* Add comment to `general-types-lookup`

* Improve `->general-types` docstring

* Add `general-types` fn that returns sets of general types

* Adjust util `types` fn to return concrete types

* Lift `tech.v3.datatype.functional` operations (#90)

* Save changes to column api.clj

* Save ongoing experiments with lifting

* Save ongoing work on lifting

* Adjust lift-ops-1 to handle any number of args with rest arg

* Working `rearrange-args` fn

* Save work actually writing lifted fns

* Save first attempt at writing operators

* Add `percentiles` test

* Adjust `rearrange-args` to take `new-args` in an option map

* Unify two lift functions

* Add in docstrings when present

* Move lift utils into utils ns

* Rename lifting namespaces

* Lift some more fns

* Make exclusions for ns header helper an arg

* Add new operators and tests

* Add ops with lhs rhs arg pattern

* Lift '*

* Add require to operators ns for utils

* Update test to make it more complete

* Lift 'equals

* Make test more accurate

* Reorganize tests

* Fix grammar

* Lift 'shift

* Uncomment 'or test

* Lift 'normalize op

* Lift 'magnitude

* Lifting bit manipulation ops

* Lift 'ieee-remainder

* Lifting more functions

* Add excludes

* Lift a bunch of new functions

* Alphabetize some lists

* More alphabetization

* Clean up

* Instead of using `col` as the arg name, conform to using `x` and `y`

* Temporarily disable failing test (fix in 7.000-beta23)

* Disable the correct test

* Just some minor cleanup in op tests

* Some more cleanup/reorg in op tests

* Update generated operators namespace with the switch from `col` -> `x`, etc.

* Lift 'descriptive-statistics

* Fix messed up test layout

* Lift 'quartiles

* Lift 'fill-range and a bunch of reduce operations

* Lift 'mean-fast 'sum-fast 'magnitude-squared

* Lift correlation fns

Kendall's, Pearson's, and Spearman's

* Lift cumulative ops

* cleanup

* Bring column exploration doc up-to-date (#95)

* Upgrade to latest clay version

* Show using tablecloth.column.api.operators ns

* Cleanup whitespace

* Add method for subsetting (#96)

* Export tech.ml.dataset `select` fn for column api

* Update docstring exported to api

* Update column-exploration with basic illustration of select

* Add `slice`

* clean up tests a bit

* Improve `slice` docstring slightly

* Export `slice` to column api

* Add stuff about `slice` to column exploration doc

* Move accessing & subsetting section above basic ops

* Update column_exploration.html

* Update comment block

* Add iteration support by wrapping tech.v3.dataset.column/column-map (#97)

* Add column-map wrapper over tech.v3.dataset.column/column-mapping

* Accepts columns in the first position to support use with pipes
* If `col` is a vector of columns, then map-fn is run on all
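The column-map behavior described above (column in the first position, and a vector of columns mapped together) can be illustrated with a minimal Clojure sketch. It is not tablecloth's implementation: plain vectors stand in for columns, and `column-map-sketch` is a hypothetical name.

```clojure
;; Hypothetical sketch, NOT tablecloth's code: plain vectors stand in
;; for columns, and `column-map-sketch` is an illustrative name.
(defn column-map-sketch
  "Apply `map-fn` element-wise. `cols` comes first so the fn composes in
  a `->` pipeline. A single column is mapped directly; a vector of
  columns has `map-fn` called with one element from each column."
  [cols map-fn]
  (if (and (seq cols) (every? sequential? cols))
    ;; vector of columns: map across corresponding elements
    (apply mapv map-fn cols)
    ;; single column (or empty input)
    (mapv map-fn cols)))

(column-map-sketch [1 2 3] inc)              ;; => [2 3 4]
(column-map-sketch [[1 2 3] [10 20 30]] +)   ;; => [11 22 33]
(-> [1 2 3] (column-map-sketch #(* % %)))    ;; => [1 4 9]
```

Because the sketch decides "one column or many" by checking whether every element is sequential, a single column whose values are themselves vectors would be misread; a real implementation would need to distinguish column objects from plain collections.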

* Fix arg name

* Clean up

* Add iteration to column exploration and reorganize

* Add column-map to column api_template

* Add example of using column-map with multiple columns

* Update column_exploration html doc

* Update column_exploration html doc

* Add sorting support for column (#99)

* Add rough version of `sort-column` with some tests

* Add basic docstring

* Add support for `:asc` and `:desc` to sort-column

* Add note to handle missing values

* Make slight improvement to sort-column docstring
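A rough sketch of a `sort-column` that accepts `:asc`/`:desc` and keeps missing values out of the comparison. This is only an illustration: a vector stands in for a column, `nil` marks a missing value, `sort-column-sketch` is a hypothetical name, and putting missing values last is an assumption, not necessarily what tablecloth does.

```clojure
;; Hypothetical sketch: vector = column, nil = missing value.
;; Placing missing values at the end is an assumption for illustration.
(defn sort-column-sketch
  "Sort `col` by `order`: :asc (default), :desc, or a 2-arg comparator."
  ([col] (sort-column-sketch col :asc))
  ([col order]
   (let [cmp     (case order
                   :asc  compare
                   :desc #(compare %2 %1)
                   order)                 ; otherwise assume a comparator fn
         present (remove nil? col)        ; values that can be compared
         missing (filter nil? col)]       ; missing values, appended last
     (vec (concat (sort cmp present) missing)))))

(sort-column-sketch [3 nil 1 2])          ;; => [1 2 3 nil]
(sort-column-sketch [3 nil 1 2] :desc)    ;; => [3 2 1 nil]
```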

* Improve support for missing values for column api (#101)

* Export tech.v3.dataset.column's missing fns

* Remove `set-missing`

I think this may be more of an internal fn

* Add `count-missing` function

* Add test for `sort-column` for missing values

* Activate test that will now pass due to tmd upgrade

* Add sort-column to api-template

* Add sort-column section to column_exploration doc

* Add more missing apidoc

* move fns to their own namespace to mirror main tc api
* add `drop-missing` and `replace-missing`
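The intent of `count-missing`, `drop-missing`, and `replace-missing` can be sketched over plain vectors. These are hypothetical stand-ins, not the tablecloth fns: `nil` marks a missing value here, whereas real tech.v3.dataset columns track missing values separately, and the real `replace-missing` supports more strategies than the constant value shown.

```clojure
;; Hypothetical sketches: nil marks a missing value in a plain vector.
(defn count-missing-sketch
  "Number of missing (nil) values in `col`."
  [col]
  (count (filter nil? col)))

(defn drop-missing-sketch
  "`col` with missing values removed."
  [col]
  (vec (remove nil? col)))

(defn replace-missing-sketch
  "`col` with every missing value replaced by the constant `value`."
  [col value]
  (mapv #(if (nil? %) value %) col))

(count-missing-sketch [1 nil 3 nil])      ;; => 2
(drop-missing-sketch [1 nil 3 nil])       ;; => [1 3]
(replace-missing-sketch [1 nil 3 nil] 0)  ;; => [1 0 3 0]
```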

* Add details about missing api to column exploration

* Add an example of using count to column exploration

* Add a few simple tests for missing ns

* Fix docstrings

* Add proof of concept

* Consolidate tablecloth.column.api/operators args (#106)

* Consolidate ops args to x, y, z

* Fix lift op for comparison ops

* Update lift-op fn to handle multiple ar lookups

The case that required this was the comparison ops. We
want `(> x y z)` from `(> lhs rhs)` and `(> lhs mid rhs)`. We
can't universally map `y` to `rhs` because that would be
wrong for the 3-arity option.
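The arity-dependent lookup described above can be sketched in a few lines of Clojure. The names `comparison-arglists` and `map-consolidated-args` are hypothetical and not tablecloth's codegen helpers; the point is only that `y` maps to `rhs` for two args but to `mid` for three.

```clojure
;; Hypothetical sketch of arity-dependent arg mapping; names are
;; illustrative, not tablecloth's codegen helpers.
(def comparison-arglists
  ;; underlying op arglists, keyed by arity
  {2 '[lhs rhs]
   3 '[lhs mid rhs]})

(defn map-consolidated-args
  "Map consolidated args (x y z ...) onto the underlying arglist for the
  arity actually used, so `y` becomes `rhs` or `mid` as appropriate."
  [args]
  (let [target (get comparison-arglists (count args))]
    (when-not target
      (throw (ex-info "Unsupported arity" {:arity (count args)})))
    (zipmap target args)))

(map-consolidated-args '[x y])    ;; => {lhs x, rhs y}
(map-consolidated-args '[x y z])  ;; => {lhs x, mid y, rhs z}
```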

* Lift column ops to the dataset level (#107)

* Readme: Replace `lein test` with `lein midje`

* Add proof of concept for lifting

* Clean up

* Fix magnitude arguments

* Fix typo breaking lift operation for `magnitude`

* Save prototype working example that handles optional arguments

* Clean up

* Reorganize codegen utilities

* Moved hopefully common utilities up into 'tablecloth.utils.codegen
* Retooled those helpers in that ns to be a bit more accessible (WIP)

* Clean up

* Clean up

* Rejigger codegen for column ops to take just fn-sym arglists

* Try lifting all column ops to ds (no tests yet)

* Exclude ops that cannot return a column

* Do not lift ops that do not return columns

* Add docstrings for some codegen

Also regenerated operators to make sure tests pass.

* Add docstring to ds col ops

* version bump and small fix

* Modify ds-level lift op to also return fn that returns column

This is a breaking change for the column api lifting until I adapt
the lift-op to the changes made in the codegen where the argument
is supplied in data rather than within a fn.

* example added for replace-missing

* Add tests for ops that take any number of cols

* Add tests for ops returning ds taking max of three cols

* Add tests for ops returning ds and taking two columns max

* Test for ops returning ds and max of one column

* Add more functions to test for ops taking one col

* Clean up

* Lifted ops taking one column and returning a scalar

* Lift functions taking two columns and returning a scalar

* Clean up

* Clean up

* bump to 7.000-beta-50

* fixes #108

* hashing in joins enabled for every case

* 7.000-beta-51

* Clean up

* Lift functions taking 1 col and returning scalar

* Adjust column api lift ops to new declarative syntax

* Adjust lift plan for tablecloth.column.api for tmd v7

* Remove mention of tech.ml.datatype

* Add missing word

* Bump tmd version to 7.006 for fix to fns that were erroring

fns are: quartiles-1, quartiles-3 and median

* Fixing more tests

* Comment out some code to keep around for a while

* Remove special lift op for 'round

Its arguments were fixed.

* Cleanup

* 7.007

---------

Co-authored-by: Teodor Heggelund <git@teod.eu>
Co-authored-by: genmeblog <38646601+genmeblog@users.noreply.github.com>
Co-authored-by: GenerateMe <generateme.blog@gmail.com>
Co-authored-by: adham-omran <git@adham-omran.com>

* Ethan/lift scalar ops to ds as aggregators (#118)

* Fix indentation

* Save rough working example

Not fully tested

* Fix tests for new aggregator form of ops that return scalar

* Add `column` API documentation (#120)

* Add a sample notebook file

* Save draft work on column api doc

* Add doc entry for tcc/select boolean select

This appears to be broken now, but it shouldn't be.

* Export column api operators in column api ns

* Add in some documentation of operations

* Hide namespace expression from generated doc

* Fix circular dependency

* Update generated docs

* Update text in column operations section

* More updates to the docs

* Remove "Functionality" header in TOC

This way Dataset is an entry, and I can add Column after that.

* Add Column API documentation

* Add an indication of column op signature to docs

* Export lifted column operators in dataset api template

* Add documentation for column operations on datasets

* Some minor changes

* Rename the two headers for Dataset and Column, adding API onto the
end.
* A few small fixes.

* Remove the `Functions` section

This is essentially replaced by the Column API that lifts these
functions into Tablecloth

* Try to remove cyclical dependency

* Revert "Try to remove cyclical dependency"

This reverts commit fcb16c4.

* Fix circular dependency

* Actually fix cyclical dependency

* Undo added line

* Try deploying a documentation preview

* Add preview-branch to docs preview action

The default was gh-pages; we use master.

* Try adding umbrella-dir setting

* Try removing docs folder in umbrella-dir

* Remove old pr docs preview workflow

* Regenerated docs after merge from master

* Add section about column missing values to docs

* Regenerated docs after merge from master

* Remove draft notebook

* Remove temporary trigger for dev branch since it was the target of PRs

---------

Co-authored-by: Teodor Heggelund <git@teod.eu>
Co-authored-by: genmeblog <38646601+genmeblog@users.noreply.github.com>
Co-authored-by: GenerateMe <generateme.blog@gmail.com>
Co-authored-by: adham-omran <git@adham-omran.com>
5 people authored Apr 13, 2024
1 parent f551a6a commit 9d72a88
Showing 50 changed files with 22,378 additions and 2,798 deletions.
704 changes: 704 additions & 0 deletions docs/.clay.html
7 changes: 7 additions & 0 deletions docs/.clay_files/bootstrap0.css
2 changes: 2 additions & 0 deletions docs/.clay_files/html-default0.js
6 changes: 6 additions & 0 deletions docs/.clay_files/html-default1.js
6 changes: 6 additions & 0 deletions docs/.clay_files/html-default2.js
7 changes: 7 additions & 0 deletions docs/.clay_files/html-default3.js
716 changes: 716 additions & 0 deletions docs/column_api.html
740 changes: 740 additions & 0 deletions docs/column_api.qmd
7 changes: 7 additions & 0 deletions docs/column_api_files/bootstrap0.css
7 changes: 7 additions & 0 deletions docs/column_api_files/bootstrap2.css
2 changes: 2 additions & 0 deletions docs/column_api_files/html-default1.js
6 changes: 6 additions & 0 deletions docs/column_api_files/html-default2.js
7 changes: 7 additions & 0 deletions docs/column_api_files/html-default3.js
2,018 changes: 2,018 additions & 0 deletions docs/column_api_files/libs/bootstrap/bootstrap-icons.css
10 changes: 10 additions & 0 deletions docs/column_api_files/libs/bootstrap/bootstrap.min.css
7 changes: 7 additions & 0 deletions docs/column_api_files/libs/bootstrap/bootstrap.min.js
7 changes: 7 additions & 0 deletions docs/column_api_files/libs/clipboard/clipboard.min.js
9 changes: 9 additions & 0 deletions docs/column_api_files/libs/quarto-html/anchor.min.js
6 changes: 6 additions & 0 deletions docs/column_api_files/libs/quarto-html/popper.min.js
189 changes: 189 additions & 0 deletions docs/column_api_files/libs/quarto-html/quarto-syntax-highlighting.css