Lift `tech.v3.datatype.functional` operations #90
Merged: ezmiller merged 74 commits into ethan/column-api-dev-branch-1 from ethan/lift-dtype-next-functional-ops on Feb 10, 2023
Almalibro referenced this pull request in clojurists-together/clojuriststogether.org on Feb 8, 2023
ezmiller added a commit that referenced this pull request on Apr 13, 2024
* Add namespace stub * Add super naive colunn fn * Add some simple column fns * Add typeof function for column * Save work on column exploration doc * Upgrade to latest clay version * Save scratch work in column.clj * Polishing up existing column fns * added some docstrings * re-organized a little * Move column ns into own domain tablecloth.column.api * Add tests for `tablecloth.column.api/column` * Add tests for `zeros` and `ones` * Use api template to write public api * Write tests against `tablecloth.column.api.column` ns * Add column exploration html * Add `typeof?` function to check datatype of column els * Use buffer when creating zeros & ones columns * Use `dtype` alias in ns * Add comment to code snippet generating column api * Fix comment syntax * Use `tech.v3.datatype/const-reader` for `zeros` and `ones` function * Update type interface to use type hierarchy in tablecloth.api.util (#76) * Add ->general-types function * Add a general type :logical * Use type hierarchy in tablecloth.api.utils for `typeof` functions * Add column dev branch to pr workflow * Add tests for typeof * Fix tests for typeof * Return the concrete type from `typeof` * Simplify `concrete-types` fn * Optimize ->general-types by using static lookup * Adjust fns listing types * We decided that the default meaning of type points to the "concrete" type, and not the general type. * So `types` now returns the set of concrete types and `general-types` returns the general types. * Revert "Adjust fns listing types" This reverts commit d93e34f. 
* Fix `typeof` test to test for concerete types * Reorganize `typeof?` tests * Reword docstring for `typeof?` slightly * Update column api template and add missing `typeof?` * Add commment to `general-types-lookup` * Improve `->general-types` docstring * Add `general-types` fn that returns sets of general types * Adjust util `types` fn to return concrete types * Lift `tech.v3.datatype.functional` operations (#90) * Add ->general-types function * Add a general type :logical * Use type hierarchy in tablecloth.api.utils for `typeof` functions * Add column dev branch to pr workflow * Add tests for typeof * Fix tests for typeof * Return the concrete type from `typeof` * Simplify `concrete-types` fn * Optimize ->general-types by using static lookup * Adjust fns listing types * We decided that the default meaning of type points to the "concrete" type, and not the general type. * So `types` now returns the set of concrete types and `general-types` returns the general types. * Revert "Adjust fns listing types" This reverts commit d93e34f. 
* Fix `typeof` test to test for concerete types * Reorganize `typeof?` tests * Reword docstring for `typeof?` slightly * Update column api template and add missing `typeof?` * Add commment to `general-types-lookup` * Improve `->general-types` docstring * Add `general-types` fn that returns sets of general types * Adjust util `types` fn to return concrete types * Save changes to column api.clj * Save ongoing experiments with lifting * Save ongoing work on lifting * Adjust lift-ops-1 to handle any number of args with rest arg * Working `rearrange-args` fn * Save work actually writing lifted fns * Saving first attempt to writer operators * Add `percentiiles test * Adjust `rearrange-args to take new-args in option map * Unify two lift functions * Add in docstrings when present * Move lift utils into utils ns * Rename lifting namespaces * Lift some more fns * Make exclusions for ns header helper an arg * Add new operators and tests * Add ops with lhs rhs arg pattern * Lift '* * Add require to operators ns for utils * Update test to make it more complete * Lift `equals * Make test more accurate * Reorganize tests * Fix grammar * Lift 'shift * Uncomment 'or test * Lift 'normalize op * Life 'magnitude * Lifting bit manipulation ops * lift ieee-remainder * Lifting more functions * Add excludes * Lift a bunch of new functions * Alphebetize some lists * More alphebitization * Clean up * Instead of using `col` as arg conform to using `x & and `y * Temporarily disable failing test fix in 7.000-beta23 * Disable the correct test * Just some minor cleanup in op tests * Some more cleanup/reorg in op tests * Update generated operators namespace with switch from col -> x etc * Lift 'descriptive-statistics * Fix messed up test layout * Lift 'quartiles * Lift 'fill-range and a bunch of reduce operations * Lift 'mean-fast 'sum-fast 'magnitude-squared * Lift correlation fns kendalls, pearsons, and spearmans * Lift cumulative ops * cleanup * Bring column exploration doc up-to-date (#95) * 
Upgrade to latest clay version * Show using tablecloth.column.api.operators ns * Cleanup whitespace * Add method for subsetting (#96) * Export tech.ml.dataset `select` fn for column api * Update docstring exported to api * Update column-exploration with basic illustration of select * Add `slice` * clean up tests a bit * Improve `slice` docstring slightly * Export `slice` to column api * Add stuff about `slice` to column exploration doc * Move accesssing & subsetting seciton above basic ops * Update column_expolration.html * Update comment block * Add iteration support by wrapping tech.v3.dataset.column/column-map (#97) * Export tech.ml.dataset `select` fn for column api * Update docstring exported to api * Update column-exploration with basic illustration of select * Add `slice` * clean up tests a bit * Improve `slice` docstring slightly * Export `slice` to column api * Add stuff about `slice` to column exploration doc * Move accesssing & subsetting seciton above basic ops * Update column_expolration.html * Update comment block * Add column-map wrapper over tech.v3.dataset.column/column-mapping * Accepts columns in the first position to support use with pipes * If `col` is a vector of columns, then map-fn is run on all * Fix arg name * Clean up * Add iteration to column exploration and reorganize * Add column-map to column api_template * Add example of using column-map with multiple columns * Update column_exploration html doc * Update column_exploration html doc * Add sorting support for column (#99) * Add rough version of `sort-column` with some tests * Add basic docstring * Add support for `:asc` and `:desc` to sort-column * Add note to handle missing values * Make slight improvement to sort-column docstringa * Improve support for missing values for column api (#101) * Export tech.ml.dataset `select` fn for column api * Update docstring exported to api * Update column-exploration with basic illustration of select * Add `slice` * clean up tests a bit * Improve 
`slice` docstring slightly * Export `slice` to column api * Add stuff about `slice` to column exploration doc * Move accesssing & subsetting seciton above basic ops * Update column_expolration.html * Update comment block * Add column-map wrapper over tech.v3.dataset.column/column-mapping * Accepts columns in the first position to support use with pipes * If `col` is a vector of columns, then map-fn is run on all * Fix arg name * Clean up * Add iteration to column exploration and reorganize * Add column-map to column api_template * Add example of using column-map with multiple columns * Update column_exploration html doc * Update column_exploration html doc * Export tech.v3.dataset.column's missing fns * Remove `set-missing` I think this may be more of an internal fn * Add `count-missing` function * Add test for `sort-column` for missing values * Activate test that wil now pass due to tmd upgrade * Add sort-column to api-template * Add sort-column section to column_exploration doc * Add more missing apidoc * move fns to their own namespace to mirror main tc api * add `drop-missing` and `replace-missing` * Add details about missing api to column exploration * Add a exmaple of using count to column exploration * Add a few simple tests for missing ns * Fix docstrings * Add proof of concept * Consolidate tablecloth.column.api/operators args (#106) * Conslidate ops args to x y z * Fix lift op for comparison ops * Update lift-op fn to handle multiple ar lookups Case that required this was the comparison ops. We want (> x y z) from (> lhs rhs) (> lhs mid rhs). We can't universally map y to rhs because it would be wront for the 3-arity option. 
* Lift column ops to the dataset level (#107) * Readme: Replace `lein test` with `lein midje` * Add proof of concept for lifting * Clean up * Fix magnitude arguments * Fix typo breaking lift operation for `magnitude * Save prototype working example that handles optional arguments * Clean up * Reorganize codegen utilities * moved hopefully common utilities up into 'tablecloth.utils.codegen * retooled those helpers in that ns to be a bit more accessible (WIP) * Clean up * Clean up * Rejigger codegen for column ops to take just fn-sym arglists * Try lifting all column ops to ds (no tests yet) * Exclude ops that do not potentially return column * Do not lift options that do not return columns * Add docstrings for some codegen Also regenerated operators to make sure tests pass. * Add docstring to ds col ops * version bump and small fix * Modify ds-level lift op to also return fn that returns column This is a breaking change for the column api lifting until I adapt the lift-op to the changes made in the codegen where the argument is supplied in data rather than within a fn. 
* example added for replace-missing * Add tests for ops that take inf number of cols * Add tests for ops returning ds taking max of three cols * Add tests for ops returning ds and taking two columns max * Test for ops returning ds and max of one column * Add more functions to test for ops taking one col * Clean up * Lifted ops taking one column and returning a scalar * Lift functions taking two columns and returning a scalar * Clean up * Clean up * bump to 7.000-beta-50 * fixes #108 * hashing in joins enabled for every case * 7.000-beta-51 * Clean up * Lift functions taking 1 col and returning scalar * Adjust column api lift ops to new declarative syntax * Adjust lift plan for tablecloth.column.api for tmd v7 * Remove mention of tech.ml.datatype * Add missing word * Bump tmd version to 7.006 for fix to fns that were erroring fns are: quartiles-1, quartiles-3 and median * Fixing more tests * Comment some code to keep around for a spell * Remove special lift op for 'round It's arugments were fixed. * Cleanup * 7.007 --------- Co-authored-by: Teodor Heggelund <git@teod.eu> Co-authored-by: genmeblog <38646601+genmeblog@users.noreply.github.com> Co-authored-by: GenerateMe <generateme.blog@gmail.com> Co-authored-by: adham-omran <git@adham-omran.com> * Ethan/lift scalar ops to ds as aggregators (#118) * Fix indentation * Save rough working example Not fully tested * Fix tests for new aggregator form of ops that return scalar * Add `column` API documentation (#120) * Add a sample notebook file * Save draft work on column api doc * Add doc entry for tcc/select boolean select This appears to be broken now, but ti shouldn't be. 
* Export column api operators in column api ns * Add in some documentation of operations * Hide namespace expression from generated doc * Fix circular dependency * Update generated docs * Update text in colum operations section * More updates to the docs * Remove "Functionality" header in TOC This way Dataset is an entry, and I can add Column after that. * Add Column API documentation * Add an indication of column op signature to docs * Export lifted column operators in dataset api template * Add documentation for column operations on datasets * Some minor changes * Rename the two headers for Dataset and Column, adding API onto the end. * A few small fixes. * Remove the `Functions` section This is essentially replaced by the Column API that lifts these functions into Tablecloth * Try to remove cyclical dependency * Revert "Try to remove cyclical dependency" This reverts commit fcb16c4. * Fix circular dependency * Actually fix cyclical dependency * Undo added line * Try deploying a documentation preview * Add preview-branch to docs preview action Default was gh-pages, we use master. * Try adding umbrella-dir setting * Try removing docs folder in umbrella-dir * Remove old pr docs preview workflow * Regenerated docs after merge from master * Add section about column missing values to docs * Regenerated docs after merge from master * Remove draft notebook * Remove temporary trigger for dev branch since it was target of prs --------- Co-authored-by: Teodor Heggelund <git@teod.eu> Co-authored-by: genmeblog <38646601+genmeblog@users.noreply.github.com> Co-authored-by: GenerateMe <generateme.blog@gmail.com> Co-authored-by: adham-omran <git@adham-omran.com>
Goal
This PR is "lifting" the functions in tech.v3.datatype.functional into the tablecloth.column.api. The rationale behind this is twofold. We want to prevent users from needing to import the functional namespace from dtype-next, and we want to normalize the behavior of these functions such that they return columns.
Later these functions will correspond to similar dataset-level functions like those that @ribelo was working on in #47.
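To make the "return columns" goal concrete, here is a minimal sketch of the wrapping behaviour: call the underlying operation, and if the result is a sequence rather than a scalar, pack it into a column. Everything in it is a stand-in and an assumption for illustration: `Column`, `column`, `raw-add`, `raw-sum`, and `lift-op` are hypothetical names written in plain Clojure, not the tablecloth or dtype-next implementations, and `sequential?` stands in for the real return-type check. It also wraps functions at runtime, whereas this PR generates source code ahead of compile time.

```clojure
;; Hypothetical stand-in for a column type; the real one comes from tech.ml.dataset.
(defrecord Column [data])

(defn column
  "Packs any sequence into the stand-in Column."
  [data]
  (->Column (vec data)))

(defn- unwrap
  "Returns the raw data of a Column, or the value itself if it is not a Column."
  [x]
  (if (instance? Column x) (:data x) x))

;; Stand-ins for operations in a functional namespace: they know nothing about columns.
(defn raw-add
  "Elementwise addition over vectors and/or scalars."
  [x y]
  (cond
    (and (sequential? x) (sequential? y)) (mapv + x y)
    (sequential? x) (mapv #(+ % y) x)
    (sequential? y) (mapv #(+ x %) y)
    :else (+ x y)))

(defn raw-sum
  "Reduces a sequence to a scalar."
  [x]
  (reduce + x))

(defn lift-op
  "Wraps f so that sequence results come back as a Column and scalars pass through."
  [f]
  (fn [& args]
    (let [result (apply f (map unwrap args))]
      (if (sequential? result)
        (column result)
        result))))

(def add (lift-op raw-add))
(def sum (lift-op raw-sum))

(add (column [1 2 3]) 10)   ; a Column wrapping [11 12 13]
(sum (column [1 2 3]))      ; a plain number, not a Column
```

The point of the design is that callers only ever see the lifted names, so they never need to require the underlying namespace or convert results back into columns themselves.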
So what this PR should do:
tech.v3.datatype.functional
Approach
There's some existing code in `tablecloth.pipeline` that "lifts" tablecloth functions so that they work with the context object for doing machine learning workflows using scicloj.ml. @behrica mentioned that this method had worked, but had some limits, especially regarding tooling. So he suggested using successive code generation to write code before compile time.

So that's what is happening here now. There are a couple of namespaces that can be used to write new "lifted" functions to a namespace `tablecloth.column.api.operators`. These functions simply call the original function from `tech.v3.datatype.functional`, check its return value, and if it is an `:iterable`, we pack it into a column and return.

There's also some code generation machinery to control what the user experiences as the API:
col