Goal
Currently, tablecloth provides an easy-to-use wrapper over tech.ml.dataset’s high-performance dataset processing constructs. One part of the tech.ml stack that tablecloth does not yet cover directly is dtype-next, which provides a highly performant basis for array-like numerical processing, similar to NumPy. The project I am proposing aims to wrap dtype-next within tablecloth, providing a new easy-to-use API for numerical structures for the emerging Clojure data processing ecosystem.
Rough Outline of Steps
During this project, I will focus on the following tasks:
Add a new function to tablecloth (perhaps named column or array) that will return a typed, countable, random-access data structure backed by dtype-next’s abstractions;
Design two API pathways to interact with this structure: one that realizes the data fully at each step, providing more straightforward but less efficient interaction; and another, more performant but slightly harder to use, that allows users to wrap processing steps in a "transaction";
Mimic the NumPy (and possibly R vector) APIs, ensuring an equivalently complete functional interface for numerical processing;
Validate the usefulness of the API by implementing real-world examples with various characteristics (missing values, various data types, challenging sizes, etc.) and comparing the ergonomics with other platforms such as Python and R.
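The two API pathways described above can be illustrated with a small, language-neutral sketch. The Python below is only a hypothetical illustration of the design trade-off, not tablecloth's or dtype-next's API: the names `Column`, `Pipeline`, `lazy`, and `realize` are assumptions made up for this example. The eager methods realize a new column at every step, while the deferred pipeline records the steps and realizes the result once.

```python
from array import array


class Column:
    """Hypothetical typed, countable, random-access column."""

    def __init__(self, typecode, values):
        # array.array stores every element with a single machine type.
        self.data = array(typecode, values)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

    # Eager pathway: every step allocates a fully realized column.
    def add(self, k):
        return Column(self.data.typecode, (x + k for x in self.data))

    def mul(self, k):
        return Column(self.data.typecode, (x * k for x in self.data))

    def lazy(self):
        # Entry point to the deferred ("transaction"-style) pathway.
        return Pipeline(self)


class Pipeline:
    """Hypothetical deferred pathway: record steps, realize once."""

    def __init__(self, column):
        self.column = column
        self.steps = []

    def add(self, k):
        self.steps.append(lambda x: x + k)
        return self

    def mul(self, k):
        self.steps.append(lambda x: x * k)
        return self

    def realize(self):
        def run(x):
            # Apply all recorded steps to one element in order.
            for step in self.steps:
                x = step(x)
            return x

        source = self.column.data
        return Column(source.typecode, (run(x) for x in source))


col = Column("d", [1.0, 2.0, 3.0])
eager = col.add(1).mul(2)                       # one intermediate column per step
deferred = col.lazy().add(1).mul(2).realize()   # single pass, single allocation
```

Both pathways produce the same values; the difference is how many intermediate structures are realized along the way, which is the ergonomics-versus-performance trade-off the second task is meant to explore.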
Open Questions
What will the name of this entity be? Some options could be: array, column, buffer, column-vector.
Does it make sense for this API to live within tablecloth or might we want to break it out into its own library?
(See also: column functions in tablecloth.api ns #47.)