Build API for array processing built on dtype-next #48

ezmiller · 2021-08-14T22:45:04Z

Goal

Currently, tablecloth provides an easy-to-use wrapper over tech.ml.dataset’s high-performance dataset processing constructs. One part of the tech.ml stack that tablecloth does not yet cover directly is dtype-next, which provides a high-performance foundation for array-like numerical processing, similar to NumPy. The project I am proposing would wrap dtype-next within tablecloth, giving the emerging Clojure data processing ecosystem a new, easy-to-use API for numerical structures.

Rough Outline of Steps

During this project, I will focus on the following tasks:

Add a new function to tablecloth (perhaps named column or array) that will return a typed, countable, random-access data structure backed by dtype-next’s abstractions;
Design two API pathways to interact with this structure: one that realizes the data fully at each step, providing more straightforward but less efficient interaction; and another, more performant but slightly harder to use, that allows users to wrap processing steps in a "transaction";
Mimic the NumPy (and possibly R vector) APIs, ensuring an equivalently complete functional interface for numerical processing;
Support a reading-friendly format for printing columns in the Clojure REPL (see "reading-friendly format for printing columns", techascent/tech.ml.dataset#203);
Validate the usefulness of the API by implementing real-world examples with various characteristics (missing values, various data types, challenging sizes, etc.) and comparing the ergonomics with other platforms such as Python and R.
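
To make the two API pathways from the outline more concrete, here is a minimal conceptual sketch. It is written in Python purely for illustration and is not the tablecloth or dtype-next API. The names `EagerColumn`, `Transaction`, `map`, and `realize` are all hypothetical. The eager path builds a new typed array after every step, while the "transaction" path only records the steps and applies them in a single pass when asked to realize.

```python
from array import array


class EagerColumn:
    """Hypothetical typed, countable, random-access column.

    Every operation immediately materializes a new typed array.
    """

    def __init__(self, values, typecode="d"):
        self.data = array(typecode, values)

    def map(self, fn):
        # Eager pathway: a full intermediate array is built at each step.
        return EagerColumn((fn(x) for x in self.data), self.data.typecode)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]


class Transaction:
    """Hypothetical deferred pathway: record steps, run them in one pass."""

    def __init__(self, column):
        self.column = column
        self.steps = []

    def map(self, fn):
        # Nothing is computed here; the step is only recorded.
        self.steps.append(fn)
        return self

    def realize(self):
        def run(x):
            for fn in self.steps:
                x = fn(x)
            return x

        # One pass over the source data, no intermediate arrays.
        return EagerColumn(
            (run(x) for x in self.column.data), self.column.data.typecode
        )


col = EagerColumn([1.0, 2.0, 3.0])

# Eager: two intermediate columns are created.
eager = col.map(lambda x: x + 1).map(lambda x: x * 2)

# Deferred: both steps are fused and applied once.
lazy = Transaction(col).map(lambda x: x + 1).map(lambda x: x * 2).realize()
```

Both paths produce the same values. The trade-off is that the eager path is easier to inspect at each step, whereas the deferred path avoids intermediate allocations at the cost of an explicit realization step.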

Open Questions

What will the name of this entity be? Some options could be: array, column, buffer, column-vector.
Does it make sense for this API to live within tablecloth or might we want to break it out into its own library?
Are there ways that this work needs to align with the work that @ribelo and @genmeblog are doing to define a syntax for operations on dataset columns (e.g. "Expose dtype-next column functions in tablecloth.api ns", #47)?

ezmiller added the enhancement label on Aug 14, 2021
