Documentation workflow based on Kindly & Clay #119

Merged: 64 commits, Dec 24, 2023
d7da9d5
merging clay-and-note-to-test into the current tablecloth master
daslu Aug 19, 2023
b582db5
generated tests for the old version
daslu Aug 19, 2023
2f55119
adapted notebook and test to current version
daslu Aug 19, 2023
30e7064
quarto setup
daslu Aug 21, 2023
c4370b2
test deps
daslu Aug 21, 2023
ed67d14
regenerated tests
daslu Aug 21, 2023
7d3f8a3
README
daslu Aug 21, 2023
843b2ff
README
daslu Aug 21, 2023
5bfe8c3
README
daslu Aug 21, 2023
0945db2
README
daslu Aug 21, 2023
2e62a70
README
daslu Aug 21, 2023
1bfe39e
README
daslu Aug 21, 2023
5e53675
added remote-repo info to docs
daslu Aug 21, 2023
a1b6ec3
README
daslu Sep 5, 2023
639ca6e
Merge branch 'master' into clay-and-note-to-test-1
daslu Sep 22, 2023
0622945
updated tutorial namespace
daslu Sep 22, 2023
dad6c2f
adapting tutorial namespace to kindly and clay changes
daslu Sep 22, 2023
e9484cf
deps
daslu Sep 22, 2023
aba7545
some skipping of problematic tests
daslu Sep 22, 2023
8eb2216
updated generated tests (currently broken)
daslu Sep 22, 2023
f1184ee
fixed .dir-locals.el
daslu Sep 22, 2023
72b9d58
cleaning up old tutorial
daslu Sep 22, 2023
1b5f5d5
adapting to Clay version 1-alpha40
daslu Nov 25, 2023
5b12aff
Clay version
daslu Nov 25, 2023
3f6222c
rerendered tutorial
daslu Nov 25, 2023
b71025d
Merge remote-tracking branch 'origin/master' into clay-and-note-to-te…
daslu Dec 8, 2023
d26d7c1
docs: using comments rather than `kind/md` in some places, to avoid n…
daslu Dec 8, 2023
68f1a25
clay version
daslu Dec 8, 2023
e329379
README
daslu Dec 8, 2023
3605b72
updated docs
daslu Dec 8, 2023
3a87d85
typo
daslu Dec 8, 2023
553db1a
clay version
daslu Dec 12, 2023
7e44d8a
docs - cleanup
daslu Dec 13, 2023
b736d74
adapting to Clay 2-alpha52
daslu Dec 13, 2023
187e7b2
docs: removed problematic "----" decorations that confused quarto
daslu Dec 13, 2023
a5c55b2
rerendered tutorial
daslu Dec 13, 2023
5ae6258
cleanup
daslu Dec 13, 2023
9b9c81b
cleaning up auto generated tests for now
daslu Dec 13, 2023
7c56d8e
moved `conversion.clj` from `src` to `dev`
daslu Dec 13, 2023
fd691e0
cleaning up an old attempt to generate tests
daslu Dec 13, 2023
1a019ac
fixed path in README
daslu Dec 13, 2023
572745d
brought README.md updates back to README.Rmd
daslu Dec 14, 2023
1fd8773
renamed clj tutorial to draft.clj
daslu Dec 14, 2023
191d223
brought back the usual Rmarkdown-based `docs/index.html` (till the ne…
daslu Dec 14, 2023
ed02cac
rendered draft tutorial
daslu Dec 14, 2023
8ca0493
Merge remote-tracking branch 'origin/master' into clay-and-note-to-te…
daslu Dec 14, 2023
d58ae17
fixed paths on README
daslu Dec 16, 2023
940e73f
fixed more references of index to draft in README
daslu Dec 17, 2023
4f379f5
gitignore
daslu Dec 24, 2023
6eeb55a
fixed dev conversion namespace
daslu Dec 24, 2023
9b386e9
moved old docs under `docs/old/`
daslu Dec 24, 2023
b48f3ea
added a script for README generation to replace the Rmarkdown-based w…
daslu Dec 24, 2023
af74f3e
renamed main Kindly doc from `draft` to `index`
daslu Dec 24, 2023
1ad8f03
rendered Kindly doc `index` (replacing `draft`)
daslu Dec 24, 2023
b3e9fcf
gitignore
daslu Dec 24, 2023
4ca5cbf
clay config, notebook title
daslu Dec 24, 2023
0822628
README generation script -- actually overwrite README.md
daslu Dec 24, 2023
7538e5d
added README-source.md -- source for evaluated README.md
daslu Dec 24, 2023
ad3f4ff
fixed readme generation main function
daslu Dec 24, 2023
d60b334
generated README.md using README-source.md for the first time
daslu Dec 24, 2023
2f7b197
moved the old README.Rmd source under docs/old/
daslu Dec 24, 2023
ba40402
updated README
daslu Dec 24, 2023
903cb5a
README minor fixes
daslu Dec 24, 2023
3d0700c
README minor updates
daslu Dec 24, 2023
5 changes: 5 additions & 0 deletions .dir-locals.el
@@ -0,0 +1,5 @@
((nil
.
((cider-clojure-cli-aliases
.
"dev"))))
daslu marked this conversation as resolved.
4 changes: 3 additions & 1 deletion .gitignore
@@ -19,4 +19,6 @@ pom.xml.asc
.R*
*.txt*
.lsp/
.clj-kondo/
.clj-kondo/
.clay.html
*.qmd
132 changes: 132 additions & 0 deletions README-source.md
@@ -0,0 +1,132 @@
# Tablecloth

Dataset (data frame) manipulation API for the tech.ml.dataset library


[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)
[![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth)
[![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)

## Versions

### tech.ml.dataset 7.x (master branch)

[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)

### tech.ml.dataset 4.x (4.0 branch)

`[scicloj/tablecloth "4.04"]`

## Introduction

[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as part of the bigger `tech.ml` stack.

I started testing the library and helping to fix the bugs this uncovered. My main goal was to compare its functionality with established standards on other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).

While converting the examples, I worked out how to reorganize the existing `tech.ml.dataset` functions into a simple-to-use API. The main goals were:

* Focus on dataset manipulation functionality, leaving aside other parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.
* Single entry point for common operations: one function dispatching on the given arguments.
* `group-by` results in a special kind of dataset: a dataset containing the subsets created by grouping as a column.
* Most operations recognize regular and grouped datasets and process data accordingly.
* One function form to enable thread-first on a dataset.

Important! This library is not a replacement for `tech.ml.dataset`, nor a separate library. It should be considered an addition on top of `tech.ml.dataset`.

If you want to know more about `tech.ml.dataset` and `dtype-next`, please refer to their documentation:

* [tech.ml.dataset walkthrough](https://techascent.github.io/tech.ml.dataset/walkthrough.html)
* [dtype-next overview](https://cnuernber.github.io/dtype-next/overview.html)
* [dtype-next cheatsheet](https://cnuernber.github.io/dtype-next/cheatsheet.html)

Join the discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)

## Documentation

Please refer to the [detailed documentation with examples](https://scicloj.github.io/tablecloth).

The old documentation (till the end of 2023) is [here](https://scicloj.github.io/tablecloth/old).

## Usage example

```{clojure results="hide"}
(require '[tablecloth.api :as tc])
```

```{clojure results="asis"}
(-> "https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv"
    (tc/dataset {:key-fn keyword})
    (tc/group-by (fn [row]
                   {:symbol (:symbol row)
                    :year (tech.v3.datatype.datetime/long-temporal-field :years (:date row))}))
    (tc/aggregate #(tech.v3.datatype.functional/mean (% :price)))
    (tc/order-by [:symbol :year])
    (tc/head 10))
```

## Contributing

`Tablecloth` is open to contributions. The best way to start is a discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api).

### Development tools for documentation

Documentation is written in the [Kindly](https://scicloj.github.io/kindly/) convention and is rendered using [Clay](https://scicloj.github.io/clay/) composed with [Quarto](https://quarto.org/).

The old documentation was written in RMarkdown and is kept under [docs/old/](./docs/old/).

Documentation contains around 600 code snippets which are run during build. There are two source files:

* [README-source.md](./README-source.md) for README.md
* [notebooks/index.clj](./notebooks/index.clj) for the detailed documentation

(`notebooks/index.clj` was generated by [dev/conversion.clj](dev/conversion.clj) from the earlier Rmarkdown-based `index.Rmd` with some additional manual editing. Starting in 2024, it will diverge from that source, which will no longer be maintained.)

### README generation

To generate `README.md`, run the `generate!` function in the [dev/readme_teneration.clj](./dev/readme_teneration.clj) script.

### Detailed documentation generation

To generate the detailed documentation, evaluate the following. You will need the Quarto CLI [installed](https://quarto.org/docs/get-started/) on your system.

```{clojure eval=FALSE}
(require '[scicloj.clay.v2.api :as clay])
(clay/make! {:format [:quarto :html]
             :source-path "notebooks/index.clj"})
```


### API file generation

The `tablecloth.api` namespace is generated from `api-template`; please run the following before building the documentation:

```{clojure eval=FALSE}
(exporter/write-api! 'tablecloth.api.api-template
                     'tablecloth.api
                     "src/tablecloth/api.clj"
                     '[group-by drop concat rand-nth first last shuffle])
```

### Guideline

1. Before committing changes, please run the tests. I usually do: `lein do clean, check, test` and build the documentation as described above (which also tests the whole library).
2. Keep the API as simple as possible:
    - the first argument should be a dataset
    - if the parametrization is complex, the last argument should accept a map of optional function arguments
    - avoid variadic associative destructuring for function arguments
    - usually a function should work on grouped datasets as well; accept a `parallel?` argument then (if applicable).
3. Follow the `potemkin` pattern and import functions into the API namespace using the `tech.v3.datatype.export-symbols/export-symbols` function.
4. Functions which are composed out of API functions to cover specific case(s) should go to the `tablecloth.utils` namespace.
5. Always update `README.Rmd`, `CHANGELOG.md` and `docs/index.Rmd`; tests and function docs are highly welcome.
6. Always discuss changes and PRs first.

## TODO

* tests
* tutorials

## Licence

Copyright (c) 2020 Scicloj

The MIT Licence
184 changes: 85 additions & 99 deletions README.md
@@ -1,3 +1,8 @@
# Tablecloth

Dataset (data frame) manipulation API for the tech.ml.dataset library


[![](https://img.shields.io/clojars/v/scicloj/tablecloth)](https://clojars.org/scicloj/tablecloth)
[![](https://api.travis-ci.org/scicloj/tablecloth.svg?branch=master)](https://travis-ci.org/github/scicloj/tablecloth)
[![](https://img.shields.io/badge/zulip-discussion-yellowgreen)](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
@@ -14,61 +19,42 @@

## Introduction

[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a
great and fast library which brings columnar datasets to Clojure.
Chris Nuernberger has been working on this library for the last year as
part of the bigger `tech.ml` stack.

I started testing the library and helping to fix the bugs this
uncovered. My main goal was to compare its functionality with
established standards on other platforms. I focused on R solutions:
[dplyr](https://dplyr.tidyverse.org/),
[tidyr](https://tidyr.tidyverse.org/) and
[data.table](https://rdatatable.gitlab.io/data.table/).

While converting the examples, I worked out how to reorganize the
existing `tech.ml.dataset` functions into a simple-to-use API. The main
goals were:

- Focus on dataset manipulation functionality, leaving aside other
parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.
- Single entry point for common operations: one function dispatching
on the given arguments.
- `group-by` results in a special kind of dataset: a dataset
containing the subsets created by grouping as a column.
- Most operations recognize regular and grouped datasets and
process data accordingly.
- One function form to enable thread-first on a dataset.

Important! This library is not a replacement for `tech.ml.dataset`, nor
a separate library. It should be considered an addition on top of
`tech.ml.dataset`.

If you want to know more about `tech.ml.dataset` and `dtype-next`, please
refer to their documentation:

- [tech.ml.dataset
walkthrough](https://techascent.github.io/tech.ml.dataset/walkthrough.html)
- [dtype-next
overview](https://cnuernber.github.io/dtype-next/overview.html)
- [dtype-next
cheatsheet](https://cnuernber.github.io/dtype-next/cheatsheet.html)

Join the discussion on
[Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)
[tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) is a great and fast library which brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as part of the bigger `tech.ml` stack.

I started testing the library and helping to fix the bugs this uncovered. My main goal was to compare its functionality with established standards on other platforms. I focused on R solutions: [dplyr](https://dplyr.tidyverse.org/), [tidyr](https://tidyr.tidyverse.org/) and [data.table](https://rdatatable.gitlab.io/data.table/).

While converting the examples, I worked out how to reorganize the existing `tech.ml.dataset` functions into a simple-to-use API. The main goals were:

* Focus on dataset manipulation functionality, leaving aside other parts of `tech.ml` like pipelines, datatypes, readers, ML, etc.
* Single entry point for common operations: one function dispatching on the given arguments.
* `group-by` results in a special kind of dataset: a dataset containing the subsets created by grouping as a column.
* Most operations recognize regular and grouped datasets and process data accordingly.
* One function form to enable thread-first on a dataset.

Important! This library is not a replacement for `tech.ml.dataset`, nor a separate library. It should be considered an addition on top of `tech.ml.dataset`.

If you want to know more about `tech.ml.dataset` and `dtype-next`, please refer to their documentation:

* [tech.ml.dataset walkthrough](https://techascent.github.io/tech.ml.dataset/walkthrough.html)
* [dtype-next overview](https://cnuernber.github.io/dtype-next/overview.html)
* [dtype-next cheatsheet](https://cnuernber.github.io/dtype-next/cheatsheet.html)

Join the discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api)

## Documentation

Please refer to the [detailed documentation with
examples](https://scicloj.github.io/tablecloth/index.html)
Please refer to the [detailed documentation with examples](https://scicloj.github.io/tablecloth).

The old documentation (till the end of 2023) is [here](https://scicloj.github.io/tablecloth/old).

## Usage example

``` clojure
```{clojure}
(require '[tablecloth.api :as tc])
```

``` clojure

```{clojure}
(-> "https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv"
    (tc/dataset {:key-fn keyword})
    (tc/group-by (fn [row]
@@ -78,88 +64,88 @@ examples](https://scicloj.github.io/tablecloth/index.html)
    (tc/order-by [:symbol :year])
    (tc/head 10))
```

\_unnamed \[10 3\]:
_unnamed [10 3]:

| :symbol | :year | summary |
|---------|------:|-------------:|
| AAPL | 2000 | 21.74833333 |
| AAPL | 2001 | 10.17583333 |
| AAPL | 2002 | 9.40833333 |
| AAPL | 2003 | 9.34750000 |
| AAPL | 2004 | 18.72333333 |
| AAPL | 2005 | 48.17166667 |
| AAPL | 2006 | 72.04333333 |
| AAPL | 2007 | 133.35333333 |
| AAPL | 2008 | 138.48083333 |
| AAPL | 2009 | 150.39333333 |
| AAPL | 2000 | 21.74833333 |
| AAPL | 2001 | 10.17583333 |
| AAPL | 2002 | 9.40833333 |
| AAPL | 2003 | 9.34750000 |
| AAPL | 2004 | 18.72333333 |
| AAPL | 2005 | 48.17166667 |
| AAPL | 2006 | 72.04333333 |
| AAPL | 2007 | 133.35333333 |
| AAPL | 2008 | 138.48083333 |
| AAPL | 2009 | 150.39333333 |



## Contributing

`Tablecloth` is open to contributions. The best way to start is a
discussion on
[Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api).
`Tablecloth` is open to contributions. The best way to start is a discussion on [Zulip](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/api).

### Development tools for documentation

Documentation is written in RMarkdown, which means that you need R to
create html/md/pdf files. Documentation contains around 600 code
snippets which are run during build. There are two files:
Documentation is written in the [Kindly](https://scicloj.github.io/kindly/) convention and is rendered using [Clay](https://scicloj.github.io/clay/) composed with [Quarto](https://quarto.org/).

The old documentation was written in RMarkdown and is kept under [docs/old/](./docs/old/).

Documentation contains around 600 code snippets which are run during build. There are two source files:

* [README-source.md](./README-source.md) for README.md
* [notebooks/index.clj](./notebooks/index.clj) for the detailed documentation

(`notebooks/index.clj` was generated by [dev/conversion.clj](dev/conversion.clj) from the earlier Rmarkdown-based `index.Rmd` with some additional manual editing. Starting in 2024, it will diverge from that source, which will no longer be maintained.)

- `README.Rmd`
- `docs/index.Rmd`
### README generation

To generate `README.md`, run the `generate!` function in the [dev/readme_teneration.clj](./dev/readme_teneration.clj) script.

### Detailed documentation generation

To generate the detailed documentation, evaluate the following. You will need the Quarto CLI [installed](https://quarto.org/docs/get-started/) on your system.

```{clojure}
(require '[scicloj.clay.v2.api :as clay])
(clay/make! {:format [:quarto :html]
             :source-path "notebooks/index.clj"})
```

Prepare the following software:

1. Install [R](https://www.r-project.org/)
2. Install [rep](https://github.com/eraserhd/rep), nRepl client
3. Install `pandoc`
4. Run nRepl
5. Run R and install R packages:
`install.packages(c("rmarkdown","knitr"), dependencies=T)`
6. Load rmarkdown: `library(rmarkdown)`
7. Render readme: `render("README.Rmd","md_document")`
8. Render documentation: `render("docs/index.Rmd","all")`

### API file generation

The `tablecloth.api` namespace is generated from `api-template`; please
run the following before building the documentation:
The `tablecloth.api` namespace is generated from `api-template`; please run the following before building the documentation:

``` clojure
```{clojure}
(exporter/write-api! 'tablecloth.api.api-template
                     'tablecloth.api
                     "src/tablecloth/api.clj"
                     '[group-by drop concat rand-nth first last shuffle])
```


### Guideline

1. Before committing changes, please run the tests. I usually do:
   `lein do clean, check, test` and build the documentation as described
   above (which also tests the whole library).
2. Keep the API as simple as possible:
    - the first argument should be a dataset
    - if the parametrization is complex, the last argument should accept a
      map of optional function arguments
    - avoid variadic associative destructuring for function arguments
    - usually a function should work on grouped datasets as well;
      accept a `parallel?` argument then (if applicable).
3. Follow the `potemkin` pattern and import functions into the API namespace
   using the `tech.v3.datatype.export-symbols/export-symbols` function.
4. Functions which are composed out of API functions to cover specific
   case(s) should go to the `tablecloth.utils` namespace.
5. Always update `README.Rmd`, `CHANGELOG.md` and `docs/index.Rmd`; tests
   and function docs are highly welcome.
6. Always discuss changes and PRs first.
1. Before committing changes, please run the tests. I usually do: `lein do clean, check, test` and build the documentation as described above (which also tests the whole library).
2. Keep the API as simple as possible:
    - the first argument should be a dataset
    - if the parametrization is complex, the last argument should accept a map of optional function arguments
    - avoid variadic associative destructuring for function arguments
    - usually a function should work on grouped datasets as well; accept a `parallel?` argument then (if applicable).
3. Follow the `potemkin` pattern and import functions into the API namespace using the `tech.v3.datatype.export-symbols/export-symbols` function.
4. Functions which are composed out of API functions to cover specific case(s) should go to the `tablecloth.utils` namespace.
5. Always update `README.Rmd`, `CHANGELOG.md` and `docs/index.Rmd`; tests and function docs are highly welcome.
6. Always discuss changes and PRs first.

## TODO

- tests
- tutorials
* tests
* tutorials

## Licence

Copyright (c) 2020 Scicloj

The MIT Licence
The MIT Licence
8 changes: 8 additions & 0 deletions clay.edn
@@ -0,0 +1,8 @@
{:remote-repo {:git-url "https://github.com/daslu/tablecloth"
:branch "main"}
:quarto {:format {:html {:toc false
:theme :spacelab}}
:highlight-style :solarized
:code-block-background true
:include-in-header {:text "<link rel = \"icon\" href = \"data:,\" />"}
:title "Tablecloth documentation"}}