Relative docs paths #602

Merged · 7 commits · Dec 13, 2023
Makefile · 14 changes: 3 additions & 11 deletions
@@ -9,26 +9,18 @@ install:
 	pip install -e ".[dev]"
 	pre-commit install
 
-doctest:
-	python -m doctest -v sklego/*.py
-
-test-notebooks:
-	pytest --nbval-lax doc/*.ipynb
-
-test: doctest
+test:
 	pytest --disable-warnings --cov=sklego
 	rm -rf .coverage*
+	pytest --nbval-lax doc/*.ipynb
 
 precommit:
 	pre-commit run
 
 docs:
-	pip install -e ".[docs]"
 	mkdocs serve
 
-docs-deploy: docs
-	netlify deploy --dir=docs --prod
+docs-deploy:
+	mkdocs gh-deploy
 
 clean:
 	rm -rf .pytest_cache
docs/contribution.md · 2 changes: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
# Contribution

<p align="center">
<img src="/_static/contribution/contribute.png" />
<img src="../_static/contribution/contribute.png" />
</p>

This project started because we saw people rewrite the same transformers and estimators at clients over and over again.
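This one-character change is the heart of the PR: absolute `/_static/...` URLs resolve against the domain root and break once the site is served from a sub-path, as `mkdocs gh-deploy` does for project pages. A quick sketch of the resolution logic (the page URL below is a hypothetical example, not the project's actual deploy target):

```py
from urllib.parse import urljoin

# Hypothetical rendered-page URL for docs/contribution.md on a
# GitHub Pages project site served under a /scikit-lego/ sub-path.
page = "https://example.github.io/scikit-lego/contribution/"

# Old absolute path: resolves against the domain root, skipping the sub-path.
print(urljoin(page, "/_static/contribution/contribute.png"))
# -> https://example.github.io/_static/contribution/contribute.png (404)

# New relative path: resolves against the page itself.
print(urljoin(page, "../_static/contribution/contribute.png"))
# -> https://example.github.io/scikit-lego/_static/contribution/contribute.png
```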
docs/rstudio.md · 4 changes: 2 additions & 2 deletions
@@ -110,7 +110,7 @@ ggplot(data=cv_df) +
```

<p align="center">
<img src="/_static/rstudio/Rplot1.png" />
<img src="../_static/rstudio/Rplot1.png" />
</p>

```r
@@ -123,7 +123,7 @@ ggplot(data=cv_df) +
```

<p align="center">
<img src="/_static/rstudio/Rplot2.png" />
<img src="../_static/rstudio/Rplot2.png" />
</p>

## Important
docs/user-guide/cross-validation.md · 12 changes: 6 additions & 6 deletions
@@ -27,37 +27,37 @@ Let's make some random data to start with, and next define a plotting function.
--8<-- "docs/_scripts/cross-validation.py:example-1"
```

-![example-1](/_static/cross-validation/example-1.png)
+![example-1](../_static/cross-validation/example-1.png)

```py title="Example 2"
--8<-- "docs/_scripts/cross-validation.py:example-2"
```

-![example-2](/_static/cross-validation/example-2.png)
+![example-2](../_static/cross-validation/example-2.png)

`window="expanding"` is the closest to the scikit-learn implementation:

```py title="Example 3"
--8<-- "docs/_scripts/cross-validation.py:example-3"
```

-![example-3](/_static/cross-validation/example-3.png)
+![example-3](../_static/cross-validation/example-3.png)

If `train_duration` is not passed, the training duration is the maximum possible without overlapping validation folds:

```py title="Example 4"
--8<-- "docs/_scripts/cross-validation.py:example-4"
```

-![example-4](/_static/cross-validation/example-4.png)
+![example-4](../_static/cross-validation/example-4.png)

If the train and valid durations would lead to an unwanted number of splits, `n_splits` can be used to set a maximum number of splits:

```py title="Example 5"
--8<-- "docs/_scripts/cross-validation.py:example-5"
```

-![example-5](/_static/cross-validation/example-5.png)
+![example-5](../_static/cross-validation/example-5.png)
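Not part of the diff, but for orientation: the five examples above configure the same splitter. A minimal sketch of how those options combine, assuming the `TimeGapSplit` parameter names of this era of the library (`date_serie`, `valid_duration`, `gap_duration`) rather than a verified API:

```py
import datetime as dt
import numpy as np
import pandas as pd
from sklego.model_selection import TimeGapSplit

df = pd.DataFrame({"x": np.arange(30), "y": np.arange(30) % 2})
dates = pd.Series(pd.date_range("2021-01-01", periods=30, freq="D"))

# window="expanding" with a capped n_splits, as in Examples 3 and 5.
cv = TimeGapSplit(
    date_serie=dates,                     # assumed parameter name
    valid_duration=dt.timedelta(days=5),
    gap_duration=dt.timedelta(days=1),
    n_splits=3,
    window="expanding",
)

for train_idx, valid_idx in cv.split(df[["x"]], df["y"]):
    print(len(train_idx), len(valid_idx))
```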

```py title="Summary"
--8<-- "docs/_scripts/cross-validation.py:summary"
@@ -109,7 +109,7 @@ Train = [2004, 2004, 2004, 2004, 2004]
Test = [2005, 2005, 2006, 2006, 2007]
```

-![grp-ts-split](/_static/cross-validation/group-time-series-split.png)
+![grp-ts-split](../_static/cross-validation/group-time-series-split.png)

As you can see above, `GroupTimeSeriesSplit` keeps the time order chronological and makes sure that the same time value won't appear in both the train and test set of the same fold.
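A short usage sketch of the split described above (assuming the standard `split(X, y, groups=...)` contract and an `n_splits` constructor argument):

```py
import numpy as np
from sklego.model_selection import GroupTimeSeriesSplit

X = np.random.randn(20, 2)
y = np.random.randn(20)
# Yearly group labels, already in chronological order.
years = np.repeat([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007],
                  [3, 2, 4, 3, 2, 3, 2, 1])

cv = GroupTimeSeriesSplit(n_splits=3)
for train_idx, test_idx in cv.split(X, y, groups=years):
    print(sorted(set(years[train_idx])), "->", sorted(set(years[test_idx])))
```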

docs/user-guide/datasets.md · 16 changes: 8 additions & 8 deletions
@@ -38,7 +38,7 @@ Loads the abalone dataset where the goal is to predict the gender of the creatur
--8<-- "docs/_scripts/datasets.py:plot-abalone"
```

-![abalone](/_static/datasets/abalone.png)
+![abalone](../_static/datasets/abalone.png)

## Arrests

Expand All @@ -58,7 +58,7 @@ The goal is to predict whether or not the arrestee was released with a summons w
--8<-- "docs/_scripts/datasets.py:plot-arrests"
```

-![arrests](/_static/datasets/arrests.png)
+![arrests](../_static/datasets/arrests.png)

## Chickens

@@ -78,7 +78,7 @@ There were four groups of chicks on different protein diets.
--8<-- "docs/_scripts/datasets.py:plot-chicken"
```

-![chickens](/_static/datasets/chicken.png)
+![chickens](../_static/datasets/chicken.png)

## Hearts

@@ -99,7 +99,7 @@ This implementation loads the Cleveland dataset of the research which is the onl
--8<-- "docs/_scripts/datasets.py:plot-hearts"
```

-![hearts](/_static/datasets/hearts.png)
+![hearts](../_static/datasets/hearts.png)

## Heroes

@@ -119,7 +119,7 @@ Note that the pandas dataset returns more information.
--8<-- "docs/_scripts/datasets.py:plot-heroes"
```

-![heroes](/_static/datasets/heroes.png)
+![heroes](../_static/datasets/heroes.png)

## Penguins

@@ -140,7 +140,7 @@ The goal of the dataset is to predict which species of penguin a penguin belongs
--8<-- "docs/_scripts/datasets.py:plot-penguins"
```

-![penguins](/_static/datasets/penguins.png)
+![penguins](../_static/datasets/penguins.png)

## Creditcard frauds

@@ -179,7 +179,7 @@ The dataset is highly unbalanced, the positive class (frauds) account for 0.172%
--8<-- "docs/_scripts/datasets.py:plot-creditcards"
```

-![creditcards](/_static/datasets/creditcards.png)
+![creditcards](../_static/datasets/creditcards.png)

## Simpleseries

@@ -195,7 +195,7 @@ Generate a *very simple* timeseries dataset to play with. The generator assumes
--8<-- "docs/_scripts/datasets.py:plot-ts"
```

-![timeseries](/_static/datasets/timeseries.png)
+![timeseries](../_static/datasets/timeseries.png)
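All of the loaders touched above follow the same scikit-learn-style pattern; a minimal sketch (the keyword arguments are assumptions based on that pattern, not verified against every loader):

```py
from sklego.datasets import load_penguins, make_simpleseries

df = load_penguins(as_frame=True)        # full pandas DataFrame
X, y = load_penguins(return_X_y=True)    # plain arrays for modelling
ts = make_simpleseries(n_samples=1500)   # generated toy timeseries
```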

[abalone-api]: /api/datasets#sklego.datasets.load_abalone
[arrests-api]: /api/datasets#sklego.datasets.load_arrests
docs/user-guide/fairness.md · 14 changes: 7 additions & 7 deletions
@@ -6,7 +6,7 @@ Scikit learn (pre version 1.2) came with the boston housing dataset. We can make
--8<-- "docs/_scripts/fairness.py:predict-boston-simple"
```

-![boston-simple](/_static/fairness/predict-boston-simple.png)
+![boston-simple](../_static/fairness/predict-boston-simple.png)

We could stop our research here if we think that our MSE is _good enough_ but this would be _dangerous_. To find out why, we should look at the variables that are being used in our model.

@@ -107,7 +107,7 @@ It does this by projecting all vectors away such that the remaining dataset is o
The [`InformationFilter`][filter-information-api] uses a variant of the [Gram–Schmidt process][gram–schmidt-process] to filter information out of the dataset. We can visualize this in two dimensions:

<p align="center">
<img src="/_static/fairness/projections.png" />
<img src="../_static/fairness/projections.png" />
</p>

To explain what occurs in higher dimensions we need to resort to maths. Take a training matrix $X$ that contains columns $x_1, ..., x_k$.
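A numpy sketch of a single projection step, to make the picture above concrete. This illustrates the idea only and is not the `InformationFilter` implementation:

```py
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
v = X[:, 0]                                # the sensitive column x_1

def project_away(col, v):
    # Subtract the component of `col` that points along `v`.
    return col - (col @ v) / (v @ v) * v

# Filter the remaining columns so they are orthogonal to v.
X_fair = np.column_stack([project_away(X[:, j], v) for j in (1, 2)])
print(np.round(X_fair.T @ v, 10))          # ~[0, 0]
```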
@@ -159,23 +159,23 @@ We can see that the coefficients of the three models are indeed different.
```py
--8<-- "docs/_scripts/fairness.py:original-situation"
```
-![original-situation](/_static/fairness/original-situation.png)
+![original-situation](../_static/fairness/original-situation.png)

#### 2. Drop two columns

??? example "Code to generate the plot"
```py
--8<-- "docs/_scripts/fairness.py:drop-two"
```
-![drop-two](/_static/fairness/drop-two.png)
+![drop-two](../_static/fairness/drop-two.png)

#### 3. Use the Information Filter

??? example "Code to generate the plot"
```py
--8<-- "docs/_scripts/fairness.py:use-info-filter"
```
-![use-info-filter](/_static/fairness/use-info-filter.png)
+![use-info-filter](../_static/fairness/use-info-filter.png)

There is a clear trade-off between fairness and model accuracy. Which model you'll use depends on the world you want to create by applying your model.

@@ -241,7 +241,7 @@ The results of the grid search are shown below. Note that the logistic regressio
```py
--8<-- "docs/_scripts/fairness.py:demographic-parity-grid-results"
```
-![demographic-parity-grid-results](/_static/fairness/demographic-parity-grid-results.png)
+![demographic-parity-grid-results](../_static/fairness/demographic-parity-grid-results.png)
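A hedged sketch of fitting the constrained classifier discussed above; the constructor arguments are assumptions about the `DemographicParityClassifier` API (it also relies on a convex-optimization backend being installed):

```py
from sklearn.datasets import make_classification
from sklego.linear_model import DemographicParityClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Assumed arguments: a covariance threshold plus the column indices
# of the sensitive attributes.
clf = DemographicParityClassifier(covariance_threshold=0.5, sensitive_cols=[0])
clf.fit(X, y)
print(clf.predict_proba(X)[:3])
```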

## Equal opportunity

@@ -267,7 +267,7 @@ where POS is the subset of the population where `y_true = positive_target`.
```py
--8<-- "docs/_scripts/fairness.py:equal-opportunity-grid-results"
```
-![equal-opportunity-grid-results](/_static/fairness/equal-opportunity-grid-results.png)
+![equal-opportunity-grid-results](../_static/fairness/equal-opportunity-grid-results.png)
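The equal-opportunity variant has an analogous shape; again a sketch under assumed argument names, with `positive_target` selecting the label whose subset defines POS:

```py
from sklearn.datasets import make_classification
from sklego.linear_model import EqualOpportunityClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

fair = EqualOpportunityClassifier(
    covariance_threshold=0.5,
    positive_target=1,       # label defining the POS subset
    sensitive_cols=[0],
)
fair.fit(X, y)
print(fair.predict_proba(X)[:3])
```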

[^1]: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification
[^2]: M. Hardt, E. Price and N. Srebro (2016), Equality of Opportunity in Supervised Learning
docs/user-guide/linear-models.md · 22 changes: 11 additions & 11 deletions
@@ -20,24 +20,24 @@ Lowess stands for LOcally WEighted Scatterplot Smoothing and has historically be
--8<-- "docs/_scripts/linear-models.py:plot-lowess"
```

-![lowess](/_static/linear-models/lowess.png)
+![lowess](../_static/linear-models/lowess.png)

The line does not look linear, but that's because internally, during prediction, many weighted linear regressions are happening. The gif below demonstrates how the data is weighted when we make a prediction.

-![lowess-rolling](/_static/linear-models/lowess-rolling.gif)
+![lowess-rolling](../_static/linear-models/lowess-rolling.gif)

### Details on `sigma`

We'll also show two different prediction outcomes depending on the hyperparameter `sigma`:

-![lowess-rolling-01](/_static/linear-models/lowess-rolling-01.gif)
+![lowess-rolling-01](../_static/linear-models/lowess-rolling-01.gif)

-![lowess-rolling-001](/_static/linear-models/lowess-rolling-001.gif)
+![lowess-rolling-001](../_static/linear-models/lowess-rolling-001.gif)

You may be tempted to think that a lower `sigma` always gives a better fit, but you need to be careful here.
The data might have gaps, and larger `sigma` values will be able to properly regularize across them.

-![lowess-two-predictions](/_static/linear-models/lowess-two-predictions.gif)
+![lowess-two-predictions](../_static/linear-models/lowess-two-predictions.gif)

Note that this regression also works in higher dimensions but the main downside of this approach is that it is _really slow_ when making predictions.
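For orientation, a minimal fit/predict sketch of the estimator described above, assuming `sigma` and `span` are plain constructor arguments:

```py
import numpy as np
from sklego.linear_model import LowessRegression

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(scale=0.2, size=100)

model = LowessRegression(sigma=0.5, span=0.8).fit(X, y)
preds = model.predict(X)   # slow: one weighted regression per prediction point
```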

@@ -52,11 +52,11 @@

The effect of the `span` parameter on the weights can be seen below:

-![grid-span-sigma-02](/_static/linear-models/grid-span-sigma-01.png)
+![grid-span-sigma-02](../_static/linear-models/grid-span-sigma-01.png)

This will also affect the predictions.

-![grid-span-sigma-01](/_static/linear-models/grid-span-sigma-02.png)
+![grid-span-sigma-01](../_static/linear-models/grid-span-sigma-02.png)

You may need to squint your eyes a bit to see it, but lower spans cause more jiggles and less smooth curves.

@@ -119,7 +119,7 @@ Imagine that you have a dataset with some outliers.
--8<-- "docs/_scripts/linear-models.py:lad-data"
```

-![lad-01](/_static/linear-models/lad-data.png)
+![lad-01](../_static/linear-models/lad-data.png)

A simple linear regression will not do a good job since it is distracted by the outliers. That is because it optimizes the mean squared error

@@ -135,7 +135,7 @@ Hence, linear regression does the following:
--8<-- "docs/_scripts/linear-models.py:lr-fit"
```

-![lad-02](/_static/linear-models/lr-fit.png)
+![lad-02](../_static/linear-models/lr-fit.png)

By changing the loss function to the mean absolute deviation

@@ -151,7 +151,7 @@ Here is an example of [LADRegression][lad-api] in action:
--8<-- "docs/_scripts/linear-models.py:lad-fit"
```

-![lad-03](/_static/linear-models/lad-fit.png)
+![lad-03](../_static/linear-models/lad-fit.png)
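A side-by-side sketch of the effect described above, under the assumption that `LADRegression` follows the usual fit/predict contract:

```py
import numpy as np
from sklearn.linear_model import LinearRegression
from sklego.linear_model import LADRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 2 * X.ravel() + rng.normal(scale=0.1, size=100)
y[:5] += 10                                  # a handful of outliers

print(LinearRegression().fit(X, y).coef_)    # typically pulled toward the outliers
print(LADRegression().fit(X, y).coef_)       # stays much closer to 2
```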

### See also

@@ -172,7 +172,7 @@ then around 80% of the data is between these two lines.
--8<-- "docs/_scripts/linear-models.py:quantile-fit"
```

-![quantile](/_static/linear-models/quantile-fit.png)
+![quantile](../_static/linear-models/quantile-fit.png)
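A sketch of the 80%-band idea, assuming a `quantile` constructor argument on `QuantileRegression`:

```py
import numpy as np
from sklego.linear_model import QuantileRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = 3 * X.ravel() + rng.normal(scale=0.5, size=200)

low = QuantileRegression(quantile=0.1).fit(X, y)
high = QuantileRegression(quantile=0.9).fit(X, y)
inside = (y > low.predict(X)) & (y < high.predict(X))
print(inside.mean())   # roughly 0.8
```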

[lowess-api]: /api/linear-model#sklego.linear_model.LowessRegression
[prob-weight-api]: /api/linear-model#sklego.linear_model.ProbWeightRegression