feat(section): add information about mlr3inferr #855

sebffischer · 2025-01-09T09:04:30Z

No description provided.

larskotthoff · 2025-02-03T16:12:53Z

book/chapters/chapter3/evaluation_and_benchmarking.qmd

@@ -285,6 +285,31 @@ print(plt2)
 ```


+### Confidence Intervals {#sec-resampling-ci}
+
+Confidence intervals (CIs) provide a range of values within which we can be confident that it covers the true generalization error.


I would delete the first sentence. The next sentence basically says the same, but this one isn't quite accurate.

larskotthoff · 2025-02-03T16:14:34Z

book/chapters/chapter3/evaluation_and_benchmarking.qmd

+
+Confidence intervals (CIs) provide a range of values within which we can be confident that it covers the true generalization error.
+Instead of relying solely on a single point estimate, CIs offer a measure of uncertainty around this estimate, allowing us to understand the reliability of our performance estimate.
+While constructing CIs for the generalization error is challenging due to the complex nature of the inference problem, some methods have been shown to work well in practice @kuempelfischer2024ciforge.


I would add some more context here -- some learners/models can provide these directly (and often those calculations aren't all that complex), but if the learner doesn't support it, we have to do something else. Then describe in a sentence or two what those methods do.

larskotthoff · 2025-02-03T16:15:18Z

book/chapters/chapter3/evaluation_and_benchmarking.qmd

+rr$aggregate(msr_ci)
+```
+
+We can also use `msr("ci")`, which will automatically select the appropriate method based on the `Resampling` object, if an inference method is available for it.


How do I know what resamplings have inference methods?

larskotthoff · 2025-02-03T16:15:52Z

book/chapters/chapter3/evaluation_and_benchmarking.qmd

@@ -576,6 +601,22 @@ plt = plt + ggplot2::scale_fill_manual(values = c("grey30", "grey50", "grey70"))
 print(plt)
 ```

+It is also possible to plot confidence intervals by setting the type of plot to `"ci"`.
+Ignoring the multiple testing problem, @fig-benchmark-ci shows that the difference between the random forest and both other learners is statistically significant for the sonar task, whereas no final conclusion can be drawn for the german credit problem.


Could we avoid having to ignore the multiple testing problem? I would show results for a single learner here.
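
If the plot keeps all three learners, one generic alternative to ignoring the problem is a Bonferroni adjustment of the confidence level. The snippet below is only a rough sketch in base R, not something the PR or mlr3inferr is known to do: the count of two comparisons per task is read off the sentence above (random forest versus each of the other two learners), and the widening factor assumes intervals built from normal quantiles.

```r
# Hypothetical setup: per task, the random forest is compared with two other
# learners, so two comparisons share one error budget.
n_comparisons = 2
alpha = 0.05

# Bonferroni: split the error budget evenly across the comparisons.
alpha_adj = alpha / n_comparisons

# Normal quantiles that set the interval half-width at each level
# (assumes a normal-approximation interval; other constructions differ).
z_unadj = qnorm(1 - alpha / 2)
z_adj = qnorm(1 - alpha_adj / 2)

cat("adjusted alpha:", alpha_adj, "\n")
cat("interval widening factor:", z_adj / z_unadj, "\n")
```

The adjusted intervals are wider, so a difference that looks significant in the unadjusted plot may no longer be. Showing a single learner avoids the question entirely.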

feat(section): add information about mlr3inferr

8dbfbed

sebffischer mentioned this pull request Jan 9, 2025

TODOs mlr-org/mlr3inferr#1

Open

17 tasks

sebffischer added 4 commits January 14, 2025 14:04

update renv

c6d42c9

...

8cb3751

remove renv

99a06f3

Merge branch 'main' into feat/cis

7aeb637

sebffischer requested review from larskotthoff and be-marc February 3, 2025 07:05

larskotthoff requested changes Feb 3, 2025
