Release v1.0 #957

bbengfort · 2019-08-28T22:38:32Z

Version 1.0 Release.

* Added code review checklist. Broke down the contributing.rst file into multiple smaller files for easier editing. In the advanced development topics section, I added a section for common code, testing, and documentation conventions, listing some things that Nathan mentioned and that are part of the ongoing visualizer audit process. Fixes #345

Created a text-specific visualizer to project a vectorized corpus in two dimensions using UMAP (Uniform Manifold Approximation and Projection). This implementation is very similar to the TSNE implementation, but is fast, scalable, and can be applied directly to sparse matrices without a preprocessing step such as SVD.

Wraps up the enhancement to the FeatureImportances visualizer which added a stacked bar chart optional parameter in the case of multi-dimensional importances. Updated the documentation to reflect the stack=True situation as well as issued a warning if stack is False but should be True. Updated tests for better coverage. Closes #531

Implements a helper function that returns continuous or discrete depending on the type of target variable, `y`. This function is similar to the functionality in the Manifold visualizer but makes use of `sklearn.utils.multiclass.type_of_target` to make its determination, along with a limit on the number of discrete colors that can be drawn. Was undecided if this belonged in `yellowbrick.utils` or in `yellowbrick.target` -- am open to discussion on this topic. Fixes #73
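As a rough illustration of the idea described above, the sketch below labels a target as continuous or discrete and caps the number of discrete colors. It is a simplified, standard-library stand-in: it does not call scikit-learn's `type_of_target`, and the names `guess_color_type` and `MAX_DISCRETE_CLASSES`, as well as the limit of 12, are assumptions made for the example rather than the code in this PR.

```python
from numbers import Real

# Hypothetical cap on how many distinct values are drawn as discrete colors.
MAX_DISCRETE_CLASSES = 12


def guess_color_type(y, max_classes=MAX_DISCRETE_CLASSES):
    """Return "discrete" or "continuous" for a sequence of target values."""
    unique = set(y)
    if not unique:
        raise ValueError("y must contain at least one value")

    # Any non-integral number (0.5, nan, inf) marks the target as continuous.
    for value in unique:
        if isinstance(value, Real) and not float(value).is_integer():
            return "continuous"

    # Too many distinct values to give each one its own color.
    if len(unique) > max_classes:
        return "continuous"

    return "discrete"
```

For example, `guess_color_type([0, 1, 1, 0])` returns `"discrete"`, while `guess_color_type([0.1, 2.5, 3.7])` returns `"continuous"`.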

…sot (#680)
* fixes resolve colors bug in tsne visualizer identified by jerome massot
* adds test coverage for user-supplied color list in TSNE
* fixes analogous resolve colors bug in UMAP visualizer and symmetric test coverage

This PR implements a significant change in the way yellowbrick handles datasets, moving them from data that can be downloaded and loaded using example code to first-class members of the library that can be loaded into pandas data frames and series or into well-structured numpy arrays with correct data types. We have completely overhauled dataset management using the yellowbrick-datasets repository as our data management tool. Data is still stored on S3, but now includes .csv.gz and meta.json files for loading into pandas if it's installed, or .npz files for loading into valid numpy arrays. New `Dataset` and `Corpus` objects manage access to the data, downloading it if it's not already on disk and providing access to the contents in the source directory. We maintain our security checking with sha256 hashes and a new manifest.json method. Fixes #416
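The sha256 check mentioned above can be sketched with the standard library alone. The manifest layout used here (`{name: {"signature": hexdigest}}`) and the function names are hypothetical, chosen only to show the shape of the check; the real manifest.json format may differ.

```python
import hashlib


def sha256sum(path, blocksize=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, manifest, name):
    """Compare a downloaded file against the digest recorded for name.

    The manifest structure is an assumption made for this example.
    """
    expected = manifest[name]["signature"]
    actual = sha256sum(path)
    if actual != expected:
        raise ValueError(
            "{} failed its sha256 check: expected {}, got {}".format(
                name, expected, actual
            )
        )
    return True
```

Reading in fixed-size chunks keeps memory use flat even for large archives.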

The RFECV visualizer had a bug when the hyperparameter step > 1. The step was correctly passed to the internal RFE estimator, which removed that number of features per iteration; however, the feature subsets that were tried for cross-validation did not match the step, resulting in a figure that looked as if no step had been applied. This patch fixes the bug and creates a test to ensure this works correctly. To manage the feature selection subspace, a new learned attribute, `n_feature_subsets_`, was added. Fixes #664
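To make the mismatch concrete, here is a small sketch of the bookkeeping: the subset sizes visited when `step` features are dropped per round. The function name and the rule that the last round removes fewer features so that elimination stops at the minimum are assumptions for illustration; the patch's `n_feature_subsets_` may be computed differently.

```python
def feature_subset_sizes(n_features, step=1, min_features=1):
    """List the subset sizes visited when dropping `step` features per round."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if not 1 <= min_features <= n_features:
        raise ValueError("min_features must be between 1 and n_features")

    sizes = [n_features]
    remaining = n_features
    while remaining > min_features:
        # The final round may remove fewer than `step` features.
        remaining -= min(step, remaining - min_features)
        sizes.append(remaining)
    return sorted(sizes)
```

With 10 features and `step=3` this yields `[1, 4, 7, 10]`, so a plot with one point per feature count from 1 to 10 would not correspond to the subsets that were actually evaluated.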

* Documents the yellowbrick.download script. Adds documentation for the `yellowbrick.download` script that is included for dataset management. The documentation is located in both the README.md and in the contributor's guide. This documentation will hopefully also assist developers who are having trouble with older dataset versions on their computers. Closes #693

This PR was primarily intended to add plot directives to the quickstart guide, ensuring that the images in the tutorial were always up to date with the library. The quickstart guide separates the code blocks from the plot directives so that the code reads as a linear narrative, without the verbosity required to make each code block run independently. This duplicates the code a little bit but makes it more readable. To ensure that the code is correct I also created a notebook, examples/walkthrough.ipynb, with the code from the quickstart. Along the way there were some bugs with jointplot and rank2d that I also fixed. The JointPlotVisualizer needs some more work but it is now stable. Additionally, in poof(), if the visualizer didn't have an axes object it would exit silently. This made it hard to find bugs. Now, instead of exiting, we simply issue a warning and carry on.
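The change to poof() described above follows a common pattern: warn instead of returning early, so a missing axes object shows up during debugging. The class below is a hypothetical stand-in written for this example, not Yellowbrick's visualizer, and the warning text is invented.

```python
import warnings


class SketchVisualizer:
    """Hypothetical stand-in for a visualizer; not Yellowbrick's class."""

    def __init__(self, ax=None):
        self.ax = ax
        self.finalized = False

    def finalize(self):
        # A real visualizer would set titles, legends, and labels here.
        self.finalized = True

    def poof(self):
        if self.ax is None:
            # Warn rather than exit silently, then carry on.
            warnings.warn(
                "no axes object found; was the visualizer fit and drawn?",
                UserWarning,
            )
        self.finalize()
```

Because the warning goes through the `warnings` module, tests can assert on it and users can still filter it if they choose.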

This PR is intended to refresh the contributor's docs a bit, adding in a few more of our tips and conventions, specifically around things like installing the library in editable mode, merging in PRs, and using feature branches. It follows up on #689 and a few of the conversations that have come up amongst the maintainers recently.

Added two helper utilities: `is_fitted` and `check_fitted` that control the fitted estimator checking. If the user supplies `is_fitted='auto'`, the default, the model visualizer will check if it's fitted using a mechanism recommended by the scikit-learn team and Stack Overflow and will not fit a fitted estimator. Otherwise, the model visualizer will accept the recommendation of the user to fit or not fit the wrapped estimator. Fixes #297
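The decision logic described above can be sketched as follows. The fitted check here relies on the convention that fitted estimators carry attributes ending in an underscore; that heuristic, and the function names `looks_fitted` and `should_fit`, are assumptions for the example and may not match the mechanism this PR uses.

```python
def looks_fitted(estimator):
    """Heuristic: fitted estimators expose attributes that end in "_"."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


def should_fit(estimator, is_fitted="auto"):
    """Decide whether the wrapped estimator still needs to be fit."""
    if is_fitted == "auto":
        # Detect the state ourselves and never refit a fitted estimator.
        return not looks_fitted(estimator)
    # Otherwise trust what the user said about the estimator's state.
    return not bool(is_fitted)
```

Passing `is_fitted=True` or `is_fitted=False` bypasses the heuristic entirely, which is useful for estimators that do not follow the trailing-underscore convention.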

This PR is towards #669 and #456 and #600 and #509 but focused on the `yellowbrick.features` module and completes the work started in #945.

- Performed general linting and applied black formatting to the files & made code header updates.
- Updated quick methods to return the visualizer rather than the axes
- Update the docs to reflect the move of RFECV and FeatureImportances from features to model_selection module

Closes #669

bbengfort and others added 30 commits November 14, 2018 18:24

Merge branch 'release-0.9' into develop

94dfd00

Added changelog.rst stub for v1.0

47826d9

New procedure: when we merge PRs we also request that a note is added to the changelog so that we can keep track of all the changes without having to do a full review of all commits on the release day!

Update default ranking algorithm in Rank2D docs (#661)

d067d3e

Fixes a mistake in the Rank2D documentation stating that covariance is the default ranking algorithm instead of Pearson correlation score. Resolves #660

Reinstate mpl 3 (#671)

8f17f28

* allowing once more mpl 3.x
* explicitly exclude mpl 3.0.0 due to bug

Add Kendall-Tau metric to Rank2D (#645)

4bbff0b

Add Kendall-Tau correlation metric to the Rank2D visualizer. Additionally, extends and completes the Rank2D tests and verifies the Spearman metric. Fixes #628 and #435
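For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Kendall's tau (the tau-b variant, which corrects for ties) between two equal-length sequences. It is written for clarity rather than speed and is not the code added to Rank2D; the function name is an assumption.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau-b between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables; excluded from every count
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        else:
            discordant += 1

    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch but slow for large feature columns.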

Adds test for DispersionPlot quickmethod (#674)

75e04cc

The test for the DispersionPlot quickmethod was never created. I just overlooked its creation. This PR adds an 'assert_images_similar' image comparison for the quickmethod.

Docs headings (#673)

0a721ea

Fix subsection headings in the documentation that caused the "API Reference" subsection not to be displayed in the Table of Contents.

Updates changelog with recent PRs

d3a09b1

Updates v1.0 changelog to reflect new PRs (fixing last name type, dispersion plot quick method, target color type update)

added PR #680 to the change log

cad0d25

enhance: add installing test suite dependencing to contrib section wi…

a5941a6

…th some small rewrites (#692) enhance: add link in readme to testing instructions add installation section

Fixes CI Tests

870022a

Repairs breaking tests to resolve the Travis pyflakes error and the AppVeyor value error, updates baseline images, and adds skips for some unresolved tests from the contrib package.

Fix errors related to release of pytest 4.2 (#712)

913aea3

Resolves errors related to the release of pytest 4.2, which broke some custom YB test code that controlled how tests are printed.

Auto-generate Images for Documentation

05e0769

POC for auto-generation of images in the scikit-yb docs, focused on feature and cluster visualizers.

Update code to import data in regression notebook

55a6149

Updates code to import data in regression notebook following recent overhaul of datasets module

Repairs broken links in walkthrough docs

d15f8ba

Repairs broken links for rank2d and jointplots in walkthrough docs

Streamlining the README

5513bc7

Streamlines readme and adds gallery of visualizers

Adds hotfix tag and summary to changelog in docs

17f08e1

Adds tag and summary of hotfix v0.9.1 to changelog in docs

Refresh the JointPlot Visualizer for ML

01c54a7

Refreshes the JointPlot visualizer for the machine learning workflow, fixes #605, fixes #434, fixes #214.

Implement plot directive for DispersionPlot (#730)

2321839

Updates documentation for DispersionPlot but leaves static image for use in the gallery. Part of #687

Modifications to PrecisionRecall Curve Visualizer (#686)

22b8ad0

Makes some small modifications to PRCurve to allow users to specify the `iso_f1_values` rather than hardcoding them and to allow users to provide an optional X_test and y_test to the quick method. Part of #610
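As background on what user-supplied `iso_f1_values` control, an iso-F1 curve is the set of (recall, precision) points that share one F1 score. Solving F1 = 2PR / (P + R) for precision gives P = F1 * R / (2R - F1). The sketch below computes such points; the function names are assumptions and this is not the PRCurve drawing code.

```python
def iso_f1_precision(f1, recall):
    """Precision that yields the given F1 at the given recall, or None."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        return None  # no precision can reach this F1 at this recall
    precision = f1 * recall / denom
    # Precision above 1 is not attainable, so the curve has no point here.
    return precision if precision <= 1.0 else None


def iso_f1_curve(f1, n_points=50):
    """Return (recall, precision) pairs along one iso-F1 curve."""
    points = []
    for i in range(1, n_points + 1):
        recall = i / n_points
        precision = iso_f1_precision(f1, recall)
        if precision is not None:
            points.append((recall, precision))
    return points
```

Calling `iso_f1_curve` once per requested value, for instance for each of 0.2, 0.4, 0.6, and 0.8, produces the family of reference curves drawn behind a precision-recall plot.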

naresh-bachwani and others added 19 commits August 2, 2019 17:24

Updated PCA visualizer to extend Projection Visualizer (#937)

e24661a

Refactor to be current with kneed v0.4.1 (#935)

653dd6a

This PR closes #931 and updates yellowbrick's port of the kneed library.

Clusterers Audit (#940)

2a224bc

* updates to headers and minor audit cleanup
* elbow fix when score doesn't exist

Enhance datasets documentation (#921)

41aa437

* fixed doc build warnings and RTD theme issues
* detailed dataset pages
* adds readme content to seed datasets pages and rectifies double hobbies page
* black datasets and finish api docs
* contributor's guide

This PR closes #441 and resolves #683

Target and Model Selection audit (#941)

6c16c5e

Features Audit (part 1) (#945)

a4599db

* audit of the first half of the feature visualizers, moving rfecv and importances to model selection, still need to fix some broken tests
* set kwargs properly in pcoords and scatter and remove unused import
* remove unused mpl import in scatter

Final Projection Visualizer Tweaks (#946)

2c5f0e9

This PR fixes some minor bugs in Manifold, adjusts the proportions of PCA with respect to the feature strength heatmap, tweaks axis labels and tests.

Warn and handle if no elbow found inside range (#944)

6a319bb

This bugfix closes #943, handling the case where there are no elbows detected by kneed.py.

small bugfix to readme image generation script given new changes to k…

4e63bc6

…elbow (#942)

Finish updating headers to use template & textviz audit (#950)

3a81846

Remove old baseline images from tests (#951)

7a6d8be

Adds new oneliners page to docs (#952)

24c6167

This PR adds in a new page to the docs that illustrates the usage of the Yellowbrick quick methods.

Replace DESCRIPTION.rst with Markdown version of the file (#947)

9f44fb0

Switches to using a markdown version of DESCRIPTION and moves to using the banner image throughout the docs and readme, removing the old individual files and adding in the affiliate images.

Clean up classifiers technical debt. (#939)

da729da

Shores up classifiers, introducing new base-level helpers for label encoding, test coverage for fitted and unfitted classification visualizers and label edge cases, super score calls in each subclass, and repaired some quick methods

Ensure poof returns ax (#953)

25a9bd1

This PR ensures poof always returns ax. Closes #375

version bump

e68d662

Merge branch 'master' into release-v1.0

11eb41d

bbengfort mentioned this pull request Aug 28, 2019

Repair hotfix on v1.0 merge to master #725

Closed

add code headers

f58fec5

bbengfort mentioned this pull request Aug 28, 2019

Fix code headers and run ID line script #509

Closed

bbengfort and others added 3 commits August 28, 2019 19:29

black code formatting

c93282c

updated CLASSIFIERS list (#959)

8838a64

updates min matplotlib dep to 2.0.2

12e1821

bbengfort mentioned this pull request Aug 28, 2019

API Discussion for Figures and Axes #826

Closed

rebeccabilbro mentioned this pull request Aug 28, 2019

Implement black code formatting as pre-commit #456

Closed

added changelog

861850e

bbengfort closed this Aug 29, 2019

bbengfort deleted the release-v1.0 branch August 29, 2019 01:17
