Release v1.0 #957

Closed
wants to merge 158 commits into from

bbengfort
Member

Version 1.0 Release.

bbengfort and others added 30 commits November 14, 2018 18:24
New procedure: when we merge PRs we also request that a note is added to the changelog so that we can keep track of all the changes without having to do a full review of all commits on the release day!
Fixes a mistake in the Rank2D documentation stating that covariance is the default ranking algorithm instead of Pearson correlation score. Resolves #660
* allowing once more mpl 3.x
* explicitly exclude mpl 3.0.0 due to bug
* Added code review checklist

Broke down the contributing.rst file into multiple smaller files for
easier editing. In the advanced development topics section, I added a
section for common code, testing, and documentation conventions, listing
some things that Nathan mentioned and that are part of the ongoing
visualizer audit process.

Fixes #345
Add Kendall-Tau correlation metric to the Rank2D visualizer. Additionally, extends and completes the Rank2D tests and verifies the Spearman metric. 

Fixes #628 and #435
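
To illustrate what a Kendall-Tau ranking measures, here is a minimal pure-Python sketch of the tau-a statistic applied pairwise to feature columns. It is not the Rank2D implementation: the function names `kendall_tau_a` and `rank_features` are hypothetical, and tau-a ignores the tie correction that library implementations typically apply.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a for two equal-length sequences (no tie correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two observations")

    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether the pair is ordered the
        # same way in both variables; tied pairs contribute nothing.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def rank_features(columns):
    """Pairwise tau-a matrix for a list of feature columns (hypothetical helper)."""
    return [[kendall_tau_a(a, b) for b in columns] for a in columns]
```

A value of 1 means two features order every pair of rows identically, and -1 means the orderings are exactly reversed, which is what makes the statistic usable as a pairwise ranking score.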
The test for the DispersionPlot quickmethod was never
created.  I just overlooked its creation.  This PR
adds an 'assert_images_similar' image comparison for
the quickmethod.
Fix subsection headings in the documentation that caused the "API Reference" subsection not to be displayed in the Table of Contents.
Created a text-specific visualizer to project a vectorized corpus in two dimensions using UMAP (Uniform Manifold Approximation and Projection). This implementation is very similar to the TSNE implementation, but is fast, scalable, and can be applied directly to sparse matrices without a preprocessing step such as SVD.
Updates v1.0 changelog to reflect new PRs (fixing last name type, dispersion plot quick method, target color type update)
Wraps up the enhancement to the FeatureImportances visualizer which
added a stacked bar chart optional parameter in the case of
multi-dimensional importances. Updated the documentation to reflect the
stack=True situation as well as issued a warning if stack is False but
should be True. Updated tests for better coverage.

Closes #531
Implements a helper function that returns continuous or discrete
depending on the type of target variable, `y`. This function is similar
to the functionality in the Manifold visualizer but makes use of
sklearn.utils.multiclass.type_of_target to make its determination, along
with a limit to the number of discrete colors that can be drawn.

Was undecided if this belonged in `yellowbrick.utils` or in
`yellowbrick.target` -- am open to discussion on this topic.

Fixes #73
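
The decision described above can be sketched roughly as follows. This is a simplified stand-in written in plain Python, not the helper from the PR: it does not call scikit-learn's `type_of_target`, and the name `target_type` and the default limit of 12 classes are assumptions made for illustration.

```python
from numbers import Real


def target_type(y, max_classes=12):
    """Return "continuous" or "discrete" for a target sequence.

    max_classes is a hypothetical cap on how many distinct colors
    can reasonably be drawn.
    """
    values = list(y)
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )

    # Any fractional numeric value means the target is treated as continuous.
    if numeric and any(not float(v).is_integer() for v in values):
        return "continuous"

    # Few enough distinct values: each class can get its own color.
    if len(set(values)) <= max_classes:
        return "discrete"

    # Too many integer-valued classes: fall back to a continuous colormap.
    if numeric:
        return "continuous"
    raise ValueError("too many non-numeric classes to assign discrete colors")
```

For example, `target_type([0, 1, 1, 0])` is treated as discrete, while a target with a hundred distinct integer values falls back to continuous because it exceeds the assumed color limit.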
…sot (#680)

* fixes resolve colors bug in tsne visualizer identified by jerome massot
* adds test coverage for user-supplied color list in TSNE
* fixes analogous resolve colors bug in UMAP visualizer and symmetric test coverage
This PR implements a significant change in the way yellowbrick handles datasets, moving them from data that can be downloaded and loaded using example code to prime time members of the library that can be loaded into pandas data frames and series or into well-structured numpy arrays with correct data types. 

We have completely overhauled dataset management using the yellowbrick-datasets repository as our data management tool. Data is still stored on S3, but each dataset now contains .csv.gz and meta.json files for loading into pandas if it's installed, or .npz files for loading into valid numpy arrays. New `Dataset` and `Corpus` objects manage access to the data, downloading it if it's not already on disk and providing access to the contents in the source directory. We maintain our security checking with sha256 hashes and a new manifest.json method. 

Fixes #416
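
As a rough illustration of the sha256 check mentioned above, the sketch below hashes a downloaded file in chunks and compares it to an expected signature. The function names are hypothetical, and the expected signature is passed in directly because the layout of manifest.json is not described here.

```python
import hashlib


def sha256sum(path, chunk_size=65536):
    """Hex digest of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_signature):
    """True if the file on disk matches the signature recorded for it."""
    return sha256sum(path) == expected_signature
```

A loader built this way would refuse to use, or would re-download, any archive whose digest does not match the recorded value.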
The RFECV visualizer had a bug when the hyperparameter step > 1. The
step was correctly passed to the internal RFE estimator, which removed
that number of features per iteration, however the feature subsets that
were tried for cross-validation did not match the step resulting in a
figure that looked like no step was actually applied. This patch fixes
the bug and creates a test to ensure this works correctly.

To manage the feature selection subspace, a new learned attribute,
`n_feature_subsets_` was added.

Fixes #664
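
To make the mismatch concrete, here is a small sketch of which feature-subset sizes are visited when `step` features are removed per iteration. It is an assumption-laden illustration, not the patched code: the function name and the `min_features` parameter are hypothetical, and the exact sizes stored in `n_feature_subsets_` may be computed differently.

```python
def feature_subset_sizes(n_features, step=1, min_features=1):
    """Subset sizes visited when eliminating `step` features per round."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if not 1 <= min_features <= n_features:
        raise ValueError("min_features must be between 1 and n_features")

    sizes = [n_features]
    while sizes[-1] > min_features:
        # The final round removes fewer features if a full step would
        # drop below the minimum.
        sizes.append(max(sizes[-1] - step, min_features))
    return sizes[::-1]
```

With 10 features and a step of 3, this gives `[1, 4, 7, 10]`. Plotting cross-validation scores against every size from 1 to 10 instead would produce a figure that looks as if no step had been applied.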
…th some small rewrites (#692)

enhance: add link in readme to testing instructions

add installation section
Repairs breaking tests to resolve Travis pyflakes error and AppVeyor value error, updates baseline images, adds skips for some unresolved tests from contrib package.
Resolves errors related to the release of pytest 4.2, which broke some custom YB test code that controlled how tests are printed.
* Documents the yellowbrick.download script.

Adds documentation for the `yellowbrick.download` script that is
included for dataset management. The documentation is located in both
the README.md and in the contributor's guide. This documentation will
hopefully also assist developers who are having trouble with older
dataset versions on their computers.

Closes #693
POC for auto-generation of images in the scikit-yb docs, focused on feature and cluster visualizers.
Updates code to import data in regression notebook following recent overhaul of datasets module
Repairs broken links for rank2d and jointplots in walkthrough docs
This PR was primarily intended to add plot directives to the quickstart
guide, ensuring that the images in the tutorial were always up to date
with the library. The quickstart guide separates the code blocks from
the plot directives so that the code is a linear narrative rather than
the verbosity required for each independent code block. This duplicates
the code a little bit but makes it more readable.

To ensure that the code is correct I also created a notebook
examples/walkthrough.ipynb with the code from the quickstart.

Along the way there were some bugs with jointplot and rankd that I also
fixed. The JointPlotVisualizer needs some more work but it is now
stable.

Additionally, in poof() if the visualizer didn't have an axes object it
would exit silently. This made it hard to find bugs. Now instead of
exiting we simply issue a warning and carry on.
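
The warn-and-continue behavior described in the last paragraph could look something like the sketch below. `SketchVisualizer` is a hypothetical stand-in for illustration only; it is not the Yellowbrick base class, and the warning text is made up.

```python
import warnings


class SketchVisualizer:
    """Hypothetical stand-in for a visualizer with an optional axes object."""

    def __init__(self, ax=None):
        self.ax = ax
        self.finalized = False

    def finalize(self):
        # Titles, legends, and axis labels would be added here.
        self.finalized = True

    def poof(self):
        if self.ax is None:
            # Previously this case returned silently, which hid bugs;
            # now the problem is surfaced and execution continues.
            warnings.warn(
                "visualizer has no axes object; nothing may have been drawn",
                UserWarning,
            )
        self.finalize()
```

Issuing a warning keeps notebooks and scripts running while still telling the user that the figure is probably empty.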
Streamlines readme and adds gallery of visualizers
Adds tag and summary of hotfix v0.9.1 to changelog in docs
Refreshes the JointPlot visualizer for the machine learning workflow, fixes #605, fixes #434, fixes #214.
Updates documentation for DispersionPlot but leaves static image for use in the gallery. 

Part of #687
Makes some small modifications to PRCurve to allow users to specify the `iso_f1_values` rather than hardcoding them and to allow users to provide an optional X_test and y_test to the quick method. 

Part of #610
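
For context on what an iso-F1 value controls, each value defines a curve of (recall, precision) pairs that share the same F1 score. Solving F1 = 2pr / (p + r) for precision gives p = F1 * r / (2r - F1). The sketch below computes points on such a curve; the function names are hypothetical and this is not the PRCurve drawing code.

```python
def iso_f1_precision(f1, recall):
    """Precision that yields the given F1 score at the given recall."""
    if not 0 < f1 <= 1:
        raise ValueError("f1 must be in (0, 1]")
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("recall must be greater than f1 / 2")
    return f1 * recall / denominator


def iso_f1_curve(f1, n_points=50):
    """(recall, precision) points on one iso-F1 curve with precision <= 1."""
    points = []
    for i in range(1, n_points + 1):
        recall = i / n_points
        if 2 * recall - f1 <= 0:
            continue  # no finite positive precision reaches this F1 here
        precision = iso_f1_precision(f1, recall)
        if precision <= 1:
            points.append((recall, precision))
    return points
```

Letting users pass their own list of F1 values, for example `(0.2, 0.4, 0.6, 0.8)` as an illustrative choice, just means drawing one such curve per value.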
This PR is intended to refresh the contributor's docs a bit, adding in a few more of our tips and conventions, specifically around things like installing the library in editable mode, merging in PRs, and using feature branches. It follows up on #689 and a few of the conversations that have come up amongst the maintainers recently.
naresh-bachwani and others added 19 commits August 2, 2019 17:24
This PR closes #931 and updates yellowbrick's port of the kneed library.
* updates to headers and minor audit cleanup
* elbow fix when score doesn't exist
Added two helper utilities: `is_fitted` and `check_fitted` that control the fitted estimator checking. If the user supplies `is_fitted='auto'`, the default, the model visualizer will check if it's fitted using a mechanism recommended by the scikit-learn team and Stack Overflow and will not fit a fitted estimator. Otherwise, the model visualizer will accept the recommendation of the user to fit or not fit the wrapped estimator. 

Fixes #297
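
The fit-or-skip decision described above can be sketched as follows. This is only a heuristic illustration: the names `looks_fitted` and `should_fit` are hypothetical, and the check shown here (looking for learned attributes that end in an underscore) is one common convention, not necessarily the mechanism the PR uses.

```python
def looks_fitted(estimator):
    """Heuristic: fitted estimators expose public attributes ending in "_"."""
    return any(
        name.endswith("_") and not name.startswith("_")
        for name in vars(estimator)
    )


def should_fit(estimator, is_fitted="auto"):
    """Decide whether the wrapped estimator still needs to be fitted.

    With "auto" the estimator is inspected; otherwise the user's
    statement about whether it is already fitted is trusted as given.
    """
    if is_fitted == "auto":
        return not looks_fitted(estimator)
    return not bool(is_fitted)
```

Under this sketch, passing `is_fitted=True` skips fitting even for an estimator that looks unfitted, which matches the idea of accepting the user's recommendation.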
* fixed doc build warnings and RTD theme issues

* detailed dataset pages

* adds readme content to seed datasets pages and rectifies double hobbies page

* black datasets and finish api docs

* contributor's guide

This PR closes #441 and resolves #683
* audit of the first half of the feature visualizers, moving rfecv and importances to model selection, still need to fix some broken tests

* set kwargs properly in pcoords and scatter and remove unused import

* remove unused mpl import in scatter
This PR fixes some minor bugs in Manifold, adjusts the proportions of PCA with respect to the feature strength heatmap, tweaks axis labels and tests.
This PR works toward #669, #456, #600, and #509 but focuses on the 
`yellowbrick.features` module and completes the work started in #945. 

- Performed general linting and applied black formatting to the files & 
   made code header updates.
- Updated quick methods to return the visualizer rather than the axes
- Update the docs to reflect the move of RFECV and FeatureImportances 
   from features to model_selection module

Closes #669
This bugfix closes #943, handling the case where there are no elbows detected by kneed.py.
This PR adds in a new page to the docs that illustrates the usage of the Yellowbrick quick methods.
Switches to using a markdown version of DESCRIPTION and moves to using the banner image throughout the docs and readme, removing the old individual files and adding in the affiliate images.
Shores up classifiers, introducing new base-level helpers for label encoding, test coverage for fitted and unfitted classification visualizers and label edge cases, super score calls in each subclass, and repairs to some quick methods
This PR ensures poof always returns ax.  Closes #375
@bbengfort bbengfort closed this Aug 29, 2019
@bbengfort bbengfort deleted the release-v1.0 branch August 29, 2019 01:17