Feat_categorical_labels #1053

Merged
sammlapp merged 17 commits into develop from feat_categorical_labels on Sep 10, 2024

@sammlapp (Collaborator) commented Sep 5, 2024

The new BoxedAnnotations methods in this PR replace multihot_clip_labels() and multihot_labels_like(). They support four output formats: multihot, categorical with integer class indices, categorical with class names, or a CategoricalLabels object.
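A minimal sketch of how the same labels look in the multihot format and in the two list-based categorical formats, assuming a multihot DataFrame with one 0/1 column per class. The helper multihot_to_categorical is hypothetical and is not the BoxedAnnotations API; it only illustrates the data shapes involved.

```python
import pandas as pd


def multihot_to_categorical(multihot_df, output="integers"):
    """Hypothetical helper (not the BoxedAnnotations API): convert a multihot
    DataFrame, with one 0/1 column per class, into per-row label lists."""
    classes = list(multihot_df.columns)
    rows = []
    for row in multihot_df.to_numpy():
        # positions of the nonzero (positive) classes in this row
        idx = [int(i) for i in row.nonzero()[0]]
        if output == "integers":
            rows.append(idx)  # categorical with integer class indices
        elif output == "names":
            rows.append([classes[i] for i in idx])  # categorical with class names
        else:
            raise ValueError(f"unknown output format: {output!r}")
    return pd.Series(rows, index=multihot_df.index, name="labels")


# multihot format: one column per class, 1 where the class is present
multihot = pd.DataFrame(
    {"robin": [1, 0, 1], "wren": [0, 0, 1]}, index=["clip_a", "clip_b", "clip_c"]
)
print(multihot_to_categorical(multihot, "integers").tolist())  # [[0], [], [0, 1]]
print(multihot_to_categorical(multihot, "names").tolist())  # [['robin'], [], ['robin', 'wren']]
```

The integer form stays compact regardless of how many classes exist; the class-name form is easier to read but repeats the strings in every row.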

Adds a CategoricalLabels class, which stores labels as lists of integer class indices and provides methods to "view" the labels in other formats, including sparse or dense 2D arrays and DataFrames. It also provides methods to create an instance from multihot or categorical DataFrames. This lightweight format is useful for storing labels when there are many classes and samples.
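To illustrate the storage idea, here is a minimal sketch of a CategoricalLabels-style container. SimpleCategoricalLabels and its method names are hypothetical (loosely echoing the from_multihot_df classmethod and the .labels and .class_labels properties mentioned in this PR's commits); this is not the opensoundscape implementation. It assumes numpy, pandas, and scipy are installed, and its constructor accepts the class list as either a plain list or a pd.Index.

```python
import numpy as np
import pandas as pd
from scipy import sparse


class SimpleCategoricalLabels:
    """Hypothetical sketch, not the opensoundscape CategoricalLabels class:
    each sample's labels are a list of integer indices into a shared class list."""

    def __init__(self, index, classes, labels):
        self.index = pd.Index(index)
        self.classes = list(classes)  # accepts a plain list or a pd.Index
        self.labels = [[int(c) for c in row] for row in labels]

    @classmethod
    def from_multihot_df(cls, df):
        # the nonzero column positions in each row become that row's class indices
        labels = [np.flatnonzero(row).tolist() for row in df.to_numpy()]
        return cls(df.index, df.columns, labels)

    @property
    def class_labels(self):
        # the same labels, expressed as class names instead of integers
        return [[self.classes[i] for i in row] for row in self.labels]

    def to_sparse(self):
        # CSR matrix (samples x classes) with a 1 for each stored label
        rows = [r for r, row in enumerate(self.labels) for _ in row]
        cols = [c for row in self.labels for c in row]
        data = np.ones(len(cols), dtype=np.int8)
        shape = (len(self.labels), len(self.classes))
        return sparse.csr_matrix((data, (rows, cols)), shape=shape)

    def to_dense(self):
        # dense 2D array view, materialized only when requested
        return self.to_sparse().toarray()

    def to_multihot_df(self):
        return pd.DataFrame(self.to_dense(), index=self.index, columns=self.classes)


multihot = pd.DataFrame(
    {"robin": [1, 0, 1], "wren": [0, 0, 1]}, index=["clip_a", "clip_b", "clip_c"]
)
cl = SimpleCategoricalLabels.from_multihot_df(multihot)
print(cl.labels)  # [[0], [], [0, 1]]
print(cl.class_labels)  # [['robin'], [], ['robin', 'wren']]
```

Storing only the positive indices keeps memory proportional to the number of labels rather than to samples × classes. The dense and multihot views are built on demand.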

Adds tests for the new BoxedAnnotations methods and removes or updates outdated tests. Tests for the CategoricalLabels class itself are still needed.

louisfh and others added 17 commits June 6, 2024 16:30
might approximately work already, but haven't tested much yet

basic idea is (1) a CategoricalLabels class which stores lists of integer indices for class labels along with the class list, and provides various "views" such as sparse or dense matrices or dfs, or multihot labels for a single row; (2) convert sparse label dtypes to dense when creating AudioSample objects
avoids FutureWarnings when creating pandas DataFrames from scipy sparse matrix types, but this seems to use unnecessary memory; consider reverting to bool pending information on this issue: pandas-dev/pandas#59739
this property won't exist when the method is called with Lightning
fixes passing a list of classes (rather than a pd.Index) to the classmethod from_multihot_df

adds properties .labels and .class_labels for CategoricalLabels

adds missing docstrings for properties and methods of CategoricalLabels
updated labels_df to have categorical rather than integer labels; updated the assertion in the test accordingly
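Two of the commits above mention building DataFrames from scipy sparse matrices and converting sparse label dtypes to dense when creating per-sample objects. A minimal sketch of that general pattern with pandas' sparse accessor follows, assuming an integer (rather than bool) sparse dtype; dense_row is a hypothetical helper, and the FutureWarning discussed in pandas-dev/pandas#59739 is not reproduced here.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Assumption: labels are stored as an integer (not bool) sparse matrix,
# 3 samples x 2 classes.
mat = sparse.csr_matrix(np.array([[1, 0], [0, 0], [1, 1]], dtype=np.int8))
labels_df = pd.DataFrame.sparse.from_spmatrix(
    mat, index=["clip_a", "clip_b", "clip_c"], columns=["robin", "wren"]
)


def dense_row(df, key):
    """Hypothetical helper: return one sample's labels as a dense 1-D array,
    e.g. before attaching them to a per-sample object."""
    row = df.loc[[key]]  # one-row DataFrame whose columns are still sparse
    return row.sparse.to_dense().iloc[0].to_numpy()


print(dense_row(labels_df, "clip_c"))  # [1 1]
```

Keeping the full table sparse and densifying one row at a time bounds the dense memory needed per sample.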
sammlapp merged commit fdd1266 into develop on Sep 10, 2024
3 checks passed
sammlapp deleted the feat_categorical_labels branch on September 10, 2024 at 20:52
Development

Successfully merging this pull request may close these issues:

allow training labels to be categorial
one_hot_clip_labels is slow
2 participants