ValueError: too many values to unpack (expected 3) from ast.predict_celltypes() #23

Open
jgu13 opened this issue Aug 9, 2022 · 4 comments

Comments

@jgu13
Contributor

jgu13 commented Aug 9, 2022

Hi,

I am trying to predict cell types by calling ast.predict_celltypes(dset: pd.DataFrame). I first wanted to test the function with the publicly available dataset basel_22k_subset.h5ad, which is the one loaded in the astir_tutorial Jupyter notebook. The error is raised at line 297, _, exprs_X, _ = new_dset[:], in celltype.predict(). Here is how to reproduce the error:

I have trained a CellTypeModel with the basel_22k_subset.h5ad dataset, with the initial parameters:

N = ast.get_type_dataset().get_exprs_df().shape[0]
batch_size = int(N/100)

max_epochs = 1000

learning_rate = 2e-3

initial_epochs = 3

Then I saved the trained model by calling ast.save_model('trained_model.hdf5') and loaded it by calling ast.load_model('trained_model.hdf5').

To convert the basel_22k_subset.h5ad dataset into a dataframe, I did

ad = anndata.read_h5ad("basel_22k_subset.h5ad")
df = ad.to_df()

The data frame is properly loaded.
When I finally tried to predict cell types by calling ast.predict_celltypes(df), the error was raised.

Any insight would be appreciated.

@kieranrcampbell
Member

Hi @jgu13
Thanks for raising this -- I'll look into it now

@jgu13
Contributor Author

jgu13 commented Aug 12, 2022

Hi Kieran!

I just wanted to let you know that I changed line 297 of celltypes.predict() to exprs_X = torch.tensor(new_dset[:].values). I also modified ast.predict_celltypes(dset) slightly so that it can make predictions on a new dataset. The downside of my modification is that the user needs to provide two new parameters: expected_cell_types and cell_names. Here is the function:

    def predict_celltypes(self, dset: pd.DataFrame = None, expected_cell_types: list = None, cell_names: list = None) -> pd.DataFrame:
        """Predict the probabilities of different cell type assignments.

        :param dset: the single cell protein expression dataset to predict, defaults to None
        :type dset: pd.DataFrame, optional
        :raises Exception: when the type model is not trained when this function is called
        :return: the probabilities of different cell type assignments
        :rtype: pd.DataFrame
        """
        if self._type_ast is None:
            raise Exception("The type model has not been trained yet")
        if not self._type_ast.is_converged():
            msg = "The type model has not been trained for enough epochs yet"
            warnings.warn(msg)
        if dset is None:
            dset = self.get_type_dataset()
        if expected_cell_types is None:
            # Fall back to the classes stored on the dataset (only works for an SCDataset)
            expected_cell_types = dset.get_classes()
        if cell_names is None:
            cell_names = dset.get_cell_names()

        cell_types = expected_cell_types + ['Other']
        type_assignments = self._type_ast.predict(dset)
        type_assignments.columns = cell_types # predict(dset) returned DataFrame but here we expect dset to be SCDataset
        type_assignments.index = cell_names
        return type_assignments

I tested on the dataset basel_22k_subset.h5ad by loading it into a dataframe df, and calling ast.predict_celltypes(dset = df, expected_cell_types = expected_cell_types, cell_names=cell_names) and was able to get the correct number of cells for each cell type:

cell_type           
Stromal                 6410
Epithelial (luminal)    4698
Unknown                 4438
Other                   3108
Epithelial (other)      1643
Epithelial (basal)      1321
Macrophage               416
Endothelial              336
T cells                  270
B cells                   77

I hope this could help!

@kieranrcampbell
Member

I modified line 297 of celltypes.predict() to be exprs_X = torch.tensor(new_dset[:].values)

Yes, that looks right. The original was _, exprs_X, _ = new_dset[:], but I'm not sure what that even does.
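A likely reading of the original line, sketched under assumptions: three-way unpacking works when indexing returns a 3-tuple (assumed here to be what the internal dataset class returns), but a pandas DataFrame iterates over its column labels, so any frame with more than three columns would fail with the reported ValueError. In the sketch below a plain dict of columns stands in for the DataFrame, and the marker names and the `unpack_three` helper are made up for illustration.

```python
# A dict of columns stands in for the DataFrame here:
# iterating either one yields the column labels, not the rows.
columns = {
    "marker_a": [0.1],
    "marker_b": [0.2],
    "marker_c": [0.3],
    "marker_d": [0.4],
}


def unpack_three(item):
    """Mimic the original line `_, exprs_X, _ = new_dset[:]`."""
    _, exprs_x, _ = item
    return exprs_x


# A 3-tuple unpacks cleanly (assumed shape of what the internal dataset returns).
middle = unpack_three(("y", "x", "design"))

# Anything that iterates over more than three items fails on unpacking.
try:
    unpack_three(columns)
    message = None
except ValueError as err:
    message = str(err)

print(middle)
print(message)
```

Under that reading, taking the values of the frame directly, as in the modified line quoted above, avoids the unpacking altogether.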

Re cell names, would it make sense to assume the rownames of dset are the cell names, i.e. something like

type_assignments.index = dset.index

?

Finally, the cell types are stored internally so shouldn't have to be supplied, so

type_assignments.columns = self._type_dset.get_classes() + ["Other"]

if that makes sense.
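The two suggestions above can be sketched together on toy data. This is only an illustration under assumptions: `label_assignments`, the marker names, and the probabilities are made up, and the class list is passed in explicitly rather than read from the model.

```python
import pandas as pd


def label_assignments(probs, dset, classes):
    """Attach cell-type column names and cell-name row labels to raw predictions.

    Hypothetical helper for illustration; not part of astir.
    """
    labelled = pd.DataFrame(probs)
    # Classes the model was trained on, plus the catch-all "Other" column.
    labelled.columns = list(classes) + ["Other"]
    # The row names of the input frame are taken to be the cell names.
    labelled.index = dset.index
    return labelled


# Toy input: two cells, two marker columns (all values made up).
dset = pd.DataFrame(
    {"marker_a": [0.1, 2.3], "marker_b": [1.7, 0.2]},
    index=["cell_1", "cell_2"],
)
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

labelled = label_assignments(probs, dset, ["T cells", "B cells"])
print(labelled)
```

With this approach the caller supplies only the DataFrame, provided the class names are available from the model.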

Thanks for catching this. Would you like to structure as a pull request to get contribution credit? Otherwise I'm happy to modify directly.

@jgu13
Contributor Author

jgu13 commented Aug 17, 2022

Finally, the cell types are stored internally so shouldn't have to be supplied, so
type_assignments.columns = self._type_dset.get_classes() + ["Other"]
if that makes sense.

I got an AttributeError from calling get_classes():

File e:\Users\Claris_Gu\PycharmProjects\python_project\Cell Segmentation\astir\astir\astir.py:551, in Astir.predict_celltypes(self, dset)
    548     dset = self.get_type_dataset()
    550 type_assignments = self._type_ast.predict(dset)
--> 551 type_assignments.columns = self._type_dset.get_classes() + ['Other'] # predict(dset) returned DataFrame but here we expect dset to be SCDataset
    552 type_assignments.index = dset.index
    553 return type_assignments

AttributeError: 'NoneType' object has no attribute 'get_classes'

This is probably because _type_dset is not saved in the .hdf5 file written by ast.save_model(). When the model is loaded with ast.load_model(), _type_dset is initialized to None (line 70).
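One possible way to handle this state is sketched below, assuming the class names could be stored alongside the model when it is saved. Everything here is hypothetical and not part of astir: `LoadedModelSketch`, `_saved_classes`, and `class_names` only illustrate falling back to saved names and failing with a clear message when neither source is available.

```python
class LoadedModelSketch:
    """Hypothetical stand-in for an object restored from disk without its dataset."""

    def __init__(self, saved_classes=None):
        # Mirrors the situation described above: no dataset after loading.
        self._type_dset = None
        # Hypothetical: class names persisted together with the model.
        self._saved_classes = saved_classes

    def class_names(self):
        """Return the cell type names, preferring an attached dataset."""
        if self._type_dset is not None:
            return self._type_dset.get_classes()
        if self._saved_classes is not None:
            return list(self._saved_classes)
        raise RuntimeError(
            "Cell type names are unavailable: no dataset is attached "
            "and none were saved with the model"
        )


model = LoadedModelSketch(saved_classes=["Stromal", "Macrophage"])
columns = model.class_names() + ["Other"]
print(columns)
```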
