ValueError: too many values to unpack (expected 3) from ast.predict_celltypes() #23

jgu13 · 2022-08-09T22:58:59Z

Hi,

I am trying to predict cell types by calling ast.predict_celltypes(dset: pd.DataFrame). I first wanted to test the function on the publicly available dataset basel_22k_subset.h5ad, which is the one loaded in the astir_tutorial Jupyter notebook. The error is raised at line 297 of celltype.predict(), _, exprs_X, _ = new_dset[:]. Here is how to reproduce it:

I trained a CellTypeModel on the basel_22k_subset.h5ad dataset with the following initial parameters:

    N = ast.get_type_dataset().get_exprs_df().shape[0]
    batch_size = int(N/100)
    max_epochs = 1000
    learning_rate = 2e-3
    initial_epochs = 3

Then I saved the trained model by calling ast.save_model('trained_model.hdf5') and loaded it again by calling ast.load_model('trained_model.hdf5').

To convert the basel_22k_subset.h5ad dataset into a DataFrame, I did:

    import anndata

    ad = anndata.read_h5ad("basel_22k_subset.h5ad")
    df = ad.to_df()

The data frame is properly loaded.
When I finally tried to predict cell types by calling ast.predict_celltypes(df), the error was raised.

Any insight would be appreciated.


kieranrcampbell · 2022-08-12T16:44:46Z

Hi @jgu13
Thanks for raising this -- I'll look into it now

jgu13 · 2022-08-12T23:30:26Z

Hi Kieran!

I just wanted to let you know that I changed line 297 of celltypes.predict() to exprs_X = torch.tensor(new_dset[:].values). I also modified ast.predict_celltypes(dset) slightly so that it can make predictions on a new dataset. The downside is that the user has to provide two new parameters, expected_cell_types and cell_names. Here is the function:

    def predict_celltypes(self, dset: pd.DataFrame = None, expected_cell_types: list = None, cell_names: list = None) -> pd.DataFrame:
        """Predict the probabilities of different cell type assignments.

        :param dset: the single cell protein expression dataset to predict, defaults to None
        :type dset: pd.DataFrame, optional
        :param expected_cell_types: names of the cell types the model was trained on (without 'Other'), defaults to None
        :type expected_cell_types: list, optional
        :param cell_names: names of the cells (rows) in dset, defaults to None
        :type cell_names: list, optional
        :raises Exception: when the type model is not trained when this function is called
        :return: the probabilities of different cell type assignments
        :rtype: pd.DataFrame
        """
        if self._type_ast is None:
            raise Exception("The type model has not been trained yet")
        if not self._type_ast.is_converged():
            msg = "The type model has not been trained for enough epochs yet"
            warnings.warn(msg)
        if dset is None:
            dset = self.get_type_dataset()
        # These defaults only work when dset is an SCDataset (e.g. the training
        # dataset); a plain DataFrame has no get_classes() / get_cell_names().
        if expected_cell_types is None:
            expected_cell_types = dset.get_classes()
        if cell_names is None:
            cell_names = dset.get_cell_names()

        cell_types = expected_cell_types + ['Other']
        type_assignments = self._type_ast.predict(dset)
        type_assignments.columns = cell_types # predict(dset) returned DataFrame but here we expect dset to be SCDataset
        type_assignments.index = cell_names
        return type_assignments

I tested it on basel_22k_subset.h5ad by loading the dataset into a DataFrame df and calling ast.predict_celltypes(dset=df, expected_cell_types=expected_cell_types, cell_names=cell_names). This gave the correct number of cells for each cell type:

    cell_type
    Stromal                 6410
    Epithelial (luminal)    4698
    Unknown                 4438
    Other                   3108
    Epithelial (other)      1643
    Epithelial (basal)      1321
    Macrophage               416
    Endothelial              336
    T cells                  270
    B cells                   77
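The counts above look like pandas value_counts() output over one label per cell. A minimal sketch of how such a table could be produced from the probability DataFrame returned by predict_celltypes(), assuming each cell takes its most probable type and cells whose top probability falls below a threshold are labelled "Unknown". The helper name assign_and_count and the 0.7 threshold are illustrative assumptions, not necessarily what astir does internally.

```python
import pandas as pd


def assign_and_count(type_probs: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Hypothetical helper: label each cell and count cells per label."""
    # Column label (cell type) with the highest probability for each cell.
    best = type_probs.idxmax(axis=1)
    # Keep that label only when its probability reaches the assumed threshold.
    confident = type_probs.max(axis=1) >= threshold
    labels = best.where(confident, "Unknown").rename("cell_type")
    # Counts per label, largest first.
    return labels.value_counts()


# Toy probabilities for three cells (made-up numbers, not astir output).
probs = pd.DataFrame(
    {"Stromal": [0.9, 0.1, 0.5], "T cells": [0.05, 0.85, 0.3], "Other": [0.05, 0.05, 0.2]},
    index=["cell_0", "cell_1", "cell_2"],
)
counts = assign_and_count(probs)
```

Here cell_2's best guess (Stromal, 0.5) is below the threshold, so it is counted as "Unknown".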

I hope this could help!

kieranrcampbell · 2022-08-15T21:42:45Z

> I modified line 297 of celltypes.predict() to be exprs_X = torch.tensor(new_dset[:].values)

Yes, that looks right. The original was _, exprs_X, _ = new_dset[:], but I'm not sure what that even does.
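For reference, here is a small sketch of what the original line does when new_dset is a pandas DataFrame (an assumption based on the error in this issue): df[:] returns the whole DataFrame, and iterating over a DataFrame yields its column labels, so tuple unpacking into three names fails as soon as there are more than three protein columns. The toy data below is made up.

```python
import numpy as np
import pandas as pd

# Toy expression matrix: 4 cells x 5 proteins (made-up values).
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5),
    index=[f"cell_{i}" for i in range(4)],
    columns=["CD3", "CD20", "CD68", "ECadherin", "Vimentin"],
)

# df[:] is a row slice covering every row, i.e. the same DataFrame.
# Iterating over a DataFrame yields its column labels, so unpacking into
# three names fails because there are five labels.
error_message = None
try:
    _, exprs_X, _ = df[:]
except ValueError as err:
    error_message = str(err)

# The fix from this thread: use the underlying NumPy array instead.
# (torch.tensor(exprs_X) would then wrap it as a float64 tensor.)
exprs_X = df[:].values
```

With exactly three columns the unpacking would not fail at all; exprs_X would silently become the middle column's name.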

Re cell names, would it make sense to assume the row names of dset are the cell names? I.e. something like

    type_assignments.index = dset.index

Finally, the cell types are stored internally, so they shouldn't have to be supplied. If that makes sense, something like:

    type_assignments.columns = self._type_dset.get_classes() + ["Other"]
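A minimal sketch of the labelling step suggested above, written as a standalone function so it can be tried without a trained model. The helper name label_type_assignments is hypothetical; it assumes the raw prediction has one row per cell of dset and one column per known cell type plus a final "Other" column, and it checks those shapes before relabelling.

```python
import pandas as pd


def label_type_assignments(
    type_assignments: pd.DataFrame, dset: pd.DataFrame, classes: list
) -> pd.DataFrame:
    """Hypothetical helper: name rows after dset's cells, columns after classes + "Other"."""
    n_cells, n_cols = type_assignments.shape
    if n_cells != len(dset.index):
        raise ValueError(f"expected {len(dset.index)} rows, got {n_cells}")
    if n_cols != len(classes) + 1:
        raise ValueError(f"expected {len(classes) + 1} columns, got {n_cols}")
    labelled = type_assignments.copy()
    labelled.columns = list(classes) + ["Other"]
    labelled.index = dset.index  # row names of the input are the cell names
    return labelled


# Toy input: 2 cells, 2 known cell types plus "Other" (made-up numbers).
dset = pd.DataFrame({"CD3": [1.0, 0.2], "CD20": [0.1, 2.0]}, index=["cell_a", "cell_b"])
raw = pd.DataFrame([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labelled = label_type_assignments(raw, dset, ["T cells", "B cells"])
```

This only works if the list of classes is available at prediction time, whatever object it is read from.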

Thanks for catching this. Would you like to structure as a pull request to get contribution credit? Otherwise I'm happy to modify directly.

jgu13 · 2022-08-17T23:27:18Z

> Finally, the cell types are stored internally so shouldn't have to be supplied, so
> type_assignments.columns = self._type_dset.get_classes() + ["Other"]
> if that makes sense.

I got an AttributeError when using get_classes():

    File e:\Users\Claris_Gu\PycharmProjects\python_project\Cell Segmentation\astir\astir\astir.py:551, in Astir.predict_celltypes(self, dset)
        548     dset = self.get_type_dataset()
        550 type_assignments = self._type_ast.predict(dset)
    --> 551 type_assignments.columns = self._type_dset.get_classes() + ['Other'] # predict(dset) returned DataFrame but here we expect dset to be SCDataset
        552 type_assignments.index = dset.index
        553 return type_assignments

    AttributeError: 'NoneType' object has no attribute 'get_classes'

This may be because _type_dset is not saved in the .hdf5 file written by ast.save_model(). When the model is loaded with ast.load_model(), _type_dset is initialized to None (line 70).
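If that diagnosis is right, one possible fix is to write the cell-type names into the saved file and read them back when loading. The sketch below assumes the model file is an HDF5 file that can be opened with h5py; the helpers save_cell_types / load_cell_types and the "cell_types" attribute are hypothetical and not part of astir's current file layout.

```python
import json
import os
import tempfile

import h5py


def save_cell_types(path: str, cell_types: list) -> None:
    """Hypothetical helper: record the cell-type names as a file-level attribute."""
    # Mode "a" opens an existing file (or creates it) without truncating it.
    with h5py.File(path, "a") as f:
        f.attrs["cell_types"] = json.dumps(list(cell_types))


def load_cell_types(path: str):
    """Hypothetical helper: return the stored names, or None for older files."""
    with h5py.File(path, "r") as f:
        if "cell_types" not in f.attrs:
            return None
        return json.loads(f.attrs["cell_types"])


path = os.path.join(tempfile.mkdtemp(), "trained_model.hdf5")
save_cell_types(path, ["Stromal", "Macrophage", "T cells"])
restored = load_cell_types(path)
```

load_model() could then take the column names from the stored list instead of relying on _type_dset, and fall back to requiring expected_cell_types when an older file has no such attribute.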
