TypeError when writing table column with mixed string / NaN #399
Hi! Thanks for reporting. I have just commented on #298, where you also mentioned the typing problem in the presence of NaN values. As you observed, the reported bug is a consequence of the single-table design. @melonora is going to work on multiple tables, while @giovp and I will work on the disk and in-memory representation. Therefore, if it is viable for you, I would suggest a workaround such as using a placeholder value for "NaN" values so that the column type is not affected. The plan is to have the table implemented (or in a good state) by the end of the year. If this does not work for you, please let me know and I can try to find a better fix for this bug.
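A minimal sketch of the placeholder workaround suggested above, assuming the table's `obs` is a pandas DataFrame. The helper `fill_nan_in_str_columns` and the placeholder string `"unknown"` are hypothetical; any sentinel that never occurs as real data would do:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder; pick a value that never occurs as real data.
PLACEHOLDER = "unknown"


def fill_nan_in_str_columns(obs: pd.DataFrame, placeholder: str = PLACEHOLDER) -> pd.DataFrame:
    """Replace NaN in object (string) columns so that every value is a str."""
    for column in obs.select_dtypes(include=[object]).columns:
        obs[column] = obs[column].fillna(placeholder)
    return obs


# Mixed str/NaN columns are stored with object dtype.
obs = pd.DataFrame(
    {
        "label": pd.Series(["a", np.nan, "b"], dtype=object),
        "value": [1.0, np.nan, 3.0],
    }
)
fill_nan_in_str_columns(obs)
print(obs["label"].tolist())  # ['a', 'unknown', 'b']
print(obs["value"].isna().sum())  # 1 (numeric columns are left untouched)
```

Only object columns are touched, so NaN in numeric columns keeps its usual meaning.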
Thanks! I was considering creating a simple fix, but since the tables are undergoing major changes, it's probably better not to interfere with that. I have a workaround, so for me the issue is not urgent. For reference, for anyone with the same issue:

```python
import pandas as pd


def workaround_spatialdata_nan_in_str_columns(obs: pd.DataFrame) -> pd.DataFrame:
    # When writing a SpatialData table (AnnData) which contains np.nan for missing/unspecified
    # values in string columns, it raises "expected unicode string, found nan".
    # See https://github.com/scverse/spatialdata/issues/399
    # Converting object (string) columns to categoricals stores NaN as a missing category code.
    for column in obs.select_dtypes(include=[object]).columns:
        obs[column] = obs[column].astype("category")
    return obs
```
Thanks for sharing the workaround!
When writing a SpatialData object whose table has a column containing both string values and empty (NaN) values, an error is raised.
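One way such a column can arise, sketched in plain pandas. The per-region tables `obs_region1` and `obs_region2` and their values are hypothetical; only the column name `column_only_region1` comes from this report. Concatenating tables where only one region defines a column fills the other region's rows with NaN, leaving an object column that mixes `str` and `float` values:

```python
import pandas as pd

# Hypothetical per-region annotation tables; only region 1 has the extra column.
obs_region1 = pd.DataFrame(
    {
        "region": ["region1", "region1"],
        "column_only_region1": pd.Series(["x", "y"], dtype=object),
    }
)
obs_region2 = pd.DataFrame({"region": ["region2", "region2"]})

# Rows from region 2 get NaN in the column that only region 1 defines.
obs = pd.concat([obs_region1, obs_region2], ignore_index=True)

col = obs["column_only_region1"]
print(col.dtype)  # object
print([type(v).__name__ for v in col])  # ['str', 'str', 'float', 'float']
```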
Empty values are a very common use case. They are also a big problem for integer columns, because integers have no NaN representation: pandas silently changes the dtype to float. Whoever then wants to read values from the supposed integer column must convert them back to int as a precaution to avoid follow-up errors.
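A minimal pandas illustration of the silent upcast (the `counts` column is hypothetical): adding a missing row to an integer column turns it into `float64`, and recovering integers then requires removing or filling the missing values first:

```python
import pandas as pd

# A hypothetical integer column without missing values keeps an integer dtype.
counts = pd.Series([3, 5, 7], name="counts")
print(counts.dtype)  # int64

# Adding a row with no value (here via reindex) silently upcasts to float64,
# because NumPy integer dtypes cannot represent NaN.
counts_missing = counts.reindex([0, 1, 2, 3])
print(counts_missing.dtype)  # float64

# Readers must convert back defensively; missing values have to go first,
# since casting NaN to int raises an error.
restored = counts_missing.dropna().astype(int)
print(restored.tolist())  # [3, 5, 7]
```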
Example
Backtrace
This is not directly caused by anndata. Interestingly, anndata modifies the table so that the column "column_only_region1" gets dtype "category" instead of object, and then the error no longer occurs.
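The same conversion can be reproduced in plain pandas; it is also what the workaround posted in this thread does. In a categorical column, missing entries are stored as the code `-1` rather than as a `float` NaN mixed in among the strings. The column values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing strings with NaN (e.g. rows from another region).
col = pd.Series(["x", "y", np.nan, np.nan], dtype=object, name="column_only_region1")

as_category = col.astype("category")
print(as_category.dtype)  # category
print(list(as_category.cat.categories))  # ['x', 'y'] (NaN is not a category)
print(as_category.cat.codes.tolist())  # [0, 1, -1, -1] (-1 marks a missing value)
```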