Stored in memory and saved data are not the same #3236

Closed
philmar1 opened this issue Oct 27, 2023 · 9 comments
Labels: Community (Issue/PR opened by the open-source community)

Comments

@philmar1
Hello,

I am working with an AnnData object (https://anndata.readthedocs.io/en/latest/). I noticed that when I run nodes without saving the intermediate data, the shape of the output of node i does not match the shape of the input of node i+1, even though the output of one node is the input of the next.

This causes an error while processing the data in node i+1. However, the issue disappears when I register the intermediate output (sc_filtered) in the catalog.

I attached the Kedro pipeline. The error appears at node "add_phase", where "sc_labeled" should have 23819 rows, just like "sc_filtered", but it actually has 25060. The function between the two, "add_cell_perturbation_type", doesn't change the shape of the data (see the function below).

def add_cell_perturbation_type(adata, control_cells):
    """Add label for each cell. Possible values are: 
        - control (infected by NO-TARGET sgRNA)
        - infected (by at least one guide other than NO-TARGET sgRNA)
        - not infected
    """
    adata.obs["perturbation"] = np.nan
    adata.obs[adata.obs.index.isin(control_cells)]["perturbation"] = "control"
    adata.obs[adata.obs['infected'] == False]["perturbation"] = "not infected"
    adata.obs["perturbation"].fillna("infected", inplace=True)
    return adata
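
As a side note on the function above: the two assignments of the form `adata.obs[mask]["perturbation"] = ...` index twice, so pandas most likely writes into a temporary copy and the labels never reach `adata.obs` (every cell would then end up as "infected"). The sketch below shows the same labelling done with a single `.loc` call. It is only an illustration under stated assumptions: `label_perturbation` and the toy data are hypothetical names that are not part of the project, and this change is not claimed to explain the shape mismatch itself.

```python
import pandas as pd


def label_perturbation(obs: pd.DataFrame, control_cells) -> pd.DataFrame:
    """Return a copy of `obs` with a 'perturbation' column filled in.

    Hypothetical helper for illustration; it expects a boolean 'infected' column.
    """
    obs = obs.copy()
    # Default label; overwritten below for control and non-infected cells.
    obs["perturbation"] = "infected"
    # A single .loc call selects rows and column together, so the assignment
    # writes into `obs` itself rather than into a temporary copy.
    obs.loc[obs.index.isin(control_cells), "perturbation"] = "control"
    obs.loc[~obs["infected"].astype(bool), "perturbation"] = "not infected"
    return obs


# Toy data, not taken from the issue.
toy_obs = pd.DataFrame(
    {"infected": [True, True, False]},
    index=["cell_a", "cell_b", "cell_c"],
)
labeled = label_perturbation(toy_obs, control_cells=["cell_a"])
print(labeled["perturbation"].tolist())
```

Inside the node, the result could be assigned back with something like `adata.obs = label_perturbation(adata.obs, control_cells)`, assuming `adata.obs` has the same number of rows as `adata.X` at that point.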

[Attached image: kedro-pipeline]

Here is the complete error:

Running node: filter_cells: filter_cells([sc_w_targeted_genes,efficient_perturbed_cells,control_cells,not_infected_cells]) -> [sc_filtered]                                                                                                     node.py:331
                    INFO     Keeping only cells in being efficient (includes control), control or not infected in adata AnnData object with n_obs × n_vars = 25060 × 32630                                                                                                  nodes.py:186
                                 obs: 'Dataset', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'guides', 'num_guides', 'num_guides_clip', 'infected', 'targeted_genes', 'num_targeted_genes'                                                      
                                 var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells'                                                                                                                                                      
                                 uns: 'log1p'                                                                                                                                                                                                                                           
                                 layers: 'CPM', 'logCPM' ...                                                                                                                                                                                                                            
                    INFO     Found 10967 efficient cells, 260 control cells and 12852 not infected cells                                                                                                                                                                    nodes.py:187
                    INFO     Keeping 23819/25060 cells                                                                                                                                                                                                                      nodes.py:188
[10/27/23 14:42:12] INFO     Output adata.obs shape: (23819, 12)                                                                                                                                                                                                                         nodes.py:191
                    INFO     Saving data to 'sc_filtered' (MemoryDataset)...                                                                                                                                                                                         data_catalog.py:384
                    INFO     Completed 5 out of 9 tasks                                                                                                                                                                                                          sequential_runner.py:85
                    INFO     Loading data from 'sc_filtered' (MemoryDataset)...                                                                                                                                                                                      data_catalog.py:345
[10/27/23 14:42:13] INFO     Loading data from 'control_cells' (MemoryDataset)...                                                                                                                                                                                    data_catalog.py:345
                    INFO     Running node: add_cell_perturbation_type: add_cell_perturbation_type([sc_filtered,control_cells]) -> [sc_labeled]                                                                                                                               node.py:331
                    INFO     Input adata.obs shape: (25060, 12)                                                                                                                                                                                                                                    nodes.py:200
                    WARNING  /Users/philippemartin/Documents/Curie/Projets/PerturbSeq/kedro_style/src/perturbseq_kedro_project/pipelines/preprocessing/nodes.py:202: SettingWithCopyWarning:                                                                             warnings.py:109
                             A value is trying to be set on a copy of a slice from a DataFrame.                                                                                                                                                                                         
                             Try using .loc[row_indexer,col_indexer] = value instead                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                        
                             See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy                                                                                                                 
                               adata.obs[adata.obs.index.isin(control_cells)]["perturbation"] = "control"                                                                                                                                                                               
                                                                                                                                                                                                                                                                                        
                    WARNING  /Users/philippemartin/Documents/Curie/Projets/PerturbSeq/kedro_style/src/perturbseq_kedro_project/pipelines/preprocessing/nodes.py:203: SettingWithCopyWarning:                                                                             warnings.py:109
                             A value is trying to be set on a copy of a slice from a DataFrame.                                                                                                                                                                                         
                             Try using .loc[row_indexer,col_indexer] = value instead                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                        
                             See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy                                                                                                                 
                               adata.obs[adata.obs['infected'] == False]["perturbation"] = "not infected"                                                                                                                                                                               
                                                                                                                                                                                                                                                                                        
                    INFO     Output adata.obs.shape: (25060, 13)                                                                                                                                                                                                                                    nodes.py:205
                    INFO     Saving data to 'sc_labeled' (MemoryDataset)...                                                                                                                                                                                          data_catalog.py:384
[10/27/23 14:42:14] INFO     Completed 6 out of 9 tasks                                                                                                                                                                                                          sequential_runner.py:85
                    INFO     Loading data from 'sc_labeled' (MemoryDataset)...                                                                                                                                                                                       data_catalog.py:345
[10/27/23 14:42:18] INFO     Loading data from 's_genes' (MemoryDataset)...                                                                                                                                                                                          data_catalog.py:345
                    INFO     Loading data from 'g2m_genes' (MemoryDataset)...                                                                                                                                                                                        data_catalog.py:345
                    INFO     Running node: add_phase: add_phase([sc_labeled,s_genes,g2m_genes]) -> [sc_w_phase]                                                                                                                                                              node.py:331
                    INFO     View of AnnData object with n_obs × n_vars = 23819 × 32630                                                                                                                                                                                      nodes.py:96
                                 obs: 'Dataset', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'guides', 'num_guides', 'num_guides_clip', 'infected', 'targeted_genes', 'num_targeted_genes', 'perturbation'                                      
                                 var: 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'n_cells'                                                                                                                                                      
                                 uns: 'log1p'                                                                                                                                                                                                                                           
                                 layers: 'CPM', 'logCPM'                                                                                                                                                                                                                                
[10/27/23 14:42:19] INFO     adata.X.shape: (23819, 32630)                                                                                                                                                                                                                                  nodes.py:97
                    INFO     adata.obs.shape: (25060, 13)                                                                                                                                                                                                                                     nodes.py:98
[10/27/23 14:42:22] ERROR    Node 'add_phase: add_phase([sc_labeled,s_genes,g2m_genes]) -> [sc_w_phase]' failed with error:                                                                                                                                                  node.py:356
                             Observations annot. `obs` must have number of rows of `X` (23819), but has 25060 rows.                                                                                                                                                                     
                    WARNING  There are 3 nodes that have not run.                                                                                                                                                                                                          runner.py:206
                             You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command:                                                                                                                    
                               --from-nodes "get_efficient_targeted_genes_and_cells,get_control_cells,get_not_infected_cells"                                                                                                                                                           
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/philippemartin/miniconda3/envs/perturb-seq/bin/kedro:8 in <module>                        │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/framework/c │
│ li/cli.py:211 in main                                                                            │
│                                                                                                  │
│   208 │   """                                                                                    │
│   209 │   _init_plugins()                                                                        │
│   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 211 │   cli_collection()                                                                       │
│   212                                                                                            │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/click/core.py:115 │
│ 7 in __call__                                                                                    │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/framework/c │
│ li/cli.py:139 in main                                                                            │
│                                                                                                  │
│   136 │   │   )                                                                                  │
│   137 │   │                                                                                      │
│   138 │   │   try:                                                                               │
│ ❱ 139 │   │   │   super().main(                                                                  │
│   140 │   │   │   │   args=args,                                                                 │
│   141 │   │   │   │   prog_name=prog_name,                                                       │
│   142 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/click/core.py:107 │
│ 8 in main                                                                                        │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/click/core.py:168 │
│ 8 in invoke                                                                                      │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/click/core.py:143 │
│ 4 in invoke                                                                                      │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/click/core.py:783 │
│ in invoke                                                                                        │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/framework/c │
│ li/project.py:459 in run                                                                         │
│                                                                                                  │
│   456 │   with KedroSession.create(                                                              │
│   457 │   │   env=env, conf_source=conf_source, extra_params=params                              │
│   458 │   ) as session:                                                                          │
│ ❱ 459 │   │   session.run(                                                                       │
│   460 │   │   │   tags=tag,                                                                      │
│   461 │   │   │   runner=runner(is_async=is_async),                                              │
│   462 │   │   │   node_names=node_names,                                                         │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/framework/s │
│ ession/session.py:425 in run                                                                     │
│                                                                                                  │
│   422 │   │   )                                                                                  │
│   423 │   │                                                                                      │
│   424 │   │   try:                                                                               │
│ ❱ 425 │   │   │   run_result = runner.run(                                                       │
│   426 │   │   │   │   filtered_pipeline, catalog, hook_manager, session_id                       │
│   427 │   │   │   )                                                                              │
│   428 │   │   │   self._run_called = True                                                        │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/runn │
│ er.py:92 in run                                                                                  │
│                                                                                                  │
│    89 │   │   │   self._logger.info(                                                             │
│    90 │   │   │   │   "Asynchronous mode is enabled for loading and saving data"                 │
│    91 │   │   │   )                                                                              │
│ ❱  92 │   │   self._run(pipeline, catalog, hook_manager, session_id)                             │
│    93 │   │                                                                                      │
│    94 │   │   self._logger.info("Pipeline execution completed successfully.")                    │
│    95                                                                                            │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/sequ │
│ ential_runner.py:70 in _run                                                                      │
│                                                                                                  │
│   67 │   │                                                                                       │
│   68 │   │   for exec_index, node in enumerate(nodes):                                           │
│   69 │   │   │   try:                                                                            │
│ ❱ 70 │   │   │   │   run_node(node, catalog, hook_manager, self._is_async, session_id)           │
│   71 │   │   │   │   done_nodes.add(node)                                                        │
│   72 │   │   │   except Exception:                                                               │
│   73 │   │   │   │   self._suggest_resume_scenario(pipeline, done_nodes, catalog)                │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/runn │
│ er.py:320 in run_node                                                                            │
│                                                                                                  │
│   317 │   if is_async:                                                                           │
│   318 │   │   node = _run_node_async(node, catalog, hook_manager, session_id)                    │
│   319 │   else:                                                                                  │
│ ❱ 320 │   │   node = _run_node_sequential(node, catalog, hook_manager, session_id)               │
│   321 │                                                                                          │
│   322 │   for name in node.confirms:                                                             │
│   323 │   │   catalog.confirm(name)                                                              │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/runn │
│ er.py:416 in _run_node_sequential                                                                │
│                                                                                                  │
│   413 │   )                                                                                      │
│   414 │   inputs.update(additional_inputs)                                                       │
│   415 │                                                                                          │
│ ❱ 416 │   outputs = _call_node_run(                                                              │
│   417 │   │   node, catalog, inputs, is_async, hook_manager, session_id=session_id               │
│   418 │   )                                                                                      │
│   419                                                                                            │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/runn │
│ er.py:382 in _call_node_run                                                                      │
│                                                                                                  │
│   379 │   │   │   is_async=is_async,                                                             │
│   380 │   │   │   session_id=session_id,                                                         │
│   381 │   │   )                                                                                  │
│ ❱ 382 │   │   raise exc                                                                          │
│   383 │   hook_manager.hook.after_node_run(                                                      │
│   384 │   │   node=node,                                                                         │
│   385 │   │   catalog=catalog,                                                                   │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/runner/runn │
│ er.py:372 in _call_node_run                                                                      │
│                                                                                                  │
│   369 ) -> dict[str, Any]:                                                                       │
│   370 │   # pylint: disable=too-many-arguments                                                   │
│   371 │   try:                                                                                   │
│ ❱ 372 │   │   outputs = node.run(inputs)                                                         │
│   373 │   except Exception as exc:                                                               │
│   374 │   │   hook_manager.hook.on_node_error(                                                   │
│   375 │   │   │   error=exc,                                                                     │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/pipeline/no │
│ de.py:357 in run                                                                                 │
│                                                                                                  │
│   354 │   │   # purposely catch all exceptions                                                   │
│   355 │   │   except Exception as exc:                                                           │
│   356 │   │   │   self._logger.error("Node '%s' failed with error: \n%s", str(self), str(exc))   │
│ ❱ 357 │   │   │   raise exc                                                                      │
│   358 │                                                                                          │
│   359 │   def _run_with_no_inputs(self, inputs: dict[str, Any]):                                 │
│   360 │   │   if inputs:                                                                         │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/pipeline/no │
│ de.py:348 in run                                                                                 │
│                                                                                                  │
│   345 │   │   │   elif isinstance(self._inputs, str):                                            │
│   346 │   │   │   │   outputs = self._run_with_one_input(inputs, self._inputs)                   │
│   347 │   │   │   elif isinstance(self._inputs, list):                                           │
│ ❱ 348 │   │   │   │   outputs = self._run_with_list(inputs, self._inputs)                        │
│   349 │   │   │   elif isinstance(self._inputs, dict):                                           │
│   350 │   │   │   │   outputs = self._run_with_dict(inputs, self._inputs)                        │
│   351                                                                                            │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/kedro/pipeline/no │
│ de.py:388 in _run_with_list                                                                      │
│                                                                                                  │
│   385 │   │   │   │   f"{sorted(inputs.keys())}."                                                │
│   386 │   │   │   )                                                                              │
│   387 │   │   # Ensure the function gets the inputs in the correct order                         │
│ ❱ 388 │   │   return self._func(*(inputs[item] for item in node_inputs))                         │
│   389 │                                                                                          │
│   390 │   def _run_with_dict(self, inputs: dict[str, Any], node_inputs: dict[str, str]):         │
│   391 │   │   # Node inputs and provided run inputs should completely overlap                    │
│                                                                                                  │
│ /Users/philippemartin/Documents/Curie/Projets/PerturbSeq/kedro_style/src/perturbseq_kedro_projec │
│ t/pipelines/preprocessing/nodes.py:99 in add_phase                                               │
│                                                                                                  │
│    96 │   logger.info(adata)                                                                     │
│    97 │   logger.info(adata.X.shape)                                                             │
│    98 │   logger.info(adata.obs.shape)                                                           │
│ ❱  99 │   temp_adata = adata.copy()                                                              │
│   100 │   temp_adata.X = adata.layers["logCPM"]                                                  │
│   101 │                                                                                          │
│   102 │   sc.tl.score_genes_cell_cycle(temp_adata, s_genes=s_genes, g2m_genes=g2m_genes)         │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/anndata/_core/ann │
│ data.py:1513 in copy                                                                             │
│                                                                                                  │
│   1510 │   │   │   │   # Subsetting this way means we don’t have to have a view type             │
│   1511 │   │   │   │   # defined for the matrix, which is needed for some of the                 │
│   1512 │   │   │   │   # current distributed backend. Specifically Dask.                         │
│ ❱ 1513 │   │   │   │   return self._mutated_copy(                                                │
│   1514 │   │   │   │   │   X=_subset(self._adata_ref.X, (self._oidx, self._vidx)).copy()         │
│   1515 │   │   │   │   )                                                                         │
│   1516 │   │   │   else:                                                                         │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/anndata/_core/ann │
│ data.py:1458 in _mutated_copy                                                                    │
│                                                                                                  │
│   1455 │   │   │   new["raw"] = kwargs["raw"]                                                    │
│   1456 │   │   elif self.raw is not None:                                                        │
│   1457 │   │   │   new["raw"] = self.raw.copy()                                                  │
│ ❱ 1458 │   │   return AnnData(**new)                                                             │
│   1459 │                                                                                         │
│   1460 │   def to_memory(self, copy=False) -> "AnnData":                                         │
│   1461 │   │   """Return a new AnnData object with all backed arrays loaded into memory.         │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/anndata/_core/ann │
│ data.py:285 in __init__                                                                          │
│                                                                                                  │
│    282 │   │   │   │   raise ValueError("`X` has to be an AnnData object.")                      │
│    283 │   │   │   self._init_as_view(X, oidx, vidx)                                             │
│    284 │   │   else:                                                                             │
│ ❱  285 │   │   │   self._init_as_actual(                                                         │
│    286 │   │   │   │   X=X,                                                                      │
│    287 │   │   │   │   obs=obs,                                                                  │
│    288 │   │   │   │   var=var,                                                                  │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/anndata/_core/ann │
│ data.py:505 in _init_as_actual                                                                   │
│                                                                                                  │
│    502 │   │   # Backwards compat for connectivities matrices in uns["neighbors"]                │
│    503 │   │   _move_adj_mtx({"uns": self._uns, "obsp": self._obsp})                             │
│    504 │   │                                                                                     │
│ ❱  505 │   │   self._check_dimensions()                                                          │
│    506 │   │   self._check_uniqueness()                                                          │
│    507 │   │                                                                                     │
│    508 │   │   if self.filename:                                                                 │
│                                                                                                  │
│ /Users/philippemartin/miniconda3/envs/perturb-seq/lib/python3.10/site-packages/anndata/_core/ann │
│ data.py:1845 in _check_dimensions                                                                │
│                                                                                                  │
│   1842 │   │   else:                                                                             │
│   1843 │   │   │   key = {key}                                                                   │
│   1844 │   │   if "obs" in key and len(self._obs) != self._n_obs:                                │
│ ❱ 1845 │   │   │   raise ValueError(                                                             │
│   1846 │   │   │   │   "Observations annot. `obs` must have number of rows of `X`"               │
│   1847 │   │   │   │   f" ({self._n_obs}), but has {self._obs.shape[0]} rows."                   │
│   1848 │   │   │   )                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

ValueError: Observations annot. obs must have number of rows of X (23819), but has 25060 rows.

@astrojuanlu
Member

Hi @philmar1 , thanks a lot for reporting and sorry you're experiencing trouble. The full traceback will help us understand the issue, but is there a way you can narrow the problem down a bit and share a toy input dataset we can use?

@astrojuanlu added the Community label Oct 27, 2023
@astrojuanlu
Member

I'm noting this by the way:

WARNING  /Users/philippemartin/Documents/Curie/Projets/PerturbSeq/kedro_style/src/perturbseq_kedro_project/pipelines/preprocessing/nodes.py:203: SettingWithCopyWarning:  warnings.py:109
         A value is trying to be set on a copy of a slice from a DataFrame.
         Try using .loc[row_indexer,col_indexer] = value instead

         See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
           adata.obs[adata.obs['infected'] == False]["perturbation"] = "not infected"

Can you try addressing those warnings and run the pipeline again?

@astrojuanlu
Member

It would need to be

adata.obs.loc[adata.obs['infected'] == False, "perturbation"] = "not infected"

rather than adata.obs[adata.obs['infected'] == False]["perturbation"] = "not infected". The chained form assigns to a temporary copy, so adata.obs itself is left unchanged.
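
To make the difference concrete, here is a minimal sketch on a plain pandas DataFrame standing in for adata.obs. The cell names and the toy `infected` column are made up for illustration, and using "infected" as the starting value (instead of NaN followed by fillna) is my own simplification. It only shows why the warning matters; it does not demonstrate that this is the cause of the shape mismatch.

```python
import warnings

import numpy as np
import pandas as pd

# Toy stand-in for adata.obs (hypothetical cells, not the real data)
obs = pd.DataFrame(
    {"infected": [True, False, True, False]},
    index=["cell_a", "cell_b", "cell_c", "cell_d"],
)
control_cells = ["cell_a"]

# Chained indexing: the boolean mask returns a copy, so the assignment is lost
chained = obs.copy()
chained["perturbation"] = np.nan
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    chained[chained["infected"] == False]["perturbation"] = "not infected"

# .loc with (row mask, column) writes into the DataFrame itself
fixed = obs.copy()
fixed["perturbation"] = "infected"  # default label, replaces the NaN + fillna step
fixed.loc[fixed.index.isin(control_cells), "perturbation"] = "control"
fixed.loc[~fixed["infected"], "perturbation"] = "not infected"

print(chained["perturbation"].isna().all())
print(fixed["perturbation"].tolist())
```

In the chained version the "perturbation" column stays entirely NaN, so a later fillna("infected") would label every cell as infected.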

@merelcht
Member

Hi @philmar1 are you still facing issues after the suggestions @astrojuanlu made?

@merelcht
Member

I'm closing this for now. If you @philmar1 or anyone else needs help to solve the above issue, feel free to re-open it again.

@merelcht closed this as not planned Mar 19, 2024
@philmar1
Author

Hi Merelcht,

I have not worked on this for a while, and I somehow managed to work around that issue.

I still have a general question though. I'm running several successive nodes and sometimes the run is just killed. I know that it comes from RAM usage, because the pipeline runs fine on a subsample of the data.
I believe the intermediate outputs are stored in MemoryDataset objects until the end of the kedro run. Is that assumption correct, or are intermediate outputs stored in MemoryDatasets removed dynamically once they are no longer needed? For instance, if "outputN" is required only by node N+1, will it be removed once node N+1 has finished?

Thanks a lot for your answer

@noklam
Contributor

noklam commented Mar 20, 2024 via email

@merelcht
Member

Hi @philmar1! Glad to hear your issue was solved 🙂 What @noklam says is correct. In our runners we have logic to release a dataset as soon as it's not needed anymore in the rest of the pipeline. See e.g.:

https://github.com/kedro-org/kedro/blob/main/kedro/runner/sequential_runner.py#L81-L88
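
As a rough illustration of that idea (a simplified sketch in plain Python, not Kedro's actual code), the runner can count how many nodes still need each dataset and drop it once the count reaches zero. The `run` helper, the `store` dict and the node names "filter" and "label" are hypothetical; only the dataset names sc_filtered and sc_labeled and the add_phase node come from this thread.

```python
from collections import Counter
from itertools import chain

# Hypothetical pipeline: (node name, input datasets, output datasets)
nodes = [
    ("filter", ["raw"], ["sc_filtered"]),
    ("label", ["sc_filtered"], ["sc_labeled"]),
    ("add_phase", ["sc_labeled"], ["sc_phased"]),
]
pipeline_inputs = {"raw"}


def run(nodes, pipeline_inputs):
    """Run nodes in order and release intermediates that are no longer needed."""
    store = {name: f"<{name} data>" for name in pipeline_inputs}
    # How many times each dataset will still be loaded by a downstream node
    load_counts = Counter(chain.from_iterable(inputs for _, inputs, _ in nodes))
    released = []
    for name, inputs, outputs in nodes:
        for out in outputs:
            store[out] = f"<{out} data>"  # stand-in for the node's real output
        for dataset in inputs:
            load_counts[dataset] -= 1
            # Release once no remaining node needs it; keep the pipeline's own inputs
            if load_counts[dataset] < 1 and dataset not in pipeline_inputs:
                del store[dataset]
                released.append((name, dataset))
    return store, released


store, released = run(nodes, pipeline_inputs)
print(released)
print(sorted(store))
```

In this sketch sc_filtered is dropped right after the node that consumes it, which matches the "outputN is only needed by node N+1" case asked about above.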

@noklam
Contributor

noklam commented Mar 21, 2024

To clarify, the runner code above only affects datasets that implement the _release method. In most cases, the Dataset object itself doesn't hold the data (CachedDataset being the exception), so the data is released as soon as it goes out of the function's scope, which is handled purely by the Python GC.
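
A small sketch of the GC part, assuming CPython's reference counting. The `Payload` class and the `producer`, `consumer` and `run_two_nodes` functions are hypothetical stand-ins for node functions and their data, not Kedro APIs.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large in-memory object passed between two nodes."""

    def __init__(self, n):
        self.values = list(range(n))


def producer():
    return Payload(1000)


def consumer(data):
    return len(data.values)


def run_two_nodes():
    data = producer()
    probe = weakref.ref(data)  # lets us observe the object without keeping it alive
    size = consumer(data)
    return size, probe  # 'data' goes out of scope when this function returns


size, probe = run_two_nodes()
gc.collect()  # not required under CPython refcounting, added for safety
print(size, probe() is None)
```

Once nothing references the object any more, the weak reference is dead, meaning the memory has been handed back without any explicit release call.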

Related:
