[BUG] PT: `display_if_exist` is blocking #3991

njzjz · 2024-07-17T23:39:51Z

Bug summary

The profiler shows that `cudaStreamSynchronize` is called in `display_if_exist`.

In `display_if_exist`, `find_property` is expected to be a float. However, it is actually `tensor(1., device='cuda:0')`, a float32 tensor on the GPU. Calling `bool()` on it copies the value back to the host, which causes the synchronization.

deepmd-kit/deepmd/pt/loss/loss.py

Lines 32 to 43 in 0c0878e

    
    @staticmethod
    def display_if_exist(loss: torch.Tensor, find_property: float) -> torch.Tensor:
        """Display NaN if labeled property is not found.

        Parameters
        ----------
        loss : torch.Tensor
            the loss tensor
        find_property : float
            whether the property is found
        """
        return loss if bool(find_property) else torch.nan
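
One way to catch this kind of mismatch early is to check the flag's type before branching on it. The following is only a sketch under stated assumptions, not DeePMD-kit code: the standalone `display_if_exist` below is a hypothetical, torch-free variant, and it assumes that device tensors are not registered as `numbers.Real`, so they would be rejected instead of being silently converted with `bool()`.

```python
import math
from numbers import Real


def display_if_exist(loss, find_property):
    """Return the loss if the labeled property exists, NaN otherwise.

    Hypothetical variant: rejects any flag that is not a plain real
    number, so a device tensor cannot reach the `if` and trigger an
    implicit device-to-host copy.
    """
    if not isinstance(find_property, Real):
        raise TypeError(
            f"find_property must be a plain number, got {type(find_property).__name__}"
        )
    return loss if find_property else math.nan
```

With a guard like this, passing a tensor as `find_property` would fail loudly at the call site rather than show up later as a synchronization in the profiler.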

DeePMD-kit Version

0c0878e

Backend and its version

PyTorch 2.3.1

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

Use examples/water/se_atten_compressible to debug.

Steps to Reproduce

cd examples/water/se_atten_compressible
dp --pt train input.json

Further Information, Files, and Links

No response


Make the 'find_' entries floats in get data; fixes #3991. On my device, the profiler indicates that `cudaStreamSynchronize` takes negligible time, so the speedup is minimal.

## Summary by CodeRabbit

- **New Features**
  - Enhanced data loading by adding a `collate_fn` parameter for more flexible data collation.
  - Improved data filtering by excluding keys containing "find_" in addition to existing filters.
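
The idea of keeping the "find_" flags as plain floats and filtering them out of the data that is moved to the device can be sketched as follows. This is an illustration only, assuming a batch is a plain `dict`; the helper `split_flags` and the key names are hypothetical and are not taken from the linked pull request.

```python
def split_flags(batch: dict) -> tuple[dict, dict]:
    """Separate "find_" flags from the entries meant for the device.

    Flags are converted to Python floats once, on the host, so a later
    `bool(flag)` in the loss never has to read from the GPU.
    """
    flags = {}
    payload = {}
    for key, value in batch.items():
        if "find_" in key:
            flags[key] = float(value)  # host-side scalar
        else:
            payload[key] = value  # would be moved to the device later
    return payload, flags


# Hypothetical batch: one label plus two existence flags.
batch = {"energy": [1.0, 2.0], "find_energy": 1, "find_force": 0.0}
payload, flags = split_flags(batch)
```

Only `payload` would then be transferred to the GPU, while `flags` stays on the host and can be passed to `display_if_exist` directly.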


njzjz added the bug label Jul 17, 2024

njzjz changed the title ~~[BUG] display_if_exist is blocking~~ [BUG] PT: display_if_exist is blocking Jul 17, 2024

iProzd mentioned this issue Jul 18, 2024

fix(pt): make 'find_' to be float in get data #3992

Merged

njzjz linked a pull request Jul 18, 2024 that will close this issue

fix(pt): make 'find_' to be float in get data #3992

Merged

njzjz closed this as completed Jul 18, 2024
