Went into (Pdb) mode when using 'exact' as inner_ot_method #4

Open
ZeratuuLL opened this issue May 6, 2021 · 2 comments

@ZeratuuLL

I ran into (Pdb) mode while testing the code with inner_ot_method='exact'. It does not happen with inner_ot_method='gaussian_approx'. I basically just followed the vanilla example. Here is my code:

from otdd.pytorch.distance import DatasetDistance

dist = DatasetDistance(dataset1, dataset2,
                       inner_ot_method='exact',
                       debiased_loss=True,
                       p=2, entreg=1e-1,
                       device='cpu')
d = dist.distance(maxsamples=1000)

Both dataset1 and dataset2 are created by TensorDataset.

And below is the output from the program.

Computing label-to-label distances:   0%|          | 0/4 [00:00<?, ?it/s]
> mypath/envs/OTDD/lib/python3.6/site-packages/otdd/pytorch/wasserstein.py(338)pwdist_exact()
-> if symmetric:
(Pdb) True
True
(Pdb) False
False
(Pdb) 1
1
(Pdb) 0
0
(Pdb) 

The only way I can leave (Pdb) is with quit, which also terminates the program and returns an error:

Traceback (most recent call last):                                                                                                                                                                                                                                                           
  File "otdd_distance.py", line 63, in <module>
    d = dist.distance(maxsamples = 1000)
  File "mypath/envs/OTDD/lib/python3.6/site-packages/otdd/pytorch/distance.py", line 595, in distance
    _ = self._get_label_distances()
  File "mypath/envs/OTDD/lib/python3.6/site-packages/otdd/pytorch/distance.py", line 529, in _get_label_distances
    DYY12 = pwdist(self.X1,self.Y1,self.X2, self.Y2)
  File "mypath/envs/OTDD/lib/python3.6/site-packages/otdd/pytorch/wasserstein.py", line 338, in pwdist_exact
    if symmetric:
  File "mypath/envs/OTDD/lib/python3.6/site-packages/otdd/pytorch/wasserstein.py", line 338, in pwdist_exact
    if symmetric:
  File "mypath/envs/OTDD/lib/python3.6/bdb.py", line 51, in trace_dispatch
    return self.dispatch_line(frame)
  File "mypath/envs/OTDD/lib/python3.6/bdb.py", line 70, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

About my environment:
A new environment with conda create -n name python=3.6
Pytorch installed by conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch
Then I followed the instructions in this repository

python -m pip install -r requirements.txt
python -m pip install .

Please let me know how to solve this problem. Thank you very much!

@dmelis
Contributor

dmelis commented May 6, 2021

Hi. Can you tell me which pdb.set_trace() you're hitting? You can check by typing l at the (Pdb) prompt. I suspect it's the one at the end of pwdist_exact. I also pushed a fix that should print an error message and abort instead of calling set_trace().

Now, debugging the cause is a bit tricky without more details about the data you're running this on. That error typically indicates that you're running into memory issues. Can you provide some statistics on your datasets? E.g., feature dimension, number of classes, maximum number of instances per class, and the hardware you're running this on.
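
For anyone who needs to report the same statistics, here is a minimal sketch of how they could be collected. The helper name summarize is hypothetical and not part of otdd; it assumes 2-D features (one row per sample) and integer class labels, and it uses only the standard library so it runs on plain lists as well as tensors.

```python
from collections import Counter


def summarize(features, labels):
    """Return basic statistics for a labelled dataset.

    `features` is any sequence of equal-length rows; `labels` is any
    iterable of hashable class ids. (Hypothetical helper, not part of otdd.)
    """
    counts = Counter(labels)
    return {
        "n_samples": len(features),
        "feature_dim": len(features[0]),
        "n_classes": len(counts),
        "max_per_class": max(counts.values()),
    }


# With a TensorDataset this would be roughly:
#   X, Y = dataset1.tensors
#   print(summarize(X, Y.tolist()))
toy_features = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
toy_labels = [0, 1, 1]
print(summarize(toy_features, toy_labels))
```

Y.tolist() converts the label tensor into plain Python integers, which keeps the class counting independent of tensor hashing behaviour.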

@ZeratuuLL
Author

Hello!

Thank you for your quick response! Here is my result of (Pdb) l:

-> if symmetric:
(Pdb) l
333         for i, j in pbar:
334             try:
335                 D[i, j] = distance(X1[Y1==c1[i]].to(device), X2[Y2==c2[j]].to(device)).item()
336             except:
337                 pdb.set_trace()
338  ->         if symmetric:
339                 D[j, i] = D[i, j]
340         return D
341     
342     
343     
(Pdb) 
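
The listing above shows a bare except that drops into pdb.set_trace(), which hides the exception that actually occurred. The sketch below illustrates one alternative pattern: catch the exception, attach the failing class pair, and re-raise. It is not the fix pushed to the repository; pairwise_label_distances and mean_gap are hypothetical stand-ins that work on plain lists.

```python
def pairwise_label_distances(groups1, groups2, distance):
    """Fill a matrix D[i][j] = distance(groups1[i], groups2[j]).

    Hypothetical stand-in for the loop in the listing; on failure it
    re-raises with the offending indices instead of opening the debugger.
    """
    D = [[0.0] * len(groups2) for _ in groups1]
    for i, g1 in enumerate(groups1):
        for j, g2 in enumerate(groups2):
            try:
                D[i][j] = distance(g1, g2)
            except Exception as err:
                # Keep the original exception chained so the root cause
                # (for example an out-of-memory error) stays visible.
                raise RuntimeError(
                    f"distance failed for class pair ({i}, {j}): {err!r}"
                ) from err
    return D


def mean_gap(a, b):
    """Toy distance: absolute difference between the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


D = pairwise_label_distances([[0.0, 2.0], [4.0]], [[1.0], [5.0, 7.0]], mean_gap)
print(D)
```

Because the original exception is chained with "from err", the traceback would show the underlying error rather than a BdbQuit.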

As for my datasets, dataset1 has 25000 samples, feature dimension 768 with 2 balanced classes. dataset2 has around 9700 samples, feature dimension 768 with 2 balanced classes.
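
As a rough back-of-the-envelope check on the memory hypothesis, the sketch below estimates the size of a dense pairwise cost matrix between two classes. It assumes the exact solver materializes a dense float32 matrix over all samples of each class, which this thread does not confirm, and it uses 9700 as a stand-in for "around 9700". The function name dense_cost_matrix_bytes is hypothetical.

```python
def dense_cost_matrix_bytes(n_rows, n_cols, bytes_per_entry=4):
    """Bytes needed for a dense n_rows x n_cols matrix (float32 by default)."""
    return n_rows * n_cols * bytes_per_entry


# Two balanced classes per dataset, so each class holds half the samples.
n1 = 25000 // 2  # samples per class in dataset1
n2 = 9700 // 2   # approximate samples per class in dataset2

pairs = {
    "dataset1 vs dataset2": (n1, n2),
    "dataset1 vs dataset1": (n1, n1),
    "dataset2 vs dataset2": (n2, n2),
}
for name, (rows, cols) in pairs.items():
    size_mb = dense_cost_matrix_bytes(rows, cols) / 1e6
    print(f"{name}: {size_mb:.1f} MB per class pair")
```

These figures cover only a single cost matrix; intermediate buffers used by the solver would add to them.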

As for hardware, here is the info of CPU:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

And some basic memory info:

MemTotal:       131895676 kB
MemFree:        103871316 kB
MemAvailable:   127671524 kB

Thank you very much!
