error occurs while training #333

Closed
berkdenizi opened this issue Apr 11, 2020 · 8 comments

@berkdenizi

Epoch: [10/60][400/404] Time 0.173 (0.188) Data 0.000 (0.011) Loss 1.6979 (1.5605) Acc 81.25 (91.09) Lr 0.000300 eta 1:03:19

Evaluating market1501 (source)

Extracting features from query set ...
Done, obtained 3368-by-2048 matrix
Extracting features from gallery set ...
Done, obtained 15913-by-2048 matrix
Speed: 0.0211 sec/batch
Computing distance matrix with metric=euclidean ...
Computing CMC and mAP ...

"ValueError Traceback (most recent call last)
in
4 eval_freq=10,
5 print_freq=10,
----> 6 test_only=False
7 )

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in run(self, save_dir, max_epoch, start_epoch, print_freq, fixbase_epoch, open_layers, start_eval, eval_freq, test_only, dist_metric, normalize_feature, visrank, visrank_topk, use_metric_cuhk03, ranks, rerank)
141 save_dir=save_dir,
142 use_metric_cuhk03=use_metric_cuhk03,
--> 143 ranks=ranks
144 )
145 self._save_checkpoint(epoch, rank1, save_dir)

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in test(self, epoch, dist_metric, normalize_feature, visrank, visrank_topk, save_dir, use_metric_cuhk03, ranks, rerank)
225 use_metric_cuhk03=use_metric_cuhk03,
226 ranks=ranks,
--> 227 rerank=rerank
228 )
229

~\anaconda3\envs\torchreid\lib\site-packages\torch\autograd\grad_mode.py in decorate_no_grad(*args, **kwargs)
47 def decorate_no_grad(*args, **kwargs):
48 with self:
---> 49 return func(*args, **kwargs)
50 return decorate_no_grad
51

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in _evaluate(self, epoch, dataset_name, query_loader, gallery_loader, dist_metric, normalize_feature, visrank, visrank_topk, save_dir, use_metric_cuhk03, ranks, rerank)
300 q_camids,
301 g_camids,
--> 302 use_metric_cuhk03=use_metric_cuhk03
303 )
304

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank.py in evaluate_rank(distmat, q_pids, g_pids, q_camids, g_camids, max_rank, use_metric_cuhk03, use_cython)
199 return evaluate_cy(
200 distmat, q_pids, g_pids, q_camids, g_camids, max_rank,
--> 201 use_metric_cuhk03
202 )
203 else:

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank_cylib\rank_cy.pyx in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy()
22
23 # Main interface
---> 24 cpdef evaluate_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank, use_metric_cuhk03=False):
25 distmat = np.asarray(distmat, dtype=np.float32)
26 q_pids = np.asarray(q_pids, dtype=np.int64)

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank_cylib\rank_cy.pyx in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy()
30 if use_metric_cuhk03:
31 return eval_cuhk03_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank)
---> 32 return eval_market1501_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank)
33
34

ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'"

@KaiyangZhou
Owner

Any more info, such as the command to reproduce the error? Did you make any changes to the source code?

@darcyzhc

darcyzhc commented May 12, 2020

> Any more info, such as the command to reproduce the error? Did you make any changes to the source code?

Hello KaiyangZhou. I encountered the same problem. The result is:

......
epoch: [10/60][790/808] time 0.520 (0.500)      data 0.000 (0.007)      eta 5:36:41     loss 2.1012 (1.8828)    acc 75.0000 (81.7168)   lr 0.000300
epoch: [10/60][800/808] time 0.490 (0.500)      data 0.000 (0.007)      eta 5:36:34     loss 1.8767 (1.8839)    acc 87.5000 (81.6875)   lr 0.000300
##### Evaluating market1501 (source) #####
Extracting features from query set ...
Done, obtained 3368-by-2048 matrix
Extracting features from gallery set ...
Done, obtained 15913-by-2048 matrix
Speed: 0.0383 sec/batch
Computing distance matrix with metric=euclidean ...
Computing CMC and mAP ...
Traceback (most recent call last):
  File ".\reid_test.py", line 56, in <module>
    run()
  File ".\reid_test.py", line 52, in run
    test_only=False
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 211, in run
    ranks=ranks
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 344, in test
    rerank=rerank
  File "D:\Anaconda3\envs\torchreid\lib\site-packages\torch\autograd\grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 419, in _evaluate
    use_metric_cuhk03=use_metric_cuhk03
  File "e:\python_project\deep-person-reid\torchreid\metrics\rank.py", line 201, in evaluate_rank
    use_metric_cuhk03
  File "torchreid\metrics\rank_cylib\rank_cy.pyx", line 24, in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy
  File "torchreid\metrics\rank_cylib\rank_cy.pyx", line 32, in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy
ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'

I didn't change the source code. I just followed the guide, with no additional commands. The code I am running is:

import torchreid
import torch

def run():
    torch.multiprocessing.freeze_support()
    print('loop')
    datamanager = torchreid.data.ImageDataManager(
        root='F:\\Data_sets\\reid-data',
        sources='market1501',
        targets='market1501',
        height=256,
        width=128,
        batch_size_train=16,
        batch_size_test=16,
        transforms=['random_flip', 'random_crop']
    )

    model = torchreid.models.build_model(
        name='resnet50',
        num_classes=datamanager.num_train_pids,
        loss='softmax',
        pretrained=True
    )

    model = model.cuda()

    optimizer = torchreid.optim.build_optimizer(
        model,
        optim='adam',
        lr=0.0003
    )

    scheduler = torchreid.optim.build_lr_scheduler(
        optimizer,
        lr_scheduler='single_step',
        stepsize=20
    )

    engine = torchreid.engine.ImageSoftmaxEngine(
        datamanager,
        model,
        optimizer=optimizer,
        scheduler=scheduler,
        label_smooth=True
    )

    engine.run(
        save_dir='log/resnet50',
        max_epoch=60,
        eval_freq=10,
        print_freq=10,
        test_only=False
    )

if __name__ == '__main__':
    run()

My system is Windows 10 with PyTorch 1.1.0 and cudatoolkit 9.0. Because I am on Windows, I had to change some code compared to the get-started-30-seconds example; otherwise I get the error described in #issues21114.
The unchanged code runs properly on my Ubuntu system.
Could you help solve this problem?

@KaiyangZhou
Owner

This seems to be a Windows-specific problem.

I can't reproduce the error without a Windows machine. Maybe @berkdenizi has solved the issue?
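
A likely explanation for the platform difference: the error message says the compiled code expects a C `long` buffer but receives `long long`. On 64-bit Windows a C `long` is 32 bits wide, while on 64-bit Linux it is 64 bits, so an `np.int64` array (as created in `evaluate_cy`) only matches `long` on Linux. The following minimal sketch uses only the standard library to check this on a given machine; the function names are illustrative and not part of torchreid.

```python
import ctypes


def c_long_bits() -> int:
    """Return the width of the C `long` type on this platform, in bits."""
    return ctypes.sizeof(ctypes.c_long) * 8


def int64_has_c_long_width() -> bool:
    """Return True when a 64-bit integer is as wide as the C `long` type."""
    return ctypes.sizeof(ctypes.c_long) == 8


if __name__ == "__main__":
    print("C long:     ", c_long_bits(), "bits")
    print("C long long:", ctypes.sizeof(ctypes.c_longlong) * 8, "bits")
    if not int64_has_c_long_width():
        # Expected on 64-bit Windows: int64 data is `long long`, not `long`.
        print("int64 arrays will not match a buffer declared as C `long`.")
```

If the last message is printed, a Cython buffer declared as `long` cannot accept int64 data on that machine, which is consistent with the `ValueError` reported above.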

@berkdenizi
Author

I solved it, but I don't know how. I changed some of the code a long time ago. @KaiyangZhou @darcyzhc

@darcyzhc

> I solved it, but I don't know how. I changed some of the code a long time ago. @KaiyangZhou @darcyzhc

Thanks for your reply, berkdenizi. Do you remember how you solved this problem?

@s2244521

I also have this problem, and my environment is Windows. @berkdenizi, how did you solve it?

@darcyzhc

I figured out a workaround for this problem. You can disable Cython by changing the default use_cython=True to use_cython=False in the function evaluate_rank in the file torchreid\metrics\rank.py. It will be slower, but it works.
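
The idea behind this workaround can be sketched as follows. The tracebacks above show the engine calling evaluate_rank without a use_cython argument, so the default decides which path runs. The `evaluate_rank` below is a hypothetical stand-in, not the torchreid function: it only imitates a flag that selects between a compiled path (simulated here as failing) and a pure-Python path.

```python
from functools import partial


def evaluate_rank(distmat, use_cython=True):
    """Hypothetical stand-in for a function with a Cython/Python switch."""
    if use_cython:
        # Simulates the compiled path failing on Windows, as in the traceback.
        raise ValueError(
            "Buffer dtype mismatch, expected 'long' but got 'long long'"
        )
    # Pure-Python path: index of the closest gallery item for each query row.
    return [min(range(len(row)), key=row.__getitem__) for row in distmat]


# Forcing the flag off has the same effect as editing the default value.
evaluate_rank_py = partial(evaluate_rank, use_cython=False)

if __name__ == "__main__":
    print(evaluate_rank_py([[0.3, 0.1], [0.2, 0.9]]))
```

In the real project the thread's approach is to edit the default in rank.py directly, because the caller does not pass the flag.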

@darcyzhc

I now have a working solution, as described in #160:

  1. Change every "long" to "long long" in the file /metrics/rank_cylib/rank_cy.pyx.
  2. Manually delete the file 'rank_cy.cp37-win_amd64.pyd' in the same folder.
  3. In a terminal, re-run the command 'python setup.py develop'. A new 'rank_cy.cp37-win_amd64.pyd' file will then be generated under the folder /metrics/rank_cylib.
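
Step 1 can be done by hand in an editor. As a rough alternative, the sketch below shows one way to script the substitution so that an existing "long long" is not widened twice. The helper names `widen_c_long` and `patch_pyx` are hypothetical, and the regular expression matches the word "long" anywhere (including comments), so the result should be reviewed before rebuilding.

```python
import re
from pathlib import Path

# A standalone `long` that is not already one half of `long long`.
_LONE_LONG = re.compile(r"(?<!long\s)\blong\b(?!\s+long\b)")


def widen_c_long(source: str) -> str:
    """Replace each standalone `long` with `long long` in the given text."""
    return _LONE_LONG.sub("long long", source)


def patch_pyx(path: str) -> None:
    """Rewrite a .pyx file in place, keeping a .bak copy of the original."""
    pyx = Path(path)
    original = pyx.read_text(encoding="utf-8")
    Path(str(pyx) + ".bak").write_text(original, encoding="utf-8")
    pyx.write_text(widen_c_long(original), encoding="utf-8")


# Example call, using the path of the file named in the steps above:
# patch_pyx("torchreid/metrics/rank_cylib/rank_cy.pyx")
```

Steps 2 and 3 (deleting the old .pyd and re-running 'python setup.py develop') are still required afterwards so that the extension is recompiled from the edited source.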
