error occurs while training #333

berkdenizi · 2020-04-11T15:28:01Z

Epoch: [10/60][400/404] Time 0.173 (0.188) Data 0.000 (0.011) Loss 1.6979 (1.5605) Acc 81.25 (91.09) Lr 0.000300 eta 1:03:19

Evaluating market1501 (source)

Extracting features from query set ...
Done, obtained 3368-by-2048 matrix
Extracting features from gallery set ...
Done, obtained 15913-by-2048 matrix
Speed: 0.0211 sec/batch
Computing distance matrix with metric=euclidean ...
Computing CMC and mAP ...

"ValueError Traceback (most recent call last)
in
4 eval_freq=10,
5 print_freq=10,
----> 6 test_only=False
7 )

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in run(self, save_dir, max_epoch, start_epoch, print_freq, fixbase_epoch, open_layers, start_eval, eval_freq, test_only, dist_metric, normalize_feature, visrank, visrank_topk, use_metric_cuhk03, ranks, rerank)
141 save_dir=save_dir,
142 use_metric_cuhk03=use_metric_cuhk03,
--> 143 ranks=ranks
144 )
145 self._save_checkpoint(epoch, rank1, save_dir)

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in test(self, epoch, dist_metric, normalize_feature, visrank, visrank_topk, save_dir, use_metric_cuhk03, ranks, rerank)
225 use_metric_cuhk03=use_metric_cuhk03,
226 ranks=ranks,
--> 227 rerank=rerank
228 )
229

~\anaconda3\envs\torchreid\lib\site-packages\torch\autograd\grad_mode.py in decorate_no_grad(*args, **kwargs)
47 def decorate_no_grad(*args, **kwargs):
48 with self:
---> 49 return func(*args, **kwargs)
50 return decorate_no_grad
51

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\engine\engine.py in _evaluate(self, epoch, dataset_name, query_loader, gallery_loader, dist_metric, normalize_feature, visrank, visrank_topk, save_dir, use_metric_cuhk03, ranks, rerank)
300 q_camids,
301 g_camids,
--> 302 use_metric_cuhk03=use_metric_cuhk03
303 )
304

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank.py in evaluate_rank(distmat, q_pids, g_pids, q_camids, g_camids, max_rank, use_metric_cuhk03, use_cython)
199 return evaluate_cy(
200 distmat, q_pids, g_pids, q_camids, g_camids, max_rank,
--> 201 use_metric_cuhk03
202 )
203 else:

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank_cylib\rank_cy.pyx in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy()
22
23 # Main interface
---> 24 cpdef evaluate_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank, use_metric_cuhk03=False):
25 distmat = np.asarray(distmat, dtype=np.float32)
26 q_pids = np.asarray(q_pids, dtype=np.int64)

~\Desktop\deep-person-reid-master\deep-person-reid-master\torchreid\metrics\rank_cylib\rank_cy.pyx in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy()
30 if use_metric_cuhk03:
31 return eval_cuhk03_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank)
---> 32 return eval_market1501_cy(distmat, q_pids, g_pids, q_camids, g_camids, max_rank)
33
34

ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'"

KaiyangZhou · 2020-04-18T20:23:34Z

Any more info, such as the command to reproduce your error? Did you make any changes to the source code?

darcyzhc · 2020-05-12T09:44:26Z

> any more info? like command to reproduce your error? did you make any change to the source code?

Hello KaiyangZhou. I encountered the same problem. The result is:

......
epoch: [10/60][790/808] time 0.520 (0.500)      data 0.000 (0.007)      eta 5:36:41     loss 2.1012 (1.8828)    acc 75.0000 (81.7168)   lr 0.000300
epoch: [10/60][800/808] time 0.490 (0.500)      data 0.000 (0.007)      eta 5:36:34     loss 1.8767 (1.8839)    acc 87.5000 (81.6875)   lr 0.000300
##### Evaluating market1501 (source) #####
Extracting features from query set ...
Done, obtained 3368-by-2048 matrix
Extracting features from gallery set ...
Done, obtained 15913-by-2048 matrix
Speed: 0.0383 sec/batch
Computing distance matrix with metric=euclidean ...
Computing CMC and mAP ...
Traceback (most recent call last):
  File ".\reid_test.py", line 56, in <module>
    run()
  File ".\reid_test.py", line 52, in run
    test_only=False
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 211, in run
    ranks=ranks
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 344, in test
    rerank=rerank
  File "D:\Anaconda3\envs\torchreid\lib\site-packages\torch\autograd\grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "e:\python_project\deep-person-reid\torchreid\engine\engine.py", line 419, in _evaluate
    use_metric_cuhk03=use_metric_cuhk03
  File "e:\python_project\deep-person-reid\torchreid\metrics\rank.py", line 201, in evaluate_rank
    use_metric_cuhk03
  File "torchreid\metrics\rank_cylib\rank_cy.pyx", line 24, in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy
  File "torchreid\metrics\rank_cylib\rank_cy.pyx", line 32, in torchreid.metrics.rank_cylib.rank_cy.evaluate_cy
ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'

I didn't change the source code. I just followed the guide, with no additional commands. The code I'm running is:

import torchreid
import torch

def run():
    torch.multiprocessing.freeze_support()
    print('loop')
    datamanager = torchreid.data.ImageDataManager(
        root='F:\\Data_sets\\reid-data',
        sources='market1501',
        targets='market1501',
        height=256,
        width=128,
        batch_size_train=16,
        batch_size_test=16,
        transforms=['random_flip', 'random_crop']
    )

    model = torchreid.models.build_model(
        name='resnet50',
        num_classes=datamanager.num_train_pids,
        loss='softmax',
        pretrained=True
    )

    model = model.cuda()

    optimizer = torchreid.optim.build_optimizer(
        model,
        optim='adam',
        lr=0.0003
    )

    scheduler = torchreid.optim.build_lr_scheduler(
        optimizer,
        lr_scheduler='single_step',
        stepsize=20
    )

    engine = torchreid.engine.ImageSoftmaxEngine(
        datamanager,
        model,
        optimizer=optimizer,
        scheduler=scheduler,
        label_smooth=True
    )

    engine.run(
        save_dir='log/resnet50',
        max_epoch=60,
        eval_freq=10,
        print_freq=10,
        test_only=False
    )

if __name__ == '__main__':
    run()

My system is Windows 10 with PyTorch 1.1.0 and cudatoolkit 9.0. Because I'm on Windows, I had to change some code compared to the get-started-30-seconds example; otherwise I get the error described in #issues21114.
The unchanged code runs properly on my Ubuntu system.
Could you help solve this problem?

KaiyangZhou · 2020-05-13T09:32:07Z

This seems to be a Windows-specific problem.

I can't reproduce the error without a Windows machine. Maybe @berkdenizi has solved the issue?

berkdenizi · 2020-05-13T20:54:57Z

I solved it, but I don't know how. I changed some of the code a long time ago. @KaiyangZhou @darcyzhc

darcyzhc · 2020-05-14T04:29:41Z

> I solved but idk how . i played on some codes long time ago .. @KaiyangZhou @darcyzhc

Thanks for your reply, berkdenizi. Do you remember how you solved this problem?

s2244521 · 2020-05-14T12:49:20Z

I also have this problem, and my environment is Windows. @berkdenizi, how did you solve it?

darcyzhc · 2020-05-17T14:32:08Z

I figured out a workaround for this problem. You can disable Cython by changing the default use_cython=True to use_cython=False in the function evaluate_rank in torchreid\metrics\rank.py. Evaluation will be slower, but it works.
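For context on why only Windows seems to be affected: the error message suggests that the compiled Cython code declares its buffers as C `long`, while the arrays passed in hold 64-bit integers (`long long`). C `long` is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux, which would explain why the same code runs on Ubuntu. Below is a minimal check using only the standard library; the helper name `c_long_is_64bit` is made up for illustration and is not part of torchreid.

```python
import ctypes
import platform


def c_long_is_64bit() -> bool:
    """Return True if the platform's C 'long' has the same size as 'long long'."""
    return ctypes.sizeof(ctypes.c_long) == ctypes.sizeof(ctypes.c_longlong)


# Sizes in bytes of the two C integer types named in the error message.
long_bytes = ctypes.sizeof(ctypes.c_long)
longlong_bytes = ctypes.sizeof(ctypes.c_longlong)

print(f"platform:  {platform.system()}")
print(f"long:      {long_bytes} bytes")
print(f"long long: {longlong_bytes} bytes")

if c_long_is_64bit():
    print("'long' and 'long long' have the same size here; no mismatch expected.")
else:
    print("'long' is narrower than 'long long'; int64 arrays will not match a 'long' buffer.")
```

If the last line reports that `long` is narrower, the pure-Python evaluation path (use_cython=False) avoids the typed buffers entirely, at the cost of speed.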

darcyzhc · 2020-05-17T15:09:24Z

I have a working solution now. As described in #160:

1. Change every "long" to "long long" in the file /metrics/rank_cylib/rank_cy.pyx.
2. Manually delete the file 'rank_cy.cp37-win_amd64.pyd' in the same folder.
3. In a terminal, re-run the command 'python setup.py develop'. You will then see a new file 'rank_cy.cp37-win_amd64.pyd' generated under the folder /metrics/rank_cylib.
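The first step (changing every "long" to "long long") can also be done with a small script instead of by hand. This is only a sketch: it assumes a plain textual substitution is good enough for the .pyx file, and the helper name `widen_long` is hypothetical. It skips occurrences that are already part of "long long", so running it twice does not produce "long long long".

```python
import re

# Match the standalone word "long" unless it is already one half of "long long".
_LONG = re.compile(r"(?<!long )\blong\b(?! long)")


def widen_long(source: str) -> str:
    """Replace every standalone C 'long' in the given source text with 'long long'."""
    return _LONG.sub("long long", source)


# Example on a made-up declaration line (not copied from rank_cy.pyx).
sample = "long[:] q_pids, long num_q"
print(widen_long(sample))
```

To apply it, read the .pyx file into a string, pass it through `widen_long`, and write the result back; the remaining steps (deleting the old .pyd and re-running 'python setup.py develop') are still needed so the extension is recompiled.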

KaiyangZhou closed this as completed May 18, 2020

KaiyangZhou mentioned this issue May 25, 2020

when set fixbase_epoch to 0, the error occurs #340

Open
