Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tqdm zip bug #5120

Merged

Conversation

david1542
Copy link
Contributor

This PR solves #5117, by wrapping the entire zip clause in tqdm.

For more information, please checkout this Stack Overflow thread:
https://stackoverflow.com/questions/41171191/tqdm-progressbar-and-zip-built-in-do-not-work-together

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying to fix this bug, @david1542.

However the proposed solution will not work: please note that pbar is also used if not batched:

if not batched:
for i, example in enumerate(pbar):

  • there it is expected that iterating over pbar returns examples, not a tuple like with your implementation.

@david1542
Copy link
Contributor Author

@albertvillanova Thanks for your comment. What do you think about creating 2 pbar for each case? I see the pbar_iterable is initialized differently. Maybe pbar can also be initialized like that.

@david1542
Copy link
Contributor Author

@albertvillanova Another solution I implemented is to change pbar_iterable and add the zip to it. I updated the PR with this solution. Let me know what you think.

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement.

I think it is a good idea to include the indices in pbar_iterable.

You need to fix the style to pass the CI test on code quality. Please run:

make style

To be consistent, I would recommend to do the same (include the enumerate in pbar_iterable) for the non batched case.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@dudulasry
Copy link

@albertvillanova Done :) Let me know what you think.

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @david1542, awesome work!

I'm just wondering if it could be possible to implement a regression test for this, but I don't see any straightforward way...

Otherwise, this PR seems good to me!

@david1542
Copy link
Contributor Author

@albertvillanova Thanks :) I also don't see an easy way to test this. This was just a problem in the way tqdm was used. I'm not sure we should cover it in tests.

@albertvillanova albertvillanova merged commit 28bcb2f into huggingface:main Oct 19, 2022
@Cochonaki
Copy link

Hi,

First of all, thanks for this PR.
It's the first time I join a discussion on GitHUB on problem resolution in libraries such as transformers, so I hope I comply to the best practices for an efficient communication...

I am running AutoTokenizer.from_pretrained in a Google Colab notebook for using with BERT base.
I am experiencing issue 5117.

Each time I run my notebook, I do:

! pip install transformers ! pip install datasets ! pip install huggingface_hub

As I understand, the issue has been resolved and the solution merged to the released version of the code?
So I expect that the bug is resolved in my notebook, however this is not the case.

Do I get something wrong?
Do I have to implement some change in the source code myself?

Thanks in advance for your help!

@david1542
Copy link
Contributor Author

@Cochonaki Hi :) The problem was fixed but there wasn't a release since then. I believe a new release should come out in the upcoming weeks. Maybe someone from the core maintainers can answer that :)

cc: @albertvillanova

@armw4
Copy link

armw4 commented Oct 23, 2022 via email

@albertvillanova
Copy link
Member

Hi, @Cochonaki.

As @david1542 pointed out, we have not made a release since this bug was fixed. We will make one in the following weeks.

In the meantime, if you would like to incorporate the bug fix, you can install datasets from this repo main branch:

pip install git+https://github.com/huggingface/datasets#egg=datasets

@Cochonaki
Copy link

Thanks a lot @albertvillanova and @david1542, it works now!
I am really thankful for your help, that encourages me to participate more in this community.
See you around!

@albertvillanova
Copy link
Member

Welcome!!! 🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants