
NER - pl example #3180

Merged
merged 9 commits into from
Mar 10, 2020

Conversation

shubhamagarwal92
Contributor

@shubhamagarwal92 shubhamagarwal92 commented Mar 8, 2020

  1. Solves most of the issues raised in NER: some issues in PyTorch Lightning example #3159
  2. Streamlines shell script pipeline
  3. pl logs related changes
  4. added in Readme

@srush
Contributor

srush commented Mar 9, 2020

This looks great to me. Thanks @shubhamagarwal92.

You just need to run the black command; I believe it's `make style`, which fixes the formatting. Otherwise it looks good.

@shubhamagarwal92
Contributor Author

@srush I ran `make style`, which changed a lot of files; however, the check_code_quality check is still failing!
Do you want me to revert the last two commits and manually change 'logs' to "logs"?

BTW, `pip install -e ".[dev]"` is failing on both Mac and Linux for tensorflow, and on sentencepiece for Mac. I had to manually install the black, isort, and flake8 packages. Python 3.8.2 in a conda env.

@LysandreJik
Member

BTW, pip install -e ".[dev]" is failing on both mac and linux for tensorflow and on sentencepiece for mac.

That would be because TensorFlow only supports python 3.5-3.7, unfortunately.

@shubhamagarwal92
Contributor Author

BTW, pip install -e ".[dev]" is failing on both mac and linux for tensorflow and on sentencepiece for mac.

That would be because TensorFlow only supports python 3.5-3.7, unfortunately.

@LysandreJik Thanks. Installs on ubuntu with python 3.6.

However, on mac:

conda create -n transformers_dev python=3.6 -y
conda activate transformers_dev
pip install -e ".[dev]"


Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

Mac specs:

Python 3.6.10 | packaged by conda-forge | (default, Mar  5 2020, 09:56:10) 
[GCC Clang 9.0.1 ] on darwin
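For context (my note, not from the thread): the tokenizers package is a Rust extension, and when pip finds no prebuilt wheel for the platform it builds from source via PEP 517, which needs a Rust toolchain. A quick diagnostic sketch, assuming a missing compiler is the cause:

```shell
# tokenizers builds native Rust code via PEP 517 when no wheel matches,
# so a missing Rust toolchain is one common cause of this build failure.
if command -v rustc >/dev/null 2>&1; then
  echo "rust toolchain found: $(rustc --version)"
else
  echo "no rust toolchain; install one via https://rustup.rs and retry pip install"
fi
```

If rustc is present and the build still fails, the cause is something else (e.g. an unsupported compiler version).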

@stefan-it
Collaborator

Hi @shubhamagarwal92 ,

thanks for that PR and fixing the issues 👍

I just ran the run_pl.sh script; training and final testing run without errors now. However, during training, precision, recall and F1 are always 0 -> the final output on the test set shows:

TEST RESULTS
{'val_loss': tensor(7.0679), 'precision': 0.0, 'recall': 0.0, 'f1': 0}
----------------------------------------------------------------------------------------------------
Testing: 200it [00:07, 27.98it/s]

Last lines of the prediction output:

der I-OTHderiv
Bibliothek I-OTHderiv
berufen I-OTHderiv
wurde I-OTHderiv
, I-OTHderiv
verließ I-OTHderiv
Gardthausen I-OTHderiv
den I-OTHderiv
Bibliotheksdienst I-OTHderiv
. I-OTHderiv
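An aside on why collapsed predictions like the above score exactly zero: the NER metrics here are span-based (the example uses seqeval), so a run of identical wrong tags produces a single predicted span that matches no gold entity. A minimal pure-Python sketch of span-level F1 (simplified; not the example's actual code):

```python
def spans(tags):
    """Extract (label, start, end) entity spans from IOB tags.
    Simplified: any contiguous B-/I- run of one label counts as a span."""
    out, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        t_label = tag[2:] if tag != "O" else None
        if t_label != label or tag.startswith("B-"):
            if label is not None:
                out.append((label, start, i))
            start, label = i, t_label
    return out

def f1(gold, pred):
    """Span-level F1: a prediction counts only if label AND boundaries match."""
    g, p = set(spans(gold)), set(spans(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = ["O", "B-PER", "I-PER", "O", "B-LOC"]
pred = ["I-OTHderiv"] * 5           # degenerate output, as in the log above
print(f1(gold, pred))               # 0.0: no predicted span matches any gold span
```

So per-token accuracy can be nonzero while span-level precision/recall/F1 are all exactly 0.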

@shubhamagarwal92
Contributor Author

shubhamagarwal92 commented Mar 9, 2020

I just ran the run_pl.sh script, training and final testing run without errors now. However, during training precision, recall and F1 are always 0 -> final output on the test set shows:

TEST RESULTS
{'val_loss': tensor(7.0679), 'precision': 0.0, 'recall': 0.0, 'f1': 0}
----------------------------------------------------------------------------------------------------
Testing: 200it [00:07, 27.98it/s]

Thanks for reporting this. Could you please verify the version of pytorch-lightning? For me it is
pytorch-lightning==0.7.1 and transformers==2.5.1, and the results are as reported in the README.

Also, could you please check whether the results in ${OUTPUT_DIR}/test_results.txt mentioned here
are also 0?

It works for me as:

[Screenshot, 2020-03-09: test_results showing non-zero precision, recall and F1]

@stefan-it
Collaborator

Hi,

I'm using the same versions of both pytorch-lightning and transformers 😂

Output of test_results is:

$ cat germeval-model/test_results.txt 
f1 = 0
precision = 0.0
recall = 0.0
val_loss = tensor(9.4173)

But I'm going to test it on another machine :)

@shubhamagarwal92
Contributor Author

shubhamagarwal92 commented Mar 9, 2020

But I'm going to test it on another machine :)

I am also attaching my environment file, generated via `pip freeze > requirements.txt`:
requirements.txt

Please let me know if this doesn't work.

@srush
Contributor

srush commented Mar 9, 2020

@shubhamagarwal92 I think somehow you have the wrong version of our style checks installed.

Can you try running under this command?

sudo pip install git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
sudo pip install .[tf,torch,quality]

@LysandreJik we have to fix this, it is really confusing...

@stefan-it would love to see your log as well. Could you also try `rm cached*`? I think maybe your feature cache got messed up.

@codecov-io

Codecov Report

Merging #3180 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3180      +/-   ##
==========================================
+ Coverage   78.15%   78.16%   +0.01%     
==========================================
  Files          98       98              
  Lines       16641    16641              
==========================================
+ Hits        13006    13008       +2     
+ Misses       3635     3633       -2
Impacted Files Coverage Δ
src/transformers/modeling_utils.py 94.72% <0%> (+0.31%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e03129a...9f949d3. Read the comment docs.

@shubhamagarwal92
Contributor Author

shubhamagarwal92 commented Mar 9, 2020

sudo pip install git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
sudo pip install .[tf,torch,quality]

@srush I reverted the last 3 style related commits, force-pushed and added a small commit to pass all the checks. Please merge if everything is fine.

Also, for isort, I guess you meant using git+https://

pip install git+https://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort

This link is also wrong in contributing.md. Could you please also state the Python version in the md?

This command, `pip install .[tf,torch,quality]`, is still failing on Mac, as mentioned in my previous comment:

Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

@srush
Contributor

srush commented Mar 10, 2020

Thanks @shubhamagarwal92. Sorry for the annoyance.

@LysandreJik this lgtm.

@LysandreJik LysandreJik merged commit 5ca356a into huggingface:master Mar 10, 2020
@shubhamagarwal92
Contributor Author

@srush Happy to help! :)

Thanks for approving the PR!

@stefan-it
Collaborator

stefan-it commented Mar 10, 2020

@shashwath94 I think I've found the reason for the bad evaluation results: I'm using apex and the --fp16 parameter in the run_pl.sh script!

Do you have any idea why it is not working with half precision? 🤔
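One plausible mechanism (my assumption, not confirmed in this thread): float16 has a far narrower range than float32 (largest finite value 65504), so unscaled losses or logits can overflow to inf or underflow to zero, after which the predictions and metrics degenerate. NumPy illustrates the range limits:

```python
import numpy as np

# float16 range limits that can bite during unscaled fp16 training
print(np.float16(65504.0))   # largest finite float16 value
print(np.float16(70000.0))   # overflows to inf
print(np.float16(1e-8))      # underflows to 0.0 (smallest subnormal ~6e-8)
```

This is why apex-style mixed precision relies on loss scaling rather than casting everything to fp16.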

@srush
Contributor

srush commented Mar 10, 2020

Ah, I will check with pytorch-lightning. It is plausible we are not integrating correctly with them.

@shubhamagarwal92
Contributor Author

@srush While you are at it, could you please check the status of my PR in pl as well?

Lightning-AI/pytorch-lightning#1094

Basically, I was observing a memory leak on GPU 0 if another GPU id (e.g. [1]) was provided when running the NER example.

AFAIK, the solution is torch.cuda.set_device()
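The idea behind that fix, sketched as a hypothetical helper (`pick_device` is illustrative, not code from the PR): resolve the requested device before any CUDA allocation, and call `torch.cuda.set_device()` on it, so that libraries defaulting to `cuda:0` never create a context (and leak memory) there.

```python
def pick_device(gpu_ids):
    """Map a list of requested GPU ids to a device string.
    In real code you would then call torch.cuda.set_device(gpu_ids[0])
    before allocating any tensors, so no CUDA context lands on GPU 0."""
    return f"cuda:{gpu_ids[0]}" if gpu_ids else "cpu"

print(pick_device([1]))   # cuda:1
print(pick_device([]))    # cpu
```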

jplu pushed a commit to jplu/transformers that referenced this pull request Mar 25, 2020
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback filenotfound error: make directory and pass

* huggingface#3159 pl checkpoint path difference

* 1. Updated Readme for pl 2. pl script now also correctly displays logs 3. pass gpu ids compared to number of gpus

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes