Bert Example #440

Merged
merged 70 commits into from
Dec 9, 2022

Contributor

@HCY-11 HCY-11 commented Nov 9, 2022

Example using convert_to_analog to run BERT Transformer on SQuAD task.

Pull from IBM master
Collaborator

@maljoras maljoras left a comment

@HCY-11 Shaping up nicely! It seems that only the evaluation loop is still missing.

maljoras and others added 28 commits December 6, 2022 17:17
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Update travis test stages

* Update travis cibuildwheel stages

* Update version specifiers

* Fix numpy typing issues

* Update CHANGELOG

* Fix typo from my way future self

Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Added multiple RNN cell/layer functionality

* Finished multiple rnn functionality

* Modified rnn example and finished rnn testing

* init

* examples cuda fixes

* skip large conversion

* change log

* pycodestyle

Co-authored-by: Malte Rasch <malte.rasch@ibm.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Add security policy

* Add test policy to contributor guidelines

Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Initial commit

* Updated SelfDefineDevice

* Fixed some files

* Reformat files

* Fixed visualization bug

* Polishing: added fitting to device data

* Cuda version

* Change log

* Test fix

Co-authored-by: Malte Rasch <malte.rasch@ibm.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar.elmaghraoui@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Added a fix in .travis.yml file to the last build error that is caused by the error:
"The repository 'https://public.dhe.ibm.com/software/server/POWER/Linux/toolchain/at/ubuntu bionic Release' does not have a Release file."

The fix adds "apt-get -qq install -y ca-certificates"
before any other apt-get commands.

Apparently some root certificate changed, so new CA certs are required to pull the release information. The error message is a bit misleading here.
the Release file is there, apt-get just can't retrieve it due to cert validation failures.

* Update .travis.yml

* Update .travis.yml

* * Upgraded sphinx from 3.1.2 to 4.0.2
* Added documentation about the new RNN layers
* Added api documentation for the new RNN layers
* Improved the api python doc for RNN types to explicitly state the various types supported
* Added instructions on how to run the examples in the examples readme file

* Additional changes to the api reference: added linear_mapped
Added readme to the tests folder

* Additional changes to the documentation. Added a hw-aware training section.

* Additional fixes and restructuring of the documentation

* Minor changes to address Malte's review.

Co-authored-by: Kaoutar El Maghraoui <kaoutar@macbook-pro.mynetworksettings.com>
Co-authored-by: Malte J. Rasch <17587387+maljoras@users.noreply.github.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
*Total -- 5,055.66kb -> 4,171.28kb (17.49%)

/docs/img/momentum_sgd.png -- 22.45kb -> 11.36kb (49.41%)
/notebooks/processing-unit-and-conventional-memory.png -- 11.62kb -> 6.34kb (45.44%)
/notebooks/processing-unit-and-computional-memory.png -- 8.70kb -> 4.75kb (45.41%)
/notebooks/pcm-array.png -- 18.13kb -> 11.16kb (38.44%)
/docs/img/analog_dnn_training.png -- 567.95kb -> 388.39kb (31.62%)
/docs/img/reram_measurements.png -- 467.18kb -> 320.84kb (31.32%)
/docs/img/pcm_drift_model.png -- 285.72kb -> 213.15kb (25.4%)
/notebooks/LeNet5_animation.png -- 871.90kb -> 683.96kb (21.56%)
/notebooks/pcm_rpu_unit.png -- 85.72kb -> 68.11kb (20.54%)
/docs/img/analog_non_idealities.png -- 48.28kb -> 38.84kb (19.56%)
/docs/img/analog_mac_time.png -- 61.86kb -> 50.58kb (18.22%)
/docs/img/toolkit_quantization.png -- 76.87kb -> 63.94kb (16.82%)
/docs/img/hwa_added_noise.png -- 205.13kb -> 173.25kb (15.54%)
/docs/img/parallel_update.png -- 73.71kb -> 62.54kb (15.16%)
/docs/img/analog_ai_hw.png -- 147.70kb -> 129.50kb (12.33%)
/notebooks/analog_Dnn.png -- 1,327.37kb -> 1,212.19kb (8.68%)
/docs/img/capacitorCell.png -- 83.86kb -> 77.43kb (7.66%)
/notebooks/LeNet.png -- 32.18kb -> 29.79kb (7.42%)
/docs/img/mixedprecision_sgd.png -- 111.55kb -> 103.53kb (7.19%)
/docs/img/ReRAM-ES.png -- 39.21kb -> 36.51kb (6.9%)
/docs/img/ReRAM-SB.png -- 39.15kb -> 36.56kb (6.62%)
/docs/img/tikitaka.png -- 166.09kb -> 155.85kb (6.17%)
/docs/img/EcRam.png -- 28.25kb -> 26.71kb (5.44%)
/docs/img/pulse_trains.png -- 275.10kb -> 266.01kb (3.3%)

Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>

Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Co-authored-by: Malte Rasch <malte.rasch@ibm.com>
Co-authored-by: Diego M. Rodríguez <diego.plan9@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Fixes a small spelling mistake.

Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Updated the changelog

* Changed the version of aihwkit

Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
…g and in-analog training to the non-idealities of Analog Crossbars (IBM#380)

Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
…hod (IBM#389)

Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
@HCY-11 HCY-11 force-pushed the bert_example_hy branch 3 times, most recently from 3b1cee8 to 419aef4 Compare December 7, 2022 02:00
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Contributor Author

@HCY-11 HCY-11 left a comment

The build is failing at the moment due to dependency issues (transformers, evaluate, datasets aren't in requirements.txt). How should these dependencies be dealt with, since it doesn't seem appropriate to have these in the requirements.txt for just one example?

@kaoutar55
Collaborator

Hi @HCY-11, can you add the missing libraries to the requirements file to fix the dependency issues in the build:
examples/24_bert_on_squad.py:26:0: E0401: Unable to import 'transformers.integrations' (import-error)
examples/24_bert_on_squad.py:28:0: E0401: Unable to import 'transformers' (import-error)
examples/24_bert_on_squad.py:38:0: E0401: Unable to import 'evaluate' (import-error)
examples/24_bert_on_squad.py:39:0: E0401: Unable to import 'datasets' (import-error)
examples/24_bert_on_squad.py:79:4: E0401: Unable to import 'wandb' (import-error)

Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
kaoutar55 previously approved these changes Dec 7, 2022
Collaborator

@maljoras maljoras left a comment

Many thanks, great work!
Please remove the extra dependency and the Aimos cluster information, though. These do not belong in the public repo.

                    type=float)
PARSER.add_argument("-r", "--run_name",
                    help="Tensorboard run name",
                    defualt=datetime.now().strftime("%Y%m%d-%H%M%S"),
Collaborator

Should this be "default"?
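
For reference, a minimal sketch of the corrected call. Only `-r/--run_name`, its help text, and the timestamp default come from the snippet under review; the `--noise` option is inferred from the command line quoted elsewhere in this review, and its help text and default value are assumptions. Note that argparse rejects unknown keyword arguments, so the misspelt `defualt=` would raise a `TypeError` as soon as the parser is built.

```python
from argparse import ArgumentParser
from datetime import datetime

PARSER = ArgumentParser(description="BERT on SQuAD example (sketch)")
# Assumed option: only the flag name and a float value appear in the review.
PARSER.add_argument("--noise",
                    help="Noise level (assumed meaning)",
                    default=0.0,
                    type=float)
# Corrected keyword: `default`, not `defualt`.
PARSER.add_argument("-r", "--run_name",
                    help="Tensorboard run name",
                    default=datetime.now().strftime("%Y%m%d-%H%M%S"),
                    type=str)

# Parse an explicit argument list so the sketch runs outside a shell.
ARGS = PARSER.parse_args(["--noise", "0.1"])
```

When `--run_name` is omitted, `ARGS.run_name` falls back to a timestamp in the `YYYYMMDD-HHMMSS` format.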

DOC_STRIDE = 128


def create_ideal_rpu_config(g_max=160, tile_size=256, w_noise=0.0, out_noise=0.0):
Collaborator

The naming is confusing. Adding noise would make it not "ideal" anymore.


def create_optimizer(model):
    """Create the analog-aware optimizer"""
    optimizer = AnalogSGD(model.parameters(), lr=2e-4)
Collaborator

[OPTIONAL] adding the LR as an input parameter might be helpful for HWA training
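
A minimal sketch of this suggestion, assuming the learning rate becomes an argument whose default is the current value. To keep the snippet runnable without aihwkit installed, the optimizer class is injected through a hypothetical `optimizer_cls` parameter and exercised with stand-in classes; in the example itself it would be `AnalogSGD`.

```python
def create_optimizer(model, optimizer_cls, learning_rate=2e-4):
    """Create the optimizer with a configurable learning rate.

    `optimizer_cls` is a hypothetical injection point for illustration;
    the example under review constructs AnalogSGD directly.
    """
    return optimizer_cls(model.parameters(), lr=learning_rate)


class StubModel:
    """Stand-in for a torch module (illustration only)."""

    def parameters(self):
        return [0.0, 1.0]


class StubOptimizer:
    """Stand-in for AnalogSGD that only records its arguments."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr


# Default learning rate, as in the reviewed code.
default_opt = create_optimizer(StubModel(), StubOptimizer)
# A different rate, e.g. for a hardware-aware training run.
hwa_opt = create_optimizer(StubModel(), StubOptimizer, learning_rate=1e-5)
```

The value could then be wired to a command-line option so that different runs do not require editing the script.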


source ~/.bashrc
conda activate <env-name>
srun --mem=32G python ~/barn/aihwkit/examples/24_bert_on_squad.py --noise 0.1 &
Collaborator

Please remove all the Aimos explanation from this public example.

metrics evaluated using the model at various times after training completed.

Commandline arguments can be used to control certain options
```python
Collaborator

Make a real use-case example here rather than pasting the parser.

requirements.txt Outdated
@@ -9,3 +9,7 @@ scipy
requests>=2.25,<3
numpy>=1.19
protobuf>=3.15.0,<4
transformers
Collaborator

Please remove all these additional dependencies from requirements.txt! We cannot make them global dependencies. Use an extra dependency in setup.py for that.
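
One way to follow this suggestion is an optional extra in setup.py, sketched below. The extra name `bert` matches the `".[bert]"` install target that appears later in this review; the package list is assembled from the imports and comments in this thread and is an assumption, not the exact contents of the merged file. The variable name `EXTRAS_REQUIRE` is illustrative.

```python
# Sketch of an optional dependency group for setup.py.
# The package list is an assumption based on the imports named in this review.
EXTRAS_REQUIRE = {
    "bert": [
        "transformers",
        "evaluate",
        "datasets",
        "wandb",
        "tensorboard",
    ],
}

# In setup.py this mapping would be passed to setuptools, for example:
#     setup(..., extras_require=EXTRAS_REQUIRE)
```

With such an extra defined, the example's dependencies can be installed on demand with `pip install -e ".[bert]"`, leaving requirements.txt unchanged.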

Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Collaborator

@maljoras maljoras left a comment

@HCY-11 Many thanks, this looks almost ready!

Only these changes:

  • Restore the old travis configuration; we do not test the examples in travis, so there is no need to install [bert].
  • It seems that 'tensorboard' is needed as well; please add it in setup.py.

.travis.yml Outdated
@@ -39,6 +39,7 @@ job_compile_common: &job_compile_common
- $PYTHON_EXECUTABLE -m pip install -r requirements.txt
# Install the package in editable mode.
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v -e ".[visualization]"
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v ".[bert]"
Collaborator

Can you remove this from travis again? We do not test the examples in travis. Thanks.

.travis.yml Outdated
@@ -100,6 +101,7 @@ jobs:
- $PYTHON_EXECUTABLE -m pip install -r requirements.txt
# Install the package in editable mode.
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v -e ".[visualization]"
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v ".[bert]"
Collaborator

Remove this line again. Just leave travis as it was before.

maljoras previously approved these changes Dec 9, 2022
@maljoras maljoras merged commit e522c10 into IBM:master Dec 9, 2022