Bert Example #440
Pull from IBM master
@HCY-11 Shaping up nicely! It seems that only the evaluation loop is still missing.
Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Update travis test stages
* Update travis cibuildwheel stages
* Update version specifiers
* Fix numpy typing issues
* Update CHANGELOG
* Fix typo from my way future self
Signed-off-by: Henry Ye <yehenry11@gmail.com>

* Added multiple RNN cell/layer functionality
* Finished multiple rnn functionality
* Modified rnn example and finished rnn testing
* init
* examples cuda fixes
* skip large conversion
* change log
* pycodestyle
Co-authored-by: Malte Rasch <malte.rasch@ibm.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>

* Add security policy
* Add test policy to contributor guidelines
Signed-off-by: Henry Ye <yehenry11@gmail.com>

* Initial commit
* Updated SelfDefineDevice
* Fixed some files
* Reformat files
* Fixed visualization bug
* Polishing: added fitting to device data
* Cuda version
* Change log
* Test fix
Co-authored-by: Malte Rasch <malte.rasch@ibm.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar.elmaghraoui@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Added a fix in the .travis.yml file for the last build error, which was caused by: "The repository 'https://public.dhe.ibm.com/software/server/POWER/Linux/toolchain/at/ubuntu bionic Release' does not have a Release file." The fix adds "apt-get -qq install -y ca-certificates" before any other apt-get commands. Apparently some root certificate changed, so new CA certs are required to pull the release information. The error message is a bit misleading here: the Release file is there, apt-get just can't retrieve it due to cert validation failures.
* Update .travis.yml
* Update .travis.yml
* Upgraded sphinx from 3.1.2 to 4.0.2
* Added documentation about the new RNN layers
* Added api documentation for the new RNN layers
* Improved the api python doc for RNN types to explicitly state the various types supported
* Added instructions on how to run the examples in the examples readme file
* Additional changes to the api reference: added linear_mapped. Added readme to the tests folder
* Additional changes to the documentation. Added a hw-aware training section.
* Additional fixes and restructuring of the documentation
* Minor changes to address Malte's review.
Co-authored-by: Kaoutar El Maghraoui <kaoutar@macbook-pro.mynetworksettings.com>
Co-authored-by: Malte J. Rasch <17587387+maljoras@users.noreply.github.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Total -- 5,055.66kb -> 4,171.28kb (17.49%)
/docs/img/momentum_sgd.png -- 22.45kb -> 11.36kb (49.41%)
/notebooks/processing-unit-and-conventional-memory.png -- 11.62kb -> 6.34kb (45.44%)
/notebooks/processing-unit-and-computional-memory.png -- 8.70kb -> 4.75kb (45.41%)
/notebooks/pcm-array.png -- 18.13kb -> 11.16kb (38.44%)
/docs/img/analog_dnn_training.png -- 567.95kb -> 388.39kb (31.62%)
/docs/img/reram_measurements.png -- 467.18kb -> 320.84kb (31.32%)
/docs/img/pcm_drift_model.png -- 285.72kb -> 213.15kb (25.4%)
/notebooks/LeNet5_animation.png -- 871.90kb -> 683.96kb (21.56%)
/notebooks/pcm_rpu_unit.png -- 85.72kb -> 68.11kb (20.54%)
/docs/img/analog_non_idealities.png -- 48.28kb -> 38.84kb (19.56%)
/docs/img/analog_mac_time.png -- 61.86kb -> 50.58kb (18.22%)
/docs/img/toolkit_quantization.png -- 76.87kb -> 63.94kb (16.82%)
/docs/img/hwa_added_noise.png -- 205.13kb -> 173.25kb (15.54%)
/docs/img/parallel_update.png -- 73.71kb -> 62.54kb (15.16%)
/docs/img/analog_ai_hw.png -- 147.70kb -> 129.50kb (12.33%)
/notebooks/analog_Dnn.png -- 1,327.37kb -> 1,212.19kb (8.68%)
/docs/img/capacitorCell.png -- 83.86kb -> 77.43kb (7.66%)
/notebooks/LeNet.png -- 32.18kb -> 29.79kb (7.42%)
/docs/img/mixedprecision_sgd.png -- 111.55kb -> 103.53kb (7.19%)
/docs/img/ReRAM-ES.png -- 39.21kb -> 36.51kb (6.9%)
/docs/img/ReRAM-SB.png -- 39.15kb -> 36.56kb (6.62%)
/docs/img/tikitaka.png -- 166.09kb -> 155.85kb (6.17%)
/docs/img/EcRam.png -- 28.25kb -> 26.71kb (5.44%)
/docs/img/pulse_trains.png -- 275.10kb -> 266.01kb (3.3%)
Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
Signed-off-by: Henry Ye <yehenry11@gmail.com>
Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com> Signed-off-by: Henry Ye <yehenry11@gmail.com>
Fixes a small spelling mistake. Signed-off-by: Henry Ye <yehenry11@gmail.com>
* Updated the changelog * Changed the version of aihwkit Co-authored-by: Kaoutar El Maghraoui <kaoutar@kaoutars-mbp.mynetworksettings.com> Signed-off-by: Henry Ye <yehenry11@gmail.com>
…g and in-analog training to the non-idealities of Analog Crossbars (IBM#380) Signed-off-by: Henry Ye <yehenry11@gmail.com>
…hod (IBM#389) Signed-off-by: Henry Ye <yehenry11@gmail.com>
…M#393) Signed-off-by: Henry Ye <yehenry11@gmail.com>
3b1cee8 to 419aef4
The build is failing at the moment due to dependency issues (transformers, evaluate, datasets aren't in requirements.txt). How should these dependencies be dealt with, since it doesn't seem appropriate to have these in the requirements.txt for just one example?
Hi @HCY-11, can you add the missing libraries to the requirements file to fix the dependency issues in the build?
Many thanks, great work!
Please remove the extra dependency and the Aimos cluster information, though. These do not belong in the public repo.
examples/24_bert_on_squad.py
Outdated
type=float)
PARSER.add_argument("-r", "--run_name",
                    help="Tensorboard run name",
                    defualt=datetime.now().strftime("%Y%m%d-%H%M%S"),
Should this be "default"?
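A minimal sketch of why the spelling matters, assuming only standard-library `argparse` behaviour: `add_argument` forwards unrecognised keywords to the action constructor, so `defualt=` fails with a `TypeError` as soon as the argument is registered. The `PARSER` below is a stand-in that reproduces just the `--run_name` option from the snippet, not the full parser of the example script.

```python
import argparse
from datetime import datetime

# Stand-in parser; only the --run_name option from the snippet is reproduced.
PARSER = argparse.ArgumentParser(description="Stand-in for the example's parser")

# Correct spelling: the timestamp is used when -r/--run_name is not given.
PARSER.add_argument("-r", "--run_name",
                    help="Tensorboard run name",
                    default=datetime.now().strftime("%Y%m%d-%H%M%S"))

# Misspelled keyword: argparse rejects it when the argument is registered.
try:
    argparse.ArgumentParser().add_argument("-r", "--run_name", defualt="x")
    TYPO_ERROR = None
except TypeError as err:
    TYPO_ERROR = err

ARGS = PARSER.parse_args([])                     # no flag: falls back to the timestamp
NAMED = PARSER.parse_args(["-r", "bert-test"])   # explicit run name
```

With the corrected keyword, `ARGS.run_name` holds the timestamp string and `NAMED.run_name` holds the value passed on the command line.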
examples/24_bert_on_squad.py
Outdated
DOC_STRIDE = 128


def create_ideal_rpu_config(g_max=160, tile_size=256, w_noise=0.0, out_noise=0.0):
The naming is confusing. Adding noise would make it not "ideal" anymore.
examples/24_bert_on_squad.py
Outdated
def create_optimizer(model):
    """Create the analog-aware optimizer"""
    optimizer = AnalogSGD(model.parameters(), lr=2e-4)
[OPTIONAL] Adding the learning rate (LR) as an input parameter might be helpful for HWA training.
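One way this could look, as a sketch only: `create_optimizer` takes the learning rate as an argument and a command-line flag feeds it. The `--learning_rate` flag name, the `optimizer_cls` parameter, and the `DummySGD`/`DummyModel` stand-ins are hypothetical and exist only so the snippet runs without aihwkit; in the example script the optimizer class would be `AnalogSGD`.

```python
import argparse


class DummySGD:
    """Hypothetical stand-in for AnalogSGD so the sketch runs without aihwkit."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr


class DummyModel:
    """Hypothetical stand-in for the model; exposes an empty parameter list."""

    def parameters(self):
        return iter([])


def create_optimizer(model, learning_rate=2e-4, optimizer_cls=DummySGD):
    """Create the optimizer with a configurable learning rate."""
    return optimizer_cls(model.parameters(), lr=learning_rate)


PARSER = argparse.ArgumentParser()
# Hypothetical flag; the default matches the value hard-coded in the snippet above.
PARSER.add_argument("-l", "--learning_rate",
                    help="Learning rate for training",
                    default=2e-4, type=float)

ARGS = PARSER.parse_args(["--learning_rate", "1e-3"])
OPTIMIZER = create_optimizer(DummyModel(), learning_rate=ARGS.learning_rate)
```

Keeping `2e-4` as the default leaves the current behaviour unchanged when the flag is omitted.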
examples/README.md
Outdated
source ~/.bashrc
conda activate <env-name>
srun --mem=32G python ~/barn/aihwkit/examples/24_bert_on_squad.py --noise 0.1 &
Please remove all the Aimos explanation from this public example.
examples/README.md
Outdated
metrics evaluated using the model at various times after training completed.

Commandline arguments can be used to control certain options
```python
Make a real use-case example here rather than pasting the parser.
requirements.txt
Outdated
@@ -9,3 +9,7 @@ scipy
requests>=2.25,<3
numpy>=1.19
protobuf>=3.15.0,<4
transformers
Please remove all these additional dependencies from requirements.txt! We cannot make this a global dependency. Use the extra dependency in setup.py for that.
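A sketch of what that could look like in setup.py, assuming setuptools' `extras_require` mechanism. The extra name `bert` and the unpinned package list are assumptions based on the libraries named in this thread; the rest of the `setup()` call is omitted.

```python
# Optional dependencies, installed only on request with: pip install ".[bert]"
# The extra name and package list are assumptions taken from this thread.
EXTRAS_REQUIRE = {
    "bert": ["transformers", "evaluate", "datasets"],
}

# In setup.py this dictionary would be passed on to setuptools, e.g.
# setup(..., extras_require=EXTRAS_REQUIRE)
```

This keeps requirements.txt limited to what every user needs, while the example's libraries stay opt-in.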
@HCY-11 Many thanks, looks almost ready!
Only these changes:
- Restore the old travis. We do not test the examples in travis, so there is no need to install [bert].
- It seems that 'tensorboard' is needed as well; please add it in setup.py.
.travis.yml
Outdated
@@ -39,6 +39,7 @@ job_compile_common: &job_compile_common
- $PYTHON_EXECUTABLE -m pip install -r requirements.txt
# Install the package in editable mode.
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v -e ".[visualization]"
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v ".[bert]"
Can you remove this from travis again? We do not test the examples in travis. Thanks.
.travis.yml
Outdated
@@ -100,6 +101,7 @@ jobs:
- $PYTHON_EXECUTABLE -m pip install -r requirements.txt
# Install the package in editable mode.
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v -e ".[visualization]"
- VERBOSE=1 $PYTHON_EXECUTABLE -m pip install -v ".[bert]"
Remove this line again. Just leave travis as it was before.
Example using convert_to_analog to run a BERT Transformer on the SQuAD task.