This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Update local pytorch-lightning master #1

Merged: 2,310 commits (Mar 30, 2021)
Commits
ad36c7b
Add hint in docs for how to use shared memory (#6036)
awaelchli Feb 17, 2021
68fd308
Prevent flickering progress bar (#6009)
SkafteNicki Feb 17, 2021
15d6788
Fix Wrapping optimizers upon assignment (#6006)
justusschock Feb 17, 2021
a121fd3
[Bugfix] Apply untoggle_optimizer when result is None (#5983)
tchaton Feb 17, 2021
6a409c7
remove outdated info (#6032)
awaelchli Feb 17, 2021
7189d67
DeepSpeed Integration (#5954)
SeanNaren Feb 17, 2021
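The DeepSpeed integration referenced above is exposed through the Trainer's plugin mechanism. A minimal sketch, assuming pytorch-lightning ~1.2 with the deepspeed extra installed; the plugin string and settings follow the project's plugin conventions rather than this commit's diff, and the model/datamodule are hypothetical placeholders:

```python
import pytorch_lightning as pl

# Hypothetical multi-GPU run; "deepspeed" selects the integration added in #5954,
# typically paired with 16-bit precision for ZeRO-style sharding.
trainer = pl.Trainer(gpus=4, plugins="deepspeed", precision=16)
# trainer.fit(model, datamodule=dm)  # model/dm are hypothetical placeholders
```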
b7c2e0a
Trainer only references accelerator (#6039)
SeanNaren Feb 17, 2021
8d7ac8f
Address code review for deepspeed (#6042)
SeanNaren Feb 17, 2021
c9622ba
[feat] Add Trainer(stochastic_weight_avg=True/False) (#6038)
tchaton Feb 17, 2021
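For reference, the flag named in the commit title above is passed straight to the Trainer. A minimal sketch, assuming pytorch-lightning ~1.2 and that the keyword matches the commit title; the model is a hypothetical placeholder:

```python
import pytorch_lightning as pl

# stochastic_weight_avg=True enables stochastic weight averaging, per the commit title.
trainer = pl.Trainer(max_epochs=10, stochastic_weight_avg=True)
# trainer.fit(model)  # hypothetical LightningModule
```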
8440595
[CI] Move DeepSpeed into CUDA image, remove DeepSpeed install from az…
SeanNaren Feb 17, 2021
bac617f
drop deprecated result object 1/n (#5005)
Borda Feb 17, 2021
d2cd7cb
Add option for weight tying on TPU's (#5441)
lezwon Feb 18, 2021
bfcfac4
Delete tests.helpers.TrialMNISTDataModule (#5999)
carmocca Feb 18, 2021
77f6aa4
Fix: Allow hashing of metrics with lists in their state (#5939)
peblair Feb 18, 2021
6de8dca
et al. (#6050)
pl-ghost Feb 18, 2021
38ad9e0
[ModelPruning] Add missing attribute with use_global_unstructured=Fal…
carmocca Feb 18, 2021
049006a
fix/test quant (#6040)
Borda Feb 18, 2021
b019c25
Add descriptions to accelerator broadcast function/clean up all_gathe…
SeanNaren Feb 18, 2021
bcc0004
Add before_batch_transfer and after_batch_transfer hooks (#3671)
rohitgr7 Feb 18, 2021
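The hooks referenced above (see also the related warning commit #6059 below) let a LightningModule adjust a batch around device transfer. A rough sketch; the exact hook signatures are an assumption:

```python
import pytorch_lightning as pl

class AugmentedModule(pl.LightningModule):
    def on_before_batch_transfer(self, batch, dataloader_idx):
        # called before the batch is moved to the target device (still on CPU)
        return batch

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # called once the batch is on the device, e.g. for GPU-side augmentation
        return batch
```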
ffdcb62
Make parallel devices optional across all plugins (#6051)
SeanNaren Feb 18, 2021
115e58a
clarify gpu / process (#6049)
awaelchli Feb 18, 2021
f48a933
Fix docs typo (#6055)
ieshreya Feb 18, 2021
3449e2d
Docs for Pruning, Quantization, and SWA (#6041)
edenlightning Feb 18, 2021
02ac4b0
Replace .get_model() with explicit .lightning_module (#6035)
awaelchli Feb 18, 2021
6cc1a06
rename accelerator_backend -> accelerator (#6034)
awaelchli Feb 18, 2021
fc9bb53
fix flake8 for new plugins (#5951)
awaelchli Feb 18, 2021
d3a31bc
fix docs links (#6057)
Borda Feb 18, 2021
2cf39dc
Add warnings to on_before/after_batch_transfer hooks (#6059)
SeanNaren Feb 18, 2021
c46c23a
v1.2.0rc2 (#6063)
Borda Feb 18, 2021
b0074a4
Update auto-opt docs (#6037)
rohitgr7 Feb 18, 2021
8f82823
Raise AttributeError in lightning_getattr and lightning_setattr when …
akihironitta Feb 18, 2021
5d6a091
default sched (#6062)
rohitgr7 Feb 18, 2021
4574023
v1.2.0 (#6065)
Borda Feb 18, 2021
e12c8a7
add Azure tags trigger (#6066)
Borda Feb 18, 2021
3645eb1
pypi azure badges - tags (#6068)
Borda Feb 18, 2021
0b27147
continue towards 1.3 (#6069)
Borda Feb 19, 2021
4b7c0fa
Fix amp autocast (#6080)
awaelchli Feb 19, 2021
f2660ac
add sanity check on nb available GPUs (#6092)
Borda Feb 19, 2021
3bdc067
consistent behavior for reduce method across all Plugins (#6011)
awaelchli Feb 20, 2021
97a81c3
[Hot Fix] Give priority to plugins to set distributed mode, and then …
SeanNaren Feb 20, 2021
3b0e4e0
Enable ZeRO tests for CI, fix to/half function calls (#6070)
SeanNaren Feb 21, 2021
432e563
Expose DeepSpeed FP16 parameters due to loss instability (#6115)
SeanNaren Feb 21, 2021
97b4b3e
Collapse 2 DeepSpeed tests (#6108)
carmocca Feb 21, 2021
ae6ce17
fix amp/apex misconfiguration error for cpu (#6107)
awaelchli Feb 22, 2021
9b99328
Update Contributing Guide (#6118)
kaushikb11 Feb 22, 2021
1d28d11
Minor fixes/improvements in Metric docs (#6114)
akihironitta Feb 22, 2021
57215b7
Avoid printing ModelCheckpoint log with monitor=None and verbose=True…
carmocca Feb 22, 2021
423ecf9
Feature/5275 clean progress bar print (#5470)
asnorkin Feb 22, 2021
0456b45
mini refactor for _running_stage access (#5724)
awaelchli Feb 22, 2021
863a70c
Add specifics around DeepSpeed docs (#6142)
SeanNaren Feb 22, 2021
ebabe56
Ensure accelerator is valid if running interactively (#5970)
ifsheldon Feb 23, 2021
1c851b8
fixing miss-leading tested acc values (#5876)
Borda Feb 23, 2021
45158aa
Update CHANGELOG (#6156)
carmocca Feb 23, 2021
09baf29
prune deprecated profiler as bool (#6164)
Borda Feb 24, 2021
1d9c553
prune deprecated Trainer arg `enable_pl_optimizer` (#6163)
Borda Feb 24, 2021
a731269
Prune deprecated metrics for 1.3 (#6161)
Borda Feb 24, 2021
1b498d1
[Bugfix] Fixed epoch level schedulers not being called when val_check…
SkafteNicki Feb 24, 2021
46617d9
Prune deprecated checkpoint arguments (#6162)
Borda Feb 24, 2021
8b47527
Prune deprecated EarlyStopping(mode='auto') (#6167)
carmocca Feb 24, 2021
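With the deprecated mode='auto' removed by the commit above, the monitoring direction has to be stated explicitly. A small sketch, assuming pytorch-lightning ~1.2; the monitored key is a hypothetical example:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# mode must now be "min" or "max"; "auto" is gone per the commit title
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=3)
trainer = pl.Trainer(callbacks=[early_stop])
```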
5cf892b
Fix typo (#6178)
akihironitta Feb 24, 2021
c33fd52
Update issue template to use discussions for questions (#6155)
edenlightning Feb 24, 2021
c7130b7
Update with GitHub Discussions (#6186)
rohitgr7 Feb 24, 2021
b0d1996
Update gpu warning (#6181)
edenlightning Feb 24, 2021
3ed8ef8
type accelerators (#6148)
justusschock Feb 25, 2021
dd2f5a0
Fix for multiple callbacks (#6197)
SeanNaren Feb 25, 2021
3df02b8
Add checkpoint parameter to on_save_checkpoint (#6072)
carmocca Feb 25, 2021
4d96f19
Document exceptions in loggers (#6171)
AlKun25 Feb 25, 2021
ddf55a2
Prune deprecated Trainer(checkpoint_callback=ModelCheckpoint()) (#6166)
carmocca Feb 25, 2021
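After the pruning above, a ModelCheckpoint instance is passed through the callbacks list rather than the checkpoint_callback argument. A sketch, assuming pytorch-lightning ~1.2; the monitored key is a hypothetical example:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=1)
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```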
e7298b5
fix parallel devices return type & add copyright (#6215)
kaushikb11 Feb 26, 2021
0647340
Add mypy typing to precision plugins. (#6149)
justusschock Feb 26, 2021
ee5032a
apply_func.py: from torchtext.legacy.data import Batch (#6211)
dbonner Feb 26, 2021
40d5a9d
fix(wandb): prevent WandbLogger from dropping values (#5931)
borisdayma Feb 27, 2021
111d9c7
Prune deprecated hparams setter (#6207)
carmocca Feb 27, 2021
15c477e
document exceptions for metrics/regression (#6202)
prajakta0111 Feb 28, 2021
58a6d59
simplify skip-if tests >> 0/n (#5920)
Borda Mar 1, 2021
ce05687
update (#6237)
awaelchli Mar 1, 2021
8aba885
Document Exceptions in profilers (#6229)
AlKun25 Mar 1, 2021
925f082
Call `optimizer.zero_grad()` before backward inside closure in AutoOp…
akihironitta Mar 1, 2021
651c25f
Fix for incorrect usage of detach(), cpu(), to() (#6216)
dvolgyes Mar 1, 2021
352e8f0
add skipif warpper (#6258)
Borda Mar 1, 2021
ed67490
cleaning SWA (#6259)
Borda Mar 1, 2021
412a7d8
Remove opt from manual_backward in docs (#6267)
akihironitta Mar 1, 2021
6788dba
switch agents pool (#6270)
Borda Mar 1, 2021
3371d32
docstring changes in tuner (#6264)
AlKun25 Mar 2, 2021
efda48f
Disable CPU Offload as default for DeepSpeed (#6262)
SeanNaren Mar 2, 2021
dc8647e
split profilers (#6261)
Borda Mar 2, 2021
eb81500
Refactor: skipif for multi - gpus 1/n (#6266)
Borda Mar 2, 2021
22985d2
Improved EarlyStopping.patience documentation (#6278)
turian Mar 2, 2021
0f9134e
Refactor: skipif for Windows 2/n (#6268)
Borda Mar 2, 2021
bc577ca
fix duplicate console logging bug v2 (#6275)
awaelchli Mar 2, 2021
b46d221
Refactor: skipif for AMPs 3/n (#6293)
Borda Mar 2, 2021
8001987
[fix] Ensure we check deepspeed/sharded in multinode DDP (#6297)
SeanNaren Mar 2, 2021
38274b9
unfreeze torchtext version (#6302)
Borda Mar 2, 2021
24c3a3f
Add possibility for custom naming when using multiple dataloaders (#6…
SkafteNicki Mar 2, 2021
7e8f4b9
try to fix imports for parsing (#6256)
Borda Mar 2, 2021
ac58378
Refactor: Runif for TPU and Horovod 5/n (#6301)
Borda Mar 2, 2021
d1a0315
Refactor: runif for spec 6/6 (#6307)
Borda Mar 2, 2021
4157b35
Add fairscale & deepspeed to skipif 4/n (#6281)
kaushikb11 Mar 2, 2021
1aac481
[bugfix] TPU test hangs to barrier on 1 process (#6272)
tchaton Mar 2, 2021
bf6ba83
prune duplicite test in optim (#6312)
Borda Mar 3, 2021
dcec4ef
Simplify test for AMP plugins (#6311)
Borda Mar 3, 2021
4a8422c
Fix ModelPruning(make_pruning_permanent=True) buffers getting removed…
carmocca Mar 3, 2021
484dce1
[bugfix] TPU + all_gather + SingleTPU shouldn't call xm.all_gather (#…
tchaton Mar 3, 2021
6166f46
drop unused variable in API (#6308)
Borda Mar 4, 2021
e038e74
hotfix for PT1.6 and torchtext (#6323)
Borda Mar 4, 2021
d01e8fd
[fix] Use training type plugin hook when saving (FSDP 1/n) (#6321)
SeanNaren Mar 4, 2021
577323c
leaving lezwon (#6347)
lezwon Mar 4, 2021
48a10f1
Add `tests/utilities/test_parsing.py` (#4460)
akihironitta Mar 4, 2021
59acf57
Add ignore param to save_hyperparameters (#6056)
kaushikb11 Mar 4, 2021
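The ignore parameter named in the commit title lets a module skip selected constructor arguments (for example large nn.Module objects) when recording hyperparameters. A sketch; the keyword spelling follows the commit title and the class is a hypothetical example:

```python
import torch.nn as nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, lr: float = 1e-3, backbone: nn.Module = None):
        super().__init__()
        # record lr in self.hparams but keep the (potentially huge) backbone out of it
        self.save_hyperparameters(ignore=["backbone"])
        self.backbone = backbone
```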
5d7388d
Fix when _stable_1d_sort to work when n >= N (#6177)
frankier Mar 4, 2021
4f90455
Update docs on arg train_dataloader in fit (#6076)
SkafteNicki Mar 4, 2021
b9cf122
missing tests default_root_dir=tmpdir (#6314)
Borda Mar 4, 2021
8e3524d
Document exception for metrics/classification (#6190)
dipam7 Mar 4, 2021
39231ae
[Fix] Call clip gradients if clip val greater than 0 (#6330)
SeanNaren Mar 4, 2021
7acbd65
[bugfix] Check LightningOptimizer doesn't delete optimizer hooks (#6305)
tchaton Mar 4, 2021
49c579f
docstring changes in accelerators (#6327)
AlKun25 Mar 4, 2021
248a8e8
[bugfix] Perform reduction for dict in training_step and DP (#6324)
tchaton Mar 4, 2021
ec8d46e
introduce default cluster environment for lightning-specific ddp (#5915)
awaelchli Mar 5, 2021
46540ee
[bugfix] Resolve memory leak for evaluation (#6326)
tchaton Mar 5, 2021
b6aa350
Update changelog for v1.2.2 (#6325)
kaushikb11 Mar 5, 2021
e848542
CI: fix examples - patch download MNIST (#6357)
Borda Mar 5, 2021
2ec67a4
[bug] Fix Pytorch profiler with emit_nvtx (#6260)
tchaton Mar 5, 2021
2a3ab67
fix importing torchtext batch (#6365)
Borda Mar 5, 2021
4f391bc
give a more complete GAN example (#6294)
tchaton Mar 5, 2021
d0596fa
Refactor RunningStage usage in advance of implementing Trainer.valida…
EliaCereda Mar 6, 2021
85c8074
require: adjust versions (#6363)
Borda Mar 6, 2021
217470b
Use f-"""-string in a Trainer comment (#6377)
carmocca Mar 6, 2021
facfda8
Remove no return warning from val/test step (#6139)
rohitgr7 Mar 6, 2021
34b733b
Fix manual optimization in pl_example (#6373)
akihironitta Mar 6, 2021
966184a
Update Sharded test with RunIf (#6384)
kaushikb11 Mar 6, 2021
38a5fe7
Remove optimizer_idx arg in manual optimization (#6093)
rohitgr7 Mar 7, 2021
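With the change above, a manual-optimization training_step no longer receives an optimizer_idx argument. A rough sketch of the pattern, assuming pytorch-lightning ~1.2/1.3; how automatic optimization is switched off has varied between versions, so the property override below is an assumption:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class ManualOptModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    @property
    def automatic_optimization(self) -> bool:
        return False  # opt into manual optimization

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):  # note: no optimizer_idx parameter
        opt = self.optimizers()
        opt.zero_grad()
        x, y = batch
        loss = F.mse_loss(self(x), y)
        self.manual_backward(loss)
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```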
2708c39
[doc] Improve Multiple Val/Test Dataloaders with simultaneous batches…
mees Mar 7, 2021
c7f30a2
[doc] Fix closure in manual optimization (#6374)
akihironitta Mar 7, 2021
826375e
Fix ModelCheckpoint(monitor=None, save_last=True) not saving checkpoi…
carmocca Mar 7, 2021
ff16104
Update TBLogger docs (#6315)
s-rog Mar 8, 2021
718074b
Fix trainer not resetting lightning_optimizers (#6372)
awaelchli Mar 8, 2021
0ec7a23
update python version (#6399)
awaelchli Mar 8, 2021
a6c98c4
Fix AttributeError: 'NoneType' object has no attribute 'finalize' on…
chizuchizu Mar 8, 2021
8dabc30
Run CI (#6402)
carmocca Mar 8, 2021
efd272a
Pass {fit,validate,test,predict} to setup() and teardown() (#6386)
carmocca Mar 8, 2021
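Per the commit above, setup() and teardown() now receive which stage is running. A DataModule sketch, assuming pytorch-lightning ~1.2; the dataset construction is a hypothetical placeholder:

```python
import pytorch_lightning as pl

class MyDataModule(pl.LightningDataModule):
    def setup(self, stage=None):
        # stage is one of "fit", "validate", "test", "predict" (or None), per the commit title
        if stage in (None, "fit"):
            self.train_set, self.val_set = ..., ...  # hypothetical dataset objects
        if stage in (None, "test"):
            self.test_set = ...  # hypothetical dataset object

    def teardown(self, stage=None):
        pass  # release any resources tied to the given stage
```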
e1f5eac
fix dp reduction test (#6404)
awaelchli Mar 8, 2021
9eded7f
Add check for verbose attribute of ModelCheckpoint (#6419)
ashleve Mar 8, 2021
523c59b
fixed bug where tuner would not tune lr if also tuning batch_size (#4…
Palzer Mar 9, 2021
75c6486
update (#6403)
awaelchli Mar 9, 2021
fc6d402
fix logger creating directory structure too early in DDP (#6380)
awaelchli Mar 9, 2021
55dd3a4
Typing for tests 1/n (#6313)
Borda Mar 9, 2021
30d649b
[changelog] Update Changelog on release v1.2.3 (#6444)
tchaton Mar 9, 2021
615b2f7
Improve DummyLogger (#6398)
awaelchli Mar 9, 2021
74d79e7
Raise an exception if check_val_every_n_epoch is not an integer (#6411)
kaushikb11 Mar 10, 2021
c81b2a8
Set find unused parameters to True by default to fix breaking compati…
SeanNaren Mar 10, 2021
7d4e74c
[bug] All_gather support tensor on cpu (#6416)
tchaton Mar 10, 2021
1c013b4
[Fix] Ensure we set the default device before initializing deepspeed …
SeanNaren Mar 10, 2021
d1db604
Remove redundant test (#6466)
carmocca Mar 10, 2021
f4cc745
Add Trainer.validate(…) method to run one validation epoch (#4948)
EliaCereda Mar 11, 2021
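The method added above runs a single validation pass outside of fit(). A sketch, assuming pytorch-lightning ~1.2/1.3; the dataloader keyword name and the model/loader objects are assumptions:

```python
import pytorch_lightning as pl

trainer = pl.Trainer()
# results = trainer.validate(model, val_dataloaders=val_loader)  # hypothetical model/loader
# print(results)  # list of dicts with the logged validation metrics
```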
2ecda5d
Allow user to disable the automatic formatting of checkpoint file nam…
maxfrei750 Mar 11, 2021
079fe9b
Hotfix for torchvision (#6476)
kaushikb11 Mar 11, 2021
afe0ede
cover subproc coverage (#6477)
Borda Mar 11, 2021
e886d55
argparse: Add use_argument_group=True (#6088)
EricCousineau-TRI Mar 11, 2021
c53edce
Disable batch transfer in DP mode (#6098)
rohitgr7 Mar 11, 2021
62d4304
remove obsolete todo in pl_examples (#6475)
awaelchli Mar 11, 2021
cea170e
[feat] Support iteration-based checkpointing in model checkpoint call…
ananthsub Mar 11, 2021
6596447
update xla version (#6464)
awaelchli Mar 12, 2021
518c7e4
Remove unused mixin attributes (#6487)
carmocca Mar 12, 2021
680e83a
[doc] Update the order of zero_grad and backward (#6478)
akihironitta Mar 12, 2021
b2bcad1
Fix tuner.scale_batch_size not finding batch size attribute when usin…
awaelchli Mar 14, 2021
dcd9dd8
Update docs for limit_predict_batches (#6507)
rohitgr7 Mar 14, 2021
0544efd
[bug] Update broadcast + reduce decision ModelCheckpoint] (#6410)
tchaton Mar 14, 2021
02fa32b
Handle torch.jit scripted modules in layer summary (#6511)
awaelchli Mar 15, 2021
156847b
CI: resume testing with py3.8 (#6516)
Borda Mar 15, 2021
06756a8
document exceptions for metrics/functional (#6273)
dipam7 Mar 15, 2021
5d73fbb
Mean Average Precision metric for Information Retrieval (1/5) (#5032)
lucadiliello Mar 15, 2021
eb3ff41
CI: Azure publish results (#6514)
Borda Mar 15, 2021
b341b53
deprecate metrics pkg (#6505)
Borda Mar 15, 2021
c48fc6a
[test] lr_find with bs_scale (#6422)
s-rog Mar 15, 2021
383565d
Update DeepSpeed docs (#6528)
SeanNaren Mar 15, 2021
ea36ee3
fix attribute access in LightningModule.toggle_optimizer (#6513)
awaelchli Mar 15, 2021
9c59733
Update hook lifecycle (#6538)
carmocca Mar 15, 2021
6453091
Prune metrics base classes 2/n (#6530)
Borda Mar 15, 2021
6a14146
Custom Plugin is_distributed (#6537)
amogkam Mar 15, 2021
0f07eaf
refactor reading env defaults (#6510)
Borda Mar 16, 2021
a312219
Prune metric: helpers and inputs 3/n (#6547)
Borda Mar 16, 2021
555a6fe
prune warning & deprecation wrapper (#6540)
Borda Mar 16, 2021
b190403
Add outputs param for `on_val/test_epoch_end` hooks (#6120)
kaushikb11 Mar 16, 2021
00cd918
[doc] Add Zero Grad `set_to_none=True` trick (#6548)
tchaton Mar 16, 2021
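The trick documented by the commit above applies to plain PyTorch optimizers. A tiny sketch; set_to_none is available in recent PyTorch releases:

```python
import torch

param = torch.nn.Parameter(torch.randn(3, 3))
optimizer = torch.optim.SGD([param], lr=0.1)

loss = (param ** 2).sum()
loss.backward()
optimizer.step()

# Clear gradients by setting .grad to None rather than zero-filling it;
# this skips a memset and can save memory until the next backward pass.
optimizer.zero_grad(set_to_none=True)
assert param.grad is None
```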
297e438
fix deprecation wrapper & tests (#6553)
Borda Mar 17, 2021
2f6ce1a
prune metric: accuracy 4/n (#6515)
Borda Mar 17, 2021
9e35f97
Prune metrics: AUC & AUROC (#6572)
Borda Mar 18, 2021
8853a36
[doc] Update Dict Train Loader doc. (#6579)
tchaton Mar 18, 2021
38a2119
Prune metrics: precision & recall 6/n (#6573)
Borda Mar 18, 2021
b606171
Update Changelog for v1.2.4 (#6581)
kaushikb11 Mar 18, 2021
4e9b453
[Fix] Move init dist connection into the setup function (#6506)
SeanNaren Mar 18, 2021
983a888
Fix all_gather for tpu_cores=8 (#6587)
ethanwharris Mar 19, 2021
87c03b1
Update Gradient Clipping for TPU Accelerator (#6576)
kaushikb11 Mar 19, 2021
5780796
NGC container PoC (#6187)
Borda Mar 19, 2021
3b72bcc
Automatically set sync_batchnorm for training_type_plugin (#6536)
amogkam Mar 19, 2021
3a56a60
Prune metrics: other classification 7/n (#6584)
Borda Mar 19, 2021
cb59039
fixing examples (#6600)
Borda Mar 20, 2021
634d831
Add AMP for validation, prediction and testing (#6565)
justusschock Mar 20, 2021
37f22c9
Add trainer.predict config validation (#6543)
kaushikb11 Mar 21, 2021
42a7b70
Add DDP Spawn being default for Multi GPUs (#6292)
kaushikb11 Mar 21, 2021
51c9260
Move profiler tests (#6619)
carmocca Mar 21, 2021
870247f
drop mypy from .pre-commit-config.yaml (#6542)
carmocca Mar 22, 2021
853523e
Clean utilities/argparse and add missing tests (#6607)
ethanwharris Mar 22, 2021
58c9fa7
Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6…
SeanNaren Mar 22, 2021
e2e1de0
Add teardown method to BaseProfiler. (#6370)
camruta Mar 22, 2021
1fae10a
refactoring setup (#6590)
Borda Mar 22, 2021
e62c7c7
hotfix: mock examples (#6632)
Borda Mar 22, 2021
2064ece
[refactor] Add setup to profilers + _run_stage_setup to trainer 2/5 (…
tchaton Mar 22, 2021
8cd75a4
fix comparing versions (#6434)
Borda Mar 23, 2021
efce2b7
Prune metrics: regression 8/n (#6636)
Borda Mar 23, 2021
f93414d
Prune metyrics: regression 9/n (#6637)
Borda Mar 23, 2021
36d180e
Refactor base profilers 3/5 (#6621)
carmocca Mar 23, 2021
a74909a
prune metrics: info retrieval (#6649)
Borda Mar 23, 2021
0995d30
Flash predict step (#6577)
tchaton Mar 23, 2021
3cf0c31
fix back-compatibility for Accel (#6655)
Borda Mar 23, 2021
51b10f7
Refactor PyTorch profiler 4/5 (#6349)
carmocca Mar 23, 2021
fd5cb7f
Add PyTorch 1.8 Profiler 5/5 (#6618)
tchaton Mar 23, 2021
64d0fa4
update coverage config (#6524)
Borda Mar 23, 2021
741c452
Fix disabled grads after call to predict (#6657)
ethanwharris Mar 23, 2021
b1e3dcc
Use `pl.LightningModule` in new-project docs (#6656)
Rubiel1 Mar 23, 2021
70beddf
Prune metrics: others 11/DoNe (#6659)
Borda Mar 24, 2021
cbca6cd
fix: update example autoencoder.py to reflect args (#6638)
bmahlbrand Mar 24, 2021
5733889
Docs/robots (#6658)
Borda Mar 24, 2021
d02fe34
Feature/double precision (#6595)
ethanwharris Mar 24, 2021
ac60536
Follow E231 [flake8] (#6110)
akihironitta Mar 24, 2021
ab4c838
Remove ModelSummary validation from train loop on_trainer_init (#6610)
ananthsub Mar 24, 2021
d471fa3
add copyr (#6661)
Borda Mar 24, 2021
2dd6f9e
`MetricsHolder` clean-up + typing (#6645)
carmocca Mar 24, 2021
b8ef52b
Match the number of outputs of backward with forward for AllGatherGra…
ArvinZhuang Mar 25, 2021
2cbdc01
Fix checkpoint callback & Trainer.test(_) issue for TPUs (#6654)
kaushikb11 Mar 25, 2021
92a1671
Update CODEOWNERS (#6220)
Borda Mar 25, 2021
40976e4
Support teardown hook on DataModule (#4673)
ananthsub Mar 25, 2021
9be092d
Add on_epoch_start to run at the beginning of every loop irrespective…
rohitgr7 Mar 25, 2021
217c12a
Simplify deprecations (#6620)
Borda Mar 25, 2021
0ea8f39
Resolve schedule step bug for PyTorch Profiler (#6674)
tchaton Mar 25, 2021
6b990f3
Add artifcact_location arg to MLFlow logger (#6677)
ethanwharris Mar 25, 2021
bc61361
Do not add return dict items to callback_metrics (#6682)
carmocca Mar 26, 2021
b730a5a
Do not describe when there's no summary (#6681)
carmocca Mar 26, 2021
21fc5eb
Automatically find and run special tests (#6669)
carmocca Mar 26, 2021
0e45220
[warning] Add warning when values are not being reduced (#6417)
tchaton Mar 26, 2021
f0c5479
Remove legacy `Result` parameters (#6016)
carmocca Mar 28, 2021
dcf6e4e
remake nvidia docker (#6686)
Borda Mar 29, 2021
cca0eca
More explicit exception message when testing with fast_dev_run=True (…
ashleve Mar 29, 2021
5b5a5cc
support python 3.9 (#4944)
Borda Mar 29, 2021
3a4c424
[TPU] update is_tpu_exists utils internal logic to rely on xmp.spawn …
tchaton Mar 29, 2021
646cf2f
[refactor] Move save_function to accelerator 1/n [DeepSpeed] (#6689)
tchaton Mar 29, 2021
f79a13e
[Model Parallel] Add configure sharded model hook (#6679)
kaushikb11 Mar 29, 2021
3c86193
update readme by v1.2.x (#6728)
Borda Mar 29, 2021
9044470
Remove logger_connector legacy code (#6733)
carmocca Mar 30, 2021
583fcf2
update chlog v1.2.5 (#6742)
Borda Mar 30, 2021
260 changes: 101 additions & 159 deletions .circleci/config.yml
100755 → 100644
@@ -1,191 +1,133 @@
# Python CircleCI 2.0 configuration file
#
# Check https://circleci.com/docs/2.0/language-python/ for more details
#
version: 2.0
# Python CircleCI 2.1 configuration file.
version: 2.1
orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
codecov: codecov/codecov@1.1.0

references:

install_deps: &install_deps
make_docs: &make_docs
run:
name: Install Dependences
name: Make Documentation
command: |
sudo apt-get update && sudo apt-get install -y cmake
pip install "$TORCH_VERSION"
pip install -r requirements.txt -q
sudo pip install pytest pytest-cov pytest-flake8 -q
pip install -r ./tests/requirements-devel.txt -q

tests: &tests
# First run the same pipeline as Read-The-Docs
# apt-get update && apt-get install -y cmake
# using: https://hub.docker.com/r/readthedocs/build
# we need to use py3.7 or higher because of an issue with metaclass inheritance
pyenv global 3.7.3
python --version
pip install -r requirements/docs.txt
pip list
cd docs
make clean
make html --jobs 2 SPHINXOPTS="-W"

checkout_ml_testing: &checkout_ml_testing
run:
name: Testing
name: Checkout ml-testing-accelerators
command: |
python --version ; pip --version ; pip list
py.test pytorch_lightning tests -v --doctest-modules --junitxml=test-reports/pytest_junit.xml
no_output_timeout: 30m
git clone https://github.com/GoogleCloudPlatform/ml-testing-accelerators.git
cd ml-testing-accelerators
git fetch origin 5e88ac24f631c27045e62f0e8d5dfcf34e425e25:stable
git checkout stable
cd ..

examples: &examples
run:
name: PL Examples
command: |
pip install -r ./pl_examples/requirements.txt --user
python --version ; pip --version ; pip list
py.test pl_examples -v --doctest-modules --junitxml=test-reports/pytest_junit.xml
no_output_timeout: 20m

install_pkg: &install_pkg
build_push_docker: &build_push_docker
run:
name: Install package
name: Build and push Docker image
command: |
virtualenv vEnv ; source vEnv/bin/activate
pip install --editable . ; cd .. & python -c "import pytorch_lightning ; print(pytorch_lightning.__version__)"
deactivate ; rm -rf vEnv

create_pkg: &create_pkg
run:
name: Create package
command: |
sudo pip install twine==1.13.0
python setup.py sdist
twine check dist/*
python setup.py clean

format: &format
gcloud --quiet auth configure-docker
#cd dockers/tpu-tests
export PYTHON_VER=$(python -c "import random ; print('3.6' if random.random() > 0.5 else '3.7')" 2>&1)
echo $PYTHON_VER
docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f ./dockers/tpu-tests/Dockerfile --build-arg "PYTHON_VERSION=$PYTHON_VER" --build-arg "PYTORCH_VERSION=$XLA_VER" .
docker push "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID"

deploy_cluster: &deploy_cluster
run:
name: Formatting
name: Deploy the job on the kubernetes cluster
command: |
python --version ; pip --version
sudo pip install flake8 -q
pip list
flake8 .

make_docs: &make_docs
go get github.com/google/go-jsonnet/cmd/jsonnet
export PATH=$PATH:$HOME/go/bin
python -c "fname = 'dockers/tpu-tests/tpu_test_cases.jsonnet' ; fff = open(fname).read().replace('pytorch-VERSION', 'pytorch-$XLA_VER') ; open(fname, 'w').write(fff)"
job_name=$(jsonnet -J ml-testing-accelerators/ dockers/tpu-tests/tpu_test_cases.jsonnet --ext-str image=$GCR_IMAGE_PATH --ext-str image-tag=$CIRCLE_WORKFLOW_JOB_ID | kubectl create -f -)
job_name=${job_name#job.batch/}
job_name=${job_name% created}
echo "Waiting on kubernetes job: $job_name"
i=0 && \
# N checks spaced 30s apart = 900s total.
status_code=2 && \
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try MAX_CHECKS times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
printf "Waiting for job to finish: " && \
while [ $i -lt $MAX_CHECKS ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else printf "."; fi; sleep $CHECK_SPEEP; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
echo "GKE pod name: $pod_name" && \
kubectl logs -f $pod_name --container=train > /tmp/full_output.txt
if grep -q '<?xml version="1.0" ?>' /tmp/full_output.txt ; then csplit /tmp/full_output.txt '/<?xml version="1.0" ?>/'; else mv /tmp/full_output.txt xx00; fi && \
# First portion is the test logs. Print these to Github Action stdout.
cat xx00 && \
echo "Done with log retrieval attempt." && \
gcloud container images delete "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" --force-delete-tags && \
exit $status_code

stats: &stats
run:
name: Make Documentation
name: Statistics
command: |
# sudo apt-get install pandoc
sudo apt-get update && sudo apt-get install -y cmake
pip install -r requirements.txt --user
sudo pip install -r docs/requirements.txt
pip install -r requirements-extra.txt --user # for doctesting loggers etc.
# sphinx-apidoc -o ./docs/source ./pytorch_lightning **/test_* --force --follow-links
cd docs; make clean; make html --debug --jobs 2 SPHINXOPTS="-W"
make doctest; make coverage
mv ./xx01 coverage.xml
# TODO: add human readable report
cat coverage.xml
sudo pip install pycobertura
pycobertura show coverage.xml

jobs:

Build-Docs:
docker:
- image: circleci/python:3.7
steps:
- checkout
- *make_docs
- store_artifacts:
# allows us to preview the generated html pages
path: docs/build/html/
destination: html

Formatting:
TPU-tests:
docker:
- image: circleci/python:3.7
environment:
- TORCH_VERSION: "torch"
- XLA_VER: 1.7
- MAX_CHECKS: 240
- CHECK_SPEEP: 5
steps:
- checkout
- *format
- go/install
- *checkout_ml_testing
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- setup_remote_docker
- *build_push_docker
- *deploy_cluster
- *stats
- codecov/upload:
file: coverage.xml
flags: tpu,pytest
upload_name: TPU-coverage

PyTorch:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch"
steps: &steps
- checkout
#- restore_cache:
# keys:
# # when lock file changes, use increasingly general patterns to restore cache
# - pip-packages--{{ .Environment.CIRCLE_JOB }}
# - pip-packages--
- *install_deps
#- save_cache:
# key: pip-packages--{{ .Environment.CIRCLE_JOB }}
# paths:
# # this path depends on where pipenv creates a virtualenv
# - "~/.cache/pip"
# - "/usr/local/lib/python3.6/site-packages"
# - "/usr/local/lib/site-python"
- *tests
- store_test_results:
path: test-reports
- store_artifacts:
path: test-reports

PyTorch-v1_1:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.1, <1.2"
steps: *steps

PyTorch-v1_2:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.2, <1.3"
steps: *steps

PyTorch-v1_3:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.3, <1.4"
steps: *steps

PyTorch-v1_4:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.4, <1.5"
steps: *steps

PyTorch-v1_5:
docker:
- image: circleci/python:3.6
environment:
- TORCH_VERSION: "torch>=1.5, <1.6"
steps: *steps
path: coverage.xml

Examples:
build-Docs:
docker:
- image: circleci/python:3.7
environment:
- TORCH_VERSION: "torch"
steps:
- checkout
- *install_deps
- *examples

Install-pkg:
docker:
- image: circleci/python:3.7
- image: readthedocs/build:latest
steps:
- checkout
- *create_pkg
- *install_pkg

#orbs:
# python: circleci/python@0.2.1
- *make_docs
- store_artifacts:
# allows us to preview the generated html pages
path: docs/build/html/
destination: html

workflows:
version: 2
build:
tpu-tests:
jobs:
- Formatting
- Build-Docs
- PyTorch-v1_1
- PyTorch-v1_2
- PyTorch-v1_3
- PyTorch-v1_4
- PyTorch-v1_5
- Install-pkg
- Examples
- build-Docs
- TPU-tests
21 changes: 18 additions & 3 deletions .codecov.yml
@@ -1,3 +1,17 @@
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see https://docs.codecov.io/docs/codecov-yaml
# Validation check:
# $ curl --data-binary @.codecov.yml https://codecov.io/validate
@@ -9,8 +23,10 @@ codecov:
strict_yaml_branch: "yaml-config"
require_ci_to_pass: yes
notify:
# after_n_builds: 2
after_n_builds: 23
wait_for_ci: yes
# https://docs.codecov.io/docs/codecov-yaml#section-expired-reports
max_report_age: off

coverage:
precision: 0 # 2 = xx.xx%, 0 = xx%
@@ -48,5 +64,4 @@ comment:
layout: header, diff
require_changes: false
behavior: default # update if exists else create new
# branches: *

after_n_builds: 23
58 changes: 0 additions & 58 deletions .drone.yml

This file was deleted.
