burn scars model 600m failing v0.99.8 #409

Open
romeokienzler opened this issue Feb 6, 2025 · 1 comment
@romeokienzler (Collaborator):

(app-root) sh-5.1$ terratorch fit -c /working/test/burnscars.yaml
INFO:albumentations.check_version:A new version of Albumentations is available: 2.0.3 (you have 1.4.10). Upgrade using: pip install --upgrade albumentations
INFO: Seed set to 0
INFO:lightning.fabric.utilities.seed:Seed set to 0
INFO:root:Loaded weights for HLSBands.BLUE in position 0 of patch embed
INFO:root:Loaded weights for HLSBands.GREEN in position 1 of patch embed
INFO:root:Loaded weights for HLSBands.RED in position 2 of patch embed
INFO:root:Loaded weights for HLSBands.NIR_NARROW in position 3 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_1 in position 4 of patch embed
INFO:root:Loaded weights for HLSBands.SWIR_2 in position 5 of patch embed
/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/cli.py:681: SemanticSegmentationTask.configure_optimizers will be overridden by MyLightningCLI.configure_optimizers.
INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO:lightning.pytorch.utilities.rank_zero:Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO:lightning.pytorch.utilities.rank_zero:GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃   ┃ Name          ┃ Type             ┃ Params ┃ Mode  ┃
┡━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ 0 │ model         │ PixelWiseModel   │ 660 M  │ train │
│ 1 │ criterion     │ DiceLoss         │ 0      │ train │
│ 2 │ train_metrics │ MetricCollection │ 0      │ train │
│ 3 │ val_metrics   │ MetricCollection │ 0      │ train │
│ 4 │ test_metrics  │ ModuleList       │ 0      │ train │
└───┴───────────────┴──────────────────┴────────┴───────┘
Trainable params: 660 M
Non-trainable params: 0
Total params: 660 M
Total estimated model params size (MB): 2.6 K
Modules in train mode: 811
Modules in eval mode: 0
/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(

/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
🏃 View run prithvi-eo2-burnscars-prithvi_eo_v2_600-retrain at: https://gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com/#/experiments/975/runs/ff1c5b0e5423487ca15335ceff777aeb
🧪 View experiment at: https://gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com/#/experiments/975
/opt/miniconda/lib/python3.11/site-packages/urllib3/connectionpool.py:1063: InsecureRequestWarning: Unverified HTTPS request is being made to host 'gfm-mlflow-internal-nasageospatial-dev.cash.sl.cloud9.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Traceback (most recent call last):
File "/opt/miniconda/bin/terratorch", line 8, in
sys.exit(main())
^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/terratorch/main.py", line 9, in main
_ = build_lightning_cli()
^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/terratorch/cli_tools.py", line 447, in build_lightning_cli
return MyLightningCLI(
^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/cli.py", line 396, in init
self._run_subcommand(self.subcommand)
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/cli.py", line 706, in _run_subcommand
fn(**fn_kwargs)
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 539, in fit
call._call_and_handle_interrupt(
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 575, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 982, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1024, in _run_stage
self._run_sanity_check()
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1053, in _run_sanity_check
val_loop.run()
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 179, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 144, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 433, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 323, in _call_strategy_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/lightning/pytorch/strategies/strategy.py", line 412, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/terratorch/tasks/segmentation_tasks.py", line 285, in validation_step
model_output: ModelOutput = self(x, **rest)
^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torchgeo/trainers/base.py", line 78, in forward
return self.model(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/terratorch/models/pixel_wise_model.py", line 124, in forward
decoder_output = self.decoder([f.clone() for f in features])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/terratorch/models/decoders/unet_decoder.py", line 39, in forward
return self.decoder(*x)
^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/segmentation_models_pytorch/decoders/unet/decoder.py", line 122, in forward
x = decoder_block(x, skip)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/miniconda/lib/python3.11/site-packages/segmentation_models_pytorch/decoders/unet/decoder.py", line 40, in forward
x = torch.cat([x, skip], dim=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 36 but got size 37 for tensor number 1 in the list.

segmentation_models_pytorch==0.4.0
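
The failure is the channel-wise concatenation in the SMP UNet decoder: the upsampled decoder feature and the encoder skip connection end up with different spatial sizes (36 vs. 37). A minimal sketch of the failing operation; only the 36/37 sizes and the `dim=1` concat come from the error above, the other shapes are illustrative:

```python
import torch

# Illustrative only: the UNet decoder block upsamples a feature map and
# concatenates it with a skip connection along the channel dimension (dim=1).
# torch.cat requires all other dimensions (here H and W) to match exactly.
upsampled = torch.randn(1, 256, 36, 36)  # decoder feature after upsampling
skip = torch.randn(1, 256, 37, 37)       # skip connection from the encoder

torch.cat([upsampled, skip], dim=1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 36 but got size 37 for tensor number 1 in the list.
```

This kind of off-by-one usually appears when an intermediate feature map has an odd spatial size, so downsampling and re-upsampling by 2 no longer lands exactly on the skip connection's resolution.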

@romeokienzler romeokienzler self-assigned this Feb 6, 2025
@kaushikCanada

Recently I used terratorch to train a LULC model on Landsat imagery. Here is my code that works (after hundreds of experiments), running on my Quadro RTX 8000 with 48 GB of VRAM.

```python
import os
os.environ["CONDA_PREFIX"] = "/opt/conda/envs/geo_ml_env/"
os.environ["PROJ_LIB"] = os.path.join(os.environ["CONDA_PREFIX"], "share", "proj")

from osgeo import gdal
import torch
from terratorch.datasets import HLSBands
from terratorch import BACKBONE_REGISTRY
from terratorch.models import EncoderDecoderFactory, SMPModelFactory
from segmentation_models_pytorch.encoders import encoders as smp_encoders
from landsat_datamodule import LandsatLabelledDataModule
from terratorch.tasks import SemanticSegmentationTask
from lightning.pytorch import Trainer
from terratorch.models.model import AuxiliaryHead

hls_bands = ["BLUE", "GREEN", "RED", "NIR", "SWIR_1", "SWIR_2"]

# Data module
dm = LandsatLabelledDataModule(
    root="/EODATA/exported_data/training/cleaned/bengal_256/",
    images_root="images",
    masks_root="masks_3class",
    classes_csv="/EODATA/csvfiles/class_remap.csv",
    batch_size=40,
    bands=hls_bands,
    num_workers=4,
    val_split_pct=0.1,
    patch_size=256,
)

dm.setup("fit")

# Model
model = SemanticSegmentationTask(
    model_args={
        "decoder": "UperNetDecoder",
        "backbone_pretrained": True,
        # Model can be either prithvi_eo_v2_300, prithvi_eo_v2_300_tl,
        # prithvi_eo_v2_600, prithvi_eo_v2_600_tl
        "backbone": "terratorch_prithvi_eo_v2_600",
        "backbone_in_channels": 6,
        "rescale": True,
        "backbone_bands": ["BLUE", "GREEN", "RED", "NIR_NARROW", "SWIR_1", "SWIR_2"],
        "backbone_num_frames": 1,
        "num_classes": 3,
        "head_dropout": 0.1,
        "decoder_channels": 256,
        "decoder_scale_modules": True,
        "head_channel_list": [128, 64],
        "necks": [
            {
                "name": "SelectIndices",
                # "indices": [2, 5, 8, 11],   # indices for prithvi_vit_100
                "indices": [5, 11, 17, 23],   # indices for prithvi_vit_300
                # "indices": [7, 15, 23, 31], # indices for prithvi_vit_600
            },
            {"name": "ReshapeTokensToImage"},
        ],
    },
    plot_on_val=False,
    loss="focal",
    lr=1.0e-4,
    optimizer="AdamW",
    optimizer_hparams={"weight_decay": 0.1},
    scheduler="StepLR",
    scheduler_hparams={"step_size": 10, "gamma": 0.9},
    ignore_index=-1,
    freeze_backbone=False,
    freeze_decoder=False,
    model_factory="EncoderDecoderFactory",
)

trainer = Trainer(
    accelerator="gpu",
    num_nodes=1,
    # logger=logger,
    max_epochs=2,
    check_val_every_n_epoch=1,
    log_every_n_steps=1,
    enable_checkpointing=True,
    default_root_dir="terratorch_logs",
    # callbacks=[checkpoint_callback],
)

trainer.fit(model, datamodule=dm)
```

I see the training bar as having 1451 iterations, but I don't know how that happened, since my dataset originally does not have that many patches.
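
In case it helps with the iteration count, a quick diagnostic is to compare the progress bar total against what the datamodule actually yields. A sketch, reusing `dm` and the batch size of 40 from the snippet above:

```python
# Diagnostic sketch: how many batches does the datamodule really produce?
dm.setup("fit")

train_batches = len(dm.train_dataloader())
val_batches = len(dm.val_dataloader())

print(f"train batches per epoch: {train_batches}")
print(f"val batches per epoch:   {val_batches}")
print(f"approx. train samples:   {train_batches * 40}")  # batch_size = 40
```

If the train batch count is much larger than the number of image files on disk, the datamodule is presumably tiling or sampling several 256-pixel patches per image, which would account for the 1451 iterations.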

[screenshot: training progress bar]
