Clay over aoi #116
Conversation
I merged this PR into #118, which would close this one when merged.
Thanks @yellowcap, just some minor recommendations/questions below. I think we're trying to move fast, so I'll pre-approve this, but feel free to handle any of the suggestions below if you have time before merging.
docs/clay_over_aoi.ipynb
Outdated
"source": [ | ||
"aoi_src = \"\"\"{\n", | ||
" \"type\": \"FeatureCollection\",\n", | ||
" \"name\": \"puri\",\n", | ||
" \"crs\": {\n", | ||
" \"type\": \"name\",\n", | ||
" \"properties\": { \"name\": \"urn:ogc:def:crs:OGC:1.3:CRS84\" }\n", | ||
" },\n", | ||
" \"features\": [\n", | ||
" {\n", | ||
" \"type\": \"Feature\",\n", | ||
" \"properties\": {\n", | ||
" \"Region\": \"Puri\"\n", | ||
" },\n", | ||
" \"geometry\": {\n", | ||
" \"type\": \"Polygon\",\n", | ||
" \"coordinates\": [ [ [ 85.050328768992927, 19.494998302498729 ], [ 85.169749958884921, 19.441346370958311 ], [ 85.968028516820738, 19.799945705366934 ], [ 86.029742823006544, 20.15322442052609 ], [ 86.104280881127053, 20.516965603502392 ], [ 85.404584916189307, 20.564248936237991 ], [ 84.945334300027469, 20.339711117597215 ], [ 84.816295296184407, 19.799945705366934 ], [ 85.050328768992927, 19.494998302498729 ] ] ]\n", | ||
" }\n", | ||
" }\n", | ||
" ]\n", | ||
"}\"\"\"\n", | ||
"aoi = gpd.read_file(aoi_src, driver=\"GeoJSON\")\n", | ||
"aoi.geometry[0]" |
Can we show how to specify the AOI via shapely.Point or shapely.box instead? GeoJSON is a little harder to construct manually.
Good point, changed to use shapely.box
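For reference, a minimal sketch of the shapely.box approach (the bounding box values below are illustrative, not the exact Puri polygon):

```python
# Define the AOI as a simple bounding box instead of hand-written GeoJSON.
# Coordinates are illustrative lon/lat values, not the actual Puri AOI.
import geopandas as gpd
from shapely.geometry import box

bbox = box(84.81, 19.44, 86.11, 20.57)  # (minx, miny, maxx, maxy)
aoi = gpd.GeoDataFrame(geometry=[bbox], crs="EPSG:4326")
print(aoi.geometry[0].bounds)
```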
docs/clay_over_aoi.ipynb
Outdated
"! wandb disabled\n", | ||
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n", |
Yeah, we should look to disable wandb by default in trainer.py, as mentioned at #109 (comment). Relevant lines are here:
Lines 53 to 56 in 0145e55
LearningRateMonitor(logging_interval="step"),
LogIntermediatePredictions(),
],
"logger": [WandbLogger(project="CLAY-v0", log_model=False)],
Definitely, we can remove this line once the trainer has been updated.
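One hedged sketch of how a default-off gate could look in trainer.py (the function name and the WANDB_ENABLED variable are assumptions, not the actual implementation):

```python
# Hypothetical sketch: only attach the WandbLogger when explicitly enabled,
# so wandb stays off by default. WANDB_ENABLED is an assumed variable name.
import os

def build_loggers():
    if os.environ.get("WANDB_ENABLED", "false").lower() == "true":
        from lightning.pytorch.loggers import WandbLogger
        return [WandbLogger(project="CLAY-v0", log_model=False)]
    return []  # no experiment logger by default

print(len(build_loggers()))
```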
scripts/datacube.py
Outdated
required=False,
default="",
type=str,
help="Comma separated list of date ranges, each provided as yy-mm-dd/yy-mm-dd.",
Is it year as 4 digits (e.g. `2024`) or as 2 digits (e.g. `24`)?
help="Comma separated list of date ranges, each provided as yy-mm-dd/yy-mm-dd.", | |
help="Comma separated list of date ranges, each provided as yyyy-mm-dd/yyyy-mm-dd.", |
Good catch, I will update accordingly.
scripts/datacube.py
Outdated
"--subset", | ||
required=False, | ||
default="", | ||
help="For debugging, subset x and y to this pixel window as a comma" | ||
"separated string of 4 integers.", |
I see that this is for debugging only, but just to clarify, this option is for cropping the tiles/chips to a specified height and width?
I didn't understand what this does either.
I see this is creating confusion, I'll try to word this better in the help string.
The goal of this is to reduce the amount of data when processing in test mode. Normally, the script would download the entire scene, which is 10k pixels squared. With this option, one can reduce the number of pixels downloaded and passed to the tiler script. So this is cutting a window from the scene data. The tiler will then work on this smaller input, but be otherwise unaltered.
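In other words, the subset is applied as a plain pixel-window slice before tiling. A toy sketch, where the array shape, function name, and parsing are assumptions rather than the actual datacube.py code:

```python
# Toy illustration of what --subset does: cut a pixel window out of the
# full scene array before it reaches the tiler. Names are hypothetical.
import numpy as np

def apply_subset(scene, subset):
    # subset is a comma separated string of 4 integers: x0,y0,x1,y1
    x0, y0, x1, y1 = (int(v) for v in subset.split(","))
    return scene[..., y0:y1, x0:x1]

scene = np.zeros((3, 10980, 10980), dtype="uint16")  # full ~10k pixel scene
window = apply_subset(scene, "1500,1500,2524,2524")
print(window.shape)  # (3, 1024, 1024)
```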
Not to make it more complex, but I understand we download the entire MGRS tile, even when many of the chips might be outside of the requested AOI. If we drop chips outside of the AOI, we can just use a point or small polygon for testing.
That is correct and would be the best approach, but for that the datacube.py script needs much more refactoring. Currently it's fully focused on downloading the entire MGRS tile. Filtering that spatially would require passing the AOI itself and handling spatial intersections, so that is indeed quite a bit more complicated.
For now, I can also state more clearly that the AOI approach currently means getting data for the full MGRS tiles that intersect with the AOI.
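For the record, the intersection-based filtering discussed above could be sketched with a geopandas spatial join (toy in-memory data; tile names and geometries are made up):

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-ins for the MGRS grid and the AOI (names/geometries assumed)
mgrs = gpd.GeoDataFrame(
    {"tile": ["45QXE", "45QYE"]},
    geometry=[box(84.0, 19.0, 85.0, 20.0), box(90.0, 25.0, 91.0, 26.0)],
    crs="EPSG:4326",
)
aoi = gpd.GeoDataFrame(geometry=[box(84.5, 19.5, 86.1, 20.6)], crs="EPSG:4326")

# Keep only MGRS tiles whose footprint intersects the AOI
mgrs_aoi = gpd.sjoin(mgrs, aoi, predicate="intersects")
print(list(mgrs_aoi["tile"]))
```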
with rasterio.open(name, "r+") as rst:
    rst.colorinterp = color
    rst.update_tags(date=date)
Do you want to handle #70 here, i.e. use ACQUISITIONDATETIME or TIFFTAG_DATETIME? Or do that in a separate PR?
Separate PR, coming up.
"source": [ | ||
"import geopandas as gpd\n", | ||
"\n", | ||
"! wget https://clay-mgrs-samples.s3.amazonaws.com/mgrs_full.fgb -O data/mgrs/mgrs_full.fgb" |
Should we mention the original KML file this mgrs_full.fgb file was converted from? Assuming that you got it from https://sentinel.esa.int/web/sentinel/missions/sentinel-2/data-products at https://sentinel.esa.int/documents/247904/1955685/S2A_OPER_GIP_TILPAR_MPC__20151209T095117_V20150622T000000_21000101T000000_B00.kml?
Good idea, the full sample file is constructed in the landcover.sh script. There I actually got it from the NASA site, but it's the same file. I'll link to this in the notebook.
"source": [ | ||
"for index, row in mgrs_aoi.iterrows():\n", | ||
" print(index, row)\n", | ||
" ! python scripts/datacube.py --sample data/mgrs/mgrs_aoi.fgb --subset 1500,1500,2524,2524 --localpath data/chips --index {index} --dateranges 2020-01-01/2020-04-01,2021-06-01/2021-09-15" |
This line requires a PC_SDK_SUBSCRIPTION_KEY token to be set I think, otherwise users will get a requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://planetarycomputer.microsoft.com/api/sas/v1/token/sentinel1euwestrtc/sentinel1-grd-rtc. Need to document this inline perhaps, xref #119.
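If documented inline, it could be as simple as noting that the key must be set in the environment before running the script; a minimal sketch with a placeholder value:

```python
# The Planetary Computer SDK picks up the subscription key from this
# environment variable. Placeholder value shown; use your own key.
import os

os.environ["PC_SDK_SUBSCRIPTION_KEY"] = "<your-subscription-key>"
print("PC_SDK_SUBSCRIPTION_KEY" in os.environ)
```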
Added reference to the markdown
docs/clay_over_aoi.ipynb
Outdated
"source": [ | ||
"## Create the embeddings for each training chip\n", | ||
"\n", | ||
"The checkpoints can be downloaded from huggingface." |
Link to HuggingFace page perhaps?
"The checkpoints can be downloaded from huggingface." | |
"The checkpoints can be downloaded from huggingface at https://huggingface.co/made-with-clay/Clay." |
Good idea, added.
docs/clay_over_aoi.ipynb
Outdated
],
"source": [
"! wandb disabled\n",
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n",
Lightning supports passing in a checkpoint URL (see https://lightning.ai/docs/pytorch/2.1.2/common/checkpointing_advanced.html#resume-training-from-a-cloud-checkpoint), so we can remove the wget line above and predict directly from the HuggingFace checkpoint URL. The checkpoint will be downloaded automatically to ~/.cache/torch/hub/checkpoints/.
"! python trainer.py predict --ckpt_path=data/checkpoints/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n", | |
"! python trainer.py predict --ckpt_path=https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt \\\n", |
The pyproject.toml based config is not always being picked up.
Updated the pipeline and added a notebook to the docs to run Clay over custom AOIs and custom date ranges.
I tested the script extensively locally and it works, but having more people try it won't hurt, to see if there is anything missing that I did not catch in my local runs.