Obtain patch level metadata (e.g. geospatial bounds and cloud cover), save and demo DEP use case (sim search) #172
The notebook
@weiji14 @yellowcap ready for review when you have time.
Thanks @lillythomas! I haven't looked too closely yet, but would it be possible to show where the similarity search results are located? Maybe something like showing the bounding boxes of all the patches on a map, and also overlay where the original quarry points are.
Yes! Great idea. Working on this tomorrow.
Thanks Lilly, this is great to see! I have not yet had time to get to the sim search part, but will get back to it on Monday. Left some comments for now.
Could you add some more context on what the purpose is? Do you think we are going to apply this to a region to search for certain events / features?
"outputs": [], | ||
"source": [ | ||
"mrd = gpd.read_file(\n", | ||
" \"../mineral-resource-detection/training_data/draft_inputs/MRD_dissagregated_25.geojson\"\n", |
Use a URL for this; then it becomes reproducible.
"outputs": [], | ||
"source": [ | ||
"DATA_DIR = \"data/minicubes\"\n", | ||
"CKPT_PATH = \"/home/ubuntu/data/checkpoints/mae_epoch-24_val-loss-0.46.ckpt\"\n", |
Same here: if we use Hugging Face, it's reproducible.
https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt
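One way to act on this suggestion is to fetch the checkpoint from the URL above on first use and cache it locally. The following is only a minimal sketch using the standard library: `fetch_once` is a hypothetical helper (not part of the notebook or of any Clay API), and the `data/checkpoints/` cache directory in the usage comment is an assumption.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the review comment above.
CKPT_URL = (
    "https://huggingface.co/made-with-clay/Clay/resolve/main/"
    "Clay_v0.1_epoch-24_val-loss-0.46.ckpt"
)


def fetch_once(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless `dest` already exists (hypothetical helper)."""
    dest_path = Path(dest)
    if dest_path.exists():
        return dest_path  # reuse the cached copy instead of downloading again
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download is never
    # mistaken for a complete file on the next run.
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp_path.replace(dest_path)
    return dest_path


# Usage (left as a comment because the checkpoint is a large file; the
# "data/checkpoints/" directory is an assumed location):
# CKPT_PATH = fetch_once(CKPT_URL, "data/checkpoints/" + CKPT_URL.rsplit("/", 1)[-1])
```

With something like this, `CKPT_PATH` no longer depends on a path that only exists on one machine.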
" box_emb = shapely.geometry.box(box_[0], box_[1], box_[2], box_[3])\n", | ||
"\n", | ||
" # Create the GeoDataFrame\n", | ||
" gdf = gpd.GeoDataFrame(data, geometry=[box_emb], crs=f\"EPSG:{epsg}\")\n", |
Why make one GeoDataFrame per embedding rather than one that contains all of them? A single one would make everything downstream much easier. Would it be possible to create it before the loop and add rows to it?
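The accumulate-then-build-once pattern asked for here can be sketched without any geospatial dependencies. The example below is a stand-in and not the notebook's code: it collects one record per patch in a list created before the loop and builds a single GeoJSON FeatureCollection at the end, where the notebook would build a single GeoDataFrame. The helper `bbox_feature`, the patch values, and the assumption that bounds are longitude/latitude are all illustrative.

```python
import json


def bbox_feature(bounds, properties):
    """Build one GeoJSON Feature whose geometry is the bounding-box polygon."""
    minx, miny, maxx, maxy = bounds
    # Closed exterior ring, counterclockwise, as GeoJSON recommends.
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {
        "type": "Feature",
        "properties": dict(properties),
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Made-up patches: (source name, patch index, (minx, miny, maxx, maxy)).
patches = [
    ("tile_a", "0_0", (177.500, -17.780, 177.503, -17.777)),
    ("tile_a", "0_1", (177.503, -17.780, 177.506, -17.777)),
]

features = []  # created once, before the loop
for source, idx, bounds in patches:
    features.append(bbox_feature(bounds, {"source": source, "idx": idx}))

# One container for all patches instead of one object per patch.
collection = {"type": "FeatureCollection", "features": features}
geojson_text = json.dumps(collection)
```

Downstream steps then filter, join, or plot one object rather than concatenating many small ones.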
" lambda x: Path(x).stem.rsplit(\"/\")[-1].rsplit(\"_\")[0]\n", | ||
" )\n", | ||
" gdf[\"idx\"] = \"_\".join(emb.split(\"/\")[-1].split(\"_\")[2:]).replace(\".gpq\", \"\")\n", | ||
" gdf[\"box\"] = [gdf.geometry[0].bounds]\n", |
This did not work for me, replaced with the following to get it to work
gdf["box"] = [box(*geom.bounds) for geom in gdf.geometry]
Oh interesting. It works on my end. Do you have the traceback?
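As a side note on the same hunk, the filename parsing used for the source and `idx` columns can be written as two small helpers. This is a sketch under assumptions: the function names are hypothetical and the example filename in the usage note is made up, since the real naming scheme of the `.gpq` files is not shown in this thread. `Path(x).stem` never contains a `/`, so the `rsplit("/")` in the original expression has no effect and is dropped here.

```python
from pathlib import Path


def tile_prefix(path: str) -> str:
    """Return the first underscore-separated token of the file stem."""
    return Path(path).stem.split("_")[0]


def patch_index(path: str, suffix: str = ".gpq") -> str:
    """Return everything after the second underscore of the file name, minus the suffix."""
    name = Path(path).name
    if name.endswith(suffix):
        name = name[: -len(suffix)]  # strip only a trailing suffix
    return "_".join(name.split("_")[2:])
```

For a hypothetical file `data/embeddings/60HWB_20200101_0_1.gpq`, `tile_prefix` gives `60HWB` and `patch_index` gives `0_1`. The one behavioural difference from the original one-liner is that only a trailing `.gpq` is removed, rather than every occurrence of that substring.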
"# Combine patch level geodataframes into one\n", | ||
"embeddings_gdf = pd.concat(gdfs, ignore_index=True)\n", | ||
"# Make a polygon for each patch level bounding box\n", | ||
"embeddings_gdf[\"bbox\"] = embeddings_gdf[\"box\"].apply(lambda bbox: box(*bbox))" |
The lambda function is no longer necessary if the comment above is applied.
},
{
"data": {
"image/png": "
Here I got the following error.
ArrowInvalid: Could not convert <POLYGON ((177.507 -17.778, 177.507 -17.775, 177.504 -17.775, 177.504 -17.77...> with type Polygon: did not recognize Python value type when inferring an Arrow data type
This was introduced by making your suggested change in #172 (comment), but I have a way to handle it. Essentially, pyarrow doesn't accept a Polygon object, so I can store a plain tuple of coordinates in the table instead by appending .bounds to "box": row["box"], e.g. "box": row["box"].bounds. I am in favor of doing this, as it allows us to drop the lambda function as you pointed out.
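The idea behind this fix is that a bounds tuple holds only plain floats, which tabular writers can type, and the polygon can be rebuilt from it whenever a geometry is needed. The sketch below illustrates the round trip in pure Python, without shapely or pyarrow. The helper names are hypothetical, the coordinates are illustrative, and the vertex order simply mirrors the polygon printed in the error message above.

```python
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def ring_to_bounds(ring: List[Tuple[float, float]]) -> Bounds:
    """Reduce a polygon ring to its bounding box."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bounds_to_ring(bounds: Bounds) -> List[Tuple[float, float]]:
    """Rebuild the closed rectangle ring from a bounds tuple."""
    minx, miny, maxx, maxy = bounds
    return [(maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny), (maxx, miny)]


# Illustrative coordinates, not taken from the notebook's data.
patch_ring = bounds_to_ring((177.504, -17.778, 177.507, -17.775))

# The table row keeps only plain floats in "box".
row = {"idx": "0_0", "box": ring_to_bounds(patch_ring)}
```

Because the patches are axis-aligned rectangles, nothing is lost by storing the bounds instead of the polygon.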
src/model_clay.py
Outdated
@@ -790,7 +790,7 @@ def __init__( # noqa: PLR0913
wd=0.05,
b1=0.9,
b2=0.95,
embeddings_level: Literal["mean", "patch", "group"] = "mean",
I would drop this from this PR; it does not seem necessary for the notebook to work.
True. Will do.
… results and ground truth points on rgb xarray, add descriptions, revert change in model_clay.py
This PR works towards a demonstration of how to obtain patch level embeddings and write them to GeoParquet files that similarity search can then be run on.
Main tasks that need to be done:
embeddings_level = "patch"
)
Reference tickets: #168 #140