Obtain patch level metadata (e.g. geospatial bounds and cloud cover), save and demo DEP use case (sim search) #172

Merged
33 commits merged on Mar 24, 2024


lillythomas
Contributor

@lillythomas lillythomas commented Mar 6, 2024

This PR works toward a demonstration of how to obtain patch-level embeddings and write them to GeoParquet files for use in similarity search.

Main tasks that need to be done:

Reference tickets: #168 #140
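A minimal sketch of what "patch level" can mean here. It assumes each image chip is split into a regular grid of square patches, that the encoder returns one embedding per patch, and that a "mean" embedding is simply the average of those patch vectors; the Clay model code may differ in detail. The helper names `patch_bounds` and `mean_embedding` and the 512 x 512, 4 x 4 example are hypothetical, and bounds are assumed to be in the chip's projected CRS.

```python
def patch_bounds(image_bounds, grid_size):
    """Split a chip's bounds into a grid_size x grid_size grid of patch boxes.

    Returns {(row, col): (minx, miny, maxx, maxy)}, with row 0 along the top
    (maximum y) edge, as in a north-up raster.
    """
    minx, miny, maxx, maxy = image_bounds
    dx = (maxx - minx) / grid_size
    dy = (maxy - miny) / grid_size
    boxes = {}
    for row in range(grid_size):
        for col in range(grid_size):
            # Compute each edge from the origin to avoid accumulating float error.
            left = minx + col * dx
            right = minx + (col + 1) * dx
            top = maxy - row * dy
            bottom = maxy - (row + 1) * dy
            boxes[(row, col)] = (left, bottom, right, top)
    return boxes


def mean_embedding(patch_embeddings):
    """Average per-patch vectors into one chip-level ("mean") embedding."""
    vectors = list(patch_embeddings.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    boxes = patch_bounds((0, 0, 512, 512), grid_size=4)
    print(len(boxes), boxes[(0, 0)])  # 16 (0.0, 384.0, 128.0, 512.0)
```

Keeping one row per patch, rather than only the averaged vector, is what allows each embedding to carry its own geospatial bounds.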

@lillythomas lillythomas marked this pull request as draft March 6, 2024 08:31
@lillythomas lillythomas force-pushed the patch_level_metadata branch from fbc286f to 674c716 Compare March 8, 2024 07:50
@lillythomas lillythomas force-pushed the patch_level_metadata branch from c7dd579 to 55ec91d Compare March 8, 2024 19:14
@lillythomas lillythomas force-pushed the patch_level_metadata branch from efe5611 to b51bc0e Compare March 11, 2024 20:45
@lillythomas lillythomas marked this pull request as ready for review March 11, 2024 20:48
@lillythomas
Contributor Author

The notebook docs/tutorial_digital_earth_pacific_patch_level.ipynb walks through an example of:

  • generating patch-level embeddings for an area where mining extraction events are known to occur
  • saving the patch-level embeddings to separate GeoParquet files
  • running similarity search based on the patch that overlaps a ground-truth point
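The search step in the last bullet can be sketched in plain Python, assuming each patch row carries an `idx`, its `bounds` as (minx, miny, maxx, maxy) in the same CRS as the ground-truth point, and an `embedding` vector. Cosine similarity, the function names, and the toy data are assumptions for illustration; the notebook itself works with GeoParquet via geopandas and may rank results differently.

```python
import math


def contains(bounds, x, y):
    """True if point (x, y) lies inside or on the edge of (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    return minx <= x <= maxx and miny <= y <= maxy


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    return dot / norm if norm else 0.0


def similar_patches(patches, point, k=3):
    """Rank patches by similarity to the patch that contains `point`.

    Returns the query patch's idx and the top-k other patches as (idx, score).
    """
    x, y = point
    query = next((p for p in patches if contains(p["bounds"], x, y)), None)
    if query is None:
        raise ValueError("no patch contains the ground truth point")
    scored = [
        (p["idx"], cosine_similarity(query["embedding"], p["embedding"]))
        for p in patches
        if p["idx"] != query["idx"]
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return query["idx"], scored[:k]


if __name__ == "__main__":
    toy = [
        {"idx": "a", "bounds": (0, 0, 1, 1), "embedding": [1.0, 0.0]},
        {"idx": "b", "bounds": (1, 0, 2, 1), "embedding": [0.9, 0.1]},
        {"idx": "c", "bounds": (2, 0, 3, 1), "embedding": [0.0, 1.0]},
    ]
    print(similar_patches(toy, (0.5, 0.5), k=2))
```

A point that falls on a shared edge matches the first patch in the list; a real pipeline might prefer a spatial join to make that choice explicit.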

@weiji14 @yellowcap ready for when you have time to review.

@lillythomas lillythomas force-pushed the patch_level_metadata branch from 23639a9 to 88eba37 Compare March 11, 2024 20:56
@weiji14
Contributor

weiji14 commented Mar 11, 2024

Thanks @lillythomas! I haven't looked too closely yet, but would it be possible to show where the similarity search results are located? Maybe something like showing the bounding boxes of all the patches on a map and overlaying where the original quarry points are.

@lillythomas lillythomas force-pushed the patch_level_metadata branch from ddfd36a to 874a349 Compare March 12, 2024 07:41
@lillythomas
Contributor Author

Yes! Great idea. Working on this tomorrow.

Member

@yellowcap yellowcap left a comment

Thanks Lilly, this is great to see! I haven't had time to get to the sim search part yet, but will get back to it on Monday. I've left some comments for now.

Could you add some more context on the purpose? Are we going to apply this to a region to search for certain events or features?

"outputs": [],
"source": [
"mrd = gpd.read_file(\n",
" \"../mineral-resource-detection/training_data/draft_inputs/MRD_dissagregated_25.geojson\"\n",
Member

"outputs": [],
"source": [
"DATA_DIR = \"data/minicubes\"\n",
"CKPT_PATH = \"/home/ubuntu/data/checkpoints/mae_epoch-24_val-loss-0.46.ckpt\"\n",
Member

" box_emb = shapely.geometry.box(box_[0], box_[1], box_[2], box_[3])\n",
"\n",
" # Create the GeoDataFrame\n",
" gdf = gpd.GeoDataFrame(data, geometry=[box_emb], crs=f\"EPSG:{epsg}\")\n",
Member

Why write one GeoParquet file per embedding rather than one that contains all of them? A single file would make everything downstream much easier. Would it be possible to create one before the loop and add rows to it?
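One way to follow this suggestion, sketched with hypothetical field names (`source`, `idx`, `box`, `embeddings`) and hypothetical input tuples: create an empty list before the loop, append one record per patch, and build a single table once the loop finishes.

```python
def build_records(patch_rows):
    """Collect one record per patch into a single list created before the loop.

    `patch_rows` yields hypothetical (source, idx, bounds, vector) tuples, where
    bounds is (minx, miny, maxx, maxy).
    """
    records = []
    for source, idx, bounds, vector in patch_rows:
        records.append(
            {
                "source": source,
                "idx": idx,
                "box": tuple(bounds),
                "embeddings": list(vector),
            }
        )
    return records


if __name__ == "__main__":
    rows = build_records(
        [
            ("chip1", "0_0", (0, 0, 1, 1), [0.1, 0.2]),
            ("chip1", "0_1", (1, 0, 2, 1), [0.3, 0.4]),
        ]
    )
    print(len(rows))  # 2
```

With geopandas, the records could then become one GeoDataFrame in a single call, for example `gpd.GeoDataFrame(records, geometry=[box(*r["box"]) for r in records], crs=f"EPSG:{epsg}")`, and be written once with `to_parquet`. Collecting plain records first avoids growing a DataFrame row by row, which copies data on every concatenation (and `DataFrame.append` was removed in pandas 2.0).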

" lambda x: Path(x).stem.rsplit(\"/\")[-1].rsplit(\"_\")[0]\n",
" )\n",
" gdf[\"idx\"] = \"_\".join(emb.split(\"/\")[-1].split(\"_\")[2:]).replace(\".gpq\", \"\")\n",
" gdf[\"box\"] = [gdf.geometry[0].bounds]\n",
Member

This did not work for me, so I replaced it with the following to get it working:

gdf["box"] = [box(*geom.bounds) for geom in gdf.geometry]

Contributor Author

Oh interesting. It works on my end. Do you have the traceback?

"# Combine patch level geodataframes into one\n",
"embeddings_gdf = pd.concat(gdfs, ignore_index=True)\n",
"# Make a polygon for each patch level bounding box\n",
"embeddings_gdf[\"bbox\"] = embeddings_gdf[\"box\"].apply(lambda bbox: box(*bbox))"
Member

The lambda function is no longer necessary if the comment above is applied.

},
{
"data": {
"image/png": "
Member

Here I got the following error:

ArrowInvalid: Could not convert <POLYGON ((177.507 -17.778, 177.507 -17.775, 177.504 -17.775, 177.504 -17.77...> with type Polygon: did not recognize Python value type when inferring an Arrow data type

Contributor Author

This was introduced by the change you suggested in #172 (comment), but I have a way to handle it. Essentially, pyarrow can't convert a shapely Polygon object, so I can store a list-like equivalent in the table by taking the polygon's bounds, i.e. changing "box": row["box"] to "box": row["box"].bounds. I'm in favor of this because it also lets us drop the lambda function you pointed out.
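The fix described here can be sketched as follows. The `ArrowInvalid` error arises because pyarrow cannot infer an Arrow type for a shapely Polygon in an ordinary column, whereas a tuple of four floats is a plain sequence that, per the fix described above, pyarrow can store as a list column. `StandInBox` is a hypothetical stand-in for a shapely geometry (only `.bounds` is used), and the helper names are illustrative; with shapely available, `shapely.geometry.box(*bounds)` rebuilds the polygon when reading the table back.

```python
from dataclasses import dataclass


@dataclass
class StandInBox:
    """Hypothetical stand-in for a shapely Polygon; only `.bounds` is used."""

    bounds: tuple


def to_arrow_friendly(row):
    """Return a copy of `row` with the 'box' geometry replaced by its bounds tuple."""
    out = dict(row)  # shallow copy so the caller's row keeps its geometry
    out["box"] = tuple(float(v) for v in row["box"].bounds)
    return out


def bounds_to_ring(bounds):
    """Rebuild a closed, counter-clockwise rectangle ring from stored bounds."""
    minx, miny, maxx, maxy = bounds
    return [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]


if __name__ == "__main__":
    row = {"idx": "0_1", "box": StandInBox((177.504, -17.778, 177.507, -17.775))}
    flat = to_arrow_friendly(row)
    print(flat["box"])  # (177.504, -17.778, 177.507, -17.775)
    print(len(bounds_to_ring(flat["box"])))  # 5
```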

@@ -790,7 +790,7 @@ def __init__( # noqa: PLR0913
wd=0.05,
b1=0.9,
b2=0.95,
embeddings_level: Literal["mean", "patch", "group"] = "mean",
Member

I would drop this from this PR; it doesn't seem necessary for the notebook to work.

Contributor Author

True. Will do.

… results and ground truth points on rgb xarray, add descriptions, revert change in model_clay.py
@lillythomas lillythomas force-pushed the patch_level_metadata branch from ccb5e4a to fc0ffa2 Compare March 15, 2024 20:27
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 369e6af to 3526cb7 Compare March 19, 2024 02:02
@lillythomas lillythomas changed the title Obtain patch level metadata (e.g. geospatial bounds) and demo DEP use case Obtain patch level metadata (e.g. geospatial bounds and cloud cover), save and demo DEP use case (sim search) Mar 19, 2024
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 3561699 to b51fb94 Compare March 19, 2024 16:27
@lillythomas lillythomas force-pushed the patch_level_metadata branch from b311678 to 53a75fa Compare March 22, 2024 17:53
@lillythomas lillythomas force-pushed the patch_level_metadata branch from f67d8c6 to 8281a78 Compare March 22, 2024 18:01
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 405514a to 985b275 Compare March 22, 2024 18:07
@lillythomas lillythomas force-pushed the patch_level_metadata branch from b051092 to bb16690 Compare March 22, 2024 18:10
@lillythomas lillythomas force-pushed the patch_level_metadata branch from f586524 to 784a437 Compare March 22, 2024 18:21
@lillythomas lillythomas force-pushed the patch_level_metadata branch from f6d092e to 1583a6f Compare March 22, 2024 18:42
@lillythomas lillythomas force-pushed the patch_level_metadata branch 2 times, most recently from b8cdd25 to a87597e Compare March 22, 2024 19:09
@lillythomas lillythomas force-pushed the patch_level_metadata branch from a51208c to daa0486 Compare March 22, 2024 19:16
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 8a61be1 to d1a7ebb Compare March 22, 2024 19:36
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 52a05cb to 89ba6d6 Compare March 23, 2024 00:27
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 6ff60ff to bcddfef Compare March 24, 2024 20:24
@lillythomas lillythomas force-pushed the patch_level_metadata branch from 881d1cf to 7e6b894 Compare March 24, 2024 21:29
@lillythomas lillythomas force-pushed the patch_level_metadata branch from a7047bd to 40a2c76 Compare March 24, 2024 21:41
@lillythomas lillythomas merged commit 7bda731 into main Mar 24, 2024
6 checks passed
@lillythomas lillythomas deleted the patch_level_metadata branch March 24, 2024 22:18
3 participants