Add read image and process labels notebook #162
base: main
"from snowflake.snowpark.context import get_active_session\n", | ||
"session = get_active_session()\n" |
This won't work outside of Snowflake Notebooks
(Need to create the session from config first)
Yeah, Ray Data won't work from a regular notebook either; this notebook is meant to be used inside a snowbook.
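For the session part at least, a fallback could look roughly like the sketch below. It assumes connection settings come from `SNOWFLAKE_*` environment variables (those names are made up for this example) and that `get_active_session()` raises `SnowparkSessionException` when no session is active. It does not make the Ray Data parts work outside the container runtime.

```python
import os


def connection_parameters_from_env(env=None):
    """Collect Snowpark connection settings from environment variables.

    The SNOWFLAKE_* variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    keys = ("account", "user", "password", "role", "warehouse", "database", "schema")
    params = {}
    for key in keys:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:  # skip unset or empty settings
            params[key] = value
    return params


def get_session():
    """Return the active session if there is one, else build one from config."""
    # Imported lazily so the helper above works without Snowpark installed.
    from snowflake.snowpark import Session
    from snowflake.snowpark.context import get_active_session
    from snowflake.snowpark.exceptions import SnowparkSessionException

    try:
        return get_active_session()
    except SnowparkSessionException:
        # No active session, e.g. when running outside Snowflake Notebooks.
        return Session.builder.configs(connection_parameters_from_env()).create()
```

The first notebook cell would then call `session = get_session()` instead of calling `get_active_session()` directly.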
" database = \"ST_DB\",\n", | ||
" schema = \"ST_SCHEMA\",\n", |
This db/schema won't exist for users
I wonder what the best practice would be here. I guess we cannot assume that any particular database or schema will exist on a customer account.
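One possible option, sketched below, is to fall back to whatever database and schema the session already has selected instead of hard-coding `ST_DB` / `ST_SCHEMA`. The helper name `resolve_location` is hypothetical; `get_current_database()` and `get_current_schema()` are Snowpark `Session` methods.

```python
def resolve_location(session, database=None, schema=None):
    """Pick the database and schema, falling back to the session's current ones."""
    # The session getters may return quoted identifiers, or None when
    # nothing is selected on the session.
    database = database or session.get_current_database()
    schema = schema or session.get_current_schema()
    if not database or not schema:
        raise ValueError("No database/schema given and none is set on the session")
    return database, schema
```

The returned pair could then be passed to the `database=` and `schema=` arguments of the data sources, so the notebook runs against whatever context the user has open.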
}, | ||
"source": [ | ||
"### Process both dataset to include addition columns\n", | ||
"**Image Dataset**: add a join key, encode the images, standardize image\\n\n", |
nit: remove \\n
"### Process both dataset to include addition columns\n", | ||
"**Image Dataset**: add a join key, encode the images, standardize image\\n\n", | ||
"\n", | ||
"**Label Dataset**: add a join key, interrpet the labels" |
nit: sp
"source": [ | ||
"from snowflake.ml.ray.datasource import SFStageImageDataSource, SFStageTextDataSource\n", | ||
"\n", | ||
"image_source = SFStageImageDataSource(\n", | ||
" stage_location = \"@DATA_STAGE_RAY/images/\",\n", | ||
" database = \"ST_DB\",\n", | ||
" schema = \"ST_SCHEMA\",\n", | ||
" image_size=(256, 256),\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2324e409-b4c5-4405-ad1c-267831be1773", | ||
"metadata": { | ||
"language": "python", | ||
"name": "cell15" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"label_source = SFStageTextDataSource(\n", | ||
" stage_location = \"@DATA_STAGE_RAY/labels/\",\n", | ||
" database = \"ST_DB\",\n", | ||
" schema = \"ST_SCHEMA\",\n", | ||
")" | ||
] | ||
}, |
Where should external users get the images and labels?
Let me add a step before this notebook to prepare the data. To answer your question: this uses a public third-party dataset.
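A rough sketch of what such a preparation step might look like, assuming the dataset has already been downloaded into two local directories (the directory arguments and the helper name `upload_dataset` are hypothetical). The stage name and the `images/` and `labels/` prefixes are taken from the cells above.

```python
def upload_dataset(session, image_dir, label_dir, stage="DATA_STAGE_RAY"):
    """Create the stage if needed and upload local images and labels to it."""
    session.sql(f"CREATE STAGE IF NOT EXISTS {stage}").collect()
    uploads = {
        f"{image_dir}/*": f"@{stage}/images/",
        f"{label_dir}/*": f"@{stage}/labels/",
    }
    for local_pattern, stage_location in uploads.items():
        # auto_compress=False keeps the files in their original format on the stage.
        session.file.put(local_pattern, stage_location, auto_compress=False, overwrite=True)
    return list(uploads.values())
```

This relies on `session.file.put`, which uploads from the machine running the code, so the files need to be reachable from wherever the notebook runs.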
}, | ||
"source": [ | ||
"### Merge image source and label source into a single dataset\n", | ||
"We have two ways of achieving this: 1) if customer is more famaliar with `pandas.Dataframe` and if the data fit into memory, then we can convert all data into pandas (or write into snowflake) and do the rest of the ops. 2) If the data does not fit into memory, we can directly leverage ray dataset to do the processing. \n", |
nit: sp famaliar
"### Merge image source and label source into a single dataset\n", | ||
"We have two ways of achieving this: 1) if customer is more famaliar with `pandas.Dataframe` and if the data fit into memory, then we can convert all data into pandas (or write into snowflake) and do the rest of the ops. 2) If the data does not fit into memory, we can directly leverage ray dataset to do the processing. \n", | ||
"\n", | ||
"**Note**: Ray dataset is not naturally architeched to support join ops, so it's better for to use other method (in memory / snowflake) to perform joins" |
nit: sp architeched
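For option 1 in the markdown cell above (do the join in memory), a minimal sketch could look like this. The `file_path` column name and the rule of using the file name without its extension as the join key are assumptions for illustration, not what the notebook necessarily does.

```python
from pathlib import PurePosixPath


def join_key_from_path(file_path):
    """Use the file name without its extension as the join key."""
    return PurePosixPath(file_path).stem


def merge_in_pandas(image_df, label_df):
    """Join images and labels in memory on a derived join key."""
    import pandas as pd  # imported lazily; only needed for the merge itself

    image_df = image_df.assign(join_key=image_df["file_path"].map(join_key_from_path))
    label_df = label_df.assign(join_key=label_df["file_path"].map(join_key_from_path))
    # Overlapping column names (such as file_path) get a suffix per side.
    return pd.merge(image_df, label_df, on="join_key", suffixes=("_image", "_label"))
```

The two frames could come from calling `to_pandas()` on each Ray dataset, which only makes sense when the data fits into memory.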
"resultHeight": 46 | ||
}, | ||
"source": [ | ||
"## Save the Transformed Dataset to a snowflake table\n", |
nit: capitalize Snowflake
" database = \"ST_DB\",\n", | ||
" schema = \"ST_SCHEMA\",\n", |
(just a reminder that db/schema don't exist for users)
"source": [ | ||
"# sql cell\n", | ||
"\n", | ||
"# SELECT * FROM RAY_DEMO_JAN21_IMAGE_DS;" | ||
] |
Convert to Snowpark Python call?
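Something along these lines might work as the Snowpark equivalent of the commented-out query. The row cap is an addition that the original `SELECT *` does not have, and the helper name `preview_table` is hypothetical.

```python
def preview_table(session, table_name="RAY_DEMO_JAN21_IMAGE_DS", limit=10):
    """Snowpark version of `SELECT * FROM <table>`, capped at `limit` rows."""
    return session.table(table_name).limit(limit).collect()
```

In a notebook cell, `session.table("RAY_DEMO_JAN21_IMAGE_DS").show()` would print a preview directly instead of returning rows.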
Add notebook to show unstructured data processing on container runtime