feat: Add more notebook samples for documentation #1043
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@ Coverage Diff @@
## master #1043 +/- ##
==========================================
+ Coverage 84.92% 84.93% +0.01%
==========================================
Files 203 203
Lines 9689 9689
Branches 558 558
==========================================
+ Hits 8228 8229 +1
+ Misses 1461 1460 -1
Continue to review full report at Codecov.
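The coverage delta above can be sanity-checked by hand: with 9,689 total lines, a single extra hit line moves coverage from 84.92% to 84.93%. A quick arithmetic check in plain Python, using only the numbers from the report above:

```python
# Numbers taken directly from the Codecov report above.
total_lines = 9689
hits_before, hits_after = 8228, 8229

coverage_before = round(hits_before / total_lines * 100, 2)
coverage_after = round(hits_after / total_lines * 100, 2)

print(coverage_before, coverage_after)  # 84.92 84.93
```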
Force-pushed from df83c7a to aa28f11.
Force-pushed from 225500e to 44856d1.
Great job! Mostly just little things left.
Two larger questions:
1. There are a lot of cache, count, and repartition calls in the VW code. Would you be able to try removing some of these to see if they are necessary? We want to avoid keeping many dataframes cached, but if they are needed to avoid re-fitting the model, that is OK.
2. I will also send over Jack Gerrits' example of Vowpal Wabbit contextual bandit code when it is available. (We don't have to block on this, though; it can be a separate PR.)
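To make the caching trade-off the reviewer raises concrete, here is an illustrative, self-contained sketch (plain Python, not the notebook's actual code) of why an uncached Spark-style lazy dataset re-runs its lineage on every action, while a cached one pays the cost once:

```python
class LazyDataset:
    """Toy stand-in for a Spark DataFrame: actions re-run the whole
    lineage unless the dataset has been cached."""

    def __init__(self, source_fn):
        self._source_fn = source_fn   # the "lineage" that computes the data
        self._cache = None
        self._cache_enabled = False
        self.recomputes = 0           # how many times the lineage ran

    def cache(self):
        # Like Spark's cache(): lazy; materializes on the next action.
        self._cache_enabled = True
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache
        self.recomputes += 1
        data = self._source_fn()
        if self._cache_enabled:
            self._cache = data
        return data


ds = LazyDataset(lambda: list(range(5)))
ds.collect()
ds.collect()
# Uncached: both actions re-ran the lineage.
print(ds.recomputes)  # 2

ds.cache()
ds.collect()  # materializes the cache (one more recompute)
ds.collect()  # served from the cache, no recompute
print(ds.recomputes)  # 3
```

The practical rule the comment suggests: a cache() guarding a single action can usually be removed for free, while one guarding repeated model fits over the same dataframe is doing real work and should stay.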
"- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/DetectLastAnomaly.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectLastAnomaly))\n",
"- Find anomalies: generates a model using an entire series and finds anomalies in the series ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/DetectAnomalies.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.DetectAnomalies))\n",
"\n",
"### Web Search\n",
Web Search -> Search
"\n",
"### Web Search\n",
"- [Bing Image search](https://azure.microsoft.com/en-us/services/cognitive-services/bing-image-search-api/) ([Scala](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/scala/com/microsoft/ml/spark/cognitive/BingImageSearch.html), [Python](https://mmlspark.blob.core.windows.net/docs/1.0.0-rc3/pyspark/mmlspark.cognitive.html#module-mmlspark.cognitive.BingImageSearch))\n",
"- [Azure Cognitive search](https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search)\n"
Could you add the corresponding Scala and Python docs links?
"metadata": {},
"outputs": [],
"source": [
"train_data.show(10)"
Can remove
"metadata": {},
"outputs": [],
"source": [
"train_data.groupBy(\"Bankrupt?\").count().show()"
show -> display
"outputs": [],
"source": [
"from mmlspark.lightgbm import LightGBMClassificationModel\n",
"model.saveNativeModel(\"/lgbmcmodel\")\n",
Perhaps we can call this lgbmclassifier.model and add a brief markdown description explaining that this allows you to extract the underlying LightGBM model for fast deployment after you train on Spark.
"dt1 = spark.read.format('libsvm') \\\n",
"    .load(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/lightGBMRanker_rank_test.libsvm\") \\\n",
"    .withColumn('iid', monotonically_increasing_id())\n",
"dt2 = spark.read.format('csv').option('inferSchema', True) \\\n",
likewise here
@@ -0,0 +1,659 @@
{ |
Nit: to keep with the style of the others, let's make the title "Vowpal Wabbit - Overview". Likewise for the other notebooks.
"metadata": {},
"outputs": [],
"source": [
"train_data.groupBy(\"target\").count().show()"
display
"data = spark.read.parquet(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/AdultCensusIncome.parquet\")\n",
"data = data.select([\"education\", \"marital-status\", \"hours-per-week\", \"income\"])\n",
"train, test = data.randomSplit([0.75, 0.25], seed=123)\n",
"display(train.limit(10))"
no need for limit
"# Making predictions\n",
"test = test.withColumn(\"label\", when(col(\"income\").contains(\"<\"), 0.0).otherwise(1.0))\n",
"prediction = vw_trained.transform(test)\n",
"display(prediction.limit(10))"
no need for limit
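The label expression in the cell quoted above maps any income string containing "<" (e.g. "<=50K") to 0.0 and everything else to 1.0. A plain-Python restatement of the same rule, for clarity (illustrative only; the notebook uses PySpark's when/col):

```python
def income_to_label(income: str) -> float:
    """Mirror of when(col("income").contains("<"), 0.0).otherwise(1.0)."""
    return 0.0 if "<" in income else 1.0

print(income_to_label("<=50K"))  # 0.0
print(income_to_label(">50K"))   # 1.0
```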
Force-pushed from 42e8b33 to c497742.
…park into serena/addDocumentation
No description provided.