diff --git a/docs/machine-learning/tutorials/sentiment-analysis.md b/docs/machine-learning/tutorials/sentiment-analysis.md index 4bd5f5d7a4fa4..02d21a6982139 100644 --- a/docs/machine-learning/tutorials/sentiment-analysis.md +++ b/docs/machine-learning/tutorials/sentiment-analysis.md @@ -5,9 +5,9 @@ ms.date: 05/07/2018 ms.custom: mvc #Customer intent: As a developer, I want to use ML.NET to apply a binary classification task so that I can understand how to use sentiment prediction to take appropriate action. --- -# Walkthrough: Use the ML.NET APIs in a sentiment analysis classification scenario +# Tutorial: Use the ML.NET APIs in a sentiment analysis classification scenario -This sample walkthrough illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017. +This sample tutorial illustrates using the ML.NET API to create a sentiment classifier via a .NET Core console application using C# in Visual Studio 2017. In this tutorial, you learn how to: > [!div class="checklist"] @@ -28,7 +28,7 @@ Sentiment analysis is either positive or negative. So, you can use classificatio ## Machine learning workflow -This walkthrough follows a machine learning workflow that enables the process to move in an orderly fashion. +This tutorial follows a machine learning workflow that enables the process to move in an orderly fashion. The workflow phases are as follows: @@ -43,7 +43,7 @@ The workflow phases are as follows: You first need to understand the problem, so you can break it down to parts that can support building and training the model. Breaking the problem down allows you to predict and evaluate the results. -The problem for this walkthrough is to understand incoming website comment sentiment to take the appropriate action. +The problem for this tutorial is to understand incoming website comment sentiment to take the appropriate action. 
You can break down the problem to the sentiment text and sentiment value for the data you want to train the model with, and a predicted sentiment value that you can evaluate and then use operationally. @@ -81,17 +81,7 @@ Predict the **sentiment** of a new website comment, either positive or negative. Add the following `using` statements to the top of the *Program.cs* file: -```csharp -using System; -using Microsoft.ML.Models; -using Microsoft.ML.Runtime; -using Microsoft.ML.Runtime.Api; -using Microsoft.ML.Trainers; -using Microsoft.ML.Transforms; -using System.Collections.Generic; -using System.Linq; -using Microsoft.ML; -``` +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#1 "Add necessary usings")] You need to create two global variables to hold the path to the recently downloaded files: @@ -100,10 +90,7 @@ You need to create two global variables to hold the path to the recently downloa Add the following code to the line right above the `Main` method: -```csharp -const string _dataPath = @"..\..\..\data\imdb_labelled.txt"; -const string _testDataPath = @"..\..\..\data\yelp_labelled.txt"; -``` +[!code-csharp[Declare file variables](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#2 "Declare variables to store data files")] You need to create some classes for your input data and predictions. Add a new class to your project: @@ -113,35 +100,17 @@ You need to create some classes for your input data and predictions. Add a new c The *SentimentData.cs* file opens in the code editor. 
Add the following `using` statements to the top of *SentimentData.cs*: -```csharp -using Microsoft.ML.Runtime.Api; -``` +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#1 "Add necessary usings")] Add the following code, which has two classes `SentimentData` and `SentimentPrediction`, to the *SentimentData.cs* file: -```csharp -public class SentimentData -{ - [Column(ordinal: "0")] - public string SentimentText; - [Column(ordinal: "1", name: "Label")] - public float Sentiment; -} - -public class SentimentPrediction -{ - [ColumnName("PredictedLabel")] - public bool Sentiment; -} -``` +[!code-csharp[DeclareTypes](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#2 "Declare data record types")] `SentimentData` is the input dataset class and has a string for the comment (`SentimentText`), a `float` (`Sentiment`) that has a value for sentiment of either positive or negative. Both fields have `Column` attributes attached to them. This attribute describes the order of each field in the data file, and which is the `Label` field. `SentimentPrediction` is the class used for prediction after the model has been trained. It has a single boolean (`Sentiment`) and a `PredictedLabel` `ColumnName` attribute. The `Label` is used to create and train the model, and it's also used with a second dataset to evaluate the model. The `PredictedLabel` is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used. 
In the *Program.cs* file, replace the `Console.WriteLine("Hello World!")` line with the following code in the `Main` method: -```csharp -var model = TrainAndPredict(); -``` +[!code-csharp[TrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#3 "Train and predict your model")] The `TrainAndPredict` method executes the following tasks: @@ -152,26 +121,17 @@ The `TrainAndPredict` method executes the following tasks: Create the `TrainAndPredict` method, just after the `Main` method, using the following code: -```csharp -public static PredictionModel TrainAndPredict() -{ - -} -``` +[!code-csharp[DeclareTrainAndPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#4 "Declare the TrainAndPredict model")] ## Ingest the data Initialize a new instance of `LearningPipeline` that will include the data loading, data processing/featurization, and model. Add the following code as the first line of the `TrainAndPredict` method: -```csharp -var pipeline = new LearningPipeline(); -``` +[!code-csharp[LearningPipeline](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#5 "Create a learning pipeline")] The `TextLoader` object is the first part of the pipeline, and loads the training file data. -```csharp -pipeline.Add(new TextLoader(_dataPath, useHeader: false, separator: "tab")); -``` +[!code-csharp[TextLoader](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#6 "Add a text loader to the pipeline")] ## Data preprocess and feature engineering @@ -179,9 +139,7 @@ Pre-processing and cleaning data are important tasks that occur before a dataset Apply a `TextFeaturizer` to convert the `SentimentText` column into a numeric vector called `Features` used by the machine learning algorithm. This is the preprocessing/featurization step. Using additional components available in ML.NET can enable better results with your model. 
Add `TextFeaturizer` to the pipeline as the next line of code: -```csharp -pipeline.Add(new TextFeaturizer("Features", "SentimentText")); -``` +[!code-csharp[TextFeaturizer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#7 "Add a TextFeaturizer to the pipeline")] ### About the classification model @@ -201,9 +159,7 @@ The object is a decision t Add the following code to the `TrainAndPredict` method: -```csharp -pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 }); -``` +[!code-csharp[BinaryClassifier](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#8 "Add a fast binary tree classifier")] ## Train the model @@ -211,122 +167,65 @@ You train the model, , based on the datas Add the following code to the `TrainAndPredict` method: -```csharp -PredictionModel model = pipeline.Train(); -``` +[!code-csharp[TrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#9 "Train the model")] ## Predict the model Add some comments to test the trained model's predictions in the `TrainAndPredict` method: -```csharp -IEnumerable sentiments = new[] -{ - new SentimentData - { - SentimentText = "Contoso's 11 is a wonderful experience", - Sentiment = 0 - }, - new SentimentData - { - SentimentText = "Really bad", - Sentiment = 0 - }, - new SentimentData - { - SentimentText = "Joe versus the Volcano Coffee Company is a great film.", - Sentiment = 0 - } -}; -``` +[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#10 "Create test data for predictions")] Now that you have a model, you can use that to predict the positive or negative sentiment of the comment data using the `Predict` method. To get a prediction, use `Predict` on new data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. 
You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions. -```csharp -IEnumerable predictions = model.Predict(sentiments); -``` +[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#11 "Create predictions of sentiments")] ### Model operationalization: prediction Display `SentimentText` and corresponding sentiment prediction in order to share the results and act on them accordingly. This is called operationalization, using the returned data as part of the operational policies. Create a header for the results using the following code: -```csharp -Console.WriteLine(); -Console.WriteLine("Sentiment Predictions"); -Console.WriteLine("---------------------"); -``` +[!code-csharp[OutputHeaders](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#12 "Display prediction outputs")] Before displaying the predicted results, combine the sentiment and prediction together to see the original comment with its predicted sentiment. The following code uses the method to make that happen, so add that code next: -```csharp -var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => new { sentiment, prediction }); -``` +[!code-csharp[BuildTuples](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#13 "Build the pairs of sentiment data and predictions")] Now that you've combined the `SentimentText` and `Sentiment` into a class, you can display the results using the method: -```csharp -foreach (var item in sentimentsAndPredictions) -{ - Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? 
"Positive" : "Negative")}"); -} -Console.WriteLine(); -``` +[!code-csharp[DisplayPredictions](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#14 "Display the predictions")] #### Return the model trained to use for evaluation Return the model at the end of the `TrainAndPredict` method. At this point, you could then save it to a zip file or continue to work with it. For this tutorial, you're going to work with it, so add the following code to the next line in `TrainAndPredict`: -```csharp -return model; -``` +[!code-csharp[ReturnModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#15 "Return the model")] ## Evaluate the model Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In the `Evaluate` method, the model created in `TrainAndPredict` is passed in to be evaluated. Create the `Evaluate` method, just after `TrainAndPredict`, as in the following code: -```csharp -public static void Evaluate(PredictionModel model) -{ - -} -``` +[!code-csharp[Evaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#16 "Evaluate your model")] Add a call to the new method from the `Main` method, right under the `TrainAndPredict` method call, using the following code: -```csharp -Evaluate(model); -``` +[!code-csharp[CallEvaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#17 "Call the Evaluate method")] The class loads the new test dataset with the same schema. You can evaluate the model using this dataset as a quality check. 
Add that next, inside the `Evaluate` method, using the following code: -```csharp -var testData = new TextLoader(_testDataPath, useHeader: false, separator: "tab"); -``` +[!code-csharp[LoadText](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#18 "Load the test dataset")] The `BinaryClassificationEvaluator` object computes the quality metrics for the `PredictionModel` using the specified dataset. To see those metrics, add the evaluator as the next line in the `Evaluate` method, with the following code: -```csharp -var evaluator = new BinaryClassificationEvaluator(); -``` +[!code-csharp[BinaryEvaluator](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#19 "Create the binary evaluator")] The `BinaryClassificationMetrics` class contains the overall metrics computed by binary classification evaluators. To display them and judge the quality of the model, you first need to get the metrics. Add the following code: -```csharp -BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData); -``` +[!code-csharp[CreateMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#20 "Evaluate the model and create metrics")] ### Displaying the metrics for model validation Use the following code to display the metrics, share the results, and act on them accordingly: -```csharp -Console.WriteLine(); -Console.WriteLine("PredictionModel quality metrics evaluation"); -Console.WriteLine("------------------------------------------"); -Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}"); -Console.WriteLine($"Auc: {metrics.Auc:P2}"); -Console.WriteLine($"F1Score: {metrics.F1Score:P2}"); -``` +[!code-csharp[DisplayMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#21 "Display selected metrics")] ## Results @@ -336,7 +235,7 @@ Your results should be similar to the following. 
As the pipeline processes, it d Sentiment Predictions --------------------- Sentiment: Contoso's 11 is a wonderful experience | Prediction: Positive -Sentiment: Really bad | Prediction: Negative +Sentiment:The acting in this movie is really bad | Prediction: Negative Sentiment: Joe versus the Volcano Coffee Company is a great film. | Prediction: Positive diff --git a/docs/machine-learning/tutorials/taxi-fare.md b/docs/machine-learning/tutorials/taxi-fare.md index 1692e7834d243..d75f286147b1a 100644 --- a/docs/machine-learning/tutorials/taxi-fare.md +++ b/docs/machine-learning/tutorials/taxi-fare.md @@ -75,25 +75,11 @@ The **label** is the identifier of the column you are trying to predict. The ide Add the following `using` statements to the top of Program.cs: -```csharp -using System; -using Microsoft.ML.Models; -using Microsoft.ML.Runtime; -using Microsoft.ML.Runtime.Api; -using Microsoft.ML.Trainers; -using Microsoft.ML.Transforms; -using System.Collections.Generic; -using System.Linq; -using Microsoft.ML; -``` +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#1 "Add necessary usings")] You define variables to hold your datapath (the dataset that trains your model), your testdatapath (the dataset that evaluates your model), and your modelpath (where you store the trained model). Add the following code to the line right above `Main` to specify the recently downloaded files: -```csharp -const string DataPath = @".\Data\train.csv"; -const string TestDataPath = @".\Data\test.csv"; -const string ModelPath = @".\Models\Model.zip"; -``` +[!code-csharp[InitializePaths](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#2 "Define variables to store the data file paths")] Next, create classes for the input data and the predictions: @@ -101,41 +87,15 @@ Next, create classes for the input data and the predictions: 1. 
In the **Add New Item** dialog box, change the **Name** to `TaxiTrip.cs`, and then click **Add**. 1. Add the following `using` statements: -```csharp -using Microsoft.ML.Runtime.Api; -``` +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/TaxiFarePrediction/TaxiTrip.cs#1 "Add necessary usings")] Add two classes into this file. `TaxiTrip`, the input data set class, has definitions for each of the columns discovered above and a `Label` attribute for the fare_amount column that you are predicting. Add the following code to the file: -```csharp -public class TaxiTrip -{ - [Column(ordinal: "0")] - public string vendor_id; - [Column(ordinal: "1")] - public string rate_code; - [Column(ordinal: "2")] - public float passenger_count; - [Column(ordinal: "3")] - public float trip_time_in_secs; - [Column(ordinal: "4")] - public float trip_distance; - [Column(ordinal: "5")] - public string payment_type; - [Column(ordinal: "6", "Label")] - public float fare_amount; -} -``` +[!code-csharp[DefineTaxiTrip](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#2 "Define the taxi trip class")] The `TaxiTripFarePrediction` class is used for prediction after the model has been trained. It has a single float (fare_amount) and a `Score` `ColumnName` attribute. Add the following code into the file below the `TaxiTrip` class: -```csharp -public class TaxiTripFarePrediction -{ - [ColumnName("Score")] - public float fare_amount; -} -``` +[!code-csharp[DefineFarePrediction](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#3 "Define the fare predictions class")] Now go back to the **Program.cs** file. 
In `Main`, replace the `Console.WriteLine("Hello World!")` with the following code: @@ -186,11 +146,11 @@ The last step in data preparation combines all of your **features** into one vec ```csharp pipeline.Add(new ColumnConcatenator("Features", - "vendor_id", - "rate_code", - "passenger_count", - "trip_distance", - "payment_type")); + "vendor_id", + "rate_code", + "passenger_count", + "trip_distance", + "payment_type")); ``` Notice that the "trip_time_in_secs" column isn't included. You already determined that it isn't a useful prediction feature. @@ -210,13 +170,15 @@ Add the following code into the `Train()` method following the data processing c pipeline.Add(new FastTreeRegressor()); ``` +You added all the preceding steps to the pipeline as individual statements, but C# has a handy collection initialization syntax that makes it simpler to create and initialize the pipeline: + +[!code-csharp[CreatePipeline](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#3 "Create and initialize the learning pipeline")] + ## Train the model The final step is to train the model. Until this point, nothing in the pipeline has been executed. The `pipeline.Train()` function takes in the pre-defined `TaxiTrip` class type and outputs a `TaxiTripFarePrediction` type. Add this final piece of code into the `Train()` function: -```csharp -PredictionModel model = pipeline.Train(); -``` +[!code-csharp[TrainMOdel](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#4 "Train your model")] And that's it! You have successfully trained a machine learning model that can predict taxi fares in NYC. Now take a look to understand how accurate your model is and learn how to consume it. @@ -224,36 +186,23 @@ And that's it! 
You have successfully trained a machine learning model that can p Before you go on to the next step, save your model to a .zip file by adding the following code at the end of your `Train()` function: -```csharp -await model.WriteAsync(ModelPath); -``` +[!code-csharp[SaveModel](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#5 "Save the model asynchronously and return the model")] Adding the `await` statement to the `model.WriteAsync()` call means that the `Train()` method must be changed to an async method that returns a `Task`. Modify the signature of `Train` as shown in the following code: -```csharp -public static Task<PredictionModel<TaxiTrip, TaxiTripFarePrediction>> Train() -{ - -} -``` +[!code-csharp[AsyncTraining](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#6 "Make the Train method async and return a task.")] Changing the return type of the `Train` method means you have to add an `await` to the code that calls `Train` in the `Main` method, as shown in the following code: -```csharp -PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = await Train(); -``` +[!code-csharp[AwaitTraining](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#7 "Await the Train method")] Adding an `await` in your `Main` method means the `Main` method must have the `async` modifier and return a `Task`: -```csharp -public static async Task Main() -``` +[!code-csharp[AsyncMain](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#8 "Make the Main method async and return a task.")] You'll also need to add the following `using` statement at the top of the file: -```csharp -using System.Threading.Tasks; -``` +[!code-csharp[UsingTasks](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#9 "Add System.Threading.Tasks to your usings.")] ## Evaluate the model @@ -261,44 +210,27 @@ Evaluation is the process of checking how well the model works. 
It is important Now go back to your `Main` function and add the following code beneath the call to the `Train()`method: -```csharp -Evaluate(model); -``` +[!code-csharp[Evaluate](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#10 "Evaluate the model.")] The `Evaluate()` function evaluates your model. Create that function below `Train()`. Add the following code: -```csharp -public static void Evaluate(PredictionModel model) -{ - -} -``` +[!code-csharp[EvaluateMethod](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#11 "Define the Evaluate method.")] Load the test data using the `TextLoader()` function. Add the following code into the `Evaluate()` method: -```csharp -var testData = new TextLoader(TestDataPath, useHeader: true, separator: ","); -``` +[!code-csharp[LoadTestData](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#12 "Load the test data.")] Add the following code to evaluate the model and produce the metrics for it: -```csharp -var evaluator = new RegressionEvaluator(); -RegressionMetrics metrics = evaluator.Evaluate(model, testData); -``` +[!code-csharp[EvaluateAndMeasure](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#13 "Evaluate the model and its predictions.")] RMS is one metric for evaluating regression problems. The lower it is, the better your model. Add the following code into the `Evaluate()` function to print the RMS for your model. -```csharp -// Rms should be around 2.795276 -Console.WriteLine("Rms=" + metrics.Rms); -``` +[!code-csharp[DisplayRMS](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#14 "Display the RMS metric.")] RSquared is another metric for evaluating regression problems. RSquared will be a value between 0 and 1. The closer you are to 1, the better your model. Add the following code into the `Evaluate()` function to print the RSquared value for your model. 
-```csharp -Console.WriteLine("RSquared = " + metrics.RSquared); -``` +[!code-csharp[DisplayRSquared](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#15 "Display the RSquared metric.")] ## Use the model for predictions @@ -313,27 +245,13 @@ static class TestTrips This tutorial uses one test trip within this class. Later you can add other scenarios to experiment with this sample. Add the following code into the `TestTrips` class: -```csharp -internal static readonly TaxiTrip Trip1 = new TaxiTrip -{ - vendor_id = "VTS", - rate_code = "1", - passenger_count = 1, - trip_distance = 10.33f, - payment_type = "CSH", - fare_amount = 0 // predict it. actual = 29.5 -}; -``` +[!code-csharp[TestData](../../../samples/machine-learning/tutorials/TaxiFarePrediction/TestTrips.cs#1 "Create a trip to predict its cost.")] This trip's actual fare is 29.5, but use 0 as a placeholder. The machine learning algorithm will predict the fare. Add the following code in your `Main` function. It tests out your model using the `TestTrips` data: -```csharp -var prediction = model.Predict(TestTrips.Trip1); - -Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.fare_amount); -``` +[!code-csharp[Predict](../../../samples/machine-learning/tutorials/TaxiFarePrediction/Program.cs#16 "Try a prediction.")] Run the program to see the predicted taxi fare for your test case.
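
The taxi fare tutorial describes RMS (lower is better) and RSquared (closer to 1 is better) without showing how such numbers arise. The following is a minimal sketch, using the textbook definitions of both metrics and plain C# with no ML.NET dependency. The class name `RegressionMetricsSketch` and the fare values are hypothetical (only the 29.5 fare appears in the tutorial), and the formulas are assumptions about what the metrics mean, not a statement of how `RegressionEvaluator` computes them internally.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class RegressionMetricsSketch
{
    // Root mean squared error: square root of the average squared residual.
    public static double Rms(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        double sumSquaredError = actual
            .Zip(predicted, (a, p) => (a - p) * (a - p))
            .Sum();
        return Math.Sqrt(sumSquaredError / actual.Length);
    }

    // Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    // If every actual value is identical, the total sum of squares is zero and the
    // result is NaN or infinite.
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        double mean = actual.Average();
        double residualSumOfSquares = actual
            .Zip(predicted, (a, p) => (a - p) * (a - p))
            .Sum();
        double totalSumOfSquares = actual.Sum(a => (a - mean) * (a - mean));
        return 1.0 - residualSumOfSquares / totalSumOfSquares;
    }

    public static void Main()
    {
        // Made-up fares for illustration; only 29.5 comes from the tutorial's test trip.
        double[] actualFares = { 29.5, 10.0, 7.5, 15.0 };
        double[] predictedFares = { 31.0, 9.0, 8.0, 14.5 };

        Console.WriteLine("Rms = " + Rms(actualFares, predictedFares));
        Console.WriteLine("RSquared = " + RSquared(actualFares, predictedFares));
    }
}
```

With this definition, predictions that are worse than always guessing the mean fare produce a negative RSquared, so a value well below 1 is a signal to revisit the features or the learner.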