Use Automated ML in Azure ML

Azure Machine Learning compute

  • At its core, Azure ML is a service for training and managing ML models, for which you need compute on which to run the training process.
  • Compute targets are cloud-based resources on which you can run model training and data exploration processes.
  • In Azure Machine Learning studio, you can manage the compute targets for your data science activities. There are four types of compute resource you can create:
    1. Compute Instances: Development workstations that data scientists can use to work with data and models.
    2. Compute Clusters: Scalable clusters of virtual machines for on-demand processing of experiment code.
    3. Inference Clusters: Deployment targets for predictive services that use your trained models.
    4. Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.
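
For illustration, here is a minimal sketch of listing a workspace's compute targets with the azure-ai-ml (v2) Python SDK; the subscription, resource group, and workspace names are placeholders you would replace with your own:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholders - substitute your own subscription, resource group, and workspace names.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# List the compute targets (instances, clusters, attached compute) defined in the workspace.
for compute in ml_client.compute.list():
    print(compute.name, compute.type)
```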

What is Azure Automated Machine Learning?

  • Azure ML includes an automated ML capability that automatically tries multiple pre-processing techniques and model-training algorithms in parallel. These automated capabilities use the power of cloud compute to find the best performing supervised ML model for your data.

  • You can create an automated machine learning job in Azure Machine Learning studio.

  • In Azure ML, operations that you run are called jobs. You can configure multiple settings for your job before starting an automated ML run. The run configuration provides the information needed to run a training job: the training script, compute target, and Azure ML environment.
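
As an illustration (not part of the module), a minimal sketch of a non-automated training job defined with the azure-ai-ml (v2) SDK, showing how the script, environment, and compute target come together in a run configuration; the code folder, environment, and compute names are hypothetical placeholders:

```python
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# The run configuration bundles the training script, environment, and compute target.
job = command(
    code="./src",                                    # folder containing train.py (hypothetical)
    command="python train.py",
    environment="<curated-or-custom-environment>@latest",
    compute="<compute-cluster-name>",
    experiment_name="example-training-job",
)

returned_job = ml_client.jobs.create_or_update(job)  # submit the job
print(returned_job.name)
```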

Understand the AutoML process

  • The steps in a machine learning process are:

    1. Prepare data: Identify the features and label in a dataset. Pre-process, or clean and transform, the data as needed.
    2. Train model: Split the data into two groups, a training and a validation set (see the sketch after this list).
      1. Train a machine learning model using the training data set.
      2. Test the machine learning model for performance using the validation data set.
    3. Evaluate performance: Compare how close the model's predictions are to the known labels.
    4. Deploy a predictive service: After you train a ML model, deploy the model as an application on a server so that others can use it.
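
A minimal sketch of steps 2 and 3 with scikit-learn, using the bike-rentals data referenced later in this module; the feature column names are taken from the test payload at the end of the exercise and are assumed to match the dataset:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Step 1: load the prepared data (features plus the "rentals" label).
data = pd.read_csv("https://aka.ms/bike-rentals")
features = ["day", "mnth", "year", "season", "holiday", "weekday",
            "workingday", "weathersit", "temp", "atemp", "hum", "windspeed"]
X, y = data[features], data["rentals"]

# Step 2: split into a training set and a validation set, then train on the training set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Step 3: evaluate by comparing predictions on the validation set with the known labels.
predictions = model.predict(X_val)
print(pd.DataFrame({"actual": y_val, "predicted": predictions}).head())
```
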
  • These are the same steps in the automated ML process with Azure ML:

    1. Prepare data - ML models must be trained with existing data. In Azure ML, data for model training and other operations is usually encapsulated in an object called a data asset. You can create your own data asset in Azure Machine Learning studio.
    2. Train model - The automated ML capability in Azure ML supports supervised ML models, that is, models for which the training data includes known label values. You can use automated ML to train models for:
      1. Classification (predicting categories or classes)
      2. Regression (predicting numeric values)
      3. Time series forecasting (predicting numeric values at a future point in time)
  • In Automated Machine Learning, these are the task types you can select from when you configure a job.

  • In Automated ML, select configurations for the primary metric, type of model used for training, exit criteria, and concurrency limits.

  • AutoML will split data into a training set and a validation set. You can configure the details in the settings before you run the job.

  • Evaluate performance - After the job has finished, you can review the best performing model. Note that the "best" model the job generated might not be the best possible model, just the best one found within the time allowed for this exercise.

    • The best model is identified based on the evaluation metric you specified, Normalized root mean squared error.

    • A technique called cross-validation is used to calculate the evaluation metric. After the model is trained using a portion of the data, the remaining portion is used to iteratively test, or cross-validate, the trained model. The metric is calculated by comparing the predicted value from the test with the actual known value, or label.

    • The difference between a predicted and actual value, known as the residual, indicates the amount of error in the model.

    • The performance metric root mean squared error (RMSE) is calculated by squaring the errors across all of the test cases, finding the mean of these squares, and then taking the square root. The smaller this value is, the more accurate the model's predictions are (a small computation sketch follows the chart descriptions below).

    • The normalized root mean squared error (NRMSE) standardizes the RMSE metric so it can be used for comparison between models which have variables on different scales.

    • The Residual Histogram shows the frequency of residual value ranges. Residuals represent variance between predicted and true values that can't be explained by the model, in other words, errors. You should hope to see the most frequently occurring residual values clustered around zero. You want small errors with fewer errors at the extreme ends of the scale.

    • The Predicted vs. True chart should show a diagonal trend in which the predicted value correlates closely to the true value. The dotted line shows how a perfect model should perform. The closer the line of your model's average predicted value is to the dotted line, the better its performance. A histogram below the line chart shows the distribution of true values.
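
A small sketch of how these metrics and charts can be computed; y_true and y_pred are hypothetical stand-ins for the validation labels and the model's predictions, and dividing RMSE by the label range is one common normalization convention:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values standing in for validation labels and model predictions.
y_true = np.array([320.0, 410.0, 150.0, 980.0, 760.0])
y_pred = np.array([300.0, 455.0, 180.0, 1010.0, 700.0])

residuals = y_true - y_pred                   # the errors the model can't explain
rmse = np.sqrt(np.mean(residuals ** 2))       # root mean squared error
nrmse = rmse / (y_true.max() - y_true.min())  # normalized by the label range for cross-model comparison
print(f"RMSE: {rmse:.2f}  NRMSE: {nrmse:.4f}")

# Residual histogram: most residuals should cluster around zero.
plt.hist(residuals, bins=10)
plt.xlabel("Residual (true - predicted)")
plt.show()

# Predicted vs. true: a perfect model's points would fall on the dashed diagonal.
plt.scatter(y_true, y_pred)
plt.plot([y_true.min(), y_true.max()], [y_true.min(), y_true.max()], linestyle="--")
plt.xlabel("True value")
plt.ylabel("Predicted value")
plt.show()
```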

    • After you've used automated ML to train some models, you can deploy the best performing model as a service for client applications to use.

  • Deploy a predictive service - In Azure ML, you can deploy a service to an Azure Container Instance (ACI) or to an Azure Kubernetes Service (AKS) cluster. For production scenarios, an AKS deployment is recommended, for which you must create an inference cluster compute target.
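
For reference, a minimal sketch of an ACI deployment using the older azureml-core (v1) SDK, assuming a model has already been registered and that a scoring script and environment exist; all names are placeholders:

```python
from azureml.core import Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # assumes a config.json downloaded from the workspace

model = ws.models["<registered-model-name>"]   # placeholder model name
env = ws.environments["<environment-name>"]    # placeholder environment name
inference_config = InferenceConfig(entry_script="score.py", environment=env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True)

service = Model.deploy(ws, "predict-rentals", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```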

Exercise - Explore Automated ML in Azure ML

  • Use a dataset of historical bicycle rental details to train a model that predicts the number of bicycle rentals that should be expected on a given day, based on seasonal and meteorological features.

https://microsoftlearning.github.io/AI-900-AIFundamentals/instructions/02-module-02.html

  1. Create an Azure ML workspace: In the Azure portal, create a new "Azure Machine Learning" resource with an Azure Machine Learning plan. Use the following settings:
    1. Subscription, Resource group, Workspace name, Region
    2. Storage account, Key vault, Application insights: Note that new defaults will be created for your workspace.
    3. Container registry: None (one will be created automatically the first time you deploy a model to a container).
    4. Create. Go to the deployed resource. Select Launch studio (navigate to https://ml.azure.com), you should see your newly created workspace.
      1. If that is not the case, select Azure directory -> Workspaces -> select the one you created.
  2. Create compute: In Azure ML studio, select the ≡ icon -> Compute (under Manage) -> Compute clusters -> add a new compute cluster. Use this to train a machine learning model (an equivalent SDK call is sketched after this step):

    1. Select Location | VM tier: Dedicated | VM type: CPU | VM size: Choose from all options, select Standard_DS11_v2 | Next
    2. Compute name: enter a unique name | Minimum number of nodes: 0 | Maximum number of nodes: 2 | Idle seconds before scale down: 120 | Enable SSH access: Clear | Create
    3. Note: Compute instances and clusters are based on standard Azure VM images. For this module, the Standard_DS11_v2 image is recommended to achieve the optimal balance of cost and performance.
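
If you prefer code, a minimal sketch of creating the same cluster with the azure-ai-ml (v2) SDK; the workspace identifiers and cluster name are placeholders:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Mirrors the settings above: dedicated Standard_DS11_v2 nodes, scaling between 0 and 2, 120 s idle scale-down.
cluster = AmlCompute(
    name="<compute-cluster-name>",
    size="STANDARD_DS11_V2",
    tier="Dedicated",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,
)
ml_client.compute.begin_create_or_update(cluster).result()
```
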
  3. Create a data asset - The source is comma-separated bike rental data derived from the Capital Bikeshare system (https://capitalbikeshare.com/system-data). A quick pandas preview of the data is sketched after this step.

    1. In Azure ML studio, View the Data page (under Assets). The Data page contains specific data files or tables that you plan to work with in Azure ML.
    2. Create the data asset: on the Data page, select Create, then configure the data asset with the following settings:
      1. Data type => Name: bike-rentals | Description: Bicycle rental data | Dataset type: Tabular
      2. Data source => From Web Files
      3. Web URL => Web URL: https://aka.ms/bike-rentals | Skip data validation: do not select
      4. Settings => File format: Delimited | Delimiter: Comma | Encoding: UTF-8 | Column headers: Only first file has headers | Skip rows: None | Dataset contains multi-line data: do not select
      5. Schema => Include all columns other than Path | Review the automatically detected types
      6. Review => Select Create
    3. After the dataset has been created, open it and view the Explore page to see a sample of the data. This data contains historical features and labels for bike rentals.
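
A quick way to preview the same data outside the studio is to read the source file with pandas (this only previews the CSV; it does not create the data asset):

```python
import pandas as pd

# The same comma-separated file the bike-rentals data asset is created from.
bike_data = pd.read_csv("https://aka.ms/bike-rentals")

print(bike_data.head())      # historical features plus the "rentals" label column
print(bike_data.describe())  # quick summary statistics
```
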
  4. Run an automated machine learning job - to train a regression model that predicts bicycle rentals. (An equivalent SDK job definition is sketched after this step.)

    1. In Azure ML studio, select Automated ML (under Author) -> New Automated ML Job:
      1. Select data asset => Dataset: bike-rentals
      2. Configure job => New experiment name: mslearn-bike-rental | Target column: rentals (this is the label that the model is trained to predict) | Select Azure ML compute cluster: the compute cluster that you created previously.
      3. Select task and settings => Task type: Regression (the model predicts a numeric value)
        1. Additional configuration settings:
          1. Primary metric: Select Normalized root mean squared error
          2. Explain best model: Selected — this option causes automated ML to calculate feature importance for the best model which makes it possible to determine the influence of each feature on the predicted label.
          3. Use all supported models: Unselected. You’ll restrict the job to try only a few specific algorithms.
          4. Allowed models: Select only RandomForest and LightGBM — normally you’d want to try as many as possible, but each model added increases the time it takes to run the job.
          5. Exit criterion:
            1. Training job time (hours): 0.5 — ends the job after a maximum of 30 minutes.
            2. Metric score threshold: 0.085 — if a model achieves a normalized root mean squared error metric score of 0.085 or less, the job ends.
          6. Concurrency: do not change
        2. Featurization settings => Enable featurization: Selected — automatically preprocess the features before training.
      4. Click Next -> Validate and test type => Validation type: Auto | Test data asset (preview): No test data asset required
      5. When you finish submitting the automated ML job details, it starts automatically. Wait for the status to change from Preparing to Running.
      6. When the status changes to Running, view the Models tab and observe as each possible combination of training algorithm and pre-processing steps is tried and the performance of the resulting model is evaluated.
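
An equivalent job can also be defined with the azure-ai-ml (v2) SDK. Below is a minimal sketch, assuming the bike-rentals data asset is registered as an MLTable; the workspace identifiers, compute cluster name, and data asset version are placeholders, and the metric score threshold exit criterion from the studio settings is not shown:

```python
from azure.ai.ml import Input, MLClient, automl
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Regression task mirroring the studio settings above.
regression_job = automl.regression(
    compute="<compute-cluster-name>",
    experiment_name="mslearn-bike-rental",
    training_data=Input(type="mltable", path="azureml:bike-rentals:<version>"),
    target_column_name="rentals",
    primary_metric="normalized_root_mean_squared_error",
    enable_model_explainability=True,  # "Explain best model"
)
regression_job.set_limits(timeout_minutes=30)  # training job time: 0.5 hours
regression_job.set_training(allowed_training_algorithms=["RandomForest", "LightGBM"])

returned_job = ml_client.jobs.create_or_update(regression_job)
print(returned_job.name)
```
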
  5. Review the best model - On the Overview tab of the automated ML job, note the best model summary.

    1. You may see a message under the status: “Warning: User specified exit score reached…”. This is expected; continue to the next step.
    2. Select the text under Algorithm name for the best model to view its details.
      1. Select the Metrics tab and select the residuals and predicted_true charts if they are not already selected.
        1. Review the charts, which show the performance of the model. The first chart shows the residuals (the differences between predicted and actual values) as a histogram; the second chart compares the predicted values against the true values.
      2. Select the Explanations tab. Select an Explanation ID and then select Aggregate feature importance. This chart shows how much each feature in the dataset influences the label prediction:
    3. Next to the Normalized root mean squared error value, select View all other metrics to see values of other possible evaluation metrics for a regression model.
  6. Deploy a predictive service

    1. In Azure ML studio, Select Automated ML (under Authoring) -> Display name -> Algorithm Name (Overview tab under Best model summary) -> Model tab -> Deploy button -> Web service option:
      1. Name: predict-rentals | Description: Predict cycle rentals | Compute type: Azure Container Instance | Enable authentication: Selected
      2. Wait for the deployment to start - this may take a few seconds.
    2. Then, in the Model summary section, observe the Deploy status for the predict-rentals service, which should be Running. Wait for this status to change to Succeeded, which may take some time. You may need to select Refresh periodically.
  7. Test the deployed service - In Azure ML studio, select Endpoints (under Assets). Now you can test your deployed service (a code-based test is sketched after this step).

    1. On the Endpoints page, open the predict-rentals real-time endpoint and view the Test tab.
    2. In the Input data to test real-time endpoint pane, replace the template JSON with the following input data, then click the Test button:
       {
         "Inputs": {
           "data": [
             {
               "day": 1, "mnth": 1, "year": 2022, "season": 2,
               "holiday": 0, "weekday": 1, "workingday": 1, "weathersit": 2,
               "temp": 0.3, "atemp": 0.3, "hum": 0.3, "windspeed": 0.3
             }
           ]
         },
         "GlobalParameters": 1.0
       }
    3. Review the test results, which include a predicted number of rentals based on the input features. The test pane took the input data and used the model you trained to return the predicted number of rentals.
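
The same test can be run from code by posting the JSON payload to the endpoint's REST URI (shown on the endpoint's Consume tab). A minimal sketch using the requests library; the scoring URI and key are placeholders:

```python
import requests

# Placeholders: copy the REST endpoint URL and primary key from the endpoint's Consume tab.
scoring_uri = "<your-scoring-uri>"
key = "<your-primary-key>"

payload = {
    "Inputs": {
        "data": [{
            "day": 1, "mnth": 1, "year": 2022, "season": 2,
            "holiday": 0, "weekday": 1, "workingday": 1, "weathersit": 2,
            "temp": 0.3, "atemp": 0.3, "hum": 0.3, "windspeed": 0.3,
        }]
    },
    "GlobalParameters": 1.0,
}

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
response = requests.post(scoring_uri, json=payload, headers=headers)
print(response.json())  # the predicted number of rentals for the input features
```
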
  8. Review what you have done. You used a dataset of historical bicycle rental data to train a model. The model predicts the number of bicycle rentals expected on a given day, based on seasonal and meteorological features. In this case, the label is the number of bicycle rentals.

  9. The service is now ready to be connected to a client application using the credentials in the Consume tab.

  10. Clean-up

    1. The web service you created is hosted in an Azure Container Instance. If you don't intend to use it further, delete the endpoint to avoid accruing unnecessary Azure usage.
    2. Stop the compute instance until you need it again.
      1. In Azure ML studio, select Endpoints -> check predict-rentals -> Delete.
      2. Select Compute -> Compute Instances tab -> select your compute instance and then select Stop.
    3. Note: Stopping your compute ensures your subscription won’t be charged for compute resources. You will however be charged a small amount for data storage as long as the Azure ML workspace exists in your subscription.
  11. To delete your workspace - In the Azure portal, on the Resource groups page, open the resource group that contains your Azure Machine Learning workspace and select Delete resource group.

Explore Azure OpenAI

  • Explore Azure OpenAI Service and use it to deploy and experiment with generative AI models.
  • You need access to the Azure OpenAI service for both text and code models, and DALL-E image generation models.
  • Sign into the Azure portal -> Create an Azure OpenAI resource with the following settings:

    Subscription: An Azure subscription that has been approved for access to the Azure OpenAI service | Resource group: Create a new resource group with a name of your choice | Region: Choose any available region | Name: A unique name of your choice | Pricing tier: Standard S0
  • Wait for deployment to complete. Then go to the deployed Azure OpenAI resource in the Azure portal.

Explore Azure AI Studio

You can deploy, manage, and explore models in your Azure OpenAI Service by using Azure AI Studio.

On the Overview page for your Azure OpenAI resource, use the Explore button to open Azure AI Studio in a new browser tab. Alternatively, navigate to Azure AI Studio directly.

When you first open Azure AI Studio, it should look similar to this:

Screenshot of Azure AI Studio.

View the pages available in the pane on the left. You can always return to the home page at the top. Additionally, AI Studio provides multiple pages where you can:

  • Experiment with models in a playground.
  • Manage model deployments and data.

Deploy a model for language generation

To experiment with natural language generation, you must first deploy a model.

On the Models page view the available models in your Azure OpenAI service instance. Select any of the gpt-35-turbo models for which the Deployable status is Yes, and then select Deploy:

Screenshot of the Models page in Azure AI Studio.

Create a new deployment with the following settings: Model: gpt-35-turbo | Model version: Auto-update to default | Deployment name: A unique name for your model deployment

Note: You can only deploy one instance of each model. If you have already deployed a gpt-35-turbo model, you can use it to complete the rest of this exercise.

Use the Chat playground to work with the model

Now that you have deployed a model, you can use it in the Chat playground to generate natural language output from prompts that you submit in a chat interface.

In Azure AI Studio, navigate to the Chat playground in the left pane.

The Chat playground provides a chatbot interface with which you can interact with your deployed model, as shown here:

Screenshot of the Chat playground in Azure AI Studio.

In the Configuration pane, ensure that your model deployment is selected. In the Assistant setup pane, select the Default system message template, and view the system message this template creates. The system message defines how the model will behave in your chat session. In the Chat session section, enter the following user message:

    What is generative AI?

Observe the output returned by the model, which should provide a definition of generative AI. Enter the following user message as a follow-up question:

    What are three benefits it provides?

Review the output, noting that the chat session has kept track of the previous input and response to provide context (so it correctly interprets “it” as referring to “generative AI”) and that it provides a suitable response based on what was requested (it should return three benefits of generative AI). The same two-turn conversation can also be run against your deployment from code, as sketched below.
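
As an illustration, a minimal sketch of reproducing this two-turn conversation from code using the pre-1.0 openai Python package; the endpoint, key, API version, and deployment name are placeholders:

```python
import openai

# Placeholders: copy the endpoint and key from your Azure OpenAI resource's Keys and Endpoint page.
openai.api_type = "azure"
openai.api_base = "https://<your-resource-name>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-key>"

deployment = "<your-deployment-name>"  # the gpt-35-turbo deployment created earlier

messages = [
    {"role": "system", "content": "You are an AI assistant that helps people find information."},
    {"role": "user", "content": "What is generative AI?"},
]
first = openai.ChatCompletion.create(engine=deployment, messages=messages)
answer = first.choices[0].message["content"]
print(answer)

# Append the first answer so the follow-up question keeps its context, as the playground does.
messages += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "What are three benefits it provides?"},
]
second = openai.ChatCompletion.create(engine=deployment, messages=messages)
print(second.choices[0].message["content"])
```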

Use the DALL-E playground to generate images

In addition to language generation models, Azure OpenAI Service supports the DALL-E 2 model for image generation.

Note: You must have applied for and received access to DALL-E functionality in your Azure OpenAI service access application to complete this section of the exercise.

In Azure AI Studio, navigate to the DALL-E playground in the left pane. Enter the following prompt:

    A robot eating spaghetti

Select Generate and view the results, which should consist of an image based on the description you provided in the prompt, similar to this:

Screenshot of the DALL-E playground in Azure AI Studio.

Generate a second image by modifying the prompt to:

    A robot eating spaghetti in the style of Rembrandt

Verify that the new image matches the requirements of the prompt, similar to this:

Screenshot of DALL-E generated images in Azure AI Studio.

Clean up

When you’re done with your Azure OpenAI resource, remember to delete the deployment or the entire resource in the Azure portal.

Quiz

  1. An automobile dealership wants to use historic car sales data to train a machine learning model. The model should predict the price of a pre-owned car based on its make, model, engine size, and mileage. What kind of machine learning model should the dealership use automated machine learning to create?
    • Classification
    • Regression : Correct. To predict a numeric value, use a regression model.
    • Time series forecasting
  2. A bank wants to use historic loan repayment records to categorize loan applications as low-risk or high-risk based on characteristics like the loan amount, the income of the borrower, and the loan period. What kind of machine learning model should the bank use automated machine learning to create?
    • Classification: Correct. To predict a category, or class, use a classification model.
    • Regression
    • Time series forecasting
  3. You want to use automated machine learning to train a regression model with the best possible R2 score. How should you configure the automated machine learning experiment?
    • Set the Primary metric to R2 score : Correct. The primary metric determines the metric used to evaluate the best performing model.
    • Block all algorithms other than GradientBoosting
    • Enable featurization