# amp-catalog-cloudera-default-new.yaml

name: Cloudera

entries:
  - title: RAG Studio
    label: rag-studio
    short_description: |
      Build chatbots powered by your data: a no-code RAG studio for a quick start, easy access, and fast innovation.
    long_description: |
      RAG Studio is a no-code application built on the Cloudera platform that enables you to create RAG chatbots powered by your enterprise data in minutes. Designed for accessibility, it bridges the gap between business and IT teams, driving collaboration in AI projects.
      ---------------------------
      IMPORTANT: Please read the following before proceeding.  This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms. If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
      RAG Studio is a no-code application built on the Cloudera platform that enables you to create RAG chatbots powered by your enterprise data in minutes. Designed for accessibility, it bridges the gap between business and IT teams, driving collaboration in AI projects.
      <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
      <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
      <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: >-
      https://raw.githubusercontent.com/cloudera/CML_AMP_RAG_Studio/refs/heads/main/RAG-AMP.jpg
    tags:
      - LLM
      - RAG
      - AI
    git_url: "https://github.com/cloudera/CML_AMP_RAG_Studio"
    git_ref: "release/1"
    is_prototype: true
    is_new: true
  - title: RAG Monitoring
    label: rag-monitoring
    short_description: >
      Monitor real-time metrics for a RAG pipeline and gather insights into the pipeline's behavior.
    long_description: |
      Build a monitoring dashboard over a RAG (Retrieval Augmented Generation) system. The dashboard monitors the model's performance and provides insights into the model's behavior. The AMP uses Amazon Bedrock models for indexing, response generation, and evaluation, and requires AWS access keys to run.
      ---------------------------
      IMPORTANT: Please read the following before proceeding. This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.
      If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
      Build a monitoring dashboard over a RAG (Retrieval Augmented Generation) system. The dashboard monitors the model's performance and provides insights into the model's behavior. The AMP uses Amazon Bedrock models for indexing, response generation, and evaluation, and requires AWS access keys to run.
      <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
      <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
      <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: >-
      https://github.com/cloudera/CML_AMP_RAG_Monitoring/blob/main/assets/RAG-monitoring-tn.png?raw=true
    tags:
      - Model Monitoring
      - LLM Monitoring
      - MLOps
      - LLM
      - RAG
      - MLFlow
      - Streamlit
      - GenAI
      - Evaluation
      - LLM Evaluation
      - RAG Evaluation
    git_url: "https://github.com/cloudera/CML_AMP_RAG_Monitoring"
    is_prototype: true
    is_new: true
  - title: Knowledge Graph powered RAG based QA application
    label: llm-kg-rag-qa
    short_description: |
      A Knowledge-graph enhanced RAG application to answer your AI/ML related questions.
    long_description: |
      This AMP spins up a knowledge-graph-powered RAG application that answers AI/ML questions by drawing on the latest research publications. The knowledge base consists of ~650 AI/ML papers from arXiv, and the citation relationships between them are captured as "edges" in the knowledge graph, which is powered by Neo4j. Additional information from the knowledge graph is used to better rerank text chunks retrieved by vector search and to suggest papers related to the answer to assist the user's research.
      ---------------------------
      IMPORTANT: Please read the following before proceeding.  This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms. If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
      This AMP spins up a knowledge-graph-powered RAG application that answers AI/ML questions by drawing on the latest research publications. The knowledge base consists of ~650 AI/ML papers from arXiv, and the citation relationships between them are captured as "edges" in the knowledge graph, which is powered by Neo4j. Additional information from the knowledge graph is used to better rerank text chunks retrieved by vector search and to suggest papers related to the answer to assist the user's research.
      <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
      <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
      <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: "https://github.com/cloudera/CML_AMP_Knowledge_Graph_Backed_RAG/blob/main/assets/AMP_thumbnail.jpg?raw=true"
    tags:
      - GraphDB
      - RAG
      - Neo4j
      - LLM
      - Knowledge-Graph
      - GenAI
    git_url: "https://github.com/cloudera/CML_AMP_Knowledge_Graph_Backed_RAG"
    is_prototype: true
    is_new: true
    environment_variables: 
      is_embedded_app:
        default: true
        description: "Embed this app within AI Workbench"

  - title: Fine Tuning Studio
    label: fine-tuning-studio
    short_description: |
      A one-stop-shop interface for managing, viewing, launching, and monitoring LLM fine-tuning jobs within Cloudera ML.
    long_description: |
      The CML Fine Tuning Studio is a Cloudera-developed AMP that provides users with an all-encompassing application and ecosystem for managing, fine-tuning, and evaluating LLMs. This application is a launcher that helps users organize and dispatch other CML Workloads (primarily CML Jobs) that are configured specifically for LLM training and evaluation tasks.
      ---------------------------
      IMPORTANT: Please read the following before proceeding.  This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms. If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
      The CML Fine Tuning Studio is a Cloudera-developed AMP that provides users with an all-encompassing application and ecosystem for managing, fine-tuning, and evaluating LLMs. This application is a launcher that helps users organize and dispatch other CML Workloads (primarily CML Jobs) that are configured specifically for LLM training and evaluation tasks.
      <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
      <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
      <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: "https://github.com/cloudera/CML_AMP_LLM_Fine_Tuning_Studio/blob/main/resources/images/ft-logo-catalog.png?raw=true"
    tags:
      - Finetuning
      - LLM
      - PEFT
      - LoRA
      - QLoRA
      - Evaluation
      - MLFlow
      - Streamlit
      - Adapters
      - GenAI
    git_url: "https://github.com/cloudera/CML_AMP_LLM_Fine_Tuning_Studio"
    is_prototype: true
    is_new: true

  - title: PromptBrew by Verta
    label: prompt-brew
    short_description: |
      Create, iterate, refine and test your LLM prompts with AI assistance.
    long_description: |
      Launch a PromptBrew app to get AI-powered assistance in crafting LLM prompts. Apply leading prompting strategies to your prompts and learn by doing.
      ---------------------------
      IMPORTANT: Please read the following before proceeding.  This AMP includes or otherwise depends on certain third party software packages. Information about such third party software packages are made available in the notice file associated with this AMP. By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites. For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms. If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP. By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
      Launch a PromptBrew app to get AI-powered assistance in crafting LLM prompts. Apply leading prompting strategies to your prompts and learn by doing.
      <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
      <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
      <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: "https://github.com/cloudera/CML_AMP_PromptBrew/blob/main/assets/image.png?raw=true"
    tags:
      - Verta
      - OpenAI
      - GenAI
      - Prompts
      - Python
      - Prompt Engineering
      - LLM
    git_url: "https://github.com/cloudera/CML_AMP_PromptBrew"
    is_prototype: true
    is_new: true

  - title: Chat with your Documents
    label: docchat
    short_description: Chatbot for Custom Documents
    long_description: >-
      Pre-trained instruction-following LLM (Large Language Model) ChatBot enhanced by context from an internal knowledge base
    image_path: "https://github.com/cloudera/LlamaIndex_IN_CML_AMP/blob/main/assets/images/logo.png?raw=true"
    tags:
      - DocChat
      - RAG
      - LlamaIndex
    git_url: "https://github.com/cloudera/LlamaIndex_IN_CML_AMP.git" 
    is_prototype: true
    is_new: true

  - title: Intelligent QA Chatbot with NiFi, Pinecone, and Llama2
    label: llm-code-creation
    short_description: |
        Ingest data with Cloudera DataFlow from a user-specified website sitemap to create embeddings in a Pinecone vector DB and deploy a context-aware LLM chatbot app with Cloudera Machine Learning.
    long_description: |
        Ingest data with Cloudera DataFlow from a user-specified website sitemap to create embeddings in a Pinecone vector DB and deploy a context-aware LLM chatbot app with Cloudera Machine Learning.
        ---------------------------
        IMPORTANT: Please read the following before proceeding. This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.
        If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.
    long_description_html: |
        Ingest data with Cloudera DataFlow from a user-specified website sitemap to create embeddings in a Pinecone vector DB and deploy a context-aware LLM chatbot app with Cloudera Machine Learning.
        <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
        <div style="margin-top:10px">This AMP includes or otherwise depends on certain third party software packages.  Information about such third party software packages are made available in the notice file associated with this AMP.  By configuring and launching this AMP, you will cause such third party software packages to be downloaded and installed into your environment, in some instances, from third parties' websites.  For each third party software package, please see the notice file and the applicable websites for more information, including the applicable license terms.</div>
        <div style="margin-top:10px">If you do not wish to download and install the third party software packages, do not configure, launch or otherwise use this AMP.  By configuring, launching or otherwise using the AMP, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the third party software packages.</div>
    image_path: >-
      https://raw.githubusercontent.com/cloudera/CML_AMP_Intelligent-QA-Chatbot-with-NiFi-Pinecone-and-Llama2/main/assets/catalog-entry.png
    tags:
      - Chatbot
      - Pinecone
      - Milvus
      - LLM
      - Llama2
      - Generative AI
      - RAG
      - NiFi
      - CDF
      - NLP
      - GPU
    git_url: 'https://github.com/cloudera/CML_AMP_Intelligent-QA-Chatbot-with-NiFi-Pinecone-and-Llama2'
    is_prototype: true
    
  - title: Text Summarization and more with Amazon Bedrock
    label: llm_bedrock
    short_description: |
        This AMP demonstrates how to integrate text generation models from the Amazon Bedrock service for text use cases like summarization.
    long_description: |
        This AMP demonstrates how to integrate text generation models from the Amazon Bedrock service for text use cases like summarization.
  
    image_path: >-
      https://raw.githubusercontent.com/cloudera/CML_AMP_AI_Text_Summarization_with_Amazon_Bedrock/main/images/amp-image.png
    tags:
      - Bedrock
      - LLM
      - Summarization
    git_url: 'https://github.com/cloudera/CML_AMP_AI_Text_Summarization_with_Amazon_Bedrock'
    is_prototype: true
    
  - title: Fine-Tuning a Foundation Model for Multiple Tasks (with QLoRA)
    label: llm-fine-tuning
    short_description: |
        This AMP demonstrates how to improve the performance of Large Language Models for specific tasks using distributed fine-tuning techniques like Parameter-Efficient Fine-Tuning (PEFT) and Quantization.
    long_description: |
        This AMP demonstrates how to improve the performance of Large Language Models for specific tasks using distributed fine-tuning techniques like Parameter-Efficient Fine-Tuning (PEFT) and Quantization.
        ---------------------------
        IMPORTANT: Please read the following before proceeding.
        -----
        By configuring and launching this AMP, you will cause the model and datasets, identified below, to be downloaded and installed into your environment from third parties’ websites.  For each model or dataset, please see the applicable website for more information, including the applicable license terms.
        ------
        Model: https://huggingface.co/bigscience/bloom-1b1
        ------
        Datasets:
        https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct
        https://huggingface.co/datasets/s-nlp/paradetox
        https://huggingface.co/datasets/philschmid/sql-create-context-copy
        ------------------------------
        If you do not wish to download and install the model and the datasets, click “cancel” below.  By clicking “configure” below, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the model and the datasets.
    long_description_html: |
        This AMP demonstrates how to improve the performance of Large Language Models for specific tasks using distributed fine-tuning techniques like Parameter-Efficient Fine-Tuning (PEFT) and Quantization.
        <div style="margin-top:10px"><b>IMPORTANT:</b> Please read the following before proceeding.</div>
        <div style="margin-top:10px">By configuring and launching this AMP, you will cause the model and datasets, identified below, to be downloaded and installed into your environment from third parties’ websites.  For each model or dataset, please see the applicable website for more information, including the applicable license terms.</div>
        <div style="margin-top:10px"><b>Model:</b></div>
            <div style="margin-top:10px">https://huggingface.co/bigscience/bloom-1b1</div>
        <div style="margin-top:10px"><b>Datasets:</b></div>
            <div style="margin-top:10px">https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct</div>
            <div style="margin-top:10px">https://huggingface.co/datasets/s-nlp/paradetox</div>
            <div style="margin-top:10px">https://huggingface.co/datasets/philschmid/sql-create-context-copy</div>
        <div style="margin-top:10px">If you do not wish to download and install the model and the datasets, click “cancel” below.  By clicking “configure” below, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for the model and the datasets.</div>

    image_path: >-
      https://github.com/cloudera/CML_AMP_Finetune_Foundation_Model_Multiple_Tasks/blob/main/images/titlecard.png?raw=true
    tags:
      - Huggingface
      - QLoRA
      - PEFT
      - LLM
      - Fine-tuning
      - Distributed
      - GPU
    git_url: 'https://github.com/cloudera/CML_AMP_Finetune_Foundation_Model_Multiple_Tasks'
    is_prototype: true
  - title: LLM Chatbot Augmented with Enterprise Data
    label: llm-chatbot
    short_description: |
        Build a Retrieval Augmented Generation (RAG) Question-Answer Large
        Language Model (LLM) Bot with local documents
    long_description: |
        IMPORTANT: Please read the following before proceeding.  By configuring and launching this AMP, you will cause h2oai/h2ogpt-oig-oasst1-512-6.9b, which is a third party large language model (LLM), to be downloaded and installed into your environment from the third party’s website.  Please see https://huggingface.co/h2oai/h2ogpt-oig-oasst1-512-6.9b for more information about the LLM, including the applicable license terms.  If you do not wish to download and install h2oai/h2ogpt-oig-oasst1-512-6.9b, click “cancel” below.  By clicking “Configure Project” below, you acknowledge the foregoing statement and agree that Cloudera is not responsible or liable in any way for h2oai/h2ogpt-oig-oasst1-512-6.9b. Author: Cloudera Inc.
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        This AMP builds a Retrieval Augmented Generation (RAG)
        Question-Answer Large Language Model (LLM) Bot application which
        demonstrates how context from local documents can be used with
        pre-trained LLMs to perform context retrieval and generate
        factual chat responses.

    image_path: >-
      https://raw.githubusercontent.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data/main/images/catalog-screenshot.png
    tags:
      - Chatbot
      - LLM
      - Huggingface
      - Generative AI
      - RAG
      - Vector DB
      - Milvus
      - Transformers
      - NLP
      - GPU
    git_url: 'https://github.com/cloudera/CML_AMP_LLM_Chatbot_Augmented_with_Enterprise_Data'
    is_prototype: true

  - title: Churn Modeling with scikit-learn
    label: churn-prediction
    short_description: Build a scikit-learn model to predict churn using customer telco data.
    long_description: >-
      This project demonstrates how to build a logistic regression classification model to predict the probability 
      that a group of customers will churn from a fictitious telecommunications company. In addition, the model is 
      interpreted using a technique called Local Interpretable Model-agnostic Explanations (LIME). Both the logistic 
      regression and LIME models are deployed using CML's real-time model deployment capability and interact with a 
      basic Flask-based web application.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/churn-prediction.jpg
    tags:
      - Churn Prediction
      - Logistic Regression
      - Explainability
      - LIME
    git_url: "https://github.com/cloudera/CML_AMP_Churn_Prediction"
    is_prototype: true

  - title: Deep Learning for Image Analysis
    label: image-analysis
    short_description: Build a semantic search application with deep learning models.
    long_description: >-
      This project demonstrates how to build a scalable semantic search solution
      on a dataset of images. Pretrained convolutional neural networks are used to
      extract semantically meaningful representations, which are then indexed
      using the FAISS library for scalable retrieval. Finally, the project
      launches an interactive visualization for exploring the quality of
      representations extracted using multiple model architectures.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/image-analysis.jpg
    tags:
      - Computer Vision
      - Image Analysis
      - Semantic Search
    git_url: "https://github.com/cloudera/CML_AMP_Image_Analysis"
    is_prototype: true

  - title: Deep Learning for Anomaly Detection
    label: anomaly-detection
    short_description: Apply modern, deep learning techniques for anomaly detection to identify network intrusions.
    long_description: >-
      This project includes implementations of several neural networks
      (Autoencoder, Variational Autoencoder, Bidirectional GAN, Sequence Models)
      applied to the task of anomaly detection in TensorFlow 2.0. For comparison,
      it includes two baselines (One Class SVM, PCA) and provides a frontend
      interface for exploring model results.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/anomaly-detection.jpg
    tags:
      - Anomaly Detection
      - Tensorflow
      - Autoencoder
      - GAN
    git_url: "https://github.com/cloudera/CML_AMP_Anomaly_Detection"
    is_prototype: true
    environment_variables: 
      is_embedded_app:
        default: true
        description: "Embed this app within AI Workbench"

  - title: Structural Time Series
    label: structural-time-series
    short_description: Applying a structural time series approach to California hourly electricity demand data.
    long_description: >-
      This project provides an example application of a structural approach to time series via 
      generalized additive models (with the Prophet library) to California hourly electricity demand 
      data. The primary output of this repository is a small application exposing a probabilistic
      forecast and an interface for asking probabilistic questions against it.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/structural-time-series.jpg
    tags:
      - Time Series
      - Prophet
      - Demand Forecasting
    git_url: "https://github.com/cloudera/CML_AMP_Structural_Time_Series"
    is_prototype: true

  - title: Analyzing News Headlines with spaCy
    label: spacy-entity-extraction
    short_description: Notebook demonstrating entity extraction on headlines with spaCy.
    long_description: >-
      This project is a single notebook that demonstrates extracting named entities from Reuters news headlines with spaCy. 
      It provides a few example downstream use cases.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/spacy-entity-extraction.png
    tags:
      - SpaCy
      - NLP
      - Named Entity Recognition
    git_url: "https://github.com/cloudera/CML_AMP_SpaCy_Entity_Extraction.git"
    is_prototype: true

  - title: Deep Learning for Question Answering
    label: question-answering
    short_description: Explore an emerging NLP capability with WikiQA, an automated question answering system built on top of Wikipedia.
    long_description: >-
      This project allows users to explore the task of question answering from several angles. First, users can interact with a real QA 
      system, in which open questions are answered with snippets found in Wikipedia articles. Next, users can explore an app that showcases 
      the types of models that make QA systems possible. Finally, users can learn and visualize the data structures required for training 
      and evaluating those models.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/question-answering.png
    tags:
      - Automated Question Answering
      - Extractive Question Answering
      - BERT
      - NLP
    git_url: "https://github.com/cloudera/CML_AMP_Question_Answering.git"
    is_prototype: true

  - title: Explaining Models with LIME and SHAP
    label: explainability-lime-shap
    short_description: Learn how to explain ML models using LIME and SHAP.
    long_description: >-
      This project provides a notebook on how to explain machine learning models using tools such as SHAP and LIME. It explores 
      concepts such as global and local explanations, illustrated with six different models - Naive Bayes, Logistic Regression, 
      Decision Tree, Random Forest, Gradient Boosted Tree, and a Multilayer Perceptron. It also discusses best practices for debugging 
      explanations as well as limitations of LIME and SHAP.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/explainability-lime-shap.png
    tags:
      - Interpretability
      - Explainability
      - LIME
      - SHAP
    git_url: "https://github.com/cloudera/CML_AMP_Explainability_LIME_SHAP.git"
    is_prototype: true

  - title: Active Learning
    label: active-learning
    short_description: Interactive visual workflow of active learning using the MNIST dataset.
    long_description: >-
      Supervised machine learning, while powerful, needs labeled data to be effective. Active learning reduces the number of labeled examples 
      needed to train a model, saving time and money while obtaining comparable performance to models trained with much more data. This application 
      demonstrates the active learning workflow in an interactive experience.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/active-learning.png
    tags:
      - Active Learning
      - Learning with Limited Labeled Data
    git_url: "https://github.com/cloudera/CML_AMP_Active_Learning.git"
    is_prototype: true

  - title: Few-Shot Text Classification
    label: fewshot-text-classification
    short_description: Perform topic classification on news articles in several limited-labeled data regimes.
    long_description: >-
      This project provides a sample user interface that demonstrates how to perform text classification when only a few labeled training 
      examples exist, or even when there are no training examples at all! The approach relies on embedding text using word embeddings and 
      sentence embeddings with state-of-the-art Transformer models.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/fewshot-text-classification.png
    tags:
      - NLP
      - Few-Shot Learning
      - Zero-Shot Classification
      - Text Embeddings
      - BERT
      - GPU
    git_url: "https://github.com/cloudera/CML_AMP_Few-Shot_Text_Classification.git"
    is_prototype: true

  - title: Canceled Flight Prediction
    label: canceled-flight-prediction
    short_description: Perform analytics on a large airline dataset with Spark and build an XGBoost model to predict flight cancellations.
    long_description: >-
      This project demonstrates end-to-end processing with Spark to take two large, raw datasets and transform them into a unified dataset 
      upon which an XGBoost classification model is trained to predict flight cancellations. Additionally, the project deploys a hosted model 
      and front-end application to allow users to interact with the trained model.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/canceled-flight-prediction.png
    tags:
      - Binary Classification
      - XGBoost
      - PySpark
      - Flask
    git_url: "https://github.com/cloudera/CML_AMP_Canceled_Flight_Prediction.git"
    is_prototype: true

  - title: Streamlit
    label: streamlit
    short_description: Demonstration of how to use Streamlit as a CML Application.
    long_description: >-
      This project demonstrates running a small Streamlit application inside CML. It does no machine learning, and simply illustrates the small
      amount of wiring necessary to create a CML Application using Streamlit.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/streamlit.png
    tags:
      - Streamlit
      - Applications
      - Data Visualization
    git_url: "https://github.com/cloudera/CML_AMP_Streamlit_on_CML.git"
    is_prototype: true

  - title: Object Detection Inference Visualized
    label: object-detection-inference
    short_description: Interact with a blog-style Streamlit application to visually unpack the inference workflow of a modern, single-stage object detector.
    long_description: >-
      This application offers a step-by-step walkthrough to help visualize the inference workflow of a single-stage object detector. Specifically,
      we'll see how a pre-trained RetinaNet model processes an image to quickly and accurately detect objects while also exploring fundamental object 
      detection concepts along the way.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/object-detection-inference.png
    tags:
      - Computer Vision
      - Object Detection
      - PyTorch
      - Streamlit
    git_url: "https://github.com/cloudera/CML_AMP_Object_Detection_Inference.git"
    is_prototype: true

  - title: Getting Started with the CML API
    label: apiv2
    short_description: Demonstration of how to use the CML API to interact with CML.
    long_description: >-
      In addition to the UI, Cloudera Machine Learning (CML) provides an API to interact with the platform programmatically. This notebook 
      demonstrates how to work with the API.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/apiv2.png
    tags:
      - API
      - CML
      - Python
    git_url: "https://github.com/cloudera/CML_AMP_APIv2.git"
    is_prototype: true

  - title: AutoML with TPOT
    label: automl-with-tpot
    short_description: AutoML using TPOT, distributed with Dask.
    long_description: >-
      Automated data visualization and scikit-learn pipeline creation using TPOT, Dask, and CML Workers in a Jupyter notebook.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/automl-with-tpot.png
    tags:
      - TPOT
      - AutoML
      - Dask
      - Python
      - Workers
    git_url: "https://github.com/cloudera/CML_AMP_AutoML_with_TPOT.git"
    is_prototype: true

  - title: Automatic Text Summarization
    label: summarize
    short_description: Automatic text summarization with extractive and abstractive models.
    long_description: >-
      This project builds a Streamlit application that demos four automatic summarization models, including extractive and abstractive techniques.
      It facilitates qualitative and quantitative comparisons of model summaries, as well as allowing users to summarize their own input text.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/summarize.png
    tags:
      - Summarization
      - NLP
      - Streamlit
    git_url: "https://github.com/cloudera/CML_AMP_Summarize.git"
    is_prototype: true

  - title: Train Gensim's Word2Vec
    label: gensim-w2v
    short_description: Demonstration of how to train Gensim's Word2Vec for a non-language use case.
    long_description: >-
      This Jupyter Notebook project demonstrates how to train Word2Vec for a non-language use case to learn embeddings for products on an e-commerce website. 
      Includes a demonstration of hyperparameter optimization and early stopping for the Word2Vec model.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/gensim-w2v.png
    tags:
      - Embeddings
      - Gensim
      - Word2Vec
      - Hyperparameter Optimization
    git_url: "https://github.com/cloudera/CML_AMP_Train_Gensim_W2V.git"
    is_prototype: true

  - title: TensorBoard as a CML Application
    label: tensorboard
    short_description: Demonstration of how to use TensorBoard as a CML Application.
    long_description: >-
      This project demonstrates how to run a TensorBoard dashboard as an application inside CML. To facilitate the demo, a minimal script is run to train a neural network
      on the MNIST digits data set while capturing logs that are visualized in the TensorBoard application.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/tensorboard.png
    tags:
      - TensorBoard
      - Applications
      - TensorFlow
      - Keras
    git_url: "https://github.com/cloudera/CML_AMP_Tensorboard_on_CML.git"
    is_prototype: true

  - title: Video Classification
    label: video-classification
    short_description: Demonstration of how to perform video classification using pre-trained TensorFlow models.
    long_description: >-
      This project provides a Jupyter Notebook walkthrough of video classification/action recognition with a 
      pre-trained TensorFlow model and provides guidance for working with video data. The project also includes 
      a script that demonstrates how to perform larger-scale model inference.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/video-classification.png
    tags:
      - Video Classification
      - Action Recognition
      - Activity Recognition
      - Video Understanding
      - TensorFlow
    git_url: "https://github.com/cloudera/CML_AMP_Video_Classification.git"
    is_prototype: true

  - title: Continuous Model Monitoring
    label: continuous-model-monitoring
    short_description: Demonstration of how to perform continuous model monitoring on CML using Model Metrics and Evidently.ai dashboards.
    long_description: >-
      To combat concept drift in production systems, it's important to have robust monitoring capabilities that alert 
      stakeholders when relationships in the incoming data or model have changed. In this Applied Machine Learning 
      Prototype (AMP), we demonstrate how this can be achieved on CML. Specifically, we leverage CML's Model Metrics 
      feature in combination with Evidently.ai's Data Drift, Numerical Target Drift, and Regression Performance reports 
      to monitor a simulated production model that predicts housing prices over time.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/continuous-model-monitoring.png
    tags:
      - Model Monitoring
      - Production ML
      - MLOps
      - Evidently.ai
      - APIv2
    git_url: "https://github.com/cloudera/CML_AMP_Continuous_Model_Monitoring.git"
    is_prototype: true

  - title: Distributed XGBoost with Dask on CML
    label: dask-on-cml
    short_description: How to perform distributed training of an XGBoost model using Dask on CML.
    long_description: >-
      This project provides a Jupyter Notebook that demonstrates a typical data science workflow for detecting fraudulent credit card 
      transactions by training a distributed XGBoost model in conjunction with Dask, a library for scaling Python applications, using the CML Workers API.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/dask-on-cml.png
    tags:
      - Distributed Computing
      - XGBoost
      - Dask
    git_url: "https://github.com/cloudera/CML_AMP_Dask_on_CML.git"
    is_prototype: true

  - title: Exploring Intelligent Writing Assistance
    label: intelligent-writing-assistance
    short_description: A demonstration of how the NLP task of text style transfer can be applied to enhance the human writing experience using HuggingFace Transformers and Streamlit.
    long_description: >-
      The goal of this application is to demonstrate how the NLP task of text style transfer can be applied to enhance the human writing experience. To that end, 
      we peel back the curtain on how an intelligent writing assistant might function, walking through the logical steps needed to automatically re-style a piece of text 
      (from informal to formal or subjective to neutral) while building up confidence in the model output.
    image_path: >-
      https://raw.githubusercontent.com/cloudera/Applied-ML-Prototypes/master/images/intelligent-writing-assistance.png
    tags:
      - NLP
      - Text Style Transfer
      - HuggingFace
      - BERT
      - BART
      - Streamlit
      - PyTorch
    git_url: "https://github.com/cloudera/CML_AMP_Intelligent_Writing_Assistance.git"
    is_prototype: true