
V. Nearest Neighbor Experiments

Maren Eckhoff edited this page Sep 2, 2019 · 1 revision

Nearest Neighbors

For this project, the problem was formulated as an information retrieval task: for a new question, the k most similar questions are identified from the historical data, and the lawyers’ answers to those questions are suggested as candidate answers to the new question. This process is akin to a standard ‘Nearest Neighbor’ (NN) method in Machine Learning.

[Figure: NearestNeighbor]

The experiments are set up in the src/barefoot_winnie/d04_modelling/experiments.py script. An NNExperiment parent class is defined, and an experiment is created for each of TF-IDF and W2V. Parameters can be tuned in the run_preprocessing_steps function defined in src/barefoot_winnie/d00_utils/preprocessing.

Training pipeline

Since NN models are non-parametric, there is no ‘training phase’ in the sense of learning optimal parameters. Instead, the following steps are carried out in the training phase:

  1. Extract the questions from the database used as historical data
  2. Use the preprocessing steps to clean and preprocess the questions
  3. Use the feature extraction process described in the previous section to convert questions to the structured representation
  4. Save the generated features
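
The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual code: the function names (preprocess, fit_idf, tfidf_vector, build_features), the tokenisation rule, the smoothed IDF formula, the example questions, and the use of pickle for saving are all assumptions made for the example.

```python
import math
import pickle
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens (assumed cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(tokenized_docs):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build an L2-normalised sparse TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_features(questions):
    """Steps 2 and 3: preprocess the questions and extract their features."""
    tokenized = [preprocess(q) for q in questions]
    idf = fit_idf(tokenized)
    return {"idf": idf, "vectors": [tfidf_vector(doc, idf) for doc in tokenized]}


# Step 1 stand-in: made-up questions instead of a database query.
historical_questions = [
    "How do I register land in my name?",
    "My landlord wants to evict me without notice.",
    "Can I register a customary land title?",
]
features = build_features(historical_questions)

# Step 4: serialise the features so that inference can reload them later.
saved = pickle.dumps(features)
```

The IDF weights are stored alongside the vectors because a new inquiry has to be converted with the same weights that were fitted on the historical questions.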

This pipeline can be run using kedro run, which triggers the create_train_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules that are run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.

Inference pipeline

Once the training phase is complete, the following steps are taken to produce candidate responses whenever a new inquiry comes in:

  1. Use the same preprocessing steps used in the training phase to preprocess the inquiry
  2. Use the same feature extraction process used in training to generate the features for the new inquiry
  3. Calculate the distance between the inquiry feature vector and all the feature vectors of the training questions
  4. Select the k training questions with the minimum distance to the new inquiry
  5. Return the answers of the k closest training questions as candidate responses
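
Steps 3 to 5 above can be illustrated with a short sketch. It assumes cosine distance over sparse vectors stored as dicts; the page does not say which distance metric the project uses, and the function names, feature vectors, and answers below are made up for the example.

```python
import heapq
import math


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as sharing nothing with the other
    return 1.0 - dot / (norm_u * norm_v)


def k_nearest_answers(query_vec, train_vecs, answers, k):
    """Return the answers of the k training questions closest to the query."""
    # Step 3: distance from the inquiry to every training question.
    distances = [cosine_distance(query_vec, v) for v in train_vecs]
    # Step 4: indices of the k smallest distances, closest first.
    nearest = heapq.nsmallest(k, range(len(train_vecs)), key=distances.__getitem__)
    # Step 5: the answers attached to those questions.
    return [answers[i] for i in nearest]


# Made-up feature vectors standing in for the saved training features.
train_vecs = [
    {"land": 0.8, "register": 0.6},
    {"landlord": 0.7, "evict": 0.7},
    {"land": 0.5, "title": 0.5, "register": 0.7},
]
answers = ["answer A", "answer B", "answer C"]

# Steps 1 and 2 are assumed to have produced this vector for the new inquiry.
query_vec = {"land": 1.0, "title": 1.0}
candidates = k_nearest_answers(query_vec, train_vecs, answers, k=2)
```

Here the third training question shares both terms with the inquiry and the first shares one, so their answers are returned in that order. This brute-force scan computes one distance per historical question for each inquiry.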

This pipeline is triggered when BarefootLaw's web-based interface sends an HTTP request containing the case_id. The request triggers the create_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules that are run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.
