V. Nearest Neighbor Experiments

Nearest Neighbors

For this project, the problem was formulated as an information retrieval task: for a new question, the k most similar questions are identified in the historical data, and the lawyers’ answers to those questions are suggested as candidate answers to the new question. This process is akin to the standard ‘Nearest Neighbor’ (NN) method in machine learning.

The experiments are set up in the src/barefoot_winnie/d04_modelling/experiments.py script. An NNExperiment parent class is defined there, and an experiment is created for each of TF-IDF and W2V. Parameters can be tuned in the run_preprocessing_steps function defined in src/barefoot_winnie/d00_utils/preprocessing.
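The parent-class arrangement described above can be illustrated with a minimal sketch. It is not the code in experiments.py: the names NNExperimentSketch, TermCountExperiment, featurize, fit, nearest and cosine are hypothetical, and raw term counts stand in for the TF-IDF and W2V features. The point is only that the nearest-neighbor logic can live in the parent while each experiment supplies its own feature extraction.

```python
from abc import ABC, abstractmethod
from collections import Counter
import math


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as {term: weight}.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


class NNExperimentSketch(ABC):
    """Hypothetical parent class holding the shared nearest-neighbor logic."""

    def __init__(self, k=3):
        self.k = k
        self.features = []

    @abstractmethod
    def featurize(self, text):
        """Turn one question into a sparse vector ({term: weight})."""

    def fit(self, questions):
        # "Training" only stores the feature vectors of the historical questions.
        self.features = [self.featurize(q) for q in questions]
        return self

    def nearest(self, query):
        # Rank stored questions by similarity to the query, most similar first.
        q = self.featurize(query)
        scores = [(cosine(q, f), i) for i, f in enumerate(self.features)]
        scores.sort(key=lambda s: (-s[0], s[1]))
        return [i for _, i in scores[: self.k]]


class TermCountExperiment(NNExperimentSketch):
    """Toy subclass using raw term counts (an assumption for illustration)."""

    def featurize(self, text):
        return dict(Counter(text.lower().split()))


exp = TermCountExperiment(k=1).fit(
    ["land dispute with neighbour", "unpaid salary from employer"]
)
print(exp.nearest("my employer has not paid my salary"))
```

A TF-IDF or W2V experiment would follow the same pattern by overriding only featurize.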

Training pipeline

Since NN models are non-parametric, there is no ‘training phase’ in the usual sense of learning optimal parameters. Instead, the training phase consists of the following steps:

Extract the questions from the database used as historical data
Use the preprocessing steps to clean and preprocess the questions
Use the feature extraction process described in the previous section to convert questions to the structured representation
Save the generated features
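The steps above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's pipeline: the function names preprocess, fit_tfidf and run_training are hypothetical, an in-memory list stands in for the database, the cleaning step is a simple lowercase tokenizer, the TF-IDF weighting uses a smoothed formula chosen for the example, and the features are saved as JSON in a temporary directory.

```python
import json
import math
import os
import re
import tempfile
from collections import Counter


def preprocess(text):
    # Stand-in cleaning step: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def fit_tfidf(token_lists):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    # Each question becomes a sparse vector {term: term count * idf}.
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return idf, vectors


def run_training(questions, out_path):
    tokens = [preprocess(q) for q in questions]        # clean and preprocess
    idf, vectors = fit_tfidf(tokens)                   # extract features
    with open(out_path, "w", encoding="utf-8") as fh:  # save the features
        json.dump({"idf": idf, "vectors": vectors}, fh)
    return idf, vectors


# Hypothetical historical questions standing in for the database extract.
historical_questions = [
    "My landlord wants to evict me without notice.",
    "My employer has not paid my salary.",
]
features_path = os.path.join(tempfile.mkdtemp(), "features.json")
idf, vectors = run_training(historical_questions, features_path)
```

Nothing is learned beyond the document frequencies; the saved vectors are simply looked up again at inference time.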

This pipeline can be run using kedro run, which triggers the create_train_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.

Inference pipeline

Once the training phase is complete, the following steps are taken to produce candidate responses when a new inquiry comes in:

Use the same preprocessing steps used in the training phase to preprocess the inquiry
Use the same feature extraction process used in training to generate the features for the new inquiry
Calculate the distance between the inquiry feature vector and all the feature vectors of the training questions
Select the k training questions with the minimum distance to the new inquiry
Return the answers of the k closest training questions as candidate responses
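The inference steps above can be sketched as follows. Several things here are assumptions made for the example: the source does not say which distance is used, so cosine distance is chosen; raw term counts stand in for the TF-IDF or W2V features; and HISTORY, TRAIN_VECTORS and suggest_answers are hypothetical names with made-up data.

```python
import math
import re
from collections import Counter

# Hypothetical historical data: (question, lawyer's answer) pairs.
HISTORY = [
    ("My landlord wants to evict me without notice.", "Answer about eviction notice periods."),
    ("My employer has not paid my salary.", "Answer about unpaid wages."),
    ("A neighbour built a fence on my land.", "Answer about land boundary disputes."),
]


def preprocess(text):
    # Same stand-in cleaning step for training questions and new inquiries.
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text):
    # Stand-in feature extraction: sparse vector of raw term counts.
    return Counter(preprocess(text))


def cosine_distance(a, b):
    # 1 - cosine similarity; 1.0 when either vector is empty.
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / norm if norm else 1.0


# Features of the training questions, computed once ahead of time.
TRAIN_VECTORS = [term_counts(question) for question, _ in HISTORY]


def suggest_answers(inquiry, k=2):
    query_vec = term_counts(inquiry)
    # Distance from the inquiry to every training question.
    distances = [(cosine_distance(query_vec, vec), i) for i, vec in enumerate(TRAIN_VECTORS)]
    # Keep the k questions with the smallest distance and return their answers.
    nearest = sorted(distances)[:k]
    return [HISTORY[i][1] for _, i in nearest]


print(suggest_answers("I was not paid my salary by my employer", k=2))
```

Comparing the inquiry against every stored vector is a brute-force search, which is simple and adequate for a modest number of historical questions.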

This pipeline is triggered when BarefootLaw's web-based interface sends an HTTP request with the case_id. The request triggers the create_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.
