V. Nearest Neighbor Experiments
For this project, the problem was formulated as an information retrieval task: for a new question, the k most similar questions are identified from the historical data, and the lawyers’ answers to those questions are suggested as candidate answers to the new question. This process is akin to the standard ‘Nearest Neighbor’ (NN) method in Machine Learning.
The experiments are set up in the src/barefoot_winnie/d04_modelling/experiments.py script. An NNExperiment parent class is created, and an experiment is created for each of TF-IDF and W2V. Parameters can be tuned in the run_preprocessing_steps function defined in src/barefoot_winnie/d00_utils/preprocessing.
Since NN models are non-parametric, there is no ‘training phase’ in the sense of learning optimal parameters. Instead, the following steps are carried out in the training phase:
- Extract the questions from the database used as historical data
- Use the preprocessing steps to clean and preprocess the questions
- Use the feature extraction process described in the previous section to convert questions to the structured representation
- Save the generated features
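The four steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names (preprocess, fit_tfidf, build_training_features), the toy questions, the smoothed IDF formula, and the JSON output file are all assumptions made for the example. The real preprocessing and feature extraction live in the modules referenced in this page.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def preprocess(text):
    """Hypothetical cleaning step: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_tfidf(token_lists):
    """Return (idf, vectors) for a list of tokenised questions.

    Each vector is a sparse dict mapping term -> TF-IDF weight.
    """
    n_docs = len(token_lists)
    # Document frequency: in how many questions does each term occur?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption for this sketch) so no weight is zero.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return idf, vectors


def build_training_features(questions, out_path):
    """Preprocess the questions, extract features, and save them to disk."""
    token_lists = [preprocess(q) for q in questions]
    idf, vectors = fit_tfidf(token_lists)
    Path(out_path).write_text(json.dumps({"idf": idf, "vectors": vectors}))
    return idf, vectors


# Toy stand-in for the questions extracted from the database.
historical_questions = [
    "How do I register land in my name?",
    "Can my landlord evict me without notice?",
    "What is the process to register a company?",
]
features_path = Path(tempfile.gettempdir()) / "tfidf_features.json"
idf, vectors = build_training_features(historical_questions, features_path)
```

Saving the IDF table alongside the vectors matters because a new inquiry has to be weighted with the same IDF values that were used for the historical questions.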
This pipeline can be run using kedro run. This will trigger the create_train_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules that are run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.
Once these steps are completed in the training phase, when a new inquiry comes in, the following steps are taken to produce the candidate responses:
- Use the same preprocessing steps used in the training phase to preprocess the inquiry
- Use the same feature extraction process used in training to generate the features for the new inquiry
- Calculate the distance between the inquiry feature vector and all the feature vectors of the training questions
- Select the k training questions with the minimum distance to the new inquiry
- Return the answers of the k closest training questions as candidate responses
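The distance and selection steps above can be sketched as follows. This is an illustrative example only: the page does not state which distance metric the project uses, so cosine distance is an assumption here, and the function names (cosine_distance, suggest_answers), the sparse dict vectors, and the toy answers are hypothetical. The inquiry vector is assumed to have already gone through the same preprocessing and feature extraction as the training questions.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as term -> weight dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An empty vector shares nothing with any other vector.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def suggest_answers(inquiry_vector, train_vectors, train_answers, k=2):
    """Return the answers attached to the k training questions nearest the inquiry."""
    distances = [cosine_distance(inquiry_vector, v) for v in train_vectors]
    # Indices of the training questions, ordered from closest to farthest.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    return [train_answers[i] for i in nearest]


# Toy feature vectors and answers standing in for the saved training data.
train_vectors = [
    {"land": 1.0, "register": 1.0},
    {"landlord": 1.0, "evict": 1.0},
    {"company": 1.0, "register": 1.0},
]
train_answers = [
    "Answer about land registration",
    "Answer about eviction",
    "Answer about company registration",
]

inquiry_vector = {"register": 1.0, "land": 1.0, "title": 1.0}
candidates = suggest_answers(inquiry_vector, train_vectors, train_answers, k=2)
```

This brute-force search compares the inquiry against every training question, which matches the steps listed above and is simple to reason about; for a large historical dataset an index structure may be worth considering.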
This pipeline is triggered when BarefootLaw's web-based interface sends an HTTP request with the case_id. It triggers the create_pipeline function defined in src/barefoot_winnie/d07_pipelines/pipeline.py. The modules that are run in this pipeline can be edited in the pipeline scripts in the src/barefoot_winnie/d07_pipelines folder.