This Python script implements several common NLP pre-processing steps: PPMI computation, SVD-based dimensionality reduction, and PLSR-based distribution prediction.
The following packages are required.
- Python 2.7 (not tested with Python 3)
- numpy
- scipy
- sparsesvd
- sklearn
- svmlight-loader
svdmi needs no separate installation step. Once all the dependencies are installed, you can run svdmi.py as described in the usage section.
Positive Pointwise Mutual Information
$ python svdmi.py -m PPMI -i raw_co-occurrences_matrix_file_name -o ppmi_matrix_file_name
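For reference, the PPMI transformation itself can be sketched in a few lines of numpy. This is a minimal illustration on a small dense matrix, not the code in svdmi.py: the function name `ppmi`, the use of the natural logarithm, and the dense input are all assumptions, and reading or writing matrix files is left out.

```python
import numpy as np


def ppmi(counts):
    """Return the PPMI matrix of a dense co-occurrence count matrix.

    Hypothetical helper for illustration; svdmi.py may differ in log base,
    smoothing, and sparse-matrix handling.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    # Count expected under independence: row_sum * col_sum / total.
    expected = row_sums * col_sums / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    # Zero counts give -inf and empty rows or columns give nan; map both to 0.
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only positive associations.
    return np.maximum(pmi, 0.0)


example = ppmi([[2, 0], [0, 2]])
```

Cells whose observed count is at or below the count expected under independence end up as 0, which is what makes the result "positive" PMI.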
Singular Value Decomposition-based dimensionality reduction (SVD1) and matrix smoothing (SVD2).
For SVD1 mode
$ python svdmi.py -m SVD1 -i matrix_file_name -o dimensionality_reduced_matrix_file_name -n svd_dimensions -p power_to_raise_singular_values
For SVD2 mode
$ python svdmi.py -m SVD2 -i raw_co-occurrences_matrix_file_name -o smoothed_matrix_file_name -n svd_dimensions -p power_to_raise_singular_values
Use the -v option to print the reproduction error (Frobenius norm).
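The two SVD modes can be illustrated with numpy's dense SVD. This sketch assumes that SVD1 returns the first n left singular vectors scaled by the singular values raised to the power p, and that SVD2 multiplies that result back by the first n right singular vectors to obtain a rank-n matrix of the original shape. Both are assumptions about how -n and -p are applied, the function names are hypothetical, and `numpy.linalg.svd` stands in for the sparsesvd package listed in the requirements.

```python
import numpy as np


def svd_reduce(matrix, n, p=1.0):
    """SVD1-style reduction (assumed form): U_n * S_n^p, shape (rows, n)."""
    U, s, Vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    return U[:, :n] * (s[:n] ** p)


def svd_smooth(matrix, n, p=1.0):
    """SVD2-style smoothing (assumed form): U_n * S_n^p * V_n^T, original shape."""
    U, s, Vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    return (U[:, :n] * (s[:n] ** p)).dot(Vt[:n, :])


def frobenius_error(original, approximation):
    """Frobenius norm of the difference between two matrices."""
    diff = np.asarray(original, dtype=float) - np.asarray(approximation, dtype=float)
    return np.linalg.norm(diff, "fro")


M = np.array([[3.0, 0.0], [0.0, 1.0]])
reduced = svd_reduce(M, 1)
smoothed = svd_smooth(M, 1)
error = frobenius_error(M, smoothed)
```

With p = 1, the error of the rank-n approximation equals the square root of the sum of the squared discarded singular values, so it shrinks as n grows.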
Partial Least Squares Regression-based distribution prediction.
Training a PLSR model
$ python svdmi.py -m PLSR.train -x x_matrix_file_name -y y_matrix_file_name -n PLSR_components -i model_file_name
Use the -v option to print the reproduction error (Frobenius norm).
Predicting using the trained PLSR model.
$ python svdmi.py -m PLSR.pred -x x_matrix_file_name -y predicted_y_matrix_file_name -i model_file_name
Simple BSD
Danushka Bollegala