A sentiment analysis project based on Sentiment140 training data. This first attempt uses some basic rules to create quality input data from the raw data, which is then fed into an LSTM neural network.
Demo Available Here: https://www.ericsciberras.com/portfolio/#LSTM-Sentiment-Analysis
- Go to the `html` folder and open `nn.html`
- Download the Sentiment140 data. Alternatively, you may use your own dataset; however, the `process_raw_data()` function may need to be altered so that the data ends up in the correct format.
- Run `pip3 install -r requirements.txt` to install all the required Python 3 libraries.
- Run `python3 processData /path/to/dataset/` to create 3 files:
  - `processed_data.csv`: the data in the correct format for converting sentences to arrays
  - `processed_data_2.csv`: the data that is ready to be fed into the neural network
  - `tokeniser.pickle`: holds the dictionary used to convert words to numbers and vice versa
- Run `python3 neuralnet` to start training. Note: using tensorflow-gpu is recommended, as training on a CPU will be very slow. This creates a .hdf5 file for each epoch, e.g. `sentiment-ai-04-0.74`. The format is `sentiment-ai-<epoch>-<val_acc>`. It also creates a file called `my-model.hdf5`, which is the last epoch.
- Run `python3 runmodel /path/to/model.hdf5`
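
Because training writes one checkpoint per epoch, it can be handy to pick the checkpoint with the highest validation accuracy before passing it to `runmodel`. The sketch below is not part of the project. It assumes the files are named `sentiment-ai-<epoch>-<val_acc>.hdf5`, as the example name in the training step suggests, and the helper names `parse_checkpoint` and `best_checkpoint` are hypothetical.

```python
import re
from pathlib import Path

# Assumed file name pattern: sentiment-ai-<epoch>-<val_acc>.hdf5
CHECKPOINT_RE = re.compile(r"^sentiment-ai-(\d+)-(\d+(?:\.\d+)?)\.hdf5$")


def parse_checkpoint(name):
    """Return (epoch, val_acc) for a checkpoint file name, or None if it does not match."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names):
    """Return the name with the highest val_acc (ties go to the later epoch), or None."""
    best_key, best_name = None, None
    for name in names:
        # Only the file name is parsed, so full paths are accepted too.
        parsed = parse_checkpoint(Path(name).name)
        if parsed is None:
            continue  # skip files such as my-model.hdf5
        epoch, val_acc = parsed
        key = (val_acc, epoch)
        if best_key is None or key > best_key:
            best_key, best_name = key, name
    return best_name
```

Calling `best_checkpoint` with the names of the .hdf5 files in the training directory returns the one to pass to `runmodel`, or `None` if no file matches the assumed pattern.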
OR
- Install tfjs-converter: `pip install tensorflowjs`
- Convert the Keras model to TF.js Layers format: `tensorflowjs_converter --input_format keras path/to/my_model.h5 path/to/tfjs_target_dir`
- Run `dictionary2json ./tokeniser.pickle ./dict.json` to turn the dictionary pickle into JSON.
- Replace the current `dict.json`, `group1-shard1of1` and `model.json` in the `html` folder with the newly created files.
- Open `nn.html`.
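
The dictionary conversion step can be sketched roughly as follows. This is not the project's `dictionary2json` script. It is a minimal illustration that assumes `tokeniser.pickle` holds a plain Python dict mapping words to integer indices. If the pickle holds some other object, the dictionary would have to be extracted from it first. The function name `pickle_to_json` is hypothetical.

```python
import json
import pickle


def pickle_to_json(pickle_path, json_path):
    """Load a pickled word -> index dict and write it out as JSON."""
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(pickle_path, "rb") as f:
        word_index = pickle.load(f)
    if not isinstance(word_index, dict):
        raise TypeError("expected a dict, got " + type(word_index).__name__)
    # JSON object keys must be strings; indices are coerced to plain ints.
    as_json = {str(word): int(index) for word, index in word_index.items()}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(as_json, f, ensure_ascii=False)
    return as_json
```

Under the same assumption, `pickle_to_json("./tokeniser.pickle", "./dict.json")` would produce a JSON object that a browser page can load to map words to numbers.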
- Use more or better datasets (create my own ???)
- Research Word2vec and its possible advantages
- Allow neural network to accept arbitrarily sized sentences
- Use more sophisticated techniques for processing raw datasets (e.g. identifying words that don't contribute to sentiment)
- Optimise the parameters of the LSTM neural network or use a different neural network