Labeling Sustainability reports

This folder contains a streamlit app which can be used to combine different parameters and assess the models performance on the validation data.

This part of the project is deprecated since the generation of a robust labeling set turned out to not be feasible due to ressource constraints but also due to the structure of the problem. For documentation reasons this subproject is kept in the repository. Go to our main repository to read up on our project.

About the model, training and testing data

Data

The ground truth to train and test the model consists of ca. 750 texts which were annotated with the labels human_rights, environment, corruption, social_concerns and employee_concerns. Some of the texts are not unique if they have multiple labels.

The labels were created by the dev team of bizres, based on relevant extracts from sustainability reports

Models

The following approaches are tested and validated:

Cleansing and tokenization: The spacy md language model is used for filtering stopwords and lemmatization

Text Vectorization: Bag-of-Words and vectorization using the spacy language model

Prediction Algorithm: Logistic Regression and Naive Bayes

Optimization: Optimizing the modell for overall accuracy or for recall

Running the application

Local with Docker

To run the application with Docker, install the Docker Engine on your development machine.
Navigate to the local repository
Run docker build -t report-topic-classification . to build a local docker image
Run docker run -p 8501:8501 report-topic-classification to run the docker image
Visit localhost:8501 to interact with the streamlit app

Local for development purposes

For development we use Ubuntu 20.04 with Python 3.9. But the code should also run on Windows with Python 3.9
If necessary, install Python 3.9 and Pipenv and spawn a virtual environment with pipenv shell
Install libraries using the Pipfile with pipenv install
To run the streamlit app usw streamlit run app_model_validation.py
Have fun developing ;-)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
Dockerfile		Dockerfile
LICENSE		LICENSE
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
app_model_validation.py		app_model_validation.py
data_importer.py		data_importer.py
data_transformer.py		data_transformer.py
predictor.py		predictor.py
validator.py		validator.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Labeling Sustainability reports

About the model, training and testing data

Data

Models

Running the application

Local with Docker

Local for development purposes

About

Releases

Packages

Languages

License

bizres/report-topic-classification-supervised

Folders and files

Latest commit

History

Repository files navigation

Labeling Sustainability reports

About the model, training and testing data

Data

Models

Running the application

Local with Docker

Local for development purposes

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages