dockerfile #129

Merged
merged 27 commits into from
May 31, 2024
23 changes: 23 additions & 0 deletions Dockerfile
@@ -0,0 +1,23 @@
# base image
FROM ubuntu:latest

# set environment variables
ENV PYTHONUNBUFFERED=1

# set working directory
ENV DockerHOME=/home/app/TracEX
RUN mkdir -p $DockerHOME
WORKDIR $DockerHOME

# copy source files
COPY . $DockerHOME

# expose port
EXPOSE 8000

# install dependencies
RUN apt-get update && apt-get install -y python3 graphviz python3-pip
RUN pip install --break-system-packages --no-cache-dir -r requirements.txt

# start server
CMD ["python3", "tracex_project/manage.py", "runserver", "0.0.0.0:8000"]
75 changes: 62 additions & 13 deletions README.md
@@ -1,34 +1,83 @@
# TracEX

[![GitHub stars](https://img.shields.io/github/stars/bptlab/TracEX)](https://github.com/bptlab/TracEX)
[![GitHub open issues](https://img.shields.io/github/issues/bptlab/TracEX)](https://github.com/bptlab/TracEX/issues)
[![GitHub closed pull requests](https://img.shields.io/github/issues-closed/bptlab/TracEX)](https://github.com/bptlab/TracEX/issues)
[![GitHub open pull requests](https://img.shields.io/github/issues-pr/bptlab/TracEX)](https://github.com/bptlab/TracEX/issues)
[![Pylint](https://github.com/bptlab/tracex/actions/workflows/pylint.yml/badge.svg)](https://github.com/bptlab/TracEX/blob/main/.github/workflows/pylint.yml)

## Key Points
TracEX aims to extract event logs from unstructured text, specifically written patient experiences known as patient journeys. By leveraging Large Language Models (LLMs), TracEX can automatically identify and extract relevant events, activities, timestamps and further information from natural language text. This enables healthcare professionals and researchers to gain valuable insights into patient experiences, treatment pathways, and potential areas for improvement in healthcare delivery.

This bachelorproject focuses on event log extraction from Patient Journeys using large-language models.
This project was initiated and completed as part of the team's bachelor's degree under the supervision of the Business Process Technology chair at the Hasso Plattner Institute. The project was conducted in cooperation with [mamahealth](https://www.mamahealth.com/).

Our project partner is mamahealth. More information about them can be found here: [mamahealth](https://www.mamahealth.io/)
## Key Features
- **Extraction Pipeline**: A robust pipeline to clean, process, and extract data from natural language text.
- **Patient Journey Generator**: Generates comprehensive patient journeys based on randomized cohort data.
- **Database**: Stores patient journeys and related extraction results for easy access and analysis.
- **Metrics and Evaluation Tool**: Evaluates the accuracy and effectiveness of the extraction process and allows for analysis of extraction results.
- **Intuitive UI**: User-friendly interface for you to interact with the tool and visualize results.

More information about the project will be released soon.
## Requirements
To run TracEX, you need an OpenAI API key with access to GPT-3.5 Turbo and sufficient credits. TracEX uses the OpenAI API to extract relevant information from unstructured text with Large Language Models (LLMs). Without a valid API key and sufficient balance, the extraction cannot be performed. Current API prices are listed at [OpenAI Pricing](https://openai.com/api/pricing/).

## Installation using Docker
**Option 1: Using a Pre-built Docker Image** \
The easiest way to run a local instance of TracEX is using the provided Docker image.

## Set Up Guide
1. Install Docker: Ensure that you have Docker installed on your system. If you haven't installed it yet, please follow the official Docker installation guide for your operating system.
1. Download the Latest Docker Image: Download the latest TracEX Docker image from the provided link: [docker image](https://github.com/bptlab/TracEX/releases/tag/release)
1. Load the Docker Image: Open a terminal or command prompt and navigate to the directory where you downloaded the Docker image file. Run the following command to load the image: `docker load -i tracex.tar`\
Note: Depending on your system configuration, you may need to run this command with `sudo` privileges.
1. Run the Docker Container: After the image is successfully loaded, run the following command to start the TracEX container: `docker run -p 8000:8000 tracex`\
This command will start the container and map port 8000 from the container to port 8000 on your local machine. Again, you may need to use `sudo` depending on your system setup.
1. Access TracEX: Open a web browser and navigate to http://localhost:8000/. This will bring you to the TracEX application, where you can enter your OpenAI API Key and start extracting event logs.
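
Steps 3 and 4 can be combined in a small shell function. The following is a minimal sketch and not part of the TracEX repository: the function name `load_and_run_tracex` and the `DOCKER` override variable are assumptions introduced for illustration; only the `docker load` and `docker run` commands are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with TracEX.
# DOCKER can be overridden, e.g. DOCKER="sudo docker" if Docker needs root,
# or DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

load_and_run_tracex() {
    # Path to the downloaded image archive (defaults to tracex.tar).
    image_tar="${1:-tracex.tar}"

    # Load the image, then start the container with port 8000 published.
    $DOCKER load -i "$image_tar" || return 1
    $DOCKER run -p 8000:8000 tracex
}
```

After sourcing this file, calling `load_and_run_tracex` from the download directory performs both steps; setting `DOCKER=echo` beforehand gives a dry run that only prints the commands.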

**Option 2: Building the Docker Image from Source** \
Alternatively, you can build the Docker image from the TracEX source code.

1. Clone the TracEX Repository: Open a terminal or command prompt and navigate to the directory where you want to clone the TracEX repository. Run the following command to clone the repository: `git clone https://github.com/bptlab/TracEX`
1. Navigate to the TracEX Directory: Change your current directory to the cloned TracEX repository: `cd TracEX`
1. Build the Docker Image: Run the following command to build the TracEX Docker image: `docker build -t tracex .`\
Note: Depending on your system configuration, you may need to run this command with `sudo` privileges.
1. Run the Docker Container: After the image is successfully built, run the following command to start the TracEX container: `docker run -p 8000:8000 tracex`\
This command will start the container and map port 8000 from the container to port 8000 on your local machine. Again, you may need to use `sudo` depending on your system setup.
1. Access TracEX: Open a web browser and navigate to http://localhost:8000/. This will bring you to the TracEX application, where you can enter your OpenAI API Key and start extracting event logs.
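
The build and run steps can likewise be wrapped in one function. This is a sketch under stated assumptions, not a script from the repository: `build_and_run_tracex`, the `DOCKER` override variable, and the check for a `Dockerfile` are hypothetical additions; the `docker build` and `docker run` invocations are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with TracEX.
# DOCKER can be overridden, e.g. DOCKER="sudo docker" or DOCKER=echo (dry run).
DOCKER="${DOCKER:-docker}"

build_and_run_tracex() {
    # Directory containing the cloned repository (defaults to the current one).
    src_dir="${1:-.}"

    # Fail early if this does not look like the repository root.
    if [ ! -f "$src_dir/Dockerfile" ]; then
        echo "No Dockerfile found in $src_dir" >&2
        return 1
    fi

    # Build the image from the source tree, then start the container.
    $DOCKER build -t tracex "$src_dir" || return 1
    $DOCKER run -p 8000:8000 tracex
}
```

Run it from inside the cloned `TracEX` directory as `build_and_run_tracex`, or pass the path to the clone as the first argument.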

## Local Setup for Development

### Download

- Use git and run "git clone https://github.com/bptlab/TracEX" in the desired directory _(Using e.g. Git Bash)_
- Use git and run `git clone https://github.com/bptlab/TracEX` in the desired directory _(Using e.g. Git Bash)_

### Installation
- navigate to the root directory of TracEX in your terminal
- run `install-dependencies-unix.sh` or `install-dependencies-windows.ps1`, based on your operating system _(Using e.g. Terminal)_
- run `python tracex/manage.py migrate` to update the database and apply all changes stored in the `migrations` folder
- run `python tracex_project/manage.py migrate` to update the database and apply all changes stored in the `migrations/` folder
- export OpenAI API key as environment variable: `export OPENAI_API_KEY=<API-KEY>`
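
A missing key is easy to overlook, so it can help to check the variable before starting the server. The snippet below is a minimal sketch; `require_openai_key` is a hypothetical helper name and is not part of TracEX. It only tests whether `OPENAI_API_KEY` is set in the current shell, not whether the key is valid or funded.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not shipped with TracEX.
require_openai_key() {
    # Succeeds only if OPENAI_API_KEY is set to a non-empty value.
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set; run: export OPENAI_API_KEY=<API-KEY>" >&2
        return 1
    fi
}
```

One possible use is `require_openai_key && python tracex_project/manage.py runserver`, which starts the web app only when the variable is present.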

### Execution

**Web-App:**
- Run `python tracex/manage.py runserver` in the root directory of TracEX _(Using e.g. Terminal)_

**Command-Line Tool:**
- Run `python command_line_tool.py` in the root directory of TracEX _(Using e.g. Terminal)_
- Run `python tracex_project/manage.py runserver` in the root directory of TracEX _(Using e.g. Terminal)_

### Pre-Commit

- If you intend to extend the code, please run `pre-commit install` in the root directory of TracEX _(Using e.g. Terminal)_

## Contributors

The main contributors to the project are the six members of the [2023/24 Bachelor Project](https://hpi.de/fileadmin/user_upload/hpi/dokumente/studiendokumente/bachelor/bachelorprojekte/2023_24/BA-Projekt_FG_Weske_Event_Log_Extraction_from_Patient_Experiences.pdf) of Professor Weske's [Business Process Technology Chair](https://bpt.hpi.uni-potsdam.de) at the [Hasso Plattner Institute](https://hpi.de):

- [Pit Buttchereit](https://github.com/PitButtchereit)
- [Frederic Rupprecht](https://github.com/FR-SON)
- [VanThang Nguyen](https://github.com/thangixd)
- [Nils Schmitt](https://github.com/nils-schmitt)
- [Soeren Schubert](https://github.com/soeren227)
- [Trung-Kien Vu](https://github.com/tkv29)

These six participants will push the project forward as part of their bachelor's degree until the summer of 2024.
At the same time, our commitment to open source means that we enable, and in fact encourage, all interested parties to contribute and become part of the developer community.

## Project documentation

In the project wiki, you can find detailed documentation that covers various aspects of TracEX.
In the [architecture](https://github.com/bptlab/TracEX/wiki/Architecture) section, we provide an overview of the system's design and components. The [repository structure](https://github.com/bptlab/TracEX/wiki/Repository-Structure) is also outlined, making it easier for you to navigate and understand the organization of our codebase.
Most importantly, we have dedicated a significant portion of the wiki to explaining our [pipeline frameworks](https://github.com/bptlab/TracEX/wiki/Pipelines), which are the core of TracEX. These frameworks are responsible for processing and transforming the unstructured patient journey data into structured event logs.
1 change: 0 additions & 1 deletion install-dependencies-unix.sh
@@ -1,7 +1,6 @@
#!/bin/bash

sudo apt-get update && apt-get upgrade -y

sudo apt install python3 graphviz -y

# add Graphviz to the system path
1 change: 1 addition & 0 deletions install-dependencies-windows.ps1
@@ -2,6 +2,7 @@
winget install Python --accept-source-agreements --accept-package-agreements
winget install graphviz --accept-source-agreements --accept-package-agreements


# add Graphviz to the system path
$graphvizPath = "C:\Program Files\Graphviz\bin"
$envPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine)
1 change: 1 addition & 0 deletions requirements.txt
@@ -7,3 +7,4 @@ pandas~=2.1.3
numpy~=1.26.2
jinja2~=3.1.4
regex~=2024.5.15
requests~=2.31.0