This bachelor project focuses on extracting event logs from patient journeys using large language models.

TracEX

TracEX aims to extract event logs from unstructured text, specifically written patient experiences known as patient journeys. By leveraging Large Language Models (LLMs), TracEX can automatically identify and extract relevant events, activities, timestamps, and further information from natural language text. This enables healthcare professionals and researchers to gain valuable insights into patient experiences, treatment pathways, and potential areas for improvement in healthcare delivery.

This project was initiated and completed as part of the team's bachelor's degree under the supervision of the Business Process Technology chair at the Hasso Plattner Institute. The project was conducted in cooperation with mamahealth.

Key Features

  • Extraction Pipeline: A robust pipeline to clean, process, and extract data from natural language text.
  • Patient Journey Generator: Generates comprehensive patient journeys based on randomized cohort data.
  • Database: Stores patient journeys and related extraction results for easy access and analysis.
  • Metrics and Evaluation Tool: Evaluates the accuracy and effectiveness of the extraction process and allows for analysis of extraction results.
  • Intuitive UI: User-friendly interface for you to interact with the tool and visualize results.

Requirements

To run TracEX, you need an OpenAI API key with sufficient credits. TracEX uses the OpenAI API to extract relevant information from unstructured text with Large Language Models (LLMs). Without a valid API key and sufficient balance, the extraction process cannot be performed. Current API prices are listed on the OpenAI Pricing page.

Installation using Docker

Option 1: Using a Pre-built Docker Image
The easiest way to run a local instance of TracEX is using the provided Docker image.

  1. Install Docker: Ensure that you have Docker installed on your system. If you haven't installed it yet, please follow the official Docker installation guide for your operating system.
  2. Download the Latest Docker Image: Download the latest TracEX Docker image from the provided link: docker image
  3. Load the Docker Image: Open a terminal or command prompt and navigate to the directory where you downloaded the Docker image file. Run the following command to load the image: docker load -i tracex.tar
    Note: Depending on your system configuration, you may need to run this command with sudo privileges.
  4. Run the Docker Container: After the image is successfully loaded, run the following command to start the TracEX container: docker run -p 8000:8000 tracex
    This command will start the container and map port 8000 from the container to port 8000 on your local machine. Again, you may need to use sudo depending on your system setup.
  5. Access TracEX: Open a web browser and navigate to http://localhost:8000/. This will bring you to the TracEX application, where you can enter your OpenAI API Key and start extracting event logs.
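Steps 3 and 4 above can be sketched as a small shell helper. This is only an illustration: the function name tracex_run_prebuilt is hypothetical and not part of TracEX, and it assumes the downloaded file is called tracex.tar as in step 3. Prefix the docker calls with sudo if your system requires it.

```shell
# Hypothetical helper (not shipped with TracEX): load the downloaded image
# and start the container. Defining the function does not run anything.
tracex_run_prebuilt() {
  # Path to the downloaded image file; defaults to tracex.tar in the current directory.
  image_file="${1:-tracex.tar}"

  # Stop early with a clear message if the file is missing.
  if [ ! -f "$image_file" ]; then
    echo "Image file not found: $image_file" >&2
    return 1
  fi

  # Step 3: load the image. Step 4: run it and publish port 8000.
  docker load -i "$image_file" &&
    docker run -p 8000:8000 tracex
}
```

Calling tracex_run_prebuilt in the download directory runs both commands in order and stops if loading the image fails. The container runs in the foreground, so the terminal stays attached until you stop it.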

Option 2: Building the Docker Image from Source
Alternatively, you can build the Docker image from the TracEX source code.

  1. Clone the TracEX Repository: Open a terminal or command prompt and navigate to the directory where you want to clone the TracEX repository. Run the following command to clone the repository: git clone https://github.com/bptlab/TracEX
  2. Navigate to the TracEX Directory: Change your current directory to the cloned TracEX repository: cd TracEX
  3. Build the Docker Image: Run the following command to build the TracEX Docker image: docker build -t tracex .
    Note: Depending on your system configuration, you may need to run this command with sudo privileges.
  4. Run the Docker Container: After the image is successfully built, run the following command to start the TracEX container: docker run -p 8000:8000 tracex
    This command will start the container and map port 8000 from the container to port 8000 on your local machine. Again, you may need to use sudo depending on your system setup.
  5. Access TracEX: Open a web browser and navigate to http://localhost:8000/. This will bring you to the TracEX application, where you can enter your OpenAI API Key and start extracting event logs.
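Steps 1 to 4 above can likewise be chained into one helper. This is a minimal sketch, with the hypothetical function name tracex_build_from_source; the commands themselves are the ones listed in the steps. Add sudo in front of the docker calls if your setup needs it.

```shell
# Hypothetical helper (not shipped with TracEX): clone, build, and run.
# The body is a subshell, so the "cd" does not change your current directory.
tracex_build_from_source() (
  # Step 1: clone the repository into the current directory.
  git clone https://github.com/bptlab/TracEX &&
    # Step 2: enter the cloned repository.
    cd TracEX &&
    # Step 3: build the image and tag it "tracex".
    docker build -t tracex . &&
    # Step 4: start the container and publish port 8000.
    docker run -p 8000:8000 tracex
)
```

Each command only runs if the previous one succeeded, so a failed clone or build does not start a stale container.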

Local Setup for Development

Download

  • Use git and run git clone https://github.com/bptlab/TracEX in the desired directory (using, e.g., Git Bash)

Installation

  • Navigate to the root directory of TracEX in your terminal
  • Run install-dependencies-unix.sh or install-dependencies-windows.ps1, depending on your operating system (using, e.g., Terminal)
  • If you are using macOS, please use Homebrew or any other package manager to install the listed dependencies manually
  • Run python tracex_project/manage.py migrate to update the database and apply all changes stored in the migrations/ folder
  • Export your OpenAI API key as an environment variable: export OPENAI_API_KEY=<API-KEY>

Execution

  • Run python tracex_project/manage.py runserver in the root directory of TracEX (using, e.g., Terminal)
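A forgotten export is an easy mistake in a fresh terminal, so a small wrapper around the command above can warn about it. This is a sketch under assumptions: the function name tracex_runserver is hypothetical, and whether TracEX itself needs the variable at startup is not stated here, so the wrapper only warns and then starts the server anyway.

```shell
# Hypothetical wrapper (not shipped with TracEX): warn if the API key
# is missing from the environment, then start the development server.
tracex_runserver() {
  # The variable name OPENAI_API_KEY comes from the installation steps.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "Warning: OPENAI_API_KEY is not set in this shell" >&2
  fi

  # Run from the root directory of TracEX, as described above.
  python tracex_project/manage.py runserver
}
```

The warning goes to standard error, so it does not mix with the server output if you redirect that to a file.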

Pre-Commit

  • If you intend to extend the code, please run pre-commit install in the root directory of TracEX (using, e.g., Terminal)

Contributors

The main contributors to the project are the six members of the 2023/24 Bachelor Project of Professor Weske's Business Process Technology Chair at the Hasso Plattner Institute:

These six participants will push the project forward as part of their bachelor's degree until the summer of 2024. At the same time, our commitment to open source means that we enable, and in fact encourage, all interested parties to contribute and become part of the developer community.

Project documentation

In the project wiki, you can find detailed documentation that covers various aspects of TracEX. In the architecture section, we provide an overview of the system's design and components. The repository structure is also outlined, making it easier for you to navigate and understand the organization of our codebase. Most importantly, we have dedicated a significant portion of the wiki to explaining our pipeline frameworks, which are the core of TracEX. These frameworks are responsible for processing and transforming the unstructured patient journey data into structured event logs.