@hplt-project

HPLT - High Performance Language Technologies

A space that combines petabytes of natural language data with large-scale model training

Pinned

  1. OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python 47 13

  2. OpusTrainer Public

    Curriculum training

    Python 16 5

Repositories

Showing 10 of 21 repositories
  • warc2text-runner Public

    Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

    HTML 3 0 5 1 Updated Oct 25, 2024
  • release2_inspection
    Jupyter Notebook 0 4 0 2 Updated Oct 24, 2024
  • data-analytics-tool Public

    Data Analytics Tool

    JavaScript 9 1 0 0 Updated Oct 22, 2024
  • bitextor-slurm Public Forked from paracrawl/cirrus-scripts

    Scripts for running bitextor jobs

    Shell 0 1 1 0 Updated Oct 22, 2024
  • monotextor-slurm Public

    Set of scripts to run a monotextor-like pipeline on Slurm HPC clusters

    Rust 2 GPL-3.0 0 0 0 Updated Oct 17, 2024
  • bitextor-mt-models
    Shell 1 0 3 0 Updated Oct 17, 2024
  • cc-download
    Shell 0 0 0 0 Updated Oct 15, 2024
  • OpusPocus Public

    Marian machine translation training pipeline for thousands of models

    Python 2 0 23 (4 issues need help) 0 Updated Oct 14, 2024
  • OpusTrainer Public

    Curriculum training

    Python 16 MIT 5 19 0 Updated Sep 14, 2024
  • OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

