Bind Evaluator interface to pymarian #1013

thammegowda · 2023-10-04T23:04:41Z

Description

List of changes:

Replaced skbuild with scikit-build-core, the next-generation build system, and replaced setup.py with pyproject.toml (setup.py is deprecated).
Revised the pymarian code and added the Evaluator interface; split pymarian.h into separate translator and evaluator .hpp files.
Added BufferedVectorCollector to access scores in memory without I/O.
Reorganized the pymarian directory into tests and examples.
Added an evaluator example script that downloads metrics from our publicly accessible blob storage.
Configured CLI executables: pymarian-evaluate, pymarian-qtdemo, pymarian-mtapi.

Added dependencies: none

How to test

These instructions have also been added to the README in src/python.

git checkout tg/pybind-new
# build and install -- along with optional dependencies for demos
# run this from root of project, i.e., dir with pyproject.toml
pip install -v .[demos]   

# using a specific version of compiler (e.g., gcc-9 g++-9)
CMAKE_ARGS="-DCMAKE_C_COMPILER=gcc-9 -DCMAKE_CXX_COMPILER=g++-9" pip install -v .[demos]

# with CUDA on
CMAKE_ARGS="-DCOMPILE_CUDA=ON" pip install . 

# with a specific version of the CUDA toolkit, e.g., CUDA 11.5
CMAKE_ARGS="-DCOMPILE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.5" pip install -v .[demos]

Example Usage

# download sample dataset
langs=en-ru
prefix=tmp.$langs
teset=wmt21/systems
sysname=Online-B
sacrebleu -t $teset -l $langs --echo src > $prefix.src
sacrebleu -t $teset -l $langs --echo ref > $prefix.ref
sacrebleu -t $teset -l $langs --echo $sysname > $prefix.mt

# chrfoid
paste $prefix.{src,mt} | head | pymarian-evaluate --stdin -m chrfoid-wmt23 

# cometoid22-wmt{21,22,23}
paste $prefix.{src,mt} | head | pymarian-evaluate --stdin -m cometoid22-wmt22

# bleurt20
paste $prefix.{ref,mt} | head | pymarian-evaluate --stdin  -m bleurt20 --debug
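
The `paste` pipelines above send one tab-separated segment pair per line to `pymarian-evaluate --stdin`. The same pairing step can be sketched in Python, assuming that one-pair-per-line layout is all the tool expects. `pair_lines` is a hypothetical helper written for illustration and is not part of pymarian.

```python
from typing import Iterable, Iterator


def pair_lines(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Yield 'left<TAB>right' rows, similar to `paste left right`.

    Hypothetical helper for illustration only; not part of pymarian.
    Unlike `paste`, iteration stops at the end of the shorter input.
    """
    for a, b in zip(left, right):
        # Drop the trailing newline of each segment before joining.
        yield a.rstrip("\n") + "\t" + b.rstrip("\n")
```

Writing the yielded rows to the standard input of `pymarian-evaluate --stdin -m <metric>` should then be equivalent to the shell pipelines, with the first column being the source (or the reference for bleurt20) and the second column the system output.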

`mtapi`

Launch server

# example model: download and extract
wget http://data.statmt.org/romang/marian-regression-tests/models/wngt19.tar.gz 
tar xvf wngt19.tar.gz 

# launch server
pymarian-mtapi -s en -t de "-m wngt19/model.base.npz -v wngt19/en-de.spm wngt19/en-de.spm"

Example request from client

URL="http://127.0.0.1:5000/translate"
curl $URL --header "Content-Type: application/json" --request POST --data '[{"text":["Good Morning."]}]'
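
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above: the URL and the payload shape are taken from that command, while the function names are hypothetical. The response format is not documented here, so the parsed JSON is returned unchanged.

```python
import json
import urllib.request


def build_translate_request(url, texts):
    """Build a POST request with the same JSON body as the curl example."""
    payload = [{"text": list(texts)}]
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(url, texts, timeout=30.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_translate_request(url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `translate("http://127.0.0.1:5000/translate", ["Good Morning."])` issues the same call as the curl command.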

QtDemo

pymarian-qtdemo

Checklist

I have tested the code manually
I have run regression tests
I have read and followed CONTRIBUTING.md
I have updated CHANGELOG.md

thammegowda · 2023-10-11T04:19:59Z

@mjpost I have updated the instructions for testing these changes.

thammegowda · 2023-10-19T18:28:05Z

There seems to be a problem with multi-GPU usage in pymarian.
The model gets loaded onto all the requested GPU devices, but only the first GPU is used for inference.

How to reproduce:
terminal1: paste tmp.{src,mt} | pymarian-evaluate --stdin -m chrfoid-wmt23 -d 0 1 2 3

terminal2: watch usage: gpustat -cup -i 1

thammegowda · 2023-10-19T18:58:07Z

Fixed it. Since there is no iterator support at the moment, minibatches are created in Python (to avoid buffering all scores in memory and waiting until the end).
The batch_size in Python was set too small (mini_batch), so only one GPU was utilized. Fixed it by setting batch_size = mini_batch * maxi_batch.
TODO: support passing iterators between Python and C++ so we can eliminate minibatching in Python.
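
The batching fix described above can be illustrated with a short sketch. It assumes the Python side simply slices the input into chunks and hands each chunk to the C++ evaluator; `make_batches` is a hypothetical name, and the actual pymarian code may differ.

```python
from itertools import islice


def make_batches(lines, mini_batch, maxi_batch):
    """Yield lists of up to mini_batch * maxi_batch items.

    Hypothetical illustration of the fix: each chunk is large enough to be
    split into several minibatches, so more than one GPU can receive work.
    """
    batch_size = mini_batch * maxi_batch
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk
```

With `mini_batch=2` and `maxi_batch=2`, ten input lines are grouped into chunks of 4, 4, and 2, whereas chunking by `mini_batch` alone would hand over only 2 lines at a time.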

thammegowda · 2024-02-16T18:52:21Z

Closing since we have merged these changes in Azure DevOps fork!

* This code is the same as the [public GitHub repo tg/pybind-new branch](#1013). The Git histories seem slightly different between the public and private repos, so we are seeing a lot of commits.
* This builds on top of work by Elijah (#948).

Thamme Gowda added 3 commits October 4, 2023 21:12

pymarian: move test scripts to tests subdir

f40bd01

pymarian: add Evaluator, BufferedVectorCollector

7787b50

rename bench -> examples; add evaluator.py

57ea6d4

thammegowda changed the title ~~Tg/pybind new~~ Bind Evaluator interface to pymarian Oct 4, 2023

thammegowda requested review from snukky, mjpost and emjotde October 4, 2023 23:05

Thamme Gowda added 2 commits October 10, 2023 23:44

setup.py ->pyproject.toml; bind to _pymarian

ecd978f

- readme with usage instructions
- add executables: mtapi, qtdemo, evaluate
- rename binding package as _pymarian to avoid conflicts with "pymarian" package

Update README with usage instructions

0630597

thammegowda and others added 2 commits October 13, 2023 15:01

Update README.md

f2fad84

add note on experimental API

fix variable naming: dont suffix _ for local vars

c746e03

fix: pymarian batch_size = mini * maxi batch

0bd070a

TG Gowda added 10 commits October 30, 2023 14:56

Add an iterator for binding to python

7bc9b46

resolve merge conflicts

99f9654

add pytest for pymarian chrfoid

6db3c95

pymarian: py-api py3

0838fd2

pymarian: kwargs support

c36050e

pybinding: fix override for TextIterator2

542629c

pymarian: Add tests for training comet-qe, nmt

105b61d

add tests for bleurt, and comet scoring

cb4abf5

reorg tests dir

0682bec

TextInput: simplify iterator; decouple next+encode

7ba76eb

thammegowda force-pushed the tg/pybind-new branch from efd9cd1 to 7ba76eb Compare November 2, 2023 04:53

Thamme Gowda added 4 commits November 3, 2023 23:23

rename evaluator.run->evaluate; remove iterator

ee1337c

iterator will be done in a future PR

pymarian: use same build dir as cpp build

84e84de

update github actions: build and test pymarian

2325b9b

kwargs to Translator. add pytest for translator

10be522

thammegowda closed this Feb 16, 2024
