

jp_tokenizer

This repository contains a tiny web service that lets you tokenize and lemmatize Japanese text.

The service is implemented by wrapping the MeCab tokenizer (paper) in a Sanic app.
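
The core of such a wrapper is small. The sketch below illustrates the idea only and is not the repository's actual code: it assumes the mecab-python3 binding and an IPAdic-style dictionary, in which the seventh comma-separated feature field of each token is its base form (lemma).

# Minimal illustrative sketch, not the repo's implementation.
import MeCab
from sanic import Sanic
from sanic.response import text

app = Sanic("jp_tokenizer")
wakati = MeCab.Tagger("-Owakati")  # surface forms joined by spaces
tagger = MeCab.Tagger()            # one "surface<TAB>features" line per token

@app.post("/tokenize")
async def tokenize(request):
    return text(wakati.parse(request.body.decode("utf-8")).strip())

@app.post("/lemmatize")
async def lemmatize(request):
    lemmas = []
    for line in tagger.parse(request.body.decode("utf-8")).splitlines():
        if line == "EOS":
            break
        surface, features = line.split("\t")
        fields = features.split(",")
        # With IPAdic, field 7 is the base form; MeCab emits "*" when unknown.
        base = fields[6] if len(fields) > 6 else "*"
        lemmas.append(surface if base == "*" else base)
    return text(" ".join(lemmas))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)

Constructing the taggers once at startup, rather than per request, avoids re-reading the dictionary on every call.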

Usage

Ensure that your server has at least 2-3 GB of available RAM (e.g. an Azure Standard DS1_v2 instance), then run:

# start a container for the service and its dependencies
docker run -p 8080:80 cwolff/jp_tokenizer

# call the API
curl -X POST 'http://localhost:8080/tokenize' --data 'サザエさんは走った'
curl -X POST 'http://localhost:8080/lemmatize' --data 'サザエさんは走った'

The API will respond with a space-delimited string of tokens or lemmas, respectively.
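
Splitting the response on spaces therefore recovers the individual tokens. Below is a minimal client sketch in Python, assuming the container started above is listening on localhost:8080 and using the third-party requests library:

# Call the /tokenize endpoint and split the space-delimited response.
import requests

resp = requests.post(
    "http://localhost:8080/tokenize",
    data="サザエさんは走った".encode("utf-8"),
)
print(resp.text.split(" "))  # exact segmentation depends on MeCab's dictionary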