Quick Start

Let's just try it out!

Here's a guide to getting started with MeCab parsing using natto.

This requires:

- Ruby 1.9 or greater
- an existing installation of MeCab with a system dictionary
- either:
  - automatic configuration: make sure that mecab (and mecab-config, if you are on Mac OS or *nix) is on your PATH
  - or explicit configuration: set the MECAB_PATH environment variable to the full path to the mecab library
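
Before creating a parser, it can help to confirm which of the two configuration styles your environment would satisfy. The following is only a rough sketch in plain Ruby: `find_on_path` and `mecab_configuration` are hypothetical helpers written for this page, not part of natto, and checking MECAB_PATH first is a choice made for this example. It also assumes a Mac OS or *nix style executable name (`mecab`).

```ruby
# Hypothetical helper (not part of natto): returns the full path of an
# executable found in the given PATH string, or nil if none is found.
def find_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper (not part of natto): reports which configuration
# style the environment appears to satisfy. This sketch looks at
# MECAB_PATH first, then falls back to searching PATH for mecab.
def mecab_configuration(env = ENV)
  explicit = env['MECAB_PATH']
  return [:explicit, explicit] if explicit && !explicit.empty?

  mecab = find_on_path('mecab', env.fetch('PATH', ''))
  return [:automatic, mecab] if mecab

  [:missing, nil]
end

mode, location = mecab_configuration
puts "MeCab configuration: #{mode} #{location}"
```

If this reports `missing`, either put mecab on your PATH or set MECAB_PATH before requiring natto.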

First create an instance of a Natto::MeCab parser:

 require 'natto'

 nm = Natto::MeCab.new
 => #<Natto::MeCab:0x288f6d08 
      @tagger=#<FFI::Pointer address=0x28d3ab80>,  
      @libpath="/usr/local/lib/libmecab.so",        
      @options={},  
      @dicts=[#<Natto::DictionaryInfo:0x288f6ba0 
               @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", 
               charset=utf-8,
               type=0>],  
      @version=0.996>

Query the Natto::MeCab parser for its MeCab version and the absolute path to the MeCab library:
```
 puts nm.version
 => 0.996

 puts nm.libpath
 => /usr/local/lib/libmecab.so  
```

Fetch information about the dictionary used by the Natto::MeCab parser:

 puts nm.dicts.first.filepath
 => /usr/local/lib/mecab/dic/ipadic/sys.dic

 puts nm.dicts.first.charset
 => utf-8

Use the parse method to tokenize a Japanese sentence, treating the result as a single string, and print it to the screen:

 puts nm.parse('この星の一等賞になりたいの卓球で俺は、そんだけ！')

 この	連体詞,*,*,*,*,*,この,コノ,コノ
 星	名詞,一般,*,*,*,*,星,ホシ,ホシ
 の	助詞,連体化,*,*,*,*,の,ノ,ノ
 一等	名詞,一般,*,*,*,*,一等,イットウ,イットー
 賞	名詞,接尾,一般,*,*,*,賞,ショウ,ショー
 に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
 なり	動詞,自立,*,*,五段・ラ行,連用形,なる,ナリ,ナリ
 たい	助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
 の	助詞,連体化,*,*,*,*,の,ノ,ノ
 卓球	名詞,サ変接続,*,*,*,*,卓球,タッキュウ,タッキュー
 で	助詞,格助詞,一般,*,*,*,で,デ,デ
 俺	名詞,代名詞,一般,*,*,*,俺,オレ,オレ
 は	助詞,係助詞,*,*,*,*,は,ハ,ワ
 、	記号,読点,*,*,*,*,、,、,、
 そん	名詞,一般,*,*,*,*,そん,ソン,ソン
 だけ	助詞,副助詞,*,*,*,*,だけ,ダケ,ダケ
 ！	記号,一般,*,*,*,*,！,！,！
 EOS
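
Because the result is a single string, any further processing means splitting it yourself. In the output above, each line holds the surface form, a tab, and a comma-separated feature list, and the text ends with EOS. The sketch below works on a short sample copied from that output, so it runs without MeCab; `split_mecab_output` is a hypothetical helper for illustration, not a natto method.

```ruby
# Two lines copied from the output above, followed by the EOS marker.
output = "この\t連体詞,*,*,*,*,*,この,コノ,コノ\n" \
         "星\t名詞,一般,*,*,*,*,星,ホシ,ホシ\n" \
         "EOS\n"

# Hypothetical helper (not part of natto): turns the default output
# string into an array of hashes, skipping blank lines and EOS.
def split_mecab_output(text)
  lines = text.each_line.map(&:chomp).reject { |l| l.empty? || l == 'EOS' }
  lines.map do |line|
    # Split once on the first tab: surface on the left, features on the right.
    surface, feature = line.split("\t", 2)
    { surface: surface, features: feature.to_s.split(',') }
  end
end

tokens = split_mecab_output(output)
tokens.each { |t| puts "#{t[:surface]} => #{t[:features].first}" }
```

This only holds for the default output format shown here; a custom output format would need its own splitting logic.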

Parse the given text into an enumeration of nodes. When you pass a block to parse, it yields a MeCab node for each morpheme, and each node carries much more detailed information than the string output:

 nm.parse('飛べねえ鳥もいるってこった。') do |n|
   puts "#{n.surface}\t#{n.wcost}" if n.is_nor?
 end

 飛べ    7175
 ねえ    6661
 鳥      4905
 も      4669
 いる    9109
 って    6984
 こっ    9587
 た      5500
 。      215

Combine node-parsing with a custom node-format for more interesting processing:

 # -F    ... short-form of --node-format
 # %m    ... morpheme
 # %h    ... part-of-speech ID (IPADIC)
 # %f[0] ... part-of-speech (first ChaSen feature element)
 nm = Natto::MeCab.new('-F%m\t%h\t%f[0]')

 # only output feature attribute of normal nodes, 
 # ignoring end-of-sentence or unknown nodes
 nm.parse('あんたはオイラに飛び方を教えてくれた。') do |n|
   puts n.feature if n.is_nor?
 end

 あんた  59      名詞
 は      16      助詞
 オイラ  59      名詞
 に      13      助詞
 飛び    31      動詞
 方      57      名詞
 を      13      助詞
 教え    31      動詞
 て      18      助詞
 くれ    33      動詞
 た      25      助動詞
 。      7       記号
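
As one example of such processing, the formatted feature strings can be tallied by part of speech. This is a minimal sketch that assumes the three columns shown above are separated by tabs, as the `\t` in the node format suggests. The `features` array is sample data copied from that output rather than a live parse, so the sketch runs without MeCab installed.

```ruby
# Sample feature strings copied from the output above
# (morpheme, part-of-speech ID, part-of-speech), tab-separated.
features = [
  "あんた\t59\t名詞", "は\t16\t助詞", "オイラ\t59\t名詞", "に\t13\t助詞",
  "飛び\t31\t動詞", "方\t57\t名詞", "を\t13\t助詞", "教え\t31\t動詞",
  "て\t18\t助詞", "くれ\t33\t動詞", "た\t25\t助動詞", "。\t7\t記号"
]

# Count how many morphemes fall under each part of speech.
counts = Hash.new(0)
features.each do |feature|
  _surface, _pos_id, pos = feature.split("\t")
  counts[pos] += 1
end

counts.each { |pos, n| puts "#{pos}\t#{n}" }
```

In real use, the same loop body could sit inside the block passed to parse, reading `n.feature` for each normal node instead of iterating over a prepared array.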

