Brooke M. Fujita edited this page Feb 10, 2015 · 1 revision

Let's just try it out!

Here's a guide to getting started with MeCab parsing using natto.

This requires:

  • Ruby 1.9 or greater
  • an existing installation of MeCab with a system dictionary
  • one of the following configurations:
    • automatic: make sure that mecab (and mecab-config, if you are on Mac OS or *nix) is on your PATH
    • explicit: set the MECAB_PATH environment variable to the full path to the MeCab library
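
Before creating a parser, it can help to check which of the two configurations your environment would use. The following is a minimal sketch in plain Ruby that does not depend on natto. The helpers `find_on_path` and `mecab_config_mode` are hypothetical names introduced for illustration, and the assumption that an explicit MECAB_PATH takes precedence over PATH lookup is mine; natto's own lookup logic may differ.

```ruby
# Hypothetical helper (not part of natto): returns the full path of an
# executable named `cmd` found in the PATH-style string `path`, or nil.
# Note: this does not try Windows extensions such as .exe.
def find_on_path(cmd, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper (not part of natto): reports which configuration
# the given environment appears to provide.
# Assumption: an explicit MECAB_PATH wins over automatic PATH lookup.
def mecab_config_mode(env = ENV)
  explicit = env['MECAB_PATH']
  if explicit && !explicit.empty?
    :explicit
  elsif find_on_path('mecab', env.fetch('PATH', ''))
    :automatic
  else
    :unconfigured
  end
end

puts mecab_config_mode
```

If this prints `unconfigured`, either add the directory containing mecab to your PATH or set MECAB_PATH before calling `require 'natto'`.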

  1. First create an instance of a Natto::MeCab parser:

     require 'natto'
    
     nm = Natto::MeCab.new
     => #<Natto::MeCab:0x288f6d08 
          @tagger=#<FFI::Pointer address=0x28d3ab80>,  
          @libpath="/usr/local/lib/libmecab.so",        
          @options={},  
          @dicts=[#<Natto::DictionaryInfo:0x288f6ba0 
                   @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", 
                   charset=utf-8,
                   type=0>],  
          @version=0.996>  
    
  2. Query the Natto::MeCab parser for its MeCab version and the absolute path to the MeCab library:

     puts nm.version
     => 0.996
    
     puts nm.libpath
     => /usr/local/lib/libmecab.so  
    
  3. Fetch information about the dictionary used by the Natto::MeCab parser:

     puts nm.dicts.first.filepath
     => /usr/local/lib/mecab/dic/ipadic/sys.dic
    
     puts nm.dicts.first.charset
     => utf-8  
    
  4. Use the parse method to tokenize a Japanese sentence, treating the result as a single string, and print the output to the screen:

     puts nm.parse('この星の一等賞になりたいの卓球で俺は、そんだけ!')
    
     この	連体詞,*,*,*,*,*,この,コノ,コノ
     星	名詞,一般,*,*,*,*,星,ホシ,ホシ
     の	助詞,連体化,*,*,*,*,の,ノ,ノ
     一等	名詞,一般,*,*,*,*,一等,イットウ,イットー
     賞	名詞,接尾,一般,*,*,*,賞,ショウ,ショー
     に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
     なり	動詞,自立,*,*,五段・ラ行,連用形,なる,ナリ,ナリ
     たい	助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
     の	助詞,連体化,*,*,*,*,の,ノ,ノ
     卓球	名詞,サ変接続,*,*,*,*,卓球,タッキュウ,タッキュー
     で	助詞,格助詞,一般,*,*,*,で,デ,デ
     俺	名詞,代名詞,一般,*,*,*,俺,オレ,オレ
     は	助詞,係助詞,*,*,*,*,は,ハ,ワ
     、	記号,読点,*,*,*,*,、,、,、
     そん	名詞,一般,*,*,*,*,そん,ソン,ソン
     だけ	助詞,副助詞,*,*,*,*,だけ,ダケ,ダケ
     !	記号,一般,*,*,*,*,!,!,!
     EOS  
    
  5. Parse the given text into an enumeration of nodes. When you pass a block to parse, it yields one MeCab node per morpheme, and each node carries much more detailed information than the string output:

     nm.parse('飛べねえ鳥もいるってこった。') do |n|
       puts "#{n.surface}\t#{n.wcost}" if n.is_nor?
     end
    
     飛べ    7175
     ねえ    6661
     鳥      4905
     も      4669
     いる    9109
     って    6984
     こっ    9587
     た      5500
     。      215  
    
  6. Combine node-parsing with a custom node-format for more interesting processing:

     # -F    ... short-form of --node-format
     # %m    ... morpheme
     # %h    ... part-of-speech ID (IPADIC)
     # %f[0] ... part-of-speech (first ChaSen feature element)
     nm = Natto::MeCab.new('-F%m\t%h\t%f[0]')
    
     # only output feature attribute of normal nodes, 
     # ignoring end-of-sentence or unknown nodes
     nm.parse('あんたはオイラに飛び方を教えてくれた。') do |n|
       puts n.feature if n.is_nor?
     end
    
     あんた  59      名詞
     は      16      助詞
     オイラ  59      名詞
     に      13      助詞
     飛び    31      動詞
     方      57      名詞
     を      13      助詞
     教え    31      動詞
     て      18      助詞
     くれ    33      動詞
     た      25      助動詞
     。      7       記号
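
The default string output shown in step 4 puts one morpheme on each line: the surface form, a tab, and then a comma-separated feature list, with a final EOS line. As a minimal sketch, the plain-Ruby code below splits such output into hashes. The helper `parse_mecab_output` and the field names in `IPADIC_FIELDS` are hypothetical names chosen for illustration; the field order assumes the IPADIC layout seen in the sample output above, and the simple comma split assumes no feature value itself contains an ASCII comma.

```ruby
# Illustrative field names for the nine IPADIC features (an assumption,
# based on the sample output in step 4; other dictionaries differ).
IPADIC_FIELDS = %i[pos pos1 pos2 pos3 conj_type conj_form
                   base reading pronunciation].freeze

# Hypothetical helper (not part of natto): turns MeCab's default
# string output into an array of hashes, one per morpheme.
def parse_mecab_output(text)
  lines = text.each_line.map(&:chomp)
  lines.reject { |l| l.empty? || l == 'EOS' }.map do |line|
    # Surface and feature list are separated by the first tab.
    surface, feature = line.split("\t", 2)
    values = feature.to_s.split(',')
    # Missing trailing features (e.g. for unknown words) become nil.
    { surface: surface }.merge(IPADIC_FIELDS.zip(values).to_h)
  end
end

sample = "星\t名詞,一般,*,*,*,*,星,ホシ,ホシ\n" \
         "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n" \
         "EOS\n"

parse_mecab_output(sample).each do |m|
  puts "#{m[:surface]} #{m[:pos]} #{m[:reading]}"
end
# prints:
# 星 名詞 ホシ
# の 助詞 ノ
```

For anything beyond quick inspection, the block form of parse from step 5 is likely the better choice, since each node already exposes the surface and feature without string splitting.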
    

