feat: decrease memory footprint and binary size (#11)
* feat: internal map now uses pointers to decrease memory footprint.

dicts/data.go split into multiple standalone, per-language packages to decrease import size.

main.go renamed to golem.go

Italian added.

Contributors Hall of Fame added

* fix: solved TODO in dicts/cmd and completed README

* fix: regenerated files with the "Code generated" header comment
aaaton authored May 6, 2019
1 parent afcd5a4 commit 64c3afa
Showing 19 changed files with 1,681 additions and 366 deletions.
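The first bullet of the commit message says the internal map now stores pointers. The sketch below illustrates why that can save memory, under stated assumptions: the dictionary is a `map` from inflected form to lemma, and the data file holds one `lemma<TAB>form` pair per line (golem's actual internals are not shown in this diff). Every form of the same lemma points at one shared `*string`, so on 64-bit platforms each map value is an 8-byte pointer rather than a 16-byte string header. `buildIndex` is a hypothetical name used only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// buildIndex is a hypothetical helper: it maps each inflected form to a
// pointer to its lemma. Forms sharing a lemma share one *string, so the
// lemma's string header is stored once rather than once per form.
// Assumes one "lemma<TAB>form" pair per line.
func buildIndex(data string) map[string]*string {
	interned := make(map[string]*string) // lemma -> shared pointer
	index := make(map[string]*string)    // form -> lemma pointer
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		parts := strings.Split(sc.Text(), "\t")
		if len(parts) != 2 {
			continue // skip malformed lines
		}
		lemma, form := parts[0], parts[1]
		p, ok := interned[lemma]
		if !ok {
			// lemma is a fresh variable on every iteration, so its address is safe to keep.
			p = &lemma
			interned[lemma] = p
		}
		index[form] = p
	}
	return index
}

func main() {
	idx := buildIndex("run\trunning\nrun\tran\ngo\twent\n")
	fmt.Println(*idx["ran"], idx["ran"] == idx["running"]) // run true
}
```

Whether this pays off depends on how many forms share each lemma: the saving grows with the number of forms per lemma, while each distinct lemma costs one extra allocation.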
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ data
*.out
*.test
pprof
.DS_Store
20 changes: 11 additions & 9 deletions Makefile
@@ -1,16 +1,16 @@
SHELL:=/bin/bash
default: download-all
default: all
LANG=en
download-all:
all:
	go get -u github.com/jteeuwen/go-bindata/...
	mkdir -p data
	$(MAKE) en
	$(MAKE) sv
	$(MAKE) fr
	$(MAKE) es
	$(MAKE) de
	rm data/*.zip
	go get -u github.com/jteeuwen/go-bindata/...
	go-bindata -o data.go -nocompress data/
	$(MAKE) it


en: LANG=en
en: download
@@ -27,11 +27,13 @@ es: download
de: LANG=de
de: download

it: LANG=it
it: download

download:
	curl http://www.lexiconista.com/Datasets/lemmatization-$(LANG).zip > data/$(LANG).zip
	unzip data/$(LANG).zip -d data
	mv data/lemmatization-$(LANG).txt data/$(LANG)
	gzip data/$(LANG)
	curl https://raw.githubusercontent.com/michmech/lemmatization-lists/master/lemmatization-$(LANG).txt > data/$(LANG)
	go-bindata -o dicts/$(LANG)/$(LANG).go -pkg $(LANG) data/$(LANG)
	go run dicts/cmd/generate_pack.go -locale $(LANG) > dicts/$(LANG)/pack.go

benchcmp:
# ensure no govenor weirdness
19 changes: 13 additions & 6 deletions README.md
@@ -1,5 +1,5 @@
# GoLem
This project is a dictionary based lemmatizer written in pure go, without external dependencies.
This project is a dictionary based lemmatizer written in pure go, without external dependencies.

### What?
A [lemmatizer](https://en.wikipedia.org/wiki/Lemmatisation) is a tool that finds the base form of words.
@@ -10,22 +10,24 @@ A [lemmatizer](https://en.wikipedia.org/wiki/Lemmatisation) is a tool that finds
| Swedish | sprungit | springa |
| French | abattaient | abattre |

It's based on the dictionaries found on [lexiconista.com](http://www.lexiconista.com/datasets/lemmatization/), which are available under the [Open Database License](https://opendatacommons.org/licenses/odbl/summary/). This project would not be feasible without them.
It's based on the dictionaries found on [michmech/lemmatization-lists](https://github.com/michmech/lemmatization-lists), which are available under the [Open Database License](https://opendatacommons.org/licenses/odbl/summary/). This project would not be feasible without them.

### Languages
At the moment I have added English, Swedish, French, Spanish & German, but adding another language should be no more trouble than getting the dictionary for that language. Some of which are already available on lexiconista. Please let me know if there is something you would like to see in here, or fork the project and create a pull request.
At the moment golem supports English, Swedish, French, Spanish, Italian & German, but adding another language should be no more trouble than getting the dictionary for that language. Some of which are already available on lexiconista. Please let me know if there is something you would like to see in here, or fork the project and create a pull request.

### Basic usage
```golang
package main

import (
	"github.com/aaaton/golem"
	"github.com/aaaton/golem/dicts/en"
)

func main() {
	// "en" and "english" will give an english lemmatizer
	lemmatizer, err := golem.New("english")
	// the language packages are available under golem/dicts
	// "en" is for english
	lemmatizer, err := golem.New(en.NewPackage())
	if err != nil {
		panic(err)
	}
@@ -34,5 +36,10 @@ func main() {
		panic("The output is not what is expected!")
	}
}
```

```
### Contributors

- axamon
- charlesgiroux
- glaslos
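The README change replaces the string argument in `golem.New("english")` with a language package value, `golem.New(en.NewPackage())`. Every generated `pack.go` exposes the same two methods, `GetResource() ([]byte, error)` and `GetLocale() string`, so a constructor only needs that interface. The self-contained sketch below shows that shape; `fakePack`, `newLemmatizer` and `Lemma` are hypothetical, the data is assumed to be one `lemma<TAB>form` pair per line, and golem's real constructor and lookup may differ.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// LanguagePack mirrors the two methods every generated pack.go defines.
type LanguagePack interface {
	GetResource() ([]byte, error)
	GetLocale() string
}

// fakePack is a hypothetical in-memory pack standing in for dicts/en.
// Its data assumes one "lemma<TAB>form" pair per line.
type fakePack struct{}

func (fakePack) GetResource() ([]byte, error) { return []byte("go\twent\ngo\tgoes\n"), nil }
func (fakePack) GetLocale() string { return "en" }

// Compile-time check that fakePack satisfies the interface.
var _ LanguagePack = fakePack{}

// lemmatizer is a hypothetical minimal consumer, not golem's implementation.
type lemmatizer struct {
	locale string
	forms  map[string]string // lower-cased form -> lemma
}

// newLemmatizer builds a lookup table from whatever pack the caller passes in.
func newLemmatizer(pack LanguagePack) (*lemmatizer, error) {
	data, err := pack.GetResource()
	if err != nil {
		return nil, err
	}
	l := &lemmatizer{locale: pack.GetLocale(), forms: make(map[string]string)}
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		parts := strings.Split(sc.Text(), "\t")
		if len(parts) == 2 {
			l.forms[strings.ToLower(parts[1])] = parts[0]
		}
	}
	return l, sc.Err()
}

// Lemma returns the base form of word, or word unchanged if it is unknown.
func (l *lemmatizer) Lemma(word string) string {
	if lemma, ok := l.forms[strings.ToLower(word)]; ok {
		return lemma
	}
	return word
}

func main() {
	l, err := newLemmatizer(fakePack{})
	if err != nil {
		panic(err)
	}
	fmt.Println(l.locale, l.Lemma("Went"), l.Lemma("unknown")) // en go unknown
}
```

Because the dictionary now reaches the constructor through whichever `dicts/<locale>` package the caller imports, the data for the other languages should no longer be linked into the binary. This is the import-size reduction the commit message describes.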
47 changes: 47 additions & 0 deletions dicts/cmd/generate_pack.go
@@ -0,0 +1,47 @@
package main

import (
	"flag"
	"os"
	"text/template"
)

type data struct {
	Locale string
}

// This is a code generator for language pack constructors. Check dicts/en/pack.go for an example
func main() {
	var d data
	flag.StringVar(&d.Locale, "locale", "", "The locale abbreviation this language pack is generated for")
	flag.Parse()

	t := template.Must(template.New("pack").Parse(packTemplate))
	t.Execute(os.Stdout, d)
}

var packTemplate = ` // Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package {{.Locale}}
const locale = "{{.Locale}}"
// LanguagePack is an implementation of the generic golem.LanguagePack interface for {{.Locale}}
type LanguagePack struct {
}
// NewPackage creates a language pack
func NewPackage() *LanguagePack {
return &LanguagePack{}
}
// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
return Asset("data/" + locale)
}
// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
return locale
}
`
291 changes: 0 additions & 291 deletions dicts/data.go

This file was deleted.

237 changes: 237 additions & 0 deletions dicts/de/de.go

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions dicts/de/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package de

const locale = "de"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for de
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

237 changes: 237 additions & 0 deletions dicts/en/en.go

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions dicts/en/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package en

const locale = "en"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for en
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

237 changes: 237 additions & 0 deletions dicts/es/es.go

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions dicts/es/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package es

const locale = "es"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for es
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

237 changes: 237 additions & 0 deletions dicts/fr/fr.go

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions dicts/fr/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package fr

const locale = "fr"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for fr
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

237 changes: 237 additions & 0 deletions dicts/it/it.go

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions dicts/it/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package it

const locale = "it"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for it
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

24 changes: 24 additions & 0 deletions dicts/sv/pack.go
@@ -0,0 +1,24 @@
// Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
package sv

const locale = "sv"

// LanguagePack is an implementation of the generic golem.LanguagePack interface for sv
type LanguagePack struct {
}

// NewPackage creates a language pack
func NewPackage() *LanguagePack {
	return &LanguagePack{}
}

// GetResource returns the dictionary of lemmatized words
func (l *LanguagePack) GetResource() ([]byte, error) {
	return Asset("data/" + locale)
}

// GetLocale returns the language name
func (l *LanguagePack) GetLocale() string {
	return locale
}

237 changes: 237 additions & 0 deletions dicts/sv/sv.go

Large diffs are not rendered by default.
