lua-utf8

This "library" is meant to be a very thin helper that you can easily drop into another project without really calling it a dependency. It aims to provide only the most minimal functions for working with utf8 strings. It does not aim to be feature-complete or even to give descriptive errors. It covers what is practical, not what is complex. You have been warned. =^__^=

The Only Functions You Should Know

utf8.iter(s)

s: (string) the utf8 string to iterate over (by characters)

-- i is the character ("visual") index within the string
-- c is the full utf8 character (string)
-- b is the byte index of that character within the string
for i, c, b in utf8.iter('Αγαπώ τηγανίτες') do
	print(i, c, b)
end

Output:

utf8.map(s, f)

s: (string) the utf8 string to map over
f: (function) a function optionally accepting: f(visual_index, utf8_char, byte_index)

returns: (nothing)

> utf8.map('Αγαπώ τηγανίτες', print) -- does the same as the above example
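
For a rough idea of what these two functions do, here is a minimal pure-Lua sketch. It is not the code in utf8.lua: `iter_chars` and `map_chars` are hypothetical names, and the sketch assumes s is valid utf8 (stray continuation bytes are simply skipped).

```lua
-- Hypothetical stand-in for utf8.iter: yields (char_index, char, byte_index).
local function iter_chars(s)
	local char_index = 0
	-- One character = a non-continuation byte (anything outside 0x80-0xBF)
	-- followed by its continuation bytes. The empty capture () gives the
	-- byte position where the match starts.
	local next_match = s:gmatch('()([^\128-\191][\128-\191]*)')
	return function()
		local byte_index, char = next_match()
		if byte_index then
			char_index = char_index + 1
			return char_index, char, byte_index
		end
	end
end

-- Hypothetical stand-in for utf8.map: calls f once per character.
local function map_chars(s, f)
	for i, c, b in iter_chars(s) do
		f(i, c, b)
	end
end

-- Prints one line per character: visual index, character, byte index.
map_chars('a♥b', print)
```

Because the byte index comes from the pattern match itself, multi-byte characters advance it by more than one while the visual index always advances by exactly one.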

Others

utf8.clen(s, i)

s: (string) the utf8 string
i: (number) the byte index of a utf8 character within s (defaults to 1)

returns: (number) the length of the utf8 character at i

note: call this on the first byte of the utf8 character; continuation bytes and invalid utf8 bytes will also return 1

> = utf8.clen('i ♥ cats', 3)
3
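
The length of a utf8 character can be read off its first byte, which is why the index has to point at that byte. A minimal sketch of the idea follows; `char_len` is a hypothetical name, not the library's code, and returning nil for an out-of-range index is an assumption made here.

```lua
-- Hypothetical stand-in for utf8.clen: length in bytes of the character
-- whose first byte sits at byte index i.
local function char_len(s, i)
	local byte = s:byte(i or 1)
	if not byte then return nil end  -- index past the end (assumed behaviour)
	if byte >= 0xC0 and byte <= 0xDF then return 2 end  -- 110xxxxx
	if byte >= 0xE0 and byte <= 0xEF then return 3 end  -- 1110xxxx
	if byte >= 0xF0 and byte <= 0xF7 then return 4 end  -- 11110xxx
	return 1  -- ASCII, continuation bytes and invalid lead bytes
end

print(char_len('i ♥ cats', 3))  --> 3
print(char_len('i ♥ cats', 4))  --> 1 (a continuation byte)
```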

utf8.at(s, i)

s: (string) the utf8 string
i: (number) the utf8 character index (not the byte index)

returns: (string, number) the utf8 character at that "visual index", plus its byte index within s

> = utf8.at('Αγαπώ τηγανίτες', 4)
π	7
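
Finding a character by visual index means walking the string from the start, since characters vary in byte length. A small sketch under that assumption; `char_at` is a hypothetical name and returning nil when the index is out of range is a choice made here, not documented behaviour.

```lua
-- Hypothetical stand-in for utf8.at: returns the n-th character and the
-- byte index where it starts, or nil if s has fewer than n characters.
local function char_at(s, n)
	local count = 0
	for byte_index, char in s:gmatch('()([^\128-\191][\128-\191]*)') do
		count = count + 1
		if count == n then
			return char, byte_index
		end
	end
	return nil
end

print(char_at('Αγαπ', 4))  -- prints the character π and byte index 7
```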

utf8.len(s)

s: (string) the utf8 string

returns: (number) the number of utf8 characters in s (not the byte length)

note: be aware of "invisible" utf8 characters

> = utf8.len('Αγαπώ τηγανίτες')
15
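
One way to count characters is to count every byte that is not a continuation byte. The sketch below does that; `char_count` is a hypothetical name and this is not necessarily how utf8.lua does it. It also shows the "invisible" character caveat: a combining accent is its own character.

```lua
-- Hypothetical stand-in for utf8.len: counts bytes outside 0x80-0xBF,
-- i.e. the first byte of every character.
local function char_count(s)
	-- gsub's second return value is the number of matches.
	local _, count = s:gsub('[^\128-\191]', '')
	return count
end

print(char_count('i ♥ cats'))   --> 8
-- 'e' followed by U+0301 (combining acute accent, bytes 0xCC 0x81)
-- displays as one glyph but counts as two characters.
print(char_count('e\204\129'))  --> 2
```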

utf8.reverse(s)

s: (string) the utf8 string

returns: (string) the utf8-reversed form of s

note: reversing left-to-right utf8 strings that include directional formatting characters will look odd

> = utf8.reverse('Αγαπώ τηγανίτες')
ςετίναγητ ώπαγΑ
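
Plain string.reverse would scramble the bytes inside each multi-byte character, so the reversal has to work on whole characters. A minimal sketch of that idea, with the hypothetical name `reverse_chars` and assuming valid utf8 input:

```lua
-- Hypothetical stand-in for utf8.reverse: reverses the order of the
-- characters while keeping the bytes of each character in order.
local function reverse_chars(s)
	local chars = {}
	for char in s:gmatch('[^\128-\191][\128-\191]*') do
		chars[#chars + 1] = char
	end
	local reversed = {}
	for i = #chars, 1, -1 do
		reversed[#reversed + 1] = chars[i]
	end
	return table.concat(reversed)
end

print(reverse_chars('i ♥ cats'))  --> stac ♥ i
```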

utf8.strip(s)

s: (string) the utf8 string

returns: (string) s with all multi-byte utf8 characters removed (characters longer than 1 byte), leaving only the ASCII characters

> = utf8.strip('cat♥dog')
catdog
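
Every byte of a multi-byte utf8 character is 0x80 or above, so stripping those characters amounts to dropping all such bytes. A one-line sketch, using the hypothetical name `strip_multibyte` (not the library's implementation):

```lua
-- Hypothetical stand-in for utf8.strip: removes every byte >= 0x80,
-- which leaves only the single-byte (ASCII) characters.
local function strip_multibyte(s)
	-- The parentheses discard gsub's second return value (the match count).
	return (s:gsub('[\128-\255]', ''))
end

print(strip_multibyte('cat♥dog'))  --> catdog
```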

utf8.replace(s, map)

s: (string) the utf8 string
map: (table) keys are utf8 characters to replace, values are their replacement

returns: (string) s with all the key-characters in map replaced

note: the keys must be single utf8 characters; the values can be strings of any length

> = utf8.replace('∃y ∀x ¬(x ≺ y)', { ['∃'] = 'E', ['∀'] = 'A', ['¬'] = '\r\n', ['≺'] = '<' })
Ey Ax 
(x < y)
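
Lua's string.gsub accepts a table as the replacement, which maps neatly onto this function: match one character at a time and look it up. The sketch below relies on that; `replace_chars` is a hypothetical name and this may differ from what utf8.lua does internally.

```lua
-- Hypothetical stand-in for utf8.replace: each character is looked up in
-- map; characters without an entry are left unchanged.
local function replace_chars(s, map)
	-- The parentheses discard gsub's second return value (the match count).
	return (s:gsub('[^\128-\191][\128-\191]*', map))
end

print(replace_chars('i ♥ cats', { ['♥'] = '<3' }))  --> i <3 cats
```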

utf8.sub(s, i, j)

s: (string) the utf8 string
i: (string) the starting utf8 substring to look for
j: (string) the ending utf8 substring to look for

returns: (string) the substring formed from i to j, inclusive

> = utf8.sub('Αγαπώ τηγανίτες', 'α', 'αν')
απώ τηγαν
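
Since i and j are substrings rather than numeric indices, this behaves like two plain-text searches followed by a byte-level cut. A sketch under stated assumptions: `sub_between` is a hypothetical name, the search for j is assumed to start after the match for i, and nil is returned when either substring is missing. None of these details are confirmed by the library.

```lua
-- Hypothetical stand-in for utf8.sub: returns the part of s from the first
-- occurrence of first through the next occurrence of last, inclusive.
local function sub_between(s, first, last)
	-- The fourth argument (true) turns off pattern matching, so the
	-- substrings are searched for literally.
	local start_index = s:find(first, 1, true)
	if not start_index then return nil end
	local _, end_index = s:find(last, start_index + #first, true)
	if not end_index then return nil end
	return s:sub(start_index, end_index)
end

print(sub_between('cat♥dog♥bird', '♥', 'og'))  --> ♥dog
```

Searching on raw bytes is safe for valid utf8 because a complete character can never match in the middle of another character.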
