第 09.02.01 章、regexp 包

regexp 用于正则表达式的匹配

所有用于匹配的字符均为 UTF-8 编码的码点
具体匹配细则请查看 regexp/syntax


Match(pattern string, b []byte) (matched bool, err error)
最基础的 字节流 正则匹配，返回是否匹配成功
pattern 为以字符串表示的正则匹配表达式
b 为待匹配字节流


MatchReader(pattern string, r io.RuneReader) (mathced bool, err error)
最基础的 Rune 正则匹配，返回是否匹配成功
pattern 为以字符串表示的正则匹配表达式
r 为满足了 io.RuneReader 的变量


MatchString(pattern string, s string) (matched bool, err error)
最基础的 字符串 正则匹配，返回是否匹配成功
pattern 为以字符串表示的正则匹配表达式
s 为待匹配字符串


QuoteMeta(s string) string
将输入的字符串作为 raw string
全部转义为不用反引号即可表示的字符串
EG.
regexp.QuoteMeta(`^(\n)`)
返回 "\^\(\\n\)"


Regexp struct 类型
Regexp 类型就是编译过的（compiled）正则表达式


Compile(expr string) (*Regexp, error)
输入一个 string，生成一个正则表达式类型变量
当匹配字符时，遵循 leftmost-first 匹配，也就是
从左至右，匹配第一个能匹配上的开头，接着匹配上第一个能够匹配结尾的部分


CompilePOSIX(expr string) (*Regexp, error)
输入一个 string，生成一个正则表达式类型变量
当匹配字符时，遵循 leftmost-longest 匹配，也就是
从左至右，匹配第一个能匹配上的开头，接着匹配上最后一个能够匹配结尾的部分


MustCompile(str string) *Regexp
MustCompilePOSIX(str string) *Regexp
正则表达式类型必须编译成功，否则会立刻 panic


(re *Regexp) Copy() *Regexp
完全拷贝一个 Regexp
当使用多个 goroutine 的时候，可以防止锁竞争


(re *Regexp) Find(b []byte) []byte
寻找 b 中，以 leftmost 模式能匹配上的第一段字节流
若返回 nil，则表示未能匹配上


(re *Regexp) FindIndex(b []byte) (loc []int)
用 两个元素的 []int 变量来确定 leftmost 匹配上的 字节 的位置
匹配上的内容应为 b[loc[0]:loc[1]]


(re *Regexp) FindString(s string) string
寻找 s 中，以 leftmost 模式能匹配上的第一段字符串
若返回空字符串，表示未匹配成功，或者成功匹配上了空字符串


(re *Regexp) FindSubmatch(b []byte) [][]byte
次级匹配，一系列 []byte 类型的数据
其中第一个 []byte 是全匹配的数据
其后的每一个都是正则表达式中，用小括号括起来的部分的数据

见下方案例


(re *Regexp) FindSubmatchIndex(b []byte) []int
次级匹配Index，返回一个 []int 列表
其中数据两两配对，表示匹配上的 字节 所在的位置

EG.
regexp.Compile("p(at)").FindSubmatchIndex([]byte("bat pat"))
// 返回 [4 7 5 7]
// 也就是全匹配的 pat 以及次级匹配的 pat 的 at


(re *Regexp) FindReaderIndex(r io.RuneReader) (loc []int)
(re *Regexp) FindReaderSubmatchIndex(r io.RuneReader) []int
匹配 Rune 的 Find


(re *Regexp) FindStringIndex(s string) (loc []int)
(re *Regexp) FindStringSubmatch(s string) []string
(re *Regexp) FindStringSubmatchIndex(s string) []int
匹配 string 的 Find


(re *Regexp) FindAll(b []byte, n int) [][]byte
(re *Regexp) FindAllIndex(b []byte, n int) [][]int

(re *Regexp) FindAllString(s string, n int) []string
(re *Regexp) FindAllStringIndex(s string, n int) [][]int
(re *Regexp) FindAllStringSubmatch(s string, n int) [][]string
(re *Regexp) FindAllStringSubmatchIndex(s string, n int) [][]int

(re *Regexp) FindAllSubmatch(b []byte, n int) [][][]byte
(re *Regexp) FindAllSubmatchIndex(b []byte, n int) [][]int
“All” 版本的 Find
若 n<0，则返回输入字段中，所有可以被匹配上的全部都会返回
若 n>=0，则最多只返回制定个数的匹配项


(re *REgexp) NumSubexp() int
返回这个正则表达式变量中，含有多少次级匹配表达式


(re *Regexp) SubexpNames() []string
返回这个正则表达式变量中，还有多少命名过的次级表达式


(re *Regexp) LiteralPrefix() (prefix string, complete bool)
返回该正则表达式中，从头算起，匹配上的固定不变的字符串
若该字符串就是整个正则表达式，则 complete 为 true

EG.
regexp.Compile("Hello").LiteralPrefix()
// 返回 "Hello" true
regexp.Compile("Hello \\w+").LiteralPrefix()
// 返回 "Hello " false


(re *Regexp) Longest()
在执行了这条语句之后，其后的执行的匹配的模式变为 leftmost-longest


(re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte
(re *Regexp) ExpandString(dst []byte, template string, src string, match []int) []byte
match 是通过 (*Regexp).FindSubmatchIndex 解析 原文本 src 获得的 次级匹配Index
接着把从 src 中次级匹配出来的内容替换至 template 中
最后将 替换好的 template 追加至 dst 的末尾

在 template 中，待替换的部分使用 $1 $2 （次级匹配顺序） 以及 ${<name1>} ${<name2>} （次级匹配名称）表示
待替换变量采取最长识别模式， eg. $1x 将被认为是 ${1x} 而非 ${1}x
若要输出美元符号 $，请使用 $$

EG.
reg := regexp.MustCompile(`(\w+),(\w+)`)
src := []byte("Golang,World!")           // 源文本
dst := []byte("Say: ")                   // 目标文本
template := []byte("Hello $1, Hello $2") // 模板
match := reg.FindSubmatchIndex(src)      // 解析源文本
// 填写模板，并将模板追加到目标文本中
fmt.Printf("%q", reg.Expand(dst, template, src, match))
// "Say: Hello Golang, Hello World"


(re *Regexp) Match(b []byte) bool
(re *Regexp) MatchReader(r io.RuneReader) bool
(re *Regexp) MatchString(s string) bool
返回 字节流/Rune/字符串 是否完全匹配上了正则表达式


(re *Regexp) ReplaceAll(src, repl []byte) []byte
完全用 repl 中的内容替换 src 中匹配上 re 的内容
repl 的书写方法见 (*Regexp).Expand 的 template

EG.
re, _ := regexp.Compile("(\\w)at[^\\w]")
src := []byte("cat pat nat gate")
repl := []byte("${1}Wow! ")
res := re.ReplaceAll(src, repl)
fmt.Printf("%v\n", string(res))


(re *Regexp) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte
自定义替换函数


(re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte
(re *Regexp) ReplaceAllLiteralString(src, repl string) string
repl 将使用字面值替换，也就是说，完全忽视 (* Regexp).Expand 的中的 $1 ${name} 规则
直接用字符替换匹配上的部分


(re *Regexp) ReplaceAllString(src, repl string) string
(re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string
字符串版本的 Replace


(re *Regexp) Split(s string, n int) []string
从 原字符串 中移除匹配上的字符，并依据移除字符产生的间隔将剩余字符分组

n>0 时，最多分成 n 组，若没有分解完成，则最后一组为剩余所有的字符串
n=0 时，返回 nil
n<0 时，完整分组

eg.
src := "cat sat mat"
re, _ := regexp.Compile("at")
fmt.Printf("%#v\n", re.Split(src, -1))
// 返回 []string{"c"," s"," m",""}