/**
* # `regex.h`
*
* This is a regular expression evaluator based on code from the article
* "Regular Expressions: Languages, algorithms, and software"
* by Brian W. Kernighan and Rob Pike in the April 1999 issue of
* Dr. Dobb's Journal.
*
* I've found the article online at
* <http://www.ddj.com/dept/architect/184410904>
*
* ## Usage
*
* * `rx_match()` is used to match text to a pattern.
* * `rx_search()` is used to find the position of text matching a pattern
* for further processing.
* * `rx_sub()` is used to substitute a piece of text matching a regex with
* some other text.
* * `rx_gsub()` is used to substitute all pieces of the text matching a regex
* with some other text.
*
* ## Syntax
* The evaluator implements only a subset of the usual regular expression
* language.
*
* * The `'*'`, `'+'` and `'?'` operators
* * Character sets `'[abc]'` with ranges (`'[a-c]'`) and inversion (`'[!abc]'`)
* To match a `'-'` in the character set, place it before the closing `']'`,
* like `'[abcde-]'`
* * Special characters can be escaped with a `'\'`
* * Additionally, a `'\'` introduces one of these character classes:
* * `\a` - Alphabetic characters: `[a-zA-Z]`
* * `\w` - Word characters: `[a-zA-Z0-9]`
* * `\d` - Digits: `[0-9]`
* * `\u` - Uppercase characters: `[A-Z]`
* * `\l` - Lowercase characters: `[a-z]`
* * `\x` - Hexadecimal digits: `[0-9a-fA-F]`
* * `\s` - Whitespace characters
* * Upper case versions of these are used to negate the class. For example,
* `\A` matches anything that is not an alphabetic character: `[!a-zA-Z]`
* * Case-insensitive matching is enabled with `'\i'` and disabled with `'\I'`
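*
* As a rough guide to what each class letter accepts, the classes above can be
* written with the `<ctype.h>` predicates. This is only a sketch: `in_class()`
* is a hypothetical helper, not part of this module, and it assumes the default
* "C" locale and that `\s` means whatever `isspace()` accepts.
*
```c
#include <ctype.h>

// Hypothetical helper, not part of regex.h: returns non-zero if `ch` belongs
// to the class named by the letter that follows the backslash ('d' for \d).
static int in_class(char cls, int ch)
{
    unsigned char c = (unsigned char)ch;
    int r;

    switch (tolower((unsigned char)cls)) {
    case 'a': r = isalpha(c) != 0; break;   // [a-zA-Z]
    case 'w': r = isalnum(c) != 0; break;   // [a-zA-Z0-9]
    case 'd': r = isdigit(c) != 0; break;   // [0-9]
    case 'u': r = isupper(c) != 0; break;   // [A-Z]
    case 'l': r = islower(c) != 0; break;   // [a-z]
    case 'x': r = isxdigit(c) != 0; break;  // [0-9a-fA-F]
    case 's': r = isspace(c) != 0; break;   // whitespace (assumed: isspace)
    default:  return 0;                     // not a class letter
    }
    // An upper-case class letter negates the class.
    return isupper((unsigned char)cls) ? !r : r;
}
```
*
* For instance, `in_class('d', '7')` is non-zero, while `in_class('D', '7')`
* and `in_class('w', '_')` are both zero.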
*
* The tradeoff is a very compact implementation.
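*
* To give a sense of how compact such a matcher can be, the sketch below
* reproduces the core of the matcher from the Kernighan and Pike article. It is
* not this module's code: it handles only literal characters, `.`, `^`, `$` and
* `*`, and the `kp_` names are made up for this illustration.
*
```c
// Sketch of the article's matcher; not part of regex.h.
static int kp_here(const char *re, const char *text);

// Matches c* followed by the rest of the pattern at the start of text.
static int kp_star(int c, const char *re, const char *text)
{
    do { // a '*' matches zero or more instances
        if (kp_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

// Matches the pattern at the start of text.
static int kp_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return kp_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return kp_here(re + 1, text + 1);
    return 0;
}

// Searches for the pattern anywhere in text.
static int kp_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return kp_here(re + 1, text);
    do { // must look even if the string is empty
        if (kp_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```
*
* With these definitions, `kp_match("fo*", "#foooo#")` returns 1 and
* `kp_match("^fo", "#foo")` returns 0. Note that the sketch keeps the article's
* argument order (pattern first), which is the reverse of `rx_match()`.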
*
* To use this module in your program, include "regex.h" at the top
* and link against the compiled object file.
*
* Features not supported, but found in other regex engines:
*
* * Alternation `|`
* * Grouping and submatch extraction `(abc)`
* * The `{m,n}` operator
*
* ### License
*
* Author: Werner Stoop
* This is free and unencumbered software released into the public domain.
* See http://unlicense.org/ for more details.
*
* ## API
* ### Functions
*/
#ifndef REGEX_H
#define REGEX_H
#if defined(__cplusplus) || defined(c_plusplus)
extern "C"
{
#endif
/**
* #### `int rx_match (const char *text, const char *re)`
* Checks whether the text `text` contains the regular expression `re`.
*
* It returns non-zero if the regular expression `re` was found in `text`,
* zero otherwise.
*/
int rx_match (const char *text, const char *re);
/**
* #### `int rx_search (const char *text, const char *re, const char **beg, const char **end)`
* Checks whether the text `text` contains the regular expression `re` and
* extracts the match.
*
* It uses greedy matching internally to locate the leftmost longest match.
*
* If `re` is found in `text`, the pointer that `beg` points to is set to the
* address of the first character in `text` that matched `re`, and the pointer
* that `end` points to is set to the address of the last character in `text`
* that matched `re`.
*
* It returns non-zero if the regular expression `re` was found in `text`,
* zero otherwise.
*/
int rx_search (const char *text, const char *re, const char **beg,
const char **end);
/**
* #### `char *rx_sub (const char *text, const char *re, const char *sub)`
* Substitutes the first part of `text` that matches `re` with `sub`.
*
* Using a `'&'` in `sub` will replace that part of `sub` with the part of
* `text` that matched `re`.
*
* For example, `rx_sub("#foooo#", "fo+", "|&|")` will return `"#|foooo|#"`.
*
* Use a `'/'` to escape the `'&'` (e.g. `'/&'`) and use a `'//'` to have a single
* `'/'` (`'/'` was chosen to avoid C's use of the `'\'` causing confusion).
*
* For example, `rx_sub("#foooo#", "fo+", "// /&")` will return `"#/ &#"`.
*
* It returns a newly allocated string that should be `free()`'d afterwards.
* It may return `NULL` on a `malloc()` failure.
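*
* The template rules above can be sketched as a small helper that expands
* `sub` for a single match. `expand_sub()` is hypothetical and not part of
* this module; it assumes that a `'/'` makes the next character literal,
* whatever that character is, and that a trailing lone `'/'` is kept as is.
*
```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper, not part of regex.h: expands the template `sub` for a
// match of `len` characters starting at `m`. Returns a malloc()'d string, or
// NULL if malloc() fails.
static char *expand_sub(const char *sub, const char *m, size_t len)
{
    size_t n = 0;
    const char *s;
    char *out, *p;

    // First pass: measure the expanded length.
    for (s = sub; *s; s++) {
        if (*s == '/' && s[1] != '\0') {
            s++;                    // skip the escape, count the next char
            n++;
        } else if (*s == '&') {
            n += len;
        } else {
            n++;
        }
    }
    out = malloc(n + 1);
    if (!out)
        return NULL;

    // Second pass: build the result.
    p = out;
    for (s = sub; *s; s++) {
        if (*s == '/' && s[1] != '\0') {
            *p++ = *++s;            // escaped character, copied literally
        } else if (*s == '&') {
            memcpy(p, m, len);      // the matched text
            p += len;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```
*
* With `m` pointing at the `foooo` in `"#foooo#"` and `len` equal to 5, the
* template `"|&|"` expands to `"|foooo|"` and `"// /&"` expands to `"/ &"`.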
*/
char *rx_sub (const char *text, const char *re, const char *sub);
/**
* #### `char *rx_gsub (const char *text, const char *re, const char *sub)`
* Substitutes every part of `text` that matches `re` with `sub`.
*
* Using a `'&'` in `sub` will replace that part of `sub` with the part of
* `text` that matched `re`.
*
* For example, `rx_gsub("#foooo#", "fo+", "|&|")` will return `"#|foooo|#"`.
*
* Use a `'/'` to escape the `'&'` (e.g. `'/&'`) and use a `'//'` to have a single
* `'/'` (`'/'` was chosen to avoid C's use of the `'\'` causing confusion).
*
* For example, `rx_gsub("#foooo#", "fo+", "// /&")` will return `"#/ &#"`.
*
* It returns a newly allocated string that should be `free()`'d afterwards.
* It may return `NULL` on a `malloc()` failure.
*/
char *rx_gsub (const char *text, const char *re, const char *sub);
#if defined(__cplusplus) || defined(c_plusplus)
} /* extern "C" */
#endif
#endif /* REGEX_H */