gosax is a Go library for read-only XML SAX (Simple API for XML) parsing. It is designed for efficient,
memory-conscious parsing and draws on techniques from several other projects to keep overhead low.
- Read-only SAX parsing: Stream and process XML documents without loading the entire document into memory.
- Efficient parsing: Uses techniques inspired by quick-xml and pkg/json for high performance.
- SWAR (SIMD Within A Register): Optimizations for fast text processing, inspired by memchr.
- Compatibility with encoding/xml: Includes utility functions to bridge gosax types with encoding/xml types, easing integration with existing code that uses the standard library.
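To make the SWAR idea concrete, here is a minimal sketch of a byte search that inspects eight bytes per step using ordinary 64-bit integer arithmetic. It illustrates the general technique only and is not taken from gosax's source; the function name indexByteSWAR and the constants are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	lo = 0x0101010101010101 // 0x01 in every byte lane
	hi = 0x8080808080808080 // 0x80 in every byte lane
)

// indexByteSWAR returns the index of the first occurrence of c in b, or -1.
// Hypothetical helper for illustration; not part of the gosax API.
func indexByteSWAR(b []byte, c byte) int {
	pattern := lo * uint64(c) // c repeated in all eight lanes
	i := 0
	for ; i+8 <= len(b); i += 8 {
		// XOR turns every lane equal to c into 0x00.
		x := binary.LittleEndian.Uint64(b[i:]) ^ pattern
		// Classic "has zero byte" test: the lowest set bit of m marks the
		// first zero lane, which is the first match in memory order.
		if m := (x - lo) & ^x & hi; m != 0 {
			return i + bits.TrailingZeros64(m)/8
		}
	}
	// Handle the remaining (fewer than eight) bytes one at a time.
	for ; i < len(b); i++ {
		if b[i] == c {
			return i
		}
	}
	return -1
}

func main() {
	doc := []byte("<root><element>Value</element></root>")
	fmt.Println(indexByteSWAR(doc, '>')) // 5
}
```

An XML tokenizer spends much of its time scanning for delimiters such as < and >, so checking eight bytes per iteration instead of one is where this kind of trick tends to pay off.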
goos: darwin
goarch: arm64
pkg: github.com/orisano/gosax
BenchmarkReader_Event-12 5 211845800 ns/op 1103.30 MB/s 2097606 B/op 6 allocs/op
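As a rough reading of the benchmark line above, the input size per iteration can be derived from the ns/op and MB/s columns. The sketch below assumes the Go testing package's convention that MB/s means 10^6 bytes per second; impliedBytesPerOp is a name invented for this example.

```go
package main

import "fmt"

// impliedBytesPerOp derives the number of bytes processed per operation
// from the ns/op and MB/s columns of a Go benchmark line.
// Assumption: 1 MB = 1e6 bytes, as used by the testing package.
func impliedBytesPerOp(nsPerOp, mbPerSec float64) float64 {
	return mbPerSec * 1e6 * nsPerOp / 1e9
}

func main() {
	b := impliedBytesPerOp(211845800, 1103.30)
	fmt.Printf("%.1f MB per op\n", b/1e6) // 233.7 MB per op
}
```

In other words, each benchmark iteration parsed roughly 234 MB of XML in about 0.21 s while performing 6 allocations.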
To install gosax, use go get:
go get github.com/orisano/gosax
Here is a basic example of how to use gosax to parse an XML document:
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/orisano/gosax"
)

func main() {
	xmlData := `<root><element>Value</element></root>`
	reader := strings.NewReader(xmlData)
	r := gosax.NewReader(reader)
	for {
		e, err := r.Event()
		if err != nil {
			log.Fatal(err)
		}
		if e.Type() == gosax.EventEOF {
			break
		}
		fmt.Println(string(e.Bytes))
	}
	// Output:
	// <root>
	// <element>
	// Value
	// </element>
	// </root>
}
Important Note for encoding/xml Users:
When migrating from encoding/xml to gosax, note that self-closing tags are handled differently. To mimic encoding/xml behavior, set gosax.Reader.EmitSelfClosingTag to true. This ensures self-closing tags are recognized and processed correctly.
If you are used to encoding/xml's Token, start with gosax.TokenE.
Note: Using gosax.TokenE and gosax.Token involves memory allocation due to interfaces.
Before:
var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	// ...
}
After:
var dec *gosax.Reader
for {
	tok, err := gosax.TokenE(dec.Event())
	if err == io.EOF {
		break
	}
	// ...
}
xmlb is an extension for gosax to simplify rewriting code from encoding/xml. It provides a higher-performance bridge for XML parsing and processing.
Before:
var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	switch t := tok.(type) {
	case xml.StartElement:
		// ...
	case xml.CharData:
		// ...
	case xml.EndElement:
		// ...
	}
}
After:
var dec *xmlb.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	switch tok.Type() {
	case xmlb.StartElement:
		t, _ := tok.StartElement()
		// ...
	case xmlb.CharData:
		t, _ := tok.CharData()
		// ...
	case xmlb.EndElement:
		t := tok.EndElement()
		// ...
	}
}
This library is licensed under the terms specified in the LICENSE file.
gosax is inspired by the following projects and resources:
- Dave Cheney's GopherCon SG 2023 Talk
- quick-xml
- memchr (SWAR part)
Contributions are welcome! Please fork the repository and submit pull requests.
For any questions or feedback, feel free to open an issue on the GitHub repository.