Skip to content
/ gosax Public

Go library for XML SAX (Simple API for XML) parsing

License

Notifications You must be signed in to change notification settings

orisano/gosax

Repository files navigation

gosax

Go Reference

gosax is a Go library for XML SAX (Simple API for XML) parsing, supporting read-only functionality. This library is designed for efficient and memory-conscious XML parsing, drawing inspiration from various sources to provide a performant parser.

Features

  • Read-only SAX parsing: Stream and process XML documents without loading the entire document into memory.
  • Efficient parsing: Utilizes techniques inspired by quick-xml and pkg/json for high performance.
  • SWAR (SIMD Within A Register): Optimizations for fast text processing, inspired by memchr.
  • Compatibility with encoding/xml: Includes utility functions to bridge gosax types with encoding/xml types, facilitating easy integration with existing code that uses the standard library.

Benchmark

goos: darwin
goarch: arm64
pkg: github.com/orisano/gosax
BenchmarkReader_Event-12    	       5	 211845800 ns/op	1103.30 MB/s	 2097606 B/op	       6 allocs/op

Installation

To install gosax, use go get:

go get github.com/orisano/gosax

Usage

Here is a basic example of how to use gosax to parse an XML document:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/orisano/gosax"
)

func main() {
	xmlData := `<root><element>Value</element></root>`
	reader := strings.NewReader(xmlData)

	r := gosax.NewReader(reader)
	for {
		e, err := r.Event()
		if err != nil {
			log.Fatal(err)
		}
		if e.Type() == gosax.EventEOF {
			break
		}
		fmt.Println(string(e.Bytes))
	}
	// Output:
	// <root>
	// <element>
	// Value
	// </element>
	// </root>
}

Bridging with encoding/xml

Important Note for encoding/xml Users:

When migrating from encoding/xml to gosax, note that self-closing tags are handled differently. To mimic encoding/xml behavior, set gosax.Reader.EmitSelfClosingTag to true. This ensures self-closing tags are recognized and processed correctly.

Using TokenE

If you are used to encoding/xml's Token, start with gosax.TokenE. Note: Using gosax.TokenE and gosax.Token involves memory allocation due to interfaces.

Before:

var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	// ...
}

After:

var dec *gosax.Reader
for {
	tok, err := gosax.TokenE(dec.Event())
	if err == io.EOF {
		break
	}
	// ...
}

Utilizing xmlb

xmlb is an extension for gosax to simplify rewriting code from encoding/xml. It provides a higher-performance bridge for XML parsing and processing.

Before:

var dec *xml.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	switch t := tok.(type) {
	case xml.StartElement:
		// ...
	case xml.CharData:
		// ...
	case xml.EndElement:
		// ...
	}
} 

After:

var dec *xmlb.Decoder
for {
	tok, err := dec.Token()
	if err == io.EOF {
		break
	}
	switch tok.Type() {
	case xmlb.StartElement:
		t, _ := tok.StartElement()
		// ...
	case xmlb.CharData:
		t, _ := tok.CharData()
		// ...
	case xmlb.EndElement:
		t := tok.EndElement()
		// ...
	}
} 

License

This library is licensed under the terms specified in the LICENSE file.

Acknowledgements

gosax is inspired by the following projects and resources:

Contributing

Contributions are welcome! Please fork the repository and submit pull requests.

Contact

For any questions or feedback, feel free to open an issue on the GitHub repository.

About

Go library for XML SAX (Simple API for XML) parsing

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages