refactor: fork from sax-js
lddubeau committed Jun 29, 2018
1 parent 5aee216 commit 813db06
Showing 13 changed files with 2,915 additions and 556 deletions.
1 change: 1 addition & 0 deletions AUTHORS
@@ -1,4 +1,5 @@
# contributors sorted by whether or not they're me.
Louis-Dominique Dubeau <ldd@lddubeau.com>
Isaac Z. Schlueter <i@izs.me>
Stein Martin Hustad <stein@hustad.com>
Mikeal Rogers <mikeal.rogers@gmail.com>
22 changes: 22 additions & 0 deletions LICENSE
@@ -1,5 +1,27 @@
The ISC License

Copyright (c) Contributors

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR
IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

====

The following license is the one that governed sax, from which saxes
was forked. Isaac Schlueter is not *directly* involved with saxes so
don't go bugging him for saxes issues.

The ISC License

Copyright (c) Isaac Z. Schlueter and Contributors

Permission to use, copy, modify, and/or distribute this software for any
106 changes: 36 additions & 70 deletions README.md
@@ -1,50 +1,38 @@
# sax js
# saxes

A sax-style parser for XML and HTML.
A sax-style non-validating parser for XML.

Saxes is a fork of [sax-js](https://github.com/isaacs/sax-js)
1.2.4. All references to sax in this project's documentation are
references to sax 1.2.4.

Designed with [node](http://nodejs.org/) in mind, but should work fine in
the browser or other CommonJS implementations.

## What This Is

* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
docs.

## What This Is (probably) Not
## Notable Differences from sax-js.

* An HTML Parser - That's a fine goal, but this isn't it. It's just
XML.
* A DOM Builder - You can use it to build an object model out of XML,
but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX
implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but
not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.
* Saxes aims to be much stricter than sax-js with regards to XML
well-formedness. sax-js, even in its so-called "strict mode", is not
strict. It silently accepts structures that are not well-formed
XML. Projects that need absolute compliance with well-formedness
constraints cannot use sax-js as-is.
* Saxes does not support HTML, or anything short of XML.
* Saxes does not aim to support antiquated platforms.

## Regarding `<!DOCTYPE`s and `<!ENTITY`s

The parser will handle the basic XML entities in text nodes and attribute
values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
entities in XML by putting them in the DTD. This parser doesn't do anything
with that. If you want to listen to the `ondoctype` event, and then fetch
the doctypes, and read the entities and add them to `parser.ENTITIES`, then
be my guest.

Unknown entities will fail in strict mode, and in loose mode, will pass
through unmolested.
The parser will handle the basic XML entities in text nodes and
attribute values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to
define additional entities in XML by putting them in the DTD. This
parser doesn't do anything with that. If you want to listen to the
`ondoctype` event, and then fetch the doctypes, and read the entities
and add them to `parser.ENTITIES`, then be my guest.
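
As a rough illustration of the "be my guest" route described above, the sketch below pulls simple internal entity declarations out of a doctype string and copies them into an entity table. `collectEntities` and its regular expression are hypothetical helpers, not part of saxes. The `entities` object stands in for `parser.ENTITIES`, and the doctype text is assumed to be whatever string the `ondoctype` handler receives. Parameter entities, external entities, and nested entity references are deliberately ignored.

```javascript
// Hypothetical helper: not part of saxes.
// Matches declarations such as <!ENTITY name "value"> or <!ENTITY name 'value'>.
const ENTITY_DECL = /<!ENTITY\s+([^\s%"'>]+)\s+(?:"([^"]*)"|'([^']*)')\s*>/g;

function collectEntities(doctype, entities) {
  let match;
  ENTITY_DECL.lastIndex = 0; // the regex is global, so reset its cursor first
  while ((match = ENTITY_DECL.exec(doctype)) !== null) {
    const name = match[1];
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    const value = match[2] !== undefined ? match[2] : match[3];
    entities[name] = value;
  }
  return entities;
}

// Stand-in for parser.ENTITIES, seeded with the five predefined entities.
const entities = { amp: "&", lt: "<", gt: ">", apos: "'", quot: '"' };
collectEntities(' root [ <!ENTITY who "world"> <!ENTITY sep \', \'> ]', entities);
```

With a real parser, the same call would presumably sit inside an `ondoctype` handler, so that the table is filled before the document body is parsed.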

## Usage

```javascript
var sax = require("./lib/sax"),
strict = true, // set to false for html-mode
parser = sax.parser(strict);
var saxes = require("./lib/saxes"),
parser = saxes.parser();

parser.onerror = function (e) {
// an error happened.
@@ -66,32 +54,29 @@ parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

// stream usage
// takes the same options as the parser
var saxStream = require("sax").createStream(strict, options)
saxStream.on("error", function (e) {
var saxesStream = require("saxes").createStream(options)
saxesStream.on("error", function (e) {
// unhandled errors will throw, since this is a proper node
// event emitter.
console.error("error!", e)
// clear the error
this._parser.error = null
this._parser.resume()
})
saxStream.on("opentag", function (node) {
saxesStream.on("opentag", function (node) {
// same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
.pipe(saxStream)
.pipe(saxesStream)
.pipe(fs.createWriteStream("file-copy.xml"))
```


## Arguments

Pass the following arguments to the parser function. All are optional.

`strict` - Boolean. Whether or not to be a jerk. Default: `false`.

`opt` - Object bag of settings regarding string formatting. All default to `false`.

Settings supported:
@@ -132,8 +117,6 @@ document where the parser currently is looking.
`closed` - Boolean indicating whether or not the parser can be written to.
If it's `true`, then wait for the `ready` event to write again.

`strict` - Boolean indicating whether or not the parser is a jerk.

`opt` - Any options passed into the constructor.

`tag` - The current tag being dealt with.
@@ -152,8 +135,8 @@ When using the stream interface, assign handlers using the EventEmitter

`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. By
listening to this event, you can keep an eye on that kind of stuff. Note:
this happens *much* more in strict mode. Argument: instance of `Error`.
listening to this event, you can keep an eye on that kind of stuff.
Argument: instance of `Error`.

`text` - Text node. Argument: string of text.

@@ -165,29 +148,25 @@ processing instructions have implementation dependent semantics.

`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
might go away at some point. saxes isn't intended to be used to parse SGML,
after all.

`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.

`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
`opentag` - An opening tag. Argument: object with `name` and
`attributes`. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have a
`local`, `prefix`, and `uri` member.
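
To make the `xmlns` fields concrete, here is a minimal sketch of how a qualified name relates to `prefix`, `local`, and `uri`. `resolveName` is a hypothetical helper, not saxes' implementation, and the `ns` argument is assumed to be a plain object mapping prefixes to namespace URIs, with the empty string as the key for the default namespace.

```javascript
// Hypothetical helper: illustrates the shape of the data, not saxes' internals.
function resolveName(qname, ns) {
  const colon = qname.indexOf(":");
  const prefix = colon === -1 ? "" : qname.slice(0, colon);
  const local = colon === -1 ? qname : qname.slice(colon + 1);
  // An unbound prefix yields an empty uri here; a real parser would report an error.
  const uri = Object.prototype.hasOwnProperty.call(ns, prefix) ? ns[prefix] : "";
  return { prefix, local, uri };
}

// Assumed bindings in scope for the element being opened.
const ns = { "": "http://www.w3.org/1999/xhtml", svg: "http://www.w3.org/2000/svg" };
const circle = resolveName("svg:circle", ns);
const para = resolveName("p", ns);
```

This sketch applies the default namespace to any unprefixed name, which is only right for elements. In XML namespaces, an unprefixed attribute is in no namespace, so an attribute version of this helper would need to skip the default binding.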

`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closeTag` emitted immediately after `openTag`.
Argument: tag name.
`closetag` - A closing tag. Note that self-closing tags will have
`closeTag` emitted immediately after `openTag`. Argument: tag name.
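
Because a self-closing tag produces an open event followed immediately by a close event, handlers that keep a stack of open elements need no special case for it. The sketch below assumes handlers shaped like `onopentag` and `onclosetag`. The `handlers` object and the hand-fed event sequence are illustrative stand-ins for a real parser, which is not loaded here.

```javascript
// Stand-in for a parser object; only the handler shapes mirror the README.
const stack = [];
const paths = [];
const handlers = {
  onopentag(node) {
    stack.push(node.name);
    paths.push("/" + stack.join("/")); // record the path of each opened element
  },
  onclosetag(name) {
    const open = stack.pop();
    if (open !== name) {
      throw new Error("unexpected close tag: " + name);
    }
  },
};

// Events a parser would be expected to emit for <a><b/><c></c></a>.
handlers.onopentag({ name: "a", attributes: {} });
handlers.onopentag({ name: "b", attributes: {} });
handlers.onclosetag("b"); // self-closing: close follows open immediately
handlers.onopentag({ name: "c", attributes: {} });
handlers.onclosetag("c");
handlers.onclosetag("a");
```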

`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contains namespace
information.
`attribute` - An attribute node. Argument: object with `name` and
`value`. If the `xmlns` option is set, it will also contain
namespace information.

`comment` - A comment node. Argument: the string of the comment.

@@ -210,16 +189,3 @@ signal the end of a namespace binding.

`ready` - Indication that the stream has reset, and is ready to be written
to.

`noscript` - In non-strict mode, `<script>` tags trigger a `"script"`
event, and their contents are not checked for special xml characters.
If you pass `noscript: true`, then this behavior is suppressed.

## Reporting Problems

It's best to write a failing test if you find an issue. I will always
accept pull requests with failing tests if they demonstrate intended
behavior, but it is very hard to figure out what issue you're describing
without a test. Writing a test is also the best way for you yourself
to figure out if you really understand the issue you think you have with
sax-js.
5 changes: 5 additions & 0 deletions commitlint.config.js
@@ -0,0 +1,5 @@
"use strict";

module.exports = {
extends: ["@commitlint/config-angular"],
};
File renamed without changes.