diff --git a/INSTALL b/INSTALL
index f3d8351..8e0986f 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,6 +1,6 @@
 Welcome to Parsley!

-Parsley depends on
+Parsley depends on
 - argp (standard with Linux, other platforms use argp-standalone package)
 - the JSON C library from http://oss.metaparadigm.com/json-c/ (I used 0.8)
 - pcre (with dev headers)
@@ -12,17 +12,17 @@ Here's how to install it:

 1. Get the release
 ------------------------------------------------------------------------
-Parsley is currently still being tracked in git, and isn't ready to make a
+Parsley is currently still being tracked in git, and isn't ready to make a
 formal release. So you need to either clone or download the latest tarball:

   git clone git://github.com/fizx/parsley.git

-or
+or
   wget http://github.com/fizx/parsley/tarball/master

 2. Build for your platform
 ------------------------------------------------------------------------
-Enter your parsley working directory, (from the clone or download you
+Enter your parsley working directory, (from the clone or download you
 just made) and, based on your platform, do the following:


@@ -56,7 +56,7 @@ make
 sudo make install

 If you have a few extra minutes, consider replacing the last make with a
-'make check' and let us know if it reports any failures from the test
+'make check' and let us know if it reports any failures from the test
 suite - thanks!

 3. Ruby Binding (via Gems)
diff --git a/INTRO b/INTRO
index 119147a..9e7cc2c 100644
--- a/INTRO
+++ b/INTRO
@@ -17,17 +17,17 @@ In order to make this easy to learn, let's keep the best of what's working today

 Now for some examples:

-- 3rd paragraph:
+- 3rd paragraph:
   p:nth-child(3)
 - First sentence in that paragraph (period-delimited):
   substring-before(p:nth-child(3), '.')
 - Any simple phone number in an ordered list called "numbers"
   re:match(ul#numbers>li, '\d{3}-\d{4}', 'g')
-
+
 We support all of CSS3, XPath1, as well as all functions in XSLT 1.0 and EXSLT (required+regexp).
 I think this is a pretty good way to grab a single piece of data from a page. It's simple and gives you all of the tools (CSS for simplicity, XPath for power, regex for detailed text handling) you are used to, in one expression.
-
+
 We'd like to make our scraper script both portable and fast. For both these reasons, we need to be able to express the structure of the scraped data independently of the general-purpose programming language you happen to be working in. Jumping from XPath to Python and back means multiple passes over the document, and Python idioms prevent easy use of your scraper by Rubyists. If we can represent the entire scrape in a language-independent way, we can compile it into something that libxml2 can handle in one pass, giving screaming-fast (milliseconds per parse) performance.

 To describe the output structure, lets use json. It's compact, and the Ruby/Python/etc bindings can use hashes/lists/dictionaries to represent the same structure. We can also have the scraper output json or native data structures. Here's an example script that grabs the title and all hyperlinks on a page:
@@ -36,21 +36,21 @@ To describe the output structure, lets use json. It's compact, and the Ruby/Pyt
   "title": "h1",
   "links": ["a"]
 }
-
+
 Applying this to http://www.yelp.com/biz/amnesia-san-francisco yields:

 {
   "title": "Amnesia",
   "links": ["Yelp", "Welcome", "About Me", ... ]
 }
-
+
 You'll note that the output structure mirrors the input structure.
 In the Ruby binding, you can get both input and output natively:
 > require "open-uri"
 > require "parsley"
 > Parsley.new({"title" => "h1", "links" => ["a"]}).parse(:url => "http://www.yelp.com/biz/amnesia-san-francisco")
 #=> {"title"=>"Amnesia", "links"=>["Yelp", "Welcome", "About Me"]}
-
+
 We'll also add both explicit and implicit grouping Here's an extension of the previous example with explicit grouping:

 {
@@ -60,7 +60,7 @@ We'll also add both explicit and implicit grouping Here's an extension of the p
     "link": "@href"
   }]
 }
-
+
 The json structure in the output still mirrors the input, but now you can get both the link text and the href.

 Pages like craigslist are slightly trickier to group. Elements on this page go h4, p, p, p, h4, p, p, p. To group this, you could do:
@@ -72,7 +72,7 @@ Pages like craigslist are slightly trickier to group. Elements on this page go
   }]
 }

-If you instead wanted to group by date, you could use implicit grouping. It's implicit, because the parenthesized filter is omitted. Grouping happens by page order. We treat the first single (i.e. non-square-bracketed) value (the h4 in the below example) as the beginning of a new group, and adds following values to the group (i.e.: [h4, p, p, p], [h4, p, p], [h4, p]).
+If you instead wanted to group by date, you could use implicit grouping. It's implicit, because the parenthesized filter is omitted. Grouping happens by page order. We treat the first single (i.e. non-square-bracketed) value (the h4 in the below example) as the beginning of a new group, and adds following values to the group (i.e.: [h4, p, p, p], [h4, p, p], [h4, p]).

 {
   "entry":[{
@@ -80,5 +80,5 @@ If you instead wanted to group by date, you could use implicit grouping. It's i
     "title": ["p"]
   }]
 }

-
+
\ No newline at end of file
diff --git a/Makefile.am b/Makefile.am
index 3f71916..3ea35d3 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -21,7 +21,7 @@ profile:

 install-all:
 	./bootstrap.sh && ./configure && make && make install && cd ruby && rake install && cd ../python && python setup.py install
-
+
 bench:
 	@echo "yelp..."; ./parsley test/yelp.let test/yelp.html > /dev/null
 	@echo "craigs-simple..."; ./parsley test/craigs-simple.let test/craigs-simple.html > /dev/null
@@ -73,4 +73,3 @@ check-am:
 	@echo "default-namespace..."; ./parsley -x test/default-namespace.let test/default-namespace.xml 2>&1 | diff test/default-namespace.json - && echo " success."
 	@echo "sg-wrap..."; ./parsley -s test/sg-wrap.let test/sg-wrap.html 2>&1 | diff test/sg-wrap.json - && echo " success."
 	@echo "collate_regression..."; ./parsley test/collate_regression.let test/collate_regression.html 2>&1 | diff test/collate_regression.json - && echo " success."
-
\ No newline at end of file
diff --git a/Makefile.in b/Makefile.in
index a21b1b1..1889d09 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -327,7 +327,7 @@ parser.h: parser.c
 	  rm -f parser.c; \
 	  $(MAKE) $(AM_MAKEFLAGS) parser.c; \
 	else :; fi
-libparsley.la: $(libparsley_la_OBJECTS) $(libparsley_la_DEPENDENCIES)
+libparsley.la: $(libparsley_la_OBJECTS) $(libparsley_la_DEPENDENCIES)
 	$(LINK) -rpath $(libdir) $(libparsley_la_OBJECTS) $(libparsley_la_LIBADD) $(LIBS)
 install-binPROGRAMS: $(bin_PROGRAMS)
 	@$(NORMAL_INSTALL)
@@ -372,10 +372,10 @@ clean-binPROGRAMS:
 	list=`for p in $$list; do echo "$$p"; done | sed 's/$(EXEEXT)$$//'`; \
 	echo " rm -f" $$list; \
 	rm -f $$list
-parsley$(EXEEXT): $(parsley_OBJECTS) $(parsley_DEPENDENCIES)
+parsley$(EXEEXT): $(parsley_OBJECTS) $(parsley_DEPENDENCIES)
 	@rm -f parsley$(EXEEXT)
 	$(LINK) $(parsley_OBJECTS) $(parsley_LDADD) $(LIBS)
-parsleyc$(EXEEXT): $(parsleyc_OBJECTS) $(parsleyc_DEPENDENCIES)
+parsleyc$(EXEEXT): $(parsleyc_OBJECTS) $(parsleyc_DEPENDENCIES)
 	@rm -f parsleyc$(EXEEXT)
 	$(LINK) $(parsleyc_OBJECTS) $(parsleyc_LDADD) $(LIBS)

diff --git a/PAPER b/PAPER
index 19cddf5..047044a 100644
--- a/PAPER
+++ b/PAPER
@@ -27,7 +27,7 @@ Features
 Examples
 - Ruby/python/json
 - structural parse
--
+-

 Benchmarks
 - size comparision with XSLT
diff --git a/Portfile b/Portfile
index 9091bc7..dd062ac 100644
--- a/Portfile
+++ b/Portfile
@@ -14,5 +14,5 @@ depends_lib port:argp-standalone \
                     port:json-c \
                     port:libxslt \
                     port:pcre
-
+
 checksums md5 5e4d9080aa4ed2dfa7996c89a8e7f719 sha1 9508eea67212d9a9620eac3fe3719c91e00e11d9 rmd160 dfa9cee2fdb41ac750d47288d5128f1963a84334
diff --git a/Portfile.in b/Portfile.in
index daa19b2..ba80ed8 100644
--- a/Portfile.in
+++ b/Portfile.in
@@ -14,4 +14,4 @@ depends_lib port:argp-standalone \
                     port:json-c \
                     port:libxslt \
                     port:pcre
-
+
diff --git a/README.C-LANG b/README.C-LANG
index 402347c..ef72cf6 100644
--- a/README.C-LANG
+++ b/README.C-LANG
@@ -1,6 +1,6 @@
 To use parsley from C, the following functions are available from parsley.h.
 In addition, there is a function to convert xml documents of the type returned by
-parsley into json.
+parsley into json.

 You will also need passing familiarity with libxml2 and json-c to print,
 manipulate, and free some of the generated objects.
@@ -19,23 +19,23 @@ parsleyPtr parsley_compile(char* parsley, char* incl)

   Arguments:
     - char* parsley -- a string of parsley to compile.
-    - char* incl -- arbitrary XSLT to inject directly into the stylesheet,
+    - char* incl -- arbitrary XSLT to inject directly into the stylesheet,
       outside any templates.
-
+
   Returns: A structure that you can pass to parsley_parse_* to do the actual
     parsing. This structure contains the compiled XSLT.
-
-  Notes: This is *NOT* thread-safe. (Usage of the parselet via parsley_parse_* *IS*
+
+  Notes: This is *NOT* thread-safe. (Usage of the parselet via parsley_parse_* *IS*
     thread-safe, however.)
-
+
 void parsley_set_user_agent(char *);
   Sets the user-agent used by parsley's internal http library.
-
+
 void parsley_free(parsleyPtr);
   Frees the parsleyPtr's memory.
-
+
 void parsed_parsley_free(parsedParsleyPtr);
   Frees the parsedParsleyPtr's memory.

@@ -54,20 +54,20 @@ parsedParsleyPtr parsley_parse_file(parsleyPtr parsley, char* file_name, int fla
       PARSLEY_OPTIONS_COLLATE = 16,
       PARSLEY_OPTIONS_SGWRAP = 32
     };
-
-  Returns: A libxml2 document of the extracted data. You need to free this
-  with xmlFree(). To output, look at the libxml2 documentation for functions
-  like xmlSaveFormatFile(). If you want json output, look below for xml2json
-  docs.
+
+  Returns: A libxml2 document of the extracted data. You need to free this
+  with xmlFree(). To output, look at the libxml2 documentation for functions
+  like xmlSaveFormatFile(). If you want json output, look below for xml2json
+  docs.

 parsedParsleyPtr parsley_parse_string(parsleyPtr parsley, char* string, size_t len, char * base_uri, int flags);
-  Parses the in-memory string/length combination given. See parsley_parse_file
+  Parses the in-memory string/length combination given. See parsley_parse_file
   docs.
-
+
 parsedParsleyPtr parsley_parse_doc(parsleyPtr parsley, xmlDocPtr doc, bool prune);
-  Uses the parsley parser to parse a libxml2 document.
+  Uses the parsley parser to parse a libxml2 document.

 From xml2json.h
 ===============
@@ -75,18 +75,18 @@ From xml2json.h

 struct json_object * xml2json(xmlNodePtr);
   Converts an xml subtree to json. The xml should be in the format returned
-  by parsley. Basically, xml attributes get ignored, and if you want an array
+  by parsley. Basically, xml attributes get ignored, and if you want an array
   like [a,b], use:
-
-
+
+
       a
       b
-
+
   To get a null-terminated string out, use:
-
+
     json_object_to_json_string(struct json_object *)
-
+
   To free (actually, to decrement the reference count), call:
-
+
     json_object_put(struct json_object *)
\ No newline at end of file
diff --git a/TODO b/TODO
index 221e549..5cccf47 100644
--- a/TODO
+++ b/TODO
@@ -34,6 +34,6 @@
 - saxon compatibility?!
 - XML input converter?!
 - check windows build
-- flags?!
+- flags?!
   ^ - force group-before
   $ - force group-after
\ No newline at end of file
diff --git a/aclocal.m4 b/aclocal.m4
index 93dff77..b19459e 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -7899,7 +7899,7 @@ _LT_DECL(, macro_revision, 0)
 # included after everything else. This provides aclocal with the
 # AC_DEFUNs it wants, but when m4 processes it, it doesn't do anything
 # because those macros already exist, or will be overwritten later.
-# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6.
+# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6.
 #
 # Anytime we withdraw an AC_DEFUN or AU_DEFUN, remember to add it here.
 # Yes, that means every name once taken will need to remain here until
diff --git a/functions.c b/functions.c
index 7edd37c..9385fce 100644
--- a/functions.c
+++ b/functions.c
@@ -27,7 +27,7 @@ void parsley_register_all(){
     xsltInnerXmlFunction);
 }

-static void
+static void
 xsltStarXMLFunction(xmlXPathParserContextPtr ctxt, int nargs, bool is_inner) {
   if (nargs != 1) {
     xsltTransformError(xsltXPathGetTransformContext(ctxt), NULL, NULL,
@@ -208,16 +208,16 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
             "document() : internal error tctxt == NULL\n");
         valuePush(ctxt, xmlXPathNewNodeSet(NULL));
         return;
-    }
-
+    }
+
     uri = xmlParseURI((const char *) URI);
     if (uri == NULL) {
         xsltTransformError(tctxt, NULL, NULL,
             "document() : failed to parse URI\n");
         valuePush(ctxt, xmlXPathNewNodeSet(NULL));
         return;
-    }
-
+    }
+
     /*
      * check for and remove fragment identifier
      */
@@ -231,12 +231,12 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     } else
         idoc = xsltLoadHtmlDocument(tctxt, URI);
     xmlFreeURI(uri);
-
+
     if (idoc == NULL) {
         if ((URI == NULL) ||
             (URI[0] == '#') ||
             ((tctxt->style->doc != NULL) &&
-            (xmlStrEqual(tctxt->style->doc->URL, URI))))
+            (xmlStrEqual(tctxt->style->doc->URL, URI))))
         {
             /*
              * This selects the stylesheet's doc itself.
@@ -257,7 +257,7 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
         valuePush(ctxt, xmlXPathNewNodeSet((xmlNodePtr) doc));
         return;
     }
-
+
     /* use XPointer of HTML location for fragment ID */
 #ifdef LIBXML_XPTR_ENABLED
     xptrctxt = xmlXPtrNewContext(doc, NULL, NULL);
@@ -270,11 +270,11 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     resObj = xmlXPtrEval(fragment, xptrctxt);
     xmlXPathFreeContext(xptrctxt);
 #endif
-    xmlFree(fragment);
+    xmlFree(fragment);

     if (resObj == NULL)
         goto out_fragment;
-
+
     switch (resObj->type) {
         case XPATH_NODESET:
             break;
@@ -288,11 +288,11 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
         case XPATH_RANGE:
         case XPATH_LOCATIONSET:
             xsltTransformError(tctxt, NULL, NULL,
-                "document() : XPointer does not select a node set: #%s\n",
+                "document() : XPointer does not select a node set: #%s\n",
                 fragment);
             goto out_object;
     }
-
+
     valuePush(ctxt, resObj);
     return;

@@ -303,7 +303,7 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     valuePush(ctxt, xmlXPathNewNodeSet(NULL));
 }

-xsltDocumentPtr
+xsltDocumentPtr
 xsltLoadHtmlDocument(xsltTransformContextPtr ctxt, const xmlChar *URI) {
     xsltDocumentPtr ret;
     xmlDocPtr doc;
@@ -316,7 +316,7 @@ xsltLoadHtmlDocument(xsltTransformContextPtr ctxt, const xmlChar *URI) {
      */
     if (ctxt->sec != NULL) {
         int res;
-
+
         res = xsltCheckRead(ctxt->sec, ctxt, URI);
         if (res == 0) {
             xsltTransformError(ctxt, NULL, NULL,
diff --git a/parsed_xpath.c b/parsed_xpath.c
index e1933ac..acad79c 100644
--- a/parsed_xpath.c
+++ b/parsed_xpath.c
@@ -4,9 +4,9 @@
 //   __pxpath_node * next;
 //   __pxpath_node * child;
 // } pxpath_node;
-//
+//
 // typedef pxpath_node pxpathPtr;
-//
+//
 // enum {
 //   PXPATH_FUNCTION,
 //   PXPATH_PATH
@@ -29,14 +29,14 @@ pxpathPtr pxpath_new(int type, char* value) {
 }

 // pxpathPtr pxpath_cat_paths(int n, va_list) {
-//
+//
 // }

 // pxpathPtr pxpath_new_paths(int n, va_list) {
-//
+//
 // }

 static void
-_pxpath_to_string(pxpathPtr ptr, struct printbuf *buf) {
+_pxpath_to_string(pxpathPtr ptr, struct printbuf *buf) {
   if(ptr == NULL) return;
   if(ptr->type == PXPATH_FUNCTION) {
     sprintbuf(buf, "%s(", ptr->value);
@@ -53,7 +53,7 @@ _pxpath_to_string(pxpathPtr ptr, struct printbuf *buf) {
   }
 }

-static char *
+static char *
 format_n(int n) {
   char * out = calloc(2 * n + 1, 1);
   for(int i =0; i < n; i++) {
@@ -129,7 +129,7 @@ pxpathPtr pxpath_cat_paths(int n, ...) {
   return out;
 }

-static pxpathPtr
+static pxpathPtr
 pxpath_new_thing(int n, va_list va) {
   struct printbuf *buf = printbuf_new();
   for(int i = 0; i < n; i++) {
diff --git a/parser.c b/parser.c
index 7ef2829..19c6ff1 100644
--- a/parser.c
+++ b/parser.c
@@ -2614,7 +2614,7 @@ yyuserMerge (int yyn, YYSTYPE* yy0, YYSTYPE* yy1)

   switch (yyn)
     {
-
+
       default: break;
     }
 }
@@ -4262,7 +4262,7 @@ void init_xpath_alias() {
   xmlHashAddEntry(alias_hash, "match", "regexp:match");
   xmlHashAddEntry(alias_hash, "replace", "regexp:replace");
   xmlHashAddEntry(alias_hash, "test", "regexp:test");
-  xmlHashAddEntry(alias_hash, "with-newlines", "lib:nl");
+  xmlHashAddEntry(alias_hash, "with-newlines", "lib:nl");
 }

 pxpathPtr myparse(char* string){
@@ -4281,4 +4281,4 @@ void answer(pxpathPtr a){
 void start_debugging(){
   yydebug = 1;
   return;
-}
+}
diff --git a/parser.h b/parser.h
index e0aca8e..55d6aa1 100644
--- a/parser.h
+++ b/parser.h
@@ -177,8 +177,8 @@ void answer(pxpathPtr);
 #define PXP(A) pxpath_new_path(1, A)
 #define LIT(A) pxpath_new_literal(1, A)
 #define OP(A) pxpath_new_operator(1, A)
-#define APPEND(A, S) pxpath_cat_paths(2, A, PXP(S));
-#define PREPEND(A, S) pxpath_cat_paths(2, PXP(S), A);
+#define APPEND(A, S) pxpath_cat_paths(2, A, PXP(S));
+#define PREPEND(A, S) pxpath_cat_paths(2, PXP(S), A);
 #define PXPWRAP(A, B, C) pxpath_cat_paths(3, PXP(A), B, PXP(C))
 #define P4E(A, B, C, D) pxpath_cat_paths(4, A, PXP(B), C, PXP(D))
 #define P4O(A, B, C, D) pxpath_cat_paths(4, PXP(A), B, PXP(C), D)
@@ -187,11 +187,11 @@ void answer(pxpathPtr);
 #define TRACE(A, B) fprintf(stderr, "trace(%s): ", A); fprintf(stderr, "%s\n", pxpath_to_string(B));
 #endif

-
+

 #if ! defined YYSTYPE && ! defined YYSTYPE_IS_DECLARED
-typedef union YYSTYPE
+typedef union YYSTYPE
 #line 53 "parser.y"
 {
   int empty;
diff --git a/parser.y b/parser.y
index 3e5d11a..25a7bc8 100644
--- a/parser.y
+++ b/parser.y
@@ -33,8 +33,8 @@ void answer(pxpathPtr);
 #define PXP(A) pxpath_new_path(1, A)
 #define LIT(A) pxpath_new_literal(1, A)
 #define OP(A) pxpath_new_operator(1, A)
-#define APPEND(A, S) pxpath_cat_paths(2, A, PXP(S));
-#define PREPEND(A, S) pxpath_cat_paths(2, PXP(S), A);
+#define APPEND(A, S) pxpath_cat_paths(2, A, PXP(S));
+#define PREPEND(A, S) pxpath_cat_paths(2, PXP(S), A);
 #define PXPWRAP(A, B, C) pxpath_cat_paths(3, PXP(A), B, PXP(C))
 #define P4E(A, B, C, D) pxpath_cat_paths(4, A, PXP(B), C, PXP(D))
 #define P4O(A, B, C, D) pxpath_cat_paths(4, PXP(A), B, PXP(C), D)
@@ -43,7 +43,7 @@ void answer(pxpathPtr);
 #define TRACE(A, B) fprintf(stderr, "trace(%s): ", A); fprintf(stderr, "%s\n", pxpath_to_string(B));
 #endif

-
+
 %}

 %glr-parser
@@ -104,9 +104,9 @@ void answer(pxpathPtr);
 %token XDIV
 %token XMOD
 %token XCOMMENT
-%token XTEXT
-%token XPI
-%token XNODE
+%token XTEXT
+%token XPI
+%token XNODE
 %token CXEQUATION
 %token CXOPHE
 %token CXOPNE
@@ -225,93 +225,93 @@ LocationPath
   | AbsoluteLocationPath %dprec 2
   | selectors_group %dprec 3
   ;
-
+
 AbsoluteLocationPath
   : SLASH RelativeLocationPath { $$ = PREP_OP($1, $2); }
   | SLASH { $$ = PXP($1); }
   | AbbreviatedAbsoluteLocationPath
   ;
-
+
 RelativeLocationPath
-  : Step
+  : Step
   | RelativeLocationPath SLASH Step { $$ = BIN_OP($1, $2, $3); }
   | AbbreviatedRelativeLocationPath
   ;
-
+
 Step
   : AxisSpecifier NodeTest { $$ = pxpath_cat_paths(2, $1, $2); }
   | AxisSpecifier NodeTest Predicates { $$ = pxpath_cat_paths(3, $1, $2, $3); }
   | AbbreviatedStep { $$ = PXP($1); }
   ;
-
+
 AxisSpecifier
-  : AxisName DBLCOLON { $$ = pxpath_new_path(2, $1, $2); }
+  : AxisName DBLCOLON { $$ = pxpath_new_path(2, $1, $2); }
   | AbbreviatedAxisSpecifier { $$ = PXP($1); }
   ;

 AxisName
-  : XANCESTOR
-  | XANCESTORSELF
-  | XATTR
-  | XCHILD
-  | XDESC
-  | XDESCSELF
-  | XFOLLOW
-  | XFOLLOWSIB
-  | XNS
-  | XPARENT
-  | XPRE
-  | XPRESIB
-  | XSELF
-  ;
-
+  : XANCESTOR
+  | XANCESTORSELF
+  | XATTR
+  | XCHILD
+  | XDESC
+  | XDESCSELF
+  | XFOLLOW
+  | XFOLLOWSIB
+  | XNS
+  | XPARENT
+  | XPRE
+  | XPRESIB
+  | XSELF
+  ;
+
 NodeTest
-  : NameTest
+  : NameTest
   | NodeType LPAREN RPAREN { $$ = pxpath_new_path(3, $1, $2, $3); }
   | XPI LPAREN Literal RPAREN { $$ = pxpath_new_path(4, $1, $2, $3, $4); }
   ;
-
+
 Predicates
   : Predicates Predicate { $$ = pxpath_cat_paths(2, $1, $2); }
   | Predicate
   ;
-
+
 Predicate
   : LBRA PredicateExpr RBRA { $$ = PXPWRAP($1, $2, $3); }
   ;
-
+
 PredicateExpr
   : Expr
   ;
-
+
 AbbreviatedAbsoluteLocationPath
   : DBLSLASH RelativeLocationPath { $$ = PREP_OP($1, $2); }
   ;
-
+
 AbbreviatedRelativeLocationPath
   : RelativeLocationPath DBLSLASH Step { $$ = BIN_OP($1, $2, $3); }
   ;
-
+
 AbbreviatedStep
   : DOT
   | DBLDOT
   ;
-
+
 AbbreviatedAxisSpecifier
   : AT
   | { $$ = ""; }
   ;

 Expr
-  : LPAREN Argument RPAREN %dprec 2 { $$ = PXPWRAP($1, $2, $3); }
+  : LPAREN Argument RPAREN %dprec 2 { $$ = PXPWRAP($1, $2, $3); }
   | OrExpr %dprec 1
   ;

 PrimaryExpr
-  : VariableReference
-  | LPAREN Expr RPAREN { $$ = PXPWRAP($1, $2, $3); }
+  : VariableReference
+  | LPAREN Expr RPAREN { $$ = PXPWRAP($1, $2, $3); }
   | Literal { $$ = LIT($1); }
-  | Number
+  | Number
   | FunctionCall
   ;
-
+
 FunctionCall
   : FunctionName LPAREN Arguments RPAREN { $$ = pxpath_new_func(xpath_alias(pxpath_to_string($1)), $3); }
   ;
@@ -327,38 +327,38 @@ Argument
   : OptS Expr OptS { $$ = $2; }
   ;
 UnionExpr
-  : PathExpr
+  : PathExpr
   | UnionExpr PIPE PathExpr { $$ = BIN_OP($1, $2, $3); }
   ;
-
+
 PathExpr
-  : LocationPath
-  | FilterExpr
+  : LocationPath
+  | FilterExpr
   | FilterExpr SLASH RelativeLocationPath { $$ = BIN_OP($1, $2, $3); }
   | FilterExpr DBLSLASH RelativeLocationPath { $$ = BIN_OP($1, $2, $3); }
   ;
-
+
 FilterExpr
-  : PrimaryExpr
+  : PrimaryExpr
   | FilterExpr Predicates { $$ = pxpath_cat_paths(2, $1, $2); }
   ;
-
+
 OrExpr
-  : AndExpr
+  : AndExpr
   | OrExpr XOR AndExpr { $$ = LIT_BIN_OP($1, $2, $3); }
   ;
-
+
 AndExpr
-  : EqualityExpr
+  : EqualityExpr
   | AndExpr XAND EqualityExpr { $$ = LIT_BIN_OP($1, $2, $3); }
   ;
-
+
 EqualityExpr
-  : RelationalExpr
+  : RelationalExpr
   | EqualityExpr EQ RelationalExpr { $$ = LIT_BIN_OP($1, $2, $3); }
-  | EqualityExpr CXOPNE RelationalExpr { $$ = LIT_BIN_OP($1, $2, $3); }
+  | EqualityExpr CXOPNE RelationalExpr { $$ = LIT_BIN_OP($1, $2, $3); }
   ;
-
+
 RelationalExpr
   : AdditiveExpr %dprec 2
   | RelationalExpr OptS LT OptS AdditiveExpr %dprec 3 { $$ = LIT_BIN_OP($1, $3, $5); }
@@ -381,10 +381,10 @@ MultiplicativeExpr
   ;

 UnaryExpr
-  : UnionExpr
+  : UnionExpr
   | DASH UnaryExpr { $$ = PREP_OP($1, $2); }
   ;
-
+
 Literal
   : STRING
   ;
@@ -394,28 +394,28 @@ Number
   | NUMBER DOT NUMBER { $$ = pxpath_new_literal(3, $1, $2, $3); }
   | DOT NUMBER { $$ = pxpath_new_literal(2, $1, $2); }
   ;
-
+
 MultiplyOperator
   : SPLAT
   ;
-
+
 VariableReference
   : DOLLAR QName { $$ = PREP_OP($1, $2); }
   ;
-
+
 NameTest
   : SPLAT { $$ = PXP($1); }
   | NCName COLON SPLAT { $$ = pxpath_new_path(3, $1, $2, $3); }
   | QName
   ;
-
+
 NodeType
-  : XCOMMENT
-  | XTEXT
-  | XPI
+  : XCOMMENT
+  | XTEXT
+  | XPI
   | XNODE
   ;
-
+
 FunctionName
   : QName
   ;
@@ -428,7 +428,7 @@ QName
 PrefixedName
   : Prefix COLON LocalPart { $$ = pxpath_new_path(3, $1, $2, $3); }
   ;
-
+
 UnprefixedName
   : LocalPart
   ;
@@ -460,16 +460,16 @@ selector
   : simple_selector_sequence combinator selector { $$ = pxpath_cat_paths(3, $1, PXP($2), $3); }
   | simple_selector_sequence
   ;
-
+
 combinator
-  : OptS PLUS OptS { $$ = "/following-sibling::*[1]/self::"; }
-  | OptS GT OptS { $$ = "/"; }
-  | OptS TILDE OptS { $$ = "/following-sibling::*/self::"; }
+  : OptS PLUS OptS { $$ = "/following-sibling::*[1]/self::"; }
+  | OptS GT OptS { $$ = "/"; }
+  | OptS TILDE OptS { $$ = "/following-sibling::*/self::"; }
   | S { $$ = "//"; }
   ;

 simple_selector_sequence
-  : simple_selector_anchor
+  : simple_selector_anchor
   | possibly_empty_sequence LBRA type_selector OptS CXOPHE OptS StringLike OptS RBRA { $$ = pxpath_cat_paths(10, $1, PXP("[@"), $3, PXP(" = "), $7, PXP(" or starts-with(@"), $3, PXP(", concat("), $7, PXP(", '-' ))]")); }
   | possibly_empty_sequence CXNOT LPAREN selectors_group RPAREN { $$ = pxpath_cat_paths(5, PXP("set-difference("), $1, PXP(", "), $4, PXP(")")); }
   | possibly_empty_sequence HASH Ident { $$ = P4E($1, "[@id='", $3, "']"); }
@@ -519,7 +519,7 @@ simple_selector_sequence
   | possibly_empty_sequence CXBUTTON { $$ = INPUT_TYPE($1, button); }
   | possibly_empty_sequence CXFILE { $$ = INPUT_TYPE($1, file); }
   ;
-
+
 possibly_empty_sequence
   : simple_selector_sequence
   | { $$ = PXP("*"); }
@@ -532,7 +532,7 @@ simple_selector_anchor

 type_selector
   : namespace_prefix element_name { $$ = pxpath_cat_paths(3, $1, PXP(":"), $2); }
-  | element_name
+  | element_name
   ;

 namespace_prefix
@@ -561,7 +561,7 @@ Ident
   | BSLASHLIT Ident { $$ = pxpath_cat_paths(2, PXP($1 + 1), $2); }
   | keyword { $$ = PXP($1); }
   ;
-
+
 keyword
   : XANCESTOR
   | XANCESTORSELF
@@ -585,17 +585,17 @@ keyword
   | XPI
   | XNODE
   ;
-
+
 StringLike
-  : Ident
+  : Ident
   | STRING { $$ = pxpath_new_literal(1, $1); }
   ;

 OptS
-  : S { $$ = 0; }
+  : S { $$ = 0; }
   | { $$ = 0; }
   ;
-
+
 %%

 char* xpath_alias(char* key) {
@@ -609,7 +609,7 @@ void init_xpath_alias() {
   xmlHashAddEntry(alias_hash, "match", "regexp:match");
   xmlHashAddEntry(alias_hash, "replace", "regexp:replace");
   xmlHashAddEntry(alias_hash, "test", "regexp:test");
-  xmlHashAddEntry(alias_hash, "with-newlines", "lib:nl");
+  xmlHashAddEntry(alias_hash, "with-newlines", "lib:nl");
 }

 pxpathPtr myparse(char* string){
@@ -628,4 +628,4 @@ void answer(pxpathPtr a){
 void start_debugging(){
   yydebug = 1;
   return;
-}
\ No newline at end of file
+}
\ No newline at end of file
diff --git a/parsley.c b/parsley.c
index 940db4f..65929ff 100644
--- a/parsley.c
+++ b/parsley.c
@@ -34,7 +34,7 @@ struct ll {
   struct ll *next;
 };

-static char*
+static char*
 arepl(char* orig, char* old, char* new) {
   // printf("y\n");
   char* ptr = strdup(orig);
@@ -54,7 +54,7 @@ arepl(char* orig, char* old, char* new) {
 }

 static char *
-full_key_name(contextPtr c) {
+full_key_name(contextPtr c) {
   if(c == NULL || c->parent == NULL) return strdup("/");
   static struct ll * last = NULL;
   while(c->parent != NULL) {
@@ -140,9 +140,9 @@ parsedParsleyPtr parsley_parse_string(parsleyPtr parsley, char* string, size_t s
 static char *
 xpath_of(xmlNodePtr node) {
   if(node == NULL || node->name == NULL || node->parent == NULL) return strdup("/");
-
+
   struct ll * ptr = (struct ll *) calloc(sizeof(struct ll), 1);
-
+
   while(node->name != NULL && node->parent != NULL) {
     if(node->ns == NULL) {
       struct ll * tmp = (struct ll *) calloc(sizeof(struct ll), 1);
@@ -152,7 +152,7 @@ xpath_of(xmlNodePtr node) {
     }
     node = node->parent;
   }
-
+
   struct printbuf *buf = printbuf_new();
   sprintbuf(buf, "");
   while(ptr->name != NULL) {
@@ -162,7 +162,7 @@ xpath_of(xmlNodePtr node) {
     free(last);
   }
   free(ptr);
-
+
   char *str = strdup(strlen(buf->buf) ? buf->buf : "/");
   printbuf_free(buf);
   return str;
@@ -180,7 +180,7 @@ int compare_pos (const void * a, const void * b)
   return atoi(as) - atoi(bs);
 }

-static void
+static void
 _xmlAddChild(xmlNodePtr parent, xmlNodePtr child) {
   xmlNodePtr node = parent->children;
   if(node == NULL) {
@@ -193,7 +193,7 @@ _xmlAddChild(xmlNodePtr parent, xmlNodePtr child) {
   node->next = child;
 }

-static int
+static int
 _xmlChildElementCount(xmlNodePtr n) {
   xmlNodePtr child = n->children;
   int i = 0;
@@ -205,7 +205,7 @@ _xmlChildElementCount(xmlNodePtr n) {
 }

 static bool
-xml_empty(xmlNodePtr xml) {
+xml_empty(xmlNodePtr xml) {
   // fprintf(stderr, "%s\n", xml->name);
   xmlNodePtr child = xml->children;
   while(child != NULL) {
@@ -217,8 +217,8 @@ xml_empty(xmlNodePtr xml) {
   return true;
 }

-static void
-collate(xmlNodePtr xml) {
+static void
+collate(xmlNodePtr xml) {
   if(xml == NULL) return ;
   if(xml->type != XML_ELEMENT_NODE) return;
   if(xml->ns != NULL && !strcmp(xml->ns->prefix, "parsley") && !strcmp(xml->name, "zipped")){
@@ -226,13 +226,13 @@ collate(xmlNodePtr xml) {
     xmlNodePtr child = xml->children;
     if(child == NULL) return;
     int n = _xmlChildElementCount(xml);
-
+
     xmlNodePtr* name_nodes = calloc(n, sizeof(xmlNodePtr));
     xmlNodePtr* lists = calloc(n, sizeof(xmlNodePtr));
     bool* empty = calloc(n, sizeof(bool));
     bool* multi = calloc(n, sizeof(bool));
     bool* optional = calloc(n, sizeof(bool));
-
+
     int len = 0;
     for(int i = 0; i < n; i++) {
       name_nodes[i] = child;
@@ -259,7 +259,7 @@ collate(xmlNodePtr xml) {

     xmlNodePtr* sortable = malloc(len * sizeof(xmlNodePtr));
     int j = 0;
-
+
     for(int i = 0; i < n; i++) {
       xmlNodePtr node = lists[i];
       while(node != NULL){
@@ -268,18 +268,18 @@ collate(xmlNodePtr xml) {
         node = node->next;
       }
     }
-
+
     for(int i = 0; i < len; i++) {
       sortable[i]->next = NULL;
     }
-
+
     qsort(sortable, len, sizeof(xmlNodePtr), compare_pos);
-
+
     xmlNodePtr groups = xml->parent;
     groups->children = NULL;
     xmlNodePtr group;
     xmlNodePtr* targets = calloc(sizeof(xmlNodePtr), n);
-
+
     for(j = 0; j < len; j++) {
       int i = sortable[j]->parent->extra;
       if (j == 0 || (!empty[i] && !multi[i] && !optional[i])) { // first or full
@@ -292,12 +292,12 @@ collate(xmlNodePtr xml) {
           if(multi[k]) targets[k] = xmlNewChild(targets[k], xml->ns, "groups", NULL);
         }
       }
-
+
       if(!multi[i]) sortable[j] = sortable[j]->children;
       if(empty[i] || multi[i]) _xmlAddChild(targets[i], sortable[j]);
       empty[i] = false;
     }
-
+
     free(targets);
     free(name_nodes);
     free(lists);
@@ -305,7 +305,7 @@ collate(xmlNodePtr xml) {
     free(empty);
     free(multi);
     free(sortable);
-
+
     collate(groups);
   } else {
     xmlNodePtr child = xml->children;
@@ -325,7 +325,7 @@ static void
 unlink(xmlNodePtr parent, xmlNodePtr child) {
   if(child == NULL || parent == NULL) return;
   xmlNodePtr ptr = parent->children;
-
+
   if(ptr == child) {
     parent->children = child->next;
     if(child->next != NULL) {
@@ -339,7 +339,7 @@ unlink(xmlNodePtr parent, xmlNodePtr child) {
       }
       ptr = ptr->next;
     }
-  }
+  }
   child->next = NULL;
   child->prev = NULL;
   child->parent = NULL;
@@ -348,8 +348,8 @@ unlink(xmlNodePtr parent, xmlNodePtr child) {

 static void visit(parsedParsleyPtr ptr, xmlNodePtr xml, char* err);

-static void
-prune(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {
+static void
+prune(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {
   if(xml == NULL || is_root(xml)) return;
   bool optional = xmlGetProp(xml, "optional") != NULL;
   if(optional) {
@@ -372,23 +372,23 @@ prune(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {
 }

 static void
-visit(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {
+visit(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {
   if(xml == NULL) return;
   // printf("trying to visit: %s\n", xml->name);
   if(xml->type != XML_ELEMENT_NODE) return;
   xmlNodePtr child = xml->children;
   xmlNodePtr parent = xml->parent;
   if(parent == NULL) return;
-
+
   // printf("passed guard clause: %s\n", xml->name);
-
+
   if(xml_empty(xml)) {
     if(err == NULL) asprintf(&err, "%s was empty", xpath_of(xml));
-
+
     prune(ptr, xml, err);
   } else if(err != NULL) {
     free(err);
-  }
+  }
   while(err == NULL && child != NULL){
     child->parent = xml;
     visit(ptr, child, err);
@@ -398,7 +398,7 @@ visit(parsedParsleyPtr ptr, xmlNodePtr xml, char* err) {

 static parsedParsleyPtr current_ptr = NULL;

-void
+void
 parsleyXsltError(void * ctx, const char * msg, ...) {
   if(current_ptr == NULL) return;
   va_list ap;
@@ -409,7 +409,7 @@ parsleyXsltError(void * ctx, const char * msg, ...) {
   vasprintf(&tmp, msg, ap);
   tmp2 = arepl(tmp, "xmlXPathCompOpEval: ", "");
   current_ptr->error = arepl(tmp2, "\n", "");
-
+
   free(tmp);
   free(tmp2);
 }
@@ -421,7 +421,7 @@ hasDefaultNS(xmlDocPtr doc) {
   return xmlSearchNs(doc, doc->children, NULL) != NULL;
 }

-static void
+static void
 _killDefaultNS(xmlNodePtr node) {
   if(node == NULL) return;

@@ -433,7 +433,7 @@ _killDefaultNS(xmlNodePtr node) {
       if(ns->prefix == NULL) prev->next = ns->next;
     }
   }
-
+
   ns = node->ns;
   if(ns != NULL) {
     if(ns->prefix == NULL) node->ns = ns->next;
@@ -442,12 +442,12 @@ _killDefaultNS(xmlNodePtr node) {
       if(ns->prefix == NULL) prev->next = ns->next;
     }
   }
-
+
   _killDefaultNS(node->children);
   _killDefaultNS(node->next);
 }

-void
+void
 killDefaultNS(xmlDocPtr doc) {
   if(hasDefaultNS(doc)) {
     _killDefaultNS(doc->children);
@@ -456,23 +456,23 @@ killDefaultNS(xmlDocPtr doc) {

 parsedParsleyPtr parsley_parse_doc(parsleyPtr parsley, xmlDocPtr doc, int flags) {
   killDefaultNS(doc);
-
+
   parsedParsleyPtr ptr = (parsedParsleyPtr) calloc(sizeof(parsed_parsley), 1);
   ptr->error = NULL;
   ptr->parsley = parsley;
-
+
   parsley_io_set_mode(flags);
   xsltTransformContextPtr ctxt = xsltNewTransformContext(parsley->stylesheet, doc);
   xmlSetGenericErrorFunc(ctxt, parsleyXsltError);
   current_ptr = ptr;
-
-  if(flags & PARSLEY_OPTIONS_SGWRAP) {
+
+  if(flags & PARSLEY_OPTIONS_SGWRAP) {
     doc = parsley_apply_span_wrap(doc);
   }
   ptr->xml = xsltApplyStylesheetUser(parsley->stylesheet, doc, NULL, NULL, NULL, ctxt);
   xsltFreeTransformContext(ctxt);
   current_ptr = NULL;
-
+
   if(ptr->error == NULL) {
     if(ptr->xml != NULL && ptr->error == NULL) {
       if(flags & PARSLEY_OPTIONS_COLLATE) collate(ptr->xml->children);
@@ -485,11 +485,11 @@ parsedParsleyPtr parsley_parse_doc(parsleyPtr parsley, xmlDocPtr doc, int flags)
   return ptr;
 }

-static bool
+static bool
 json_invalid_object(parsleyPtr ptr, struct json_object *json) {
   json_object_object_foreach(json, key, val) {
     if(val==NULL) ptr->error = strdup("Parselets can only be made up of strings, arrays, and objects.");
-
+
     switch(json_object_get_type(val)) {
       case json_type_string:
         break;
@@ -523,7 +523,7 @@ json_invalid_object(parsleyPtr ptr, struct json_object *json) {
   return false;
 }

-static bool
+static bool
 json_invalid(parsleyPtr ptr, struct json_object *json) {
   if(!json_object_is_type(json, json_type_object)) {
     ptr->error = strdup("The parselet root must be an object");
@@ -537,7 +537,7 @@ static void free_context(contextPtr c) {
   if(c->tag != NULL) free(c->tag);
   if(c->filter != NULL) pxpath_free(c->filter);
   if(c->expr != NULL) pxpath_free(c->expr);
-
+
   if(c->parent != NULL && c->parent->child != NULL) {
     if(c->parent->child == c) {
       c->parent->child = NULL;
@@ -557,7 +557,7 @@ static void free_context(contextPtr c) {
   free(c);
 }

-static contextPtr
+static contextPtr
 new_context(struct json_object * json, xmlNodePtr node) {
   contextPtr c = calloc(sizeof(parsley_context), 1);
   c->node = node;
@@ -570,46 +570,46 @@ new_context(struct json_object * json, xmlNodePtr node) {

 parsleyPtr parsley_compile(char* parsley_str, char* incl) {
   parsleyPtr parsley = (parsleyPtr) calloc(sizeof(compiled_parsley), 1);
-
+
   if(last_parsley_error != NULL) {
     free(last_parsley_error);
     last_parsley_error = NULL;
   }
-
+
   registerEXSLT();
-
+
   // struct json_tokener *tok = json_tokener_new();
   // struct json_object *json = json_tokener_parse_ex(tok, parsley_str);
-  //
+  //
   struct json_tokener *tok = json_tokener_new();
   struct json_object *json = json_tokener_parse_ex(tok, parsley_str, -1);
   int error_offset = tok->char_offset;
   if(tok->err != json_tokener_success) json = error_ptr(-tok->err);
json_tokener_free(tok); - + if(is_error(json)) { asprintf(&(parsley->error), "Your parselet is not valid json: %s at char:%d", json_tokener_errors[-(unsigned long) json], error_offset); return parsley; } - + if(json_invalid(parsley, json)) { // fprintf(stderr, "Invalid parselet structure: %s\n", parsley->error); return parsley; } xmlNodePtr node = new_stylesheet_skeleton(incl); - + contextPtr context = new_context(json, node); __parsley_recurse(context); - + json_object_put(json); // frees json parsley->error = last_parsley_error; - + if(parsley->error == NULL) { parsley->stylesheet = xsltParseStylesheetDoc(node->doc); } - + free_context(context); return parsley; } @@ -640,9 +640,9 @@ contextPtr deeper_context(contextPtr context, char* key, struct json_object * va } void parsley_free(parsleyPtr ptr) { - if(ptr->error != NULL) + if(ptr->error != NULL) free(ptr->error); - if(ptr->stylesheet != NULL) + if(ptr->stylesheet != NULL) xsltFreeStylesheet(ptr->stylesheet); free(ptr); } @@ -655,8 +655,8 @@ void yyerror(const char * s) { printbuf_free(buf); } -static bool -all_strings(struct json_object * json) { +static bool +all_strings(struct json_object * json) { json_object_object_foreach(json, key, val) { if(val == NULL || !json_object_is_type(val, json_type_string)) return false; } @@ -682,7 +682,7 @@ static char * inner_path_to_dot(pxpathPtr p) { char *outer = pxpath_to_string(p); char *inner = inner_path(p); - char *repl = NULL; + char *repl = NULL; if(inner != NULL) { repl = arepl(outer, inner, "."); free(inner); @@ -696,18 +696,18 @@ inner_path_transform(contextPtr c) { return c->filter == NULL && c->expr != NULL && inner_path(c->expr) != NULL; } -static char * +static char * resolve_filter(contextPtr c) { return inner_path_transform(c) ? inner_path(c->expr) : pxpath_to_string(c->filter); } -static char * -resolve_expr(contextPtr c) { +static char * +resolve_expr(contextPtr c) { return inner_path_transform(c) ? 
inner_path_to_dot(c->expr) : pxpath_to_string(c->expr); } static void -render(contextPtr c) { +render(contextPtr c) { char *filter = resolve_filter(c); char *expr = resolve_expr(c); char *scope = filter == NULL ? expr : filter; @@ -719,7 +719,7 @@ render(contextPtr c) { // printf("node %s\n", c->node->name); xmlNsPtr parsley = c->ns; xmlNsPtr xsl = xmlDocGetRootElement(c->node->doc)->ns; - + if(c->array) c->node = xmlNewChild(c->node, parsley, "groups", NULL); if(filtered) { c->node = xmlNewChild(c->node, xsl, "for-each", NULL); @@ -737,7 +737,7 @@ render(contextPtr c) { xmlSetProp(attr, "name", "position"); xmlNodePtr counter = xmlNewChild(attr, xsl, "value-of", NULL); xmlSetProp(counter, "select", "count(preceding::*) + count(ancestor::*)"); - + if(c->string) { c->node = xmlNewChild(c->node, xsl, "value-of", NULL); xmlSetProp(c->node, "select", expr); @@ -745,7 +745,7 @@ render(contextPtr c) { if(magic_children) c->node = xmlNewChild(c->node, parsley, "zipped", NULL); __parsley_recurse(c); } - + if(filter !=NULL) free(filter); if(expr !=NULL) free(expr); } @@ -760,7 +760,7 @@ void __parsley_recurse(contextPtr context) { c->node = xmlAddChild(c->node, xmlNewNode(NULL, c->tag)); if (c->flags & PARSLEY_OPTIONAL) xmlSetProp(c->node, "optional", "true"); render(c); - } + } } @@ -769,12 +769,12 @@ void __parsley_recurse(contextPtr context) { // char* merged = arepl(expr, ".", context->full_expr); // return arepl(merged, "///", "//"); // } -static char* +static char* inner_key_each(struct json_object * json); static char* inner_key_of(struct json_object * json) { switch(json_object_get_type(json)) { - case json_type_string: + case json_type_string: return json_object_get_string(json); case json_type_array: return NULL; @@ -783,7 +783,7 @@ static char* inner_key_of(struct json_object * json) { } } -static char* +static char* inner_key_each(struct json_object * json) { json_object_object_foreach(json, key, val) { char* inner = inner_key_of(val); diff --git a/parsley.h 
b/parsley.h index ac927c6..36906fd 100644 --- a/parsley.h +++ b/parsley.h @@ -83,5 +83,5 @@ void parsleyXsltError(void * ctx, const char * msg, ...); void parsley_set_user_agent(char const *agent); static contextPtr parsley_parsing_context; - + #endif \ No newline at end of file diff --git a/parsley_main.c b/parsley_main.c index 32c4d02..665dbec 100644 --- a/parsley_main.c +++ b/parsley_main.c @@ -59,7 +59,7 @@ static error_t parse_opt (int key, char *arg, struct argp_state *state) struct list_elem *e; switch (key) - { + { case 'x': arguments->flags &= ~PARSLEY_OPTIONS_HTML; break; @@ -129,12 +129,12 @@ int main (int argc, char **argv) { arguments.include_files = elemptr; arguments.output_file = "-"; argp_parse (&argp, argc, argv, 0, 0, &arguments); - + struct printbuf *buf = printbuf_new(); struct printbuf *incl = printbuf_new(); sprintbuf(buf, ""); sprintbuf(incl, ""); - + FILE * fd = parsley_fopen(arguments.parsley, "r"); printbuf_file_read(fd, buf); fclose(fd); @@ -145,8 +145,8 @@ int main (int argc, char **argv) { printbuf_file_read(f, incl); fclose(f); } - - // printf("a\n"); + + // printf("a\n"); parsleyPtr compiled = parsley_compile(buf->buf, incl->buf); // printf("b\n"); @@ -154,16 +154,16 @@ int main (int argc, char **argv) { fprintf(stderr, "%s\n", compiled->error); exit(1); } - + parsedParsleyPtr ptr = parsley_parse_file(compiled, arguments.input_file, arguments.flags); if(ptr->error != NULL) { fprintf(stderr, "Parsing failed: %s\n", ptr->error); exit(1); } - + if(arguments.output_xml) { - xmlSaveFormatFile(arguments.output_file, ptr->xml, 1); + xmlSaveFormatFile(arguments.output_file, ptr->xml, 1); } else { struct json_object *json = xml2json(ptr->xml->children->children); if(json == NULL) { @@ -176,7 +176,7 @@ int main (int argc, char **argv) { json_object_put(json); fclose(f); } - + printbuf_free(buf); printbuf_free(incl); parsley_free(compiled); diff --git a/parsleyc_main.c b/parsleyc_main.c index 151cf2f..ed50291 100644 --- a/parsleyc_main.c +++ 
b/parsleyc_main.c @@ -46,7 +46,7 @@ static error_t parse_opt (int key, char *arg, struct argp_state *state) base->next = e; base->has_next = 1; break; - case 'd': + case 'd': // parsley_set_debug_mode(1); break; case 'o': @@ -72,19 +72,19 @@ int main (int argc, char **argv) { struct list_elem elem; struct list_elem *elemptr = &elem; elem.has_next = 0; - + arguments.include_files = elemptr; arguments.output_file = "-"; arguments.parsley = "-"; argp_parse (&argp, argc, argv, 0, 0, &arguments); - + struct printbuf* parsley = printbuf_new(); struct printbuf* incl = printbuf_new(); sprintbuf(parsley, ""); sprintbuf(incl, ""); FILE* in = parsley_fopen(arguments.parsley, "r"); - + printbuf_file_read(in, parsley); while(elemptr->has_next) { elemptr = elemptr->next; @@ -92,17 +92,17 @@ int main (int argc, char **argv) { printbuf_file_read(f, incl); fclose(f); } - + parsleyPtr compiled = parsley_compile(parsley->buf, incl->buf); if(compiled->error != NULL) { fprintf(stderr, "%s\n", compiled->error); exit(1); } - + FILE* fo = parsley_fopen(arguments.output_file, "w"); xmlDocFormatDump(fo, compiled->stylesheet->doc, 1); fclose(fo); - + return 0; } diff --git a/regexp.c b/regexp.c index e1858d8..5a96bb9 100644 --- a/regexp.c +++ b/regexp.c @@ -19,7 +19,7 @@ #include "regexp.h" static void -exsltRegexpFlagsFromString(const xmlChar* flagstr, +exsltRegexpFlagsFromString(const xmlChar* flagstr, int* global, int* flags) { const xmlChar* i = flagstr; @@ -38,7 +38,7 @@ exsltRegexpFlagsFromString(const xmlChar* flagstr, } static int -exsltRegexpExecute(xmlXPathParserContextPtr ctxt, +exsltRegexpExecute(xmlXPathParserContextPtr ctxt, const xmlChar* haystack, const xmlChar* regexp, int flags, int ovector[], int ovector_len) { @@ -75,8 +75,8 @@ exsltRegexpExecute(xmlXPathParserContextPtr ctxt, "exslt:regexp failed to execute %s for %s", regexp, haystack); rc = 0; } - - if (compiled_regexp != NULL) + + if (compiled_regexp != NULL) pcre_free(compiled_regexp); return rc; @@ -84,7 +84,7 @@ 
exsltRegexpExecute(xmlXPathParserContextPtr ctxt, /** * exsltRegexpMatchFunction: - * @ns: + * @ns: * * Returns a node set of string matches */ @@ -113,7 +113,7 @@ exsltRegexpMatchFunction (xmlXPathParserContextPtr ctxt, int nargs) } else { flagstr = xmlStrdup(""); } - + regexp = xmlXPathPopString(ctxt); if (xmlXPathCheckError(ctxt) || (regexp == NULL)) { xmlFree(flagstr); @@ -140,11 +140,11 @@ exsltRegexpMatchFunction (xmlXPathParserContextPtr ctxt, int nargs) xsltRegisterTmpRVT(tctxt, container); ret = xmlXPathNewNodeSet(NULL); if (ret != NULL) { - ret->boolval = 0; + ret->boolval = 0; exsltRegexpFlagsFromString(flagstr, &global, &flags); working = haystack; - rc = exsltRegexpExecute(ctxt, working, regexp, flags, + rc = exsltRegexpExecute(ctxt, working, regexp, flags, ovector, sizeof(ovector)/sizeof(int)); while (rc > 0) { @@ -161,12 +161,12 @@ exsltRegexpMatchFunction (xmlXPathParserContextPtr ctxt, int nargs) if (!global) break; working = working + ovector[1]; - rc = exsltRegexpExecute(ctxt, working, regexp, flags, + rc = exsltRegexpExecute(ctxt, working, regexp, flags, ovector, sizeof(ovector)/sizeof(int)); } } } - + fail: if (flagstr != NULL) xmlFree(flagstr); @@ -183,7 +183,7 @@ exsltRegexpMatchFunction (xmlXPathParserContextPtr ctxt, int nargs) /** * exsltRegexpReplaceFunction: - * @ns: + * @ns: * * Returns a node set of string matches */ @@ -229,7 +229,7 @@ exsltRegexpReplaceFunction (xmlXPathParserContextPtr ctxt, int nargs) exsltRegexpFlagsFromString(flagstr, &global, &flags); working = haystack; - rc = exsltRegexpExecute(ctxt, working, regexp, flags, + rc = exsltRegexpExecute(ctxt, working, regexp, flags, ovector, sizeof(ovector)/sizeof(int)); while (rc > 0 ) { @@ -246,11 +246,11 @@ exsltRegexpReplaceFunction (xmlXPathParserContextPtr ctxt, int nargs) } result = xmlStrcat(result, replace); } - + working = working + ovector[1]; if (!global) break; - rc = exsltRegexpExecute(ctxt, working, regexp, flags, + rc = exsltRegexpExecute(ctxt, working, regexp, 
flags, ovector, sizeof(ovector)/sizeof(int)); } @@ -277,11 +277,11 @@ exsltRegexpReplaceFunction (xmlXPathParserContextPtr ctxt, int nargs) /** * exsltRegexpTestFunction: - * @ns: + * @ns: * - * returns true if the string given as the first argument + * returns true if the string given as the first argument * matches the regular expression given as the second argument - * + * */ static void @@ -323,7 +323,7 @@ exsltRegexpTestFunction (xmlXPathParserContextPtr ctxt, int nargs) regexp = xmlStrcat(regexp, "\\Z"); exsltRegexpFlagsFromString(flagstr, &global, &flags); - rc = exsltRegexpExecute(ctxt, haystack, regexp, flags, + rc = exsltRegexpExecute(ctxt, haystack, regexp, flags, ovector, sizeof(ovector)/sizeof(int)); fail: diff --git a/scanner.c b/scanner.c index 92db539..8e0791d 100644 --- a/scanner.c +++ b/scanner.c @@ -33,7 +33,7 @@ #if __STDC_VERSION__ >= 199901L /* C99 says to define __STDC_LIMIT_MACROS before including stdint.h, - * if you want the limit (max/min) macros for int types. + * if you want the limit (max/min) macros for int types. */ #ifndef __STDC_LIMIT_MACROS #define __STDC_LIMIT_MACROS 1 @@ -50,7 +50,7 @@ typedef uint32_t flex_uint32_t; typedef signed char flex_int8_t; typedef short int flex_int16_t; typedef int flex_int32_t; -typedef unsigned char flex_uint8_t; +typedef unsigned char flex_uint8_t; typedef unsigned short int flex_uint16_t; typedef unsigned int flex_uint32_t; #endif /* ! C99 */ @@ -160,7 +160,7 @@ extern FILE *yyin, *yyout; #define EOB_ACT_LAST_MATCH 2 #define YY_LESS_LINENO(n) - + /* Return all but the first "n" matched characters back to the input stream. */ #define yyless(n) \ do \ @@ -227,7 +227,7 @@ struct yy_buffer_state int yy_bs_lineno; /**< The line count. */ int yy_bs_column; /**< The column count. */ - + /* Whether to try to fill the input buffer when we reach the * end of it. 
*/ @@ -692,13 +692,13 @@ char *yytext; YY_BUFFER_STATE mybuffer; void prepare_parse(char* msg) { - mybuffer = yy_scan_string(msg); + mybuffer = yy_scan_string(msg); } void cleanup_parse() { - yy_delete_buffer(mybuffer); + yy_delete_buffer(mybuffer); } - + #line 705 "scanner.c" @@ -734,7 +734,7 @@ extern int yywrap (void ); #endif static void yyunput (int c,char *buf_ptr ); - + #ifndef yytext_ptr static void yy_flex_strncpy (char *,yyconst char *,int ); #endif @@ -756,13 +756,13 @@ static int input (void ); static int yy_start_stack_ptr = 0; static int yy_start_stack_depth = 0; static int *yy_start_stack = NULL; - + static void yy_push_state (int new_state ); - + static void yy_pop_state (void ); - + static int yy_top_state (void ); - + /* Amount of stuff to slurp up with each read. */ #ifndef YY_READ_BUF_SIZE #define YY_READ_BUF_SIZE 8192 @@ -865,7 +865,7 @@ YY_DECL register yy_state_type yy_current_state; register char *yy_cp, *yy_bp; register int yy_act; - + #line 123 "scanner.l" #line 872 "scanner.c" @@ -1726,7 +1726,7 @@ static int yy_get_next_buffer (void) { register yy_state_type yy_current_state; register char *yy_cp; - + yy_current_state = (yy_start); for ( yy_cp = (yytext_ptr) + YY_MORE_ADJ; yy_cp < (yy_c_buf_p); ++yy_cp ) @@ -1780,7 +1780,7 @@ static int yy_get_next_buffer (void) static void yyunput (int c, register char * yy_bp ) { register char *yy_cp; - + yy_cp = (yy_c_buf_p); /* undo effects of setting up yytext */ @@ -1823,7 +1823,7 @@ static int yy_get_next_buffer (void) { int c; - + *(yy_c_buf_p) = (yy_hold_char); if ( *(yy_c_buf_p) == YY_END_OF_BUFFER_CHAR ) @@ -1890,12 +1890,12 @@ static int yy_get_next_buffer (void) /** Immediately switch to a different input stream. * @param input_file A readable stream. - * + * * @note This function does not reset the start condition to @c INITIAL . */ void yyrestart (FILE * input_file ) { - + if ( ! 
YY_CURRENT_BUFFER ){ yyensure_buffer_stack (); YY_CURRENT_BUFFER_LVALUE = @@ -1908,11 +1908,11 @@ static int yy_get_next_buffer (void) /** Switch to a different input buffer. * @param new_buffer The new input buffer. - * + * */ void yy_switch_to_buffer (YY_BUFFER_STATE new_buffer ) { - + /* TODO. We should be able to replace this entire function body * with * yypop_buffer_state(); @@ -1952,13 +1952,13 @@ static void yy_load_buffer_state (void) /** Allocate and initialize an input buffer state. * @param file A readable stream. * @param size The character buffer size in bytes. When in doubt, use @c YY_BUF_SIZE. - * + * * @return the allocated buffer state. */ YY_BUFFER_STATE yy_create_buffer (FILE * file, int size ) { YY_BUFFER_STATE b; - + b = (YY_BUFFER_STATE) yyalloc(sizeof( struct yy_buffer_state ) ); if ( ! b ) YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" ); @@ -1981,11 +1981,11 @@ static void yy_load_buffer_state (void) /** Destroy the buffer. * @param b a buffer created with yy_create_buffer() - * + * */ void yy_delete_buffer (YY_BUFFER_STATE b ) { - + if ( ! b ) return; @@ -2001,7 +2001,7 @@ static void yy_load_buffer_state (void) #ifndef __cplusplus extern int isatty (int ); #endif /* __cplusplus */ - + /* Initializes or reinitializes a buffer. * This function is sometimes called more than once on the same buffer, * such as during a yyrestart() or at EOF. @@ -2010,7 +2010,7 @@ extern int isatty (int ); { int oerrno = errno; - + yy_flush_buffer(b ); b->yy_input_file = file; @@ -2026,13 +2026,13 @@ extern int isatty (int ); } b->yy_is_interactive = file ? (isatty( fileno(file) ) > 0) : 0; - + errno = oerrno; } /** Discard all buffered characters. On the next scan, YY_INPUT will be called. * @param b the buffer state to be flushed, usually @c YY_CURRENT_BUFFER. - * + * */ void yy_flush_buffer (YY_BUFFER_STATE b ) { @@ -2061,7 +2061,7 @@ extern int isatty (int ); * the current state. This function will allocate the stack * if necessary. 
* @param new_buffer The new state. - * + * */ void yypush_buffer_state (YY_BUFFER_STATE new_buffer ) { @@ -2091,7 +2091,7 @@ void yypush_buffer_state (YY_BUFFER_STATE new_buffer ) /** Removes and deletes the top of the stack, if present. * The next element becomes the new top. - * + * */ void yypop_buffer_state (void) { @@ -2115,7 +2115,7 @@ void yypop_buffer_state (void) static void yyensure_buffer_stack (void) { int num_to_alloc; - + if (!(yy_buffer_stack)) { /* First allocation is just for 2 elements, since we don't know if this @@ -2126,9 +2126,9 @@ static void yyensure_buffer_stack (void) (yy_buffer_stack) = (struct yy_buffer_state**)yyalloc (num_to_alloc * sizeof(struct yy_buffer_state*) ); - + memset((yy_buffer_stack), 0, num_to_alloc * sizeof(struct yy_buffer_state*)); - + (yy_buffer_stack_max) = num_to_alloc; (yy_buffer_stack_top) = 0; return; @@ -2154,13 +2154,13 @@ static void yyensure_buffer_stack (void) /** Setup the input buffer state to scan directly from a user-specified character buffer. * @param base the character buffer * @param size the size in bytes of the character buffer - * - * @return the newly allocated buffer state object. + * + * @return the newly allocated buffer state object. */ YY_BUFFER_STATE yy_scan_buffer (char * base, yy_size_t size ) { YY_BUFFER_STATE b; - + if ( size < 2 || base[size-2] != YY_END_OF_BUFFER_CHAR || base[size-1] != YY_END_OF_BUFFER_CHAR ) @@ -2189,14 +2189,14 @@ YY_BUFFER_STATE yy_scan_buffer (char * base, yy_size_t size ) /** Setup the input buffer state to scan a string. The next call to yylex() will * scan from a @e copy of @a str. * @param str a NUL-terminated string to scan - * + * * @return the newly allocated buffer state object. * @note If you want to scan bytes that may contain NUL values, then use * yy_scan_bytes() instead. 
*/ YY_BUFFER_STATE yy_scan_string (yyconst char * yystr ) { - + return yy_scan_bytes(yystr,strlen(yystr) ); } @@ -2204,7 +2204,7 @@ YY_BUFFER_STATE yy_scan_string (yyconst char * yystr ) * scan from a @e copy of @a bytes. * @param bytes the byte buffer to scan * @param len the number of bytes in the buffer pointed to by @a bytes. - * + * * @return the newly allocated buffer state object. */ YY_BUFFER_STATE yy_scan_bytes (yyconst char * yybytes, int _yybytes_len ) @@ -2213,7 +2213,7 @@ YY_BUFFER_STATE yy_scan_bytes (yyconst char * yybytes, int _yybytes_len ) char *buf; yy_size_t n; int i; - + /* Get memory for full buffer, including space for trailing EOB's. */ n = _yybytes_len + 2; buf = (char *) yyalloc(n ); @@ -2305,16 +2305,16 @@ static void yy_fatal_error (yyconst char* msg ) /* Accessor methods (get/set functions) to struct members. */ /** Get the current line number. - * + * */ int yyget_lineno (void) { - + return yylineno; } /** Get the input stream. - * + * */ FILE *yyget_in (void) { @@ -2322,7 +2322,7 @@ FILE *yyget_in (void) } /** Get the output stream. - * + * */ FILE *yyget_out (void) { @@ -2330,7 +2330,7 @@ FILE *yyget_out (void) } /** Get the length of the current token. - * + * */ int yyget_leng (void) { @@ -2338,7 +2338,7 @@ int yyget_leng (void) } /** Get the current token. - * + * */ char *yyget_text (void) @@ -2348,18 +2348,18 @@ char *yyget_text (void) /** Set the current line number. * @param line_number - * + * */ void yyset_lineno (int line_number ) { - + yylineno = line_number; } /** Set the input stream. This does not discard the current * input buffer. * @param in_str A readable stream. - * + * * @see yy_switch_to_buffer */ void yyset_in (FILE * in_str ) @@ -2417,7 +2417,7 @@ static int yy_init_globals (void) /* yylex_destroy is for both reentrant and non-reentrant scanners. */ int yylex_destroy (void) { - + /* Pop the buffer stack, destroying each element. 
*/ while(YY_CURRENT_BUFFER){ yy_delete_buffer(YY_CURRENT_BUFFER ); diff --git a/scanner.l b/scanner.l index 8df4c39..27e5a16 100644 --- a/scanner.l +++ b/scanner.l @@ -5,13 +5,13 @@ YY_BUFFER_STATE mybuffer; void prepare_parse(char* msg) { - mybuffer = yy_scan_string(msg); + mybuffer = yy_scan_string(msg); } void cleanup_parse() { - yy_delete_buffer(mybuffer); + yy_delete_buffer(mybuffer); } - + %} %option stack @@ -69,7 +69,7 @@ XAND "and" XDIV "div" XMOD "mod" XCOMMENT "comment" -XTEXT "text" +XTEXT "text" XPI "processing-instruction" XNODE "node" CXEQUATION [0-9]+n diff --git a/test/ambiguous.html b/test/ambiguous.html index 23142fd..9469833 100644 --- a/test/ambiguous.html +++ b/test/ambiguous.html @@ -16,10 +16,10 @@
@@ -57,7 +57,7 @@ title="check this box to search only posting titles"> only search titles - + @@ -194,7 +194,7 @@

Wed Jan 07

RSS (?)
- add to My Yahoo! + add to My Yahoo!


diff --git a/test/collate_regression.html b/test/collate_regression.html index bf39b8e..8c96b71 100644 --- a/test/collate_regression.html +++ b/test/collate_regression.html @@ -42,7 +42,7 @@ - Fleishman-Hillard - Careers + Fleishman-Hillard - Careers