fizx · GunioRobot · Oct 27, 2011
diff --git a/INSTALL b/INSTALL
@@ -1,6 +1,6 @@
 Welcome to Parsley!
 
-Parsley depends on 
+Parsley depends on
 - argp (standard with Linux, other platforms use argp-standalone package)
 - the JSON C library from http://oss.metaparadigm.com/json-c/ (I used 0.8)
 - pcre (with dev headers)
@@ -12,17 +12,17 @@ Here's how to install it:
 
 1. Get the release
 ------------------------------------------------------------------------
-Parsley is currently still being tracked in git, and isn't ready to make a 
+Parsley is currently still being tracked in git, and isn't ready to make a
 formal release. So you need to either clone or download the latest tarball:
 
 git clone git://github.com/fizx/parsley.git
-or 
+or
 wget http://github.com/fizx/parsley/tarball/master
 
 
 2. Build for your platform
 ------------------------------------------------------------------------
-Enter your parsley working directory, (from the clone or download you 
+Enter your parsley working directory, (from the clone or download you
 just made) and, based on your platform, do the following:
 
 
@@ -56,7 +56,7 @@ make
 sudo make install
 
 If you have a few extra minutes, consider replacing the last make with a
-'make check' and let us know if it reports any failures from the test 
+'make check' and let us know if it reports any failures from the test
 suite - thanks!
 
 3. Ruby Binding (via Gems)

diff --git a/INTRO b/INTRO
@@ -17,17 +17,17 @@ In order to make this easy to learn, let's keep the best of what's working today
 
 Now for some examples:
 
--	3rd paragraph: 
+-	3rd paragraph:
 	p:nth-child(3)
 - First sentence in that paragraph (period-delimited):
 	substring-before(p:nth-child(3), '.')
 - Any simple phone number in an ordered list called "numbers"
 	re:match(ul#numbers>li, '\d{3}-\d{4}', 'g')
-	
+
 We support all of CSS3, XPath1, as well as all functions in XSLT 1.0 and EXSLT (required+regexp).
 
 I think this is a pretty good way to grab a single piece of data from a page.  It's simple and gives you all of the tools (CSS for simplicity, XPath for power, regex for detailed text handling) you are used to, in one expression.
-	
+
 We'd like to make our scraper script both portable and fast.  For both these reasons, we need to be able to express the structure of the scraped data independently of the general-purpose programming language you happen to be working in.  Jumping from XPath to Python and back means multiple passes over the document, and Python idioms prevent easy use of your scraper by Rubyists.  If we can represent the entire scrape in a language-independent way, we can compile it into something that libxml2 can handle in one pass, giving screaming-fast (milliseconds per parse) performance.
 
 To describe the output structure, lets use json.  It's compact, and the Ruby/Python/etc bindings can use hashes/lists/dictionaries to represent the same structure.  We can also have the scraper output json or native data structures.  Here's an example script that grabs the title and all hyperlinks on a page:
@@ -36,21 +36,21 @@ To describe the output structure, lets use json.  It's compact, and the Ruby/Pyt
 		  "title": "h1",
 		  "links": ["a"]
 		}
-		
+
 Applying this to http://www.yelp.com/biz/amnesia-san-francisco yields:
 
 		{
 		  "title": "Amnesia",
 		  "links": ["Yelp", "Welcome", "About Me", ... ]
 		}
-		
+
 You'll note that the output structure mirrors the input structure.  In the Ruby binding, you can get both input and output natively:
 
 		> require "open-uri"
 		> require "parsley"
 		> Parsley.new({"title" => "h1", "links" => ["a"]}).parse(:url => "http://www.yelp.com/biz/amnesia-san-francisco")
 		#=> {"title"=>"Amnesia", "links"=>["Yelp", "Welcome", "About Me"]}
-		
+
 We'll also add both explicit and implicit grouping  Here's an extension of the previous example with explicit grouping:
 
 		{
@@ -60,7 +60,7 @@ We'll also add both explicit and implicit grouping  Here's an extension of the p
 				"link": "@href"
 			}]
 		}
-		
+
 The json structure in the output still mirrors the input, but now you can get both the link text and the href.
 
 Pages like craigslist are slightly trickier to group.  Elements on this page go h4, p, p, p, h4, p, p, p. To group this, you could do:
@@ -72,13 +72,13 @@ Pages like craigslist are slightly trickier to group.  Elements on this page go
 			}]
 		}
 
-If you instead wanted to group by date, you could use implicit grouping.  It's implicit, because the parenthesized filter is omitted.  Grouping happens by page order. We treat the first single (i.e. non-square-bracketed) value (the h4 in the below example) as the beginning of a new group, and adds following values to the group (i.e.: [h4, p, p, p], [h4, p, p], [h4, p]).  
+If you instead wanted to group by date, you could use implicit grouping.  It's implicit, because the parenthesized filter is omitted.  Grouping happens by page order. We treat the first single (i.e. non-square-bracketed) value (the h4 in the below example) as the beginning of a new group, and adds following values to the group (i.e.: [h4, p, p, p], [h4, p, p], [h4, p]).
 
 		{
 			"entry":[{
 				"date": "h4",
 				"title": ["p"]
 			}]
 		}
-		
+
 </textarea></html>
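The implicit grouping rule described in the INTRO hunk above (a single, non-bracketed value opens a new group, and later bracketed values join the most recent group in page order) can be sketched in plain Ruby. This is an illustration only, not Parsley's implementation: `implicit_groups`, its `single:`/`multi:` parameters, and the sample `page` data are hypothetical names introduced here. It also assumes that values appearing before the first opener are dropped.

```ruby
# Hypothetical helper, not part of Parsley. `elements` is a flat list of
# [tag, text] pairs in page order. `single` names the tag that opens a group;
# `multi` names the tag whose values are collected into an array.
def implicit_groups(elements, single:, multi:)
  groups = []
  elements.each do |tag, text|
    if tag == single[:tag]
      # The single (non-bracketed) value starts a new group.
      groups << { single[:key] => text, multi[:key] => [] }
    elsif tag == multi[:tag] && !groups.empty?
      # Following values are appended to the most recent group.
      # Assumption: values seen before any opener are ignored.
      groups.last[multi[:key]] << text
    end
  end
  groups
end

# Sample data shaped like the h4, p, p, p, h4, p, p, h4, p sequence in the text.
page = [
  ["h4", "Mon"], ["p", "a"], ["p", "b"], ["p", "c"],
  ["h4", "Tue"], ["p", "d"], ["p", "e"],
  ["h4", "Wed"], ["p", "f"]
]

entries = implicit_groups(
  page,
  single: { tag: "h4", key: "date" },
  multi:  { tag: "p",  key: "title" }
)
```

With this input, `entries` holds three hashes whose `"title"` arrays have three, two, and one items, matching the [h4, p, p, p], [h4, p, p], [h4, p] grouping in the text.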
diff --git a/Makefile.am b/Makefile.am
@@ -21,7 +21,7 @@ profile:
 
 install-all:
 	./bootstrap.sh && ./configure && make && make install && cd ruby && rake install && cd ../python && python setup.py install
-	
+
 bench:
 	@echo "yelp..."; ./parsley test/yelp.let test/yelp.html > /dev/null
 	@echo "craigs-simple..."; ./parsley test/craigs-simple.let test/craigs-simple.html > /dev/null
@@ -73,4 +73,3 @@ check-am:
 	@echo "default-namespace..."; ./parsley -x test/default-namespace.let test/default-namespace.xml 2>&1 | diff test/default-namespace.json - && echo "    success."
 	@echo "sg-wrap..."; ./parsley -s test/sg-wrap.let test/sg-wrap.html 2>&1 | diff test/sg-wrap.json - && echo "    success."
 	@echo "collate_regression..."; ./parsley test/collate_regression.let test/collate_regression.html 2>&1 | diff test/collate_regression.json - && echo "    success."
-
diff --git a/Makefile.in b/Makefile.in
@@ -327,7 +327,7 @@ parser.h: parser.c
 	  rm -f parser.c; \
 	  $(MAKE) $(AM_MAKEFLAGS) parser.c; \
 	else :; fi
-libparsley.la: $(libparsley_la_OBJECTS) $(libparsley_la_DEPENDENCIES) 
+libparsley.la: $(libparsley_la_OBJECTS) $(libparsley_la_DEPENDENCIES)
 	$(LINK) -rpath $(libdir) $(libparsley_la_OBJECTS) $(libparsley_la_LIBADD) $(LIBS)
 install-binPROGRAMS: $(bin_PROGRAMS)
 	@$(NORMAL_INSTALL)
@@ -372,10 +372,10 @@ clean-binPROGRAMS:
 	list=`for p in $$list; do echo "$$p"; done | sed 's/$(EXEEXT)$$//'`; \
 	echo " rm -f" $$list; \
 	rm -f $$list
-parsley$(EXEEXT): $(parsley_OBJECTS) $(parsley_DEPENDENCIES) 
+parsley$(EXEEXT): $(parsley_OBJECTS) $(parsley_DEPENDENCIES)
 	@rm -f parsley$(EXEEXT)
 	$(LINK) $(parsley_OBJECTS) $(parsley_LDADD) $(LIBS)
-parsleyc$(EXEEXT): $(parsleyc_OBJECTS) $(parsleyc_DEPENDENCIES) 
+parsleyc$(EXEEXT): $(parsleyc_OBJECTS) $(parsleyc_DEPENDENCIES)
 	@rm -f parsleyc$(EXEEXT)
 	$(LINK) $(parsleyc_OBJECTS) $(parsleyc_LDADD) $(LIBS)
 

diff --git a/PAPER b/PAPER
@@ -27,7 +27,7 @@ Features
 Examples
 - Ruby/python/json
 - structural parse
-- 
+-
 
 Benchmarks
 - size comparision with XSLT

diff --git a/Portfile b/Portfile
@@ -14,5 +14,5 @@ depends_lib					port:argp-standalone \
 										port:json-c \
 										port:libxslt \
 										port:pcre
-										
+
 checksums 				md5 5e4d9080aa4ed2dfa7996c89a8e7f719 				sha1 9508eea67212d9a9620eac3fe3719c91e00e11d9 				rmd160 dfa9cee2fdb41ac750d47288d5128f1963a84334
diff --git a/Portfile.in b/Portfile.in
@@ -14,4 +14,4 @@ depends_lib					port:argp-standalone \
 										port:json-c \
 										port:libxslt \
 										port:pcre
-										
+
diff --git a/README.C-LANG b/README.C-LANG
@@ -1,6 +1,6 @@
 To use parsley from C, the following functions are available from parsley.h.  In
 addition, there is a function to convert xml documents of the type returned by
-parsley into json.  
+parsley into json.
 
 You will also need passing familiarity with libxml2 and json-c to print, manipulate, and free some of the generated objects.
 
@@ -19,23 +19,23 @@ parsleyPtr parsley_compile(char* parsley, char* incl)
 
 	Arguments:
 	- char* parsley -- a string of parsley to compile.
-	- char* incl -- arbitrary XSLT to inject directly into the stylesheet, 
+	- char* incl -- arbitrary XSLT to inject directly into the stylesheet,
 		outside any templates.
-		
+
 	Returns: A structure that you can pass to parsley_parse_* to do the actual
 	parsing.  This structure contains the compiled XSLT.
-	
-	Notes: This is *NOT* thread-safe. (Usage of the parselet via parsley_parse_* *IS* 
+
+	Notes: This is *NOT* thread-safe. (Usage of the parselet via parsley_parse_* *IS*
 	thread-safe, however.)
-	
+
 void parsley_set_user_agent(char *);
 
 	Sets the user-agent used by parsley's internal http library.
-	
+
 void parsley_free(parsleyPtr);
 
 	Frees the parsleyPtr's memory.
-	
+
 void parsed_parsley_free(parsedParsleyPtr);
 
   Frees the parsedParsleyPtr's memory.
@@ -54,39 +54,39 @@ parsedParsleyPtr parsley_parse_file(parsleyPtr parsley, char* file_name, int fla
 				PARSLEY_OPTIONS_COLLATE        = 16,
 				PARSLEY_OPTIONS_SGWRAP         = 32
 			};
-	
-	Returns: A libxml2 document of the extracted data.  You need to free this 
-	with xmlFree().  To output, look at the libxml2 documentation for functions 
-	like xmlSaveFormatFile().  If you want json output, look below for xml2json 
-	docs.  
+
+	Returns: A libxml2 document of the extracted data.  You need to free this
+	with xmlFree().  To output, look at the libxml2 documentation for functions
+	like xmlSaveFormatFile().  If you want json output, look below for xml2json
+	docs.
 
 parsedParsleyPtr parsley_parse_string(parsleyPtr parsley, char* string, size_t len, char * base_uri, int flags);
 
-	Parses the in-memory string/length combination given.  See parsley_parse_file 
+	Parses the in-memory string/length combination given.  See parsley_parse_file
 	docs.
-	
+
 parsedParsleyPtr parsley_parse_doc(parsleyPtr parsley, xmlDocPtr doc, bool prune);
 
-	Uses the parsley parser to parse a libxml2 document.  
+	Uses the parsley parser to parse a libxml2 document.
 
 From xml2json.h
 ===============
 
 struct json_object * xml2json(xmlNodePtr);
 
 	Converts an xml subtree to json.  The xml should be in the format returned
-	by parsley.  Basically, xml attributes get ignored, and if you want an array 	
+	by parsley.  Basically, xml attributes get ignored, and if you want an array
 	like [a,b], use:
-	
-		<parsley:groups> 
+
+		<parsley:groups>
 			<parsley:group>a</parsley:group>
 			<parsley:group>b</parsley:group>
 		</parsley:groups>
-		
+
 	To get a null-terminated string out, use:
-	
+
 		json_object_to_json_string(struct json_object *)
-		
+
 	To free (actually, to decrement the reference count), call:
-	
+
 		json_object_put(struct json_object *)
diff --git a/TODO b/TODO
@@ -34,6 +34,6 @@
 - saxon compatibility?!
 - XML input converter?!
 - check windows build
-- flags?! 
+- flags?!
 		^ - force group-before
 		$ - force group-after
diff --git a/aclocal.m4 b/aclocal.m4
@@ -7899,7 +7899,7 @@ _LT_DECL(, macro_revision, 0)
 # included after everything else.  This provides aclocal with the
 # AC_DEFUNs it wants, but when m4 processes it, it doesn't do anything
 # because those macros already exist, or will be overwritten later.
-# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6. 
+# We use AC_DEFUN over AU_DEFUN for compatibility with aclocal-1.6.
 #
 # Anytime we withdraw an AC_DEFUN or AU_DEFUN, remember to add it here.
 # Yes, that means every name once taken will need to remain here until

diff --git a/functions.c b/functions.c
@@ -27,7 +27,7 @@ void parsley_register_all(){
 		   xsltInnerXmlFunction);
 }
 
-static void 
+static void
 xsltStarXMLFunction(xmlXPathParserContextPtr ctxt, int nargs, bool is_inner) {
 	if (nargs != 1) {
 		xsltTransformError(xsltXPathGetTransformContext(ctxt), NULL, NULL,
@@ -208,16 +208,16 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
 	    "document() : internal error tctxt == NULL\n");
 	valuePush(ctxt, xmlXPathNewNodeSet(NULL));
 	return;
-    } 
-	
+    }
+
     uri = xmlParseURI((const char *) URI);
     if (uri == NULL) {
 	xsltTransformError(tctxt, NULL, NULL,
 	    "document() : failed to parse URI\n");
 	valuePush(ctxt, xmlXPathNewNodeSet(NULL));
 	return;
-    } 
-    
+    }
+
     /*
      * check for and remove fragment identifier
      */
@@ -231,12 +231,12 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     } else
 	idoc = xsltLoadHtmlDocument(tctxt, URI);
     xmlFreeURI(uri);
-    
+
     if (idoc == NULL) {
 	if ((URI == NULL) ||
 	    (URI[0] == '#') ||
 	    ((tctxt->style->doc != NULL) &&
-	    (xmlStrEqual(tctxt->style->doc->URL, URI)))) 
+	    (xmlStrEqual(tctxt->style->doc->URL, URI))))
 	{
 	    /*
 	    * This selects the stylesheet's doc itself.
@@ -257,7 +257,7 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
 	valuePush(ctxt, xmlXPathNewNodeSet((xmlNodePtr) doc));
 	return;
     }
-	
+
     /* use XPointer of HTML location for fragment ID */
 #ifdef LIBXML_XPTR_ENABLED
     xptrctxt = xmlXPtrNewContext(doc, NULL, NULL);
@@ -270,11 +270,11 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     resObj = xmlXPtrEval(fragment, xptrctxt);
     xmlXPathFreeContext(xptrctxt);
 #endif
-    xmlFree(fragment);	
+    xmlFree(fragment);
 
     if (resObj == NULL)
 	goto out_fragment;
-	
+
     switch (resObj->type) {
 	case XPATH_NODESET:
 	    break;
@@ -288,11 +288,11 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
 	case XPATH_RANGE:
 	case XPATH_LOCATIONSET:
 	    xsltTransformError(tctxt, NULL, NULL,
-		"document() : XPointer does not select a node set: #%s\n", 
+		"document() : XPointer does not select a node set: #%s\n",
 		fragment);
 	goto out_object;
     }
-    
+
     valuePush(ctxt, resObj);
     return;
 
@@ -303,7 +303,7 @@ xsltHtmlDocumentFunctionLoadDocument(xmlXPathParserContextPtr ctxt, xmlChar* URI
     valuePush(ctxt, xmlXPathNewNodeSet(NULL));
 }
 
-xsltDocumentPtr	
+xsltDocumentPtr
 xsltLoadHtmlDocument(xsltTransformContextPtr ctxt, const xmlChar *URI) {
     xsltDocumentPtr ret;
     xmlDocPtr doc;
@@ -316,7 +316,7 @@ xsltLoadHtmlDocument(xsltTransformContextPtr ctxt, const xmlChar *URI) {
      */
     if (ctxt->sec != NULL) {
 	int res;
-	
+
 	res = xsltCheckRead(ctxt->sec, ctxt, URI);
 	if (res == 0) {
 	    xsltTransformError(ctxt, NULL, NULL,
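Every hunk shown in this diff appears to do one of two things: remove trailing spaces or tabs from a line, or drop a blank line at the end of a file. A minimal Ruby sketch of that normalization follows. It is not the tool that produced this commit; `strip_trailing_whitespace` is a hypothetical name, and the sketch assumes LF line endings.

```ruby
# Hypothetical helper, not taken from the commit. Assumes "\n" line endings.
def strip_trailing_whitespace(text)
  text
    .gsub(/[ \t]+$/, "")    # drop spaces/tabs at the end of every line
    .sub(/\n+\z/, "\n")     # collapse blank lines at end of file to one newline
end
```

Lines that contain only whitespace become empty lines, which is the change seen in most of the hunks above. Leading indentation on non-empty lines is left untouched.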