regeXML

A Python library to support parsing regeXML expressions.

Tired of trying to wrap your tiny brain around difficult-to-understand regular expressions?

<[a-z]+>.*</[a-z]+>

What does that even mean? It's impossible to know!

What if I told you there was a human-readable method of expressing your regular expression?

<?xml version="1.0" ?>
<expression type="regular" dialect="posix"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="regexml.xsd">
 <character encoding="utf-8" locale="en-US"><![CDATA[<]]></character>
 <repeat type="oneormore">
  <characterset>
   <character encoding="utf-8" locale="en-US">a</character>
   <character encoding="utf-8" locale="en-US">z</character>
  </characterset>
 </repeat>
 <character encoding="utf-8" locale="en-US"><![CDATA[>]]></character>
 <repeat type="zeroormore">
  <wildcard />
 </repeat>
 <character encoding="utf-8" locale="en-US"><![CDATA[<]]></character>
 <character encoding="utf-8" locale="en-US">/</character>
 <repeat type="oneormore">
  <characterset>
   <character encoding="utf-8" locale="en-US">a</character>
   <character encoding="utf-8" locale="en-US">z</character>
  </characterset>
 </repeat>
 <character encoding="utf-8" locale="en-US"><![CDATA[>]]></character>
</expression>

Obviously, this example is much easier to read in regeXML format. It has the following additional benefits:

  • Easier for a computer to read, parse, serialize and transmit.
  • More extensible than standard regex syntax.
  • Can be validated against a schema to ensure correctness.
  • Can be easily and safely embedded into other XML-based documents for composability.

Usage:

import regexml

# some_xml_string holds a regeXML document, such as the example above
regex_object = regexml.re_from_xml(some_xml_string)
regex_object.search(...)
regex_object.match(...)
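
For the curious, here is a rough sketch of how a translation from regeXML to an ordinary regex might work. It is not the library's actual implementation: `to_pattern`, `compile_regexml` and the `QUANTIFIERS` table are hypothetical names, it uses only the standard library, and it handles only the elements that appear in the example above. It also assumes, as that example implies, that the two character children of a characterset are the endpoints of a range.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from <repeat type="..."> to a regex quantifier.
QUANTIFIERS = {"oneormore": "+", "zeroormore": "*"}


def to_pattern(node):
    """Translate one regeXML element into regex source text (illustrative only)."""
    if node.tag == "character":
        # Escape the literal so characters like "." keep their literal meaning.
        return re.escape(node.text or "")
    if node.tag == "wildcard":
        return "."
    if node.tag == "characterset":
        # Assumption: the two <character> children are the ends of a range.
        low, high = (child.text for child in node)
        return "[" + re.escape(low) + "-" + re.escape(high) + "]"
    if node.tag == "repeat":
        inner = "".join(to_pattern(child) for child in node)
        # Wrap in a non-capturing group so the quantifier covers all children.
        return "(?:" + inner + ")" + QUANTIFIERS[node.get("type")]
    if node.tag == "expression":
        return "".join(to_pattern(child) for child in node)
    raise ValueError("unsupported element: " + node.tag)


def compile_regexml(xml_string):
    """Parse a regeXML document and return a compiled re.Pattern."""
    return re.compile(to_pattern(ET.fromstring(xml_string)))
```

Under these assumptions, passing the example document to `compile_regexml` should produce a pattern that behaves like `<[a-z]+>.*</[a-z]+>`, and the result supports the usual `search` and `match` methods of a compiled pattern.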