Create a lens from bottom to top

brianredbeard edited this page Apr 30, 2013 · 2 revisions

Introduction

This example will show how to create a lens from bottom to top. If your brain is wired so that you prefer a top-down approach, you may also read Creating-a-lens-step-by-step.

This example focuses on developing a lens for the Debian control file, which is used to build a Debian package.

Test as you write the lens

Writing a lens is easier if you write the tests alongside the lens.

Let's start with the file layout. Create the lens file and the test file:

libdebctrl-augeas$ tree
.
|-- debctrl.aug
`-- tests
    `-- test_debctrl.aug

At every step, you will have to test the syntax of the lens:

 $ augparse debctrl.aug

And test the lens:

 $ augparse -I. tests/test_debctrl.aug

To run the test continuously, you may want to run this command in a terminal:

  watch -n 1 'augparse debctrl.aug && augparse -I. tests/test_debctrl.aug'

A sample control file

A typical control file looks like this (e-mails are mangled to avoid spam):

Source: libconfig-model-perl
Section: perl
Uploaders: Dominique Dumont <dominique.dumont@xx.yyy>,
           gregor herrmann <gregoa@xxx.yy>
Priority: optional
Build-Depends: debhelper (>= 7.0.0),
               perl-modules (>= 5.10) | libmodule-build-perl
Build-Depends-Indep: perl (>= 5.8.8-12), libcarp-assert-more-perl,
                     libconfig-tiny-perl, libexception-class-perl,
                     libparse-recdescent-perl (>= 1.90.0),
                     liblog-log4perl-perl (>= 1.11)
Maintainer: Debian Perl Group <pkg-perl-maintainers@xx>
Standards-Version: 3.8.2
Vcs-Svn: svn://svn.debian.org/svn/pkg-perl/trunk/libconfig-model-perl
Vcs-Browser: http://svn.debian.org/viewsvn/pkg-perl/trunk/libconfig-model-perl/
Homepage: http://search.cpan.org/dist/Config-Model/

# a comment
Package: libconfig-model-perl
Architecture: all
Depends: ${perl:Depends}, ${misc:Depends},libcarp-assert-more-perl,
         libexception-class-perl, libparse-recdescent-perl (>= 1.90.0),
         liblog-log4perl-perl (>= 1.11)
Suggests: libconfig-tiny-perl,
          libterm-readline-perl-perl | libterm-readline-gnu-perl
Description: describe and edit configuration data
 Config::Model enables project developers to provide an interactive
 configuration editor (graphical, curses based or plain terminal) to
 their users. For this they must:
    - describe the structure and constraint of the project's configuration
    - if the configuration data is not stored in INI file or in Perl data
      file, provide some code to read and write configuration from
      configuration files must be provided
 .
 With the elements above, Config::Model will generate interactive
 configuration editors (with integrated help and data validation).
 These editors can be graphical (with Config::Model::TkUI), curses
 based (with Config::Model::CursesUI) or based on ReadLine.

Note that:

  • Blank lines separate the source package section from the binary package sections
  • Description text must have a leading space
  • Data fields can span several lines. In this case, each "continuation" line must begin with a space or a tab

The lens skeleton

As described there, let's start with an almost empty lens:

module Debctrl =
  autoload xfm

(* lens must be used with AUG_ROOT set to debian package source directory *)
let xfm = transform lns (incl "/debian/control")

and a test that is almost as empty:

module Test_debctrl =

The first lens: source fields

Let's start small by creating a lens and its test (or should I say the test and its lens) for source. This is one keyword (Source) with a single value. Since we are working from bottom to top, we must test this lens snippet and not the whole debctrl lens. Otherwise, we would have to modify this unit test every time the lens becomes more complex (and the resulting tree deeper).

module Test_debctrl =

 (* the source line *)
 let source = "Source: libtest-distmanifest-perl\n"

 (* declare the lens to test and the resulting tree^Wtwig *)
 test Debctrl.source get source =
    { "Source" = "libtest-distmanifest-perl" }

and the lens:

module Debctrl =
  autoload xfm

(* import eol from util.aug lens *)
let eol = Util.eol 

(* keywords and values are separated by a colon *)
let colon = del /:[ \t]*/ ": "

(* note that no space is allowed between "Source" and ':' *)
(* the value excludes newlines so that it cannot run past the end of line *)
let source = [ key "Source" . colon . store /[^ \t\n]+/ . eol ]

(* this lens will get more complex in time *)
let lns = source

(* lens must be used with AUG_ROOT set to debian package source directory *)
let xfm = transform lns (incl "/debian/control")

Now let's test the lens:

 $ augparse debctrl.aug
 $ augparse -I. tests/test_debctrl.aug

No news is good news :-)
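
While a lens is still taking shape, two other forms of the test statement are handy. The fragment below is a minimal sketch meant to be added inside module Test_debctrl (it reuses the source string defined above): "= ?" prints the resulting tree instead of comparing it, and "= *" asserts that the lens fails, here because of a space before the colon.

```augeas
 (* print the tree produced by the lens; nothing is compared *)
 test Debctrl.source get source = ?

 (* expect a failure: no space is allowed between "Source" and ':' *)
 test Debctrl.source get "Source : libtest-distmanifest-perl\n" = *
```

Once the printed tree looks right, it can be pasted back into the test in place of the "?".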

More complex lens: Uploaders

This keyword may have several values spread over several lines. In this case, line breaks inside the value do not matter.

Here's the test:

 let uploaders 
   = "Uploaders: foo@bar, Dominique Dumont <dominique.dumont@xx.yyy>,\n"
   . "  gregor herrmann <gregoa@xxx.yy>\n" 
 test Debctrl.uploaders get uploaders =
    { "Uploaders"
       { "1" = "foo@bar"}
       { "2" = "Dominique Dumont <dominique.dumont@xx.yyy>" }
       { "3" = "gregor herrmann <gregoa@xxx.yy>" } }

and the lens:

let del_opt_ws =  del /[\t ]*/ ""

(* lens that defines a "continuation" line in a data field *)
let cont_line = del /\n[ \t]+/ "\n "
let comma     = del  /,[ \t]*/  ","

(* defines comma separated data which may span several lines *)
let sep_comma_with_nl = del_opt_ws . cont_line* . comma . cont_line*

(* 2 regex to catch 2 types of email: Foo Bar <foo@bar> and plain foo@bar *)
let email =  /([A-Za-z]+ )+<[^\n>]+>/ | /[^\n,\t ]+/ 

(* define a function with a keyword and an array of data separated by commas *)
let multi_line_array_entry (k:string) =
    [ key k . colon . [ seq k . store email] . 
      [ seq k . sep_comma_with_nl . store email ]* . eol ]

(* apply the above function to the Uploaders field *)
let uploaders =
    multi_line_array_entry "Uploaders"

(* now the lens can parse Uploaders and Source *)
let lns = ( uploaders | source )*

Parsing the Package field and factorisation of lenses

Parsing the Package field is similar to parsing the Source field. Instead of duplicating the Source lens, one can use an Augeas function to minimize code duplication.

The "meat" of the Source lens is moved to the simple_entry function:

 let simple_entry (k:regexp) =
   let value =  store /[^ \t\n]+/ in
   [ key k . colon . value . eol ]

So the Source lens is now

 let source = simple_entry "Source"

Likewise, the Package lens is

 let package = simple_entry "Package"

To factorize even further we can define the simple fields for the binary package:

 let simple_bin_keyword = "Package" | "Architecture" |  "Section"
    | "Priority" | "Essential" | "Homepage" 
 let simple_bin_entry = simple_entry simple_bin_keyword 

Likewise, the source package parser will become:

 let simple_src_keyword = "Source" | "Section" | "Priority"
    | "Standards-Version" | "Homepage"
 let simple_src_entry = simple_entry simple_src_keyword

Since the test calls the lens by its name, the test must be changed to:

  test Debctrl.simple_src_entry get source =
    { "Source" = "libtest-distmanifest-perl" }
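
The binary-side keywords can be checked in the same way. This is a small sketch to add to Test_debctrl, assuming simple_bin_entry is defined as above; the sample values come from the control file shown earlier, and the last test expects a failure because "Depends" is not in simple_bin_keyword.

```augeas
 test Debctrl.simple_bin_entry get "Package: libconfig-model-perl\n" =
    { "Package" = "libconfig-model-perl" }

 test Debctrl.simple_bin_entry get "Architecture: all\n" =
    { "Architecture" = "all" }

 (* a keyword outside the list must be rejected *)
 test Debctrl.simple_bin_entry get "Depends: perl\n" = *
```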

Maintainers: a not so simple entry

The "simple_entry" lens seems to be working well. Let's apply it to the Maintainer field:

  test (Debctrl.simple_entry Debctrl.simple_src_keyword ) get 
  "Maintainer: Debian Perl Group <pkg-perl-maintainers@xxx>\n"
   = {  "Maintainer" = "Debian Perl Group <pkg-perl-maintainers@xxx>"
     }

Rats, augparse complains:

 Error encountered here (0 characters into string)
                         <|=|Maintainer: Debian Perl Grou>

Note: Contrary to other generated parsers like Perl's Parse::RecDescent, Augeas either succeeds completely or parses nothing. The simple entry lens fails because the Maintainer field contains white space.

OK, let's allow spaces and tabs in the simple entry value:

 let value =  store /[^\n]+/ (* /[^ \t][^\n]+/*) in ...

Rats, another error:

 debctrl.aug:11.5-.26:exception: ambiguous concatenation
      'Package: A' can be split into
      'Package:|=| A'
     and
      'Package: |=|A'

Even before parsing the test file, Augeas complains: any white space between the keyword ("Package:") and the value ("A") can be part of the left lens or the right lens.

Here, the solution is to specify that the value must begin with a non white space character:

   let value =  store /[^ \t][^\n]+/ in ...

Note that the value will now contain any trailing white space. Contrary to Perl regexps, one cannot specify a non-greedy quantifier ("+?"): any white space is gobbled up to the next lens. And the next lens cannot be "eol" but only 'del "\n" "\n"' (named hardeol in this lens), since "eol" also accepts leading white space and would make the split ambiguous:

 let simple_entry (k:regexp) =
   let value =  store /[^ \t][^\n]+/ in
   [ key k . colon . value . hardeol ]
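
A quick test makes the handling of trailing white space visible. The sketch below assumes the simple_entry definition just above, with hardeol defined as del "\n" "\n"; the two trailing spaces in the input are deliberate and end up in the stored value.

```augeas
 (* trailing spaces before the newline are kept as part of the value *)
 test (Debctrl.simple_entry "Maintainer") get
   "Maintainer: Debian Perl Group <pkg-perl-maintainers@xxx>  \n" =
   { "Maintainer" = "Debian Perl Group <pkg-perl-maintainers@xxx>  " }
```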

Parsing the dependency list (first try)

The dependency list is a tough nut to crack:

  • dependencies are separated by commas
  • newlines don't matter if they are followed by a space
  • dependencies can be or'ed with "|"
  • dependencies have an optional field to specify constraints regarding package version (e.g. "perl (>= 5.8.8-12)")
  • dependencies have an optional field to specify compatible (or not) arch (e.g. "[i386]" or "[!amd64]")

The package and version information are stored in 3 nodes:

  • name (package name)
  • version (optional)
  • arch (optional)

version is divided in:

  • relation (can be >=, <=, ...)
  • number

arch in:

  • prefix ( "!" or "" )
  • name

Let's parse the version field:

let version_depends = 
    [ label "version"  
     . [   del_opt_ws . del /\(/ "(" . label "relation"
         . del_opt_ws . store /[<>=]+/ ]
     . [   del_opt_ws . label "number" . store /[a-zA-Z0-9_\.\-]+/ 
         . del_opt_ws . del /\)/ ")" ] 
    ]

and the test:

test Debctrl.version_depends get "( >= 5.8.8-12 )" = 
   { "version" { "relation"  = ">=" } { "number"  = "5.8.8-12" } }

Parsing the arch won't be detailed as it is similar to version parsing.
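
For readers who want something to start from, here is one possible shape for arch_depends, modelled on version_depends. It is a sketch and not necessarily what the final lens does: the character class used for the architecture name and the placement of del_opt_ws are assumptions.

```augeas
(* hypothetical arch_depends, mirroring version_depends:
   "[i386]" or "[ !hurd-i386 ]" becomes an "arch" node
   with "prefix" and "name" children *)
let arch_depends =
    [ label "arch"
     . [   del_opt_ws . del /\[/ "[" . del_opt_ws
         . label "prefix" . store /!?/ ]
     . [   label "name" . store /[a-zA-Z0-9_-]+/
         . del_opt_ws . del /\]/ "]" ]
    ]
```

A matching test, to be placed in Test_debctrl:

```augeas
test Debctrl.arch_depends get "[ !hurd-i386]" =
   { "arch" { "prefix"  = "!" } { "name"  = "hurd-i386" } }
```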

Here's the package dependency lens:

let package_depends 
  =  [ label "name" . store /[a-zA-Z0-9_\-]+/ ]
   . ( version_depends | arch_depends ) *

This lens is tested with:

let p_depends_test = "perl ( >= 5.8.8-12 ) [ !hurd-i386]"

test Debctrl.package_depends get p_depends_test =
   { "name" = "perl" }
   { "version"
               { "relation"  = ">=" }
               { "number"  = "5.8.8-12" } }
   { "arch" { "prefix"  = "!" } { "name"  = "hurd-i386" } }

A complete package dependency with the "or" can be parsed with a simple sequence. Having several packages in this sequence implies an "or":

(* "counter" is required to reset the counter that lists the alternate dependencies *)
let dependency = [ counter "dep" . seq "dep" . package_depends ]
                  . [ del_opt_ws . seq "dep" . del /\|/ "|"
                      .  del_opt_ws . package_depends ] * 

And the test:

let dependency_test = "perl-modules (>= 5.10) | libmodule-build-perl"
test Debctrl.dependency get dependency_test = 
  {
    "1" { "name" = "perl-modules" }
        { "version" { "relation"  = ">=" } { "number"  = "5.10" } } }
  {
    "2" { "name" = "libmodule-build-perl" } } 

Now, let's tackle the dependency list. Since the multi-line list is similar to the "Uploaders" list, one might want to reuse the "multi_line_array_entry" lens. But this one was dedicated to lists of emails.

Now it's time to add a second parameter to this lens so it can deal with both emails and package dependencies (of course, this has an impact on the lenses that currently use "multi_line_array_entry"):

 (* k and v are the lens parameters *)
 let multi_line_array_entry (k:string) (v:lens) =
    [ key k . colon . [ seq k .  v ] . [ seq k . sep_comma_with_nl . v ]* . eol ]

The dependency lens is :

 let dependency_list (field:string) = 
    multi_line_array_entry field dependency 

The field parameter is necessary because the lens will be used for several types of dependencies.

And the test is:

 test (Debctrl.dependency_list "Build-Depends-Indep") get 
  "Build-Depends-Indep: perl (>= 5.8.8-12), libcarp-assert-more-perl,\n"
  . "      libconfig-tiny-perl\n"
  = { "Build-Depends-Indep"
       {  "1" { "1" { "name" = "perl" }
                    { "version"
                    { "relation"  = ">=" }
                    { "number"  = "5.8.8-12" } } } }
       {  "2" { "1" { "name" = "libcarp-assert-more-perl" } } }
       {  "3" { "1" { "name" = "libconfig-tiny-perl" } }}
    }

Great, but the resulting tree is ugly and might confuse a user who does not know how the packages relate to each other: the "and" and "or" relations are not explicit. As a developer, you will probably pay for such a mistake with tons of docs or FAQs ;-)

Parsing the dependency list (second try)

So the "and" and "or" must be explicit. Let's try this new lens:

let dependency = [ label "or" . package_depends ]
               . [ label "or" . del / *\| */ " | "
                   . package_depends ] *

let dependency_list (field:regexp) = 
    [ key field . colon . [ label "and" .  dependency ]
      . [ label "and" . sep_comma_with_nl . dependency ]*
      . eol ]

And the resulting tree:

test (Debctrl.dependency_list "Build-Depends-Indep") get 
  "Build-Depends-Indep: perl (>= 5.8.8-12) [ !hurd-i386], \n"
  . "   perl-modules (>= 5.10) | libmodule-build-perl,\n"
  . "   libconfig-tiny-perl\n"
  = { "Build-Depends-Indep"
       { "and" { "or" { "perl" 
                        { "version"
                          { "relation"  = ">=" }
                          { "number"  = "5.8.8-12" } }
                        { "arch" 
                          { "prefix"  = "!" } 
                          { "name"  = "hurd-i386" } } } } }
       { "and" { "or" { "perl-modules" 
                        { "version" { "relation"  = ">=" }  
                                    { "number"  = "5.10" } } } }
               { "or" { "libmodule-build-perl" } } }
       { "and" { "or" { "libconfig-tiny-perl" } } } }

The resulting tree may be a little lispish, but the relation is now explicit. The lens with the "name" label was also replaced by a lens which uses the package name as key.

Parsing the control paragraphs

The main paragraphs of fields of the control file are separated by blank lines. The lens must return a list of paragraphs that will contain the fields.

Let's parse these paragraphs with this test:

 let simple_bin_pkg = "Package: libconfig-model-perl\n"
     . "Architecture: all\n"

 let paragraph_simple = source . uploaders . "\n" 
       . simple_bin_pkg 

 test Debctrl.lns get paragraph_simple =
   { "srcpkg"   { "Source" = "libtest-distmanifest-perl" }
                { "Uploaders"
                  { "1" = "foo@bar"}
                  { "2" = "Dominique Dumont <dominique.dumont@xx.yyy>" }
                  { "3" = "gregor herrmann <gregoa@xxx.yy>" } } }
   { "binpkg" { "Package" = "libconfig-model-perl" }
              { "Architecture" = "all" } }

To separate the different paragraphs, we'll use a label lens so the source package paragraph is clearly separated in the tree from the binary package paragraphs.

Now, the top-level lens is modified to take the paragraphs into account:

  let lns =  [ label "srcpkg" . ( uploaders | simple_src_entry )* ] 
   .  [ label "binpkg" . del "\n" "\n" . simple_bin_entry* ]*

Description: A multi line field

This field is a multi-line field that contains:

  • a summary on the same line as the Description
  • a multi-line description ended by a blank line or a line that does not begin with a space

So this lens must be divided in 2 parts that match the Description content:

let hardeol = del "\n" "\n"

(* store any line that begins with ' ' *)
let multi_line_entry (k:string) =
     let line = /[^\n]+/ in
      [ label k .  del / / " " .  store line . hardeol ] *

(* Description will contain 2 kinds of nodes: summary and text *)
let description 
  = [ key "Description" . colon 
     . [ label "summary" . store /[a-zA-Z][^\n]+/ . hardeol ]
     . multi_line_entry "text" ]

Here is the matching test:

 let description = "Description: describe and edit configuration data\n"
 ." Config::Model enables [...] must:\n"
 ."    - if the configuration data\n"
 ." .\n"
 ." With the elements above, (...) on ReadLine.\n"

 test Debctrl.description get description = 
  { "Description" 
    { "summary" = "describe and edit configuration data" }
    { "text" = "Config::Model enables [...] must:" }
    { "text" = "   - if the configuration data" }
    { "text" = "." }            
    { "text" = "With the elements above, (...) on ReadLine."} }
 

Note that the author could not find any way to replace the single '.' with a blank line. Suggestions are welcome.

Putting it all together

Drum rolls, please ...

The source paragraph lens is:

let uploaders  =
    multi_line_array_entry /Uploaders/ email
 
let simple_src_keyword = "Source" | "Section" | "Priority" 
    | "Standards\-Version" | "Homepage" | /Vcs\-Svn/ | /Vcs\-Browser/
    | "Maintainer"
let depend_src_keywords = "Build-Depends" | "Build-Depends-Indep"

let src_entries = (   simple_entry simple_src_keyword 
                    | uploaders 
                    | dependency_list depend_src_keywords ) *

The binary paragraph lens is:

let simple_bin_keywords = "Package" | "Architecture" 
let depend_bin_keywords = "Depends" | "Recommends" | "Suggests"

let bin_entries = ( simple_entry simple_bin_keywords
                  | dependency_list depend_bin_keywords
                  ) + . description

And the final lens is:

let lns =  [ label "srcpkg" .  src_entries  ] 
        .  [ label "binpkg" . hardeol+ . bin_entries ]+
        . eol*

(* lens must be used with AUG_ROOT set to debian package source directory *)
let xfm = transform lns (incl "/control")

Write or "put" tests

Having a successful parser does not mean that you are out of the woods. Some subtle bugs are often discovered during put tests.

Put tests are also useful to check how the file is written when starting from scratch.

First, test that your lens does not break the file:

 test Debctrl.src_entries put uploaders  
     after set "/Uploaders/1" "foo@bar"
   =  uploaders

Then that additions are correctly handled:

 test Debctrl.src_entries put uploaders  
     after set "/Uploaders/4" "baz@bar"
   =  "Uploaders: foo@bar, Dominique Dumont <dominique.dumont@xx.yyy>,\n"
   . "  gregor herrmann <gregoa@xxx.yy>,\n"
   . " baz@bar\n"
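
Changing an existing value is another simple case worth checking. This is a minimal sketch, assuming the final src_entries lens and the source string from the test file; the replacement name libfoo-perl is made up for the example.

```augeas
 (* only the stored value changes; keyword, colon and newline are kept *)
 test Debctrl.src_entries put source
     after set "/Source" "libfoo-perl"
   =  "Source: libfoo-perl\n"
```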

And last but not least, test from a minimal file:

test Debctrl.lns put (source."\nPackage: test\nDescription: foobar\n")
  after
  set "/srcpkg/Uploaders/1" "foo@bar" ;
  set "/srcpkg/Uploaders/2" "Dominique Dumont <dominique.dumont@xx.yyy>" ;
  set "/srcpkg/Uploaders/3" "gregor herrmann <gregoa@xxx.yy>" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/version/relation" ">=" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/version/number" "5.8.8-12" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/arch/prefix" "!" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/arch/name" "hurd-i386" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[1]/perl-modules/version/relation" ">=" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[1]/perl-modules/version/number" "5.10" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[2]/libmodule-build-perl" "";
  set "/srcpkg/Build-Depends-Indep/and[3]/or/libcarp-assert-more-perl" "" ;
  set "/srcpkg/Build-Depends-Indep/and[4]/or/libconfig-tiny-perl" "" ;
  set "/binpkg[1]/Package" "libconfig-model-perl"  ; 
  (* must remove description because set cannot insert Architecture before Description *)
  rm  "/binpkg[1]/Description" ;
  set "/binpkg/Architecture" "all"  ;
  set "/binpkg[1]/Description/summary" "dummy1" ;
  set "/binpkg[1]/Description/text" "dummy text 1" ;
  set "/binpkg[2]/Package" "libconfig-model2-perl" ;
  set "/binpkg[2]/Architecture" "all" ;
  set "/binpkg[2]/Description/summary" "dummy2" ;
  set "/binpkg[2]/Description/text" "dummy text 2" 
  =  
"Source: libtest-distmanifest-perl
Uploaders: foo@bar,
 Dominique Dumont <dominique.dumont@xx.yyy>,
 gregor herrmann <gregoa@xxx.yy>
Build-Depends-Indep: perl ( >= 5.8.8-12 ) [ !hurd-i386 ],
 perl-modules ( >= 5.10 ) | libmodule-build-perl,
 libcarp-assert-more-perl,
 libconfig-tiny-perl

Package: libconfig-model-perl
Architecture: all
Description: dummy1
 dummy text 1

Package: libconfig-model2-perl
Architecture: all
Description: dummy2
 dummy text 2
"

Conclusion

This lens was developed for Google Summer of Code 2009 by the project's mentor. This was done to show that an alternative solution based on Augeas was possible.

In order to kill 2 birds with one stone, this wiki page was created to show how to create a lens from bottom to top, so that everyone can benefit from the explanations I would have to give to my student ;-)

Anyway, I hope that the test strategy developed here will be useful.

Here are the complete lens and test from SVN