hzhou edited this page Mar 8, 2012 · 31 revisions

In this example, I would like to bring up the topic of literate programming. The essential idea of literate programming is to write a program in the way that humans understand best, rather than the way computers understand best. So far, few people seem to really appreciate literate programming, and those who do take away very different things from it. One question that divides them is: what is the best way to write a program for humans to understand? The examples set by its originator, Donald Knuth, in TeX and Metafont are written in an extremely verbose fashion, more like a book. And in the words of one advocate, Tim Daly: "The best programming language is English. Everything else is notation."

However, I, who also consider myself an advocate of literate programming, beg to differ. In my opinion, the best way to communicate a program is through code, just as the best way to communicate a mathematical idea is through math. Remember the pseudocode in textbooks? It is there because it is the best way to communicate. The difference between pseudocode and actual code is that actual code has too much noise. Often, variables, functions, and structures have to be declared first, and the portion of code that is key to an idea has to be surrounded by a context of scoped variables and stacks. This noise is necessary for current compilers, but it prevents effective communication with humans. Humans are not well equipped to track it.

Until a better programming system is developed, I will try to use MyDef to help a little. Here I would like to demonstrate programming C in MyDef, using the example from this page: http://axiom-developer.org/axiom-website/litprog.html

The idea of the program is to extract code blocks embedded in an HTML document between <pre id="name">...</pre>. Again, in my opinion, the best way to illustrate the idea is through code. Here is the equivalent in Perl:

    my ($file, $name)=@ARGV;

    open In, $file or die "Can't open $file\n";
    my @all_lines=<In>;
    close In;

    getchunk($name);

    # ------------------------
    sub getchunk {
        my $name=shift;
        my $flag_begin_dump;
        foreach my $l(@all_lines){
            if(!$flag_begin_dump and $l=~/^<pre id="$name">/){
                $flag_begin_dump=1;
            }
            elsif($flag_begin_dump){
                if($l=~/^<\/pre>/){
                    $flag_begin_dump=0;
                }
                elsif($l=~/^&lt;getchunk id="([^"]+)"&gt;/){
                    getchunk($1);
                }
                else{
                    $l=~s/&lt;/</g;
                    $l=~s/&gt;/>/g;
                    print $l;
                }
            }
        }
    }

Perl has some noise of its own, such as the ugly brackets, but for such a simple program, I really think reading the code is easier than trying to describe it in English.

C is a much lower-level programming language. Achieving the same function in C requires more plumbing, or noise. The code in Tim Daly's example is 159 lines long, compared to Perl's 30 lines (20 if we eliminate the brackets). With MyDef, I attempt to make the C code more readable.

    page: tangle
        subcode: main
            $list n_main, getchunk, nextline

I always like to start with an outline. Here, we have 3 functions in total, far fewer than in Tim's example. Functions in C are meant for blocks of code that will be called many times, but they are often used just to split the code into smaller pieces for humans to digest. However, functions in C (and in most other languages as well) have a cost. Each function needs to define its interface or context, and this interface has to be kept consistent across definition, declarations, and invocations. All of this adds to the noise that keeps us from understanding. With MyDef, we can refactor code with macros that live outside the compiler, so we don't need that many functions anymore. Here, getchunk is needed for recursion, and nextline gets called multiple times and has its own local variables that I would like to isolate from the rest. Both are valid reasons to use functions.

Complexity arises from branches, and every function is a point of branching (as it may be called multiple times from multiple places). But within a single function there is a single story line, and humans have no trouble following a single story line no matter how long it may be. Seeing that this program uses only three functions gives us an idea of its overall complexity, and that calms us down. In human communication, the starting mood is also very important.

    fncode: n_main
        $if argc != 3
            perror("Usage: tangle filename chunkname");
            exit(-1);
        $else
            (s_filename, s_chunkname)=(argv[1], argv[2])

        $global s_buffer, n_bufsize
        &call read_file
            getchunk(s_chunkname, strlen(s_chunkname));

Next, the main function. Even though we haven't read the code for read_file and getchunk yet, I hope that with these choices of names an intelligent human can guess the details. So by here we have the whole story, with a few details to be filled in. One of the design goals of MyDef is to fight feature creep. Even without the rest of the def file, the code should compile and run once empty stubs are supplied. The basic skeleton is one of the main ideas of a program, and I consider it important to be able to isolate each individual idea as a unit that works on its own.
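
To make the stub idea concrete, here is a minimal sketch in plain C rather than MyDef. The names tangle_skeleton and stub_calls are made up for this illustration, and the read_file step is left out entirely; only the shape of the skeleton is taken from n_main above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counter, present only so the stub can be observed. */
static int stub_calls = 0;

/* Empty stub: stands in for the real getchunk until it is written. */
void getchunk(const char *s_chunkname, int n_chunklen)
{
    assert(s_chunkname != NULL && n_chunklen >= 0);
    stub_calls++;
}

/*
 * Skeleton of n_main: check the arguments, then hand over to getchunk.
 * Returns the would-be exit status instead of calling exit().
 */
int tangle_skeleton(int argc, char **argv)
{
    if (argc != 3)
        return -1;                      /* usage error */
    /* read_file would load argv[1] here; omitted in this sketch */
    getchunk(argv[2], (int)strlen(argv[2]));
    return 0;
}
```

With three arguments the skeleton reaches the stub once and returns 0; with any other count it returns -1 without touching it. The story line is already complete, even though nothing is extracted yet.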

Note that I declare two global variables right above &call read_file. I could move that declaration inside the read_file macro, but I felt it is best placed in the main code, as I think the variables belong to the main story line. Different programmers may certainly choose differently, and that is the point: authors write their story in the way they deem best. With MyDef, the computer doesn't mind.

    # /* find the named chunk and dump it, RECURSE when necessary. */
    fncode: getchunk(s_chunkname, n_chunklen)
        n_pos=0
        &call foreach_line
            $if s_buffer[n_pos]=~/<pre id="$s_chunkname(n_chunklen)">/
                $call skip_line
                &call foreach_line
                    $if s_buffer[n_pos]=~/&lt;getchunk id="/
                        ### RECURSE HERE ################# Do whatever that best communicates
                        getchunk(s_buffer+n_pos+17, n_linelen-23)
                    $elif s_buffer[n_pos]=~/<\/pre>/
                        break
                    $else
                        $call printline

    subcode: printline
        i=0
        $while i<n_linelen
            $if s_buffer[n_pos+i]=~/&lt;/
                putchar('<')
                i += 4
            $elif s_buffer[n_pos+i]=~/&gt;/
                putchar('>')
                i += 4
            $else
                putchar(s_buffer[n_pos+i])
                i++

This is pretty much the same as the Perl routine. The C code is still noisier, which is unfortunate, but I think MyDef helped a lot. If we desire, we could build higher-level handling into MyDef, either by extending the Perl code or simply by building a def library, and then we may achieve even less noisy code.
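
The offsets 17 and 23 in the recursive call are easy to get wrong, so here is the arithmetic spelled out as a small C sketch. The helper name chunk_ref_name and the two constants are hypothetical. The sketch assumes, as the code above does, that a reference line consists of exactly the prefix, the chunk name, the closing quote with the escaped greater-than sign, and a newline.

```c
#include <assert.h>
#include <string.h>

/* Text that wraps a chunk name on a reference line. */
static const char REF_PREFIX[] = "&lt;getchunk id=\"";  /* 17 characters */
static const char REF_SUFFIX[] = "\"&gt;\n";            /* 6 characters, newline included */

/*
 * If the line at `line` (length `linelen`, counting the trailing newline)
 * starts with REF_PREFIX, store the start of the chunk name in *name and
 * return the name's length; otherwise return -1.
 */
int chunk_ref_name(const char *line, int linelen, const char **name)
{
    int prefix_len = (int)strlen(REF_PREFIX);   /* 17 */
    int suffix_len = (int)strlen(REF_SUFFIX);   /* 6 */

    assert(line != NULL && name != NULL);
    if (linelen < prefix_len + suffix_len)
        return -1;
    if (strncmp(line, REF_PREFIX, (size_t)prefix_len) != 0)
        return -1;
    *name = line + prefix_len;                   /* s_buffer + n_pos + 17 */
    return linelen - (prefix_len + suffix_len);  /* n_linelen - 23 */
}
```

For a reference to a chunk named foo, the line is 26 bytes long including the newline, so the reported name length is 26 - 23 = 3. A line that ends differently, for example with a carriage return before the newline, would throw this count off.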

Also notice how easily, with a macro system, we can break code into smaller pieces and move them around without worrying about context. Certainly, if the contexts get mixed up, the code will break. But as intelligent human programmers, we know what we are doing.

    # ---- simple utility routines ----
    subcode: read_file
        n_fd = open(s_filename, O_RDONLY);
        $if  n_fd == -1
            perror("Error opening file for reading");
            exit(-2);
        $local struct stat filestat
        $if  fstat(n_fd,&filestat) < 0
            perror("Error getting input file size");
            exit(-3);

        n_bufsize = (int)filestat.st_size;
        s_buffer = (char *)malloc(n_bufsize);
        read(n_fd,s_buffer,n_bufsize);

        BLOCK
        close(n_fd)

    subcode: foreach_line
        n_linelen = nextline(n_pos)
        $while n_linelen != -1
            BLOCK
            $call skip_line
            n_linelen = nextline(n_pos)

    subcode: skip_line
        n_pos=n_pos+n_linelen

    #/* return the length of the next line */
    fncode: nextline(n_pos)
        $if n_pos >= n_bufsize
            return -1
        i = 0
        $while (n_pos+i<n_bufsize) && (s_buffer[n_pos+i] != '\n')
            i++
        return i+1

Here is the rest of the code, which is simple. A human reader can infer most of its details, so when time is short, omitting it does not prevent the reader from grasping the main ideas of the program. If some of this code does not look trivial to some programmers, with MyDef one can easily move those pieces to a more prominent place, even into the main story line, if the author feels they need to be grasped before moving on. And they can be moved into a lesser include file or a library once the author finds them well understood. Again, you write the code in the way that best helps understanding.
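
Since foreach_line and nextline carry the whole line-walking logic, a plain C sketch of that loop may help. Here the buffer is passed as a parameter instead of living in the globals s_buffer and n_bufsize, and the names next_line_len and count_lines are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of the line starting at `pos`, counting its trailing '\n';
 * -1 once `pos` has reached the end of the buffer.
 */
int next_line_len(const char *buf, int bufsize, int pos)
{
    int i = 0;

    assert(buf != NULL && bufsize >= 0 && pos >= 0);
    if (pos >= bufsize)
        return -1;
    while (pos + i < bufsize && buf[pos + i] != '\n')
        i++;
    return i + 1;
}

/* Count lines by walking the buffer the way foreach_line does. */
int count_lines(const char *buf, int bufsize)
{
    int pos = 0;
    int count = 0;
    int len = next_line_len(buf, bufsize, pos);

    while (len != -1) {
        count++;                                 /* BLOCK would go here */
        pos += len;                              /* skip_line */
        len = next_line_len(buf, bufsize, pos);
    }
    return count;
}
```

On a 6-byte buffer holding ab and cd, each followed by a newline, the lengths reported are 3, 3, and then -1. Note that when the last line lacks a trailing newline, the reported length still counts one extra byte, so a caller that reads that many bytes, as printline does, could step one byte past the buffer.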

Now that we have all the pieces, save them to tangle.def and run mydef_make.pl, then make; this creates the tangle.c listed below. Notice how some of the plumbing is handled by MyDef. Readers who have read Tim Daly's page may note that this C code is different from Tim's example. Actually, I started with Tim's original C code and then moved it into MyDef. Once it is in MyDef, it is easy to try moving individual blocks here and there. With MyDef, it is so easy to refactor code that it is hard not to.

I noticed that some of the strings in the following C code don't come out right. I will fix them when I understand how Markdown works for this wiki.

    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    void getchunk(char * s_chunkname, int n_chunklen);
    int nextline(int n_pos);

    char * s_buffer;
    int n_bufsize;


    /**** END GLOBAL INIT ****/
    int main(int argc, char** argv){
        char * s_filename;
        char * s_chunkname;
        int n_fd;
        struct stat filestat;

        if(argc != 3){
            perror("Usage: tangle filename chunkname");
            exit(-1);
        }
        else{
            s_filename = argv[1];
            s_chunkname =  argv[2];
        }
        n_fd = open(s_filename, O_RDONLY);
        if(n_fd == -1){
            perror("Error opening file for reading");
            exit(-2);
        }
        if(fstat(n_fd,&filestat) < 0){
            perror("Error getting input file size");
            exit(-3);
        }
        n_bufsize = (int)filestat.st_size;
        s_buffer = (char *)malloc(n_bufsize);
        read(n_fd,s_buffer,n_bufsize);
        getchunk(s_chunkname, strlen(s_chunkname));
        close(n_fd);
        return 0;
    }

    void getchunk(char * s_chunkname, int n_chunklen){
        int n_pos;
        int n_linelen;
        int i;

        n_pos = 0;
        n_linelen = nextline(n_pos);
        while(n_linelen != -1){
            if(strncmp(s_buffer+n_pos, "<pre id=\"", 9)==0 && strncmp(s_buffer+n_pos+9, s_chunkname, n_chunklen)==0 && s_buffer[n_pos+n_chunklen+9]=='\"' && s_buffer[n_pos+n_chunklen+10]=='>'){
                n_pos = n_pos+n_linelen;
                n_linelen = nextline(n_pos);
                while(n_linelen != -1){
                    if(strncmp(s_buffer+n_pos, "&lt;getchunk id=\"", 17)==0){
                        getchunk(s_buffer+n_pos+17, n_linelen-23);
                    }
                    else if(strncmp(s_buffer+n_pos, "</pre>", 6)==0){
                        break;
                    }
                    else{
                        i = 0;
                        while(i<n_linelen){
                            if(strncmp(s_buffer+n_pos+i, "&lt;", 4)==0){
                                putchar('<');
                                i += 4;
                            }
                            else if(strncmp(s_buffer+n_pos+i, "&gt;", 4)==0){
                                putchar('>');
                                i += 4;
                            }
                            else{
                                putchar(s_buffer[n_pos+i]);
                                i++;
                            }
                        }
                    }
                    n_pos = n_pos+n_linelen;
                    n_linelen = nextline(n_pos);
                }
            }
            n_pos = n_pos+n_linelen;
            n_linelen = nextline(n_pos);
        }
    }

    int nextline(int n_pos){
        int i;

        if(n_pos >= n_bufsize){
            return -1;
        }
        i = 0;
        while((n_pos+i<n_bufsize) && (s_buffer[n_pos+i] != '\n')){
            i++;
        }
        return i+1;
    }