Skip to content

File System Query. A Python library to deal with the file system that looks like JQuery

License

Notifications You must be signed in to change notification settings

interstar/FSQuery

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FSQuery (File System Query)

Working with the file system in Python isn't exactly difficult.

But it is verbose. And a low level way of talking to a complex tree of directories and files of different types.

FSQuery is a tiny Python library inspired by JQuery designed to let you query and manipulate a tree in the file system the way JQuery lets you query and manipulate the DOM.

Quickstart

Install from PyPI

pip install fsquery 

On PyPI

Things you can do with FSQuery

Create a query starting at a path and list it

    from fsquery import FSQuery

    fsq = FSQuery(path)
    for n in fsq :
        print(n)

When you iterate through an FSQuery, you get objects of class FSNode back. FSNodes have a useful .abs property which is the full path name of the file or directory they represent.

Find files that match a particular criteria

    fsq = FSQuery(path).Match(".js$")
    for n in fsq :
        print(n.abs)

Note that FSQuery matches using an ordinary regex, so you can use regex elements in it. But be careful not to put bad regex in, it will crash.

Add a FileOnly() call, to constrain the output to show only files (not directories)

    fsq = FSQuery(path).Match(".js$").FileOnly()  
    for n in fsq :
        print(n.abs)

You can also explicitly match just file extensions.

    fsq = FSQuery(path).Ext("css").FileOnly()  
    for n in fsq :
        print(n.abs)

Filter out a directory you aren't interested in, using NoFollow.

    fsq = FSQuery(path).Match(".js$").NoFollow("vendor").FileOnly()  
    for n in fsq :
        print(n.abs)

Finally, of course, it's very useful (though somewhat slow) to be able to look inside the files and see what they contain.

    fsq = FSQuery(path).NoFollow("vendor").Ext("py").Contains("GNU Lesser General Public License").FileOnly()
    for n in fsq :
        print(n.abs)

Note that you can add as many NoFollows and Matches and Contains as you like. NoFollows are filters that are only applied to directories. Matches and Contains are filters that are only applied to files. Note also that, although you have added a FileOnly() to a query this doesn't affect the pattern matches on the directory names.

Note also that if you add two Ext() constraints these will fight (no file can have two extensions at the same time) and your query will return nothing.

Command Liners

The FSQuery object has a pp() method which actually does the looping and printing for you.

This makes FSQuery useful for one-liners run directly on the command line like this.

    python -c 'from fsquery import FSQuery; FSQuery(".").Ext("html").FileOnly().pp()'

FSNodes

FSNode.changed() returns a formatted time-stamp of when the file was last changed. (Uses os.path.getmtime)

FSNode.open_file() will, if the FSNode is a file, open it for reading and return a file handle. It throws an exception if the FSNode is a directory.

Eg.

    fsq = FSQuery(path).NoFollow("vendor").Ext("py").FileOnly()
    for n in fsq :
        print(n.abs)
        for line in n.open_file() :
            print(line.rstrip())
        print("______________")

FSNode.children() will return a list of the immediate children of the node. (As new FSNodes)

FSNode.spawn_query() creates a new FSQuery, starting at this FSNode.

A file tree Processor

There's another way to process all the nodes in the query.

You can create a custom class to process the nodes, which has three methods : process_dir, process_file and quit and then pass it to an FSQuery object's walk_each method.

For example :

    class Processor :
        def process_dir(self,fsNode) :
            print("this is a dir : %s" % fsNode)
        def process_file(self,fsNode) :
            print("this is a file : %s" % fsNode)
        def quit(self,fsNode,depth) :
            return False

    fsq.walk_each(Processor())

This is potentially useful if you want to process files and directories differently (so you can't use FilesOnly or DirOnly in the query). And might be more elegant in certain situations. You can reuse the same Processor with different queries.

Note that quit returns a boolean. By default it should be False. But return True if this node meets some criterion for abandoning the descent at this point. For example, if you've hit a maximum depth or because you have found a file you are searching for.

Shadowing

A more radical, and very useful thing that you often want to do with directories of files is to make some kind of derivative "copy" or transformation of them. Here we use FSQuery's shadow functionality.

shadow, like walk_each takes a custom processor class, and runs it through a query result.

However, when shadowing, FSQuery is not just iterating through the original query result, is is also building up an isomorphic, shadow result that is hooked on to a different root. At each step of the iteration, your Processor class is presented with the original FSNode from the original location, and a "shadow node", at an identical relative location, but from a different root.

Here's a simple example :

Say we have a simple directory structure hanging under o

> find o
o
o/a
o/a/c
o/a/c/hello.txt
o/b
o/b/hello.txt

We now run the following example code :

    class Shadower :
    
        def process_dir(self,node,shadow_node) :
            print(node.abs)
            print(shadow_node.abs)
            print()
            shadow_node.mk_dir()
            
        def process_file(self,node,shadow_node) :
            pass
                        
    fsq = FSQuery("o")
    fsq.shadow("x",Shadower())

The Shadower has two methods process_dir and process_file to handle directories and files directly. In this example process_file does nothing.

Note that these methods now take two arguments : the original node, and a new shadow_node created by the shadow functionality of FSQuery. When we call the shadow method of FSQuery, you see we give it two arguments. A new root directory, in this case, x, and an instance of the Shadower class.

Here's the result.

    PATH/o
    PATH/x
    
    PATH/o/a
    PATH/x/a
    
    PATH/o/a/c
    PATH/x/a/c
    
    PATH/o/b
    PATH/x/b

Here you see that the the paths in the shadow_node are the same as the path of the original node, except the root of the query o is now replaced by the alternative root x

We also called the method mk_dir on each shadow node, which conveniently created the appropriate directory under the new root x

    > find x
    x
    x/a
    x/a/c
    x/b

You'll see that this has successfully cloned the directory structure matching the query. It hasn't, of course, copied the files. Let's do that :

    class Shadower :
    
        def process_dir(self,node,shadow_node) :
            print(node.abs)
            print(shadow_node.abs)
            print()
            shadow_node.mk_dir()
            
        def process_file(self,node,shadow_node) :
            content = node.open_file().read()
            shadow_node.write_file(shadow_node.abs,content)
                        
    fsq = FSQuery("o")
    fsq.shadow("x",Shadower())

We use two more convenience functions on FSNode, open_file to open the file represented by the node. And write_file to write to a file of that name. Now we see we've successfully cloned the directories.

    > find x
    x
    x/a
    x/a/c
    x/a/c/hello.txt
    x/b
    x/b/hello.txt

Of course, shadow is running through a standard FSQuery result, so if we had restricted that result, using our usual NoFollow, Match, or even Contains queries, we would be processing only that query result.

And, of course, you don't need to limit yourself to just copying one directory. Depending on your Shadower class, this can be used for a wide variety of functionality.

Note that currently shadow does not support a quit style function which terminates shadowing early. Nor does a Shadower class need a quit function. This functionality may be added in future.

Known Issues

Currently FSQuery ignores symbolic links.

About

File System Query. A Python library to deal with the file system that looks like JQuery

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages