lesspipe (formerly on sourceforge)

lesspipe.sh, a preprocessor for less
====================================

Version: 1.87
Author : Wolfgang Friebel, DESY (Wolfgang.Friebel AT desy.de)
License: GPL

Latest version available from:
 https://github.com/wofr06/lesspipe.sh/archive/lesspipe.zip
 https://github.com/wofr06/lesspipe (git repository)

The development version can be cloned using git:
 git clone https://github.com/wofr06/lesspipe.git
To report bugs or make proposals to improve lesspipe please contact
the author by email.

Contents
========

 0.  Motivation
 1.  Introduction
 2.  Usage
 3.  Required programs
 4.  Supported file formats
 4.1   Supported compression methods
 4.2   List of preprocessed file types
 4.3   Conversion of files with alternate character encoding
 4.4   File formats currently not supported
 5.  Colorizing the output
 5.1   Syntax highlighting
 5.1.1   List of supported languages
 5.1.2   Syntax highlighting alternatives
 5.2   Colored directory listing
 5.3   Colored listing of tar file contents
 6.  Displaying files with special characters in the file name
 7.  Examples
 8.  Other documentation about lesspipe
 9.  External links
 9.1   URLs to some utilities
 9.2   References
 10. Contributors

0. Motivation
===============

 If you use

 - the pager `less` in the command line,
 - the version control system `git`,
 - the text editor `Vim` or
 - the mail client `mutt`,

 then lesspipe.sh enables these programs to *read* non-text files, such as:

 - PDFs,
 - (Microsoft or LibreOffice) Office documents, or even
 - media (such as [JPG or PNG] images, [MP3] audio or video) files

 where *read* means,

 - (format and) show the contained text (of a document or tag in a media file),
   or
 - show sensible file information (such as length of the video).

 To enable `less`, `git`, `Vim` or `mutt` to read non-text files through
 lesspipe.sh, see

 - Section 2 on the usage of lesspipe.sh for `less`, and
 - the Wiki at https://github.com/wofr06/lesspipe/wiki for the other programs.

 For the text and info extraction, lesspipe.sh depends on external tools,
 but many use cases are covered by an installation of

  - LibreOffice and a common text browser (such as `lynx`),
  - pandoc,
  - pdftotext, and
  - mediainfo (or exiftool).

1. Introduction
===============

 To browse files under UNIX the excellent viewer less [1] can be used. By
 setting the environment variable LESSOPEN, less can be enhanced by external
 filters to become even more powerful. Most Linux distributions already come
 with a "lesspipe.sh" that covers the most common situations.

 The input filter for less described here is called "lesspipe.sh". It is able
 to process a wide variety of file formats. It enables users to deeply inspect
 archives and to display the contents of files in archives without having to
 unpack them before. That means file contents can be properly interpreted even
 if the files are compressed and contained in a hierarchy of archives (often
 found in RPM or DEB archives containing source tarballs). The filter is easily
 extensible for new formats.

 lesspipe.sh is written in a ksh compatible language (ksh, bash, zsh), as one
 of these shells is nearly always installed on UNIX systems and uses
 comparatively few resources. An implementation in perl, for example, would
 have been somewhat simpler to code. The code looks less clean than it could,
 because the script was made compatible with a number of old shells and
 applications found especially on non-Linux systems.

 The filter does different things depending on the file format. In most cases
 the format is determined from the output of the "file" command [2], [6], which
 recognizes lots of formats. Only in a few cases is the file suffix used to
 decide what to display. Up to date file descriptions are included in the
 "file" package. Maintaining a list of file formats is therefore only a matter
 of keeping that package up to date.
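
 The dispatch on the output of "file" can be pictured with the following
 minimal sketch. It is not the code used in lesspipe.sh: the function name
 kind_from_description and the handful of patterns are assumptions chosen for
 illustration, and the real script handles far more formats.

```shell
# Map a description as printed by `file -b` to a coarse category.
# Hypothetical helper; the patterns are illustrative, not lesspipe.sh's rules.
kind_from_description() {
    case $1 in
        *"gzip compressed"*) echo gzip ;;
        *"tar archive"*)     echo tar ;;
        *"Zip archive"*)     echo zip ;;
        *"PDF document"*)    echo pdf ;;
        *text*)              echo text ;;
        *)                   echo unknown ;;
    esac
}

# Typical use (needs the file command; -L follows symlinks, -b omits the name):
#   kind_from_description "$(file -L -b "$name")"
```

 Taking the description as an argument keeps the decision separate from the
 call to "file", so the mapping can be tried out with hand-written strings.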

2. Usage
========

 (see also the man page lesspipe.1)

 To activate lesspipe.sh the environment variable LESSOPEN has to be defined
 in the following way:

 LESSOPEN="|lesspipe.sh %s"; export LESSOPEN	(sh like shells)
 setenv LESSOPEN "|lesspipe.sh %s"		(csh, tcsh)

 If lesspipe.sh is not in the UNIX search path or if the wrong lesspipe.sh is
 found in the search path, then the full path to lesspipe.sh should be given
 in the above commands.
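
 As an illustration of the advice above, the following sketch for the startup
 file of an sh like shell sets LESSOPEN only when a lesspipe.sh is found in the
 search path and stores its full path. The function name setup_lesspipe is an
 assumption made for this example and is not part of lesspipe.

```shell
# Hypothetical helper: point LESSOPEN at the lesspipe.sh found in PATH.
setup_lesspipe() {
    # command -v prints the full path of the first match in PATH
    lesspipe_path=$(command -v lesspipe.sh 2>/dev/null) || return 1
    LESSOPEN="|$lesspipe_path %s"
    export LESSOPEN
}

setup_lesspipe || echo "lesspipe.sh not found in PATH, LESSOPEN unchanged" >&2
```

 If the wrong lesspipe.sh comes first in the search path, the full path has to
 be written into LESSOPEN by hand instead.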

 As lesspipe.sh accepts only a single argument, a hierarchical list of file
 names has to be separated by a non-blank character. A colon is rarely found
 in file names, therefore it has been chosen as the separator character. If a
 file name does however contain at least one isolated colon, the equal sign =
 can be used as an alternate separator character. In that case the = character
 has to reoccur as the last character of the argument. At each stage in
 extracting files from such a hierarchy the file type is determined. This
 guarantees correct processing and display at each stage of the filtering.

 To view files in multifile archives the following command can be used:
	less archive_file:contained_file      or
	less archive_file=contained_file=     (with = as separator)
 This can be used to extract single files from a multifile archive:
	less archive_file:contained_file > extracted_file
 Extracting files does not require less; it can also be done using:
	lesspipe.sh archive_file:contained_file > extracted_file
 Even a file in a multifile archive that itself is contained in yet
 another archive can be viewed this way:
	less super_archive:archive_file:contained_file
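
 The separator rules described above can be sketched as a small helper that
 builds the single argument from its components. The function name
 lesspipe_arg is hypothetical. As a simplification, the sketch switches to the
 = separator as soon as any component contains a colon, and it does not handle
 names that contain both : and =.

```shell
# Hypothetical helper: join archive components into one lesspipe argument.
lesspipe_arg() {
    sep=':'
    for part in "$@"; do
        # any colon inside a component forces the alternate separator
        case $part in *:*) sep='=' ;; esac
    done
    out=$1
    shift
    for part in "$@"; do
        out="${out}${sep}${part}"
    done
    # with = as separator, the argument must also end with =
    if [ "$sep" = '=' ]; then
        out="${out}="
    fi
    printf '%s\n' "$out"
}

# Usage: less "$(lesspipe_arg super_archive archive_file contained_file)"
```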

 The script is able to extract files up to a depth of 6 where applying a
 decompression algorithm counts as a separate level. In a few rare cases the
 file command does not recognize the correct format (especially with nroff).
 In such cases the filtering can be suppressed by a trailing colon on the file
 name.

 To display the last file in the file1:..:fileN chain in raw format:
 Suppress input filtering:	less file1:..:fileN:  (append 1 colon)
 Suppress decompression:	less file1:..:fileN:: (append 2 colons)
 Suppress syntax highlighting:	less file1:..:fileN:  (append 1 colon)

 Syntax highlighting (see below) is only attempted if less is called with -r
 or -R and highlighting support was requested when generating lesspipe.sh.

 Several environment variables can influence the behavior of lesspipe.sh.

 LESSQUIET will suppress additional output not belonging to the file contents
 if set to a non empty value.

 LESS can be used to switch on colored less output (has to contain -r or -R).

 LESSCOLORIZER can contain a syntax highlighting program different from
 the code2color used by default (currently only pygmentize allowed).

 Code for using LESS_ADVANCED_PREPROCESSOR is optionally generated (by
 configure). If lesspipe.sh was generated with that support, the filtering
 of html, rtf and ps files and of files with alternate character encoding is
 switched on only if the variable LESS_ADVANCED_PREPROCESSOR is set; otherwise
 these types of files are shown unmodified. If lesspipe.sh was generated
 without that support (the string LESS_ADVANCED_PREPROCESSOR is then not
 contained in lesspipe.sh), these formats are always filtered.
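
 A possible set of settings for the startup file of an sh like shell is
 sketched below. The values are examples only. The check for an existing -R or
 -r is an assumption of this sketch and recognizes the option only when it
 stands alone (not combined as in -iR); adding -R a second time is harmless.

```shell
# Keep what is already in LESS; add -R unless -R or -r is already listed.
case " ${LESS:-} " in
    *" -R "*|*" -r "*) ;;
    *) LESS="${LESS:+$LESS }-R" ;;
esac
export LESS

# Suppress additional output that does not belong to the file contents.
LESSQUIET=1
export LESSQUIET

# Optional: use pygmentize instead of the default code2color.
LESSCOLORIZER=pygmentize
export LESSCOLORIZER

# Only has an effect if lesspipe.sh was generated with support for it.
LESS_ADVANCED_PREPROCESSOR=1
export LESS_ADVANCED_PREPROCESSOR
```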

3. Required programs
====================

 bash	or zsh or ksh (also pdksh, tested with version 5.2). Configure puts an
        appropriate first line in the script
 file	(a version with an up to date magic file) (GNU file 4.xx recommended)
 perl	(for configure, code2color and tarcolor, lesspipe.sh works without it)
 Standard UNIX programs like ar, cat, cut, dd, egrep, gzip, ln, ls, mkdir,
 rm, sed, strings, tar, tput and further programs for special formats.
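
 Whether the standard programs listed above are installed can be checked with
 a few lines of shell. This is a sketch only; the function name
 missing_programs is an assumption and not part of lesspipe.

```shell
# Hypothetical helper: print each named program that is not found in PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}

# Prints nothing if all programs from the list above are available.
missing_programs file ar cat cut dd egrep gzip ln ls mkdir rm sed strings tar tput
```

 The same function can be called with the helper programs for special formats
 (for example pdftotext or unzip) that are needed on a given system.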

4. Supported file formats
=========================

 Currently lesspipe.sh [3] supports the following compression methods
 and file types (i.e. the file contents get transformed by lesspipe.sh):

4.1 Supported compression methods
---------------------------------
 gzip, compress, pack	requires gzip
 bzip2			requires bzip2
 zip			requires unzip
 rar			requires rar or unrar
 7-zip			requires 7za
 lzip			requires lzip
 lzma			requires lzma (limited support only)
 xz			requires xz
 zstd			requires zstd
 brotli			requires bro
 lz4			requires lz4

4.2 List of preprocessed file types
-----------------------------------
 tar		requires GNU tar and optionally tarcolor for coloring
 nroff(mandoc)	requires groff
 ar library	requires ar
 jar archive	requires fastjar or unzip
 rar archive	requires unrar or rar
 7-zip archive	requires 7za
 lzip archive	requires lzip
 shared library	requires nm
 executable	requires strings
 directory	displayed using ls -lA
 RPM		requires GNU cpio and rpm2cpio or rpmunpack, optionally rpm
 Microsoft Word < 2007	requires antiword or catdoc or libreoffice
 MS Powerpoint < 2007	requires ppthtml or libreoffice
 MS Excel < 2007	requires xlhtml or libreoffice
 Microsoft Word (docx) >= 2007	requires pandoc or docx2txt.pl or libreoffice
 MS Powerpoint (pptx) >= 2007	requires pptx2md or libreoffice
 MS Excel (xlsx) >= 2007	requires xlscat from the Perl module Spreadsheet::Read or excel2csv or libreoffice
 epub		requires pandoc
 Debian 	requires ar, gzip and tar, shows more info if dpkg is installed
 html		requires html2text or elinks or links or lynx or w3m
 pdf		requires pdftotext or pdftohtml or pdfinfo
 perl		requires pod2text
 rtf		requires pandoc or libreoffice or unrtf
 dvi		requires dvi2tty
 djvu		requires djvutxt
 ps		requires pstotext or ps2ascii (from the gs package)
 mp3, ogg	requires mediainfo or id3v2 or mp3info or mp3info2
 jpg, png, gif	requires identify
 mp4		requires mediainfo
 iso images	requires isoinfo
 MacOS X archive	requires lsbom
 MacOS X bom	requires lsbom
 MacOS X plist	requires plutil
 cab		requires cabextract (version 1.0 or above)
 gpg encrypted	requires gpg
 perl storable	requires perl (and the perl modules Storable and Data::Dumper)
 perl pod	requires perldoc
 OASIS		OpenDocument text documents (used for OpenOffice, LibreOffice)
 		require odt2txt or pandoc or libreoffice or unzip and o3tohtml or
 		sxw2txt (distributed together with lesspipe)
 nc4		requires ncdump (NetCDF format)
 hdf5		requires h5dump (Hierarchical Data Format)
 crt, pem, csr, crl requires openssl

4.3 Conversion of files with alternate character encoding
---------------------------------------------------------
 If the file utility reports text containing ISO-8859, UTF-8 or UTF-16 encoded
 characters, then the text is converted into the default encoding using iconv.
 This assumes that iconv has the right default, which can be wrong in some
 situations. If a check shows that iconv would fail, the text is displayed
 unmodified.
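
 The "convert if possible, otherwise show unmodified" behavior can be sketched
 as follows. This is not the code from lesspipe.sh. The function name
 show_converted is hypothetical, and the source encoding is passed in by the
 caller, whereas lesspipe.sh derives it from the output of the file utility.

```shell
# Hypothetical helper: $1 = file name, $2 = encoding of that file.
show_converted() {
    # Trial run first: without -t, iconv converts to the locale's encoding.
    if iconv -f "$2" "$1" >/dev/null 2>&1; then
        iconv -f "$2" "$1"
    else
        # iconv is missing or would fail: show the text unmodified
        cat "$1"
    fi
}

# Usage: show_converted notes.txt ISO-8859-1 | less
```

 Running iconv twice keeps the sketch short; for large files the first result
 could be kept in a temporary file instead.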

4.4 File formats currently not supported
----------------------------------------
(code contributed but commented out)

 Display of jpeg and pbm graphics files as ASCII art. The ASCII art
 library works with overprinting, which does not work properly within less.
 Therefore the resulting quality of the converted picture is not satisfactory.

 Display of video streams using mplayer with -aadriver (again ASCII art) is
 considered abuse of less and also commented out.

 Looking at the contents of DOS formatted disks by accessing the proper
 device file.

5. Colorizing the output
========================

 ATTENTION: Syntax highlighting and other methods of colorizing the output
 are only activated if the environment variable LESS exists and contains
 the option -R or -r, or if less is called with one of these options.

 This guarantees that colors are displayed instead of literal escape
 sequences. The detection of -r/-R at run time depends on the operating
 system and may not work in all cases. Putting the option in the LESS
 environment variable is guaranteed to work. Installing the perl module
 Proc::ProcessTable reduces the OS dependence as well.

 The display of wrapped long lines and moving backward in a file using the
 options -r/-R can give weird output. For an explanation see
 http://www.greenwoodsoftware.com/less/faq.html#dashr

5.1 Syntax highlighting
-----------------------
 Experimental support for syntax highlighting was added through a perl
 script 'code2color' which is derived from code2html [5].

 As syntax highlighting is rather resource intensive, it can be switched off
 for a file shown in color by appending a colon to the file name. If the
 wrong language was chosen for syntax highlighting, another one can be forced
 by appending a colon and a suffix to the file name as follows (assuming this
 is a file with perl syntax):

	less config_file:.pl

 That works as well to force the call of code2color for a given language.

5.1.1 List of supported languages (code2color)
----------------------------------------------
 Text files for the following languages can be highlighted using code2color:
 ada, asm, awk, c, c++, groff, html, xml, java, javascript, lisp, m4,
 make, pascal, patch, perl, povray, python, ruby, shellscript, sql
 The corresponding suffixes recognized by code2color are:
 .ada .asm .inc .awk .c .h .cpp .cxx .groff .html .php .xml .java .js .lsp .m4
 Makefile .pas .patch .diff .pm .pl .pod .pov .py .rb .sh .sql

5.1.2 Syntax highlighting alternatives
--------------------------------------
 The enabling of syntax highlighting contains OS dependent code and is not
 guaranteed to work (it was tested on Linux, Solaris, IRIX, HPUX, AIX,
 MacOS X, Cygwin and FreeBSD). It is deactivated by default and not
 recommended by me. It can be activated using "configure" or "make MODE=ask".

 The function code2color contains code to guarantee that color codes are only
 sent if less is called with one of the options -r or -R. To ensure that these
 checks are always performed, alternate syntax colorizers will be called
 from within code2color by setting the environment variable LESSCOLORIZER
 to the name of another program. Currently only pygmentize (and code2color as
 the default) is allowed. This can be changed in the first lines of code2color.

 Much better syntax highlighting is obtained using the less emulation of vim.
 The editor vim comes with a file less.sh, in my case located in
 /usr/share/vim/vim73/macros. Assuming that file location, define
 a function lessc (bash, zsh, ksh users)

	lessc () { /usr/share/vim/vim73/macros/less.sh "$@"; }

 or an alias lessc (csh, tcsh users)

	alias lessc /usr/share/vim/vim73/macros/less.sh

 and use "lessc filename" to view the colorful file contents.
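
 Since the directory name contains the vim version, a variant that does not
 hard-code vim73 may be convenient. The following is a sketch under the
 assumption that less.sh lives in a vim*/macros directory below
 /usr/share/vim; both the function name find_vim_less and that layout are
 assumptions and may not match every installation.

```shell
# Hypothetical helper: print the first less.sh found below the given
# base directory (default /usr/share/vim).
find_vim_less() {
    for candidate in "${1:-/usr/share/vim}"/vim*/macros/less.sh; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

lessc() {
    vim_less=$(find_vim_less) || { echo "less.sh not found" >&2; return 1; }
    "$vim_less" "$@"
}
```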

5.2 Colored directory listing
-----------------------------
 The conditions to display a colored listing are described above. Depending
 on the operating system, ls is then called with appropriate options to
 produce colored output.

5.3 Colored listing of tar file contents
----------------------------------------
 As above, less has to be called with -r or -R. If the executable tarcolor
 (contained in the lesspipe tar file, see also [7]) is installed as well,
 the listing of tar file contents is colored in a similar fashion as
 directory contents.

6. Displaying files with special characters in the file name
============================================================

 Shell meta characters in file names, i.e. space (frequently used in Windows
 file names) and the characters | & ; ( ) ` < > " ' # ~ = $ * ? [ ] or \,
 must be escaped by a \ when used in the shell. For example,
	less a\ b.tar.gz:a\"b
 will display the file a"b contained in the gzipped tar archive a b.tar.gz.
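
 Quoting is an alternative to escaping every character: inside single quotes
 the shell leaves all characters except the single quote itself untouched.
 The short check below shows that both spellings of the example name yield the
 same argument; the variable names are chosen for this illustration only.

```shell
# The same argument written with backslashes and with single quotes
escaped=a\ b.tar.gz:a\"b
quoted='a b.tar.gz:a"b'

[ "$escaped" = "$quoted" ] && echo "same argument"   # prints: same argument

# Usage: less 'a b.tar.gz:a"b'
```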

 Files within an archive that do have an isolated colon in the name cannot
 be displayed using the
	archive_name:contained_file_name
 notation. These files can be displayed using a notation with the alternate
 separator character = as follows:
	archive_name=contained_file_name=
 Please note the trailing = which is required.

7. Examples
===========

 As a typical use case, the following shows how to display the man page
 "file.man" found in the Fedora10 RPM source archive file-4.26-3.fc10.src.rpm.

 The less command enhanced with the lesspipe.sh filter

	less file-4.26-3.fc10.src.rpm

 yields the following output
 ...
 -rw-r--r--   1 root     root       584803 Sep 15 16:29 file-4.26.tar.gz
 -rw-r--r--   1 root     root        17124 Oct 16 13:01 file.spec

 Then the command

	less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz

 produces the output
 ...
 -rw-rw-r-- 10080/10080   16027 2008-08-30 12:01:41 file-4.26/doc/Makefile.in
 -rw-rw-r-- 10080/10080   16097 2008-03-07 16:00:07 file-4.26/doc/file.man
 -rw-rw-r-- 10080/10080   16943 2008-08-30 11:50:20 file-4.26/doc/magic.man
 ...

 The desired man page can finally be viewed with

	less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man

 The subcomponents of the argument to less were easily obtained by cut and
 paste from the previous lines of output. Care has been taken to display the
 subcomponents in the form required by lesspipe, so that in most cases double
 clicking will select them. If the nroff source should be displayed instead,
 appending another colon at the end of the argument does the job:

	less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/doc/file.man:

 If the man page was compressed (e.g. as file.man.gz) it would have been
 uncompressed anyway. To also disallow uncompressing the source file.man.gz
 a second colon would have to be appended to the argument.

 Even extracting single files from an archive is possible, like with

	less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:file-4.26/src/file.c > file.c

 Files with binary contents can be extracted as well:

	less file-4.26-3.fc10.src.rpm:file-4.26.tar.gz:: > file-4.26.tar.gz

 Here the two colons at the end of the argument are required to suppress the
 unzipping of the resulting file and to extract the tar file instead of
 interpreting it.

 Another interesting example is to get the dominant colors of a picture
 that contains a diagram with only a few colors. The command

	less diagram.png

 produces a lot of information, including
 ...
  Histogram:
       720: (  0,  0,127)       #00007F
      3032: (127,127,127)       grey50
     18935: (  0,  0,255)       blue
     21480: (  0,255,  0)       lime
     21041: (  0,255,255)       cyan
      8719: (255,  0,  0)       red
     14476: (255,  0,255)       magenta
      8822: (255,183,  0)       #FFB700
     13608: (255,255,  0)       yellow
     49167: (255,255,255)       white
 ...

 Other interesting examples are the inspection of Java's .jar files or Debian
 package contents without unpacking the files, without having java
 installed and without necessarily working on a Debian system.

 The contrib directory contains a less wrapper (a bash/zsh function) that
 can be used to display URLs using less. This allows passing the URL contents
 through lesspipe as if they were a local file.
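
 The idea behind such a wrapper can be sketched as follows. This is not the
 function shipped in the contrib directory: the names is_url and lessurl, the
 use of curl for the download and the handling of the temporary file are all
 assumptions made for this illustration.

```shell
# Hypothetical helper: succeed if the argument looks like a URL.
is_url() {
    case $1 in
        http://*|https://*|ftp://*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: download a URL to a temporary file and view it with
# less (and thus through LESSOPEN); pass anything else to less unchanged.
lessurl() {
    if is_url "$1"; then
        tmpfile=$(mktemp) || return 1
        # -f fail on HTTP errors, -sS quiet but show errors, -L follow redirects
        curl -fsSL -o "$tmpfile" "$1" && less "$tmpfile"
        rm -f "$tmpfile"
    else
        less "$@"
    fi
}
```

 The temporary file carries no suffix, so in this sketch the format is
 recognized only as far as the "file" command can determine it from the
 contents.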

8. Other documentation about lesspipe
=====================================
	http://ref.cern.ch/CERN/CNL/2002/001/unix-less/
	http://www.linux-magazine.com/issue/21/lesspipe.pdf
	in bash cookbook (Ch. 8.15) by Carl Albing, Cameron Newham, J. P. Vossen
	http://carloscosta.org/2008/07/05/how-to-get-more-from-less/
 Documentation in German:
	german.txt (distributed with lesspipe, not updated)
	http://www.linux-magazin.de/Heft-Abo/Ausgaben/2001/01/Bessere-Sicht
	http://www.linux-user.de/ausgabe/2002/04/060-ootb/lesspipe-1.html

9. External links
=================

(last checked: Jan 10 2013):

9.1 URLs to some utilities
--------------------------
 mediainfo            https://mediaarea.net/MediaInfo
 wvText               https://github.com/tsgouros/wv
 antiword             http://www.winfield.demon.nl/
 html2text            http://www.mbayer.de/html2text/
 cabextract           http://www.cabextract.org.uk/
 7za                  https://sourceforge.net/projects/p7zip/
 lzip                 http://download.savannah.gnu.org/releases/lzip/
 dvi2tty              http://www.ctan.org/tex-archive/dviware/dvi2tty/
 unrtf                http://ftp.gnu.org/gnu/unrtf/
 docx2txt.pl          https://github.com/arthursucks/docx2txt
 pandoc               http://pandoc.org
 libreoffice          http://www.libreoffice.org
 odt2txt              https://github.com/dstosberg/odt2txt
 git-xlsx-textconv    https://github.com/tokuhirom/git-xlsx-textconv
 git-xlsx-textconv.pl https://github.com/yappo/p5-git-xlsx-textconv.pl
 pptx2md              https://github.com/ssine/pptx2md
 id3v2                http://id3v2.sourceforge.net/

9.2 References
--------------
 [1] http://www.greenwoodsoftware.com/less/	(less)
 [2] ftp://ftp.astron.com/pub/file/		(file)
 [3] https://github.com/wofr06/lesspipe
 [5] http://www.palfrader.org/code2html/	(code2html)
 [6] http://www.darwinsys.com/file/		(file)
 [7] https://github.com/msabramo/tarcolor	(tarcolor)

10. Contributors
================

 The script lesspipe.sh is constantly enhanced thanks to suggestions from
 users. Among the additions to lesspipe.sh is the code to browse the ASCII
 contents of Word or Openoffice files, to show characteristics of mp3 files
 or to decode MacOS X formats.

 Thanks to (in alphabetical order):
 Marc Abramowitz: allow for color output when ls or tar commands are used
 James Ahlborn: do not interpret .xml files as html
 Sören Andersen: PPD files colorization requested
 Andrew Barnert: shell syntax fix
 Peter D. Barnes, Jr.: plist files for Mac OS X
 Eduard Bloch: proposed support for ISO images
 Mathieu Bouillaguet: add support for xz compression
 Florian Cramer: MS Word, Openoffice support (o3read), ASCIIart, DjVu support
 Philippe Defert: unattended installation
 Antonio Diaz Diaz: proposed support for lzip
 Bastian Fuchs: Issues using bash vs. sh
 Matt Ghali: more conservative usage of html2text
 Carl Greco: enhanced output for .deb files
 Stephan Hegel: suggested better 7za support
 Michel Hermier: support for DESTDIR in the Makefile
 Tobias Hoffmann: fixed a bug introduced in v 1.72, add support for Debian 2.0
   files with xz packed data
 Christian Höltje: suggested to use LESSCOLORIZER like in the Gentoo distro
 Jürgen Kahnert: display debian files without dpkg
 Sebastian Kayser: suggested a less wrapper to display URLs
 Ben Kibbey: works on FreeBSD
 Peter Kostka: mktemp on MacOS fixes
 Heinrich Kuettler: formatting, html via lynx
 Antony Lee: correctly call pygmentize (no - argument, use -g)
 Vincent Lefèvre: runtime checks for shell, many enhancements, provided sxw2txt,
   determine options for 'file' at runtime, display text in the proper encoding
 David Leverton: detect helper programs at runtime
 Jay Levitt: suggested to use enscript for highlighting support (see 4.2 above)
 Vladimir Linek: inspired me to add ps and dvi support
 Oliver Mangold: correctly display directories with a colon in the name
 Istvan Marko: speedup of the procedure
 Markus Meyer: improved mp3 handling
 Remi Mommsen: Mac OS X support
 Derek B. Noonburg: PDF files support
 Martin Otte: mktemp on MacOS fixes
 Jim Pryor: many enhancements, bug fixes, restructuring of code, tar detection
 Slaven Rezic: Cygwin support, bug fixes
 Daniel Risacher: gpg support
 Jens Schleusener: ksh syntax fixes
 Ken Teague?: support more versions of file command
 Matt Thompson: add support for NetCDF and HDF5 files using ncdump, h5dump
 Paul Townsend: improved zip support for Solaris, bug fixes in configure
 Petr Uzel: detect helper programs at runtime
 Chelban Vasile: trap command not working under /bin/sh
 Götz Waschk: suggested lzma support
 Michael Wiedmann: Debian packages support
 Dale Wijnand: Proposed the suppression of informal output
 Peter Wu: shortcut for unreadable files
