cgrep

cgrep is grep for C-family source files.

You can write something like this:

cgrep --regex '[a-z]*' --func -A 1 -B 1 myawesomecode.cpp

and it will match your regex against all function declarations and print each match, plus one line of context before and after it.
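To try the command, a small input file helps. The sketch below writes a hypothetical myawesomecode.cpp (its contents are illustrative and not taken from the cgrep repository) and runs the command only if cgrep is on the PATH. The regex is single-quoted so the shell does not expand it as a glob, and the trailing -- (explained in the Usage section) says that no compilation database is provided.

```shell
# Create a small, hypothetical C++ file to search (illustrative contents).
workdir="$(mktemp -d)"
cat > "$workdir/myawesomecode.cpp" <<'EOF'
int add(int lhs, int rhs) { return lhs + rhs; }

int main() {
  return add(1, 2);
}
EOF

# Run the command only when cgrep is installed.
if command -v cgrep >/dev/null 2>&1; then
  cgrep --regex '[a-z]*' --func -A 1 -B 1 "$workdir/myawesomecode.cpp" --
fi
```

With --func, the declarations of add and main are the candidates the regex is tested against.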

cgrep is implemented using Clang's libtooling libraries.

Features

  • It's basically Clang regexing its way through your C-family source code. You have all the context you could ever need.
  • Can print the declaration of a match along with the matched result, even if the match itself is not a declaration.
  • Can output matches in a script-friendly format that can in turn be consumed by a secondary script.

Will cgrep try to implement all of the grep switches?

The answer is no. The main distinction is that cgrep is only meant to work on C-family source files, not text files. Most of grep's switches don't apply to this use case or provide almost no benefit.
That being said, I might have missed something, so you can always make suggestions in the form of a new issue.

Will cgrep support a new switch that matches X?

If it makes sense, sure, but I want to be careful with what cgrep implements. If cgrep implements every possible switch (well, a subset of "all"), we end up with an inferior version of clang-query that would be too slow to be of any use to anyone. So please keep in mind that I will have to draw the line somewhere.

Building

There are a couple of examples under docker. You can use those if you get stuck.

Good Ole' Makefiles

NOTE: Good ole makefiles are no longer supported.

Cmake

To do an out-of-source build simply do:

git clone https://github.com/terminaldweller/cgrep
cd cgrep
git submodule init
git submodule update
mkdir build
cd build
cmake ../ -DLLVM_CONF=llvm-config-15 -DCMAKE_CXX_COMPILER=clang++-15 -DUSE_MONOLITH_LIBTOOLING=ON
make

The three variables in the command denote the llvm-config executable name, the clang++ executable name, and whether cmake should build against the single C++ libtooling library or use the old way with all the individual libtooling libraries. A fourth variable, not shown in the command above, lets cmake know which version of llvm/clang is being used.
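If several LLVM versions are installed, it can help to derive both tool names from a single version number. This is a minimal sketch, assuming the versioned binaries follow the llvm-config-N and clang++-N naming used above, which is common for distribution packages but not universal. LLVM_VER, CXX_NAME and CMAKE_CMD are hypothetical shell variables for this sketch, not cgrep build options.

```shell
# LLVM_VER is a hypothetical helper variable for this sketch, not a cgrep build option.
LLVM_VER=15
LLVM_CONF="llvm-config-${LLVM_VER}"
CXX_NAME="clang++-${LLVM_VER}"

# Report any tool that is not on the PATH before configuring.
for tool in "$LLVM_CONF" "$CXX_NAME"; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "not found on PATH: $tool" >&2
  fi
done

# Print the configure command to run from inside the build directory.
CMAKE_CMD="cmake ../ -DLLVM_CONF=${LLVM_CONF} -DCMAKE_CXX_COMPILER=${CXX_NAME} -DUSE_MONOLITH_LIBTOOLING=ON"
echo "$CMAKE_CMD"
```

Changing LLVM_VER then updates both names together, which avoids mixing an llvm-config from one release with a compiler from another.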

Usage

A simple usage example:

cgrep -A 1 -B 1 --func --declrefexpr --regex 'n[aA]m' --nocolor --nodecl ./myawesomecode.cpp

For cgrep to work, you need a compilation database; tools like cmake can generate one for you.
You can run cgrep without a compilation database, but whether that works depends on your source file. Can you build your source file with clang without passing it any options? If so, you can run cgrep without a compilation database like so:

cgrep -A 1 -B 1 --func --declrefexpr --regex 'n[aA]m' --nocolor --nodecl ./myawesomecode.cpp --

The -- at the end is an explicit way of saying that you will not be providing a compilation database. Newer versions of clang will still try to go through with the compilation even if no compilation database is found. With older versions, you need a compilation database.

Please note that the regex passes through both C++ and the regex engine, so to match a literal \, the regex you pass as the command line argument needs to be \\\\ instead of the usual \\.
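The shell adds one more layer of escaping on top of the two described above. The sketch below only shows what the shell hands to a program: inside single quotes all four backslashes survive, while an unquoted argument loses every other one. The variable names quoted and unquoted are illustrative.

```shell
# Single quotes: the shell passes all four backslashes through unchanged.
quoted='\\\\'
printf '%s\n' "$quoted"      # prints four backslashes

# Unquoted: the shell consumes every other backslash, so only two remain.
unquoted=\\\\
printf '%s\n' "$unquoted"    # prints two backslashes
```

So when typing the pattern at a prompt, wrapping it in single quotes, as in --regex '\\\\', keeps the shell from changing it before cgrep sees it.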
If your build tool cannot generate a compilation database, you can use bear or scan-build to produce one.
You can pass compiler options by hand using --extra-arg=; cgrep is a clang instance, so it recognizes every option clang has. As a general rule, if you are not going to pass cgrep a compilation database, it is always better to say so explicitly by putting -- after the input file name. Not doing so can make cgrep behave in ways you might not expect.
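For reference, a compilation database is a compile_commands.json file in Clang's JSON compilation database format; with cmake it is typically produced by configuring with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON. The sketch below writes a minimal one by hand to show its shape. The directory, the compile command and the source file are hypothetical, and the cgrep call is only attempted if cgrep is installed.

```shell
# Write a minimal compilation database by hand (all paths are hypothetical).
db_dir="$(mktemp -d)"
cat > "$db_dir/compile_commands.json" <<EOF
[
  {
    "directory": "$db_dir",
    "command": "clang++ -std=c++17 -c myawesomecode.cpp",
    "file": "myawesomecode.cpp"
  }
]
EOF
printf 'int answer() { return 42; }\n' > "$db_dir/myawesomecode.cpp"

# -p points cgrep at the directory that holds compile_commands.json.
if command -v cgrep >/dev/null 2>&1; then
  cgrep -p "$db_dir" --func --regex 'answer' "$db_dir/myawesomecode.cpp"
fi
```

Because the database already carries the compiler flags, no trailing -- and no --extra-arg= is needed in this form.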

cgrep uses ANSI escape sequences for colors, so your terminal should support those. If your terminal does not support ANSI escape sequences, or you don't want them for any other reason, you can turn them off with the --nocolor option.

By default, cgrep will print out the declaration location for a match. In case you don't want those in the output, you can pass cgrep the --nodecl switch.

You can use --extra-arg=--std= to tell cgrep which C-family language the source file is supposed to be in.

Options

Here's an option list, though it's usually not up-to-date.
For an up-to-date list, you can run cgrep --help or look at the man page.

  -A=<int>                    - Same as grep, how many lines after the matched line to print. Defaults to 0.
  -B=<int>                    - Same as grep, how many lines before the matched line to print. Defaults to 0.
  --all                       - Turns on all switches other than nameddecl.
  --awk                       - Outputs location in a gawk friendly format, not meant for human consumption. Defaults to false.
  --call                      - Match function calls.
  --class                     - Match class declarations.
  --cxxcall                   - Match member function calls.
  --declrefexpr               - Matches declrefexpr.
  --dir=<string>              - Recursively goes through all the files and directories. Assumes compilation databases are present for all source files.
  --extra-arg=<string>        - Additional argument to append to the compiler command line
  --extra-arg-before=<string> - Additional argument to prepend to the compiler command line
  --func                      - Match functions.
  --header                    - Match headers in header inclusions.
  --macro                     - Match macro definitions.
  --mainfile                  - Match identifiers in the main file only. Defaults to true.
  --memfunc                   - Match member functions.
  --memvar                    - Match member variables.
  --nameddecl                 - Matches all named declarations.
  --nocolor                   - For terminals that don't support ANSI escape sequences. Defaults to false.
  --nodecl                    - For switches that are not declarations, don't print declarations. Defaults to false.
  -p=<string>                 - Build path
  --recorddecl                - Match a record declaration.
  --regex=<string>            - The regex to match against.
  --struct                    - Match structures.
  --syshdr                    - Match identifiers in system header as well. Defaults to false.
  --union                     - Match unions.
  --var                       - Match variables.

cgrep is a clang tool, so it will accept all valid clang command line options.

Known Issues

cgrep complains that it cannot find stddef.h or some other similar header. If that happens to you, it's because cgrep can't find the clang built-in headers. Run llvm-config --libdir, then go to the clang directory inside the directory it prints. There you should see one (or maybe more) llvm/clang versions. Pick the one you built cgrep against. Inside that directory there will be a directory named include. Pass that to cgrep any way you see fit.
Alternatively, $(llvm-config --libdir)/clang/$(llvm-config --version)/include should give the path cgrep needs to include. If you build your llvm/clang from upstream, this might not work. SVN builds will have the svn string attached to the version number.
You could, for example, call cgrep with --extra-arg=-I/usr/lib/llvm-9/lib/clang/9.0.0/include, or you could just alias cgrep to cgrep --extra-arg=-I/usr/lib/llvm-9/lib/clang/9.0.0/include.
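The path construction above can be sketched as a small shell helper. builtin_include_dir is a hypothetical function name used only for illustration. The sketch also checks that the resulting directory exists, since the directory name does not always match the full version string that llvm-config reports.

```shell
# Hypothetical helper: joins a libdir and a version into the built-in header path.
builtin_include_dir() {
  printf '%s/clang/%s/include\n' "$1" "$2"
}

# With the values from the example above:
builtin_include_dir /usr/lib/llvm-9/lib 9.0.0
# prints /usr/lib/llvm-9/lib/clang/9.0.0/include

# With the values reported by the local llvm-config, if one is installed:
if command -v llvm-config >/dev/null 2>&1; then
  inc="$(builtin_include_dir "$(llvm-config --libdir)" "$(llvm-config --version)")"
  if [ -d "$inc" ]; then
    echo "pass --extra-arg=-I$inc to cgrep"
  else
    echo "no such directory: $inc" >&2
  fi
fi
```

If the directory is reported as missing, listing the clang directory under the libdir shows which version directories are actually installed.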

cgrep replaces clang's DiagnosticConsumer with a simple one that only tells you whether there were errors during compilation. This decision was made to declutter the output generated by cgrep. You can get the normal clang output using the --clangdiag switch.