jordanell edited this page Jun 20, 2012 · 16 revisions

The main goal of Call Graph Analyzer is to create technical dependency graphs. In such graphs, nodes represent source code contributors to a Java project, and weighted edges represent technical relationships between them.

The relationship is established in the following scenario. Contributor A writes a method called foo, and contributor B writes a method called bar, which in turn invokes foo. There is now a one-way, directed relationship from B to A. However, in our technical dependency graph, we are only interested in relationships that are involved in some source code change. So, for this relationship to exist, contributor A must change some code in method foo. From this change we can establish the relationship described above.
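As a rough illustration of the scenario above, the following sketch derives the directed contributor edges created by a single change. Every name in it (RelationshipSketch, edgesForChange, the two maps) is hypothetical and not taken from the project's code. It also assumes that each method has a single author and that a contributor calling his or her own method produces no edge; neither assumption comes from this page.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, not the project's API.
final class RelationshipSketch {
    private RelationshipSketch() {}

    /**
     * Returns the directed (from, to) contributor pairs created by one change.
     *
     * @param callersOf     callee method name -> methods that invoke it directly
     * @param authorOf      method name -> contributor who wrote it (assumed: one author)
     * @param changedMethod the method that was modified
     * @param changer       the contributor who modified it
     */
    static List<Map.Entry<String, String>> edgesForChange(
            Map<String, List<String>> callersOf,
            Map<String, String> authorOf,
            String changedMethod,
            String changer) {
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        for (String caller : callersOf.getOrDefault(changedMethod, List.of())) {
            String callerAuthor = authorOf.get(caller);
            // Assumption: unknown authors and self-dependencies produce no edge.
            if (callerAuthor == null || callerAuthor.equals(changer)) {
                continue;
            }
            // The author of the calling method depends on the contributor who changed the callee.
            edges.add(new SimpleImmutableEntry<>(callerAuthor, changer));
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callersOf = Map.of("foo", List.of("bar"));
        Map<String, String> authorOf = Map.of("foo", "A", "bar", "B");
        // A changes foo, so B (the author of bar) now depends on A.
        System.out.println(edgesForChange(callersOf, authorOf, "foo", "A"));
        // Changing bar creates no edge here, because nothing calls bar.
        System.out.println(edgesForChange(callersOf, authorOf, "bar", "B"));
    }
}
```

Note that no edge exists until a change occurs: the call from bar to foo alone is not enough, which is why the method takes the changed method and the changing contributor as inputs.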

The weight of the edge in the previous example comes from the formula: ((delta foo) * ((owned bar) / depth)) * 100. Delta foo is the percent change in method foo made by contributor A, owned bar is the percent of method bar that contributor B owns, and depth is the call distance between the two methods. For this example, the depth is 1 because bar directly calls foo. If, however, there were another method foobar that calls bar, the depth from foobar to foo would be 2.
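The formula and the notion of depth can be sketched as follows. This is a minimal illustration, not the project's implementation: the class and method names are hypothetical, the percentages are assumed to be passed as fractions between 0 and 1 (which is how we read the trailing multiplication by 100), and depth is assumed to be the shortest call chain, found here with a breadth-first search.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names are hypothetical, not the project's API.
final class WeightSketch {
    private WeightSketch() {}

    /**
     * Length of the shortest call chain from caller to callee, or -1 if the
     * callee is not reachable. The map goes from a method to the methods it calls.
     */
    static int depth(Map<String, List<String>> calls, String caller, String callee) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(caller, 0);
        queue.add(caller);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = distance.get(current);
            for (String next : calls.getOrDefault(current, List.of())) {
                if (next.equals(callee)) {
                    return d + 1;
                }
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    /**
     * weight = (deltaCallee * (ownedCaller / depth)) * 100.
     * Assumption: deltaCallee and ownedCaller are fractions in [0, 1].
     */
    static double weight(double deltaCallee, double ownedCaller, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        return deltaCallee * (ownedCaller / depth) * 100.0;
    }

    public static void main(String[] args) {
        // bar calls foo, and foobar calls bar, as in the example above.
        Map<String, List<String>> calls =
                Map.of("bar", List.of("foo"), "foobar", List.of("bar"));
        int direct = depth(calls, "bar", "foo");
        int indirect = depth(calls, "foobar", "foo");
        // Made-up inputs: A changed half of foo, and the caller's author owns 80% of the caller.
        System.out.println("depth " + direct + ", weight " + weight(0.5, 0.8, direct));
        System.out.println("depth " + indirect + ", weight " + weight(0.5, 0.8, indirect));
    }
}
```

With the same delta and ownership values, doubling the depth halves the weight, so indirect callers are affected less than direct ones.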

From this technical dependency network we can determine the impact that a contributor's code change has on all of his or her related contributors.

Building The Network

Several steps need to be completed before this network can be built. We highly recommend that the project you are analyzing be a native Git repository. However, we also provide tools for converting other major source control management systems to Git, although we do not guarantee 100% accuracy in the conversion (some data may not be convertible or may be lost).

Convert to Git

If you have a native Git repository, you can skip this step; otherwise, you must convert your SVN, CVS or Mercurial repository to Git. We have written a series of scripts that will do this for you. The scripts are included in Eggnet's side project called scm2pgsql. They are located in the utils folder and can be run on any Linux-based operating system. To find out how to run any of these scripts, consult the wiki for that project. That wiki describes how to use the scripts for the conversion and also covers the next step of the process, which is exporting the Git repository to a database.

Export to Database

Once you have your Git repository, its data needs to be dumped to a PostgreSQL database. This step again involves Eggnet's project scm2pgsql. Consult that project's wiki to find out how to run it for the database dump.

Building Networks

Now that all of our data is prepared, we can actually build our networks! To build the technical networks we want, we will run this project (Call Graph Analyzer).

In order to run Call Graph Analyzer, you must supply three arguments. The first is the name of the Git repository that you dumped into the database in the previous step (this is usually the root folder name of the Git repository, not the full path). The second argument is the branch name (usually "master"). The third argument is the starting commit ID that you wish to build networks from. The program will generate networks for all commits that are reachable along some network path starting from the given commit.

This project performs a series of steps in order to build the technical network:

Parsing Stage
Resolving Stage
Ownership Stage

Through the above three stages, we build up all of our "behind the scenes" data which enables us to build our actual networks. For more information on each of these stages, click the links.

Network Building Stage

After the network building stage is complete, we have our networks! All of the networks are then dumped to the PostgreSQL database. The schema for the database can be found here.

As an additional step, our team imports the networks from the database into Gephi, an open source graphing program that runs on Windows, OS X and Linux.

Additional Resources

In several stages of this program, our code uses custom "libraries" that our team has written. The term "libraries" is used loosely, as they are not necessarily standalone pieces of code.

The two main additional libraries that our team uses are:
database, which is our standard database connector code.
differ, which is our code for diffing two source files.
