Skip to content

paslandau/pagerank

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

#pagerank Build Status

Calculating the PageRank of nodes in a directed graph.

##Description A PHP implementation of the PageRank algorithm described in The PageRank Citation Ranking: Bringing Order to the Web (PDF). The project was created to support my session "PageRank und TrustRank" at the AFS - Akademie für Fortbildung in SEO and gives some basic examples as well as something to get interested participants started quickly.

The PageRank itself is an indicator of the importance of a node in a linked graph. The basic idea behind the algorithm is as follows:

  • an edge ("link") between two nodes in a (directed) graph can be regarded as an endorsement from the originating node to the target node
  • hence, the more nodes link to "me" the more important "I" am
  • moreover, the more important "I" am, the more weight carries my endorsement

In this implementation an iterative approach is used to calculate the PageRank (PR) (source: Wikipedia):

PageRank formula

where p_1, p_2, ..., p_N are the pages under consideration, d is a damping factor to avoid "rank sinks", M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on page p_j, and N is the total number of pages. The PR values change on every iteration step until a predefined threshold for the difference between old and new PR value is reached.

Probably the most important application of the PageRank algorithm is being a part of the ranking algorithm of (web) search engines. The PageRank is commonly known as the most important piece that set Google apart from other search engines back in 1998. See The Anatomy of a Large-Scale Hypertextual Web Search Engine for reference.

##Basic Usage

// define the nodes
$a = new Node("a");
$b = new Node("b");
$c = new Node("c");
$d = new Node("d");
$e = new Node("e");
$f = new Node("f");
$x1 = new Node("x1");
$x2 = new Node("x2");
$x3 = new Node("x3");
$x4 = new Node("x4");
$x5 = new Node("x5");

// define the links between the nodes
$graph = new Graph();
$graph->addEdge(new Edge($b, $c));

$graph->addEdge(new Edge($c, $b));

$graph->addEdge(new Edge($d, $a));
$graph->addEdge(new Edge($d, $b));

$graph->addEdge(new Edge($e, $b));
$graph->addEdge(new Edge($e, $d));
$graph->addEdge(new Edge($e, $f));

$graph->addEdge(new Edge($f, $b));
$graph->addEdge(new Edge($f, $e));

$graph->addEdge(new Edge($x1, $b));
$graph->addEdge(new Edge($x1, $e));
$graph->addEdge(new Edge($x2, $b));
$graph->addEdge(new Edge($x2, $e));
$graph->addEdge(new Edge($x3, $b));
$graph->addEdge(new Edge($x3, $e));
$graph->addEdge(new Edge($x4, $e));
$graph->addEdge(new Edge($x5, $e));

// calculate the PageRank
$pageRank = new PageRank();
$result = $pageRank->calculatePagerank($graph);

// print the result
$formatter = new ResultFormatter(4);
echo $formatter->toString($result);

Output

93. Round
Node	OldPr	NewPr	Difference
b	    0.3242	0.3242	-0
c	    0.2892	0.2892	0
d	    0.033	0.033	0
a	    0.0276	0.0276	0
e	    0.0682	0.0682	0
f	    0.033	0.033	0
x1	    0.0136	0.0136	0
x2	    0.0136	0.0136	0
x3	    0.0136	0.0136	0
x4	    0.0136	0.0136	0
x5	    0.0136	0.0136	0

The Graph in the example corresponds to the following graphic, taken from the Wikipedia article on PageRank: