Worked PageRank Example

Introduction

One of my sons asked me if I could work through a PageRank calculation example a couple of different ways (algebraic and iterative). It was an interesting exercise and I thought it would be worth documenting here. I used Mathcad for both my algebraic and iterative solutions.

The Wikipedia has an excellent definition of the PageRank algorithm, which I will quote here.

PageRank is a link analysis algorithm, named after Larry Page[1] and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set.

PageRank is computed using a relatively simple function (see Equation 1), but a number of web-based examples treat the weighting of inbound links from sites external to a particular group of pages as a special case. I did not see any explicit calculation examples, so I thought I would include this calculation here.

Background

PageRank views the web as graph, with inbound links being viewed as measure of the significance of a web page. Equation 1 shows the PageRank equation. Note that this equation can be solved several different ways. One approach involves eigenvalues, which my son does not know about yet. The equation can be solved algebraically or iteratively. I will use both approaches for this example.

Eq. 1 PR\left( {{p}_{i}} \right)=\frac{1-d}{N}+d\cdot \sum\limits_{p\in M\left( {{p}_{i}} \right)}{\frac{PR\left( {{p}_{j}} \right)}{L\left( {{p}_{j}} \right)}}

where

  • N is the number of pages.
  • d is called the damping factor and it is an arbitary weighting factor.
  • PR(pi is the PageRank of page pi.
  • L(pi) is the number of outbound links from page pi.
  • M(pi) is the set of links to page pi.

Equation 2 shows a matrix form of Equation 1. The matrix form is most likely the form used "out in the wild."

Eq. 2 \mathbf{R}(t+1)=d\cdot \mathcal{M}\cdot \mathbf{R}(t)+\frac{1-d}{N}\cdot \mathbf{1}\text{ }~

where

  • R is the PageRank vector.
  • 1 is a column vector with all elements equal to 1.
  • t is a discrete time variable (really a sequence number).

Note that many examples of PageRank are computed using a variant of Equations 1 and 2 that multiplies the PageRank value by the number of pages (N·PageRank). Equation 3 illustrates Equation 2 modified with the substitution \mathbf{{R}'}=N\cdot \mathbf{R}. Equation 1 can be modified similarly. This is the equation that Ian used for his examples. Since I am going to duplicate his results, I will multiply my results by N.

Eq. 3 N\cdot \mathbf{R}(t+1)=N\cdot d\cdot \mathcal{M}\cdot \mathbf{R}(t)+N\cdot \frac{1-d}{N}\cdot \mathbf{1}
\left( N\cdot \mathbf{R}(t+1) \right)=d\cdot \mathcal{M}\cdot \left( N\cdot \mathbf{R}(t) \right)+\left( 1-d \right)\cdot \mathbf{1}
\therefore \quad \mathbf{{R}'}(t+1)=d\cdot {\mathcal{M}}'\cdot \mathbf{{R}'}(t)+\left( 1-d \right)\cdot \mathbf{1},\text{ where }\mathbf{{R}'}=N\cdot \mathbf{R}

There has been quite a bit written about the nuances of this equation because of its importance in determining a web page's position in a list of search results. I am not concerned about those details here. I am focused here on the calculation of the PageRank for a specific set of pages.

Analysis

Example

My son was using Ian Roger's excellent site for learning about the details of PageRank. The question he had is on Example 10, which assigns a PageRank of 1 to an external page. Figure 1 shows the Example 10's web page configuration. Ian's PageRank results are shown in the boxes, which represent web pages. I want to show the details on obtaining Ian's results as an illustration of how to handle an external link from a page with a defined PageRank.

Figure 1: PageRank Example from Ian Roger's Website.

Figure 1: PageRank Example from Ian Roger's Website.


I have added the letters A-F to Figure 1, which constitute the variable names I will use in my solutions.

Algebraic Solution

When Ian uses a link from an external site, he sets the PageRank value to 1. Here is his rationale.

We'll assume there's an external site that has lots of pages and links with the result that one of the pages has the average PR of 1.0.

Algebraically, this is easy to handle. Figure 2 shows my solution implemented in Mathcad.

Figure 2: Algebraic Solution for Example 10.

Figure 2: Algebraic Solution for Example 10.

Iterative Solution

Figure 3 shows how I setup my iterative solution. To force page A to have a PageRank of 1, I needed to remove page A from the R vector and M matrix, but add it back in so that page A's contribution can be included. Again, it is a slight modification of Equation 3 so that I can force page A to have a PageRank of 1.

Figure 3: Setup for My Iterative Solution of Equation 2.

Figure 3: Setup for My Iterative Solution of Equation 2.


Figure 4 shows my iteration stage. Note that I was lazy and simply let it iterate a 1001 times. I could have made this much more efficient by simply monitoring the convergence, but that would have taken a bit more time.
Figure 4: Iterative Solution of Equation 2.

Figure 4: Iterative Solution of Equation 2.

Conclusion

I obtained the same results as Ian using two different approaches -- algebraic and iterative. This example was different than most in that a particular web page was forced to a particular PageRank. I hope that I answered my son's question.

 
This entry was posted in software and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *