Saturday, August 05, 2006

Internal linking


Fact:
A website has a maximum amount of PageRank that is distributed between its pages by internal links.


The maximum PageRank in a site equals the number of pages in the site * 1. The maximum is increased by inbound links from other sites and decreased by outbound links to other sites. We are talking about the overall PageRank in the site and not the PageRank of any individual page. You don't have to take my word for it. You can reach the same conclusion by using a pencil and paper and the equation.


Fact:
The maximum amount of PageRank in a site increases as the number of pages in the site increases.


The more pages that a site has, the more PageRank it has. Again, by using a pencil and paper and the equation, you can come to the same conclusion. Bear in mind that the only pages that count are the ones that Google knows about.


Fact:
By linking poorly, it is possible to fail to reach the site's maximum PageRank, but it is not possible to exceed it.


Poor internal linkages can cause a site to fall short of its maximum but no kind of internal link structure can cause a site to exceed it. The only way to increase the maximum is to add more inbound links and/or increase the number of pages in the site.


Cautions:
Whilst I thoroughly recommend creating and adding new pages to increase a site's total PageRank so that it can be channeled to specific pages, there are certain types of pages that should not be added. These are pages that are all identical or very nearly identical and are known as cookie-cutters. Google considers them to be spam and they can trigger an alarm that causes the pages, and possibly the entire site, to be penalized. Pages full of good content are a must.

What can we do with this 'overall' PageRank?

We are going to look at some example calculations to see how a site's PageRank can be manipulated, but before doing that, I need to point out that a page will be included in the Google index only if one or more pages on the web link to it. That's according to Google. If a page is not in the Google index, any links from it can't be included in the calculations.

For the examples, we are going to ignore that fact, mainly because other 'Pagerank Explained' type documents ignore it in the calculations, and it might be confusing when comparing documents. The calculator operates in two modes:- Simple and Real. In Simple mode, the calculations assume that all pages are in the Google index, whether or not any other pages link to them. In Real mode the calculations disregard unlinked-to pages. These examples show the results as calculated in Simple mode.


Let's consider a 3 page site (pages A, B and C) with no links coming in from the outside. We will allocate each page an initial PageRank of 1, although it makes no difference whether we start each page with 1, 0 or 99. Apart from a few millionths of a PageRank point, after many iterations the end result is always the same. Starting with 1 requires fewer iterations for the PageRanks to converge to a suitable result than when starting with 0 or any other number. You may want to use a pencil and paper to follow this or you can follow it with the calculator.

The site's maximum PageRank is the number of pages in the site multiplied by 1. In this case, we have 3 pages so the site's maximum is 3.

At the moment, none of the pages link to any other pages and none link to them. If you make the calculation once for each page, you'll find that each of them ends up with a PageRank of 0.15. No matter how many iterations you run, each page's PageRank remains at 0.15. The total PageRank in the site = 0.45, whereas it could be 3. The site is seriously wasting most of its potential PageRank.
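The calculation can be reproduced in a few lines. This is only a minimal sketch, assuming the simple form of the equation, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), with a damping factor d of 0.85 (an assumption, chosen because it gives the 0.15 figure above). The function name simple_pagerank and the dictionary layout are made up for illustration and are not the calculator's own code.

```python
def simple_pagerank(links, iterations=100, d=0.85):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to p.

    links maps every page to the list of pages it links to (hypothetical layout).
    """
    pr = {page: 1.0 for page in links}  # each page starts with PR1
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Each linking page t shares its PageRank equally among its C(t) links.
            votes = sum(pr[t] / len(out) for t, out in links.items() if page in out)
            new_pr[page] = (1 - d) + d * votes
        pr = new_pr  # every page is updated from the previous iteration's values
    return pr


# Three pages with no links at all.
unlinked = simple_pagerank({"A": [], "B": [], "C": []})
print(unlinked)                # each page is about 0.15
print(sum(unlinked.values()))  # about 0.45 out of a possible 3
```

With no links there are no votes to add, so every page is left with only the (1 - d) part of the equation.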

__________________________________________
Example 1

Now begin again with each page being allocated PR1. Link page A to page B and run the calculations for each page. We end up with:-

Page A = 0.15
Page B = 1
Page C = 0.15


Page A has "voted" for page B and, as a result, page B's PageRank has increased. This is looking good for page B, but it's only 1 iteration - we haven't taken account of the Catch 22 situation. Look at what happens to the figures after more iterations:-

After 100 iterations the figures are:-

Page A = 0.15
Page B = 0.2775
Page C = 0.15


It still looks good for page B but nowhere near as good as it did. These figures are more realistic. The total PageRank in the site is now 0.5775 - slightly better but still only a fraction of what it could be.

NOTE:
Technically, these particular results are incorrect because of the special treatment that Google gives to dangling links, but they serve to demonstrate the simple calculation.
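Here is a minimal sketch of Example 1, assuming the simple equation with a damping factor of 0.85 and, like the figures above, giving no special treatment to dangling links. The function name and the way the links are stored are assumptions made for illustration.

```python
def simple_pagerank(links, iterations, d=0.85):
    """links maps each page to the pages it links to (hypothetical layout)."""
    pr = {page: 1.0 for page in links}  # start every page at PR1
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(out) for t, out in links.items() if page in out
            )
            for page in links
        }  # the whole dict is rebuilt from the previous iteration's values
    return pr


example1 = {"A": ["B"], "B": [], "C": []}  # A links to B, nothing else

first = simple_pagerank(example1, iterations=1)
settled = simple_pagerank(example1, iterations=100)

print({p: round(v, 4) for p, v in first.items()})    # B is about 1 after one pass
print({p: round(v, 4) for p, v in settled.items()})  # B settles at about 0.2775
print(round(sum(settled.values()), 4))               # total of about 0.5775
```

The first pass flatters page B because it still uses page A's starting value of 1. Once page A has dropped to 0.15, its vote is worth far less.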

_________________________________________________

Example 2

Try this linkage. Link all pages to all pages. Each page starts with PR1 again. This produces:-

Page A = 1
Page B = 1
Page C = 1


Now we've achieved the maximum. No matter how many iterations are run, each page always ends up with PR1. The same results occur by linking in a loop. E.g. A to B, B to C and C to A. View this in the calculator.
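Both layouts can be checked with a short sketch. It assumes the simple equation with a damping factor of 0.85, and the function name and data layout are invented for illustration. The loop here is the three-page one: A to B, B to C, and C back to A.

```python
def simple_pagerank(links, iterations=100, d=0.85):
    """links maps each page to the pages it links to (hypothetical layout)."""
    pr = {page: 1.0 for page in links}  # start every page at PR1
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(out) for t, out in links.items() if page in out
            )
            for page in links
        }
    return pr


everything_linked = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
loop = {"A": ["B"], "B": ["C"], "C": ["A"]}

full_result = simple_pagerank(everything_linked)
loop_result = simple_pagerank(loop)

print({p: round(v, 4) for p, v in full_result.items()})  # every page about 1
print({p: round(v, 4) for p, v in loop_result.items()})  # every page about 1
```

In both layouts every page passes on all of its vote and receives a full vote in return, so nothing is lost.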

This has demonstrated that, by poor linking, it is quite easy to waste PageRank and by good linking, we can achieve a site's full potential. But we don't particularly want all the site's pages to have an equal share. We want one or more pages to have a larger share at the expense of others. The kinds of pages that we might want to have the larger shares are the index page, hub pages and pages that are optimized for certain search terms. We have only 3 pages, so we'll channel the PageRank to the index page - page A. It will serve to show the idea of channeling.

________________________________________________

Example 3

Now try this. Link page A to both B and C. Also link pages B and C to A. Starting with PR1 all round, after 1 iteration the results are:-

Page A = 1.85
Page B = 0.575
Page C = 0.575


and after 100 iterations, the results are:-

Page A = 1.459459
Page B = 0.7702703
Page C = 0.7702703


In both cases the total PageRank in the site is 3 (the maximum) so none is being wasted. Also in both cases you can see that page A has a much larger proportion of the PageRank than the other 2 pages. This is because pages B and C are passing PageRank to A and not to any other pages. We have channeled a large proportion of the site's PageRank to where we wanted it.
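A minimal sketch of Example 3 follows, again assuming the simple equation with a damping factor of 0.85. The names are illustrative only. It prints the figures after 1 and 100 iterations, along with page A's share of the total.

```python
def simple_pagerank(links, iterations, d=0.85):
    """links maps each page to the pages it links to (hypothetical layout)."""
    pr = {page: 1.0 for page in links}  # start every page at PR1
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(out) for t, out in links.items() if page in out
            )
            for page in links
        }
    return pr


example3 = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

for n in (1, 100):
    result = simple_pagerank(example3, iterations=n)
    total = sum(result.values())
    print(n, {p: round(v, 6) for p, v in result.items()})
    print("  total:", round(total, 6), " A's share:", round(result["A"] / total, 3))

one_pass = simple_pagerank(example3, iterations=1)
settled = simple_pagerank(example3, iterations=100)
```

Page A ends up with a little under half of the site's PageRank, even though it is only one page in three.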

________________________________________________

Example 4

Finally, keep the previous links and add a link from page C to page B. Start again with PR1 all round. After 1 iteration:-

Page A = 1.425
Page B = 1
Page C = 0.575


By comparison to the 1 iteration figures in the previous example, page A has lost some PageRank, page B has gained some and page C stayed the same. Page C now shares its "vote" between A and B. Previously A received all of it. That's why page A has lost out and why page B has gained.

After 100 iterations:-

Page A = 1.298245
Page B = 0.9999999
Page C = 0.7017543


When the dust has settled, page C has lost a little PageRank. Because C now shares its vote between A and B instead of giving it all to A, A has less to give back to C through the A-->C link. So adding an extra link from a page causes the page to lose PageRank indirectly if any of the pages that it links to return the link. If the pages that it links to don't return the link, then no PageRank loss would have occurred. To make it more complicated, if the link is returned even indirectly (via a page that links to a page that links to a page etc), the page will lose a little PageRank. This isn't really important with internal links, but it does matter when linking to pages outside the site.
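The indirect loss is easier to see side by side. This is a rough sketch, assuming the simple equation with a damping factor of 0.85 and using made-up names. It compares page C's settled PageRank before and after the extra C to B link is added.

```python
def simple_pagerank(links, iterations=100, d=0.85):
    """links maps each page to the pages it links to (hypothetical layout)."""
    pr = {page: 1.0 for page in links}  # start every page at PR1
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(out) for t, out in links.items() if page in out
            )
            for page in links
        }
    return pr


without_extra = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}    # Example 3
with_extra = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}  # Example 4

before = simple_pagerank(without_extra)
after = simple_pagerank(with_extra)

for page in "ABC":
    print(page, round(before[page], 6), "->", round(after[page], 6))
# C falls from about 0.770270 to about 0.701754, although no link to C changed.
```

Page C's own inbound links are untouched. It loses out only because page A, which returns C's link, now receives less from C.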

________________________________________________

Example 5: new pages

Adding new pages to a site is an important way of increasing a site's total PageRank because each new page will add an average of 1 to the total. Once the new pages have been added, their new PageRank can be channeled to the important pages. We'll use the calculator to demonstrate these.

Let's add 3 new pages to Example 3. We now have three new pages, but they don't do anything for us yet. The small increase in the Total, and the new pages' 0.15, are unrealistic as we shall see. So let's link them into the site.

Link each of the new pages to the important page, page A. Notice that the Total PageRank has doubled, from 3 (without the new pages) to 6. Notice also that page A's PageRank has almost doubled.

There is one thing wrong with this model. The new pages are orphans. They wouldn't get into Google's index, so they wouldn't add any PageRank to the site and they wouldn't pass any PageRank to page A. They each need to be linked to from at least one other page. If page A is the important page, the best page to put the links on is, surprisingly, page A. You can play around with the links but, from page A's point of view, there isn't a better place for them.
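The three stages can be sketched as follows. It assumes the simple equation with a damping factor of 0.85 and Simple mode (every page counts, linked-to or not). The new pages are called D, E and F here purely for illustration, and the function name is invented.

```python
def simple_pagerank(links, iterations=100, d=0.85):
    """links maps each page to the pages it links to (hypothetical layout)."""
    pr = {page: 1.0 for page in links}  # start every page at PR1
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[t] / len(out) for t, out in links.items() if page in out
            )
            for page in links
        }
    return pr


base = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}  # Example 3

stages = {
    "new pages unlinked": {**base, "D": [], "E": [], "F": []},
    "new pages link to A": {**base, "D": ["A"], "E": ["A"], "F": ["A"]},
    "A links back to them": {
        "A": ["B", "C", "D", "E", "F"],
        **{page: ["A"] for page in "BCDEF"},
    },
}

results = {name: simple_pagerank(links) for name, links in stages.items()}
for name, pr in results.items():
    print(name, "| total:", round(sum(pr.values()), 4), "| A:", round(pr["A"], 4))
```

In this sketch the total moves from about 3.45 to 6 once the new pages link to page A. Page A holds about 2.84 in both of the last two stages, so putting the return links on page A costs it nothing.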

It is not a good idea for one page to link to a large number of pages so, if you are adding many new pages, spread the links around. The chances are that there is more than one important page in a site, so it is usually suitable to spread the links to and from the new pages. You can use the calculator to experiment with mini-models of a site to find the best links that produce the best results for its important pages.

________________________________________________

Examples summary

You can see that, by organising the internal links, it is possible to channel a site's PageRank to selected pages. Internal links can be arranged to suit a site's PageRank needs, but it is only useful if Google knows about the pages, so do try to ensure that Google spiders them.

________________________________________________

Inbound and Outbound links

Examples of these could be given but it is probably clearer to read about them (below) and to 'play' with them in the calculator.

________________________________________________

Questions

When a page has several links to another page, are all the links counted?

E.g. if page A links once to page B and 3 times to page C, does page C receive 3/4 of page A's shareable PageRank?

The PageRank concept is that a page casts votes for one or more other pages. Nothing is said in the original PageRank document about a page casting more than one vote for a single page. The idea seems to be against the PageRank concept and would certainly be open to manipulation by unrealistically proportioning votes for target pages. E.g. if an outbound link, or a link to an unimportant page, is necessary, add a bunch of links to an important page to minimize the effect.

Since we are unlikely to get a definitive answer from Google, it is reasonable to assume that a page can cast only one vote for another page, and that additional votes for the same page are not counted.

When a page links to itself, is the link counted?

Again, the concept is that pages cast votes for other pages. Nothing is said in the original document about pages casting votes for themselves. The idea seems to be against the concept and, also, it would be another way to manipulate the results. So, for those reasons, it is reasonable to assume that a page can't vote for itself, and that such links are not counted.
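If we adopt both assumptions, a page's links can be tidied up before any calculation is made. The sketch below is only an illustration of those two assumptions, not a description of what Google does, and the function name countable_links is made up.

```python
def countable_links(page, raw_links):
    """Return the links from `page` that count as votes under two assumptions:
    repeated links to the same target count once, and self-links are ignored."""
    counted = []
    for target in raw_links:
        if target != page and target not in counted:
            counted.append(target)
    return counted


# Page A links once to B, three times to C and once to itself.
votes = countable_links("A", ["B", "C", "C", "C", "A"])
print(votes)  # ['B', 'C']

# Each counted target receives an equal share of A's shareable PageRank.
share = {target: 1 / len(votes) for target in votes}
print(share)  # {'B': 0.5, 'C': 0.5}
```

Under these assumptions page C receives half of page A's shareable PageRank, not three quarters.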
