PageRank Sculpting: Why and How to Do It
by Rob Laporte
Visibility Magazine – Fall 2012

Executive Summary

PageRank sculpting is the deliberate channeling of search engine spidering, indexing, PageRank, and other trust and authority signals. It usually pertains to tactics within a single website domain, but some tactics can apply to a family of domains or sub-domains. SEO pros have been doing it successfully throughout the 2000s. It is much more than the justly discredited nofollow meta-tag sculpting. The emergence of the canonical tag in 2009 changed best practices for synchronizing PageRank sculpting tools. PageRank sculpting is essential in the SEO management of large, dynamic websites, and can be helpful in smaller websites undergoing redesigns. This article clarifies which tools and combinations of tools channel spidering and indexing and circulate PageRank and “TrustRank.” Seasoned SEO pros can skip to “PageRank Sculpting Tools” below.

The Nofollow Meta-Tags Debacle

Around 2007, SEO pros debated the effectiveness of applying nofollow tags to links within a single website in order to funnel PageRank to the best content. Many SEO pros adopted the practice, but then in 2009 Google’s Matt Cutts dropped the bomb that the tactic does not work – and had not worked for about a year prior to his announcement. The nofollow tactic and the debate about it diverted attention from several other crucial ways to channel PageRank effectively. Mr. Cutts himself noted this importance, taking care to distinguish legitimate PageRank sculpting from the bogus nofollow sculpting. (For more on this, see http://searchengineland.com/pagerank-sculpting-is-dead-long-live-pagerank-sculpting-21102.)

Envision PageRank Sculpting

The selection and sequence of PageRank sculpting tools varies according to each website’s technology, information architecture, and budget, so a general understanding will help you to arrive at solutions specific to your website.

Let’s use the metaphors of liquid gold and a genealogical chart. Picture your website’s organization as a family tree: each page is a breeding couple, its children hang below it on horizontal lines, and some of those children breed more pages as you move down the chart. Every connecting line is a little pipe, and the home page has a bucket into which the liquid gold of PageRank is poured. Some of the families and individuals in this tree are brilliant and beautiful – the hope and promise of your bloodlines and nation – and are worth investing in so that they can return benefit to everyone. Sadly, some of the offspring are loafing crack addicts with little (content) to offer. Sure, we should and someday we will put those individuals in rehab (add good content to those “thin” pages), but right now we have to distribute that liquid gold to the most productive people in the family tree so that overall prosperity is maximized. With the returns on those investments, we can help the weaker members.

So, PageRank sculpting directs the flow of that liquid gold by using tools to open and close valves in the pipes of your website’s hierarchy.

By “spidering,” I refer to the search engine’s crawling of a page and its inclusion in the SERPs (search engine results pages), whereas indexing goes further to include the page’s words in the keyword index. If a page is spidered but not indexed, its SERP listing has no text snippet, and the page is unlikely to appear for keyword searches as opposed to “site:” searches. Henceforth, I use “PageRank” to also encompass trust and authority signals.

I focus on Google. Bing, Yahoo and others may treat some of the tools a little differently, but because in practice the tools are deployed once for all search engines, you should deploy them with Google in mind.

PageRank Sculpting Tools

All search marketing tools are like the tools of wood sculpting: without the experience and talent of the artist, even the best tools are worthless. Likewise, the most talented and inspired artist lacking tools will produce one ugly duck. PageRank sculpting uses standard tools in a combination and sequence that varies according to each website’s technology, information architecture, and budget. These tools are:

1. The robots.txt file
2. 301 redirects
3. Canonical meta-tags
4. Pagination with rel=“next” and rel=“prev”
5. The noindex meta-tag
6. X-Robots-Tag HTTP header directive
7. XML sitemaps
8. The nofollow meta-tag
9. Advanced and less essential tools (perhaps for a future article here) include cache controls, last-modified headers, the unavailable_after X-robots-tag, and a few others.

In addition to these tools, many of the standard rules of CMS-SEO apply to PageRank sculpting.
(For an overview of CMS-SEO, see VisibilityMagazine.com/internet_marketing_magazine/previous_issues/html/december-2007 and VisibilityMagazine.com/disc-inc/rob-laporte/cms-and-database-seo-guide).

It’s crucial to know, and frankly difficult to remember, which of the tools block or redirect spidering, keyword indexing, and PageRank. The rest of this article serves as a quick reference to help your team choose and coordinate PageRank sculpting tools.

The robots.txt File

  • Excluded pages are often still spidered (not always excluded from the SERPs), appearing as URL-only listings.
  • Excluded pages are not indexed.
  • PageRank may “leak” to excluded pages.
  • PageRank does not pass through excluded pages.

Although this simple and imperfect tool is generally not recommended for PageRank sculpting now, it can be used alone to make big improvements quickly. If your time or budget is very constrained, or you need to show quick results before getting funding for more improvements, you can often begin with the robots.txt file. Remember: if a page or section contains the only links to content that you do want placed well in the search engines, don’t exclude that page or section via the robots.txt file.

Sample application: A website has multiple directory paths and URLs to end pages. In the robots.txt file, exclude all but the most important directory path or two. True, PageRank may leak to the first page of every excluded directory path, but you avoid vast PageRank dilution via many differing URL paths to end pages.
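
As a minimal sketch of that sample application, suppose the same end pages are reachable under three hypothetical paths, /products/, /products-by-brand/, and /products-by-price/; the following robots.txt keeps only the most important path open:

User-agent: *
Disallow: /products-by-brand/
Disallow: /products-by-price/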

301 Redirects

  • Source pages are not spidered (are always excluded from the SERPs).
  • Indexing of the source page is replaced by indexing of the destination page.
  • PageRank is passed to destination pages, though if the source page is on an external website, then probably not 100% is passed (I estimate 85% as a rule of thumb).

Only rare and special situations merit using other redirects, like 302, JavaScript, and the deprecated and very risky meta-refresh. These other redirects are beyond the scope of this article. It is unclear whether 301s pass 100% of PageRank even when source and destination are within the same website, but I assume that over 95% is passed (this topic could occupy an entire article). Suffice it to say that 301s are usually the best way to do redirects, and several common situations require them, like site redesigns with URL changes, and domain and sub-domain changes.
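
A minimal .htaccess sketch of both common situations, assuming an Apache server with mod_alias and mod_rewrite enabled (all URLs and domains are placeholders):

# Redirect a single moved page to its new URL
Redirect 301 /old-page.html http://www.example.com/new-page/

# Redirect an entire old domain to a new one
RewriteEngine On
RewriteCond %{HTTP_HOST} ^old-domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.new-domain.com/$1 [R=301,L]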

The Canonical Meta-tag

  • Source pages are spidered and are not always excluded from the SERPs, but of course destination pages get much more preference.
  • Indexing of the source page is usually replaced by indexing of the destination page, though Google does compare content and reserves the right to not honor this directive if, for example, the source page is too different from the destination page.
  • My research reveals that nobody really knows to what extent PageRank is passed from source to destination, and Google does not make this clear either.
  • PageRank is generally consolidated from the source page(s) into the destination page, as with product pages that are identical except for size or color, though a little PageRank may stay with the source pages rather than pass to the destination, certainly if Google disregards this directive.
  • If the source page has enough PageRank, the page’s links are followed by the spiders.
  • Whereas neither search engines nor humans see the source page of a 301 redirect, humans continue to see the source page of a canonical “redirect.”

The canonical meta-tag is sometimes called the “soft 301 redirect”; it works like a 301 but less absolutely. Given that the search engines see and assess both the source and the destination pages, one wonders whether the source page may keep a little PageRank or at least appear in search results when the search pertains to the small part of the source page that is unique relative to its destination page. That is, if pages about a gold-plated pen with various colors of ink have canonical tags pointing to the black ink page, would a search for that pen’s exact name and the phrase “blue ink” produce a top ranking for the blue ink pen, even though the canonical tag points to the black ink pen? I would think so, given all of the above. If so, one gets the dual benefit of consolidating PageRank on the destination or “canonical” page, while still allowing the source page to appear for searches highly specific to it.
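
A minimal sketch of the tag, placed in the <head> of each source page (the ink-color pages in the hypothetical pen example above; the URL is a placeholder):

<link rel="canonical" href="http://www.example.com/pens/gold-plated-pen-black-ink" />

Note that despite the common name “canonical meta-tag,” the directive is implemented as a link element in the page’s head.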

Pagination with rel=“next” and rel=“prev” and View-All Pages

Google prefers that you have a view-all page, while recognizing cases where a view-all page hurts usability, as when too many items slow the page’s download.

  • Where a view-all page does not exist, all previous-next pages are spidered and included in the SERPs, with the first page most likely to appear in the SERPs.
  • Where a view-all page exists, previous-next pages are spidered but are much less likely than the view-all page to appear in the SERPs.
  • Where a view-all page does not exist, all previous-next pages are indexed and can appear in the SERPs for searches relevant to each page, with preference on the first page of the series.
  • Where a view-all page exists, all previous-next pages are indexed, but the view-all page is much more likely to appear in the SERPs.
  • Where a view-all page does not exist, PageRank of previous-next pages is averaged and applied equally to all pages.
  • Where a view-all page exists, PageRank is consolidated from the previous-next pages into the view-all page.

In all cases, all pages’ links for spidering are consolidated as though all pages are one, and PageRank is passed accordingly (if there’s enough initial PageRank to encourage more spidering and indexing).
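
A minimal sketch of the tags in the <head> of page 2 of a hypothetical three-page series (all URLs are placeholders); where a view-all page exists, each component page can also point a canonical tag at it, per Google’s pagination guidance:

<link rel="prev" href="http://www.example.com/widgets?page=1" />
<link rel="next" href="http://www.example.com/widgets?page=3" />
<link rel="canonical" href="http://www.example.com/widgets?view=all" />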

For more about this important topic, begin with http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html.

The noindex Meta-tag

  • Spidering is not blocked, but over time the page should disappear from SERPs, despite anecdotal evidence to the contrary (probably due to timing of implementation relative to indexing).
  • Indexing is blocked of course.
  • Links to the excluded page still pass PageRank to it, so some PageRank is wasted on that excluded page, but most of that PageRank can pass through to the pages the excluded page links to, and adding the “follow” value may help this pass-through.

The above synopsis averages the uncertain and contradictory views among even the best SEO minds. However, opinion is nearly unanimous that this tag is preferable to the robots.txt exclusion in most cases. Google seems to concur: “To entirely prevent a page’s contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index” (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710).
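
A minimal sketch of the tag in a page’s <head>; the “follow” value is the default and is included here only to emphasize the pass-through:

<meta name="robots" content="noindex, follow" />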

X-Robots-Tag HTTP Header Directive

This directive behaves like the noindex tag above. It is best for non-HTML content, like images or PDFs, which of course can’t have the noindex tag put in them. However, it can also be used for HTML pages, and one major advantage is that you can use regular expressions in the .htaccess file to create site-wide exclusion rules efficiently. For example, to exclude all PDF files (assuming Apache with mod_headers enabled):

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

For more about this tag, see https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag

XML Sitemaps

  • Included pages are spidered, and so too are pages not in the sitemap (if CMS-SEO is healthy).
  • Included pages are indexed, and so too are pages not listed in the sitemap.
  • XML sitemaps do not pass PageRank, so if they are the only means of Google discovering your pages, those pages likely will rank poorly. This is why it is wrong to consider XML sitemaps a means by which to avoid having to produce good CMS-SEO.
  • XML sitemaps at best reinforce the deployment of the other PageRank sculpting tools; by themselves they are a very weak tool. For example, be sure to exclude from the XML sitemap the source pages in canonical redirects.

XML sitemaps merely suggest to search engines what pages to index. In large websites that don’t have enough PageRank to convince a search engine to devote the resources to index the entire website, the XML sitemap can guide spidering and indexing to the most important (content-rich, profitable) pages. Many CMSs and ecommerce systems come with an automatic XML sitemap generator which you can click on to activate, but many of these generators don’t obey your robots.txt file and few if any will incorporate your implementation of the other PageRank sculpting tools. In such cases, you have to manually edit your XML sitemap.
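
A minimal sketch of a sitemap entry (the URL and date are placeholders); note that it lists the destination (canonical) URL, not the source URL of a canonical redirect:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/pens/gold-plated-pen-black-ink</loc>
    <lastmod>2012-09-01</lastmod>
  </url>
</urlset>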

The nofollow Meta-tag

  • Pages linked to by a nofollow page or link are not spidered, unless, as is very often the case, the pages are linked to from any other pages or links without the nofollow tag.
  • Pages linked to by a nofollow page or link are indexed, unless the only link to the page – from anywhere on the web, in your website, or in your XML sitemap – is in the page containing your nofollow tag.
  • PageRank is not passed through nofollow pages or links. But again, if any other links without the nofollow tag link to the destination page, that page will receive PageRank from those links.
  • The PageRank of a page with nofollow tags is “leaked” as though the nofollow tag did not exist. That is, the PageRank to be passed on is divided by the number of outbound links, regardless of whether some of the links have nofollow tags. This fact was the bomb Matt Cutts dropped, referred to at the beginning of this article.

The above four bullet points show just how weak this tool is for PageRank sculpting. Generally, this tool is useful only if coordinated with one or more of the other tools.
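
A minimal sketch of the two forms, page-level (in the <head>) and link-level (the URL is a placeholder):

<meta name="robots" content="nofollow" />
<a href="http://www.example.com/login" rel="nofollow">Log in</a>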

Orchestrating PageRank Sculpting Instruments

Choosing and coordinating the best PageRank sculpting tools for your particular website, and avoiding conflicts among the tools, is probably the most demanding and brainy job in SEO. There’s a need for an article or ebook that shows various implementation scenarios and pitfalls. Meanwhile, a good primer on avoiding conflicts is seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts.

The search engines may change some rules or become better at dealing with sites that have weak or faulty PageRank sculpting, but usually such changes will be backward compatible. Therefore, PageRank sculpting, like its parent categories of CMS-SEO and technical SEO, is an investment where you “write once, and profit in perpetuity.”