CMS & Database SEO Guide: Part 2: Site-wide URL and Code Issues
by Rob Laporte
Visibility Magazine – March 2008

CMS (content management system) & database SEO entails programming your CMS to pull keywords and phrases from your product or information database into all web-page parts relevant to SEO, except for body copy. The first installment of this two-part article, published in the December edition of Visibility, situated CMS SEO within the steps of a complete SEO job, discussed database and site design issues, and explained per-page rules of CMS SEO. This second installment conveys an intellectual attitude helpful in solving CMS SEO problems, and it addresses such site-wide CMS SEO issues as Ajax, IFrames, breadcrumb trails, product sorting systems, and much more.

The December installment concluded: “Programming your CMS to pull keywords into your pages may require a week of skilled labor, but you end up with an intelligent SEO machine that works for you in perpetuity. True, the resulting SEO is not as good as careful manual work – which should be done in addition – but the ROI escalates over months and years because your SEO costs in this area drop to zero.”

At the risk of sounding self-serving, I suggest that, unless you have a long history of research, practice, measurement, and reflection, you should consult with an experienced SEO professional when dealing with site-wide CMS SEO, though this article will resolve many common problems for you. One or two wrong decisions in this area can do serious damage or prevent serious profit.

How to Think about CMS SEO

Page-level CMS SEO can be summarized with brief rules of thumb, but site-wide code and platform issues in SEO are so varied and often so complex and contingent on context that we should begin by establishing a general perspective that will help you to think your way towards solutions to unique and future problems. In all of your SEO work, imagine that the search engines are asking you to be a good librarian, and are providing database rules for classification and taxonomy. Your job is to play by those rules. Deviation from the rules may work but at the cost of increased risk. Moreover, black-hat tactics usually take as much time and/or money to implement as do the more enduring, low-risk, white-hat tactics. In any one case, shady tactics may be countervailed by legitimate tactics, thus creating the illusion that the shady tactics worked, when in fact they hurt a little, and may suddenly hurt a lot more in the future. If you’re a good “librarian” who uses legitimate SEO conservatively, your SEO investment will survive the likes of Google dances and pull in years of handsome returns.

For example, I’ve seen successful use of the <noframes> tag for stuffing links to spider food pages on a site without frames, but why risk it when you can accomplish the same ends using legitimate tactics with little or no additional cost? A DHTML link, when clicked, can instantly reveal more good text below the link, which is a useful design tactic, but since this tactic breaks the rule of making all text visible to both search engines and people, think twice about using it. This DHTML example illustrates another general principle of SEO that has existed since 1995: sometimes you have to compromise design to obey rules that emerged rather arbitrarily from the technical constraints of search engines. One could insist that a design tactic, like a Flash interface, should be allowed, or one could compromise a little by using an SEO-friendly alternative that earns more profit in the end. Having said all this, my firm recently did use DHTML to reveal more text upon a user’s click because it really was the best usability solution for that page. If that client’s SEO results prove questionable, we may remove the DHTML – which means that there is a present cost, in the form of a risk of future cost, but I deemed the present cost worth assuming in this one case. The golden rule of SEO is: Do anything that might help, and nothing that might hurt, within the practical constraints of good design. If your site has good but not great SEO design or has some SEO-unfriendly code, it may be impractical – that is, too expensive or untimely — to change now. Again, I advise consulting with a qualified professional, sometimes even if you are a qualified professional, to help you decide what SEO risks are worth assuming in the interest of good design.

Are XML Sitemaps a Magic Bullet?

No, they aren’t. They are a must but are not sufficient. They can be buggy because of the search engines or your XML programmer, and protocols change. Unlike proactive CMS SEO, they don’t boost the rank of pages already in an index, or at least not by much. They do not prevent damage from many CMS SEO errors.

See http://www.SiteMaps.org, http://www.google.com/webmasters/, and http://help.live.com.
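
For reference, a bare-bones XML Sitemap following the protocol documented at SiteMaps.org looks like the sketch below; the URL and dates are placeholders, not taken from any real site.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/bedroom-lighting/brass-lamps.php</loc>
        <lastmod>2008-01-15</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

Ideally your CMS regenerates this file whenever products are added or removed, rather than relying on someone to update it by hand.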

A Quick List of No-Nos

  • Don’t neglect robots.txt protocols (a minimal example appears after this list). See http://www.robotstxt.org and http://en.wikipedia.org/wiki/Robots.txt.
  • In the <head>, don’t use the “revisit-after” meta tag or similar spider-directed tags, unless you want to raise a little flag that tells the search engines, “hey, I don’t know what I’m doing, and here and maybe elsewhere I’m trying to manipulate you deviously.” Follow the rules instead (the legitimate robots meta tag is shown after this list): http://www.robotstxt.org and http://en.wikipedia.org/wiki/Robots.txt.
  • Frames: Just say no. They are counter to both good usability and good SEO, and they rarely if ever offer advantages not available with better design tactics. Since you should not use frames, you should not use the <noframes> tag under any circumstance.
  • IFrames can be useful in design, but with regard to SEO be advised that the engines ascribe the content of the IFrame and any links within that content to the external page being invoked, not to the site employing the IFrame.
  • Session IDs: Fuggetaboutit. They assign unique URLs to each session, creating an infinite number of “pages” for the spiders to either choke on or deem duplicate. A surprising number of sites still use this obsolete method to maintain and track user sessions; most sites use, and should use, cookies instead. One recent reliable study posited that 1% of browsers block first-party cookies, while controversy surrounds reports of higher percentages.
  • Google explains: “Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.”
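
By contrast, the legitimate directives are simple. A minimal sketch, assuming a hypothetical /checkout/ folder you want spiders to skip and a single page you do not want indexed:

    # robots.txt, placed at the site root
    User-agent: *
    Disallow: /checkout/

    <!-- in the <head> of an individual page you do not want indexed -->
    <meta name="robots" content="noindex, follow">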

SEO’d URLs

The first consideration in SEO’d URLs is the naming of files and folders. File and folder names provide further opportunities to bolster the SEO keywording in your site and in links to your inner pages. Files include HTML pages, images, videos, and any other object embedded in or linked from a page. The file naming rules apply to small, static sites as well as large, database-driven sites. File names should have up to three words, with the keywords separated by dashes, not underscores. The same goes for folder or directory names, though try to limit folder names to one or two words. Evidence has emerged in the professional forums that Google may assign a spam flag (which is less than a penalty but could become one) to long, multi-worded folder and page names. The phrases you pick should always pertain to the file or folder, while also adhering to an organizational scheme that facilitates site management. Ideally, your CMS automatically assigns product names to product images, with dashes instead of spaces between descriptive keywords. On a scale of 1 to 3, with 1 the most important, the importance of renaming files and folders is about 2.
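
To illustrate the automatic image naming just described, a CMS could derive file names from product names with a small routine along these lines; this JavaScript sketch and the sample product name are hypothetical, not code from any particular CMS.

    // Turn a product name into a short, dash-separated file name.
    function slugify(productName) {
      return productName
        .toLowerCase()
        .replace(/[^a-z0-9\s-]/g, '')  // drop punctuation
        .replace(/\s+/g, '-')          // spaces become dashes
        .split('-')
        .slice(0, 3)                   // keep roughly three words
        .join('-');
    }

    slugify("Children's Brass Table Lamp");  // "childrens-brass-table"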

Renaming URLs may require the mod_rewrite module for Linux/Apache or the likes of ISAPI Rewrite for Windows/IIS. Many CMSs produce long parameter strings in the URL, usually after the “?” character, and though the search engines have gotten better at indexing two, maybe three, parameters, you must rename such URLs to contain what looks like a directory path with keywords (for example, DomainName.com/bedroom-lighting/children/brass-lamps.php). This renaming not only helps SEO directly, but also tends to make incoming links to inner pages contain keywords, which in turn makes PageRank boost SEO further. Moreover, such renaming can help some intra-site search modules return more accurate results.
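
A minimal sketch of this kind of mapping for Linux/Apache, written as mod_rewrite rules in an .htaccess file and handling paths two directory levels deep; the script name products.php and the parameters cat and sub are hypothetical stand-ins for whatever your CMS actually uses.

    # Serve /bedroom-lighting/brass-lamps.php from the CMS's real,
    # parameterized script, e.g. /products.php?cat=bedroom-lighting&sub=brass-lamps
    RewriteEngine On
    RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)\.php$ /products.php?cat=$1&sub=$2 [L,QSA]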

If it is impractical to do this substantial CMS programming now, at least rein in the parameter count: a programming guru at the Search Engine Strategies conference series has been saying for years that there is never a need, no matter how involved the database queries, for more than two URL parameters. In the meantime, SEO is also helped by pulling certain pages out of a recalcitrant CMS and making them static. These pages should be infrequently changed and keyword-rich, like the About Us page, general information or how-to pages, and perhaps some key category pages.

Your URL structure should be as consistent and stable as possible, so that incoming links remain valid. If the URLs do change while remaining spiderable, the search engines will eventually pick up the new URLs, but there will be a delay of several days to weeks, during which organic search referrals drop and searchers who click through from the engines land on 404 pages.
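
When a URL must change anyway, the usual way to keep incoming links working is a permanent (301) redirect from the old address to the new one; a minimal Apache sketch, with hypothetical old and new paths:

    # In .htaccess: permanently redirect the old URL to its replacement,
    # so existing links and bookmarks still resolve.
    Redirect 301 /old-lamps.html /bedroom-lighting/brass-lamps.php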

Over the years, SEO pros have debated the value of a flat directory structure. Without getting into the many pros and cons, it’s safe to say that you shouldn’t go to either extreme. That is, don’t have all pages on the root after DomainName.com/, and don’t have directories buried more than, say, four folders deep.

Your JavaScript Brew: Ajax, “Degradable Ajax,” Hijax, and an Easy Way Out

The first simple rule is to offload your JavaScript to an external .js file on the server. This has benefits for both code maintenance and SEO.
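
For example, instead of pasting the same script into the head of every page, reference one external file (the file name is hypothetical):

    <!-- Before: script repeated inline in each page's <head> -->
    <script type="text/javascript">
      /* menu, tracking, and form-validation code */
    </script>

    <!-- After: one external, cacheable file shared by all pages -->
    <script type="text/javascript" src="/scripts/site.js"></script>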

The topic of JavaScript quickly becomes detailed and technical. In general, search engines don’t execute JavaScript, so navigation menus that depend on JavaScript (e.g., for menu roll-outs) won’t be spidered and won’t circulate and amplify PageRank. There have long been CSS solutions to this problem. Space precludes detailing the code, but the tactic involves “externalizing” the hrefs, as is well described here: www.textlinkbrokers.com/blog/more/A180_0_1_0_M/.
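
As a rough sketch of the general idea (not the specific code in the article linked above): the menu is a plain list of ordinary hrefs that spiders can follow, and the roll-out behavior lives entirely in the stylesheet. Class names and URLs here are hypothetical.

    <ul class="nav">
      <li>
        <a href="/bedroom-lighting/">Bedroom Lighting</a>
        <ul class="submenu">
          <li><a href="/bedroom-lighting/brass-lamps.php">Brass Lamps</a></li>
        </ul>
      </li>
    </ul>

    /* In the external stylesheet: hide each submenu until its parent is hovered. */
    .nav .submenu { display: none; }
    .nav li:hover .submenu { display: block; }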

Ajax (see Wikipedia.com for a definition) is generally inimical to SEO. As with Flash, make sure that plenty of your site, and all of your navigation, does not require the technology. A good primer that explains how to do this, and how to use what some call “Hijax” to produce “degradable Ajax” that does less harm to SEO, is found here: www.softwaredeveloper.com/features/google-ajax-play-nice-061907.
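
The core of the Hijax idea can be shown in a few lines: every link carries a real href that spiders, and visitors without JavaScript, can follow, while the script intercepts the click and loads the same content with Ajax. The function name, URL, and results element below are hypothetical.

    <a href="/brass-lamps.php?page=2"
       onclick="loadPage(this.href); return false;">Next page</a>

    <script type="text/javascript">
      // Fetch the linked page in the background and swap it into the
      // current page; assumes an element with id="results" exists.
      function loadPage(url) {
        var xhr = new XMLHttpRequest();
        xhr.onreadystatechange = function () {
          if (xhr.readyState === 4 && xhr.status === 200) {
            document.getElementById('results').innerHTML = xhr.responseText;
          }
        };
        xhr.open('GET', url, true);
        xhr.send(null);
      }
    </script>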

Now for an easier way out. In the navigation menus of larger sites, you can unite excellent usability and SEO in several ways that do not require special JavaScript, CSS, and Ajax/Hijax coding. You can make a category page’s submenus appear only when a viewer is in that category. Examples of this solution are found at http://www.StacksAndStacks.com, http://www.Gene.com, and http://www.DickBlick.com. These solutions, especially DickBlick’s, not only avoid JavaScript roll-out menus that impede SEO, but they also boost SEO and circulate PageRank.

Finally, use an old-fashioned, non-XML sitemap that viewers can use, and, where fitting, a text-link menu to main categories on the bottom of pages. These solutions are not sufficient for large sites, but they help, and usually they are sufficient for smaller sites.

DHTML – Don’t Hide Text; Make Legitimate

The section immediately above and the first section, “How to Think about CMS SEO,” address this topic. (DHTML uses JavaScript and CSS.)

Leave One Breadcrumb Trail for Search Engines

Breadcrumb trails, which greatly improve the usability of larger sites, should use href links and not cause the search engines to index more than one URL path to any page. The one-URL-path rule is a lot easier said than done. Implementing this rule with precision is essential to prevent the search engines from indexing duplicate pages, penalizing duplicate content, or simply giving up on indexing because multiple breadcrumb paths seem to exponentiate the number of actual pages. In the section below I touch on solutions that apply here too.
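
The markup side of a breadcrumb trail is simple; the harder, essential part is that each link always points at one canonical URL for its category, no matter which route the visitor took to arrive. Paths here are hypothetical.

    <div class="breadcrumb">
      <a href="/">Home</a> &gt;
      <a href="/bedroom-lighting/">Bedroom Lighting</a> &gt;
      <a href="/bedroom-lighting/brass-lamps.php">Brass Lamps</a>
    </div>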

Straiten Salamandrine Sorting Systems

Sites that present a large selection of products with multiple styles, sizes, or other attributes can greatly enhance usability and findability via sorting functions that winnow down the list of products according to the order that a user clicks on the attributes. For example, in a landscape lighting category a user can click a price range, then a finish, then a brand. And the user can sort by those three attributes in a different order of clicks. As with the breadcrumb trail, this system can create an almost limitless number of “pages” which, though not duplicate, can exhaust the spiders long before all the product pages are indexed. (A “product” could also be videos, articles, or anything in a large, sortable database.) This problem can be amplified by << Previous 1 – 2 – 3 – n Next >> menus within a sorted page. This is a problem for which there is no brief explanation of a solution. Suffice it to say that you need to apply a masterful combination of robots.txt, noindex and nofollow meta-tags, and XML Sitemap feeds.
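
Without attempting the full solution here, a bare-bones sketch of the combination just named, assuming the sorted views hang off a hypothetical sort parameter: keep spiders out of the endlessly re-sortable views, mark any crawled-but-unwanted views noindex, and feed the canonical product URLs to the engines through the XML Sitemap.

    # robots.txt: the * wildcard is an extension honored by the major
    # engines, not part of the core robots.txt standard
    User-agent: *
    Disallow: /*?sort=

    <!-- on a sorted or paginated view you allow to be crawled but not indexed -->
    <meta name="robots" content="noindex, follow">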

CMS SEO means programming once and profiting many, many times.