Friday 23 August 2013

Using canonical href to exclude duplicates from your xml sitemap

Here's the problem. Scrutiny is finding the same page on your site twice, each with a different url, and including both in your sitemap.
Duplicates in your xml sitemap may not be such a problem according to Google.

However, the same article explains that Google likes to know which version of your url it should rank and which it shouldn't.

The canonical href is the answer. Here is the explanation from Google, but in short, you need to insert a link tag like this into the <head> of the page:


<link rel="canonical" href="http://peacockmedia.co.uk" />


This line means 'this is the url I'd like you to rank for this content'. (The page at the url given should obviously have the same content.)
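To show what "picking up" the canonical href involves for a crawler, here is a minimal Python sketch that reads it out of a page's HTML using only the standard library's `html.parser`. It is not Scrutiny's own code. The class `CanonicalFinder` and the function `find_canonical` are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing tags such as <link ... /> also end up here, because the
        # default handle_startendtag() calls handle_starttag().
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; attribute values can be None
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical href declared in the page, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://peacockmedia.co.uk" /></head></html>'
print(find_canonical(page))  # http://peacockmedia.co.uk
```

A real crawler would also need to resolve a relative href against the page's url and decide what to do if a page declares more than one canonical; this sketch simply takes the first one.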

From version 4.3, Scrutiny picks up this canonical href. You'll find it listed in the SEO table, but the column may be far over to the right, or you may need to switch it on in Preferences > Views. Note that (as with any of the columns in Scrutiny) if you're interested in this column, you can drag it to a more convenient position.

You can see in the screenshot above that the index page in this example now has the canonical href. After re-crawling the site, the problem described at the start of this article has gone away: Scrutiny's Sitemap tab now excludes any page whose canonical href (if present) doesn't match the page's own url. When I export my XML sitemap, only the http://peacockmedia.co.uk version is included.
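The exclusion rule described above can be sketched in a few lines of Python. This is not Scrutiny's implementation; it assumes a crawl has already produced a mapping from each page's url to its canonical href (or None when the page declares none). The helper names, the rule that an empty path counts as "/", and the index.html duplicate in the example data are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from urllib.parse import urljoin, urlsplit, urlunsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def normalise(url: str) -> str:
    # Assumption: treat "http://example.com" and "http://example.com/" as the same page
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path or "/"))


def include_in_sitemap(url: str, canonical: Optional[str]) -> bool:
    """Keep a page unless it declares a canonical href that points somewhere else."""
    if canonical is None:
        return True
    # A canonical href may be relative, so resolve it against the page's own url
    return normalise(urljoin(url, canonical)) == normalise(url)


def build_sitemap(pages: Dict[str, Optional[str]]) -> str:
    """Write a minimal XML sitemap containing only the pages that pass the check."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, canonical in pages.items():
        if include_in_sitemap(url, canonical):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl result: the same home page found at two urls
pages = {
    "http://peacockmedia.co.uk": "http://peacockmedia.co.uk",
    "http://peacockmedia.co.uk/index.html": "http://peacockmedia.co.uk",
}
print(build_sitemap(pages))  # only the http://peacockmedia.co.uk entry is written
```

Pages without a canonical href are kept, which matches the "(if present)" condition: the tag only ever removes pages from the sitemap, it never adds them.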

Note that Scrutiny will exclude pages according to canonical href in version 4.3.1 and higher.
