Drupal is a great system, but one of its flaws is that the same content can be reached through multiple routes, for example through both its system path (such as /node/1) and its URL alias. To a search engine this looks like duplicate content. Search engines like Google do not like duplicate content, and your site is likely to be penalised for it.
You can control which content search engines index by using a robots.txt file. The robots.txt file lets you disallow certain pages from being crawled and indexed. Used correctly, it can help you avoid a duplicate content penalty.
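As a minimal sketch of what such a file looks like, the rules below tell every crawler to stay out of one directory and one individual page. Both paths are hypothetical placeholders chosen for illustration, not rules recommended in this article.

```text
# Applies to all crawlers
User-agent: *
# Hypothetical paths, for illustration only
Disallow: /private/
Disallow: /duplicate-page.html
```

Each Disallow line matches by prefix, so /private/ covers everything beneath that directory.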
Drupal 5 comes with a robots.txt file by default. If you are running a version prior to Drupal 5, you will need to add your own robots.txt file. It is fairly straightforward. Open a text editor such as Notepad. Add the rules (an example is shown below). Save the file as "robots.txt". Upload the file via FTP to the root folder of your site on your web server, so that it is served at /robots.txt.
Here is the default code that comes with Drupal 5's robots.txt file:
I recommend adding the following:
# Files
Disallow: /rss.xml
If you are using URL aliases to create specific URLs, then add:
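A sketch of what such rules might look like, assuming clean URLs are enabled and that aliased content is otherwise reachable under Drupal's system paths /node/ and /taxonomy/ (the two paths discussed in the comments below). This is an illustration under those assumptions, not necessarily the exact rules originally given here. The lines belong under an existing User-agent line, and only make sense if every page you want indexed has an alias.

```text
# Paths (sketch; assumes all public content has a URL alias)
# Block Drupal's unaliased system paths so only the aliased URLs are crawled
Disallow: /node/
Disallow: /taxonomy/
```

Because Disallow rules match by prefix, /node/ already covers /node/1, /node/2 and so on, so /node/ and /node/* block the same URLs for crawlers that understand wildcards. The * wildcard is an extension to the original robots.txt convention: major crawlers such as Googlebot support it, but some older tools and validators do not.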
I am Blair Wadman and this is where I write about Drupal, PHP, CSS etc
© Blair Wadman 2005 - 2011
Quick question, why /node/* (with a following asterisk), whereas /taxonomy/ (without asterisk)?
My robots.txt skills are a bit rusty so I thought I'd ask.
Thanks!
.:Joshua
Good Question Joshua!
It has been a while and I can't remember why I added the * to node and not to taxonomy. It does seem illogical. I will do a bit of research and see which way is considered the most correct. My feeling at the moment is that it does not matter whether you add an asterisk (it is just a wildcard) or not, but I'll see if I can find out for sure.
When I added /node/* and then checked the file at http://www.sxw.org.uk/computing/robots/check.html (Robots.txt Syntax Checking), I got this report: "Unrecognised field. The field Disallow could not be recognised. Whilst the robots.txt standard allows for expansion by the use of undefined fields, it is likely that this line is a mistake in your file". Could you explain that? Thanks.
fajar
Thanks for sharing this. This is a serious matter that I have only just become aware of. My partner installed Drupal on various sites. The sites had been getting great traffic for a few years, then wham! Google killed them. Now I find that I have thousands of useless pages indexed, which no doubt caused the ban. So this is something that HAS to be done, even if you have been unaffected for years!