UK SEO and Internet Marketing Forums
Robots.txt for SMF forum with pretty urls

ash
« on: July 02, 2008, 04:50:32 PM »

Would any of you be so kind as to check the forum robots.txt for any errors or omissions?

I was hoping to find a 'default' robots.txt for an SMF forum running the pretty urls mod but couldn't find one anywhere, so I quickly knocked up my own, trying to remove any duplicate pages and others that we wouldn't need indexed.

Have I missed any URL permutations? Any glaring mistakes, etc.?


daniboy
« Reply #1 on: July 02, 2008, 05:11:29 PM »

User-agent: *
Disallow: /BB/help/
Disallow: /BB/search/
Disallow: /BB/admin/
Disallow: /BB/pm/
Disallow: /BB/logout/
Disallow: /BB/*?*
Disallow: /BB/?action*
Disallow: /BB/*msg*
Disallow: /BB/Themes/


So the /BB/*?* rule disallows the feeds etc.?

I'd add these three:

/BB/activate/
/BB/register/
/BB/login/

and

/BB/profile/ash/  ;D

ash
« Reply #2 on: July 02, 2008, 06:30:22 PM »

Yeah, the wildcard ? rule (/BB/*?*) will stop the RSS feeds, any stray session IDs, etc. from getting crawled.

Thanks for those additions, I'll add them.
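
For anyone wanting to sanity-check that, here is a minimal Python sketch of how a wildcard rule like /BB/*?* gets matched. It assumes Google-style matching ('*' matches any run of characters, a trailing '$' anchors the end, otherwise the rule is a prefix match); other crawlers may treat wildcards differently. The function name rule_matches and the sample URLs are made up for illustration and are not taken from the live board.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: Google-style wildcards ('*' = any characters,
    trailing '$' = end of URL, everything else is a literal prefix).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' become '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


# Hypothetical URLs, for illustration only.
print(rule_matches("/BB/*?*", "/BB/index.php?action=.xml;type=rss"))
print(rule_matches("/BB/*?*", "/BB/some-board/?PHPSESSID=abc123"))
print(rule_matches("/BB/*?*", "/BB/some-board/some-topic/"))
```

The first two calls should report a match because both paths start with /BB/ and contain a ?, while the plain pretty URL in the third call has no ? and is left alone.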

daniboy
« Reply #3 on: July 15, 2008, 01:31:37 PM »

Another one:

/BB?topic

Note that there is no forward slash between BB and the ?.
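
A rough Python sketch of why the extra line is needed, under the same assumption of Google-style '*' wildcards with prefix matching (no '$' handling here). The helper name blocking_rule and the topic number in the URLs are hypothetical.

```python
import re


def blocking_rule(rules, path):
    """Return the first Disallow pattern that matches path, or None."""
    for rule in rules:
        # Each '*' becomes '.*'; the rest of the rule is a literal prefix.
        regex = ".*".join(re.escape(part) for part in rule.split("*"))
        if re.match(regex, path):
            return rule
    return None


# Two of the wildcard rules already in the board's robots.txt.
rules = ["/BB/*?*", "/BB/?action*"]

print(blocking_rule(rules, "/BB/?topic=123.0"))
print(blocking_rule(rules, "/BB?topic=123.0"))
print(blocking_rule(rules + ["/BB?topic"], "/BB?topic=123.0"))
```

Every existing rule starts with /BB/ (with the slash), so a URL beginning /BB? slips past all of them until /BB?topic is added.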

treblesix
« Reply #4 on: July 23, 2008, 12:38:39 PM »

I am sure you have already "killed" the install files and reverted any temporary 777 perms back to normal? I can't remember whether SMF does all that automatically or nags you.

Just double-checking... not giving lessons on sucking eggs  ;D

crankydave
« Reply #5 on: July 28, 2008, 08:12:38 PM »

There are still a lot of issues with the thread URLs. The mod keeps plugging in stray characters. Here's the latest I noticed...

http://www.davidcastle.org/BB/general-search-engine-optimisation-discussion/really-really-good-seo-)/

Dave

daniboy
« Reply #6 on: July 28, 2008, 08:47:48 PM »

We still need to disallow

/BB?topic       - with no forward slash after BB

I saw some print pages indexed as well. "Disallow: /BB/*?*" is in the robots.txt, which I think should solve it, unless they're old caches.

ash
« Reply #7 on: July 28, 2008, 10:30:14 PM »

Quote from: crankydave
    There are still a lot of issues with the thread URLs. The mod keeps plugging in stray characters. Here's the latest I noticed...

    http://www.davidcastle.org/BB/general-search-engine-optimisation-discussion/really-really-good-seo-)/

    Dave

I don't think I can do anything about this. It seems to be a bug with the 'pretty url' mod, and it's beyond my skills to fix it.

treblesix
« Reply #8 on: July 29, 2008, 11:11:54 AM »

Quote from: ash
    Quote from: crankydave
        There are still a lot of issues with the thread URLs. The mod keeps plugging in stray characters. Here's the latest I noticed...

        http://www.davidcastle.org/BB/general-search-engine-optimisation-discussion/really-really-good-seo-)/

        Dave

    I don't think I can do anything about this. It seems to be a bug with the 'pretty url' mod, and it's beyond my skills to fix it.

Is this any help?

"As for SEO stuff, SMF will be indexed fairly well out of the box, but adding some exclusions to robots.txt will help make search engines focus on just the content of the posts. There is some good info at the official community forums on this. Below are the robots.txt entries I use, mostly based on suggestions.


Code:
Disallow: /forum/*.msg*
Disallow: /forum/*sa=showPosts*
Disallow: /forum/*prev_next*
Disallow: /forum/*action=emailuser*
Disallow: /forum/*action=printpage*
Disallow: /forum/*action=recent*
Disallow: /forum/*action=help*
Disallow: /forum/*action=login*
Disallow: /forum/*action=profile*
Disallow: /forum/*action=register*
Disallow: /forum/*action=search*
Disallow: /forum/*action=stats*
Disallow: /forum/*action=unread*
Disallow: /forum/*action=verificationcode*
Disallow: /forum/*action=who*
Disallow: /forum/Themes/

With SMF 2.0, many of these entries provide noindex information in their headers.
__________________

Motoko-chan
Simple Machines Marketing Agent
"

Forgot to add the link to the relevant forum:
http://www.theadminzone.com/forums/forumdisplay.php?s=cf6a6530880da9306bceec1470f144d2&f=145
« Last Edit: July 29, 2008, 12:48:22 PM by treblesix »

ash
« Reply #9 on: August 09, 2008, 12:30:21 PM »

Thanks, treblesix, but those entries are for a forum without the pretty url mod installed.

Webnauts
« Reply #10 on: August 09, 2008, 03:22:54 PM »

I had a look at the board's robots.txt. There are areas exposed that should not be. I spent some hours crawling and analyzing the board and created a new robots.txt, which you only need to copy and paste. You do not need anything more in there. If anyone disagrees, please correct me.

User-agent: Googlebot
Disallow: /BB?topic
Disallow: /BB/*?
Disallow: /BB/*vt
Disallow: /BB/*msg
Disallow: /BB/*/*/*msg
Disallow: /BB/*value=
Disallow: /BB/*javascript
Disallow: /BB/help/
Disallow: /BB/search/
Disallow: /BB/register/
Disallow: /BB/login/
Disallow: /BB/activate/
Disallow: /BB/profile/
Disallow: /BB/stats/
Disallow: /BB/recent/
Disallow: /BB/reminder/
Disallow: /BB/statistics.php
Disallow: /BB/groupcp.php
Disallow: /rss.php

User-agent: Slurp
Crawl-delay: 300
Disallow: /BB?topic
Disallow: /BB/*?
Disallow: /BB/*vt
Disallow: /BB/*msg
Disallow: /BB/*/*/*msg
Disallow: /BB/*value=
Disallow: /BB/*javascript
Disallow: /BB/help/
Disallow: /BB/search/
Disallow: /BB/register/
Disallow: /BB/login/
Disallow: /BB/activate/
Disallow: /BB/profile/
Disallow: /BB/stats/
Disallow: /BB/recent/
Disallow: /BB/reminder/
Disallow: /BB/statistics.php
Disallow: /BB/groupcp.php
Disallow: /rss.php

User-agent: msnbot
Crawl-delay: 120
Disallow: /BB?topic
Disallow: /BB/*?
Disallow: /BB/*vt
Disallow: /BB/*msg
Disallow: /BB/*/*/*msg
Disallow: /BB/*value=
Disallow: /BB/*javascript
Disallow: /BB/help/
Disallow: /BB/search/
Disallow: /BB/register/
Disallow: /BB/login/
Disallow: /BB/activate/
Disallow: /BB/profile/
Disallow: /BB/stats/
Disallow: /BB/recent/
Disallow: /BB/reminder/
Disallow: /BB/statistics.php
Disallow: /BB/groupcp.php
Disallow: /rss.php

User-agent: *
Disallow: /BB/help/
Disallow: /BB/search/
Disallow: /BB/register/
Disallow: /BB/login/
Disallow: /BB/activate/
Disallow: /BB/profile/
Disallow: /BB/stats/
Disallow: /BB/recent/
Disallow: /BB/reminder/
Disallow: /BB/statistics.php
Disallow: /BB/groupcp.php
Disallow: /rss.php
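
Since the same rules are repeated for each bot, the file could also be generated rather than hand-edited, so the groups never drift apart. The sketch below is only an illustration, not the way the file above was produced: the names BOT_ONLY_RULES, COMMON_RULES, BOTS and build_robots_txt are invented here, and the split assumes (as the listing suggests) that the wildcard and /BB?topic rules are meant only for the three named bots.

```python
# Rules that appear only in the named-bot groups of the post above.
BOT_ONLY_RULES = [
    "/BB?topic", "/BB/*?", "/BB/*vt", "/BB/*msg", "/BB/*/*/*msg",
    "/BB/*value=", "/BB/*javascript",
]
# Plain prefix rules that appear in every group, including "User-agent: *".
COMMON_RULES = [
    "/BB/help/", "/BB/search/", "/BB/register/", "/BB/login/",
    "/BB/activate/", "/BB/profile/", "/BB/stats/", "/BB/recent/",
    "/BB/reminder/", "/BB/statistics.php", "/BB/groupcp.php", "/rss.php",
]
# Crawl-delay per bot as posted; None means no Crawl-delay line.
BOTS = {"Googlebot": None, "Slurp": 300, "msnbot": 120}


def build_robots_txt() -> str:
    """Assemble one group per named bot plus a fallback '*' group."""
    groups = []
    for agent, delay in BOTS.items():
        lines = [f"User-agent: {agent}"]
        if delay is not None:
            lines.append(f"Crawl-delay: {delay}")
        lines += [f"Disallow: {path}" for path in BOT_ONLY_RULES + COMMON_RULES]
        groups.append("\n".join(lines))
    fallback = ["User-agent: *"] + [f"Disallow: {path}" for path in COMMON_RULES]
    groups.append("\n".join(fallback))
    return "\n\n".join(groups) + "\n"


print(build_robots_txt())
```

Adding a new path then means touching one list instead of four groups.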

NOTICE: I retain full copyrights to the robots.txt I published above.

This robots.txt may be reproduced on a web site, CD-ROM, e-zine, book, magazine, etc. so long as permission is first received from me, and the first line of the robots.txt after User-agent: Googlebot reads Allow: /BB/profile/Webnauts

If the reproduction is by electronic media, a link back to my web site should be included.

For the moment, you already have my permission. Play fair. :D

« Last Edit: August 12, 2008, 02:57:37 PM by Webnauts »

Webnauts
« Reply #11 on: August 09, 2008, 03:55:48 PM »

Quote from: ash
    Quote from: crankydave
        There are still a lot of issues with the thread URLs. The mod keeps plugging in stray characters. Here's the latest I noticed...

        http://www.davidcastle.org/BB/general-search-engine-optimisation-discussion/really-really-good-seo-)/

        Dave

    I don't think I can do anything about this. It seems to be a bug with the 'pretty url' mod, and it's beyond my skills to fix it.

Ash, it is not beyond your skills to fix it. :D

The problem there is that the OP had this title: "Really, Really Good SEO : )". If you edit the thread title to take that smiley out, the URL will look good. ;)
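
To illustrate the idea only: I have not looked at how the pretty urls mod actually builds its slugs, so the function below is a hypothetical stand-in, not the mod's code. A stricter slug routine that collapses every run of non-alphanumeric characters into a single hyphen would never let the ")" through in the first place.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and turn each run of non-alphanumerics into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any hyphen left at the start or end by leading/trailing punctuation.
    return slug.strip("-")


print(slugify("Really, Really Good SEO : )"))  # really-really-good-seo
```

Until the mod behaves like that, editing the title by hand is the simple workaround.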
« Last Edit: August 09, 2008, 03:57:38 PM by Webnauts »

Webnauts
« Reply #12 on: August 09, 2008, 04:37:32 PM »

Quote from: daniboy
    We still need to disallow

    /BB?topic       - with no forward slash after BB

    I saw some print pages indexed as well. "Disallow: /BB/*?*" is in the robots.txt, which I think should solve it, unless they're old caches.

Have you seen these issues while logged in? If not, can you share some example URLs?

Webnauts
« Reply #13 on: August 09, 2008, 06:22:48 PM »

Some extended SEO advice:

We can boost the relevancy and rankings of the core key terms of the board if we disallow the Break and Rant Rooms in the robots.txt too:

Disallow: /BB/break-room/
Disallow: /BB/the-'rant'-room/

daniboy
« Reply #14 on: August 10, 2008, 09:25:56 PM »

Quote from: Webnauts
    Quote from: daniboy
        We still need to disallow

        /BB?topic       - with no forward slash after BB

        I saw some print pages indexed as well. "Disallow: /BB/*?*" is in the robots.txt, which I think should solve it, unless they're old caches.

    Have you seen these issues while logged in? If not, can you share some example URLs?

I saw the print pages cached in Google, but that was about a month ago. At the time, those (and profiles) seemed to be all that was getting indexed.

/BB?topic was coming up through the Stumble buttons. In essence, StumbleUpon was giving 3 different URLs for the same post, depending on where you reached the Stumble button from.
