Search Engine sitemaps
Sitemaps (also known as Google sitemaps) provide a way of informing search engines which pages exist within your site. This helps them determine which pages should be indexed, rather than relying solely on crawling through your site to find pages themselves. At the time of writing, only three of the major search engines supported the sitemap protocol.
Sitemap format
The sitemap contains an XML section for each page that is present within the website. Each section takes the following format:
<url>
<loc>http://www.website.co.uk/a_page.html</loc>
<lastmod>2008-07-25T10:02:16+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>1</priority>
</url>
loc - This defines the page location (the full URL of the page).
lastmod - This defines when the page was last updated.
changefreq - This defines how often you expect the page to be modified or updated.
priority - This ranks the importance of the page relative to the other pages on your site. Search engines treat it as advice rather than an explicit instruction. Values range from 0.0 through to 1.0, with 1.0 being the most important.
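As a rough illustration of the field rules above, the following Python sketch checks a single entry before it is written to a sitemap. The function name check_entry and the dictionary layout are assumptions made for this example; the list of changefreq values is the one defined by the sitemap protocol.

```python
# Values the sitemap protocol allows for <changefreq>.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_entry(entry):
    """Return a list of problems found in one sitemap entry (empty if none).

    `entry` is a hypothetical dict such as
    {"loc": "...", "changefreq": "daily", "priority": 0.6}.
    """
    problems = []
    # loc must be a full URL, not a relative path.
    if not entry.get("loc", "").startswith(("http://", "https://")):
        problems.append("loc should be a full URL")
    # changefreq is optional, but must be a known value when present.
    freq = entry.get("changefreq")
    if freq is not None and freq not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {freq}")
    # priority is optional, but must lie between 0.0 and 1.0 when present.
    priority = entry.get("priority")
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append(f"priority out of range: {priority}")
    return problems


print(check_entry({"loc": "http://www.website.co.uk/a_page.html",
                   "changefreq": "daily", "priority": 1}))
```

An entry that passes returns an empty list, so a generator script could simply skip or report any entry for which the list is not empty.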
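It is important to ensure that all of the <url> sections are surrounded by a set of urlset tags. The opening urlset tag should also declare the sitemap protocol namespace, for example:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">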
<url>
<loc>http://www.website.co.uk/a_page.html</loc>
<lastmod>2008-07-25T10:02:16+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.6</priority>
</url>
<url>
<loc>http://www.website.co.uk/another_page.html</loc>
<lastmod>2008-07-25T10:02:16+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
</urlset>
For a full example, please feel free to look at Guidetotheweb.co.uk's search engine sitemap.
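To see how the urlset structure is read by a program, here is a minimal Python sketch that parses a sitemap and lists its pages. It assumes the sitemap declares the standard sitemap protocol namespace on the urlset tag; the function name list_pages and the SAMPLE text are made up for this example. A page without a priority is given 0.5, which is the protocol's default.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol; "sm" is just a local prefix.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap text, modelled on the example above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.website.co.uk/a_page.html</loc>
    <priority>0.6</priority>
  </url>
  <url>
    <loc>http://www.website.co.uk/another_page.html</loc>
  </url>
</urlset>"""


def list_pages(sitemap_xml):
    """Return (loc, priority) pairs for every <url> section in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    pages = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        # Fall back to 0.5 when the page does not state a priority.
        priority = url.findtext("sm:priority", default="0.5", namespaces=NS)
        pages.append((loc, float(priority)))
    return pages


for loc, priority in list_pages(SAMPLE):
    print(priority, loc)
```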
Generating Sitemaps
There are a number of ways to generate a sitemap for your website:
Hand Edited - You can create the file by hand using Notepad or another text editor. This lets you ensure that the file contains everything you wish, but it will prove to be time-consuming.
Google's Sitemap Generator - Google provides a tool that you can download and install on your server to generate the sitemap files. However, it is quite a technical program: you need to make changes to some configuration files and then run the script. You will also need shell access to the server, and the server must have Python installed. It is not the easiest method for non-technical people.
Other automated tools - There are plenty of other tools available, ranging from separate websites offering this functionality (for example, http://www.xml-sitemaps.com) through to plugins for specific CMSs or blogging tools (such as WordPress).
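If you would like to see roughly what an automated tool does, the Python sketch below builds a sitemap from a folder of static .html files. It is only a sketch under simple assumptions: every page is a file on disk, the file's modification time is a good enough value for lastmod, and the names sitemap_from_directory, root_dir and base_url are invented for this example. It is not how Google's generator or any particular plugin works.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_from_directory(root_dir, base_url):
    """Build sitemap XML (as a string) for every .html file under root_dir."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for folder, _subdirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue  # only list actual pages
            path = os.path.join(folder, name)
            # Turn the file path into a URL path relative to the site root.
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            # Use the file's last-modified time (in UTC) for <lastmod>.
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + rel
            ET.SubElement(url, "lastmod").text = modified.isoformat(timespec="seconds")
    return ET.tostring(urlset, encoding="unicode")
```

For example, sitemap_from_directory("public_html", "http://www.website.co.uk") would return the XML text, which you could then save as sitemap.xml in the root of your site.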
Notifying the Search Engines
Once you have generated your sitemap, you will need to inform the search engines that it exists and where it lives. There are a number of ways of achieving this. You could visit each search engine and explicitly tell it where the file is located. However, there is a universal method which most search engines support: adding a line to your robots.txt file, so that search engines are told the location of your sitemap whenever they visit.
The robots.txt file will need to have the following line inserted into it:
Sitemap: http://www.yourwebsite.co.uk/sitemap.xml.gz
Make sure you leave a blank line both before and after the entry. You can reference either the standard sitemap.xml file or the gzipped version (sitemap.xml.gz). You only really need the sitemap.xml.gz file if your site is very big, with a lot of pages.
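The two steps above (compressing the sitemap and adding the line to robots.txt) can be sketched in Python as follows. This is a minimal example rather than a finished tool: the function name publish_sitemap and its parameters are assumptions, and it expects to be run with write access to the folder that holds your robots.txt file.

```python
import gzip
import os


def publish_sitemap(sitemap_xml, site_root, sitemap_url):
    """Write sitemap.xml.gz into site_root and reference it from robots.txt."""
    # Save the gzipped version of the sitemap.
    gz_path = os.path.join(site_root, "sitemap.xml.gz")
    with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
        handle.write(sitemap_xml)

    # Read the current robots.txt, if there is one.
    robots_path = os.path.join(site_root, "robots.txt")
    existing = ""
    if os.path.exists(robots_path):
        with open(robots_path, encoding="utf-8") as handle:
            existing = handle.read()

    entry = f"Sitemap: {sitemap_url}"
    # Only add the line if it is not already there.
    if entry not in existing.splitlines():
        # Keep existing rules and leave a blank line before and after the entry.
        prefix = (existing.rstrip("\n") + "\n\n") if existing.strip() else ""
        with open(robots_path, "w", encoding="utf-8") as handle:
            handle.write(prefix + entry + "\n\n")
```

Calling it a second time with the same sitemap_url leaves robots.txt unchanged, so it is safe to run each time the sitemap is regenerated.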