Site Maps: Frequently Asked Questions

Author: admin Posted on February 2nd, 2008

What is a site map?

A site map is a schematic representation of a web site. It shows the home page, zone pages and other constituent parts of the site, along with the relationships between them, specifically the links between pages.

Why are site maps so important?

For human beings (site visitors), a site map makes it possible to navigate to a specific zone or page quickly. Instead of having to back out to the home page or use the navigation bar, visitors can click on the site map link and see every page listed with a link to each one.

The second reason site maps are so important is that search engine spiders use site maps as an index. Spiders follow links, and a site map is a series of links. A well-designed, properly formatted site map helps ensure that your site is completely and accurately indexed by search engine spiders. A site map also indicates the scope of your site, including drill-down screens that a spider might miss the first or second time it crawls the site.

What is Pluginlab Site Maps?

Pluginlab Site Maps is a script generator that allows designers and webmasters to create a simple, schematic image of the site's site map. The generator then formats the site map so that it complies with W3C XHTML and CSS standards.

Pluginlab Site Maps will also develop site maps for submission to the major search engines, each of which requires a different site map format for accurate indexing. Google uses XML file formats, Yahoo uses HTML, and other engines use ROR format.

Now, this doesn't mean that you can't submit the same site map to every search engine. However, unless the site map is properly formatted for that specific search engine's guidelines, you're likely to run into some difficulties. At the very least, your site will be assessed and classified differently by different search engines, based on the format each one employs.

A site map generator takes the guesswork out of web site submission to search engines, making it a particularly useful tool if you've never created or submitted a single site map, much less the few dozen needed to map out the terrain on your various sites.

If you don't know how to develop a site map in XML format and in compliance with W3C standards, Pluginlab Site Maps will save you hours of time and endless frustration. Highly recommended for inexperienced and experienced webmasters alike.

Who uses Pluginlab Site Maps?

Busy site designers and SEO professionals, among others. Developing a site map with a sitemap generator greatly cuts down on hand-coding time, which is expensive.

At the other end of the spectrum, new site owners who recognize the value of site map submission but don't understand W3C open standards also employ site map generators to ensure that every site map is properly formatted, fully compliant with web-wide protocols and optimized for that specific search engine based on various search engine site map criteria.

What happens if I submit a site map that isn't formatted for that specific search engine?

Chances are, nothing. Most likely the search engine will ignore the submission and ship it off to the trash bin as unreadable. In this case, you'll receive a notice that there were problems with your recent submission and, in most cases, the search engine will provide a diagnosis of how to solve the problem.

It could be worse. Your site could be partially indexed or mis-indexed, two serious problems. A partially indexed site won't be given the same level of relevance in the SERPs and, because only some of the site's pages have been indexed, the search engine may not know what your site is about.

This, among other things, can lead to a site being mis-indexed, that is, placed in the wrong category. Each search engine employs a proprietary taxonomy, a system for sorting large numbers of things such as websites.

If your sitemap submission is mis-indexed or bounced back to you with an error message, you'll have to request a new review of your site with a correctly formatted, optimized site map. The problem is, if you didn't get it right the first time, how can you be sure you'll get it right the second time?

Each day you remain un-indexed is a day of business lost.

Will search engines accept automated site map submissions?

Yes. Search engines do reject some software-driven submissions: individual URLs, for example, have to be submitted by a human, and automated URL submission is not allowed.

However, search engines recognize that the submission of a site map is an "invitation" to spider, making the search engine's job easier.

How specific do I have to be when identifying a URL in a sitemap?

The greater the specificity, the more likely the sitemap will be indexed. Give the full URL, including the protocol, as follows:

http://www.yourwebsitename.com/

Note that routine protocols are followed with the addition of the http (or https) identifier. The trailing slash is required by some web servers, especially those using shared hosting wherein over a thousand websites may be contained on a single hardware server.
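As a rough illustration of the rule above, the following Python sketch checks that a site address carries the http or https identifier and adds the trailing slash when no path is given. The function name normalize_site_url is an assumption made for this example and is not part of any search engine's tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_site_url(url):
    """Return the URL with an explicit protocol and a trailing slash on the root."""
    parts = urlsplit(url)
    # Reject addresses that lack the http/https identifier or a host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    # An empty path means the bare host was given, so add the trailing slash.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(normalize_site_url("http://www.yourwebsitename.com"))
# prints http://www.yourwebsitename.com/
```

An address written without the protocol, such as www.yourwebsitename.com, raises a ValueError in this sketch rather than being guessed at.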

Should I also list the https protocol if my site has a secure zone or checkout?

No, you should only submit one http address per site or zone. If you submit more than one address, e.g. http and https, spiders may not be able to crawl your entire site. Only some of the pages of your site will be indexed in this case.

How does the positioning of a URL in a sitemap affect its use?

Search engines use a number of criteria in assessing the importance of site information including positioning, the assumption being that the more prominent the positioning the more valuable the information.

To an extent this is true. <h1> headings are still assigned more "value" than <h6> headings simply because they're bigger. However, search engines don't assign value to a URL within a site map based on where it appears.

Where does the sitemap go in my HTML code?

The site map is a separate file rather than part of your HTML code. It should be placed in the root directory of your host server, assuming you have root access. So, the site map would appear as http://yoursitename.com/sitemap.xml.
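For readers who have never seen the contents of a sitemap.xml file, here is a minimal Python sketch that builds one with the standard library. It assumes the XML format defined by the sitemaps.org protocol; the function name build_sitemap and the about.html page are made up for illustration, and this is not how Pluginlab Site Maps itself generates its output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document, as a string, listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> holds the page address
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://yoursitename.com/",
    "http://yoursitename.com/about.html",  # hypothetical page
])
print(sitemap_xml)
```

The resulting text would be saved as sitemap.xml and uploaded to the root directory described above.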

You can also place different sitemaps for different paths within the URL. For example, some parts of a site may be restricted to certain viewers for security reasons. In this case, you would upload a site map using an address such as http://yoursitename.com/path. You can develop site maps for various zones of a large site to better equip spiders to crawl the entire site.

It's important to note that all URLs within a site map must be on the same host as the site map itself, rather than on a sub-domain. For example, if your site map is located at http://yoursitename.com/sitemap.xml, the site map cannot contain URLs and other metadata from http://subdomain.yoursitename.com. In other words, to be read successfully by a search engine spider, every URL listed must belong to the host where the site map resides. A sub-domain requires its own separate site map, if one is appropriate at all, which isn't always the case.
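The host rule can be checked before submission. The sketch below is one possible approach, using a hypothetical helper named same_host that compares the protocol and host name of a page URL against those of the site map.

```python
from urllib.parse import urlsplit


def same_host(sitemap_url, page_url):
    """Return True when page_url uses the same protocol and host as the site map."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # hostname is lower-cased by urlsplit, so the comparison ignores case.
    return (sitemap.scheme, sitemap.hostname) == (page.scheme, page.hostname)


SITEMAP = "http://yoursitename.com/sitemap.xml"
print(same_host(SITEMAP, "http://yoursitename.com/products/page.html"))
# prints True
print(same_host(SITEMAP, "http://subdomain.yoursitename.com/page.html"))
# prints False
```

URLs that fail the check belong in a separate site map hosted on the sub-domain.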

Is there a size limit to a site map?

There is. A sitemap should be smaller than 10MB and shouldn't contain more than 50,000 URLs. These limits keep servers from being clogged by very large files.

You can build larger site maps, but they must be constructed in pieces that each follow the limits set above. All of your site maps can then be listed in a single XML file, known as a sitemap index, for easier access by SE spiders.
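Splitting a large URL list into pieces can be sketched in a few lines of Python. The example assumes the sitemaps.org convention in which the single file that lists the pieces is a "sitemap index"; the function names and the sitemap1.xml style file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit given above


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into pieces of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return an index document that points at each individual site map."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


# 120,000 made-up pages are split into three pieces.
pages = [f"http://yoursitename.com/page{i}.html" for i in range(120_000)]
chunks = chunk_urls(pages)
index_xml = build_sitemap_index(
    [f"http://yoursitename.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks), [len(chunk) for chunk in chunks])
# prints 3 [50000, 50000, 20000]
```

This sketch only enforces the URL count; the 10MB size limit would need a separate check on each generated file.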

Should I zip sitemaps before submission?

Yes, you should. Compression (the sitemap protocol supports gzip) makes transmission easier, and the compressed file will still be fully indexed.
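Compression can be done with Python's standard gzip module. This is a small sketch that assumes the gzip format, which the sitemaps.org protocol supports; the function name compress_sitemap and the sample document are illustrative only.

```python
import gzip


def compress_sitemap(xml_text):
    """Return the gzip-compressed bytes of a sitemap document."""
    return gzip.compress(xml_text.encode("utf-8"))


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://yoursitename.com/</loc></url>"
    "</urlset>"
)
compressed = compress_sitemap(sample)

# Decompressing gives back exactly the original text.
assert gzip.decompress(compressed).decode("utf-8") == sample
```

The compressed bytes would typically be written to a file such as sitemap.xml.gz before upload.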

How do I submit new pages to a search engine when my site contains millions of URLs?

The easiest way to address this common problem is to separate the parts of your site that change frequently from the parts that are static, and to give each group its own site map. Maps for the static parts can be placed in the XML site map file and left alone. In this way, spiders will crawl the parts that change frequently (as often as you like) while skipping over the static parts that don't. However, when you do change content or redesign a static section, update its lastmod tag to ensure the changes are picked up and indexed.
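The lastmod tag mentioned above is an optional element of each URL entry in the sitemaps.org XML format. The following Python sketch shows one way an entry carrying a change date might be produced; the helper name url_entry is an assumption for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date


def url_entry(loc, last_modified):
    """Build a <url> element with its address and last-modified date."""
    entry = ET.Element("url")
    ET.SubElement(entry, "loc").text = loc
    # lastmod is written as an ISO date in the form YYYY-MM-DD.
    ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return entry


entry = url_entry("http://yoursitename.com/", date(2008, 2, 2))
print(ET.tostring(entry, encoding="unicode"))
# prints <url><loc>http://yoursitename.com/</loc><lastmod>2008-02-02</lastmod></url>
```

Regenerating the entry with a newer date after a redesign is what signals to a spider that the page is worth revisiting.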