47 Comments
- Ninjamonk, on 10/12/2007, -2/+13 Now if they'd only agree on a standard for widgets/gadgets, the world would be a truly greater place :p
- DenTPuzz, on 10/12/2007, -6/+17 I'll give them about 6 months to try to 'extend' the standard to fit their products.
- Sonic_Molson, on 10/12/2007, -1/+11 http://www.sitemaps.org/
- cuzican, on 10/12/2007, -0/+9 Glad to see this happening from both a standards perspective and a legal perspective, since site owners often complain about what is being indexed and cached by the likes of Google.
- dwight0, on 10/12/2007, -0/+7 Too bad their sample won't work; it's missing a closing tag (</url>).
http://www.sitemaps.org/protocol.html
- mousy, on 10/12/2007, -1/+7 Will be interesting: once they are using the same protocol, we can see how differently they rank certain sites.
- uptown, on 10/12/2007, -1/+6 What's your site URL?
- noodlez, on 10/12/2007, -1/+6 Sitemaps don't have an effect on page rank, per se.
They just let the crawler know when and where to look, instead of having to rely on links to reach remote sections of your site.
So there shouldn't really be a notable change in the engines' output: how they rank sites now should be how they rank sites after this is implemented (if it isn't already).
- FredSanford, on 10/12/2007, -0/+4 > All the "sitemaps" are for is to ensure a poorly made, un-navigatable
> site can be indexed by holding a spider's hand.
Not true. Wikis, for example, present a unique challenge, especially if the spider crawls every link on a page, including talk, edit, history, etc.
- ludwik, on 10/12/2007, -1/+4 But you still have to submit the sitemap to each search engine separately. That's not good. This should work automatically, like the robots.txt mechanism.
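For the "work automatically" point: the sitemaps.org protocol also defines a `Sitemap:` line that can be placed in robots.txt, so any crawler that already reads robots.txt can discover the sitemap without a per-engine submission. A minimal sketch, assuming a hypothetical robots.txt for www.example.com and Python 3.8+ (for `RobotFileParser.site_maps()`); `discover_sitemaps` is an illustrative name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an ordinary Disallow rule plus a Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.example.com/sitemap.xml
"""


def discover_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs a crawler would find in this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when there are no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in discover_sitemaps(ROBOTS_TXT):
        print(url)
```

The `Sitemap:` line is independent of any `User-agent` group, so it can appear anywhere in the file.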
- rocjoe71, on 10/12/2007, -1/+4 Just add a robots.txt file to instruct any robots NOT to index the folder where your sitemap file lives.
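To show how a compliant crawler would read the rule rocjoe71 suggests, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/maps/` folder and the `ExampleBot` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the sitemap lives under /maps/, and robots.txt asks
# every crawler ("*") to stay out of that folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /maps/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for url in ("http://www.example.com/maps/sitemap.xml",
                "http://www.example.com/artists/index.html"):
        allowed = parser.can_fetch("ExampleBot", url)
        print(url, "allowed" if allowed else "disallowed")
```

robots.txt is itself a public file, so this is a request that well-behaved crawlers honour. It does not act as access control.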
- theonlyvlad, on 10/12/2007, -2/+5 What about the OpenSearch standard in IE7, which they adopted before all other browsers?
- inactive, on 10/12/2007, -1/+4 Those locations can be, and are, blocked by robots.txt:
http://en.wikipedia.org/robots.txt
Omitting them from a sitemap file is not going to stop search engines from indexing them if they find a reference to them on a page.
PS - sorry if the truth hurts, kids, but Google created sitemaps because kids created such poorly engineered sites. Rather than waste their time and money catering to badly structured sites, Google is using your time to make your crap work irrelevant to the indexing process.
- willcode4beer, on 10/12/2007, -0/+3 There's not much to a sitemap.xml file. It could probably benefit from some extending.
Besides, it's XML; you can extend it by using another schema. XML is extensible by nature.
See "Extending the Sitemaps protocol":
http://www.sitemaps.org/protocol.html#extending
- willcode4beer, on 10/12/2007, -0/+3 "and I don't want that xml file public."
Umm, then don't put it on a public web server?
- inactive, on 10/12/2007, -0/+3 You complain about your oh-so-precious database of cross-referenced artists, which is not stored in the sitemaps file, because the sitemaps file doesn't have any way of defining a relationship between pages, artists or anything else.
Elaborate on how you managed to convert the URL, last change date, update frequency and a priority value into a relational database of artists.
Someone can easily scrape your site and recreate your database without your sitemaps file. Essentially, you fetch the homepage, save it, push its internal URLs into an array, and continue until you reach the end of the array. I would be surprised if scrapers even bothered switching to sitemaps, since they're so widely unused.
- willcode4beer, on 10/12/2007, -0/+2 Huh?
It's just an XML file on a public web server.
So security doesn't apply; it's PUBLIC.
Reliability depends on the reliability of your server.
- firehydra2k, on 10/12/2007, -11/+13 That's surprising. Microsoft never really complies with any standards at all...
- bikegriffith, on 10/12/2007, -0/+2 Anyone else notice that their sample isn't even valid XML? (It's missing the closing </url> tag.)
- rocjoe71, on 10/12/2007, -1/+3 It doesn't matter which one you use, robots.txt or sitemaps; each search engine still has to know that your website exists before it will crawl your site. That is, both methods are passive, so neither is really automatic.
- mousy, on 10/12/2007, -0/+2 @fkr2 - Yes, that results in the same pages getting checked out by the "big" three, so they have the same data to give us better search results, and we see who does that best.
- robertDouglass, on 10/12/2007, -0/+2 A Drupal module that lets you submit sitemaps to Google can be found here:
http://drupal.org/project/gsitemap
I don't know how well it conforms to the standards set on sitemaps.org. There is also a discussion on the topic evolving in the SEO Drupal group at:
http://groups.drupal.org/search-engine-optimization
Cool development!
- inactive, on 10/12/2007, -2/+4 I think you're talking out your ass. A sitemaps file isn't sensitive data. The only property that's not publicly accessible is what you set for the priority.
http://www.sitemaps.org/protocol.html
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Where's the cross-referencing? Huh? Where? WHERE? Someone can scrape your site with or without a sitemaps file.
Paranoid retard.
- armbar, on 10/12/2007, -1/+2 Why are people modding down fkr2's second comment? He's right. Sitemaps are just used to supplement a spider's existing knowledge of the site, and won't prevent it from indexing the rest of the site the way a robots.txt file (theoretically) will.
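The sitemap sample quoted in the comment above omits `</url>`, which is the problem dwight0 and bikegriffith point out. Generating the file with an XML library rather than by hand avoids that kind of mistake and also escapes characters such as `&` in URLs. A minimal Python sketch using the standard `xml.etree.ElementTree`, assuming each page is described by a dict with a `loc` key and optional `lastmod`, `changefreq` and `priority` keys (`build_sitemap` is an illustrative name):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Serialize page dicts into a sitemap document; every <url> gets closed."""
    # xmlns is written as a plain attribute, so child tags need no prefix.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in OPTIONAL_FIELDS:
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([{
        "loc": "http://www.example.com/",
        "lastmod": "2005-01-01",
        "changefreq": "monthly",
        "priority": 0.8,
    }]))
```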
- sherifgmansour, on 10/12/2007, -0/+1 That's great, but it's still a shame robots.txt is not a recognized standard.
http://blog.sherifmansour.com/?p=16
- clinko, on 10/12/2007, -3/+4 "People who use Google Sitemaps don’t need to change anything, those maps will now be indexed by Yahoo and Microsoft."
So is Google giving away the URL of my sitemap? What if I don't want to give it to anyone?
I've worked for years to create my db of 100,000+ cross-referenced artists, and I don't want that xml file public.
Personally, I don't mind giving it to MSFT and Yahoo. I just mind sites like FreeRingtoneScamArtistsGalore.com having the ability to get their hands on it.
- parislemon, on 10/12/2007, -0/+1 http://digg.com/tech_news/Search_engines_united_Yahoo_and_Microsoft_joining_Google_s_Sitemap
- elix3r, on 10/12/2007, -0/+1 This is awesome news!!
- Grimboy, on 10/12/2007, -0/+1 @fkr2: Why block talk pages? Just give them lower relevancy in the sitemap.xml file. That way, if a talk page has what the searcher is interested in, it won't be completely eliminated, but real wiki pages will be prioritised. That's one of the chief reasons for sitemaps.
Also, I don't see why a site that's hard to navigate (for humans) should stop spiders. As long as it uses anchor tags to link to other pages, there's no reason for the spider not to index all of its pages. The reason for sitemaps is to give search engines information about the different pages on your site; discovering all of the pages in a website is a less important function of sitemaps.
- picaman, on 10/12/2007, -0/+1 And for WordPress as well:
http://www.arnebrachhold.de/2005/06/05/google-sitemaps-generator-v2-final
- toprank, on 10/12/2007, -0/+1 Here is a video of Vanessa Fox from Google and Tim Mayer from Yahoo giving a pre-release explanation of sitemaps.org during this week's WebmasterWorld Pubcon conference in Las Vegas: http://videos.webpronews.com/2006/11/16/yahoo-and-google-collaborate-on-search/
- Grimboy, on 10/12/2007, -0/+1 And Django: http://www.djangoproject.com/documentation/sitemaps/
- Jack9, on 10/12/2007, -0/+1 Which Microsoft will quickly abandon, or add to without support from the other "parties," and declare the new standard. Welcome to another attempt by MS to define what is "standard."
- aaronjay, on 10/12/2007, -0/+1 From Yahoo itself: http://digg.com/tech_news/Yahoo_Google_and_Microsoft_join_forces_really
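Grimboy's suggestion of giving wiki talk pages a lower value than article pages corresponds to the sitemap `<priority>` field. The protocol defines this as a value from 0.0 to 1.0 with a default of 0.5. A minimal Python sketch, assuming a hypothetical MediaWiki-style URL scheme (`/wiki/Title`, `/wiki/Talk:Title`, and `?action=edit` or `?action=history` views). The specific values and the choice to leave action views out entirely are illustrative assumptions, and `wiki_priority` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse

ARTICLE_PRIORITY = 0.8  # illustrative value
TALK_PRIORITY = 0.2     # illustrative value: lower, but still listed


def wiki_priority(url):
    """Return a sitemap priority for a wiki URL, or None to leave it out."""
    parsed = urlparse(url)
    if "action" in parse_qs(parsed.query):  # edit/history views
        return None
    if parsed.path.startswith("/wiki/Talk:"):
        return TALK_PRIORITY
    return ARTICLE_PRIORITY


def sitemap_entries(urls):
    """Pair each listable URL with its priority."""
    entries = []
    for url in urls:
        priority = wiki_priority(url)
        if priority is not None:
            entries.append({"loc": url, "priority": priority})
    return entries


if __name__ == "__main__":
    for entry in sitemap_entries([
        "http://wiki.example.org/wiki/Sitemaps",
        "http://wiki.example.org/wiki/Talk:Sitemaps",
        "http://wiki.example.org/w/index.php?title=Sitemaps&action=edit",
    ]):
        print(entry)
```

Because priority is relative only to other pages on the same site, talk pages stay discoverable while articles are ranked above them in the site's own hints.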
- xmilky, on 10/12/2007, -0/+0 Sounds like an anti-competitive cartel to me. By recommending that people use the /ping?sitemap=... mechanism described in their FAQ, the big three (Microsoft, Google, Yahoo) ensure that smaller search engines will fall behind in staying up to date.
Neither 'bloggers' nor automated scripts/CMSs will take the time to seriously "ping" 500 search engines (I guess not even Ask.com will receive it) whenever a page is added or updated.
A guilefully achieved advantage for the three fat ones. And I guess webmasters will buy in. I just hope it has the same "success" as nofollow.
- inactive, on 10/12/2007, -0/+0 Sitemaps don't give or define relevancy of any sort to any search engine. The closest they come is "priority," which is a self-assigned measure of how often that page should be reindexed.
Sites can be hard to navigate due to poorly written code, poorly structured URLs, redundant duplicate/triplicate/etc. pages and inconsistent URLs.
- inactive, on 10/12/2007, -0/+0 Nothing stops smaller search engines from using them too. Sitemaps don't actually do anything except supplement what a spider finds when crawling the site anyway, which is hardly anti-competitive.
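The `/ping?sitemap=...` mechanism xmilky objects to is described on the sitemaps.org protocol page as an HTTP request to `<searchengine_URL>/ping?sitemap=<sitemap_url>`, with the sitemap URL URL-encoded. A minimal Python sketch that only builds those request URLs, assuming hypothetical engine base URLs (`ENGINES` and `ping_urls` are illustrative names, not any engine's documented endpoint):

```python
from urllib.parse import quote

# Hypothetical engine base URLs; each engine documents its own.
ENGINES = ["http://search.example.com", "http://small-engine.example.net/"]


def ping_urls(sitemap_url, engines):
    """Build one ping URL per engine, URL-encoding the sitemap URL."""
    encoded = quote(sitemap_url, safe="")  # encode ':' and '/' as well
    return [f"{base.rstrip('/')}/ping?sitemap={encoded}" for base in engines]


if __name__ == "__main__":
    for url in ping_urls("http://www.example.com/sitemap.xml", ENGINES):
        print(url)
```

Each URL would then be fetched with a plain GET (for example via `urllib.request.urlopen`). The cost of pinging grows with the length of the engine list, which is the crux of xmilky's complaint. Nothing in the format itself restricts which engines appear in that list.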
- hongxiaowan, on 10/12/2007, -0/+0 http://www.sitebases.org
Sitebases, the next protocol of Sitemaps.
- Nanobe, on 10/12/2007, -1/+0 @theonlyvlad
There was a standard before OpenSearch called Sherlock, created by Apple and long supported by Mozilla/Firefox.
- Microdot, on 10/12/2007, -3/+2 I'm sure M$ will try to force their own views on this... but good to see something happening at least.
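willcode4beer's point that the format can be extended with another schema (the "Extending the Sitemaps protocol" section linked earlier) works because extra elements live in their own XML namespace. A reader that only knows the core sitemap namespace can simply skip them. A minimal Python sketch with `xml.etree.ElementTree`; the `music` namespace URI, its `genre` element and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
EXT_NS = "http://www.example.com/schemas/music/1.0"  # hypothetical extension

ET.register_namespace("", SITEMAP_NS)   # core elements unprefixed
ET.register_namespace("music", EXT_NS)  # extension elements as music:*


def sm(tag):
    """Qualified name of a core sitemap element."""
    return f"{{{SITEMAP_NS}}}{tag}"


def build_extended_sitemap(pages):
    """pages: iterable of (loc, genre) pairs."""
    urlset = ET.Element(sm("urlset"))
    for loc, genre in pages:
        url = ET.SubElement(urlset, sm("url"))
        ET.SubElement(url, sm("loc")).text = loc
        # Extra data in its own namespace; core-only readers ignore it.
        ET.SubElement(url, f"{{{EXT_NS}}}genre").text = genre
    return ET.tostring(urlset, encoding="unicode")


def read_locs(xml_text):
    """A consumer that only understands the core sitemap namespace."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(sm("loc"))]


if __name__ == "__main__":
    doc = build_extended_sitemap([("http://www.example.com/artists/a1.html", "jazz")])
    print(doc)
    print(read_locs(doc))
```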
- inactive, on 10/12/2007, -3/+3 It's XML ( http://www.sitemaps.org/protocol.html ), so there's no reason why it couldn't be extended without breaking compatibility.
RSS has evolved and I don't hear anyone complaining.
- rhettnyedotorg, on 10/12/2007, -2/+1 I too am pro-standards.
I can't help but wonder, though, whether they also discussed price-fixing their way to continued search engine market dominance.
- clinko, on 10/12/2007, -2/+1 @fkr2
Read what I wrote this time. Try replying again.
- itisme, on 10/12/2007, -3/+1 I mostly welcome our new robot spider overlords,
however, much as those three companies foster a sense of cuddly friendliness, I'd like to know what the W3C/Berners-Lee et al. think of this!
- ehudokai, on 10/12/2007, -3/+1 In other news, Microsoft announced today that it has discovered a wonderful way to enhance the security and reliability of the sitemap protocol, and will be implementing these new enhancements. They claim it will be fully compatible with all systems.
Minimum Requirements:
Windows Server 2003
IIS
SQL Server 2005
Some other ridiculously outrageous thing to put more money in their pocket...
- inactive, on 10/12/2007, -6/+2 @mousy - as opposed to now, where the big 3 index your sites and rank your pages...
All the "sitemaps" are for is to ensure a poorly made, un-navigatable site can be indexed by holding a spider's hand.
I'm glad they're cooperating on it, but this is hardly going to change anyone's life. The spiders G/M/Y use are sophisticated programs that have no problem indexing most sites, and log files have always told you about missing files etc.
- Crepsley, on 10/12/2007, -7/+0 Yeah, I know, that is a bit odd.