techcrunch.com: In an encouraging act of collaboration, Google, Yahoo and Microsoft announced tonight that they will all begin using the same Sitemaps protocol to index sites around the web.
Nov 16, 2006
Closed Account · Nov 16, 2006
You complain about your oh-so-precious database of cross-referenced artists, which is not stored in the sitemaps file, because the sitemaps file doesn't have any method of defining a relationship between pages, artists or anything else. Elaborate on how you managed to convert the URL, last change date, update frequency and some value for priority into a relational database of artists. Someone can easily scrape your site and recreate your database without your sitemaps file. Essentially you get the homepage, save it, stuff internal URLs into an array, and continue until you reach the end of the array. I would be surprised if scrapers even bother changing to use sitemaps, since they're largely unused.
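The crawl loop described in that comment can be sketched roughly as follows. This is a minimal illustration, not anyone's actual scraper: the `fetch` callable, the `crawl` function, the `LinkCollector` class and the example.com pages are all hypothetical, and the "site" is an in-memory dict so the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(homepage, fetch):
    """Save every internal page reachable from the homepage.

    fetch is an assumed callable mapping a URL to its HTML,
    or to None when the page does not exist.
    """
    host = urlparse(homepage).netloc
    queue = [homepage]        # the "array" of URLs to visit
    saved = {}
    i = 0
    while i < len(queue):     # continue until the end of the array
        url = queue[i]
        i += 1
        html = fetch(url)
        if html is None:
            continue
        saved[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            absolute = urljoin(url, href).split("#")[0]
            # keep internal URLs only, and never queue one twice
            if urlparse(absolute).netloc == host and absolute not in queue:
                queue.append(absolute)
    return saved


# Made-up three-page site standing in for real HTTP requests.
site = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.com/x">X</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": "no links here",
}
saved = crawl("http://example.com/", site.get)
```

Nothing in the loop reads a sitemap file; it only follows anchor tags, which is the commenter's point.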
nanobe · Nov 16, 2006
@theonlyvlad: There was a standard before OpenSearch called Sherlock, created by Apple and long supported by Mozilla/Firefox.
grimboy · Nov 16, 2006
@fkr2: Why block talk pages? Just give them less relevancy in the sitemap.xml file. That way, if a talk page has what the searcher is interested in, it won't be completely eliminated, but real wiki pages will be prioritised. That's one of the chief reasons for sitemaps. Also, I don't see why a site that's hard to navigate (for humans) should stop spiders. As long as it uses anchor tags to link to other pages, there's no reason for the spider not to index all of the pages. The reason for sitemaps is to give search engines information about different pages on your site. Discovery of all of the pages in a website is a less important function of sitemaps.
jack9 · Nov 16, 2006
Which Microsoft will quickly abandon or add to without support from the other "parties" and declare it the new standard. Welcome to another attempt by MS to define what is "Standard".
Closed Account · Nov 16, 2006
Sitemaps don't give or define relevancy of any sort to any search engine. The closest they come is the "priority", which is a self-made measure of how often that page should be reindexed. Sites can be hard to navigate due to poorly written code, poorly structured URLs, redundant duplicate/triplicate/etc. pages and inconsistent URLs.
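For reference, the four per-URL fields under discussion (URL, last change date, update frequency, priority) correspond to the `loc`, `lastmod`, `changefreq` and `priority` elements of the sitemaps.org 0.9 schema. In that protocol `priority` is a 0.0 to 1.0 hint relative to other URLs on the same site, while `changefreq` is the separate update-frequency hint. The sketch below shows one way a sitemap with a lower-priority talk page might be generated; the `build_sitemap` helper and the wiki URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialise a list of dicts into sitemap XML (hypothetical helper).

    Each dict needs "loc"; "lastmod", "changefreq" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


# Illustrative entries: an article page and its lower-priority talk page.
xml_text = build_sitemap([
    {"loc": "http://wiki.example.com/Article", "lastmod": "2006-11-16",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://wiki.example.com/Talk:Article", "priority": 0.2},
])
print(xml_text)
```

Note that there is no element for expressing relationships between pages, only these per-URL hints.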
xmilky · Nov 16, 2006
Sounds like an anti-competitive cartel to me. By recommending that people use this /ping?sitemap=... mechanism described in their FAQ, the big three (Microsoft, Google, Yahoo) ensure that smaller search engines will fall behind in being up to date. Neither 'bloggers' nor automated scripts / CMSs will take the time to seriously "ping" 500 search engines (I guess not even Ask.com will receive it) whenever a page is added or updated. Guilefully achieved advantage for the three fat ones. And I guess webmasters will buy in. I just hope it has the same "success" as nofollow.
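To make the ping mechanism concrete: a ping is an HTTP GET to an engine's `/ping` endpoint with the URL-encoded sitemap location in the `sitemap` query parameter. The sketch below only builds the request URLs for a list of engines and sends nothing. The engine base URLs are placeholders, and assuming that every engine exposes the same `/ping?sitemap=` path is exactly that, an assumption.

```python
from urllib.parse import quote


def ping_urls(sitemap_url, engine_bases):
    """Build one ping URL per search engine (hypothetical helper)."""
    encoded = quote(sitemap_url, safe="")   # percent-encode ':' and '/' too
    return [f"{base.rstrip('/')}/ping?sitemap={encoded}" for base in engine_bases]


# Placeholder engines; real endpoints would come from each engine's documentation.
engines = ["http://search.example.org", "http://engine.example.net/"]
urls = ping_urls("http://www.example.com/sitemap.xml", engines)
for u in urls:
    print(u)
```

The cost the commenter worries about is the length of `engines`: a CMS would have to maintain that list and issue one request per entry on every update.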
Closed Account · Nov 17, 2006
Nothing to stop smaller search engines using them too. Sitemaps don't actually do anything except supplement what a spider finds when crawling the site anyway, which is hardly anti-competitive.
toprank · Nov 17, 2006
Here is a video of Vanessa Fox from Google and Tim Mayer from Yahoo giving a pre-release explanation of sitemaps.org during this week's WebmasterWorld Pubcon conference in Las Vegas: http://videos.webpronews.com/2006/11/16/yahoo-and-google-collaborate-on-search/
hongxiaowan · Dec 5, 2006
http://www.sitebases.org Sitebases, the next protocol of Sitemaps.
sherifgmansour · Mar 5, 2007
That's great, but it's still a shame robots.txt is not a recognized standard: http://blog.sherifmansour.com/?p=16