120 Comments
- ZaNkY, on 11/08/2007, -2/+62 Google doesn't store data in itself. Google stores WHERE the data is stored. The other 99.98% is hosting companies, Flickr, ... Websites that have lots of content.
Google has very little content in itself. Compare Google's homepage versus Yahoo. If you do a Google Image search all of the image results are around 5 KB in size (stored on Google's servers for speed). The actual images that are linked are hundreds of times bigger.
In short, Skynet, errr I mean Google doesn't have to store all the data in the world, it already knows where EVERYTHING is. And when the time is right... :)
- eternal464, on 11/08/2007, -2/+54 porn
- scoreboard27, on 11/08/2007, -1/+37 They sure are organizing lots of the world's market cap.
- leetice, on 11/08/2007, -3/+34 what percentage of this data do you think is just backup data?
- 3Den, on 11/11/2007, -1/+24 Not really relevant statistics......
Google's INDEX is 1000 TB... that's not the actual content, just the index.
The figures for data produced globally refer to all data produced, not just data on the internet and publicly available.
- Dwardumt, on 11/08/2007, -1/+22 Bit confusing the way they interpret 'data' as being 'information' in the old-style sense of the word...
- phike, on 11/07/2007, -0/+16 not all data is useful. One would argue google organizes the most valuable 0.02% of data
- wiifm69, on 11/08/2007, -2/+18 rule 34
- elDooderino, on 11/08/2007, -2/+17 so wait... what's the other 99.98%?
- electrobutter, on 11/08/2007, -1/+14 stuff being censored by China
- cplusplus, on 11/08/2007, -2/+15 Er, except for the google cache of every website and google images of all images on the net.
- ardarvin, on 11/08/2007, -0/+12 Newsflash: Digg reporting less than 0.0004% of Google's organized information.
- bradstuff, on 11/08/2007, -2/+12 i would bet my ritalin that the .02% that google has organized is the most interesting .02%.
- insllvn, on 11/08/2007, -0/+10 Scooty Puff Jr sucks!
- sfrederiksen, on 11/08/2007, -1/+10 I wish I had a woman.
- vinblackham, on 11/08/2007, -5/+14 wonder if they'll ever catch up...if that's possible...
- Bridea, on 11/08/2007, -0/+8 Google becomes self-aware at 2:14 A.M., September 6, 2008
- inactive, on 11/07/2007, -1/+9 YEOW! Growth potential. No wonder the stock has gone up 150 points in a month.
- etnu, on 11/08/2007, -1/+9 The world has an infinite amount of "data".
Consider a CGI script with a URL that looks like this:
http://cgi.tld/my-script?date=2007-11-07
It has an infinite number of inputs. You wouldn't want to index all of them, would you?
- WhiskeyLemur, on 06/30/2009, -0/+8 If you look at the article, it includes "the volume of unique data created in the world each year saved to film, disk, optical, and paper." That means that it includes every hand-written high school essay, every inter-office memo, every diary, every draft of a new play or novel in the works. There's no way that all of this is going to be available to the public, and that's perfectly normal.
I would be more interested in seeing what percentage of _online_ data is searchable through Google. That would be much more informative.
- EvilNapkin, on 11/07/2007, -1/+8 you could also say that a lot of that data is duplicated.
I mean how many LOLcat sites are out there
- MacSuxWindozSux, on 11/08/2007, -1/+8 The whole thing is an assumption based on an assumption based on questionable statistics.
There's no way those numbers are even remotely accurate.
- KirbyMeister, on 11/08/2007, -0/+7 5,609,121TB? That's a lot of porn.
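For what it's worth, the headline percentage is at least consistent with the two figures quoted in this thread. A minimal sketch, assuming the 1000 TB mentioned above is the size of Google's index and the 5,609,121 TB is the yearly world total (neither figure is verified here, and the names are made up for illustration):

```python
# Figures quoted in the thread; treated as assumptions, not verified data.
GOOGLE_INDEX_TB = 1_000      # index size cited by 3Den
WORLD_DATA_TB = 5_609_121    # yearly total cited by KirbyMeister


def share_percent(part_tb: float, whole_tb: float) -> float:
    """Return part as a percentage of whole."""
    return part_tb / whole_tb * 100


index_share = share_percent(GOOGLE_INDEX_TB, WORLD_DATA_TB)
print(f"{index_share:.3f}%")
```

Rounded to two decimal places this gives the 0.02% from the headline, so the other 99.98% is simply everything outside that 1000 TB index.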
- smart394, on 11/11/2007, -2/+8 /robots.txt:
User-agent: *
Disallow: /
- fkr3, on 11/07/2007, -0/+6 8 billion small files (say 2 - 200 kilobytes) is still a whole assload of storage. Even if you average to just 50 KB a page you're looking at hundreds of terabytes.
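The back-of-the-envelope estimate above (8 billion pages at roughly 50 KB each) can be checked in a few lines. This is only a sketch: the page count and sizes are the comment's own guesses, decimal units are assumed (1 TB = 10^9 KB), and the function name is hypothetical.

```python
# Assumed inputs from the comment: 8 billion pages, 2-200 KB each, ~50 KB average.
PAGES = 8_000_000_000
AVG_PAGE_KB = 50
KB_PER_TB = 1_000 ** 3  # decimal units: 1 TB = 10^9 KB


def total_terabytes(pages: int, avg_kb: float) -> float:
    """Total storage in TB for `pages` files averaging `avg_kb` kilobytes."""
    return pages * avg_kb / KB_PER_TB


print(total_terabytes(PAGES, AVG_PAGE_KB))  # average case
print(total_terabytes(PAGES, 2))            # every page at the low end
print(total_terabytes(PAGES, 200))          # every page at the high end
```

With the 50 KB average this comes to 400 TB, which is the "hundreds of terabytes" in the comment; the 2 KB and 200 KB extremes bracket it at 16 TB and 1600 TB.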
- ibanezdtx120, on 11/08/2007, -3/+9 Brings me to the Futurama episode where Fry has to defeat the super brain that's recording all the knowledge of the universe before destroying it.
- marcog, on 11/08/2007, -0/+6 Tell me how you can view their cache of the page then?
http://www.google.com/help/features.html#cached
What they don't store are the images, videos and any other "data" files. Those are the files that take up an enormous amount of storage.
And they don't know where "everything" is. How do they reach a page that hasn't been linked to?
- sotopheavy, on 11/08/2007, -0/+5 Yes they are organizing lots of steal my icon.
- ikrit2006, on 11/08/2007, -0/+5 You would too if your parents didn't have parental controls on your computer.
- inactive, on 11/07/2007, -5/+10 With simple math, you can calculate that once they have organized all of the world's information, their stock price will be north of $350,000 per share ;)
- psykiv, on 11/08/2007, -0/+4 I, for one, welcome our new Google overlords.
- SPThom, on 11/07/2007, -0/+4 Don't like it? Don't use it, and shaddup.
- devindotcom, on 11/08/2007, -1/+5 But that's just the html and some of the images - it doesn't cache say an embedded movie file, which would be 10 times bigger than the page itself.
- jynweythek, on 09/17/2008, -1/+5 Actually google is storing 0.02% of the world's data. it is organizing a lot more.
- dood, on 11/08/2007, -0/+4 They cache a ton of data. I think it's fair to say they store most of the contents of the web that they crawl, other than movies and images. And that's only a matter of time, heh.
- neuropsychguy, on 11/07/2007, -0/+4 And one could be wrong. Don't misunderstand me. Google has done a fabulous job at "organizing" a lot of important and pertinent information. However, for every useful site there are many useless or fluff sites. Google Scholar is pretty good at organizing (mostly) useful information but, due to copyright restrictions, still has very limited access to primary scientific data and information. There is so much extremely useful data that hasn't even begun to appear on the internet or world wide web yet.
- WhiskeyLemur, on 06/30/2009, -0/+4 The article specifically mentions unique data. Not sure how they would verify that, but the intention seems to be to avoid repetitive data.
- jriggs420, on 11/08/2007, -1/+5 linky no-worky
- Kijael, on 11/11/2007, -1/+5 A lot of people seem to have missed the point that not all the world's data is on the internet.
- fkr3, on 11/07/2007, -0/+3 Probably not, but then again probably 80% of the world's information is stupid crap like spam and blogs anyway.
- digggggggggg, on 11/08/2007, -1/+4 Stuff on paper, film, restricted sites, all things that Google is NOT SUPPOSED to organize.
This entire "report" is totally invalid. It implies that google needs to index every piece of data ever produced on every kind of media imaginable, which is clearly not the case.
- Verytastycheese, on 11/08/2007, -0/+3 This is a stupid article. Here you're adding every picture ever taken, every uncompressed high definition clip recorded from every camera on every set that may not even be used, every personal audio clip, etc. Of course a huge percentage of the world's data is private and should never be searchable by everyone.
Buried.
How about we compare what Google indexes VS other search engines?
- electrobutter, on 11/08/2007, -0/+3 who we kiddin we all know
/britney spears crotch shot
//i'm feeling lucky
- arjie, on 11/07/2007, -1/+4 SEO pages.
- inactive, on 11/07/2007, -1/+4 Details here: http://www2.sims.berkeley.edu/research/projects/ho ...
It looks to me like their methods for determining magnitude are weak but the determination that the volume is increasing looks pretty sound. So there is more information being produced but the total amount is really pretty hard to even estimate.
- reeder, on 11/07/2007, -1/+4 It's not quantity, it's quality and relevance that matters. Still, if they are not careful, they could turn the web into something like they have in China, simply by choosing the wrong information to archive.
- firechill, on 11/08/2007, -0/+3 You are a ____. Fill in the blank with an infinite amount of data.
- plarp, on 11/08/2007, -3/+6 r u kitten me? is caturday yet?
- haochi, on 11/07/2007, -0/+3 Those 99.98% are duplicates.
- profundiz, on 11/07/2007, -0/+3 So i think it's safe to say that only 0.02% of all the "data" is actually "information"