111 Comments
- inactive, on 10/12/2007, -6/+101This is nothing compared to my midget porn collection.
- inactive, on 10/12/2007, -0/+79Nope, they're all excel spreadsheets.
- blankartist, on 10/12/2007, -2/+37I take it these aren't Access databases.
- magicRob, on 10/12/2007, -2/+37Marked as inaccurate... It's not even close to an accurate list... just guessing. No real proof.
Walmart is estimated to have their DBMS (Teradata) at 570-580TB.
http://www.intelligententerprise.com/print_article.jhtml?articleID=196801888
And this article compares the NSA's call record database to a bunch of other things:
http://www.usatoday.com/money/industries/technology/maney/2006-05-16-nsa-privacy_x.htm
So this article is just some Blog Spam...
- Jo9100, on 10/12/2007, -8/+41my pron archive is the #1 largest db in the world!
- sydlexius, on 10/12/2007, -2/+33We'll never know exact figures, but your speculation is probably close to the truth. This list is missing a lot of big players, and one in particular that came to mind was Wal-Mart. They have an astoundingly large Data Warehouse that, as of January '06, was estimated to be 583TB, and should be considerably larger by now (I couldn't find any more recent statistics). http://www.informationweek.com/story/showArticle.jhtml?articleID=175801775
- MAdaXe42, on 10/12/2007, -1/+29Also, CERN should be on that list - I've been in their datacentre - a single particle collision generates terabytes of data, and when the LEP was running, they did hundreds a day - the LHC is going to generate crazy amounts of data - they have silos 20 meters in diameter full of robots, which automatically mount and unmount magnetic media depending on what data is being sought - truly a sight to behold. The ATLAS detector alone will require in excess of a petabyte a year, after compression.
- geodescent, on 10/12/2007, -5/+25SELECT * WHERE First_Name = "roach"
0 row(s) returned
Yes; but it would seem that you're not in it
- roach, on 10/12/2007, -4/+24Don't the Mormons have some mega database of everyone that has ever lived so they can save our souls when it's all over?
- inactive, on 10/12/2007, -3/+22
10. Library of Congress
9. Central Intelligence Agency
8. Amazon
7. YouTube
6. ChoicePoint
5. Sprint
4. Google
3. AT&T
2. National Energy Research Scientific Computing Center
1. World Data Centre for Climate
- scrimaxinc, on 10/12/2007, -2/+18Did I really just see a mock SQL query?
- takeda, on 10/12/2007, -0/+9Actually there are some problems with it. They mentioned YouTube and Google separately... Google is now the owner of YouTube. Also, what about Google Groups, Gmail and the other services they provide?
- inactive, on 10/12/2007, -4/+13@geodescent
Well there's your problem. You searched for his first name using his username. Also, you forgot the semicolon.
- inactive, on 10/12/2007, -2/+8Will it what? Be able to make predictions of the weather based on data previously collected? Yes.
- SwellGuy007, on 10/12/2007, -0/+6My ex-wife's list of stuff she used to bitch about was at least 8 Petabytes. Why is she not on this list?
- nixonrichard, on 10/12/2007, -1/+7Considering they aren't even talking about "database" in the true sense of the word (it's just "who stores the most crap"), there are probably several organizations that should be on the list but are not.
- Insightful, on 10/12/2007, -1/+7Also marked as inaccurate because the article is a wild guess. While Yahoo Search is not as popular as Google in the US, Yahoo is the top overall destination site by page views in the US, Japan, and many other countries around the world. Yahoo also has more properties than Google, such as Mail and Answers, each of which is #1 in its field. Yahoo Search Marketing (formerly known as Overture) is another source of VLDBs.
Disclaimer - I used to work for Yahoo.
- inactive, on 10/12/2007, -0/+5Anybody have an idea how big Digg's database is? Just curious..
- nashdawg, on 10/12/2007, -1/+6I think Walmart keeps master and transactional data for data mining. I imagine their data warehouse is ridiculously huge.
- BigSlacker, on 10/12/2007, -2/+7Not inaccurate, but a different definition of "big". That would be big in pure storage size versus big in number of data points.
- SneezingTree, on 10/12/2007, -0/+5Well where is the Wayback machine?
"How large is the Wayback Machine?
The Internet Archive Wayback Machine contains almost 2 petabytes of data and is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress."
This list is quite inaccurate, although pretty interesting at the same time.
- BigSlacker, on 10/12/2007, -0/+5Yep...the biggest stuff I've ever worked with has been natural data acquisition. All that instrumentation being sampled constantly produces piles of raw data, and you don't want to reduce it because there may be new processing methods developed later that need to look at the raw data differently.
- fotoman, on 10/12/2007, -0/+4Actually this kinda makes sense. If you look at the computers in the Top 500 lists (http://www.top500.org), a lot of them are climate/weather related. I'm sure they're producing some huge amounts of data.
- pickypg, on 10/12/2007, -1/+5The whole thing is just some seemingly random guy's guesses. What a waste.
- grumpyrain, on 10/12/2007, -0/+4And not least, Google Earth.
- dbalaski, on 10/12/2007, -0/+4IMHO -- not enough technical data in the article -- Seems there are some assumptions and missing things -- some of which I would expect for Security Reasons (such as NSA)...
As a DBA -- I wish the article talked a little about the software and hardware platforms.
- twooranges, on 10/12/2007, -1/+5The Walmart data center has been dubbed Area 71.
http://en.wikipedia.org/wiki/Area_71
- sophiaperennis, on 10/12/2007, -0/+4Usenet is over 200 TB.
- there, on 10/12/2007, -0/+3"How would you store stuff without a DBMS? The simplest would be a hierarchical file system but I think that still would be considered a DBMS. You'd probably still need a relational database to find those files."
You could use an object-oriented model too, but that's beside the point. You're still thinking like a tech. Language exists outside of the confines of the tech world. Step away for a second and imagine you know very little about how a computer operates (like the vast majority). Laymen typically don't understand things like data types or normalization, and even the simple acronym DBMS becomes some unfathomable term that could mean anything.
Thus, to them, the concept of "a database" is a black box of information. And I would argue "size" to them will be determined primarily by what terms they know on the subject... in this case bytes (it's something called the Sapir–Whorf hypothesis, which I think is applicable in this case).
All I'm saying here is that using raw bytes to explain database size is probably best if you plan to say something meaningful to the general public. Every other gauge should probably be reserved for strictly tech politics (and would be incoherent to most people).
- BigSlacker, on 10/12/2007, -0/+3That's rather pointless without a definition for the term "large". Disk space, data points, indexing...a lot of ways to look at it.
- inactive, on 10/12/2007, -0/+3@daeken:
If they are so different, then can you explain to me the difference between a database full of "videos" and a database full of "climate patterns"? Data is data, my friend: 1's and 0's, bits, bytes, and the like. Sure, if you are talking about the way in which the data is organized, you could further classify the items on this list as "content datastores" and "numerical data" databases or something similar, but why? I am only interested to know who holds the most data!
- inactive, on 10/12/2007, -0/+3wait, did you just say that youtube doesn't store their videos in databases, then say that myspace does?
- arnar, on 10/12/2007, -0/+3There is something about this that I'm not buying - it says 100 million videos are viewed on YouTube per day, yet it also states that Google gets 91 million searches per day... that can't be correct; one of these numbers has to be wrong.
- inactive, on 10/12/2007, -1/+4I'm surprised no credit card and insurance companies made the cut...
- inactive, on 10/12/2007, -1/+4FWIW, Google has MASSIVE datacenters (yes, plural) scattered all throughout the US. I've seen two of their locations in Atlanta. Each location spans multiple buildings the size of Wal-Mart supercenters. I tend to think their size is being understated.
- there, on 10/12/2007, -2/+5I don't think "database" means quite the same thing as DBMS any more. I would say the term has evolved in popular culture to mean roughly "a repository of information searchable by computer". Using that definition... the taxonomy seems to boil down to simply the number of bytes rather than the particular technology used to search/house it..
- tizz66, on 10/12/2007, -0/+2All of the largest databases are in the US? I think not. This isn't "The largest databases in the world", it's "The largest databases in the US", and even then it's just guesses.
- op12, on 10/12/2007, -0/+2Or the term database. YouTube doesn't actually store the videos themselves in the database! They're just talking about overall storage, not databases specifically.
- inactive, on 10/12/2007, -0/+2This cannot be very reliable. It is very US-centric and highly suspect.
At least 2 of the 5 largest employers in the world alone are in India and the UK (Indian Railways + NHS), and they each handle many terabytes of data. The UK NI system is reputedly a huge database, as is one at Germany's Deutsche Post.
This article should be buried.
- hankyone, on 10/12/2007, -3/+5One that does surprise me by its absence is Microsoft; they have a larger database than Google when it comes to web search + all their other services and internal stuff.
- noshyuz, on 10/12/2007, -0/+2incredibly inaccurate article -- about the only thing accurate in it is the author's admission:
> you may want to call this the top 10 databases "that we could find."
- coredumper, on 10/12/2007, -0/+2This list is sadly small.. I know banks already have PETABYTE databases for transactions and the related check images that are required for storage under the new US Check 21 law..
- jbus, on 10/12/2007, -1/+3Walmart is a data mining pig slut... Why were they left off this list?
- thevikas, on 10/12/2007, -0/+2The Indian Railways database can at least break into the top 3 spots. The scale and usage of that database is beyond imagination. Railways are the primary means of long-distance transport in India, and India is one of the most populous countries in the world. The database has been running for the past 10 years. The website sleeps every night for 5 hours to refresh its cache and distribute data to more than 10 thousand servers all over India, which again spreads to even more counters in each city. The same database does not just do ticketing but even manages the whole inventory of wagons and engines. It is not SAP-based, but it is still a very high-end ERP.
- grumpyrain, on 10/12/2007, -0/+2It sort of becomes a bit pointless to compare databases with trillions of records to databases with only millions that happen to contain a few MB in each BLOB.
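To put rough numbers on that comparison, here is a minimal sketch. The record counts and sizes are made up for illustration and do not describe any database on the list; indexes and storage overhead are ignored.

```python
# Hypothetical figures, chosen only to illustrate the comparison.
TB = 10**12  # bytes in a (decimal) terabyte


def total_bytes(record_count, bytes_per_record):
    """Raw payload size, ignoring indexes and storage overhead."""
    return record_count * bytes_per_record


# Many tiny rows: 2 trillion records at 100 bytes each (assumed).
small_rows = total_bytes(2 * 10**12, 100)

# Few large rows: 50 million records, each holding a 4 MB BLOB (assumed).
blob_rows = total_bytes(50 * 10**6, 4 * 10**6)

print(small_rows / TB, "TB held in 2 trillion small records")
print(blob_rows / TB, "TB held in 50 million BLOB records")
```

Under these assumed numbers both come out at 200 TB, even though one holds 40,000 times as many records, so a ranking by bytes alone says little about scale in rows.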
- nixfu, on 10/12/2007, -1/+3Inaccurate article full of guessing... this list is much more accurate because it's based on data submitted by REAL companies themselves.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenWinners_2005.asp
- cowabuse, on 10/12/2007, -0/+2ChoicePoint is pretty scary. They gather information on people; I wonder what other information they're gathering that we don't know about?
- BigSlacker, on 10/12/2007, -0/+1That would be 100K of storage per delivery. It's probably more in the hundreds of bytes per transaction.
- inactive, on 10/12/2007, -1/+2How about the torrent sites? Who has the biggest database?
- wastegas, on 10/12/2007, -1/+2Good. No aol.