Why the cloud cannot obscure the scientific method
arstechnica.com — With petabytes of storage and advanced datamining techniques, our ability to ferret out obscure relationships among items and events has never been greater. But it's beyond hyperbole to suggest that this capacity has made scientific reasoning obsolete, as was proposed earlier this week.
- 825 diggs
- underthewether, on 06/26/2008, -4/+60Great article. Puts that idiot from Wired in his place.
- PabloMac, on 06/26/2008, -8/+4"...our ability to ferret out obscure relationships among items and events..."
Like the Bible Code.
Google it.
- Iztikeit, on 06/26/2008, -2/+3Good call! But the Moby Dick Code is way better.
- Lukesed, on 06/26/2008, -1/+6Or the random data code.
- toastjam, on 06/27/2008, -0/+1Exactly... we're wrong a lot more than we're right, but for some reason we hold these machine powered methods to some higher standard :P
- MWeather, on 06/27/2008, -1/+2Sort of like the Bible code, only real.
- PabloMac, on 06/27/2008, -2/+1The Bible code theory has withstood numerous peer reviews. There may be something to it.
- MWeather, on 06/27/2008, -0/+2"The Bible code theory has withstood numerous peer reviews."
You and I must have a different definition of peer review. Here's an article from Statistical Science that tears the Bible code a new one. What mathematicians have reviewed the Bible code favorably, and what journal did they do it in?
http://projecteuclid.org/DPubS?service=UI&version= ...
- Iztikeit, on 06/26/2008, -0/+2But Wired already put journalism in its place, of rest.
- Varz, on 06/27/2008, -0/+4Yeah I know I wasn't the only one waiting for a rebuttal.
- molave, on 07/01/2008, -0/+0Does being wrong automatically make one an idiot on Digg?
- orenshk, on 06/26/2008, -4/+8I thought it was good, but not great. One of the two main arguments in the article is that we could not have reached cloud-level computing without theory and its application. While this is true, it doesn't counter the thesis of the original article - that cloud computing will replace the scientific method.
The good parts are when he debunks Anderson's examples.
- qiemem, on 06/26/2008, -0/+2Agreed, though I think he also missed a significant portion of the point of Anderson's article. Traditionally, the theories developed with the scientific method were causal theories, theories that supplied our predictions and pictures of what is going on with a causal story. It is this approach to science that Anderson is declaring obsolete. The thing is, the causal story may be, or likely is, beyond our abilities of comprehension and representation. The causal theories we can give are limited by the finite expressions of language and our finite cognitive capacities. The purely statistical methods of doing research that Anderson praises are capable of dealing with far greater amounts of information than our causal theories can account for.
Or that was my understanding of Anderson anyway.
- SapientWolf, on 06/27/2008, -0/+1While we are better at collecting data, as soon as someone wishes to act on this data they will have to interpret what it means and determine how it can be useful. That step still involves applications of the scientific method, such as forming and testing hypotheses and theories.
Using an example from the article, "All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species."
Guesses about the differences that would indicate a new species (and what forms those species might take) still have to be interpreted from the data with the help of (models or patterns) and then tested.
- wynja, on 06/26/2008, -0/+15Very well written article. I like how he points out that, even though the scientific method isn't going anywhere, science is definitely being changed forever by the ability to data mine the cloud.
- mathcreative, on 06/26/2008, -1/+8We make models off of data. Algorithms help to pick the important data.
As long as things in science need to be understood, then we will always need models.
- Iztikeit, on 06/26/2008, -3/+8"Put in more practical terms, would Anderson be willing to help test a drug that was based on a poorly understood correlation pulled out of a datamine? These days, we like our drugs to have known targets and mechanisms of action and, to get there, we need standard science."
I think that was all that needed to be said.
- Trammel, on 06/26/2008, -1/+3It's called a strawman, it's a very popular logical fallacy.
Although Anderson makes stupid assumptions, he doesn't make any claims on what you quoted Timmer as saying.
Think about that quote. What does Timmer mean by pulled out of a datamine? In this hypothetical situation would Anderson have some disease and the datamine chose a drug using keywords? It's very obscure and lacks strength.
I don't agree with you, I don't think it needed to be said at all.
- exosome, on 06/26/2008, -1/+1To be fair, no one understands how lithium works, but people still take it.
- nullx42, on 06/26/2008, -1/+19The scientific method is the Optimus Prime of science.
- Versh, on 06/26/2008, -0/+4Ah, quoting Galileo eh?
- Origin415, on 06/27/2008, -0/+1What is the megatron of science?
- nullx42, on 06/29/2008, -0/+1Religion
- jexdawg, on 06/26/2008, -0/+12Isn't using the phrase "beyond hyperbole" beyond, well, hyperbole?
- ElAssoWipo, on 06/26/2008, -0/+9That is the craziest, most insane overstatement I've ever heard in my past 90 lives.
- ndgcs, on 06/26/2008, -9/+0Am I the only one who didn't understand the description but dugg the story anyway? :/
- rebotfc, on 06/26/2008, -1/+14At last some sanity on this ridiculous idea that any number of data points can supersede understanding gained through application of the Scientific Method.
Ars put the smackdown on wired.
- desertDenizen, on 06/26/2008, -1/+2I'm not so sure, in the context of complex systems. Take weather models, for instance. No human brain could ever understand the complex interactions going on in the atmosphere, whereas computer models can be evolved to simulate and predict the future quite well, and ever better with more cloud data (heh) and computational power. I think this weather is a better (stronger) example of the Wired article's basic idea than those offered by the article itself. I wouldn't go so far as to claim that the scientific method is made obsolete; but in this example, it seems to take a back seat to alternative approaches such as brute force automated data mining and genetic algorithms. And if so for weather, then why not, in principle, for any complex system for which vast datasets exist?
- jeffdjohnson, on 06/26/2008, -0/+1Predicting the weather and understanding why changes in weather patterns occur are 2 different things.
This is the fundamental flaw of the Google example too: Google's goal as a business is to maximize profits by predicting which content you want, but understanding why you want specific content is an entirely different beast.
The importance of a model is to bring together different understood/known concepts in order to generate predictions about novel/unknown concepts. A data-driven approach, by definition, will never be able to generalize in such a way.
- desertDenizen, on 06/26/2008, -0/+2A sufficiently nuanced data-driven approach absolutely could make generalizations and be applied to novel scenarios/domains, simply by tuning weights and data resolution and filtering factors based on sensitivities (e.g., making data more coarse-grained == generalizing). And even if a model didn't work well initially, it could learn and adapt -- with no human understanding driving the improvement. So the more prickly epistemological question, to me, is whether or not we *understand* the model. The most useful models (those that 1) make good predictions, 2) generalize and 3) can be applied to novel situations) might end up being ones we simply aren't capable of understanding. It amounts to a form of AI. Resistance to this humbling possibility, while natural and to be expected, feels like pre-Copernican anthropocentrism to me.
- toastjam, on 06/27/2008, -0/+1I'd have to agree... as a computer scientist getting into the AI/machine learning field, I have rapidly come to the realization that "knowledge engineering" (that is, programming discrete rules and knowledge for everything in existence, a la the Cyc project) is a dead end.
You'll find correlation everywhere, but hard and fast rules are few and far between, at least above the atomic level. To me, it seems that the future of AI lies in self-bootstrapping entities which learn their own interpretations of the world. Like neural nets, though, we may not understand them. But given enough computational power, they may "understand" the world better than us. And by that I just mean that their generalizations go down a few levels further than ours.
- insanebrain, on 06/26/2008, -0/+10'The cloud' is such a STUPID name... it's called the internet. Yes, we use a symbol in the form of a cloud, but that doesn't mean we name it a cloud. I guess it comes from dumb ass managers... something like: "So you say my data is stored in that cloud you have drawn there"
- desertDenizen, on 06/26/2008, -0/+2Point taken, but Andreessen (among others) has called it the cloud as long as I can remember, so it's understandable that the term might catch on among people talking about related abstractions. It's also not connotatively identical to "the Internet," even if they refer to the same thing. Like the saying goes, "there are no true synonyms."
- dafragsta, on 06/26/2008, -0/+4I knew this was stupid. It's like saying happy accidents are a way to get things done. Just because a computer can correlate data doesn't mean it can decide which new data to start collecting, and even when the AI is strong enough to do that, it will still follow the scientific method to make that decision.
- SpykerSpeed, on 06/26/2008, -2/+2What if we find correlation between two groups of datasets that were once considered totally different fields of science? And what if we can't explain that correlation by the scientific method? It's a tough call, but I think I would have to side with correlation in the end. Even quantum mechanics is forced to settle for probabilities, so I don't see why other fields of science shouldn't be allowed the same luxury.
- eighties, on 06/27/2008, -2/+1Of course! We always need to side with correlations over science and common sense. Otherwise, how could we possibly come to the conclusion that "global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of Pirates since the 1800s."
http://www.seanbonner.com/blog/archives/001857.php
- SpykerSpeed, on 06/27/2008, -0/+1...these things existed before Pirates existed, retard.
- Hetman, on 06/26/2008, -1/+9The Wired article was stupid. I am glad someone wrote a rebuttal. Do not get me wrong: drawing conclusions from all that information is useful, especially to marketers. But it definitely will never replace the scientific method.
- desertDenizen, on 06/26/2008, -0/+1I'm unclear as to whether simulations run on computer would be considered cloud data, scientific method, or a hybrid of the two. Anyone? Seems like there could be a gradient from one to the other, such as with simulations using cloud data as inputs, to varying degrees of sensitivity depending on the model being tested (high energy particle physics being very different from, say, epidemiological simulations).
The reason I ask, in the context of theoretical M-Brains (Matrioshka Brains -- massive computer shell clouds the size of planetary orbits)... is there any difference between experiment and simulation, when your simulations are accurate down to the positions of molecules, etc.? Perhaps only to the limits imposed by the uncertainty principle?
- vlad43210, on 06/27/2008, -0/+1A simulation is an approximation of a real-world process. At some point you can argue that the approximation is "close enough," for certain purposes, to be indistinguishable from said process.
The scientific method is in the model that usually embeds any simulation. What parameters are you setting for the initial state? What are the interaction dynamics? Does the simulation elicit predicted behavior? Etc.
- Aanidaani, on 06/26/2008, -0/+1Yes, after all, an engineer (like myself) has to be there to replace all those computers when they break!
- BrewmasterC, on 06/26/2008, -0/+1The Wired article was interesting, but for different reasons. This Google-IBM machine should be a very interesting thing. Coupling a nifty management system like Tivoli with GFS you might be able to do some very interesting stuff. Hopefully the GFS/MapReduce stuff is open sourced soon and well documented. MPI has its limitations.
- bullhead2007, on 06/26/2008, -0/+2I wonder how many other diggers know what Tivoli is. The only reason I do is I used to sell IBM Software. You'd think I'd hate Microsoft if you knew what I had to go through to get a sale too.
- BigManOnCampus, on 06/26/2008, -1/+4Thank god someone rebutted that nonsense.
It was almost like they were trying to justify computer profiling of data as superior to understanding underlying principles, which is total hogwash.
- toastjam, on 06/27/2008, -0/+1But outside of pure math, the underlying principles themselves are just abstractions/generalizations of underlying complex systems.
I'm not saying the scientific method is worthless, in fact it's the best thing we've had up until now. I just think that AI, once it finally matures (I know, I know), can take us a lot farther.
To put it another way, the computer IS building a model. Just a model with far more complexity and further reaching interconnects than a human mind is capable of.
- BigManOnCampus, on 06/27/2008, -0/+1Data can only demonstrate the specific case in which it was collected.
With understanding of principles you can extrapolate to any case.
Yes, with lots of data, you have more instances to pick from in terms of data, however without comprehension of the laws that cause said effects you can never know how much you do not know.
For instance, if instead of Newton's Laws we had approximations based on lots and lots and lots of apples being dropped, would we understand the motions of planets as we do? Would Einstein have further hypothesized that space was curved? Those further advances would be in jeopardy without a clear understanding of the rules of the game.
- tomazkovacic, on 06/26/2008, -3/+4public class MrBabyManException extends Exception{
public MrBabyManException(String s){ super(s); }
}
@DIGG developers - please delete this snip of code!
- Aidenf77, on 06/26/2008, -0/+1It's not like one method is mutually exclusive to the next. My thought as I was finishing Anderson's article was that, like Timmer states in his article, the scientific method is what allowed the "cloud" and a means of analyzing it to come into existence in the first place. The more I pondered Anderson's article, the more I concluded that, if anything, the two methods could be used in conjunction with each other, as opposed to one replacing the other.
- elementop, on 06/26/2008, -0/+2No doubt.
It seems to me that when you start seeing correlations where no known theory applies, scientists would be tripping over themselves to figure out *why* there is a correlation...which requires the scientific method.
In other words, data mining should enhance the scientific method, not replace it.
- cambob76, on 06/26/2008, -0/+1I'm pretty sure data mining and the scientific method will continue to be used for the foreseeable future. The data can lead to theories and hypotheses, but scientific verification or exclusion will always be necessary for us as rational beings (as far as we are anyway). To me the data mine (***** you, it's not a cloud) is just like the regular empirical data that is all around us... it does nothing and is useless until we apply it to something.
- induren, on 06/26/2008, -1/+3Thank you Arstechnica! No matter how much data you have, you need to have a construct to hold it in, a guiding principle to make predictions with!
- flashback99, on 06/26/2008, -1/+1This page should stand as proof that digg isn't as stupid as reddit seems to claim it is.
- doggerrel, on 06/26/2008, -2/+1Models can reflect, but their primary purpose is to predict. Let us not forget some of the utility of the original article. The editor from Wired cited specific instances involving cost and energy wherein our models have very little power to predict. As we move forward, it is quite possible that we will run into stretches of invention where we will lack models. The best example that I can come up with is the LHC. We don't have the means of testing what will happen beforehand, but based on nebulous models, we are fairly sure that the collider will not destroy the earth. Our best minds have approved the event, and placed the potential viability of the earth at stake. The reasoning given by scientists rests in the fact that nothing has happened over an extremely long timeline.
'The safety group, however, pointed out that cosmic rays have produced equivalently energetic collisions with the Earth and other objects in the cosmos over and over again. “This means that Nature has already completed about 10^31 LHC experimental programs since the beginning of the Universe,” they write. But the stars and galaxies endure.'
http://www.nytimes.com/2008/06/21/science/21cernw. ...
They are willing to put probability numbers to work to justify an event that could kill us. From this, I must rethink the usefulness of data mining, and how it may, in limited instances, replace the scientific method as we grow our thinking.
- catsongs, on 06/27/2008, -2/+0Amen.
The human brain thinks in models. To approach the cloud—to derive anything of any value from the cloud—requires some sort of thought, some sort of... model.
And what does the human brain do with a model? Formally or informally the brain tests that model.
A final thought: why either/or? Why not both/and? Science and clouds?
Kirtland Peterson
- toastjam, on 06/27/2008, -0/+1Perhaps we consciously think in symbolic models, at a very high level. But that's just the surface of it.
Subconsciously, your brain is processing vast amounts of analog data, data that could not cleanly fit any model you could imagine. And yet most of the time, people manage, because the brain can predict based off of previously learned correlations in stimuli.
Computers would just do it a bit better than us. They would build models, but perhaps models so complex that we would not consider them as such (think neural nets). But they would be able to predict, which is the important thing.
- busket, on 06/27/2008, -0/+1I think that there are certain areas like fluid mechanics and certain modes of heat transfer where developing an explicit theory isn't all that useful. It can be a lot more useful to use dimensional analysis, which from my understanding is nothing more than fitting an equation to data.
In instances where it doesn't make sense to develop an explicit theoretical framework then theory isn't necessarily all that relevant. This isn't to say that it wouldn't be valuable.
- Varz, on 06/27/2008, -0/+1In certain areas these algorithms certainly are useful but the author's proposition of them being a replacement for the scientific method was just total *****.
The algorithms would even be useful in developing theories but they certainly aren't a replacement.
- busket, on 06/27/2008, -0/+0Yeah. I'm not sure how one could conceptually reconcile the idea that something that really just seems like glorified statistical analysis could eliminate theory. The Wired guy screwed the pooch a bit.
The only places it could replace theory would be places where theory is completely untenable.
- toastjam, on 06/27/2008, -0/+1Oh ye of little faith :P
- busket, on 06/27/2008, -0/+0It isn't a matter of faith. It's just not possible. You would first need a theoretical framework upon which to base your analysis of the data. It doesn't really seem reasonable to me to claim that it is possible to eliminate the need for theory by using a method which must rely on theory.
It's like saying "Someday, we'll be able to build machines that make machines obsolete."
- vlad43210, on 06/27/2008, -1/+0I'm glad to see this article. I think Anderson got a little confused... what, exactly, is he proposing? That we see some new data, throw a statistical model at it, and find comparisons with existing data? It's a fine way to get insight, but it's certainly not very useful in the long run.
I don't want to get into an involved discussion, so I will focus on two points: false patterns and false premises. Suppose you just got some cool new data and wanted to run a statistical analysis on it. Suppose that you just got your hands on some new genetic data, and you incorporate it into a statistical model that includes existing genetic data. Chances are, you will find *something* interesting about it. And so on with the next piece of data. You can produce neat little charts and tables all day long, but it will not get you anywhere closer to an understanding of genetics, much less of life. It may be argued that you don't really need to understand genetics, just the particular data you have on hand, but, as Anderson notes, there are simply MASSIVE amounts of data out there. Without theory, we have to analyze all of them. With theory, we can make broad statements after analyzing a few, and revise those statements occasionally as new data come in. Theory helps us make predictions about a broad range of phenomena, and if the theory is good, those predictions are mostly right.
The other, even more important point, is of false premises. If you just throw things at models, you can often come to some very, very wrong conclusions. Theory guides you to doing the *right* data analysis and the *appropriate* models, and prevents you from confusing yourself and all your colleagues. And in this age of overabundant information, confusion is something we very much wish to avoid.
Kudos to ars for the article. As a scientist, I found Anderson's earlier one to be misinformed.
- mikeyla, on 06/27/2008, -0/+1Interesting article, and you seem to have theory on your side, but I say with respect you're just dead wrong. Your example about medication is a perfect illustration of the counterargument to what you say: many medications, most specifically those involving depression and other mental disorders, have very poorly understood mechanisms of action. If you noticed a 96% correlation over a sample size of 1 billion people between a certain gene and, say, tendency to develop cancer, what is the invalidity of that method?
The traditional notions of statistical analysis are different when you are talking about datasets of this size, and much larger than the ones we currently have. A statistical analysis is invalid when the number of data points is close to the number of subjects. For example, if we are analyzing 1,000 genes in 800 subjects, there is a very significant chance that we will find things that look like patterns but actually are not. However, if we analyze 100,000 genes in 10 billion subjects, all of a sudden we have results even more significant than were we analyzing a few variables in a traditional scientific study.
Of course there is always the chance for a falsely positive result to come up, but there's tremendous validity to what he's saying that I think you're overlooking.
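The small-sample case in the comment above (1,000 genes in 800 subjects) can be sketched with a quick simulation. This is an illustrative sketch and not something from the thread: it assumes every "gene" and the outcome are independent Gaussian noise, it uses a rough |r| > 1.96/sqrt(n) cutoff as a stand-in for a two-sided p < 0.05 test, and the names pearson and count_spurious_hits are made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_subjects, n_genes, seed=0):
    """Count noise-only 'genes' whose correlation with a noise-only outcome
    clears a rough two-sided p < 0.05 cutoff (|r| > 1.96 / sqrt(n)).

    Assumption: genes and outcome are independent Gaussian noise, so every
    hit counted here is a false pattern by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    cutoff = 1.96 / math.sqrt(n_subjects)
    hits = 0
    for _ in range(n_genes):
        gene = [rng.gauss(0, 1) for _ in range(n_subjects)]
        if abs(pearson(gene, outcome)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits(n_subjects=800, n_genes=1000)
    print(f"{hits} of 1000 pure-noise genes look 'significant'")
```

Under these assumptions roughly 5% of the noise-only genes should clear the cutoff by chance alone. With a cutoff of this kind, adding subjects shrinks how large a chance correlation can be, but the share of chance hits stays about the same unless the threshold is tightened.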