48 Comments
- antoniojvr, on 10/12/2007, -1/+44 They used Bistromatics: The math of restaurants.
- Rayza, on 10/12/2007, -0/+24 Not much detail in the article. Basically: They can do it, but it's not easy. They don't explain how or what the process they're using is.
- birdwatcher3000, on 10/12/2007, -3/+19 We are screwed.
- elnerdo, on 10/12/2007, -0/+14 If they explained it, it would probably be wayyyyyyy over my head, anyway. Though, I would at least like to SEE all that stuff that's over my head.
- inactive, on 10/12/2007, -3/+14 Yes, now Vista will be able to pick up several people's voices all at the same time and generate entire conversations with the amazing accuracy of "Dear aunt, let's set so double the killer delete select all."
- drsnooks, on 10/12/2007, -0/+11 > I do find it rather amusing that the word "cocktail" is edited though... :-)
You're lucky you don't live in Scunthorpe ;)
- osipov, on 10/12/2007, -0/+10 I was skeptical as soon as I saw this announcement coming out of a dinky university. This is an extremely difficult problem and the claim is that the problem has been solved in its entirety, a grandiose statement if I ever heard one. Furthermore, looking at the references in the preprint, these guys aren't even familiar with the key advancements in the area of the cocktail party problem. The state of the art today is an approach called independent component analysis (http://en.wikipedia.org/wiki/Independent_component_analysis) and if you are interested here's a demo http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
- wilk, on 10/12/2007, -1/+10 You have profanity filter enabled. Check your profile.
- TigerX, on 10/12/2007, -1/+7 I do find it rather amusing that the word "cocktail" is edited though... :-)
- kwisatzhaderach, on 10/12/2007, -0/+6 That's right. I became intrigued by this article, thinking they have something better than ICA, as ICA is not an exact solution, unlike, say, PCA.
Also ICA relies strongly on some assumptions (non-gaussianity of signals etc etc), which may not hold.
Unfortunately, the solution presented here is, well, BS.. :(
- Deschain, on 10/12/2007, -0/+5 You can't have an algorithm if the problem and bounds aren't understood entirely. A neural net can create a solution that's nearly indistinguishable from the correct answer using statistics. A lot of thermodynamics does not use exact equations because of the statistical nature of the interactions between particles.
- omatsei, on 10/12/2007, -2/+7 Can I use that to figure out how much to leave as a tip?
- rolosworld, on 10/12/2007, -3/+7 I call BS..
"The computer we use is doing the work without an algorithmic program. It uses a system called a neural net, which is designed for the computer to teach itself. Basically, it works on trial and error," Casazza said. "This isn't consistent and cannot be duplicated easily. We need to find a way to design an implementable algorithm that could do this consistently and quickly."
How come a mathematical solution works on trial and error? And how come they have a mathematical solution and don't have an algorithm?
I think these comments broke my BS detector!
- Machismo, on 10/12/2007, -0/+4 I agree. We have a lot of Internet-Intellects trying to disprove this claim without having ANY detailed or serious information.
For example, perhaps the neural net was used to extract a voice from a crowd in a SINGLE instance. It would be an implementation-specific solution. That solution being based on a mathematical solution they developed previously.
It is also no use to refute this example, as it is just one example of the many possibilities. We simply need more information.
- AnalystX, on 10/12/2007, -0/+4 In addition to what Deschain pointed out, they stated, "We showed that this 'cocktail party problem' is mathematically solvable." While not as easily implementable as an algorithm, a neural net is still a mathematically defined set of processes; therefore they have demonstrated that the problem can be solved mathematically. Eventually there will be an algorithm.
- Otto, on 10/12/2007, -0/+3 >>>"No mathematic formula = not mathematically provable = Trumped up headline"
Even a neural net is deterministic. If the thing can do it, then it's possible to describe it mathematically. It may be very, very difficult, but it's possible nevertheless.
You're correct that it's not anywhere close to a general solution, however.
- jarcoal, on 10/12/2007, -3/+6 can't wait to have this in the form of voice recognition software
- NCC1701A, on 10/12/2007, -0/+3 Same way Bones did it on TOS - Point a mic at someone and turn off the heartbeat!
- unitedkronos, on 10/12/2007, -1/+4 It's a technique, not a bill or anything, film companies could use it too... I call BS on it though until I hear it in motion.
- Durinthal, on 10/12/2007, -0/+3 And just yesterday I had been thinking about how odd it was that I could pick out one other person's voice in a crowded cafeteria (a couple of tables away and in a completely unrelated conversation) and listen to whatever they're saying.
- bigkm, on 10/12/2007, -0/+3 that makes me laugh, still.
- MattL920, on 10/12/2007, -0/+3 If a neural net was used to solve the problem, it isn't a straight conversion from a working NN to a mathematical algorithm that we can understand symbolically and apply to other problems. It's still a deterministic process to solve a problem, as other commenters said, so it qualifies as some form of algorithm.
And for those who don't know about neural nets, if they came up with one that was able to solve this problem, then yes, the researchers did solve this problem. It's not as simple as creating some generic neural net and turning it loose on a problem to get a solution, or nothing would be intractable. Choosing the right architecture for the NN, formulating the problem in such a way that it can be understood by the network, and compiling a useful corpus for training are not trivial tasks, especially for such a difficult problem.
- roosterjm2k2, on 10/12/2007, -0/+2 Machismo...
More info...not likely, not here. You forget, our fellow digg members here seem more interested in proving that -their- scientific knowledge far surpasses any other, and as such, an idea that they can't wrap their heads around is wrong and worthy only of trash-talk...
God save the know-it-alls!!!
I'm not saying it is or isn't possible, but the article doesn't give enough info to make either call.
- transeunte, on 10/12/2007, -3/+5 Oh yeah, those mathematicians are always so full of crap...
/irony
- tilmaniac, on 10/12/2007, -0/+2 Those assumptions happen to hold for speech.
I've used ICA for source separation of voice signals and it works incredibly well.
- KWhat, on 10/12/2007, -1/+3 I am not an audio expert, although I am a computer programmer who is currently working on a VoIP application... it seems as though each voice would have some amount of overlap in tone/sound. If you were to filter out, let's say, one person's voice... parts of other people's voices would be removed, thus distorting what they sound like and possibly making what they are saying indecipherable.
- AnalystX, on 10/12/2007, -0/+2 netdroid9, yes they solved the problem, as one of the definitions of the term "solve" is to find a means to effectively deal with a problem.
- axonal, on 10/12/2007, -0/+2 Yes but what would a computer neural network want with funding?
- jcims, on 10/12/2007, -0/+1 I have had a modicum of success recording with a microphone array, then timeshifting the channels to focus on a specific area of the room. Not the same problem they are trying to solve, but it's kind of cool because you can replay it over and over and 'look' at different parts of the room.
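The time-shifting jcims describes is usually called delay-and-sum beamforming. Below is a minimal sketch of the idea, not jcims's actual setup: the four-microphone array, the per-microphone delays in samples, the sine "voice", and the noise level are all made up for illustration, and the shift is circular to keep the toy short.

```python
import math
import random

def shift(signal, k):
    """Circularly shift a list by k samples (positive k delays, negative k advances)."""
    n = len(signal)
    k %= n
    return signal[-k:] + signal[:-k] if k else list(signal)

def delay_and_sum(channels, delays):
    """Undo each channel's arrival delay, then average the aligned channels."""
    aligned = [shift(ch, -d) for ch, d in zip(channels, delays)]
    return [sum(col) / len(aligned) for col in zip(*aligned)]

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
n = 2000
# Hypothetical target: a pure tone standing in for the voice we want to hear.
target = [math.sin(2 * math.pi * t / 50) for t in range(n)]
# Hypothetical arrival delays (in samples) of that voice at each of four mics.
delays = [0, 3, 7, 12]
# Each mic hears the delayed target plus its own independent noise.
mics = [[s + random.gauss(0, 1) for s in shift(target, d)] for d in delays]

focused = delay_and_sum(mics, delays)
err_single = mse(mics[0], target)    # one microphone on its own
err_focused = mse(focused, target)   # array "pointed" at the target
```

Because the target lines up across channels after the shifts while the noise does not, averaging should leave the target intact and reduce the noise power. Re-running `delay_and_sum` on the same recording with a different set of delays is the "look at different parts of the room" trick.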
- Newton2001, on 10/12/2007, -0/+1 Don't ignore the fact that they have published a journal paper in Applied and Computational Harmonic Analysis and so their work has been peer reviewed and it will continue to be scrutinized for years to come. Also, you can get the paper if you go to Pete Casazza's homepage (http://www.math.missouri.edu/personnel/faculty/casazzap.html) and click on the MathSciNet links; the paper in question is the 3rd result for me. If you read the paper you will see that it is very mathematical so most of your comments are off.
That said, the only thing wrong with this article is that the guy has to use language appropriate for the average folk, i.e., the average Digg user, and so he sounds like a hack.
Cheers!
- AgenteSegreto, on 10/12/2007, -0/+1 "we have the first mathematical solution to it" yet it "isn't consistent and cannot be duplicated easily"
I think my head is going to explode trying to comprehend that logic!
- mrch3w, on 10/12/2007, -0/+1 kwhat brings up a very good point, how do they know what to cancel and what to leave alone...
I'm no expert, but I think what they actually did is find a way to know what range of tones a person's voice is in, then cancel out all other tone ranges except the ones you want and maybe normalize the tone ranges in the person's voice, to cancel out destructive/constructive interference.
Now I'm even less a mathematician than a theorist, so I have no idea how they could do that, but I am sure that it *could* be done.
- Machismo, on 10/12/2007, -0/+1 Amazing! I was reading about this kind of research in an engineering journal. I had no idea anyone was this close to a solution!
I really want to find more info. Anyone able to find additional sources? Perhaps the thesis or whatever, itself?
- AgenteSegreto, on 10/12/2007, -0/+1 could someone please attach the bibliography info for the journal article related to this?
- elnerdo, on 10/12/2007, -0/+1 toeknee, Of COURSE it's possible. I can do it EASILY without the use of a computer. That means it's possible.
- las3rjock, on 10/12/2007, -0/+1 The most general form of the "cocktail party problem" is called "blind source separation." You have a signal (sound) which originates from multiple sources (people talking), and you want to separate the components of the signal that originate from each source (which person said what). A Google search for the terms "blind source separation" should introduce you to the various techniques applied to solving the problem--independent component analysis (ICA), neural networks (as in this research), etc.
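To make the blind source separation setup concrete, here is a toy sketch for two sources and two microphones. It is not the method from the Casazza paper, and it is not FastICA; it is a simple kurtosis-based rotation search, one of the oldest ICA-style tricks. The assumptions are all mine: each microphone records a fixed weighted sum of the sources (the weights below are invented), the sources are independent and non-Gaussian, and there is no echo or delay.

```python
import math
import random

def center(x):
    """Subtract the mean so the signal averages to zero."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def rotate(z1, z2, phi):
    """Rotate the pair of signals by angle phi in the 2D plane."""
    cs, sn = math.cos(phi), math.sin(phi)
    y1 = [cs * u + sn * v for u, v in zip(z1, z2)]
    y2 = [-sn * u + cs * v for u, v in zip(z1, z2)]
    return y1, y2

def whiten(x1, x2):
    """Decorrelate two zero-mean signals and scale each to unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    # Angle of the principal axes of the 2x2 covariance [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2 * b, a - c)
    p1, p2 = rotate(x1, x2, theta)
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / s1 for u in p1], [v / s2 for v in p2]

def kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 for a Gaussian)."""
    return sum(v ** 4 for v in y) / len(y) - 3.0

def separate(x1, x2, steps=90):
    """Whiten the mixtures, then pick the rotation whose outputs look least Gaussian."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for i in range(steps):
        phi = (math.pi / 2) * i / steps
        y1, y2 = rotate(z1, z2, phi)
        score = kurtosis(y1) ** 2 + kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def corr(a, b):
    """Pearson correlation between two signals."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))

random.seed(1)
n = 2000
s1 = [math.sin(0.07 * t) for t in range(n)]        # hypothetical "speaker 1"
s2 = [random.uniform(-1, 1) for _ in range(n)]     # hypothetical "speaker 2"
# Invented mixing weights: each mic hears both speakers at different levels.
x1 = [u + v for u, v in zip(s1, s2)]
x2 = [0.5 * u + v for u, v in zip(s1, s2)]

y1, y2 = separate(x1, x2)
# ICA cannot tell order or sign, so match each source to its best output.
match_s1 = max(abs(corr(s1, y1)), abs(corr(s1, y2)))
match_s2 = max(abs(corr(s2, y1)), abs(corr(s2, y2)))
```

The non-Gaussianity assumption kwisatzhaderach mentions is visible here: if both sources were Gaussian, every rotation would score about the same and the search would have nothing to lock onto. Real rooms add echoes and delays, which this instantaneous-mixing toy ignores entirely.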
- mmoser, on 10/12/2007, -0/+1 Hah. weird seeing this article. Casazza is currently my math professor @ mizzou.
- TheRealPod, on 10/12/2007, -3/+3 Why are there ***** instead of the word "C*o*c*k*t*a*i*l"? Is digg trying to stop our potty mouth?
- anodos, on 10/12/2007, -0/+0 I saw someone demonstrate something similar back in the 90s. It was based on "Cortical Thought Theory" ( http://portal.acm.org/citation.cfm?id=912278 ), and it was a beautiful thing. They used a series of two dimensional Fourier transforms to arrive at "gestalts" of the audio. For example, you might have a 256 x 256 grid upon which the audio is projected. As the audio plays (or is recorded), it is continually moving across this grid. The idea of the grid is to continually "digest" its content, but it is easier to think of a snapshot in time: at any point in time you have a particular set of sound waves on the grid. Now, take that snapshot, do a 2D Fourier transform, and isolate the median frequencies in the X and Y dimensions. This will give you an X and Y coordinate which you can then project onto ANOTHER 2D plane. You've boiled down that one 256 x 256 snapshot into a single "gestalt" point. Now, continue to do this as more audio is projected on your main plane. At this point, your second plane is gathering more and more gestalts and is also doing its own 2D transforms and projecting those onto a 3rd plane. You can do this down to as many planes as you wish, and what you find over time is that each plane has a different "level" of gestalt of the audio. The deepest plane will have boiled down an entire audio clip to one dot, while the fourth plane, for example, will have boiled down the audio clip to, maybe, 256 dots. The goal of the experiment, though, was recognition. You could have hundreds of people say the same sentence in radically different accents, and the lower level planes would map their audio very closely together. It was only a matter of matching the pattern of the lower levels to previously "learned" patterns of other audio clips and the computer could understand what they were saying, regardless of accent.
The other part of the experiment is what frequencies you decide to use when digesting a plane down to a single X/Y coordinate. By playing around here, you could isolate sounds from a crowded room. The other approach they took to isolating sounds was to increase the "weight" of portions of the lower level grids. Different areas of these grids would represent different areas of interest in the crowded room. By putting a spotlight on a particular area, you would in turn filter out the other sounds in which you are not interested.
- olegk, on 10/12/2007, -2/+2 Neural net is hardly a mathematical solution.
- RichGC, on 10/12/2007, -3/+3 Not impressed, what they describe is in no way a 'solution'.
- netdroid9, on 10/12/2007, -3/+2 If we're using a neural network, it should be possible to 'create' the algorithm. But then, did the mathematicians actually 'solve' the problem, or is the neural network solely responsible for this discovery?
- toekneebullard, on 10/12/2007, -3/+2 From my understanding of audio (I'm an audio engineer) it's impossible. It's really unfortunate that they don't explain the technique at all.
The biggest thing is this would change audio recording as we know it. If you could isolate specific sounds, and eliminate them, you're talking about a huge change in the way things are recorded.
- progidy, on 10/12/2007, -2/+1 With neural nets, you have a set of data and some "neurons" with possible connections to each other and a range of values they can end up with. Then you feed them junk and tell them how close to the solution they are, and they try to do better (repeating thousands of times). So your neural net can end up being very specialized on your data set.
No mathematic formula = not mathematically provable = Trumped up headline
- quine, on 10/12/2007, -5/+3 I thought that was already pretty much taken care of with the advent of the infinite improbability drive and the Theory of Indeterminacy?
- inactive, on 10/12/2007, -4/+1 Translated: "Need more funding" and in the current political climate they will get it.
THEY didn't really solve it, the computer neural network did.
- LucasVB, on 10/12/2007, -7/+4 Excellent reference to HHGG! Wop!
- strcmp, on 10/12/2007, -16/+9 It's probably illegal under the PATRIOT act.
I'm not kidding.

