43 Comments
- Agret, on 10/12/2007, -0/+7Sorry, Small is faster than Lua.
- WilliamTanksley, on 10/12/2007, -0/+6The advantages of Stackless Python aren't even faintly related to the advantages or disadvantages of a stack processor. In that context, "stackless" refers to the fact that Stackless Python doesn't use the C stack (instead it uses its own data structure) to handle function calls. It's faster simply because Python code uses the stack very differently from how C code is supposed to use the stack.
In contrast, a stack-based processor is programmed in its own machine language, and of course is programmed in a way that's optimal for it. (It's also easy to optimize for.)
As for Lua's fast register-based VM: I don't see any reason why a stack-based VM wouldn't be faster -- but I'd be willing to read an explanation of why I'm wrong. I spent quite a while trying to find one. To the best of my knowledge and research, Lua's VM is just a nice, fast VM which happens to be register-based; it's faster than its old VM, but there's no reason anyone's advanced that it should be faster than a stack-based VM, such as colorForth's. - wush, on 10/12/2007, -0/+5the bittorrent transfer speeds are smoking too
- holloway, on 10/12/2007, -2/+6Most kernels could be a few kilobytes if they didn't have any features.
Remember that rule of data compression that says there's a limit based around the archive actually describing the data. That restoring a 1bit archive to anything other than 2 states is impossible, and similarly I bet their 3 kilobyte kernel wouldn't do IPV6 and Bluetooth or anything people expect of a modern kernel.
It's interesting stuff but this inane bragging makes it easy to deride. - dupswapdrop, on 10/12/2007, -1/+5Stack computers are fast and can run FORTH as the os and high level system.
- inactive, on 10/12/2007, -0/+3The server smells burning (well actually more just warning about the CPU over heating, but whatever). Please use bittorent if you can :-)
- goggleBOX, on 10/12/2007, -0/+3I second that. This torrent is perhaps the fastest I have ever downloaded.
- WilliamTanksley, on 10/12/2007, -0/+3Interesting to see this here today. I was just reading some articles from Chuck Moore's latest company -- it seems he's finally built his 24-core asynchronous stack processor. Very impressive high speed and low power draw (and of course because it's asynchronous its power draw can be VERY low). Clever design, too.
- MorningCoder, on 10/12/2007, -0/+3"...stack computers and why they are better than register-based computers"
What's the big deal here? The Java VM (a very popular stack based VM) has been around for many years. Now it is news?
"He also claims that a kernel would only be a few kilobytes large!"
Code generation for stack base machine is easier in most cases, and easier code generation means less compiler bugs. Some say stack base machine is slower, but that depends on what assumptions you base your arguments on. For the small kernel size, I don't think it has anything to do with stack or register based machine. - daven1986, on 10/12/2007, -0/+3looks interesting, downloading now.
- pak314, on 10/12/2007, -0/+3The big problem with stack machines is that it is hard to make them superscalar. Superscalar means more than one instruction can be dispatched at a time to multiple function units. This is the method the x86 and RISC processors use to obtain high performance. To do superscalar effectively, you must be able to execute instructions out of order but for stack machines this is hard because everything accesses the top of stack so there is always data access conflicts forcing a defined order for execution.
Some stack processors have worked around this using VLIW or very long instruction words where multiple compatible instructions are stuck together at compile time. This works well for stack machines since their opcodes can be made very tiny due to lack of register operands in most instructions. The itanium also used VLIW but has failed miserably for a variety of reasons.
Some hybrid stack processors losen the restriction on stack accesses by concept of stack frames or allowing access to first few entires on top of stack thus allowing more opporunity for instruction level parallelism. This can also reduce a lot of the stack thrashing of dup and drops of pure stack processors. I think the JVM and CLR(?) follow this method. - WilliamTanksley, on 10/12/2007, -0/+2A well written analysis. Thank you. I'm not familiar with any VLIW stack processors, though -- what do you mean? Such a thing would seem contradictory to what I know of stack processors; unless possibly you're talking about squeezing multiple instructions into a single memory fetch, as the SeaForth processor does -- but I wouldn't call that VLIW, but rather an extremely small cache line :-).
(Stack processor advocates -- of whom I am obviously one -- would say that short cycle times are a tradeoff for superscalar performance. Sometimes you need one, sometimes the other. If you're doing hard realtime, you ALWAYS want short cycle time and rarely want superscalar performance; if you're running a general purpose OS, you'd be very happy with superscalar performance. Multiple cores can substitute for superscalar performance for some general-purpose apps.) - WilliamTanksley, on 10/12/2007, -0/+2Thanks for a clear explanation, KCorax. I now somewhat understand the problem we're having communicating :-) -- as much as I can with my lesser experience in your field.
I have two responses.
First, I must note that the best way we know of to use stack processors is to tie a HUGE number of very simple ones together; this takes best advantage of their simplicity. See www.intellasys.com. But this also reveals a serious shortcoming: such a system would make a BAD general-purpose machine as we understand it today. The best use for a system like the SeaForth-24 would be to use it as an embedded controller in a (very) smart device. So this argues in your favor: stack processors have their limitations because their strength is in their simplicity.
Second, I get to argue MY point :-). Optimizers can output good code for register machines; but even with much less research we can already output "good enough" code for stack machines. With more research our optimizers will only get better. The problem you're describing sounds like a dynamic optimization, and with any actual processor running an actual problem the stack isn't going to fit on-chip; the data structures that control the execution will be in memory, a realm where all processors are equal. A realistic stack processor has between 6 and 36 elements in each of its 2 on-chip stacks; no backtracking system will operate with only 36 levels of recursion (and that's ignoring the fact that you're working with stack frames, not simple return addresses).
No, your problem domain lives in memory, not onboard a processor. This means that the architecture of the processor matters very little. - marxmarv, on 10/12/2007, -0/+2And it's embedded in very big devices. It's fun to run FORTH code on a Sun UE10k :)
- kodek, on 10/12/2007, -0/+2Each kernel is designed for one hardware/processor architecture. You're probably talking about hardware devices, but that's what drivers/kernel modules are for.
- WilliamTanksley, on 10/12/2007, -0/+1Oh, KCorax - I forgot to add that I agree with you on the L4 point: small kernels are not a secret property of stack machines :-). Claiming so only makes the post look ignorant.
- lobrien, on 10/12/2007, -0/+1Not necessarily. Object-orientation was invented around '67 (Simula), but only caught on ~20 years later, when C++ became popular. As the context of "what's hard" evolves over time, "failed" technologies have a _chance_ of re-emerging and becoming dominant.
- eddigg, on 10/12/2007, -2/+3The technology has been around for 40 years and there are no successful commercial products today. Says it all
- brandonr, on 10/12/2007, -0/+1Kernels would only be a few kilobytes if they didn't have to conform to all the different hardware and processor architectures.
- WilliamTanksley, on 10/12/2007, -2/+3Where did you get the idea that stack processors are impossibly complex to make? When I made/simulated mine (for a CPU architecture class at UCSD) it wasn't complex. It's actually extremely simple -- just build your stack and your ALU, tie the top-of-stack and second-on-stack to the ALU inputs, and MUX the ALU output to the top-of-stack. Add a "pop" to remove the second-on-stack for two-operand instructions, and all that's left is the same stuff you have to do in any processor. No need for register files, instruction decoding is dead simple...
- KCorax, on 10/12/2007, -1/+2Stack computers are nothing new. They are simply impossibly complex to make. In the past we have had processors that run prolog or lisp opcodes natively. It's like taking a RISC processor and cutting down on register specific features.
The truth is that we don't need to do that. Modern JIT compilers translate the stack machine code on the fly while taking advantage of every available command.
As for tiny kernels, please we've had microkernels like that for decades. Try L4 or L4Linux for a usable product. - marnaq, on 10/12/2007, -1/+2Bittorrent scales.
- lobrien, on 10/12/2007, -0/+1URI for the .torrent, please?
- dwhitbeck, on 10/12/2007, -0/+1Were the old TI computers stack based? I don't think they had traditional registers.
- digitalunltd, on 10/12/2007, -0/+1Page is down, can anyone post the torrent for the divx format
- pak314, on 10/12/2007, -0/+1Reply to WilliamTanksley...
By VLIW, you get multiple instructions in each memory fetch *and* execute them simultaneously. Take the Novix chips designed by Chuck Moore as an example. Most ALU instructions could be combined with a return opcode so the return happened with no overhead. Or you could do a increment and load at once as onther example. Probably best to search Google about the Novix processors to understand better. The key feature is that each pair of compatible opcodes didn't share any bits so to combine them you just 'or' the opcodes together. So the Forth compiler would look at each intruction pair to see if they are compatible and they merge them on the fly. In a stack processor this packing is easier since most instruction had no register operands leaving many free bits. - dupswapdrop, on 10/12/2007, -0/+1FORTH runs on more stuff than you think, it's very big in embedded devices.
- pantuky, on 10/12/2007, -0/+1Thanks for that information. Does anyone have any resource URLs about the combo of VLIW and stack computing? Since these are two competing next-gen ideas, your natural thought is to marry them together and produce super-hybrid offspring, but of course, they may be of different spieces and incapable of breeding together. It would be fantastic to see these two ideas work together to generate some new form of super processor. If you could multi-core the processor, then you would have the ultimate. Multicore, VLIW, stack computing.
This is all such a trip! My first experiances programming were with FIG Forth on the Commodore 64 back in the year 1983-1984. I never was successful with Forth, but I learned a lot about it. I discarded that knowledge years ago, assuming that it was just a dead end. Now here it comes around again. Everything old is new again! - KCorax, on 10/12/2007, -1/+2Apparently you didn't include support for operations like working below the current stack top and in-stack registers. This disallows for logical operations like green/red cuts + assertions thus making a stack machine super inefficient. By super I mean having n^2 execution time for most linear programs. It's not your fault, it's just that the design isn't meant to be fast but clean and simple. This is ok for smartcards and personal devices like watches but absolutely not for bigger implementations.
- brandonr, on 10/12/2007, -0/+1You're right, my mistake. I meant the hardware drivers, only the CPU arch you specify in the config file is in the kernel binary.
But, fun experiment for all you diggers: If you're running ubuntu, debian, redhat, etc (I've never installed a BSD distro, but I suspect it's the same), go look at your kernel build config file, and the size of the actual kernel binary. You may have to download the kernel source from your distros site. But either way... It's not pretty. - pauldonnelly, on 10/12/2007, -1/+1Kcorax, what are you talking about? A stack based computer's biggest advantage is its simplicity.
- soyulent, on 10/12/2007, -1/+1Been a long time since I've seen any Forth. It tends to be a little cryptic:
CR .( hell world )
I might argue that Lisp (CLOS) is the fastest dynamic scripting language, but then again, I'm older than dirt. - megafrenzy, on 10/12/2007, -3/+3I've been using FORTH for years and I still am, for several successful commercial embedded products. It can beat the pants off any C developed code in both programming time and execution time.
- zydeco, on 10/12/2007, -0/+0Most of the old videogames by Bally-Midway, especially the ones that ran on the MCR II and MCR III hardware (eg TRON, Gorf, Spy Hunter) used a FORTH interpreter running on the Z-80. They were pretty sucessful commercially, and ran wickedly fast on that simple little chip.
- KCorax, on 10/12/2007, -1/+1I understand where you are coming from, and from your standpoint you are right. I've been in reseach on compilers and interpreters for logical/constraing languages (which are better described in stack automata than functional) and I am right too.
By working under the stack top I mean optimizing for alternate execution paths when performing a goal search. When a search fails we change an element below the stack to simulate having backtracked there and runing to the top again. Search the ACM held ICLP for such papers. This is trivial to implement on register systems while it requires operations that are far from the model when working with stack machines. My point is there is no reason to implement stack processors when the compilers that target registers create so much better code.
In stack registers = changing a stack frame below the top. It's an assertion that is only visible to the current thread/automaton. This increases the applicable surface for optimizations like the previous one.
Google scholar sucks if that's what you mean. You need IEEE's explorer or some in house system. www.heal-link.gr searches in IEEE/ACM/Elsevier/Springer. Most universities have some contract like that.
> and backtracking is ALWAYS exponential
more like NP complete you mean ?
>and _never_ seen anything involving an algorithmic complexity increase
Compare the two classic prolog samples:
# underscore is used as space in digg
reverse([],[]).
reverse([H|T], L2) :-
____reverse(T,NT), append(NT,[H],L2).
and
reverse(List, Reversed) :- reverse(List, [], Reversed).
reverse([], Reversed, Reversed).
reverse([Head|Tail], SoFar, Reversed) :- reverse(Tail, [Head|SoFar], Reversed).
The latter can be optimized as n while the first is n^2. However both are optimized to be n. This is because we can cut and unify the implicit variables to death. There are many cases where a complex program comes down to this and we can't force compilation to be made like the second one because we fear there might be side effects, unless a developer is smart enough to assert is as pure virtual.
There are way too many optimizations that can be done on stateless languages, and this is the primary reason why people scream not to use global assertions or make them impossible by design or isolate them to the current stack. The most efficient target to compile stack machines to are circular overlapping register machines *by far*. Check the SPARC for reference.
Btw you need heavy kohones to develop in Forth. I think it's psychedelic.
For the people that are moding him down: You really don't have a clue to what you are reading don't you ? He's making a valid point and you are still defending the article. This is why slashdot will always be a more interesting read. - stripes, on 10/12/2007, -0/+0A UofMD student in the late 80s took a stack based language (FORTH derivative) and used a microcontroler with 6K of ROM and 2K of RAM and implemented SLIP (this was before PPP was popular, maybe before it existed), IP, ICMP, UDP, and TCP. Several of the "small servers" (time of day, chargen), and a telnet client. He also implemented an ADM3a compatible UART and since the pinouts on his microcontroler were compatible with the ADM3a's UART you could replace it and turn a dumb terminal into a multi-session telnet client.
That is a lot more then I see most systems do with 256K.
Remember this is smaller then the boot ROM of most systems and he had a working TCP/IP, so next time someone wines about "we had to netboot via TFTP because TCP is too much to do in ROM!" think about stack languages (and CPUs). - WilliamTanksley, on 10/12/2007, -0/+0pantuky, there's two good ways to get info, but both have problems.
The first way is to read old information, which is fairly complete but takes some work to find; the shortcoming is that not many people have done much work on stack processors, so the old info can be misleading. A good place to start on this path is Phil Koopman's online text "Stack Computers: the new wave" (published 1989). Google it.
The second way is to try to read between the lines of current efforts. These are obscured by efforts to keep trade secrets and by excessive marketting fluff, but if you can see past the chaff you'll get a better picture of what's possible. The most avant-garde site I can find is www.intellasys.net. A search for Machine-Forth or Color-Forth can also be enlightening (Intellasys's chips were designed using a written-from-scratch CAD system in colorForth). The (modern) chips Chuck Moore designed before are named the P20, P21, i21... One other, I forget the name. - marxmarv, on 10/12/2007, -0/+0It had a memory architecture, kinda like the Microchip PIC.
http://en.wikipedia.org/wiki/TI-99 - stripes, on 10/12/2007, -0/+0There aren't any successful commercial stack processors on the market right now (well, there might be in the embedded space, I don't know everything that is going on there), but Sperry use to do stack based mainframe CPUs. I think Lucent's Hobbit was modestly successful and it was a stack based CPU.
Failing to be pretty much the only surviving mainframe isn't exactly a black mark.
I don't see current CPU trends favoring the super simple stack CPU designs, but maybe something will swing that around...like if the whole programming industry gets behind massively parallel software design (I find that a bit unlikely, but who knows) a simple stack CPU can give you more CPUs per die area...
Change the assumptions designs are based on and old discarded ideas are all new and shiney again. - WilliamTanksley, on 10/12/2007, -2/+1KCorax, I still don't understand what you're talking about. "Working below the current stack top" is done by either swapping stack elements, shoving elements off somewhere else (perhaps into memory, perhaps onto the second stack), or reworking your order of operations so that the things you need are already where they should be; all of those are constant-time operations, and are no worse than the constant-time overhead of muxing and demuxing a large register file (which is the corresponding work that a register-addressing processor must do and a stack processor doesn't have to). The phrase "in-stack registers" doesn't mean anything to me or Google -- I don't know of any architecture where it would make any sense.
What DO you mean?
As for the rest of your post -- I don't know why your claims could possibly be true. I've done a LOT of stack programming (although usually for stack languages like Forth and Postscript) and _never_ seen anything involving an algorithmic complexity increase. Your talk about green cuts and red cuts and linearity seem incoherent to me -- those are terms for pruning decision trees for a backtracking problem, and backtracking is ALWAYS exponential by definition, so there's NO WAY any processor technology is going to make it linear. Backtracking is easy to implement on a stack processor, especially if the processor has two stacks (a data stack and a return stack); I've done it several different ways (although I admit that I've never written a backtracking optimizer, so I don't have practical experience in red-cut/green cut). - WilliamTanksley, on 10/12/2007, -3/+1soyulent, rather than saying that a hello-world program is cryptic, wouldn't it make more sense to say that you're not familiar enough with the language to read and write it easliy? :-)
(Although clearly you're familiar enough to remember dot-paren.) - jesusphreak, on 10/12/2007, -4/+1Well, I know there are quite a few disadvantages to stack computers, hence why certain projects such as Stackless Python are attempting to move away from such stuff. Also, likely the fastest dynamic scripting language in existence, Lua, is based off of a register-based virtual machine.
- goggleBOX, on 10/12/2007, -6/+1*nix has been around for about that long and no one uses that either! Popularity has no correlation to performance, usefulness or cost effectiveness. Remember everyone still drives to work even though the bus offers better price performance (yes the bus is cheaper than driving your own car), better use of your time (you can read or listen to music rather than driving, even work on your laptop) and can even be faster (bus lanes).
What is Digg?
Check out the new & improved