If you have been doing programming, you already know what a stack is so I'll skip the details. What I'm more interested is, the call stack. You know how it works...
Methods call each other and as the call chain grows the addresses are pushed down the stack, so the processor knows where to go once the called method is returned. That's the basic idea.
For the sake of the argument let's skip the magic going on under the hood and ignore any registers that might be involved. Now we have a stack that stores the return address as well as the arguments passed to a function. This is a very elegant design in fact and proven to do the job.
The only problem with a stack residing in the memory space of a process is, well, it is finite. You can easily run out of stack space especially if you are doing graph or tree traversal or even calculating factorials for no reason. There are of course well known methods to evade that problem such as using non-recursive algorithms or tail recursion (that last one is usually handled by the compiler optimizations though, so check your compiler's documentation before relying on me or even better use loops).
Now let's imagine a stack that is spread across computers (or nodes, remember?) and let's call a function on the very first node, Factorial(20) recursively which would normally need 20 stack frames for the called functions. (I know that depends on a lot of other things).
How about if it only needed 1 on each node? So basically the very first node would just call a function on another node and push the data down the stack and wait for a pop, the second node would do the same and that would go on until a terminating condition is reached. (Assuming you know where to stop and preferably done some validation before blindly accepting the input argument). What we have done here is, basically, replaced the number stack frame required with nodes required to finish the processing. And the good thing is a node can again call itself but with a small difference, compared to our old school local, per process stack; that is there's no shared stack between processes or between nodes. (That would be inefficient to implement for now.).
Just changing the words in this scenario will help us a lot, trust me with this one;
We will not pass arguments to the callee, instead we will send messages to the callee. You see where I'm going?
One local process will be replaced by a cluster of processing nodes that save us stack space and more-importantly "distribute" processing across nodes. That's obviously not something new but something to keep in mind to achieve greater scalability. It also allows you to do one more thing; "asynchronous processing by design".
One major question might be how to pass messages across the nodes especially if the nodes reside on different computing units. Well, we will use any well established "wire" protocols (I know we have wireless now) and frameworks which are at our disposal. Yet I must say, I prefer queues (can be anything from Msmq to Enterprise Service Bus) and I'll tell you why in the next post.
Next: Queue as a distribution system.