adriva: stack

Saturday, January 28, 2017

You don't own that thread

Well if you have created a thread of course you can manage its timeline, that’s not my point. My point is not all threads created equally. So, where does this come from?

It comes from a problem my team have experienced very recently:

The requirement was very simple actually, create an AppDomain, load some assemblies into it then unload the dynamically created AppDomain. Now the first problem is “who is going to unload the AppDomain?”. In our scenario (besides one exceptional case) it’s the dynamic AppDomain that decides that its work is completed and now it’s safe to be unloaded. Of course, the dynamic AppDomain cannot unload itself. So what it does is notify the main AppDomain and ask it to unload itself. Simple…

So how do you send that notification to the main AppDomain? Call a method on a remote object (MarshalByRefObject maybe) or trigger an event and hope that someone on the other side of the wall is listening? These are all valid decisions but may lead to some problems. In either scenario you “call” a method, which also means there’s a thread running and you use that thread to “execute” the “Dear main AppDomain… please unload me” command.

In that very same thread the main AppDomain may try to unload the dynamically AppDomain. See the problem now ?

We spawn a thread in the dynamic domain which travels across memory (and execution) boundaries and goes into the main domain and then should rewind (stack based programming sucks ha?) in the originating (dynamic) domain.

So in order to properly unload an AppDomain (unless you don’t care about the work it is doing) you need to make sure that all your worker threads are synchronized and preferably gracefully shutdown or completed. If there’s one thread that goes back and forth between domains, that is trying to unload the dynamic AppDomain you are very likely to have problems.

The solution seems obvious now. Make sure the dynamic AppDomain is unloaded by another thread, which has nothing to do with (actually “in”) the dynamic domain. One easy solution could be; having the request thread (RT, the one that requests the main AppDomain to unload the dynamic one) spawn unloader thread (UT, that will actually unload the AppDomain).

And make sure UT and RT are synchronized before unloading the dynamic AppDomain. This will ensure that the RT’s call is returned and stack is rewound properly. Something like UT.Join(RT) then Unload(DynamicAppDomain).

Monday, May 9, 2016

Stack-Oriented Programming

If you have been doing programming, you already know what a stack is so I'll skip the details. What I'm more interested is, the call stack. You know how it works...

Methods call each other and as the call chain grows the addresses are pushed down the stack, so the processor knows where to go once the called method is returned. That's the basic idea.

For the sake of the argument let's skip the magic going on under the hood and ignore any registers that might be involved. Now we have a stack that stores the return address as well as the arguments passed to a function. This is a very elegant design in fact and proven to do the job.

The only problem with a stack residing in the memory space of a process is, well, it is finite. You can easily run out of stack space especially if you are doing graph or tree traversal or even calculating factorials for no reason. There are of course well known methods to evade that problem such as using non-recursive algorithms or tail recursion (that last one is usually handled by the compiler optimizations though, so check your compiler's documentation before relying on me or even better use loops).

Now let's imagine a stack that is spread across computers (or nodes, remember?) and let's call a function on the very first node, Factorial(20) recursively which would normally need 20 stack frames for the called functions. (I know that depends on a lot of other things).
How about if it only needed 1 on each node? So basically the very first node would just call a function on another node and push the data down the stack and wait for a pop, the second node would do the same and that would go on until a terminating condition is reached. (Assuming you know where to stop and preferably done some validation before blindly accepting the input argument). What we have done here is, basically, replaced the number stack frame required with nodes required to finish the processing. And the good thing is a node can again call itself but with a small difference, compared to our old school local, per process stack; that is there's no shared stack between processes or between nodes. (That would be inefficient to implement for now.).

Just changing the words in this scenario will help us a lot, trust me with this one;
We will not pass arguments to the callee, instead we will send messages to the callee. You see where I'm going?

One local process will be replaced by a cluster of processing nodes that save us stack space and more-importantly "distribute" processing across nodes. That's obviously not something new but something to keep in mind to achieve greater scalability. It also allows you to do one more thing; "asynchronous processing by design".

One major question might be how to pass messages across the nodes especially if the nodes reside on different computing units. Well, we will use any well established "wire" protocols (I know we have wireless now) and frameworks which are at our disposal. Yet I must say, I prefer queues (can be anything from Msmq to Enterprise Service Bus) and I'll tell you why in the next post.

Next: Queue as a distribution system.