Thursday, May 12, 2016

More on Queues...

In the previous post I wrote about how queues could be used as a transport and why I prefer using them and we have successfully identified a problem and a possible solution to the (fast writer slow reader) problem.

Another solution could be having multiple readers, in our context, that would be having multiple computing nodes. At the end of the day we don't need to out-swim the shark, we just need to out-swim the other prey. So as long as we can process n messages in a second, it's OK to have millions of other nodes sending, at most, n messages every second.

Of course, there could be other potential solutions to this particular problem, such as limiting the number of requests that can be sent or having some kind of batch processing logic in place or anything else that fits into your design and constraints.

I'd like to write some more about having multiple computing nodes, listening to the same queue. I will assume your queue system is delivering messages to one listener at a time. In other words, once the message is received no other nodes can receive the same message since it will be marked as "delivered".

This is a good enough (yet I must say, there's not one size fits all design as you already know) design for now. And, in fact, it's a very good task scheduler that comes with built-in clustering and scheduling so you don't need to worry about those problems. 

To visualize the system let's say we have one Web API that sends messages to the queue where we have 2 computing nodes that processes them. Messages will be processed by only one node at a time, so we don't need to worry about processing the same message over and over again. Also if one node goes down (poisonous message perhaps, or a disk failure) the workload of the second node will not change in terms of messages per second, yet it will take over the first node's responsibility as well. That's not an ideal situation, because at the end of the day a node that you have invested in, is not working properly. Overall system performance, including your response rate, may go down but it's still better than shutting everything down while you bring up your system back to life. Basically it gives you a time window where you can fix the problem.

In this example we have two nodes doing the heavy lifting. And they are constantly asking the queue to deliver the next message and at some point there will be concurrent receive requests that are sent to the queue. But we said our queue is delivering the next message to one node at a time, so only one node will receive the message and the other node will receive the message after it. That's a fair thread scheduler and the idea behind the cloud computing isn't it? You know your request will be processed by your rules but you do not know which node will process it. It's just like you don't know the thread handle, until the operating system creates the thread but you know it'll be started at some point.

One thing you should remember is, you need to keep the message processing logic identical across the nodes (for that specific message type) and keep them doing one thing (single responsibility) at all times. So if you need to resample a photo, then send an e-mail to the customer, then do X,Y and Z, it might be a good idea to have separate nodes doing X,Y and Z. But like I said, there's no one size fits all design.

But I would keep things simple and have micro-monoliths instead of giants in my nodes as a rule of thumb.

Wednesday, May 11, 2016

Queue as a Distribution System

Since we have successfully distributed our workload across computing nodes so far, it's time to think about the distribution system now. Basically we need a post office to deliver our messages to the nodes.

There's quite a few options out there and if you are familiar with distributed programming you probably already know about them and to be honest I can't do benchmarks and cover 5 gazillion different scenarios to see the cons and pros of each. Instead, I'll just write about why I prefer using queues (where applicable of course) to other means of distribution systems.

Let's start with the name "distribution system". I'll simply call it a "transport", not that it's an industry standard or so, but only because that's the name I've been using for a while.

Queue, as we all know, is a first in first out data structure that can be used to put and take items in order. Simple but elegant. And again as we all know, by using a queue to post messages to a node, you can ensure the workload of the node stays (almost) same.

For example, assume we're implementing a photo sharing system and to keep the marketing guys happy let's give it a name. Zinstagraham (any resemblance to reality is just coincidence) will do fine. And again, just to keep it simple we can assume that Zinstagraham is an MVC application that has a "Photo Controller" that is used to upload an image file on to the server and resamples it to a certain size. It works perfectly fine with one user (which must the developer itself, I guess). Then when it becomes a world-wide system, with millions of users trying to upload photos it's almost inevitable to experience performance problems. Well, there can be different reasons but using the memory space and (even worse) the processing power of our web server for image processing, suddenly seems to be a bad idea. 

The developers decide to separate the image resampling piece from the rest of the system and deploys it onto another server. Now somehow, we need to tell that image processing node, to process an image. But we still need to be careful since just by distributing the execution of our system across nodes, doesn't mean that specific node has infinite memory and processor power. So if we choose the wrong transport and ,for example, directly call the method on the node, we'll simply be moving our original problem to a different node. We need to ensure that the workload of the image processing node doesn't go through the roof. Among different alternatives using a queue as a transport may help. Zinstagraham will just send a message to the queue saying there's an image file at a specific location to be processed and our node (let's assume it's a single threaded application) will receive the messages one-by-one and process the image, thus keeping the resource usage around the same levels at all times. 

This design introduces another problem though. If the speed you write to the queue is greater than the speed you take items off the queue, then you'll start running out of storage space very fast. Yes, you are right, that can easily turn into a DoS attack.

The simple solution is, keep your queue processing logic simple, small and fast yet, I know, it may not be possible at all times. So we need to discuss a bit more...

Next: More on Queues

Monday, May 9, 2016

Stack-Oriented Programming

If you have been doing programming, you already know what a stack is so I'll skip the details. What I'm more interested is, the call stack. You know how it works...

Methods call each other and as the call chain grows the addresses are pushed down the stack, so the processor knows where to go once the called method is returned. That's the basic idea.

For the sake of the argument let's skip the magic going on under the hood and ignore any registers that might be involved. Now we have a stack that stores the return address as well as the arguments passed to a function. This is a very elegant design in fact and proven to do the job.

The only problem with a stack residing in the memory space of a process is, well, it is finite. You can easily run out of stack space especially if you are doing graph or tree traversal or even calculating factorials for no reason. There are of course well known methods to evade that problem such as using non-recursive algorithms or tail recursion (that last one is usually handled by the compiler optimizations though, so check your compiler's documentation before relying on me or even better use loops).

Now let's imagine a stack that is spread across computers (or nodes, remember?) and let's call a function on the very first node, Factorial(20) recursively which would normally need 20 stack frames for the called functions. (I know that depends on a lot of other things).
How about if it only needed 1 on each node? So basically the very first node would just call a function on another node and push the data down the stack and wait for a pop, the second node would do the same and that would go on until a terminating condition is reached. (Assuming you know where to stop and preferably done some validation before blindly accepting the input argument). What we have done here is, basically, replaced the number stack frame required with nodes required to finish the processing. And the good thing is a node can again call itself but with a small difference, compared to our old school local, per process stack; that is there's no shared stack between processes or between nodes. (That would be inefficient to implement for now.). 

Just changing the words in this scenario will help us a lot, trust me with this one;
We will not pass arguments to the callee, instead we will send messages to the callee. You see where I'm going?

One local process will be replaced by a cluster of processing nodes that save us stack space and more-importantly "distribute" processing across nodes. That's obviously not something new but something to keep in mind to achieve greater scalability. It also allows you to do one more thing; "asynchronous processing by design".

One major question might be how to pass messages across the nodes especially if the nodes reside on different computing units. Well, we will use any well established "wire" protocols (I know we have wireless now) and frameworks which are at our disposal. Yet I must say, I prefer queues (can be anything from Msmq to Enterprise Service Bus) and I'll tell you why in the next post.

Next: Queue as a distribution system.

Wednesday, May 4, 2016

Micro-monoliths

That really doesn't sound right, probably for years and years we thought monoliths are "bad". They are not bad necessarily, for example, if you are happy with your legacy application, if it is not too hard to maintain or update, if it is still making money... I wouldn't try to change it either.

This is again one of those "size matters" debates and probably there is no absolute right or wrong. In a world of micro-controller instructions, even a few lines of code may seem to be a monolith whereas in a world of enterprise giants it may not even be visible.

But let's focus on something else now. Since we have taken a big step towards improving our codebase, let's focus on improved reusability.

If you are reading this post there's a good chance you have a few "internet identities". Remember the time you got your first free web based e-mail account. You also had to come up with a password that you are not supposed to forget. Then things got complicated and we all ended up with quite a few accounts to remember. It's hard... The solution is simple though, either your client application saves the account information and you don't need to remember a thing or you use a single sign on service so you can limit the number of passwords to remember. Long story short, welcome to the era of services. It doesn't really matter how these services are accessed or consumed for the time being. The main point is, there are things out there, doing things that you would like to. And you can get them do their magic anytime you want.

Applying this logic to our micro-monoliths we'll start seeing thing differently at another scale. Now instead of moving (or maybe packing) our micro-monoliths with our application core (orchestrator maybe?) we may start not deploying them at all or deploy onto a specific set of cluster only. 

VoilĂ  ! We have successfully improved our design and have a basic understading of "micro services".

If you are designing a software application, you are dealing with a micro processor which has something called an "instruction pointer". No, I'll not go into the details of the processor architecture but I'll focus on the phrase itself. It tells us every bit of software has instructions, so the processor can execute them, and a pointer so the processor knows what to do now and what to do next. That is, by definition, a flow.. a workflow if you will.

Since we have our building blocks in place now, let's put them together. We are designing and implementing a workflow and we want to do it fast (time to market), efficient (minimum resources) and maintainable (code aging) to stay competitive.

To achieve our goal, we know that we need to break the problem into smaller problems, make solutions of those smaller problems easy to maintain, at the end of the day our application will be as maintainable as the most complex component of it, make deployment and versioning as managable as possible. It may also help if we could "distribute" the execution of our components across nodes (not computers, because a node is simply a resource that can execute some instructions and not necessarily a computer).

It's clear that by making our micro-monoliths act like services we can achieve them all. So building worklow, besides making thousand of design decisions (smiley goes here), is just putting our "micro-services" together.

Of course, easier said than done.

Next: Stack-oriented programming.