Wednesday, May 11, 2016

Queue as a Distribution System

Since we have successfully distributed our workload across computing nodes so far, it's time to think about the distribution system now. Basically we need a post office to deliver our messages to the nodes.

There's quite a few options out there and if you are familiar with distributed programming you probably already know about them and to be honest I can't do benchmarks and cover 5 gazillion different scenarios to see the cons and pros of each. Instead, I'll just write about why I prefer using queues (where applicable of course) to other means of distribution systems.

Let's start with the name "distribution system". I'll simply call it a "transport", not that it's an industry standard or so, but only because that's the name I've been using for a while.

Queue, as we all know, is a first in first out data structure that can be used to put and take items in order. Simple but elegant. And again as we all know, by using a queue to post messages to a node, you can ensure the workload of the node stays (almost) same.

For example, assume we're implementing a photo sharing system and to keep the marketing guys happy let's give it a name. Zinstagraham (any resemblance to reality is just coincidence) will do fine. And again, just to keep it simple we can assume that Zinstagraham is an MVC application that has a "Photo Controller" that is used to upload an image file on to the server and resamples it to a certain size. It works perfectly fine with one user (which must the developer itself, I guess). Then when it becomes a world-wide system, with millions of users trying to upload photos it's almost inevitable to experience performance problems. Well, there can be different reasons but using the memory space and (even worse) the processing power of our web server for image processing, suddenly seems to be a bad idea. 

The developers decide to separate the image resampling piece from the rest of the system and deploys it onto another server. Now somehow, we need to tell that image processing node, to process an image. But we still need to be careful since just by distributing the execution of our system across nodes, doesn't mean that specific node has infinite memory and processor power. So if we choose the wrong transport and ,for example, directly call the method on the node, we'll simply be moving our original problem to a different node. We need to ensure that the workload of the image processing node doesn't go through the roof. Among different alternatives using a queue as a transport may help. Zinstagraham will just send a message to the queue saying there's an image file at a specific location to be processed and our node (let's assume it's a single threaded application) will receive the messages one-by-one and process the image, thus keeping the resource usage around the same levels at all times. 

This design introduces another problem though. If the speed you write to the queue is greater than the speed you take items off the queue, then you'll start running out of storage space very fast. Yes, you are right, that can easily turn into a DoS attack.

The simple solution is, keep your queue processing logic simple, small and fast yet, I know, it may not be possible at all times. So we need to discuss a bit more...

Next: More on Queues