
Thursday, May 12, 2016

More on Queues...

In the previous post I wrote about how queues can be used as a transport and why I prefer using them, and we identified the fast-writer/slow-reader problem along with one possible solution.

Another solution could be having multiple readers, which in our context means having multiple computing nodes. At the end of the day we don't need to out-swim the shark, we just need to out-swim the other prey. So as long as we can process n messages a second, it's OK to have millions of other nodes sending us, at most, n messages every second in total.
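To put rough numbers on that (the rates below are invented purely for illustration; measure your own), the node count falls straight out of the per-node processing rate:

    import math

    # Illustrative numbers only -- measure your own rates in practice.
    per_node_rate = 50     # messages/second a single node can process
    incoming_rate = 400    # messages/second arriving on the queue

    # With multiple readers, capacity scales (roughly) linearly with the
    # node count, so this many nodes keep the queue from growing:
    nodes_needed = math.ceil(incoming_rate / per_node_rate)
    print(nodes_needed)    # -> 8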

Of course, there could be other solutions to this particular problem, such as limiting the number of requests that can be sent, putting some kind of batch-processing logic in place, or anything else that fits your design and constraints.

I'd like to write some more about having multiple computing nodes listening to the same queue. I will assume your queue system delivers each message to only one listener; in other words, once a message is received, no other node can receive it, because it is marked as "delivered".
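Here's a minimal sketch of that delivery model, using Python's in-process queue.Queue as a stand-in for a real message queue: two listeners compete for messages, and each message is received by exactly one of them.

    import queue
    import threading

    q = queue.Queue()
    for i in range(6):
        q.put(f"msg-{i}")

    def listener(name):
        while True:
            try:
                msg = q.get_nowait()   # take the next undelivered message
            except queue.Empty:
                return
            # No other listener will ever see this message; as far as the
            # queue is concerned, it is now "delivered".
            print(f"{name} received {msg}")

    nodes = [threading.Thread(target=listener, args=(n,))
             for n in ("node-A", "node-B")]
    for t in nodes: t.start()
    for t in nodes: t.join()

Run it and every message is printed exactly once, though you can't predict which node prints which.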

This is a good-enough design for now (though, as you already know, there is no one-size-fits-all design). In fact, it gives you a very good task scheduler with built-in clustering and scheduling, so you don't need to worry about those problems yourself.

To visualize the system, let's say we have one Web API that sends messages to the queue and two computing nodes that process them. Each message is processed by only one node, so we don't need to worry about processing the same message over and over again. And if one node goes down (a poison message perhaps, or a disk failure), the workload of the second node will not change in terms of messages per second; it will simply take over the first node's share of the stream as well. That's not an ideal situation, because at the end of the day a node you have invested in is not working properly. Overall system performance, including your response rate, may go down, but it's still better than shutting everything down while you bring the system back to life. Basically, it gives you a time window in which you can fix the problem.
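A small simulation of that failure mode, again with an in-process queue standing in for the real transport and invented message counts: node-1 dies after a few messages, and node-2, without pulling any faster than before, ends up draining the rest of the backlog.

    import queue
    import threading
    import time

    work = queue.Queue()
    for i in range(20):
        work.put(f"photo-{i}")

    def node(name, dies_after=None):
        done = 0
        while True:
            try:
                msg = work.get(timeout=0.5)  # give up once the queue stays empty
            except queue.Empty:
                break
            time.sleep(0.01)                 # simulate per-message work
            done += 1
            if dies_after is not None and done >= dies_after:
                print(f"{name} went down after {done} messages")
                return                       # simulate a crashed node
        print(f"{name} processed {done} messages")

    threads = [threading.Thread(target=node, args=("node-1", 3)),
               threading.Thread(target=node, args=("node-2",))]
    for t in threads: t.start()
    for t in threads: t.join()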

In this example we have two nodes doing the heavy lifting. They are constantly asking the queue to deliver the next message, so at some point there will be concurrent receive requests hitting the queue. But we said our queue delivers each message to only one node, so one node will receive the message and the other will receive the one after it. That's a fair thread scheduler, and the idea behind cloud computing, isn't it? You know your request will be processed by your rules, but you don't know which node will process it. It's just like not knowing a thread's handle until the operating system creates the thread: you still know it will be started at some point.

One thing you should remember: keep the message-processing logic identical across the nodes (for a given message type) and keep each node doing one thing (single responsibility) at all times. So if you need to resample a photo, then send an e-mail to the customer, then do X, Y and Z, it might be a good idea to have separate nodes doing X, Y and Z; a sketch of that split follows below. But like I said, there's no one-size-fits-all design.

As a rule of thumb, though, I would keep things simple and run micro-monoliths in my nodes instead of giants.
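Here's the split sketched out, with invented queue and message-type names: each responsibility gets its own queue, so every node behind a given queue runs one identical, single-purpose piece of processing logic.

    import queue

    # One queue per responsibility; the nodes behind each queue all run
    # the same single-purpose logic for that message type. (Names made up.)
    resample_queue = queue.Queue()   # nodes that resample photos
    email_queue    = queue.Queue()   # nodes that e-mail customers

    ROUTES = {
        "photo.uploaded": resample_queue,
        "email.customer": email_queue,
    }

    def publish(message_type, payload):
        ROUTES[message_type].put({"type": message_type, "payload": payload})

    publish("photo.uploaded", {"path": "/uploads/cat.jpg"})
    publish("email.customer", {"customer_id": 42, "template": "photo-ready"})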

Wednesday, May 11, 2016

Queue as a Distribution System

Since we have successfully distributed our workload across computing nodes so far, it's time to think about the distribution system itself. Basically, we need a post office to deliver our messages to the nodes.

There are quite a few options out there, and if you are familiar with distributed programming you probably already know them. To be honest, I can't run benchmarks covering 5 gazillion different scenarios to weigh the pros and cons of each. Instead, I'll just write about why I prefer using queues (where applicable, of course) over other kinds of distribution systems.

Let's start with the name "distribution system". I'll simply call it a "transport"; not that it's an industry standard or anything, but only because that's the name I've been using for a while.

A queue, as we all know, is a first-in, first-out data structure that can be used to put and take items in order. Simple but elegant. And, again as we all know, by using a queue to post messages to a node, you can keep the node's workload (almost) the same.
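Python's standard-library queue.Queue is enough to show the ordering guarantee in a few lines:

    import queue

    q = queue.Queue()
    for item in ("first", "second", "third"):
        q.put(item)          # items go in at the back...

    while not q.empty():
        print(q.get())       # ...and come off the front, in the same order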

For example, assume we're implementing a photo-sharing system, and to keep the marketing guys happy let's give it a name: Zinstagraham (any resemblance to reality is pure coincidence) will do fine. Again, to keep it simple, we can assume that Zinstagraham is an MVC application with a "Photo Controller" that uploads an image file to the server and resamples it to a certain size. It works perfectly fine with one user (who must be the developer, I guess). Then, when it becomes a world-wide system with millions of users trying to upload photos, performance problems are almost inevitable. Well, there can be different reasons, but using the memory space and (even worse) the processing power of our web server for image processing suddenly seems like a bad idea.

The developers decide to separate the image-resampling piece from the rest of the system and deploy it onto another server. Now, somehow, we need to tell that image-processing node to process an image. But we still need to be careful: just distributing the execution of our system across nodes doesn't mean any specific node has infinite memory and processing power. If we choose the wrong transport and, for example, directly call a method on the node, we'll simply be moving our original problem to a different machine. We need to ensure that the workload of the image-processing node doesn't go through the roof. Among the different alternatives, using a queue as a transport may help. Zinstagraham will just send a message to the queue saying there's an image file at a specific location to be processed, and our node (let's assume it's a single-threaded application) will receive the messages one by one and process each image, thus keeping resource usage around the same level at all times.
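A minimal sketch of that message flow, with invented names (photo_queue, handle_upload): the web side only enqueues a small pointer to the uploaded file, and a single-threaded worker takes messages off the queue one by one, so its resource usage stays flat no matter how many uploads arrive.

    import queue

    photo_queue = queue.Queue()

    def handle_upload(saved_path):
        # Web side: cheap and fast -- just say where the file is and what to do.
        photo_queue.put({"image_path": saved_path, "target_size": (1080, 1080)})

    def resample(path, size):
        print(f"resampling {path} to {size}")   # stand-in for real image work

    def worker():
        # Single-threaded node: one message, one resample, at all times.
        while True:
            msg = photo_queue.get()             # blocks until a message arrives
            if msg is None:                     # shutdown sentinel
                break
            resample(msg["image_path"], msg["target_size"])

    handle_upload("/uploads/2016/05/photo-123.jpg")
    photo_queue.put(None)                       # let the demo worker exit
    worker()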

This design introduces another problem, though. If the rate at which you write to the queue is greater than the rate at which you take items off it, you'll start running out of storage space very fast. And yes, you are right: that can easily be turned into a DoS attack.
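One hedge against that (a sketch, not the only answer): bound the queue and push back on the writer instead of letting storage fill up. With Python's queue.Queue that's just a maxsize; a real message broker would enforce a similar cap on memory or disk usage.

    import queue

    photo_queue = queue.Queue(maxsize=1000)   # cap chosen for illustration

    def try_enqueue(msg):
        try:
            photo_queue.put_nowait(msg)       # don't block the web request
            return True
        except queue.Full:
            # Push back on the writer -- e.g. answer HTTP 429/503 and have
            # the client retry later.
            return False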

The simple solution is to keep your queue-processing logic simple, small and fast; yet, I know, that may not always be possible. So we need to discuss this a bit more...

Next: More on Queues