Another solution could be having multiple readers, in our context, that would be having multiple computing nodes. At the end of the day we don't need to out-swim the shark, we just need to out-swim the other prey. So as long as we can process n messages in a second, it's OK to have millions of other nodes sending, at most, n messages every second.
Of course, there could be other potential solutions to this particular problem, such as limiting the number of requests that can be sent or having some kind of batch processing logic in place or anything else that fits into your design and constraints.
I'd like to write some more about having multiple computing nodes, listening to the same queue. I will assume your queue system is delivering messages to one listener at a time. In other words, once the message is received no other nodes can receive the same message since it will be marked as "delivered".
This is a good enough (yet I must say, there's not one size fits all design as you already know) design for now. And, in fact, it's a very good task scheduler that comes with built-in clustering and scheduling so you don't need to worry about those problems.
To visualize the system let's say we have one Web API that sends messages to the queue where we have 2 computing nodes that processes them. Messages will be processed by only one node at a time, so we don't need to worry about processing the same message over and over again. Also if one node goes down (poisonous message perhaps, or a disk failure) the workload of the second node will not change in terms of messages per second, yet it will take over the first node's responsibility as well. That's not an ideal situation, because at the end of the day a node that you have invested in, is not working properly. Overall system performance, including your response rate, may go down but it's still better than shutting everything down while you bring up your system back to life. Basically it gives you a time window where you can fix the problem.
In this example we have two nodes doing the heavy lifting. And they are constantly asking the queue to deliver the next message and at some point there will be concurrent receive requests that are sent to the queue. But we said our queue is delivering the next message to one node at a time, so only one node will receive the message and the other node will receive the message after it. That's a fair thread scheduler and the idea behind the cloud computing isn't it? You know your request will be processed by your rules but you do not know which node will process it. It's just like you don't know the thread handle, until the operating system creates the thread but you know it'll be started at some point.
One thing you should remember is, you need to keep the message processing logic identical across the nodes (for that specific message type) and keep them doing one thing (single responsibility) at all times. So if you need to resample a photo, then send an e-mail to the customer, then do X,Y and Z, it might be a good idea to have separate nodes doing X,Y and Z. But like I said, there's no one size fits all design.
But I would keep things simple and have micro-monoliths instead of giants in my nodes as a rule of thumb.