Last Thursday the first Software Architecture Pressure Cooker meetup was held. The goal of these meetups is to exchange experiences on software architecture. This is done by working on a real life case using the following format:
- A specific challenging scenario is introduced in general terms
- An organization is invited to present a real life case that matches the scenario introduced, and acts as product owner during the evening.
- In a couple of rounds the group brainstorms, breaks up into subgroups and works on solutions.
- All solutions are presented to the product owner
- Some heated discussions take place between the various subgroups where each group tries to convince the other groups.
The scenario for this first meetup was "3-2-1 Bang!". Characteristics of this scenario are:
- There is a point in time where you know your site will get an enormous load.
- The exact time when it will happen is known.
- The size of the load is unknown.
Examples of this scenario are online ticket sales for the Olympics or a Madonna concert that start at 09:00 on date X, or a large marketing campaign starting at a specific date and time.
In this meetup Le Champion was the organization presenting the case. Le Champion organizes popular sports events. For several of these events a lot of people compete for a limited number of available places. Sign up for these events opens at a specific date and time. At that point in time the sign up system is confronted with an enormous peak load. Users trying to sign up experience slow response times, are not certain whether their sign up and payment were successful, etc. In summary, a nice match for the "3-2-1 Bang!" scenario.
After a first round of brainstorming and some discussion, three categories of improvements were identified:
- Improvements on the existing application code (caching, client side validations, etc)
- Improvements on the deployment of parts of the functionality and infrastructure (hosting background processes on separate machines, horizontal scaling of the application, etc)
- Shielding the existing application from peak loads
Although categories 1 and 2 will provide some improvement, there will always be a limit on what the application itself or the payment provider can handle. Even if you completely rebuilt the application itself, you would still have to do category 3. Therefore we decided to focus on #3. In subgroups, alternatives for #3 were worked out.
The reasoning behind category 3 is that the application logic itself should be shielded from loads that it cannot handle. So something should be put in front of it which only lets a controlled number of users access the sign up process and prevents the overload. A requirement of Le Champion was that tickets are handed out on a first-come, first-served basis and therefore it was not acceptable to simply respond to visitors with a message "It’s busy, come back later". Everybody who attempts to sign up should be put in line and should enter the sign up process in order of their position in the line, until places are sold out. Note that I’m deliberately using the word ‘line’ instead of ‘queue’ because the latter suggests a technical implementation.
Another reason to aim for category 3 was that realizing this solution is cheaper than rebuilding the sign up application such that it can handle an enormous load of concurrent sign ups. Given the facts that (a) there is a limited number of places available and (b) there is significantly more demand for places than there are places, there is no need to be able to handle an enormous number of concurrent sign ups. Because the number of places is limited, you would not sell a single extra place even if the application could handle an enormous number of concurrent sign ups. Being able to handle the peak load in a controlled way, in which users get accurate feedback about their position in the line, is good enough.
All of the presented solutions had in common that a component was put in front of the existing application. The responsibility of this component was to manage the line of people wanting to sign up for the event and only let a controlled number of people start the actual sign up process. The rest were kept in line until a slot freed up at the existing application, at which point the next person in line could enter the sign up process.
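To make the idea concrete, here is a minimal Python sketch of such a component. It is not what any of the subgroups built. The class name `WaitingLine`, its methods and the parameters `max_active` and `places` are all invented for illustration, and it assumes a single process holding the line in memory, which a real deployment would have to replace with shared state.

```python
from collections import deque


class WaitingLine:
    """Hypothetical in-memory gatekeeper in front of the sign up application."""

    def __init__(self, max_active, places):
        self.max_active = max_active  # concurrent sign ups the application can handle
        self.places_left = places     # places still for sale
        self.waiting = deque()        # visitors in order of arrival
        self.active = set()           # visitors currently in the sign up process

    def join(self, visitor):
        """Put a visitor at the end of the line and admit whoever can be admitted."""
        self.waiting.append(visitor)
        self._admit()

    def position(self, visitor):
        """Return 0 if the visitor is signing up, otherwise the 1-based place in line."""
        if visitor in self.active:
            return 0
        return self.waiting.index(visitor) + 1  # raises ValueError for unknown visitors

    def finish(self, visitor, bought_place):
        """Called when a visitor completes or abandons the sign up process."""
        self.active.remove(visitor)
        if bought_place:
            self.places_left -= 1
        self._admit()

    def _admit(self):
        # Each active visitor may still claim a place, so never admit more
        # visitors than there are places left or than the application can handle.
        limit = min(self.max_active, self.places_left)
        while self.waiting and len(self.active) < limit:
            self.active.add(self.waiting.popleft())
```

With `max_active=2` and four visitors joining in sequence, the first two start signing up right away and the other two wait at positions 1 and 2. Every call to `finish` frees a slot for the next person in line, as long as places remain.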
Variations between the solutions were:
- The implementation of managing the line differed. One solution was to use a JMS queue, another proposed a simple in-memory queue (shared between machine instances using some open source/commercial product). And a third solution was to not create a queue in memory but simply base the solution on counters.
- Some used client side polling (every X seconds) to check if a client could enter the sign up process (and update position in the line), others used Comet to push updates to the user’s browser.
- One team proposed updating the firewall rules at runtime and using those rules to ensure that only the right people can access the sign up process (although this could create issues when multiple people are behind the same proxy server).
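The counter-based variant combined with client side polling could look roughly like the Python sketch below. This is my own reconstruction of the idea, not the subgroup's design: `CounterLine`, `take_ticket`, `poll` and `slot_freed` are hypothetical names, and the sketch ignores persistence, sharing the counters between machines, and visitors who take a ticket and never come back (which would need some kind of timeout).

```python
import itertools


class CounterLine:
    """Hypothetical line based on two counters instead of a stored queue."""

    def __init__(self, max_active):
        self._tickets = itertools.count(1)  # next ticket number to hand out
        self.admitted_up_to = max_active    # tickets 1..max_active may start immediately

    def take_ticket(self):
        """Hand the visitor a sequence number; this is their place in arrival order."""
        return next(self._tickets)

    def poll(self, ticket):
        """What a polling client would get back: (may_enter, position_in_line)."""
        position = max(0, ticket - self.admitted_up_to)
        return position == 0, position

    def slot_freed(self):
        """Called when someone leaves the sign up process; lets the next ticket in."""
        self.admitted_up_to += 1
```

The client only needs to remember its ticket number (for example in a cookie) and call `poll` every X seconds. Because the number of admitted tickets never exceeds `max_active` plus the number of freed slots, the application never sees more than `max_active` concurrent sign ups.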
One of the conclusions was that this is not a unique problem and that there probably already is an open source component available to easily realize a solution like this. One example that we found is HAProxy, although it seems to work at the HTTP request level and (out of the box) does not seem able to queue at the client/user level. If anyone knows other open source components that can be used to manage a line of people wanting to sign up on a first-come, first-served basis, please leave a comment on this blog post mentioning the component.
It was an interesting evening with lots of good discussions. We’ll organize another meetup early September with a new subject and real life case. Keep an eye on the Software Architecture Pressure Cooker meetup site for details.