I am part of a team that recently implemented a containerized stateless application, and I thought it would be good to share our findings, which might be useful for others.
Our product is based on Event Sourcing and CQRS (Command Query Responsibility Segregation) with nano services; we went with Event Sourcing and CQRS because the product deals with a complex domain. Not all products/applications have this complexity, and in such cases plain Spring Boot with MongoDB or PostgreSQL is more than sufficient.
This post is not about Event Sourcing and CQRS, which is a huge topic by itself; let's save that for another day. It is about how to make real-world applications stateless and horizontally scalable, and the issues to consider when running multiple instances of the application/product. Concurrent processing needs to be analyzed carefully to make an application truly stateless: something might work superbly as a single instance but behave strangely once deployed as a cluster if you have not thought through processes running concurrently.
Here I will describe some of the issues one might encounter while clustering an application and give a glimpse of the direction without getting into too much implementation detail.
Here is our standard recipe for starting stateless applications from scratch:
Use separate stacks and deployments for front-end and back-end logic
- One example for the front-end might be Node.js with Angular/TypeScript/React/Vue, where you can bundle the build into a single zip file and serve the files with a static server.
- For the back-end we mostly use Java (based on the application requirements, complexity, and future maintenance of the product) with a nice dependency-management framework like Spring Boot.
Let's say our sample application uses Spring Boot and has an external connector using message queues. What issues might one run into when trying to make it stateless so it can run on multiple nodes (clustering)?
Make sure this back-end application is stateless. If some minimal state needs to be maintained, use JWT (JSON Web Token).
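As a concrete illustration, here is a minimal sketch of issuing and verifying a JWT with the auth0 java-jwt library; the secret, expiry, and class name are assumptions for the example, not something prescribed by our product:

```java
import com.auth0.jwt.JWT;
import com.auth0.jwt.algorithms.Algorithm;
import com.auth0.jwt.interfaces.DecodedJWT;

import java.util.Date;

public class TokenService {

    // Shared secret; in a cluster every instance must use the same key,
    // typically injected via configuration rather than hard-coded.
    private final Algorithm algorithm = Algorithm.HMAC256("change-me");

    // Issue a short-lived token that carries the minimal state (here just the user id).
    public String issue(String userId) {
        return JWT.create()
                .withSubject(userId)
                .withExpiresAt(new Date(System.currentTimeMillis() + 15 * 60 * 1000))
                .sign(algorithm);
    }

    // Any instance can verify the token, so no session has to live on a particular node.
    public String verify(String token) {
        DecodedJWT decoded = JWT.require(algorithm).build().verify(token);
        return decoded.getSubject();
    }
}
```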
The end goal of a stateless application is to run multiple instances to scale horizontally (using containerization like Kubernetes). It might seem simple to make an application stateless, but in reality it is a little tricky, and you need to consider all the concurrent processing once the application is started on more than one node.
Clustered application UI issues:
Let's say multiple users are working on the same record and try to update it simultaneously. If we just overwrite the DB data with the latest write, some users' updates might be lost, because their changes were overridden by other users without anyone even looking at them. This is easy to solve in a single instance, where all updates to a particular record go through that one instance and can be serialized there, but in a clustered environment it becomes a little tricky. The simpler approach is to use a version or instant (timestamp) field: when concurrent updates happen, only one succeeds and the others fail with a message that the record was updated by another user. It is a simple CAS (Compare and Swap) based on the version field.
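With Spring Data JPA this is just a `@Version` field; a minimal sketch, with entity and field names made up for illustration:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class CustomerRecord {

    @Id
    private Long id;

    private String displayName;

    // JPA increments this on every successful update. If two users load
    // version 5 and both try to save, the second save fails with an
    // OptimisticLockException instead of silently overwriting the first.
    @Version
    private Long version;

    // getters and setters omitted for brevity
}
```

The UI can catch that failure, reload the record, and ask the user to re-apply their change.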
This is a pretty good solution when concurrent updates to the same record are rare. If the contention on a single record is high, another approach is to implement some kind of pipes (like Apache Kafka topic partitions) keyed on the record, with the REST endpoints simply forwarding requests to them. It is similar to Akka actors, but that is not needed for most applications; a version field is usually sufficient.
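If you do go the pipe route, keying the Kafka message by the record id is what pins all updates for one record to one partition, and therefore to a single consumer. A rough sketch, where the topic name, class name, and serializers are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class RecordUpdatePublisher {

    private final KafkaProducer<String, String> producer;

    public RecordUpdatePublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    public void publish(String recordId, String updateJson) {
        // Using recordId as the key means every update for that record lands
        // in the same partition and is handled in order by one consumer,
        // which removes the write race entirely.
        producer.send(new ProducerRecord<>("record-updates", recordId, updateJson));
    }
}
```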
Single-instance-only sub-processes in a clustered application:
Another problem an application might face: if it listens on some kind of message bus (RabbitMQ, AMQP, etc.) to communicate with an external system, a given message from the bus can be consumed by only one node at a time. If you want to parallelize this, you really have to dig into the types of messages, find the independent ones, and give each independent type its own consumer node, as in the sketch below.
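With Spring AMQP, one way to split by message type is simply one queue and one listener per independent type; which instance actually runs each listener is the mastership problem discussed next. The queue names here are hypothetical:

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

@Component
public class ExternalBusListeners {

    // One queue per independent message type, so each type can be consumed
    // and scaled on its own node without blocking the others.
    @RabbitListener(queues = "external.orders")
    public void onOrderMessage(String payload) {
        // handle order-type messages
    }

    @RabbitListener(queues = "external.inventory")
    public void onInventoryMessage(String payload) {
        // handle inventory-type messages
    }
}
```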
If you need only one instance of the application to process a given piece of logic in a cluster, it gets more complicated: which instance becomes the master for it? If you have multiple such sub-processes, it gets more complicated still, because each sub-process needs its own master. And what if that instance dies? How does another instance become master for that logic? Yeah, it gets pretty complicated.
What we used is the etcd Java API: for each single-instance sub-process (like type 1 message consumption) we developed watches with the etcd Java API, which ensures all instances fight fairly for mastership of that logic. A sketch of the idea is below, so now you know a way to achieve it 🙂
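We will not reproduce our implementation here, but the core of the idea with the jetcd client is a compare-and-swap put guarded by a lease: whoever creates the key first is master, and when its lease expires the others (who are watching the key) race again. A minimal sketch, assuming the jetcd library and made-up key names:

```java
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.KV;
import io.etcd.jetcd.Lease;
import io.etcd.jetcd.kv.TxnResponse;
import io.etcd.jetcd.op.Cmp;
import io.etcd.jetcd.op.CmpTarget;
import io.etcd.jetcd.op.Op;
import io.etcd.jetcd.options.PutOption;

import java.nio.charset.StandardCharsets;

public class SubProcessMasterElection {

    public boolean tryBecomeMaster(String subProcess, String instanceId) throws Exception {
        Client client = Client.builder().endpoints("http://etcd:2379").build();
        KV kv = client.getKVClient();
        Lease lease = client.getLeaseClient();

        // The lease keeps mastership only while this instance is alive; it must
        // be refreshed periodically (e.g. lease.keepAliveOnce(leaseId) on a timer).
        long leaseId = lease.grant(10).get().getID();

        ByteSequence key = ByteSequence.from("/masters/" + subProcess, StandardCharsets.UTF_8);
        ByteSequence value = ByteSequence.from(instanceId, StandardCharsets.UTF_8);

        // CAS: only succeeds if the key does not exist yet (create revision == 0),
        // so exactly one instance wins the race for this sub-process.
        TxnResponse response = kv.txn()
                .If(new Cmp(key, Cmp.Op.EQUAL, CmpTarget.createRevision(0)))
                .Then(Op.put(key, value, PutOption.newBuilder().withLeaseId(leaseId).build()))
                .commit()
                .get();

        // Losers should watch the key and retry when it is deleted,
        // i.e. when the current master's lease expires.
        return response.isSucceeded();
    }
}
```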
There is one problem: what if a single instance becomes master for all the sub-processes? That instance becomes a bottleneck. This is simple to solve using Kubernetes ConfigMaps or environment variables: you group the sub-processes, and each instance uses the environment variables to decide whether to participate in the master race for a given group. This way you can get down to one sub-process master per node, which is ideal scaling for it anyway (see the snippet below).
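The grouping itself can be as simple as an environment variable (fed from a ConfigMap) listing which master races this instance is allowed to enter; the variable name below is made up for the example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MasterRaceGroups {

    // MASTER_GROUPS is a hypothetical env var set per Deployment from a
    // Kubernetes ConfigMap, e.g. MASTER_GROUPS=type1-consumer,scheduled-jobs
    private static final Set<String> ENABLED = new HashSet<>(
            Arrays.asList(System.getenv().getOrDefault("MASTER_GROUPS", "").split(",")));

    // Only campaign in etcd for sub-processes this instance is configured for,
    // so no single node can end up as master for everything.
    public static boolean participatesIn(String subProcess) {
        return ENABLED.contains(subProcess);
    }
}
```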
This is not a complete guide on how to make a stateless application, but it gives an introduction to the issues one can expect when running a cluster of stateless application instances.
Stay tuned for more updates…