Why should I read this?
You’re working in an organisation that aims to explore the benefits of working according to DevOps principles. You’ve heard terms like “platform team” and “SRE”, and you have an idea of what “you build it, you run it” means. These terms, however, have made your exploration of DevOps more complicated, and now you even have to choose how to organise your team(s). This blog post provides an overview of the three most commonly applied DevOps topologies and of the conditions that make a specific topology a good fit for your company.
As a reference, Matthew Skelton’s “DevOps topologies” (https://web.devopstopologies.com/) page gives a nice overview of all kinds of organisational topologies. These topologies have been implemented by companies around the world in their quest for agility and operational excellence through DevOps. Although many topologies have been documented, I believe that they are all variants of these three topologies:
1. All teams are product teams. Each team does everything that is needed to run its software, including the use of any infrastructure components (usually cloud-based PaaS).
2. Internal platform team(s) and product team(s). Product teams make use of the infrastructure and platform services provided by internal platform team(s). These services can range from infrastructure and “run” services such as monitoring to Continuous Integration tools and dashboarding tools.
3. Internal platform team(s), product team(s) and Site Reliability Engineering (SRE) team(s). This topology is based on Google’s best practices for running software. Product teams can get support from the SRE teams in running their software if they need it and if their software adheres to standards defined by the SRE teams. SRE teams can also share on-call responsibility with product teams. The platform team(s) provide infrastructure and platform services.
The DevOps topology that best fits your organisation depends on your current organisational hierarchy, scale, regulatory requirements and people’s skills. It is also important to recognise that every topology has its pitfalls, which need to be dealt with.
All teams are product teams
This topology is probably the most common; every team adheres to the “you build it, you run it” mantra and uses (and therefore maintains) infrastructure and tools of their own choice. That means that teams need a lot of expertise to be able to run their services/applications.
Example of a topology where all teams are product teams.
Possible gains:
- Teams enjoy full autonomy in building and running their products.
- Teams don’t have to share any infrastructure or tools.
- No shared responsibility for running software.
- Responsibility is easy to govern.
- Product teams can grow towards their automation goals at their own pace.
Possible pitfalls:
- Potential inefficiencies: teams can and will build their own solutions to each problem, so re-use of tools and infrastructure components between teams is limited.
- Each autonomous team needs its own infrastructure, security and compliance experts.
- Each team needs to spend time maintaining its own infrastructure and tools.
- Each team needs to create its own solutions to comply with any regulatory requirements.
This topology makes a good fit for your organisation if:
- Your company uses cloud services (which can be automated) for its infrastructure.
- Your company/department consists of one to five teams, or can provide each team with all the expertise it needs.
- Regulatory needs such as audit logging do not have to be standardised.
- There is no economic benefit in creating a separate team to provide infrastructure or tools to product teams.
- Your focus is speed, and you don’t care much about standardisation. Focusing on speed is a good idea when you want to explore and learn what the benefits of DevOps are without having to reorganise the teams within your company.
- You don’t have a legacy IT-ops department/team because you’re the next unicorn startup.
Internal platform team(s) and product team(s)
Once your company grows past a certain threshold, it makes sense to have one or more platform teams provide generic platform services (such as infrastructure and/or CI/CD tools). Platform teams offer all their services to the various product teams in the form of self-service APIs. Platform teams often use cloud infrastructure services themselves and combine them to offer more value. For example, a platform team can provide a container platform as a service, with connectivity and access management already set up to meet company requirements.
Example of a platform team/product team topology.
Possible gains:
- Efficient re-use of platform services between product teams.
- Re-use of infrastructure components provides standardisation as a side effect.
- Product teams are relieved of the burden of maintaining infrastructure and tools.
- The separation of concerns between platform and product teams means that product teams can focus on delivering value to customers, while platform teams focus on enabling product teams to run their software.
- Regulatory requirements such as audit logging can easily be met by using services provided by the platform-teams.
Possible pitfalls:
- The Product Owner of a platform team needs both a vision for the future of the platform and the ability to manage the requirements of all the product teams.
- Product teams need to be coached to give the platform team regular feedback instead of building their own tools whenever the platform cannot provide them.
- Platform teams need to be able to collect feedback from product teams.
This topology makes a good fit for your organisation if:
- It is cheaper to run a platform team that builds generic infrastructure and services than to have each team build them separately. Where this threshold lies depends on your organisational context.
- Regulatory needs such as audit logging have to be standardised.
- You cannot provide each product team with its own infrastructure, security and compliance experts.
- You want to standardise your infrastructure and tools.
- You have outsourced your infrastructure and/or tools.
- You have an existing IT-ops department, and you are running a (highly) regulated business such as banking or government.
Internal platform team(s), product team(s) and Site Reliability Engineering (SRE) team(s)
According to Ben Treynor, who founded the first SRE team within Google, SRE is “what happens when a software engineer is tasked with what used to be called operations.” One could argue that an SRE team is just a classic IT-operations team and does not fit DevOps principles such as end-to-end responsibility. What sets the SRE model apart is the shared responsibility between product teams and SRE teams. For further reading on the SRE model, Google’s book on the topic is available online for free at https://landing.google.com/sre/book.html.
As in the previous topology, platform teams offer all their services to the various product teams in the form of self-service APIs.
Example of an SRE/platform team/product team topology.
Possible gains:
- Product teams require a lot less Ops expertise than in the previously mentioned topologies, because this expertise can be concentrated in the SRE teams.
- SRE teams actively coach product teams in improving the quality of their software and in running it.
- SRE teams actively look for opportunities to automate and improve the delivery and running of software.
- Product teams that are responsible for business-critical applications feel safer running their software because of the SRE teams’ support.
Possible pitfalls:
- The difference between SRE teams and classic IT-ops teams is subtle. Misunderstanding it will lead to more organisational complexity.
- SRE team members require both software engineering and coaching skills.
This topology makes a good fit for your organisation if:
- There is a limited number of IT-ops specialists.
- The existing IT-ops specialists have extensive coaching skills.
- Product teams differ in how much Ops expertise they need. For example, only product teams that build business-critical software require the support of an SRE team, while other teams run their software on their own.
- Your company has more than five product teams.
- You have an existing IT-ops organisation that separates infrastructure-ops teams from application-ops teams.
- You are in a (highly) regulated business such as banking or government.
- You have outsourced your infrastructure and/or tools.
- You want to ensure that product teams that deliver business-critical software adhere to a defined software quality threshold.
I’ve read your blog. What now?
Whether a DevOps topology will work for you is highly dependent on your current organisational context. The topologies described in this blog post are not mutually exclusive, so consider mixing and matching them if you decide to start a journey towards DevOps. On that journey, remember that maximising what you learn is vital in the beginning. Only by learning from experience will you know which topology fits best at a given point in your journey.
One thing is for sure: copying another company and its DevOps success stories will not work, but do let yourself be inspired by them.
Further reading on this topic: