
Are you ready for MLOps? 🫵


Introduction

MLOps has survived the hype cycle and is gaining in maturity. But are we looking to MLOps for answers to the right questions? No matter how valuable MLOps can be for you, without the proper building blocks in place it cannot live up to its full potential. What are the prerequisites for MLOps? What parts of MLOps should you focus on? When should you even start thinking about MLOps, and when is it wiser to focus on ‘plain’ DevOps first? Read along to learn more!

About being ready

So, what does it mean to be ready? Being ready to adopt a technology means understanding why you need it and what it is. Then, we can start understanding what it takes to adopt it and especially what prerequisites are required to do so. With all the prerequisites in place, we can say we are ready.

In short, being ready for MLOps means you understand:

  1. Why adopt MLOps
  2. What MLOps is
  3. When to adopt MLOps

… only then can you start thinking about how to adopt MLOps.

As an analogy, think of building a house. You need the right foundation in place before you build higher. Each level needs to be built properly before you advance. Without the right foundation, however beautiful the house you build on top, the building will not stand the test of time.

house built on a bad foundation

To build high, beautiful buildings, you need a solid foundation. The same goes for MLOps. Do you have the right foundation to adopt MLOps?

Before adopting MLOps, the prerequisites need to be in place so that we can build on top of them. What are those prerequisites? Let’s find out if you are ready, together.

Why MLOps?

Why bother with MLOps? Let’s first explore what pain you might be experiencing that MLOps can help with. To start, a statistic: Gartner reported that on average only 54% of AI models move from pilot to production:

gartner survey ai models from pilot to production

Many AI models developed never even reach production.

… that is not an awful lot. What a waste! We spend time trying to get models into production, yet often we fail to. Why is that? These days Data Science is no longer a new domain by any means. It has been more than a decade since Harvard Business Review called Data Scientist the “Sexiest Job of the 21st Century” [1]. In 2019 alone, Data Scientist job postings on Indeed rose by 256% [2]. Universities have been turning out Data Science graduates at a rapid pace, and the Open Source community has made ML technology easy to use and widely available. Both the tech and the skills are there:

both tech and skills are present

ML tech is by now easy to use and widely available. Data Science profiles are more abundant in the market than ever before.

So let me reiterate: why are teams still having trouble launching Machine Learning models into production? A big part of the reason lies in the collaboration between teams. Even though we all wish for seamless transitions from the development phase to production, Machine Learning development and operations teams can have conflicting interests, making it difficult to collaborate. The development and operations worlds differ in various aspects:

  • Development ML teams are focused on innovation and speed
    • Dev ML teams have roles like Data Scientists, Data Engineers, Business owners.
    • Dev ML teams work in an agile way and experiment rapidly using PoCs.
    • Dev ML teams work in Jupyter notebooks, Python, R, etc.
  • Operations ML teams are focused on stability and reliability
    • Ops ML teams have roles like Platform Engineers, SREs, DevOps Engineers, Software Engineers, IT Managers.
    • Ops ML teams work with strict roadmaps, make long-term plans and might need to be available for on-call support.
    • Ops ML teams work in CI/CD pipelines, Kubernetes, Docker, Java, Scala, etc.

… that does not make things easier. The expectation is that dev and ops magically work well together: we can just ‘hand over’ the model to the operations team and they will take care of it. Unfortunately, this is not the case.

Expectation:

ML dev and ops working well together

It is often expected that development and operations teams magically work well together.

… versus Reality:

ML dev and ops working badly together

In reality, collaboration between development and operations is hard due to conflicting interests. Without proper collaboration, an undesired handover between teams creeps in, and context gets lost along the way.

… such handovers make development cycles unnecessarily long and make it difficult to get feedback from production models back to the developers. To conclude, a lack of collaboration between development and operations causes three issues:

  • ❌ Few models in production … or production solutions prove unreliable.
  • ❌ Long development cycles … to create models, update models or add features.
  • ❌ Lacking feedback … on model performance and added value.

How to solve this? Enter MLOps.

What is MLOps?

MLOps can help development and operations teams work better together and deploy Machine Learning models to production more efficiently. MLOps makes the dev and ops worlds more familiar with each other and aims to bring them closer together.

The term has gained in popularity since 2018 [3] [4], after Machine Learning had undergone massive growth. Some years have passed since then, and it is fair to say we have moved beyond the hype:

mlops in the hype cycle

MLOps has moved beyond the hype and is climbing towards the plateau of productivity. The graph refers to the Gartner hype cycle.

So what does MLOps consist of? Machine Learning development is no longer only about training an ML model. We are concerned with many other things, too: preprocessing, feature engineering, serving, scheduling and monitoring, to name a few. We can map those to the concrete methodologies and tooling that make up the technical part of MLOps:

mlops components

ML development activities mapped to core MLOps components.

… some of these might already be familiar to you. They are familiar to the major cloud providers, too, who have answered the market need for better tooling in the Machine Learning space. In fact, every major provider has a Machine Learning platform that helps you do MLOps:


Major cloud providers offer managed MLOps platforms.

That is massively useful. Not only is more tooling available to do MLOps, there also exist managed solutions that you can use out of the box.

Let’s recall the three issues we had with handovers between development and operations teams. Remember them? We can now map them to what MLOps promises to give us to help with these issues.


MLOps promises us speed by automation, rapid feedback and autonomy with end-to-end product teams.

… that sounds great! We have MLOps as a key methodology to reduce the handover and bring dev and ops closer together. But what actually empowers and enables MLOps? What do we need to do MLOps successfully? Have we ever brought Dev and Ops teams together before? In fact, we have! It’s in the name! Enter DevOps.

DevOps in the mix

MLOps is largely inspired by DevOps [5]. DevOps has existed for longer and is more established and mature. Since 2007 DevOps has been a massively influential methodology in software development. The key idea is that development is not sequential but continuous, as best illustrated by the DevOps lifecycle:


The DevOps lifecycle [6].

The key principles of DevOps are [6]:

  • Automation
  • Collaboration & communication
  • Continuous improvement
  • Focus on user needs

… so how does MLOps interplay with DevOps? DevOps came from the Software Development world and therefore deals with Code. In Machine Learning, however, it is also important to take the Data and the Model into account:


Whereas DevOps deals mainly with Code, MLOps also entails data and an ML model.

What makes MLOps different from DevOps is that it automates operations related to all three: the code, the data and the model. For example, when it comes to automation, we continuously check data quality, train models and run inference to create guarantees about the state of our Machine Learning system.
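To make the idea of continuously checking data quality a bit more concrete, here is a minimal sketch in Python. It is an illustration under assumptions only: the function `check_data_quality`, the `QualityReport` type and the 10% missing-value threshold are hypothetical and not taken from any specific MLOps tool.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Outcome of a (hypothetical) data quality gate."""
    passed: bool
    problems: list


def check_data_quality(rows, required_columns, max_missing_fraction=0.1):
    """Flag columns where too many values are missing.

    `rows` is a list of dicts, one per record. The threshold is an
    assumption for illustration; pick one that fits your own data.
    """
    if not rows:
        return QualityReport(False, ["no rows received"])

    problems = []
    for column in required_columns:
        # A value counts as missing if the key is absent or set to None.
        missing = sum(1 for row in rows if row.get(column) is None)
        fraction = missing / len(rows)
        if fraction > max_missing_fraction:
            problems.append(f"{column}: {fraction:.0%} missing")

    return QualityReport(not problems, problems)
```

A pipeline could run such a check before every training or inference job and stop early when `passed` is false, so that bad data never silently reaches the model.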

Doing so continuously means we feed what we learn from Monitoring back into our Model Building step, like so:


Iterative, continuous development means using monitoring feedback to build and improve your ML model.
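As a rough sketch of what such a feedback loop could look like in code, the Python snippet below triggers retraining when the monitored score drops too far below a baseline. The names `should_retrain` and `monitoring_step`, the use of a simple average and the 0.05 tolerance are all assumptions made for illustration, not a prescribed method.

```python
from statistics import mean


def should_retrain(recent_scores, baseline_score, tolerance=0.05):
    """Return True when the average recent score has dropped more than
    `tolerance` below the baseline measured at deployment time."""
    if not recent_scores:
        # No monitoring data yet, so there is nothing to act on.
        return False
    return baseline_score - mean(recent_scores) > tolerance


def monitoring_step(recent_scores, baseline_score, retrain):
    """One pass of the loop: call `retrain` only if monitoring says so.

    `retrain` is a hypothetical callable that kicks off model building.
    """
    if should_retrain(recent_scores, baseline_score):
        return retrain()
    return None
```

In practice a scheduler would run `monitoring_step` periodically, with `retrain` pointing at whatever starts your training pipeline.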

That’s great. So again, what is this interplay between DevOps and MLOps? It is clear that MLOps borrowed a lot from DevOps. But in what way do we need DevOps to do MLOps? The way we see it, you need to have DevOps in place before you can do MLOps. DevOps is a prerequisite for MLOps. It’s like stacking building blocks on top of each other, creating a solid foundation before you build higher:


Get DevOps in place before you start doing MLOps.

That is promising, isn’t it? Well, it is, but there’s also effort required on the organisational side. MLOps is not just about tech.

The MLOps twist

MLOps is about more than just tech. MLOps is about Technology, People and Processes together. Getting MLOps right means having carefully considered all three:


MLOps is not just about tooling and tech. It is just as much about People and Processes.

What does that mean? It means that we want the ML product team to operate across the entire ML lifecycle, owning the whole process:


The MLOps lifecycle [7]. The ML product team operates across the entire ML lifecycle.

… this does indeed mean that we need a diverse set of roles together in one team. Data Scientists, Machine Learning Engineers, Data Engineers and others need to work together. By working together, they can minimise the handover and build models that are actually ready for production.

Such ML product teams are enabled by an ML platform team. As a central team, it provides best practices and tooling to the ML product teams. The key point, though, is that the ML product teams still own the process from development to operation:


ML product teams are enabled by a central ML platform team.

… admittedly, none of this is easy. Drawing organisational changes on paper is easier than actually carrying them out. It requires serious effort to challenge your existing setup and restructure your organisation. But ask yourself: is it worth it? Is there serious Machine Learning potential in your organisation that you want to take advantage of? If yes, go for it.

Concluding: when to adopt MLOps

We have learned a lot. We should do MLOps to eliminate handovers between dev and ops teams and to bring the operational efficiency of Software Engineering to the Machine Learning world. MLOps does so by providing useful technologies and tooling, widely offered by major cloud providers. MLOps is largely inspired by DevOps, which is also a prerequisite for it. This inspiration allowed the industry to take advantage of well-matured DevOps concepts like automation, collaboration and continuous improvement. Only do MLOps once you have your DevOps practices in place.


Dev and ops can collaborate and co-exist happily together. Do MLOps once you have your DevOps practices in place.

If you do so, you might very well end up in a better place because of it. Embrace the organisational investment and bring teams closer together. We wish you good luck on your journey, and would love to help you along the way.

Jeroen Overschie
Jeroen is a Machine Learning Engineer at Xebia.
Jetze Schuurmans
Jetze is a well-rounded Machine Learning Engineer, who is as comfortable solving Data Science use cases as he is productionizing them in the cloud. His expertise includes MLOps, GenAI and Cloud Engineering. As a researcher, he has published papers on Computer Vision, Natural Language Processing and Machine Learning in general.