Customer Stories

Sustainability and reliability: How to achieve a higher level of data quality with SRE

50% savings on energy and maintenance costs while working with a scalable and reliable data platform



The client is a relatively young but major player in the big data platform industry. The company operates one of the world's largest clusters of the open-source Hadoop framework, holding over 600 PB of client data. Its Hadoop infrastructure allows it to operate transparently and develop at a rapid pace. The platform serves more than 3,000 internal users, many of them data scientists and reporting analysts. The data, which includes booking information and client behavior, plays a significant role in core business decisions. The company has recently grown substantially and expanded into several markets.

Why: Organic growth at a high scale can lead to technical debt and instability

What: Use automation to maintain consistent data flow and reliability

How: Build a scalable, reliable data platform using SRE principles

Alleviate Growth Pains

Rapid growth over a short period created problems with scaling, cost, and stability for the client. New features were slow to ship, and even the smallest change could affect every part of the existing business ecosystem. The company asked Xebia to help streamline these systems, strengthen its product strategy, and make better decisions about cost, reliability, and sustainability. Implementing Site Reliability Engineering (SRE) was instrumental to this effort. As the site manager explains: “We chose a Site Reliability Engineering approach to ensure consistent data flow and platform stability.”

Build a Better Process

The client is truly data-driven: it provides a software framework for storing and handling data, with a large network of servers and machines as its backbone. Big data was processed in large clusters, but this approach proved inadequate for such tremendous growth. With input from Xebia's SRE experts, the organization decided to change this using SRE tooling and principles. Moving to single-purpose clusters relieved the bottlenecks caused by keeping all the data on one platform, and SRE principles guided the organization through the transition. The result was a 45% reduction in costs.

“The choice of single-purpose clusters also healed the growth pain of trying to balance out technical debt. We wanted our new architecture to be cutting edge.”

Product Manager

SRE: Less Toil, More Gain

Automation is pivotal to running a software- and data-rich organization such as this one. With SRE principles in place, the client's team could identify the processes that caused repetitive, unnecessary work (toil) and then eliminate that work. Throughout, the product had to stay up and running, with data sets continuously fed into the new platform. According to the product manager, the team appreciated the added role of SRE and its positive effect on the quality of their services. SRE gave the team an additional lens and focus for reaching its goals. It also produced a more sustainable product, with a feedback loop that delivers valuable insights and reduces costs.

This customer story is part of Xebia's SRE Consulting portfolio.
