Play Migrates a Petabyte Hadoop Cluster to Kubernetes with Open Source Technologies

Xebia helped Play modernize its data platform with a fully open-source, cloud-agnostic solution, enabling scalable analytics, secure multi-team access, and lean operations.

At a Glance

Challenge

End of support for Hortonworks Data Platform (HDP) and rising demand for scalability, easier deployment, and multi-team collaboration.  

Solution

Migration of a petabyte-scale Hadoop cluster to a fully open-source, Kubernetes-based platform with ArgoCD-driven GitOps. 

Results

Containerized, cloud-ready data platform with improved reliability and scalability. 
Enabled multiple teams to deploy and manage applications independently. 
Simplified upgrades and fixes reduced rollout times to under an hour. 

The Client

Play is Poland’s leading mobile network operator, serving over 15 million subscribers with voice, messaging, data, and video services. Since launching in 2007, Play has grown into the #1 provider in the Polish market, offering modern services that cover 99% of the population. To maintain this competitive edge, Play continually invests in data platforms that support analytics, customer insights, and operational efficiency. 

The Challenge: Future-Proofing Analytics at Scale

For years, Play relied on the Hortonworks Data Platform (HDP) to manage large-scale processing and analytics. With HDP support ending in December 2021, the company faced a critical challenge: staying on outdated infrastructure would limit flexibility, pose security risks, and slow down innovation. At the same time, Play’s expanding data needs demanded a solution that was scalable, reliable, easier to maintain, and capable of supporting multiple teams simultaneously. 

The requirements were ambitious: the new platform had to be 100% open source, cloud-ready, and free from vendor lock-in. Play needed a modern solution that could support secure multi-team access, simplify deployment, and offer robust scalability and resource management. Recognizing that containerization and Kubernetes had become the industry standard for orchestrating modern workloads, Play set out to rebuild its petabyte-scale platform using these technologies - laying the foundation for future growth without sacrificing independence. 

The Solution: Data Platform Modernization with Kubernetes, Open Source, and GitOps

Play partnered with Xebia on a migration strategy that made Kubernetes the foundation of the new platform, orchestrating containerized workloads with consistency and portability. Applications previously running on HDP, such as Spark, Trino, JupyterHub, and Airflow, were redeployed as Kubernetes-managed services, with Helm charts handling installation and scaling. Hadoop HDFS was retained as the storage layer and kept separate from the Kubernetes cluster to ensure resilience and high performance, while leaving room for a potential swap to cloud-native storage in the future. 

To empower multiple teams and streamline deployments, Play introduced ArgoCD for GitOps-based infrastructure management. Infrastructure was managed as code, with automated synchronization, rollback, and visibility into changes. Developers could spin up their own Airflow instances or other tools by updating configuration files, while ArgoCD ensured secure, consistent deployments across environments. Complemented by Apache Superset for BI, JupyterHub for data science, and strong security integrations with Kerberos, Ranger, and OPA, the platform became a flexible, open-source ecosystem tailored to diverse user needs. At peak demand, the Kubernetes-based Spark environment scaled to 1,500 cores and 4 TB of memory without disruption, demonstrating the robustness of the solution at scale. 
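
As an illustration of the configuration-driven workflow described above, here is a minimal sketch of what a per-team deployment definition might look like. It builds an Argo CD Application manifest for a team's Airflow instance and prints it as JSON. The manifest fields follow the public Argo CD Application format, but everything specific is an assumption made for this example and is not taken from Play's setup: the helper name `airflow_application`, the repository URL, the `teams/<team>/airflow` path layout, and the namespace names are all hypothetical.

```python
import json


def airflow_application(team: str, repo_url: str, revision: str = "main") -> dict:
    """Build an Argo CD Application manifest for one team's Airflow instance.

    The repository layout and naming scheme are hypothetical, chosen only
    to illustrate the GitOps pattern.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"airflow-{team}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,              # Git repository holding the configuration
                "targetRevision": revision,       # branch, tag, or commit to track
                "path": f"teams/{team}/airflow",  # hypothetical per-team Helm chart directory
                "helm": {"valueFiles": ["values.yaml"]},
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # the cluster Argo CD runs in
                "namespace": f"{team}-airflow",
            },
            # Automated sync keeps the cluster aligned with what is committed in Git.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical team name and repository URL.
manifest = airflow_application(
    "analytics", "https://git.example.com/data-platform/deployments.git"
)
print(json.dumps(manifest, indent=2))
```

In a setup like this, a team would commit the generated definition (or its YAML equivalent) to Git, and Argo CD would create or update the Airflow instance on the next synchronization. Reverting the commit would roll the change back.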

The Results: A Scalable, Secure Data Ecosystem

With Xebia’s support, Play transformed its legacy Hadoop environment into a scalable, secure, and cloud-ready platform, delivering measurable improvements in cost, speed, and reliability. 

  • Migrated a petabyte-scale Hadoop cluster to a Kubernetes-based, fully open-source platform. 
  • Enabled multiple teams to work independently through ArgoCD GitOps, reducing friction and improving collaboration. 
  • Simplified maintenance and upgrades - e.g., deploying a new Trino version took under an hour, including testing. 
  • Achieved reliable large-scale Spark processing, scaling to 1,500 cores and 4 TB of memory without issues. 
  • Delivered a secure, cloud-agnostic setup with Kerberos, SSO, Ranger, and OPA. 

We've been successfully working with Xebia on the Data platform since 2017 - the expertise and experience their team brought to the table allowed us to fast-track the deployment of a scalable, secure, and extensible data solution, used heavily by our business users to constantly improve our services. The talented Xebia team designed and modernized the architecture, participated in the implementation phase, provided necessary support, and helped us plan further steps.  
This platform became an important place to get a holistic view of our clients, and we treat it as the backbone of our development and growth. I can strongly recommend Xebia as an experienced, professional partner on the Data Management journey.

Piotr Lubańsk

Director of Corporate Support Systems Department, Play 

Looking Ahead 

With its new Kubernetes-based platform, Play is well-positioned for the future. The open-source foundation ensures continued flexibility, while GitOps practices streamline scaling and experimentation. As data volumes grow and business demands evolve, the platform can extend into hybrid or multi-cloud environments while avoiding vendor lock-in. 

We gave a presentation about this project at the Big Data Technology Summit 2024. 
