Customer Stories
Truecaller Scales Data Analytics to Power 500M Installs with Google Cloud
Xebia helped Truecaller modernize its data platform on Google Cloud Platform (GCP), enabling petabyte-scale analytics, cost efficiency, and a lean operations model.

At a Glance
Challenge
Exploding data volumes and the high cost of maintaining on-prem infrastructure.
Solution
Migrate from on-prem to Google Cloud Platform, leveraging BigQuery, Dataproc, Kubernetes, and Data Studio.
Results
Achieved a data platform cost of ~$6 per 10k users per month.
Reduced developer cost share to 30% of infrastructure costs.
Operated pipelines with one data engineer per 42M monthly users.
The Client
Truecaller, founded in Sweden in 2009, is a global technology company best known for its mobile app that identifies calls, blocks spam, enables VoIP calls, and supports mobile payments. With over 200 million monthly active users worldwide and more than 500 million installs, Truecaller has become a household name for spam call protection.
The Challenge: Replacing Costly On-Prem Infrastructure with a Cloud-Native Solution
As Truecaller's user base grew, so did its data. The app's core features, including spam identification, caller recognition, tailored advertising, and product analytics, relied on continuous ingestion and analysis of large volumes of data. By 2014, the on-prem Cloudera platform with 1.5PB total storage was buckling under the load of 30B daily events. Scaling required constant hardware expansion, creating spiraling costs and frequent downtime. As maintaining a private data center became costly and inefficient, Truecaller needed a cloud-native platform that could scale seamlessly, reduce operational overhead, and offer flexibility for advanced analytics and machine learning.
The Solution: Scaling Data Analytics with Google Cloud Platform
Truecaller partnered with Xebia to redesign its analytics environment, starting with early platform work in 2014 and moving fully to Google Cloud by 2018. This move brought Cloud Storage to eliminate capacity planning, Dataproc for scalable YARN clusters, and BigQuery as the preferred analytics engine for its speed, cost-effectiveness, and superior user experience. More advanced workloads, such as machine learning, were earmarked for Spark on Kubernetes, ensuring flexibility and scalability.
Beyond compute and storage, the modernization included infrastructure automation with Deployment Manager for faster, CI/CD-driven resource provisioning. On the reporting side, Tableau dashboards were replaced with Google Data Studio, chosen for its seamless BigQuery integration, serverless nature, and zero license cost. This step democratized access to insights for product owners and management while reducing the total cost of ownership. The migration was iterative, with each phase balancing open-source and cloud-native practices, allowing Truecaller to evolve its platform without service disruption.
The Result: Powering Efficiency, Agility, and Smarter Decision-making
Truecaller’s migration to Google Cloud delivered a lean and future-ready data platform and helped the company:
- Achieve a lean cost structure: ~$6 per 10k users monthly for the data platform.
- Reduce developer costs to 30% of infrastructure spend.
- Operate pipelines with just one data engineer per 42M monthly users.
- Retire the on-prem data center entirely, embracing a cloud-native model.
- Transition from Tableau to Data Studio for faster, easier analytics at scale.
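To put the unit metrics above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "users" in both metrics means monthly active users and applies the ratios to the 200 million monthly active users mentioned earlier. The function names are hypothetical, and the resulting totals are illustrative only; they are not figures reported by Truecaller or Xebia.

```python
# Illustrative arithmetic only: the two ratios come from the results above;
# applying them to 200M monthly active users is an assumption.

def monthly_platform_cost(monthly_users: int, cost_per_10k_users: float = 6.0) -> float:
    """Estimated monthly data platform cost in USD at ~$6 per 10k users."""
    return monthly_users / 10_000 * cost_per_10k_users


def engineers_needed(monthly_users: int, users_per_engineer: int = 42_000_000) -> float:
    """Estimated data engineers at one engineer per 42M monthly users."""
    return monthly_users / users_per_engineer


if __name__ == "__main__":
    users = 200_000_000  # monthly active users cited in "The Client"
    print(f"Estimated platform cost: ${monthly_platform_cost(users):,.0f} per month")
    print(f"Estimated data engineers: {engineers_needed(users):.1f}")
```

Under these assumptions, the ratios work out to roughly $120,000 per month for the data platform and about five data engineers to run the pipelines.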
What's Next
Truecaller continues to enhance its cloud-native analytics stack, planning to expand the use of BigQuery for ETL and Spark on Kubernetes for machine learning workloads. The platform is designed to adapt as data volumes grow and new app features demand even more sophisticated analytics.