Data Processing at Scale

Data is knowledge, and knowledge is power. But the efficient processing of data can be challenging when scaled up. This training dives deep into one of the most popular and scalable tools for large-data transformation: Apache Spark. In this course, you will learn everything you need to know about how Apache Spark works. Through a combination of theory and hands-on exercises, you will also gain the skills to write efficient ETL Spark jobs to process large data sets.

Looking to upskill your team(s) or organization?

Nico will gladly help you with custom training solutions for your organization.

Get in touch

What will you learn?

After the training, you will be able to:

Use Apache Spark and its advanced features 

Write efficient ETL jobs 

Use the API to transform data at a basic and advanced level 

Think in terms of distributed systems when writing Spark jobs 

Key takeaways

  1. Inner workings of Apache Spark 
  2. Loading data from various formats 
  3. Basic and advanced data frame operations 
  4. Window and user-defined functions 
  5. Unit testing 
  6. Hands-on exercise to analyze large-scale logs to find trending topics 


Who is it for?

This training is perfect for you if you are a data or machine learning engineer who transforms large volumes of data.

This training requires basic experience with Python. Don't have that experience yet? Then check out Python for Data Engineers instead.

Why should I follow this training?

Optimal Spark Use

Use Apache Spark and its advanced features, and write efficient ETL jobs.

Go Advanced

Learn about the inner workings of Apache Spark, loading data from various formats, and basic and advanced data frame operations.

Processing data sets

Gain the skills necessary to process large data sets.

What else should I know?

After registering for this training, you will receive a confirmation email with practical information. A week before the training, we will ask you about any dietary requirements and share any literature you need to prepare.

See you soon!

Course information

All literature and course materials are included in the price. 

Upcoming courses

Developing Data Models with LookML

This course empowers you to develop scalable, performant LookML (Looker Modeling Language) models that provide your business users with the standardized, ready-to-use data they need to answer their questions. Upon completing this course, you will be able to start building and maintaining LookML models to curate and manage data in your organization’s Looker instance.

Data Analytics

Analyzing and Visualizing Data in Looker

In this course, you learn how to do the kind of data exploration and analysis in Looker that would formerly be done primarily by SQL developers or analysts. Upon completing this course, you will be able to leverage Looker’s modern analytics platform to find and explore relevant content in your organization’s Looker instance, ask questions about your data, create new metrics as needed, and build and share visualizations and dashboards to facilitate data-driven decision-making.

Data Analytics

Data Science Bootcamp

Transform into a certified Data Scientist in just 12 weeks. This bootcamp will kick-start your Data Science career with Python.

Lysanne van Beek

Data Science
11 days
25 Jan, 2024

Advanced Power BI – DAX and Data Modeling

Increase your Power BI knowledge with DAX and Data Modeling. Get started now!

Juan Manuel Perafan

Data Analytics
2 days
11 – 12 Dec, 2023

Time Series Analysis & Forecasting

Learn how to extract insights, interpret seasonality and build forecasting models from time series data.

Data Science
4 days
4 Mar, 2024

Can’t find the course you’re looking for? There’s more!