Data Processing at Scale

Data is knowledge, and knowledge is power, but processing data efficiently becomes challenging at scale. This training dives deep into one of the most popular scalable tools for large-scale data transformation: Apache Spark. You will learn how Apache Spark works under the hood and, through a combination of theory and hands-on exercises, gain the skills to write efficient Spark ETL jobs that process large data sets.

Looking to upskill your team or organisation?

Nico will gladly help you with custom training solutions for your organisation.

What will you learn?

After the training, you will be able to:

Use Apache Spark and its advanced features 

Write efficient ETL jobs 

Use the Spark API to perform basic and advanced data transformations

Think in terms of distributed systems when writing Spark jobs 

Key takeaways

  1. Inner workings of Apache Spark 
  2. Loading data from various formats 
  3. Basic and advanced DataFrame operations
  4. Window and user-defined functions 
  5. Unit testing 
  6. Hands-on exercise to analyze large-scale logs to find trending topics 

Program

  • Inner workings of Apache Spark 
  • Loading data from various formats 
  • Basic and advanced DataFrame operations
  • Window and user-defined functions

Who is it for?

This training is perfect for you if you are a data engineer or machine learning engineer who transforms large volumes of data.

Requirements

This training requires basic experience with Python. Don't have that experience yet? Then check out Python for Data Engineers instead.

Why should I take this training?

Optimal Spark Use

Use Apache Spark and its advanced features, and write efficient ETL jobs.

Go Advanced

Learn about the inner workings of Apache Spark, loading data from various formats, and basic and advanced DataFrame operations.

Processing data sets

Gain the skills you need to process large data sets.

What else should I know?

After registering for this training, you will receive a confirmation email with practical information. A week before the training, we will ask you about any dietary requirements and share any literature you need to read in preparation.

See you soon!

Course information

All literature and course materials are included in the price. 
