
Data Science with Spark

Perform large-scale Data Science with ease! In this 3-day course you will learn to master the tools Apache Spark offers for large-scale Data Science.

Spark your Data Science skills!

Data Science opens up endless possibilities to analyze data and put it to good use. Apache Spark is a powerful open-source processing engine built around speed, ease of use, and advanced analytics. In this 3-day training you will learn how to master the tools Apache Spark offers, gaining skills you can put into practice right away!

"Machine learning with Spark was fun, seeing the new Spark language, getting to learn and share enthusiasm with other people that are Data Scientists or becoming one." - Data Scientist, KPN

This Data Science training is perfect for

Data Science with Spark is perfect for anyone who wants to learn how to use Spark and its Machine Learning and Streaming capabilities. If you are a Data Scientist who is curious to find out more, please join us! Data Science with Spark is a Professional level training, which means you need to know the basics of Python programming, data manipulation and SQL.

What will you learn during the Data Science with Spark training?

This 3-day training teaches you to use PySpark. Through instructor-led discussion and interactive, hands-on exercises, you will master the tools that Spark offers to perform large-scale Data Science. We will cover the basics, Jupyter notebooks, the Python shell and more!


Spark basics

You will learn:

  • Spark execution
  • SparkSession
  • Transformations vs Actions
  • Laziness and Lineage: how Spark optimizes code
  • How to use the Spark UI

Spark Advanced

You will learn:

  • Partitioning and how Spark reads and writes data
  • Shuffling, narrow and wide operations, and their impact on performance
  • The Catalyst optimizer
  • About scheduling and job execution
  • About caching and persistence levels


Spark DataFrames

You will learn:

  • The basic concepts
  • All about Spark DataFrames and Pandas DataFrames
  • How to load and save DataFrames
  • The functions API
  • How to join data
  • User Defined Functions and Pandas User Defined Functions (with performance implications)
  • Window operations


Spark Machine Learning

You will learn:

  • Machine Learning with Spark
  • Preprocessing data and feature engineering
  • Model selection
  • Pipeline API
  • Advanced topics

Spark structured streaming

You will learn:

  • Structured Streaming
  • Machine Learning & Streaming
  • Sources and sinks
  • Windows & Aggregations
  • Checkpointing & Watermarking
  • Fault tolerance & Kafka
  • Kafka as a source and as a sink

Data Science Trainers

This Data Science training is brought to you by our training partner GoDataDriven. GoDataDriven works with experts in their field who are always on the lookout for the most innovative ways to get the most out of data. Your trainer is a data guru who enjoys sharing their experience to help you work with the latest tools.

Data Science Learning Journey

Your Data Science Learning Journey starts with a Foundation training, like Data Science with R, Data Science for Product Owners or the Data Science with Python training. We also offer a GoDataDriven Deep Learning Professional level course. If you are ready for an Expert level course, register for this 3-day Data Science with Spark training and learn all about large-scale Data Science.

Yes, I want to do more with data!

After registering for this training, you will receive a confirmation email with practical information. A week before the training we will ask you about any dietary requirements and share reading material if you need to prepare. See you soon!

What else should I know?

  • Bring your own laptop with at least 8GB of RAM, 25GB of free hard disk space, an SSH client installed, and permission to install software
  • This course is brought to you by our training partner GoDataDriven
  • Travel & accommodation expenses are not included

Get in touch

Our team is at your service: call +31 (0)35 538 1921.