Data Science with Spark

31 January, 2024
Amsterdam, The Netherlands

3 days
In Person
Apache Spark
Data Science

Apache Spark is a powerful, open-source processing engine built around speed, ease of use, and advanced analytics. This course teaches you to unlock its full potential and master this challenging tool.

Duration: 3 days
Time: 09:00 – 17:00
Language: English
Lunch: Included
Certification: No
Level: Advanced

What will you learn?

After the training, you will be able to:

Process large-scale data using PySpark.

Understand the fundamentals of Apache Spark.

Perform machine learning on large-scale data. 

Key takeaways

Spark Basics 

  1. Spark execution and the Spark session.
  2. Transformations vs. actions. 
  3. Laziness and lineage: how Spark optimizes code. 
  4. How to use the Spark UI. 

Spark DataFrames 

  1. Spark DataFrames vs pandas DataFrames. 
  2. How to load and save DataFrames.
  3. How to join data. 
  4. User-defined functions (UDFs) and pandas UDFs, with their performance implications.
  5. Window operations.

Advanced Spark 

  1. How to partition and how Spark reads and writes data. 
  2. Shuffling, narrow vs. wide operations, and their impact on performance. 
  3. The Catalyst optimizer.
  4. Scheduling and job execution. 
  5. Caching and persistence levels.

Spark Machine Learning 

  1. Machine learning with Spark. 
  2. Pre-processing data and feature engineering. 
  3. Model selection.
  4. Pipeline API.
  5. Advanced topics.

Spark Structured Streaming 

  1. Structured streaming. 
  2. Machine learning and streaming.
  3. Windows and aggregations.
  4. Fault tolerance and Kafka.
  5. Kafka as a source and sink.

Program

  • Spark execution and Spark sessions 
  • DataFrame methods, properties, and actions 
  • APIs: (Py)Spark DataFrame vs Spark SQL 
  • Reading and writing data in Spark 

Who is it for?

This training is for anyone who works in an organization that uses Apache Spark and wants to get the most out of it. It is not limited to Data Scientists who wish to scale their projects: Data Engineers, Data Analysts, Software Programmers, and Database Administrators who want to make full use of Apache Spark will also benefit from this course. 

Requirements

Prior experience with Python or software programming is required.

Experience with SQL and pandas is helpful but not mandatory. 

Why should I follow this training?

Learn the fundamentals of Apache Spark

Learn from the Spark experts

Learn to process large-scale data using PySpark and perform machine learning

What else should I know?

After registering for this training, you will receive a confirmation email with practical information. A week before the training, we will ask you about any dietary requirements and share literature if you need to prepare.

See you soon!

Course information

All literature and course materials are included in the price. 
