Optimizing Apache Spark & Tuning Best Practices
7 December, 2023 – Virtual
As data scales up, processing it efficiently becomes ever more crucial. Building on our experience as one of the world's most significant Apache Spark users, this 2-day course provides an in-depth overview of the do's and don'ts of one of the most popular analytics engines available.
Looking to upskill your team(s) or organization?
Nico will gladly help you further with custom training solutions.
Get in touch

Duration: 2 days
Time: 09:00 – 17:00
Language: English
Lunch: Included
Certification: No
Level: Professional
What will you learn?
After the training, you will be able to:
- Understand what Apache Spark does under the hood
- Use best practices to write performant code
- Tweak and debug your Spark applications
- Explain the Spark fundamentals, including the execution model (Driver/Executors)
- Work with caching, the shuffle service, and fair scheduling
- Troubleshoot optimization problems and issues
Key takeaways
Fundamentals
- Spark execution model: Driver/Executors
- Spark user interface for monitoring applications
- Understanding the RDD and DataFrame APIs and their bindings
- Difference between Actions and Transformations
- How to read the query plan (logical/physical), as illustrated in the sketch after this list
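To make the last two items concrete, here is a minimal PySpark sketch (illustrative only, not the course materials): the session setup and sample data are assumptions, and it shows lazy transformations, an action, and how `explain()` exposes the logical and physical plans.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative local session; a real deployment would run on a cluster.
spark = SparkSession.builder.master("local[*]").appName("fundamentals").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)

# Transformations are lazy: Spark only records them in a plan.
adults = df.filter(F.col("age") >= 30).select("name")

# explain(True) prints the parsed/analyzed/optimized logical plans
# and the physical plan that will actually run.
adults.explain(True)

# An action (show/count/collect/write) triggers execution of the plan.
adults.show()
```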
Spark Internals
- Spark memory model
- Understanding persistence (caching); see the sketch after this list
- Catalyst optimizer, Tungsten project, and Adaptive Query Execution
- Shuffle service and how the shuffle operation is executed
- Concept of fair scheduling and pools
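As a taste of the persistence topic, the sketch below caches a reused intermediate result and toggles Adaptive Query Execution. It assumes the SparkSession `spark` from the previous sketch; the DataFrame and the chosen storage level are illustrative.

```python
from pyspark import StorageLevel
from pyspark.sql import functions as F

df = spark.range(10_000_000).withColumn("bucket", F.col("id") % 100)

# Persist an intermediate result that several downstream actions reuse,
# spilling to disk when it does not fit in executor memory.
agg = df.groupBy("bucket").count().persist(StorageLevel.MEMORY_AND_DISK)

agg.count()  # the first action materializes the cache
agg.show(5)  # later actions read the cached data instead of recomputing

agg.unpersist()  # release memory once the data is no longer needed

# Adaptive Query Execution re-optimizes plans at runtime from shuffle
# statistics (enabled by default in recent Spark 3.x releases).
spark.conf.set("spark.sql.adaptive.enabled", "true")
```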
Spark optimization: main problems and issues
- The most common memory problems
- The benefit of early filtering
- Understanding partition pruning and predicate pushdown
- Join optimization
- Dealing with data skewness (preprocessing, broadcasting, salting); see the sketch after this list
- Understanding shuffle partitions: how to tackle memory/disk spill
- The downside of using UDFs
- Executor idle timeout
- Data format examples, with an introduction to the Delta file format
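To give a flavor of the join and skew topics, here is a hedged sketch of a broadcast join and of key salting. It assumes an existing SparkSession `spark`, and the table names, column names, and salt count are all illustrative.

```python
from pyspark.sql import functions as F

facts = spark.range(1_000_000).withColumn("key", (F.col("id") % 10).cast("string"))
dims = spark.createDataFrame(
    [(str(i), f"label_{i}") for i in range(10)], ["key", "label"]
)

# Broadcast join: ship the small dimension table to every executor so the
# large fact table is joined without shuffling it.
joined = facts.join(F.broadcast(dims), "key")

# Salting: split each hot key into N sub-keys so no single partition
# receives all rows for that key; the small side is exploded to match.
N = 8
salted_facts = facts.withColumn(
    "salted_key",
    F.concat_ws("_", F.col("key"), (F.rand() * N).cast("int").cast("string")),
)
salted_dims = (
    dims.withColumn("salt", F.explode(F.array([F.lit(i) for i in range(N)])))
    .withColumn("salted_key", F.concat_ws("_", F.col("key"), F.col("salt").cast("string")))
)

skew_safe = salted_facts.join(salted_dims, "salted_key")
skew_safe.count()  # action to trigger the join
```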
Moving to production
- Debugging / troubleshooting
- Productionizing your Spark application
- Dynamic allocation and dynamic partitioning; see the sketch after this list
- JVM Profiler
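As an indicative example of the dynamic allocation topic, the sketch below sets the relevant configuration at session build time. The property values are illustrative starting points for discussion, not recommendations.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("production-job")
    # Let Spark grow and shrink the executor pool with the workload.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    # Release executors that sit idle longer than this timeout.
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    # Without an external shuffle service, shuffle tracking must be enabled.
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .getOrCreate()
)

# Dynamic partition overwrite: rewrite only the partitions a job touches.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
```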
Program
The trainer facilitates the content using notebooks hosted in a cloud environment. Each participant will have a Spark cluster to experiment with.
- Theory on Spark basics and advanced topics
- Applying optimizations in practice
Who is it for?
This training is excellent for you if you are a data or machine learning engineer who transforms large volumes of data, needs production-quality code, and wants to optimize Spark applications. The course is also great for expert data scientists wanting to learn simple tweaks that dramatically increase Spark performance.
Requirements
General knowledge of and experience with Python and Spark (PySpark) are necessary.
Why should I follow this training?
Learn what Apache Spark does under the hood, use best practices to write performant code, and tweak and debug your Spark applications.
Grasp the Spark fundamentals, including the execution model (Driver/Executors), caching, the shuffle service, and fair scheduling.
Learn from and network with Apache Spark data experts.
What else should I know?
After registering for this training, you will receive a confirmation email with practical information. A week before the training, we will ask about any dietary requirements and share any literature you may need to prepare.
See you soon!
All literature and course materials are included in the price.