Course: Apache Spark Advanced Topics

$204.49 incl. VAT

Duration: 11 hours

Language: English (US)

Access duration: 90 days

Included in Onbeperkt Leren (Unlimited Learning)


In this Spark training course you will learn about Spark's advanced capabilities. You will learn how to improve performance with DataFrames and Spark SQL. You will also learn how to work with Spark Streaming and with libraries such as MLlib, GraphX, and R.

Among the subjects covered are RDDs, DataFrames, JSON datasets, the JDBC/ODBC server, DStreams, Twitter, ALS, K-means, LDA, the Pregel API, and much more.


After finishing this course, you will be familiar with the advanced features that Spark offers.


Before attending this course, we recommend having some knowledge of programming.

Target audience

Software Developer, Web Developer


Apache Spark Advanced Topics

11 hours

Spark Core

  • start the course
  • recall what is included in the Spark Stack
  • define lazy evaluation as it relates to Spark
  • recall that an RDD is an interface comprising a set of partitions, a list of dependencies, and functions to compute
  • pre-partition an RDD for performance
  • store RDDs in serialized form
  • perform numeric operations on RDDs
  • create custom accumulators
  • use broadcast functionality for optimization
  • pipe to external applications
  • adjust garbage collection settings
  • perform batch import on a Spark cluster
  • determine memory consumption
  • tune data structures to reduce memory consumption
  • use Spark's different shuffle operations to minimize memory usage of reduce tasks
  • set the levels of parallelism for each operation
  • create DataFrames
  • interoperate with RDDs
  • describe the generic load and save functions
  • read and write Parquet files
  • use JSON Dataset as a DataFrame
  • read and write data in Hive tables
  • read and write data using JDBC
  • run the Thrift JDBC/ODBC server
  • show the different ways to tune up Spark for better performance

Spark Streaming

  • start the course
  • describe what a DStream is
  • recall how TCP socket input streams are ingested
  • describe how file input streams are read
  • recall how Akka Actor input streams are received
  • describe how Kafka input streams are consumed
  • recall how Flume input streams are ingested
  • set up Kinesis input streams
  • configure Twitter input streams
  • implement custom input streams
  • describe receiver reliability
  • use the updateStateByKey operation
  • perform transform operations
  • perform window operations
  • perform join operations
  • use output operations on DStreams
  • use DataFrame and SQL operations on streaming data
  • use learning algorithms with MLlib
  • persist stream data in memory
  • enable and configure checkpointing
  • deploy applications
  • monitor applications
  • reduce batch processing times
  • set the right batch interval
  • tune memory usage
  • describe fault tolerance semantics
  • perform transformations on DStreams

MLlib, GraphX, and R

  • start the course
  • describe data types
  • recall the basic statistics
  • describe linear SVMs
  • perform logistic regression
  • use naïve Bayes
  • create decision trees
  • use collaborative filtering with ALS
  • perform clustering with K-means
  • perform clustering with LDA
  • perform analysis with frequent pattern mining
  • describe the property graph
  • describe the graph operators
  • perform analytics with neighborhood aggregation
  • perform messaging with Pregel API
  • build graphs
  • describe vertex and edge RDDs
  • optimize representation through partitioning
  • measure vertices with PageRank
  • install SparkR
  • run SparkR
  • use existing R packages
  • expose RDDs as distributed lists
  • convert existing RDDs into DataFrames
  • read and write Parquet files
  • run SparkR on a cluster
  • use the algorithms and utilities in MLlib

Course options

We offer several optional training products to enhance your learning experience. If you are planning to use our training course in preparation for an official exam, we highly recommend using these optional training products to ensure an optimal learning experience. For some courses, only a practice exam, only a practice lab, or both may be available.

Optional practice exam (trial exam)

To supplement this training course you may add a special practice exam. This practice exam comprises a number of trial exams which are very similar to the real exam, both in terms of form and content. This is the ultimate way to test whether you are ready for the exam. 

Optional practice lab

To supplement this training course you may add a special practice lab. You perform the tasks on real hardware and/or software applicable to your lab. The labs are fully hosted in our cloud. The only thing you need to use our practice labs is a web browser. In the LiveLab environment you will find exercises that you can start immediately. The lab environments consist of complete networks containing, for example, clients and servers. This is the ultimate way to gain extensive hands-on experience.


With our training concept you save up to 80% on training courses

Start learning whenever you want. You set your own pace

Exchange ideas with fellow students and establish yourself as an authority in your field.

After successfully completing your course, receive the official certificate of participation from

Get insight into detailed progress information for yourself or your employees

Gain knowledge through interactive e-learning and extensive practical assignments by certified instructors


Once we have processed your order and payment, we will give you access to your courses. If you still have any questions about our ordering process, please refer to the button below.

Frequently asked questions

What is included?

Certificate of participation: Yes
Monitor progress: Yes
Award-winning e-learning: Yes
Mobile ready: Yes
Sharing knowledge: Unlimited access to our IT professionals community
Study advice: Our consultants are here to advise you on your study career and options
Study materials: Certified teachers with in-depth knowledge of the subject
Service: World's best service


After ordering your training, you get access to our innovative learning platform. Here you will find all the courses you have purchased (or taken), you can create student accounts if needed, and you get access to detailed progress information.

Life Long Learning

Taking multiple courses? Read more about our Life Long Learning concept

Contact us

Need training advice? Contact us!