Hands-on Exercises


Welcome to the Stanford Workshop hands-on exercises! These exercises extend and enhance those given at previous AMP Camp Big Data Bootcamps. They were written by volunteer graduate students and postdocs in the UC Berkeley AMPLab and by members of the open-source team at Databricks. Many of these people are here today as teaching assistants. Today's exercises have you work directly with the Spark-specific components of the AMPLab's open-source software stack, the Berkeley Data Analytics Stack (BDAS).

To move between exercises, use the page header or footer. Click the arrows, or click the dropdown button that shows the current page title (as shown in the figure below).

The components we will cover at the first Spark Training are listed below.


  1. Scala - a quick crash course on the Scala language and command-line interface.
  2. Spark - a fast cluster computing engine.
  3. Machine Learning with MLlib - build a movie recommender with Spark.
  4. [Optional] Graph Analytics with GraphX - explore graph-structured data (e.g., a web graph) and graph algorithms (e.g., PageRank) with GraphX.

Getting Started

If you are attending Spark Training in person, the TAs will hand out cluster hostnames. You can get the private key from the TinyURL address shown on the projector. Once you have your cluster hostname and private key, follow the directions to log in to your cluster.

If you are participating in the exercises from a remote location, you will want to launch a BDAS cluster on Amazon EC2 for yourself.
