Welcome to the AMP Camp 5 hands-on exercises! These exercises are extended and enhanced from those given at previous AMP Camp Big Data Bootcamps. They were written by volunteer graduate students and postdocs in the UC Berkeley AMPLab. Many of those same graduate students are present today as teaching assistants. The exercises we cover today will have you working directly with the Spark-specific components of the AMPLab’s open-source software stack, called the Berkeley Data Analytics Stack (BDAS).
In order to get the most out of this course, we assume:
- You have experience using the core Spark APIs
- You have a laptop
- Your laptop has Java 6 or 7 installed
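One way to check the Java prerequisite is to run `java -version`, which prints a line such as `java version "1.7.0_71"`. The sketch below is not part of the original exercises; it only illustrates how that line maps to a major version, and the function names `java_major_version` and `meets_prerequisite` are hypothetical helpers introduced here.

```python
import re


def java_major_version(version_text):
    """Return the major Java version from text such as
    'java version "1.7.0_71"' or a bare string like '1.6.0_45'."""
    # If the text is a full `java -version` line, keep only the quoted part.
    quoted = re.search(r'"([^"]+)"', version_text)
    candidate = quoted.group(1) if quoted else version_text.strip()

    match = re.match(r"(\d+)(?:\.(\d+))?", candidate)
    if match is None:
        raise ValueError("unrecognized Java version: %r" % version_text)

    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (1.7.0_71 is Java 7).
    if first == 1 and second is not None:
        return int(second)
    return first


def meets_prerequisite(version_text, supported=(6, 7)):
    """True if the reported version is one this course assumes (Java 6 or 7)."""
    return java_major_version(version_text) in supported
```

For example, `meets_prerequisite('java version "1.7.0_71"')` is true, while a Java 8 string such as `"1.8.0_25"` falls outside the versions this course lists.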
If you would like a quick primer on Scala, check out the following doc in the appendix:
In several of the following training modules, you can choose which language you want to use as you follow along and gain experience with the tools. The following table shows which languages this mini course supports for each section. You are welcome to mix and match languages depending on your preferences and interests.
| SparkR | R only | R only | R only |
The modules we will cover at the AMP Camp training are listed below. You can do them in any order, according to your interests, though we recommend that new users start with Spark.
Note: Please follow the setup instructions at the Getting Started page before any of the exercises.
| Module | Description | Length | Further reading |
|---|---|---|---|
| Spark | Use the Spark shell to write interactive queries | Short | Programming Guide |
| Spark SQL | Use the Spark shell to write interactive SQL queries | Short | Programming Guide |
| Tachyon | Deploy Tachyon and try its basic functionality | Medium | Project Website |
| MLlib | Build a movie recommender with Spark | Medium | Programming Guide |
| GraphX | Explore graph-structured data and graph algorithms | Long | Programming Guide |
| Pipelines | Image classification with pipelines | Medium | |
| SparkR | Interactive data analytics using Spark in R | Short | Project Page; GitHub |
| ADAM | Genome analysis with ADAM | Medium | |
IRC Chat Room
A chat room is available for participants to connect with each other and get real-time help with the exercises. You can join the room here, or use an IRC client to connect to the #ampcamp channel on the FreeNode network (irc.freenode.net).
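For readers curious what an IRC client does when it connects, the sketch below builds the raw protocol lines a client typically sends to register a nickname and join the channel. It is an illustration only: the helper name `irc_join_commands`, the default real name, and the example nickname are assumptions, and any standard IRC client handles this for you.

```python
def irc_join_commands(nick, channel="#ampcamp", realname="AMP Camp participant"):
    """Build the raw IRC lines a client sends to register and join a channel.

    The lines are returned rather than sent, so nothing here opens a
    network connection.
    """
    lines = [
        "NICK %s" % nick,                      # choose a nickname
        "USER %s 0 * :%s" % (nick, realname),  # register the user
        "JOIN %s" % channel,                   # join the channel
    ]
    # IRC messages are terminated by a carriage return and line feed.
    return [line + "\r\n" for line in lines]


if __name__ == "__main__":
    for command in irc_join_commands("camper42"):
        print(repr(command))
```

A real client would send these lines over a TCP connection to irc.freenode.net (port 6667 is the conventional plain-text IRC port, assumed here) and would also need to answer the server's `PING` messages to stay connected.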