AMP Camp Two was a full-day Big Data Bootcamp held at the O’Reilly Strata Conference 2013 in Santa Clara, CA, on Tuesday, February 26, 2013.
AMP Camp Two consisted of two three-hour tutorials:
- An Introduction to the Berkeley Data Analytics Stack (BDAS) Featuring Spark, Spark Streaming, and Shark – Part 1
- Hands-on with BDAS – Learn Spark and Shark via Real Data Analysis – Part 2
Here are the slides from the talks in the first tutorial:
- Berkeley Data Analytics Stack (BDAS) Overview, by Ion Stoica
- Parallel Programming With Spark, by Matei Zaharia
- Machine Learning on Spark, by Shivaram Venkataraman
- Large-scale Near Real-time Stream Processing, by Tathagata Das
- Shark: SQL and Rich Analytics at Scale, by Reynold Xin
For the second tutorial, we provided in-person attendees with a small EC2 cluster running the current versions of Spark and Shark, along with a set of training exercises. You can work through the same exercises using your own Amazon EC2 account (note: you will be billed for the EC2 time you use).
These hands-on exercises walk you through setting up a small cluster on EC2 running Spark and Shark, then loading and analyzing a real Wikipedia dataset on that cluster. We begin with simple analysis techniques at the command line, then progress to writing standalone programs, and finally move on to more advanced machine learning algorithms.
Attendees getting hands-on analyzing real data with Spark and Shark in our second Strata tutorial