AMP Camp 2

AMP Camp Two was a full-day Big Data Bootcamp at the O’Reilly Strata Conference 2013 in Santa Clara, CA, on Tuesday, February 26, 2013.

AMP Camp Two consisted of two three-hour tutorials.


Here are the slides from the talks in the first tutorial:

  • Berkeley Data Analytics Stack (BDAS) Overview, by Ion Stoica (PDF, PPT)
  • Parallel Programming With Spark, by Matei Zaharia (PDF, PPTX)
  • Machine Learning on Spark, by Shivaram Venkataraman (PDF, PPTX)
  • Large-scale Near Real-time Stream Processing, by Tathagata Das (PDF, PPT)
  • Shark: SQL and Rich Analytics at Scale, by Reynold Xin (PDF)


For the second tutorial, we provided in-person attendees with a small EC2 cluster running the then-current versions of Spark and Shark, along with a set of training exercises. You can work through the same exercises that in-person attendees did using your own Amazon EC2 account (note: you will be billed for the EC2 time you use).

These hands-on exercises walk you through setting up a small cluster on EC2 running Spark and Shark, then loading and analyzing a real Wikipedia dataset on that cluster. We begin with simple analysis techniques at the command line, then progress to writing standalone programs and, finally, to more advanced machine learning algorithms.
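The simple analyses in exercises like these are typically expressed as a chain of filter, map, and reduce-by-key steps. As a rough local illustration only (plain Python, not Spark), the following sketch counts English-language page views per title. The sample records and their whitespace-separated layout of project code, page title, view count, and bytes served are hypothetical, as is the helper `reduce_by_key`.

```python
# Hypothetical sample records; the layout (project, title, views, bytes)
# is an assumption for illustration, not the exact exercise dataset.
records = [
    "en Spark 120 5000",
    "en Shark 40 2000",
    "de Spark 15 800",
    "en Spark 30 1200",
]


def parse(line):
    """Split one record into a (title, views) pair, like a Spark map step."""
    project, title, views, _bytes = line.split()
    return title, int(views)


def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for Spark's reduceByKey: combine values per key."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


# "filter": keep only English-language records.
en_records = [r for r in records if r.split()[0] == "en"]

# "map" each line to (title, views), then sum the views for each title.
views_per_title = reduce_by_key(map(parse, en_records), lambda a, b: a + b)

# Sort titles by total views, descending, as a simple top-k query would.
top = sorted(views_per_title.items(), key=lambda kv: kv[1], reverse=True)
print(top)
```

In Spark itself, the same steps would be written with RDD operations such as `filter`, `map`, and `reduceByKey`, which execute in parallel across the cluster rather than in a single Python process.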

Attendees getting hands-on experience analyzing real data with Spark and Shark in our second Strata tutorial