AMP Camp 1

AMP Camp One – Big Data Bootcamp Berkeley 2012 was held in Berkeley, California, in August 2012. It was also broadcast live, and the video is archived for free. Links to the slides and video archives of the AMP Camp talks are below.

Day 1 – Tuesday, Aug 21

8:00am Continental Breakfast

9:00am Introduction to Big Data & the AMPLab (Michael Franklin) slides video

An overview of the Big Data problem and its main concerns, such as data acquisition, cleaning, analysis, public data, and crowdsourcing.

9:20am Overview of the AMP Camp Curriculum (Andy Konwinski) slides

A preview of the next two days, including the AMP Camp agenda and vision.

9:30am Warehouse-scale computing (Ion Stoica) slides video

A discussion of the hardware and software typical for big data processing, the costs and capacities of different hardware resources, and an overview of the AMPLab’s BDAS stack.

10:00am Parallel programming with Spark – Part 1 (Matei Zaharia) slides video

A brief intro to Scala and exploring data in the Spark Shell.

10:55am Coffee Break

11:10am Parallel programming with Spark – Part 2 (Matei Zaharia) slides video

Writing standalone Spark programs using Scala or Java.

12:00pm Lunch

1:15pm Structured Data with Hive and Shark (Reynold Xin) slides video

An introduction to the Hive data model and metadata management, as well as querying data in Shark.

2:00pm Machine Learning – Part 1 (Ariel Kleiner) slides video

An overview of Machine Learning and an introduction to classification.

3:20pm Coffee Break

3:40pm Machine Learning – Part 2 (Tamara Broderick) slides video

A continuation of the Machine Learning session, covering clustering.

4:45pm Hands-on Exercises using EC2

Manipulate multi-gigabyte datasets with Spark, Shark, and the distributed Machine Learning algorithms covered today. Go to the exercises.

6:45pm – 8:30pm Reception (Wozniak Lounge, Soda Hall)

Day 2 – Wednesday, Aug 22

8:00am Continental Breakfast

9:00am Crowdsourcing (Tim Kraska) slides video

A survey of crowdsourcing techniques and technologies including CrowdDB.

9:30am Advanced Spark Features (Matei Zaharia) slides video

An introduction to advanced Spark features such as controllable partitioning, caching formats, and serialization.

10:20am Spark Python API (Josh Rosen) slides video

A preview of features of the new Python API in development for Spark.

10:45am Coffee Break

11:00am Spark Streaming (Tathagata Das) slides video

An introduction to large-scale near-real-time stream processing with the soon-to-be-released Spark Streaming.

11:30am Managing Twitter clusters with Mesos (Benjamin Hindman) slides video

An introduction to running frameworks on Mesos, including Hadoop and Spark, and an overview of how Twitter uses Mesos to build and manage production services.

12:15pm Lunch

1:30pm Carat – Collaborative Energy Debugging (Adam Oliner) slides video

Detecting and diagnosing misbehavior in mobile apps using big data + cloud + crowd.

2:00pm Spark User Applications

3:30pm – 3:40pm Wrap-up & Future Directions (Michael Franklin) video

AMP Camp One Exercises

How To Participate

We will be using piazza.com to manage Q&A throughout both days of AMP Camp. Folks who register before opening day will be enrolled in the Piazza class and will be able to post questions and collaboratively edit responses to them. AMP Camp instructors will also answer questions, endorse good answers, and moderate. U.C. Berkeley students should enroll as a student at the AMP Camp Piazza Class. Non-students who register before August 21st will be enrolled automatically.