Mini Course

AMP Camp aims to teach practitioners how to use the open-source tools being built and released by the AMPLab for advanced data analytics and machine learning. Here you can find our free Big Data Mini Course. The mini course hosted here is the most up-to-date training document we have.

For in-person attendees at AMP Camps, we provide a small EC2 cluster running the current BDAS software stack (Spark/Shark/Mesos). You can work through the same mini course as the in-person attendees by using your own Amazon EC2 account (note: you will be billed for the EC2 time you use).

The most recent version of the AMP Camp mini course is the version we prepared for AMP Camp 3. It will walk you through the process of setting up a small cluster on EC2 running Spark and Shark, then loading and analyzing a real Wikipedia dataset using your cluster. We begin with simple analysis techniques at the command line, progress to writing standalone programs, and then move on to more advanced machine learning algorithms.

If you’re looking for the version of the mini course that we used at a previous AMP Camp, check out the archived event page from AMP Camp 3, AMP Camp 2, or AMP Camp 1.

Exercise Prerequisites

The exercises use a pre-built machine image (AMI) available on Amazon’s EC2 compute cloud. To participate in the exercises, you will first run a local Python script which uses your EC2 credentials to launch virtual instances. You can then log into those instances and step through the exercises.

  • The local run scripts we’ll use require Python 2.x and have been tested to work on Linux or OS X. We will use the Bash shell in our examples. If you are using Windows, consider installing Cygwin.
  • Participants will use Amazon EC2 to deploy a small cluster. This requires signing up for an EC2 account if you don’t already have one. General familiarity with EC2 concepts, such as AMIs, will be helpful.
  • You’ll need to know two EC2 security credentials: your EC2 access key ID and your EC2 secret access key. These can be obtained from the AWS homepage by clicking Account > Security Credentials > Access Credentials.
  • You’ll need to create an EC2 key pair. This can be done by logging into your Amazon Web Services account through the AWS console, clicking Key Pairs on the left sidebar, and creating and downloading a key.
  • We will be using deploy scripts very similar to those on the Deploying Spark on EC2 web page. Feel free to try out those instructions to verify that your environment is correctly set up.
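
The prerequisites above can be checked before launching anything. The following is a minimal sketch, not part of the course materials. It assumes the launch script reads your credentials from the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and that your downloaded key pair is a local file; the function name check_prerequisites and the file name my-keypair.pem are hypothetical. It also checks that the key file is not accessible to other users, since SSH typically refuses private keys with permissions that are too open.

```python
import os
import stat
import sys


def check_prerequisites(environ, key_path, python_major=sys.version_info[0]):
    """Return a list of problems found; an empty list means nothing is missing.

    environ      -- a mapping of environment variables (e.g. os.environ)
    key_path     -- path to the downloaded EC2 key pair file (hypothetical name)
    python_major -- major version of the interpreter that will run the scripts
    """
    problems = []

    # The course states that the local run scripts require Python 2.x.
    if python_major != 2:
        problems.append("Python 2.x is required, found Python %d" % python_major)

    # Assumption: credentials are supplied through these environment variables.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if not environ.get(name):
            problems.append("environment variable %s is not set" % name)

    # The key pair file must exist and must not be accessible by group/others.
    if not os.path.isfile(key_path):
        problems.append("key pair file not found: %s" % key_path)
    else:
        mode = stat.S_IMODE(os.stat(key_path).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("key pair file permissions are too open; "
                            "run: chmod 600 %s" % key_path)

    return problems
```

For example, calling check_prerequisites(os.environ, "my-keypair.pem") from the directory that holds your key returns an empty list when everything is in place, and otherwise a list of messages describing what still needs to be fixed.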