Deep Dream / Caffe speed comps: Mac Pro vs AWS EC2 g2.2xlarge CUDA

My good friend, Dan Kitchens (of kozyndan), recently suggested that I run Google's Deep Dream on some of my underwater video, which I thought was a fantastic idea. I started by experimenting with Docker containers like deepdream-locker because I didn't want to go through the pain of setting up all of the Python dependencies on my machine. I've been out of software development for so long that I had never even heard of Docker, but it was pretty straightforward to get up and running.

But everything in Docker was really slow (and not GPU accelerated), so I moved on and decided to install Caffe and the other Python dependencies on my Mac Pro. I was excited to get to run computer vision code on my blazingly fast Mac Pro! I got the Deep Dream IPython Notebook up and running on my late-2013 Mac Pro (6-core, 3.5 GHz), and... it was also really slow! After a bit more research, I discovered that Caffe only supports GPU acceleration using CUDA (NVIDIA), and unfortunately, my expensive Apple trash can computer uses AMD GPUs.

Luckily, some nice people built an AMI containing GPU-accelerated Caffe and all of its dependencies (Caffe on EC2 Ubuntu 14.04 Cuda 7). I had never brought an EC2 instance up, but Amazon's documentation is really good and it only took 15 minutes to get the instance running—mostly, the time was spent figuring out how to configure security settings. I chose to use the GPU instance g2.2xlarge—at $0.65/hour (on-demand), it's fairly cheap, and it was my first time, so I started small. ;) Adding 2 lines of code to Python scripts that use Caffe unlocked CUDA GPU acceleration... and I was in business! Everything is so easy these days—I'm always amazed.
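The two lines themselves aren't shown here. In pycaffe they would typically be `caffe.set_device(...)` and `caffe.set_mode_gpu()`, so here is a minimal sketch under that assumption. The `enable_gpu` helper and its `device_id` parameter are hypothetical names used for illustration; the module is passed in as an argument so the snippet can be read and run without Caffe installed.

```python
def enable_gpu(caffe_module, device_id=0):
    """Switch pycaffe from its default CPU mode to CUDA GPU mode.

    caffe_module: the imported `caffe` package (passed in rather than
                  imported here, so this sketch has no hard dependency).
    device_id:    index of the GPU to use; 0 is assumed for a
                  single-GPU instance such as g2.2xlarge.
    """
    caffe_module.set_device(device_id)  # pick which GPU to run on
    caffe_module.set_mode_gpu()         # run network passes on the GPU


# Typical use on a machine with CUDA-enabled Caffe:
#   import caffe
#   enable_gpu(caffe)
```

Calling the two functions once, before loading the network, should be enough; the rest of a Deep Dream script stays unchanged.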

The sample file I used was a 1920 x 1080 frame grab from a video I shot of snow monkeys in Japan (an amazing place to visit).

Source image for Deep Dream tests

Deep Dream did weird things, as expected:

Deep Dreaming with snow monkeys

I compared my Mac Pro and the EC2 g2.2xlarge instance with and without CUDA acceleration, running clouddream on each using inception_4c/pool at 10 iterations, 4 octaves, an octave scale of 1.4, and output resolution of 1280x720 (the defaults, other than the model and resolution). Here are the results:

  • Mac Pro, 3.5 GHz 6-core, no CUDA: 59.2 seconds
  • EC2 g2.2xlarge, no CUDA: 7 min, 2.7 seconds
  • EC2 g2.2xlarge, CUDA: 19.5 seconds

Not unexpectedly, GPUs make a huge difference. At 720p output, the EC2 g2.2xlarge CUDA instance would take about 10 minutes of processing per second of video (30 frames), or nearly 10 hours of processing per minute of video (10 hours = $6.50 in AWS costs). That actually isn't too bad: I could get a minute of video by processing overnight, and because it's an EC2 instance, I could also just spin up a bunch of instances and run frames in parallel.

On the Mac Pro, it would take about half an hour to process 1 second of footage, or about 30 hours to process 1 minute of video. The multi-core Mac is essentially idling during this processing, and I've processed as many as 4 frames at once, so I could get processing down to 7.5 hours per minute. That might be faster than a single EC2 CUDA instance could do it ("might," because I haven't tried anything in parallel in a single EC2 instance), but I'd be using a ton of electricity and generating a lot of heat at home using CPUs instead of a GPU.

The EC2 non-CUDA instance would take just under 9 days (!) to process a minute of video. Let's never try that!
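The estimates above are simple arithmetic on the per-frame timings, and a short script makes them easy to redo for other frame rates or prices. This is a rough sketch: the function name and the `parallel_jobs` parameter are made up for illustration, and it assumes perfect scaling when several frames run at once, which is optimistic.

```python
# Per-frame timings from the results list above, in seconds (1280x720 output).
SECONDS_PER_FRAME = {
    "Mac Pro, no CUDA": 59.2,
    "EC2 g2.2xlarge, no CUDA": 7 * 60 + 2.7,
    "EC2 g2.2xlarge, CUDA": 19.5,
}
FPS = 30                     # frames per second of source video
EC2_DOLLARS_PER_HOUR = 0.65  # on-demand g2.2xlarge price quoted above


def hours_per_video_minute(seconds_per_frame, fps=FPS, parallel_jobs=1):
    """Wall-clock hours needed to process 60 seconds of video.

    parallel_jobs is how many frames are processed at once; dividing by it
    assumes perfect scaling, which real hardware will not quite reach.
    """
    frames = fps * 60
    return frames * seconds_per_frame / parallel_jobs / 3600.0


for name, spf in SECONDS_PER_FRAME.items():
    hours = hours_per_video_minute(spf)
    print(f"{name}: {hours:.2f} hours ({hours / 24:.2f} days) per minute of video")

cuda_hours = hours_per_video_minute(SECONDS_PER_FRAME["EC2 g2.2xlarge, CUDA"])
print(f"EC2 CUDA cost per minute of video: ${cuda_hours * EC2_DOLLARS_PER_HOUR:.2f}")

mac_4_jobs = hours_per_video_minute(SECONDS_PER_FRAME["Mac Pro, no CUDA"], parallel_jobs=4)
print(f"Mac Pro with 4 frames at once: {mac_4_jobs:.2f} hours per minute of video")
```

Unrounded, the CUDA instance comes out at 9.75 hours (about $6.34) per minute of video, and the Mac Pro at 29.6 hours, or 7.4 hours with 4 frames at a time, which is where the rounded figures above come from.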

Here's a visualization of the run times:

I learned a ton while getting Deep Dream to work in a CUDA-accelerated environment. I had never used Docker, Caffe, or EC2, and it was fun to figure all of this out. Now, the fun begins; I have some interesting things in mind to run through Deep Dream, and the environment is fast enough that I can run tons of different models and settings to tune the algorithms to produce the most interesting output.

I've also started experimenting with Deep Dream in 360 "VR" (spherical) video, and think there is some potential there to make freaky things, as well!

Testing Deep Dream inception_4c/pool on a 360 spherical video (Mako's 1st birthday party). Frames processed at 1280x720 and played back at 15fps. makobday_inception_4c_pool_360