Routines Tested
The EC2 instance used was a High-CPU c1.xlarge instance (7 GB of memory and 8 virtual cores). The SMP Library contains 204 tuned routines and a total of 337 enhanced routines for use on multicore machines, from which I tested f11me (sparse matrix factorization) and g02bn (Kendall/Spearman rank correlation coefficients). While I had a maximum of 8 cores available, I decided to increase the number of threads beyond this, just to see the result. Below you will see how the time taken scales as the number of threads increases:
Both routines scale well as you increase the number of threads, but f11me takes longer with 12 threads than with 8! I suspect the slowdown is a result of dependencies between threads: some threads may have to wait for other parts of the matrix factorization to finish before they can start. g02bn, on the other hand, doesn't require communication between threads, so each one can run independently. This routine benefited slightly from running on 12 threads rather than 8.
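If you want to repeat this kind of experiment yourself, a small timing harness helps. The following is only a sketch under assumptions: it supposes the benchmark is a separate executable whose thread count is controlled through the OpenMP `OMP_NUM_THREADS` environment variable, and the function names `time_with_threads` and `speedup` are my own, not part of any library. The command at the bottom is a placeholder to be replaced with your real program.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, thread_counts):
    """Run cmd once per thread count and return {threads: wall-clock seconds}."""
    timings = {}
    for n in thread_counts:
        # Assumption: the program under test reads OMP_NUM_THREADS at start-up.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        timings[n] = time.perf_counter() - start
    return timings


def speedup(timings):
    """Speedup of each run relative to the smallest thread count measured."""
    base = timings[min(timings)]
    return {n: base / t for n, t in timings.items()}


if __name__ == "__main__":
    # Placeholder command: substitute the real benchmark executable here.
    cmd = [sys.executable, "-c", "pass"]
    results = time_with_threads(cmd, [1, 2, 4, 8, 12])
    for n, s in sorted(speedup(results).items()):
        print(f"{n:2d} threads: {results[n]:.3f} s, speedup {s:.2f}x")
```

A speedup that falls when going from 8 to 12 threads on an 8-core machine is the pattern described above for f11me. Single runs are noisy, so in practice you would repeat each measurement a few times.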
This raises an important point: one should always approach parallel programming cautiously. Some programs take longer when run across more threads, and a routine that is not thread-safe can even return an incorrect answer (thankfully, the NAG SMP Library is thread-safe, so we don't have to worry).
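To make the thread-safety point concrete, here is a minimal Python sketch of my own; it has nothing to do with how the NAG routines are implemented. Several threads increment a shared counter. The `add_unsafe` method does a read-modify-write without protection, so updates can be lost; `add_safe` guards the same update with a lock. The class and function names are hypothetical.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add_unsafe(self, n):
        # Read-modify-write without a lock: two threads can read the same
        # old value, in which case one of the two increments is lost.
        for _ in range(n):
            current = self.value
            self.value = current + 1

    def add_safe(self, n):
        # The lock makes each increment atomic with respect to other threads.
        for _ in range(n):
            with self._lock:
                self.value += 1


def run(method_name, threads=8, per_thread=50_000):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()
    workers = [
        threading.Thread(target=getattr(counter, method_name), args=(per_thread,))
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    expected = 8 * 50_000
    print("expected:", expected)
    print("safe:    ", run("add_safe"))
    print("unsafe:  ", run("add_unsafe"))  # may fall short of expected
```

The safe version always reaches the expected total. The unsafe version may or may not come up short on a given run, which is exactly what makes such bugs hard to spot.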
CPU Utilization
In my last post we saw the CPU utilization didn't exceed 20% of the maximum. This left me slightly uneasy, as I was paying for 8 cores, but I could not use all of them! The CPU utilization for the above test looks better:
Ahh finally, we are able to use all the virtual cores on EC2.
One last interesting note: during the initial testing of the Library on the Cloud, I ran into an allocation error with the 600 MB of memory available (my initial testing always uses the free tier before scaling up to a High-CPU instance). The allocation error is easily remedied by changing the instance type. It takes only minutes to scale up to a large, cluster compute, or high memory instance.
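A quick back-of-the-envelope check can tell you in advance whether a problem is likely to fit on a given instance. The sketch below assumes a dense array of 8-byte double-precision values and ignores workspace and operating system overhead; the function names and the 10,000 x 10,000 size are illustrative only, not the dimensions used in my tests.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Bytes needed to hold a dense rows x cols array of fixed-size values."""
    return rows * cols * bytes_per_value


def fits_in_memory(rows, cols, available_mb, bytes_per_value=8):
    """True if the array alone fits in available_mb (1 MB taken as 1024**2 bytes)."""
    needed = dense_matrix_bytes(rows, cols, bytes_per_value)
    return needed <= available_mb * 1024 * 1024


if __name__ == "__main__":
    rows, cols = 10_000, 10_000  # illustrative size only
    needed_mb = dense_matrix_bytes(rows, cols) / (1024 * 1024)
    print(f"{rows} x {cols} doubles need about {needed_mb:.0f} MB")
    print("fits in 600 MB:", fits_in_memory(rows, cols, 600))
    print("fits in 7 GB:  ", fits_in_memory(rows, cols, 7 * 1024))
```

Real routines also allocate workspace, so treat a result close to the limit as a sign to move up an instance size.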
In Summary
-The NAG SMP Library works on EC2 and runtime scales well across the number of cores.
-Amazon's EC2 is a powerful cloud computing tool that allows you to easily increase Memory/CPU capacity (obtaining a maximum of 70GB memory or 16 cores).
-Care should be exercised when parallel programming.