These are the results from running the ml-style-transfer project on three different AWS EC2 instance types.
|Instance Name||$Cost/hour||250 epochs||2,500 epochs||50,000 epochs|
The p3.2xlarge is 32x more expensive per hour. Yet, for the 2,500 epoch tests, it is 266x faster, which combines to be 8x more cost-effective (e.g. $0.05 versus $0.40 for 2,500 epochs).
The p3.2xlarge also gets results faster (wall clock) – the 50,000 epoch run on p3.2xlarge only took 15 minutes wall clock, yet the projected run on t2.large is 4,892 minutes (over 3 days).
The c5.4xlarge is a “compute optimized” – large vCPU, large RAM. It is 7x the hourly price of the t2.large, and on the 2,500 epoch test, delivers about that much wall clock improvement – so, basically the same cost, but 7x faster delivery of results.
Note: the total test time was approximately 40 minutes on the p3.2xlarge ($2.04) and 280 minutes on the t2.large ($0.44). But, for a mere 5x total cost difference, the p3.2xlarge performed a 50,000 epoch run that would have taken forever on the t2.large.
In contrast, purchasing a RTX 3080Ti for $900 and using it for the 2,500 epoch run took 16 seconds (which is barely slower than the current record-holder p3.2xlarge at 14 seconds).
Next up is to upgrade my AWS account to allow “spot” pricing for p3.2xlarge – if Amazon will allow it for my non-commercial account. The $3.06 on-demand price seems to drop to about $1.01 for spot instances. Update (1 day later): “We have approved and processed your service quota increase request”. So my “EC2->Limits->All P Spot Instance Requests now says “8 vCPUs”, which is enough for one p3.2xlarge instance.
Just FYI: when that AMI is run under a non-GPU machine type, the first run takes an extra 5+ minutes, as the system says “Matplotlib is building the font cache; this may take a moment.” It only does this the first run. (This is an example of a thing that cause differences in “python elapsed time” and “wall clock elapsed time”.)