In this post, I’ll spell out some of the problems we encountered using Microsoft Azure to run a 3-week course for about 30 postgraduates in a typical “computer lab”. As you’ll see, a group of cloud-naive, highly intelligent postgraduates is capable of breaking nearly anything and might well constitute the perfect resilience test for your cloud software, if it survives, that is…
As noted in the Azure instructions, we chose the simplest approach, which was for all teachers and students to share the same Azure Subscription, using different Resource Groups for different parts of the course. With about 30-40 people in total, you quickly end up with a LOT of VMs, Public IP addresses and so on. If they aren’t well named, it all gets very confusing, and no-one wants to clean up because they are afraid of deleting someone else’s Virtual Machine.
(a) Insist on a common naming scheme for resources and clean up as you go.
But even so you will find the odd, err, pedantic student. We asked them to call their VM “phylogenetics_YOURNAME” so guess what? Yep. Someone made one called “phylogenetic_YOURNAME”. The only solution I can see to this is ruthless deletion of anything that violates your naming scheme in BOFH style.
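If you do go down the ruthless route, it helps to have something that spots the offenders for you. Here is a minimal sketch in Python, assuming you already have a plain list of resource names (for example, copied out of the portal). The prefix comes from our practical; the function name and the set of allowed characters after the underscore are my own assumptions.

```python
import re

# Assumed convention: "phylogenetics_<name>", e.g. "phylogenetics_alice".
# The prefix is the one we asked for; the allowed characters after the
# underscore (letters and digits only) are an assumption for this sketch.
NAME_PATTERN = re.compile(r"phylogenetics_[A-Za-z0-9]+")


def find_violations(resource_names):
    """Return the names that do not match the agreed scheme, in input order."""
    return [name for name in resource_names if not NAME_PATTERN.fullmatch(name)]
```

Anything this returns is a candidate for a polite warning first and deletion second.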
(b) Realise that there are resource limits per subscription
We didn’t. And, as we hadn’t followed (a), we quickly discovered that, by default, you can’t have more than 60 Public IP addresses per subscription. (Later we also found there is a limit of 100 Security Groups per subscription.) This led to some slightly panicked deleting, during which an over-zealous student deleted the Image that one of the teachers had lovingly created for their course…
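A simple head count before each practical would have saved us the panic. The sketch below, in Python, assumes you can get hold of a flat list of resource types for the subscription; the type labels ("public_ip", "security_group") and the 80% warning threshold are made up for illustration, and the limits are simply the two we ran into, which may not match your subscription.

```python
from collections import Counter

# Limits as we hit them on our subscription; check your own, they may differ.
LIMITS = {"public_ip": 60, "security_group": 100}


def quota_report(resource_types, limits=LIMITS, warn_fraction=0.8):
    """Return {type: (count, limit, status)} for every type that has a limit.

    resource_types is a flat list with one label per existing resource.
    Types without an entry in limits are ignored.
    """
    counts = Counter(resource_types)
    report = {}
    for rtype, limit in limits.items():
        used = counts[rtype]
        if used >= limit:
            status = "at limit"
        elif used >= warn_fraction * limit:
            status = "warning"
        else:
            status = "ok"
        report[rtype] = (used, limit, status)
    return report
```

With 30-40 people each creating a VM and a Public IP per practical, a "warning" here is the cue to clean up calmly rather than in the middle of a class.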
(c) If you want to keep it, Lock It.
Fortunately, the Image that was deleted was for the previous day’s course (not the next day’s). One of the teachers found that you can put delete locks on resources, which is a very, very good idea for anything that you value, as students can and will delete everything to get their VM working. More on this soon.
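For the record, locks can also be created from the Azure CLI with `az lock create`, not just through the portal. The sketch below only builds the command line in Python so you can inspect it before running anything; the lock, resource group and image names in the usage comment are hypothetical, and this is my reconstruction rather than exactly what our teacher did.

```python
def delete_lock_command(lock_name, resource_group,
                        resource_name=None, resource_type=None):
    """Build an `az lock create` command that adds a CanNotDelete lock.

    With only a resource group, the lock covers everything in that group.
    Pass both resource_name and resource_type to lock a single resource.
    """
    if (resource_name is None) != (resource_type is None):
        raise ValueError("give both resource_name and resource_type, or neither")
    cmd = [
        "az", "lock", "create",
        "--name", lock_name,
        "--lock-type", "CanNotDelete",
        "--resource-group", resource_group,
    ]
    if resource_name is not None:
        cmd += ["--resource-name", resource_name,
                "--resource-type", resource_type]
    return cmd


# Hypothetical usage (names invented for illustration):
#   import subprocess
#   cmd = delete_lock_command("keep-course-image", "teachers-rg",
#                             "day2-image", "Microsoft.Compute/images")
#   subprocess.run(cmd, check=True)
```

Note that a CanNotDelete lock still lets people modify the resource; Azure's other lock type, ReadOnly, blocks changes as well.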
(d) Expect some students to ignore the instructions.
Not a new problem. In one of the practicals, the students accessed the remote VM graphically via RDP (rather than simply via the terminal), so port 3389 had to be open as well as the default port 22. The practical clearly explained that you should attach the pre-defined security group that had this rule, but about half the class ignored this and created their own security group which, of course, did not have port 3389 open. They couldn’t connect graphically and then complained.
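A quick check like the one below would have told us who was about to complain before they did. It is a sketch in Python that assumes you have, for each VM, the list of inbound ports its security group allows; how you collect that list is up to you, and the function name is my own.

```python
# Ports the practical needed: 22 for SSH and 3389 for RDP.
REQUIRED_PORTS = {22, 3389}


def missing_ports(allowed_inbound_ports, required=REQUIRED_PORTS):
    """Return the required ports that are not in the allowed list, sorted."""
    return sorted(set(required) - set(allowed_inbound_ports))
```

An empty list means the VM is reachable both ways; `[3389]` means terminal only, which was the state half our class was in.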
(e) No really, if you want to keep something, especially if it is critical, LOCK IT
Which brings us back to here, again. You need to picture the scene: about 15 students can only access their VM via the terminal because they haven’t followed the instructions, and so are a bit grumpy, while the other half are happily working away using an RDP clone on Ubuntu. Then one of the students, for reasons unknown, edits the shared Security Group and deletes the port 3389 rule. Instantly, half the class (who were feeling smug) are booted off their remote VM. Instant panic and confusion. Try figuring that one out on the hoof. We couldn’t; it was only later, looking through the logs, that we could see students trying to edit the shared Security Group.
So, after all of this, I was pleasantly surprised that the feedback was positive and all the students could see how using the cloud could help them in their research.
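If you find yourself doing the same detective work, something like the following can narrow the logs down to the interesting entries. It is a Python sketch over log entries that have already been exported as dictionaries; the field names ("time", "caller", "operation", "resource_id") are my assumptions about the export, not a fixed Azure format, so adjust them to whatever your export actually contains.

```python
def _security_group_name(resource_id):
    """Pull the security group name out of a resource ID path, or None."""
    parts = resource_id.lower().strip("/").split("/")
    if "networksecuritygroups" in parts:
        i = parts.index("networksecuritygroups")
        if i + 1 < len(parts):
            return parts[i + 1]
    return None


def changes_to_group(log_entries, group_name):
    """Return (time, caller, operation) for writes/deletes on one group.

    Matches the group itself and anything beneath it, such as its rules.
    Results are sorted, so ISO-style timestamps come out in time order.
    """
    hits = []
    for entry in log_entries:
        operation = entry["operation"].lower()
        is_change = operation.endswith("/write") or operation.endswith("/delete")
        same_group = _security_group_name(entry["resource_id"]) == group_name.lower()
        if is_change and same_group:
            hits.append((entry["time"], entry["caller"], entry["operation"]))
    return sorted(hits)
```

Run against the shared group's name, this lists who changed or deleted what and when, which is exactly the question we could not answer during the class.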