Last year I took over coordinating a three-week Bioinformatics module for the Interdisciplinary Bioscience Doctoral Training Centre in Oxford. Much of the course is taken up with computational practicals, which we usually run on the low-spec desktop PCs in the DTC. This year, though, we have been fortunate to receive a grant from Microsoft Azure to try using their cloud instead, which I believe has several advantages.
- We can create more realistic practicals since e.g. the students could try assembling a whole bacterial genome by launching an instance with sufficient physical memory or they could run a molecular dynamics simulation by launching a multi-core or GPU instance.
- The students will learn how to use a commercial cloud, which is a great transferrable skill and may also give them an advantage in whichever lab they choose to go to for their DPhil.
- The students can do the practical outside the DTC, since at present the firewall rules mean they cannot remotely access their machines.
- It should be easier for the lecturers to set up a Virtual Machine Image with all their software installed, any input files uploaded and even some instructions included.
- Currently the DTC have to install all the software required for each module on their PCs which is a difficult and complex task. Using a cloud avoids this entirely.
These instructions are therefore aimed at students and teachers who are using Azure to create and use Virtual Machines in a computational practical, but they might be useful if you are trying to use Azure for the first time.
Importantly, these instructions follow the procedure we chose to use, which is for all students and teachers to belong to the same subscription. This means everyone can automatically see any Images that have been created, but as I will explain in a subsequent post, it can bring some (unexpected) problems as well. There are probably more sophisticated ways of using Azure but this is how we did it (and by the time you read this, Microsoft may well have improved things yet further).
1. Sign up for a Microsoft account
If you don’t have a Microsoft account, go to https://signup.live.com and create one. It can be easier if you use a personal email address (like Gmail, Hotmail etc) rather than an institutional or business email address, as Microsoft may try to link the latter to e.g. your University Office365 account if you have one.
After entering your email address and password, Microsoft will send you a confirmation email with a code you will need to enter on the next page. Then you’ll need to prove you are a human by transcribing a Captcha. (This sometimes took me several attempts to get right.)
If you already have a Microsoft account, please check by logging in to
If you can’t remember the password, please reset it.
Whether you have just set up an account, or logged into your existing Microsoft account, you need to be able to see your Microsoft account screen as shown below.
If you are a student, at this point you need to let your course organiser know your Microsoft identity email address so they can add you to the course Azure subscription (probably using the Educator portal). This will give you access to some funds to launch and run Virtual Machines. If you are running Azure yourself, you’ll need to create your own pay-as-you-go subscription which has your credit card details attached – you can do this in the Azure Portal. All the following instructions assume you are a student getting access through your course organiser.
If you did send your email address to your course organiser there is one more important step. You will get an email like this from email@example.com and you must click Get Started, which will then invite you to log in and take you through to the Azure Portal.
If the above steps didn’t take you to the Portal, navigate there and log in. First we need to check you’ve been added to the subscription for your course.
Your email address will be shown at the top right of the screen. If you look directly beneath it you will see either a long alphanumeric string or an email address. What is correct will depend on exactly where your funds to use Azure have come from; since we were awarded some time by Microsoft, we saw the email address of the administrative owner of the grant.
3. Launching an Ubuntu Virtual Machine
Now let’s launch (“spin-up”) a small Linux virtual machine and log in remotely. Log into the Azure portal with your Microsoft identity and on the Dashboard click Create Resources (highlighted with a red box).
On the next window, choose Ubuntu Server 16.04 LTS, which at the time of writing was the most recent Long Term Support version of the Ubuntu Linux distribution.
On the next page you will need to enter some options. These are
- Name. This is the name that your Virtual Machine will be referred to by within the Azure portal.
Since we are all sharing the same subscription, and hence all our resources will appear in the same list, make sure you can identify what is yours by, for example, appending your initials to the name. Don’t call anything “test” or “VirtualMachine”! I’m going to call mine LinuxTestPWF, as PWF are my initials, so I can instantly see it is mine.
- VM disk type. Leave this as the default SSD.
- User name. This is the name of the Linux user you will log into the Virtual Machine as. Azure automatically creates a user on the Virtual Machine with this name when it spins it up.
- Authentication type. By default it will ask you for an SSH public key. If you know what this is, you probably know what you are doing. Otherwise, choose a password. Azure will insist it includes an upper case letter and a number to make it, you know, harder to remember.
- Subscription. In other words, how you are going to pay for the Virtual Machine. You should select the one that has been set up for your course (which should also be your only option).
- Resource group. These are a way of organising different resources, like Virtual Machines and Images. We created a Resource group for each sub-module of the course as well as one called 0_Azure_Test that is intended for testing so we’ve chosen this one. Please do not create your own!
- Location. Azure have Datacenters all over the globe. We are based in Oxford, UK and there are now two in the UK so we’ve chosen UK South as it currently has a larger selection of Virtual Machine types to choose from.
Once you’ve filled out the options, click OK.
Now we get to choose the size of the Virtual Machine. By size we mean the number and speed of CPU cores it has and how much physical memory it has. There are a large number of options, and each location only has a subset. Understandably, our free access doesn’t give us access to some of the largest and most expensive options; for example, there are Virtual Machines with 4x NVIDIA K80 GPU cards which would allow you to run molecular dynamics blazingly fast.
By default Azure shows you three small, cheap, general-purpose instance types. (If you want to see the full list available to you at your chosen location, click View all.) Let’s choose the first one, which is a D2S_V3 instance, as that should be good enough. Click it and then choose Select.
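If you prefer the command line to the View all button, the sizes on offer in a location can also be listed with the Azure CLI (introduced in section 4). This is only a sketch: the function name list_vm_sizes is my own, uksouth is the CLI spelling of UK South, and it assumes you have already run az login.

```shell
# List the VM sizes offered in one Azure location as a table.
# list_vm_sizes is a hypothetical helper name, not part of the Azure CLI.
# $1 = location name as the CLI spells it (defaults to uksouth, i.e. UK South)
list_vm_sizes() {
  az vm list-sizes --location "${1:-uksouth}" --output table
}

# Usage (needs the Azure CLI and a prior "az login"):
#   list_vm_sizes uksouth
```

What you can actually launch still depends on your subscription, so expect the portal to offer fewer sizes than this table shows.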
Now we are presented with some additional options. Leave all of them set to the defaults, except Auto-shutdown. This is a nifty feature that, when activated, will automatically shut down your instance at the time specified (7pm by default), so if you forget to stop your Virtual Machine you will only be charged up to 7pm that day.
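Auto-shutdown can also be set or changed later from the Azure CLI, once the Virtual Machine exists. A minimal sketch, assuming the resource group and VM name used in this post; set_auto_shutdown is a hypothetical helper name. Note that the CLI expects the time as hhmm in UTC, which will not match 7pm UK local time during the summer.

```shell
# Schedule a daily automatic shutdown for an existing VM.
# set_auto_shutdown is a hypothetical helper name.
# $1 = resource group, $2 = VM name, $3 = time of day as hhmm in UTC (default 1900)
set_auto_shutdown() {
  az vm auto-shutdown --resource-group "$1" --name "$2" --time "${3:-1900}"
}

# Usage (needs the Azure CLI and a prior "az login"):
#   set_auto_shutdown 0_Azure_Test LinuxTestPWF 1900
```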
Finally, the Azure Portal summarises all the choices we’ve made and asks us if we are happy to proceed.
You don’t have to check the box – just click Create and wait.
You’ll start to see some messages about your Virtual Machine being deployed. Azure is now finding an empty slot in the UK South datacenter and copying in an Ubuntu 16.04 image, adding the user you specified, booting it up and connecting it to the network etc. This can take a few minutes.
When it tells you it is done, click the All resources blade on the left. Notice that there is more than one resource here; Azure has not only created a Virtual Machine called LinuxTestPWF, but has also created a Virtual Network and a Network Interface and a Public IP address. Here you can see one of the key points of cloud; everything has been abstracted. In other words, my Virtual Machine is independent of the IP address which is independent of the Storage. Click on your Virtual Machine (if the list is very long, you might want to Filter by name, or change the type so only Virtual Machines are displayed).
This pane is very useful; it shows us all sorts of specifics, like whether the VM is running, the Location it is in, its Size, and, crucially, its IP address (in blue). Copy the Public IP address and, in a Terminal on your workstation or laptop, ssh into the remote Virtual Machine, giving your user name and the IP address of your Virtual Machine in the form username@ip-address.
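As a sketch of what that ssh command looks like, here it is assembled from two placeholder values. Both are assumptions: azureuser stands in for whatever user name you chose, and 203.0.113.10 is a reserved documentation address rather than a real Virtual Machine. The snippet only prints the command so you can check it before running it yourself.

```shell
# Placeholder values: replace with your own user name and the
# Public IP address shown in the portal.
VM_USER="azureuser"
VM_IP="203.0.113.10"   # reserved documentation address, not a real VM

# ssh expects the target in the form user@address.
SSH_TARGET="${VM_USER}@${VM_IP}"

# Print the command rather than running it, so it can be checked first.
echo "ssh ${SSH_TARGET}"
```

The first time you connect, ssh will ask you to confirm the host key of the new machine; answer yes and then enter the password you chose.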
You should then arrive at a prompt on your Azure Virtual Machine! Well done!
At this point you can install additional software and configure your Virtual Machine until it is how you want it. To copy some files (here, hello.txt) from your workstation to the VM, open another Terminal and use scp (or rsync if you are familiar with that).
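A minimal sketch of the scp step, wrapped in a small function so the pieces are easy to see. The name copy_to_vm and the example user name and IP address are hypothetical, and I am assuming the file should land in the remote user’s home directory, which is where scp can write without extra permissions.

```shell
# Copy one local file into the remote user's home directory on the VM.
# copy_to_vm is a hypothetical helper name.
# $1 = local file, $2 = VM user name, $3 = VM public IP address
copy_to_vm() {
  # The "~/" is left unexpanded locally and resolved on the remote side.
  scp "$1" "$2@$3:~/"
}

# Usage, with placeholder values:
#   copy_to_vm hello.txt azureuser 203.0.113.10
```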
Now comes the neat bit. Let’s say we want to go home, but we are not finished using our Virtual Machine and we don’t want to pay for having our VM “up” overnight. Well, if we set an Auto-shutdown time we could let it run until then. Or we could Stop the VM, which would free up the slot in the datacenter and release the IP address, and hence all we would be paying for would be the disk space necessary to store the virtual machine image. Then tomorrow morning we could Start the VM back up again and it would be exactly as we left it! One key difference is the IP address, as Azure will have assigned our VM a different one. Try it! Just be warned both stopping and starting a VM can take a few minutes each. Click Stop, wait until the Status is listed as Stopped (deallocated), then click Start.
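The same stop and start cycle can be sketched with the Azure CLI (introduced in section 4). The function names are my own. One thing to watch: the portal’s Stop button corresponds to az vm deallocate, whereas az vm stop only powers the machine off and leaves it allocated, so you would still be charged for it.

```shell
# Hypothetical helpers; both take $1 = resource group, $2 = VM name.

# Equivalent of the portal's Stop button: power off AND release the
# compute slot, so only storage is charged.
stop_vm() {
  az vm deallocate --resource-group "$1" --name "$2"
}

# Start the VM again, then print its public IP address, which may
# differ from the one it had before it was deallocated.
start_vm() {
  az vm start --resource-group "$1" --name "$2"
  az vm show --resource-group "$1" --name "$2" \
    --show-details --query publicIps --output tsv
}

# Usage (needs the Azure CLI and a prior "az login"):
#   stop_vm 0_Azure_Test LinuxTestPWF
#   start_vm 0_Azure_Test LinuxTestPWF
```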
This highlights an important difference between Stop and Delete. The latter will deallocate the Virtual Machine AND delete the Virtual Machine image so there is no going back. Since storage is generally a lot cheaper than using a computing slot in the Datacenter, it is ok to keep some stopped Virtual Machines, but please tidy up by deleting resources you no longer need, as you will rapidly get confused about which one is the right one. Since we all share the same subscription, the list of resources will get long very quickly…
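For tidying up from the command line, something like the following sketch may help; the function names are hypothetical. In my understanding, deleting a VM through the CLI does not necessarily remove its disk, network interface and public IP address, so it is worth listing what is left in the resource group afterwards.

```shell
# Hypothetical helpers for tidying up a shared resource group.

# Show everything in a resource group so you can spot your own leftovers.
# $1 = resource group
list_group_resources() {
  az resource list --resource-group "$1" --output table
}

# Delete a VM without the interactive confirmation prompt.
# There is no undo, so double-check the name first.
# $1 = resource group, $2 = VM name
delete_vm() {
  az vm delete --resource-group "$1" --name "$2" --yes
}

# Usage (needs the Azure CLI and a prior "az login"):
#   delete_vm 0_Azure_Test LinuxTestPWF
#   list_group_resources 0_Azure_Test
```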
4. Creating an Image from a Virtual Machine
This section is more relevant for instructors, but I’ve included it since it shows how you can set up a Virtual Machine with input files, instructions and programs and then capture it as an Image. Other people belonging to the same subscription can then use that Image to launch Virtual Machines, so they can rapidly try out your code or, in our case, run a Bioinformatics exercise. Don’t go through it step-by-step unless you really want to try it out. This might be useful for the Hackathon though.
There are some very good instructions by Microsoft on how to create an image of a Linux Virtual Machine so I will just summarise the main steps here. On your local computer you would also need to have installed the Azure Command Line Interface (CLI) but you may find it already installed.
As a simple (but daft) test we will save the Virtual Machine that was created in the previous section which is identical to a clean Ubuntu 16.04 server with one key difference; there is a file called hello.txt at the root of the filesystem.
First, we must deprovision the VM. Typically, you also remove the last installed user, so that when people specify their own user when they set up their own Virtual Machine based on this image you only get one user account. This means that you can’t leave anything important in the $HOME directory of the user account. If you want to keep the user account, simply omit +user from the following command. In a terminal that is logged into your Virtual Machine, in the state that you wish to save as an Image, type
sudo waagent -deprovision+user
Once you answer yes there is no going back. Azure will delete the user account and start the process of turning this Virtual Machine into an Image. Quit the Virtual Machine by typing exit.
All the following commands are run in your local Terminal and use the Azure CLI. First we need to log in to Azure on the command line.
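A sketch of the login step, with one assumption flagged: recent versions of the Azure CLI open a browser window directly when you run az login, and the --use-device-code flag asks for the URL-and-code flow described next. The wrapper name azure_login is my own, and the second command simply prints the name of the subscription the CLI will use, as a sanity check.

```shell
# Hypothetical helper: log in, then confirm the active subscription.
azure_login() {
  # Prints a URL and a one-time code to enter in a browser.
  az login --use-device-code
  # Show the name of the subscription that later commands will act on.
  az account show --query name --output tsv
}

# Usage (needs the Azure CLI installed):
#   azure_login
```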
Copy the URL into a browser, enter the code and then log into Azure, closing the browser window when prompted. After about 20 seconds some text will appear in the Terminal and you are now logged in! First we deallocate the Virtual Machine, changing the resource group and name where appropriate (this step can take a minute or two)
az vm deallocate --resource-group 0_Azure_Test --name LinuxTestPWF
Next we generalize it
az vm generalize --resource-group 0_Azure_Test --name LinuxTestPWF
Finally, we create the Image. Make sure you choose a suitable name!
az image create --resource-group 0_Azure_Test --source LinuxTestPWF --name UbuntuTestImage
To summarise, we have created an Image called UbuntuTestImage in the 0_Azure_Test Resource group. Anyone belonging to the DTP_Bioinformatics subscription can then see this image.
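To check that the Image really was created, you can list the Images in the resource group. A minimal sketch; list_images is a hypothetical helper name and the resource group is the one used above.

```shell
# Hypothetical helper: list the Images stored in a resource group.
# $1 = resource group
list_images() {
  az image list --resource-group "$1" --output table
}

# Usage (needs the Azure CLI and a prior "az login"):
#   list_images 0_Azure_Test
```

UbuntuTestImage should appear in the table; if it does not, check that the generalize step completed without errors.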
5. Launching a Virtual Machine from a Saved Image
This is very similar to the steps we went through to create an Ubuntu 16.04 Virtual Machine, except this time click on the name of the Image that is stored in the DTP_Bioinformatics subscription and then click Create VM.
Now you should see this screen, exactly the same as before! Simply pick up the instructions from this point in the first section and soon you’ll be able to remotely login using ssh.
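For completeness, here is a sketch of launching a Virtual Machine from the saved Image with the Azure CLI instead of the portal. Several things are assumptions on my part: the helper name create_vm_from_image, the example VM and user names, the use of SSH keys rather than a password, and that the Image lives in the same resource group as the new VM (otherwise I believe --image needs the full resource ID of the Image). Standard_D2s_v3 is the CLI spelling of the D2S_V3 size chosen earlier.

```shell
# Hypothetical helper: create a VM from a saved Image.
# $1 = resource group, $2 = new VM name, $3 = Image name, $4 = admin user name
create_vm_from_image() {
  az vm create \
    --resource-group "$1" \
    --name "$2" \
    --image "$3" \
    --size Standard_D2s_v3 \
    --admin-username "$4" \
    --generate-ssh-keys
}

# Usage, with placeholder VM and user names:
#   create_vm_from_image 0_Azure_Test FromImagePWF UbuntuTestImage azureuser
```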