This is the second in a series of posts about running your own Git server on an EC2 image.

In my previous post I explained why I chose Amazon’s EC2 Spot Instances to host my Git server. In this post I explain which OS I chose, how I provisioned the server, and how I configured the server.

Provisioning

Two of the early choices to be made when provisioning a server in the cloud are:

  • OS selection
  • Machine sizing

These are not completely independent decisions. The OS may have minimum requirements that restrict smaller machine sizes or capacity limits that make it pointless to choose the largest machines.

Machine sizing

In this case—running a remote Git server—the machine sizing decision is straightforward. Because this is not a processor- or memory-intensive application, I selected the smallest, cheapest machine configuration available. The good news is that EC2 offers very small (cheap) images that are 64-bit; they are called “Micro” images. Micro images have less than a gigabyte of memory and only one CPU core. As a Spot Instance, Micro images can cost less than a penny per hour to run.

OS selection

EC2 permits both Linux and Windows instances, but charges more for the Windows instances. As of this writing, Windows EC2 Spot Instances are approximately twice as expensive as Linux instances. Nothing about what I’m doing requires Windows, so running Linux seems to be the economical choice.

Of course if I knew nothing about Linux or Unix and had deep knowledge of administering a Windows Server system, then I might choose the more expensive Windows image. Although I do Windows software development and a bit of sysadmin work in my day job, I’m reasonably familiar with Linux administration. For me, running Linux seems to be the optimal choice.

The next question is: which Linux? I chose the latest Ubuntu release. I did so because Ubuntu has features specific to the EC2 environment, because Canonical actively maintains a suite of images for EC2, and because there is a large community of Ubuntu EC2 users. Especially attractive is the Ubuntu concept of LTS (Long Term Support) releases. Every couple of years Ubuntu designates a release as LTS; LTS versions of Ubuntu Server are supported for 5 years (instead of the usual 1- or 2-year support period). By using an LTS version, I will continue to get patches for the next 5 years and will not have to worry about adapting to significant OS changes. Nice.
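As a side note, once an instance is up it is easy to confirm exactly which release you landed on. A minimal check (assuming any modern Linux with /etc/os-release; on Ubuntu, lsb_release -d reports the same information):

```shell
# Print the human-readable distribution name and version.
# On an Ubuntu 12.04 instance this would show "Ubuntu 12.04.1 LTS".
grep '^PRETTY_NAME=' /etc/os-release
```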

Configuring a spot instance

One of the more interesting things about running servers in the cloud is that they can be terminated and restarted at any time. In a managed cloud such as Microsoft Azure, the vendor can take down and spin up images as needed to apply patches or reorganize the physical distribution of virtual machines. In an unmanaged cloud like EC2, there can still be outages when the provider needs to do hardware maintenance or change deployments to meet service-level agreements. And in the case of EC2 Spot Instances, instances are terminated whenever the market spot price rises above your bid.

The key to success in this environment is to use a combination of persistent storage volumes and clever initialization scripts to recreate the server’s capabilities when an instance is reprovisioned after termination. For spot instances, EC2 offers two kinds of spot requests: one-time and persistent. A one-time request is removed after it is satisfied or expires. A persistent request requeues after being satisfied, so that if EC2 terminates the spot instance, the persistent request ensures that a new instance is started when conditions permit.

Here is what my persistent request looks like when submitted via the EC2 command line interface:

#! /bin/sh
#
# Create and run an amd64 Ubuntu 12.04.1 LTS (Precise Pangolin) image
# [20120822]
#
ec2-request-spot-instances ami-3d4ff254 --price 0.005 --instance-count 1 \
  --type persistent --group devserver \
  --block-device-mapping "/dev/sdf=snap-b4cbe3ab::true" \
  --key my.id_rsa -t t1.micro --monitor \
  --availability-zone us-east-1c \
  --user-data-file setup/user-data.sh

This will instantiate a single (--instance-count 1) micro instance (-t t1.micro) using one of the Ubuntu images (ami-3d4ff254) as long as the spot price is at or below ½ cent (--price 0.005).
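To put that price cap in perspective, here is the worst-case arithmetic for a machine that runs at the full bid price around the clock (the actual charge is the lower market price, so this is an upper bound):

```shell
# Upper bound on monthly cost: $0.005/hour * 24 hours * 30 days
awk 'BEGIN { printf "at most $%.2f per month\n", 0.005 * 24 * 30 }'
# prints: at most $3.60 per month
```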

I’ve also requested that EC2 monitor activity (--monitor) and restricted access to the system to a particular security group (--group devserver). To eliminate cross-zone bandwidth charges, I launch the instance in the same availability zone (--availability-zone us-east-1c) that holds my EBS volumes.

When started, EC2 will instantiate and attach an EBS volume built from a snapshot that contains some initial data and scripts (--block-device-mapping "/dev/sdf=snap-b4cbe3ab::true") and execute a script (--user-data-file setup/user-data.sh) once the system has finished booting. To keep storage costs down, that volume is just 1GB.

That startup script will mount the attached volume, copy some key files into the default user account’s home directory tree, and then will run another setup script that installs security patches and reboots the system (after scheduling a third script to run after the reboot). That third script mounts more volumes, installs extra software not part of the default Ubuntu image, and adds some users.

I’ll discuss the setup scripts in another post. As a teaser, here is what the startup script, setup/user-data.sh, looks like:

#! /bin/bash
#
mkdir -m 000 /secrets
mount -t ext4 -o noatime,nodiratime /dev/xvdf /secrets

# Put an entry in fstab so the volume is remounted on reboot
echo "/dev/xvdf /secrets ext4 noatime,nodiratime 0 0" >> /etc/fstab

# Copy items from /secrets into the default user's home directory
cp -R /secrets/users/ubuntu/.ssh /home/ubuntu/
chown -R ubuntu:ubuntu /home/ubuntu/.ssh
cp /secrets/users/ubuntu/.inputrc /home/ubuntu/
chown ubuntu:ubuntu /home/ubuntu/.inputrc

# Run the first setup script (which should update kernel and extant packages)
#
mkdir /home/ubuntu/init
/secrets/setup/first-setup.sh > /home/ubuntu/init/first-setup.sh.log 2>&1

# restart the system to pick up kernel changes
#
shutdown -r +1

Notice that the mount point for that initial volume is called /secrets. Because the /secrets tree contains private keys and other files used for authentication, the directory grants no access permissions to any user. You have to use sudo to list its contents or view its files, and only the ubuntu user will be configured to permit the use of sudo. Other user accounts can then be created and used by normal users without fear of losing control of AWS authentication and privileged access.
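The effect of that 000 mode is easy to see with a throwaway directory (sketched here under a temp path so it runs without root; the startup script does the equivalent with mkdir -m 000 /secrets):

```shell
# Create a directory and strip all permissions, as the startup
# script does for /secrets with `mkdir -m 000`.
dir=$(mktemp -d)
chmod 000 "$dir"

# The octal mode is now 0: no read, write, or execute bits for anyone.
mode=$(stat -c '%a' "$dir")
echo "mode: $mode"

# A non-root user cannot list the directory; root (via sudo) still can.
ls "$dir" 2>/dev/null || echo "permission denied (as intended)"

chmod 700 "$dir" && rmdir "$dir"   # clean up
```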

Next post

In my next post, I’ll discuss further the automated configuration of the system. Assuming that does not overflow into yet another setup post, the deployment of the Git server should follow in the post after that.