This is the third in a series of posts about running your own Git server on an EC2 image.

In my previous post I explained how to request the initiation of an EC2 Spot Instance, and I gave an overview of the scripts that provision the server during the boot process. In this post I present the details of those scripts.

Overview

As described in the previous post, when the server is started it executes the setup/user-data.sh script passed to it via the ec2-request-spot-instances command. That script mounts a small volume at the /secrets mount point; /secrets contains all the user files, scripts, and supporting data needed to provision the system.
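For reference, a minimal user-data.sh along those lines might look like the sketch below. This is only a sketch: the device name for the secrets volume (/dev/xvdf here) is an assumption, and the log location simply mirrors the first-setup.sh.log file mentioned later in this post.

#! /bin/bash
#
# Sketch only. Assumes the secrets volume was attached at launch (via the
# block device mapping in the spot request) and shows up as /dev/xvdf.
mkdir -p /secrets
mount -t ext4 /dev/xvdf /secrets

# Hand off to the first-stage setup script, logging its output for troubleshooting
mkdir -p /home/ubuntu/init
/secrets/setup/first-setup.sh > /home/ubuntu/init/first-setup.sh.log 2>&1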

Secrets

The mount point is called /secrets because it contains the AWS secret IDs needed to perform various AWS actions, such as mounting EBS volumes and updating Route53 DNS info.

Here is a directory listing of /secrets on my system. Userids in filenames have been changed:

/secrets:
.  ..  lost+found  setup  users

/secrets/lost+found:
.  ..

/secrets/setup:
.                    attachTaggedVolume.py  second-setup.sh  statusVolume.sh
..                   first-setup.sh         setr53.py        user-data.sh
ak                   hz                     setupUsers.sh
ak.src               mountDataVolume.sh     sk
attachDataVolume.py  second-setup.conf      statusVolume.py

/secrets/users:
.  ..  git  bob  ubuntu

/secrets/users/git:
.  ..  .inputrc  .ssh

/secrets/users/git/.ssh:
.  ..  git.id_rsa.pub  gitolite-admin.pub  gitolite.keys

/secrets/users/bob:
.   bashrc-supplement-osxterm.src  .inputrc     .ssh
..  bashrc_supplement.src          link-dev.sh

/secrets/users/bob/.ssh:
.                 carol-id_rsa.pub
..                github-bob-id_rsa
config            github-bob-id_rsa.pub
alice-id_rsa      bob-id_rsa
alice-id_rsa.pub  bob-id_rsa.pub
carol-id_rsa

/secrets/users/ubuntu:
.  ..  bashrc-supplement-osxterm.src  bashrc_supplement.src  .inputrc  .ssh

/secrets/users/ubuntu/.ssh:
.  ..  authorized_keys

Of primary interest are the files in the /secrets/setup folder.

ak
hz
sk
These files contain special identifiers used by AWS: `ak` holds the AWS access key; `sk` holds the AWS secret key; `hz` holds the Route53 hosted zone identifier.
attachDataVolume.py
attachTaggedVolume.py
These files are identical; `attachDataVolume.py` is not used. `attachTaggedVolume.py` is a Python script that uses the boto library to perform various AWS operations, most importantly attaching EBS volumes to the instance. It uses tags to locate and identify the volumes that should be attached.
first-setup.sh
This is the script invoked by `user-data.sh` during the first boot of the newly created instance. It is responsible for identifying and applying the security updates that are needed, and for configuring the system to run the `second-setup.sh` script after a reboot. The last thing `first-setup.sh` does is schedule a nearly immediate reboot of the system.
mountDataVolume.sh
Not used. A hard-coded script used to mount the data volume. Probably generated as an experiment when I was figuring out how to automatically provision the server.
second-setup.conf
This is an upstart job that will launch on the second boot (the reboot triggered by `first-setup.sh`). Its only purpose is to run the `second-setup.sh` script and then remove itself.
second-setup.sh
This script installs all the optional software packages needed to provision the server, mounts the remaining volumes, sets up the user accounts, sets up gitolite, and modifies the Route53 DNS setup to map my subdomain to the server's public hostname.
setr53.py
A Python script that uses boto to configure Route53 – specifically, to update the DNS entry for the subdomain with the new public hostname of the server.
setupUsers.sh
Configures the home directory contents for the ubuntu user, for the gitolite user, and for the normal user account (in this case, “bob”).
statusVolume.py
statusVolume.sh
A Python script and the shell script that uses it to show the status of a particular AWS volume. Just used for experimentation, not used as part of the automatic provisioning process.
user-data.sh
A copy of the user-data.sh script used on my local system. I just keep a copy on the EC2 image so that I can reference it when maintaining the scripts.

first-setup.sh

first-setup.sh is invoked at the end of execution of user-data.sh. It looks like this:

#! /bin/bash
#

# Establish source list of only 'authorized' apt sources
#
cat /etc/apt/sources.list | egrep '^(deb|deb-src)[[:space:]]' \
    | grep '[[:space:]]main$' > /etc/apt/sec.sources.list

# Get security and distribution updates
#
apt-get update --assume-yes
apt-get dist-upgrade --assume-yes \
   -o Dir::Etc::sourcelist=/etc/apt/sec.sources.list

# Setup the second setup script (as an upstart job) to run after the reboot
#
cp /secrets/setup/second-setup.conf /etc/init

# restart the system to pick up kernel changes
#
shutdown -r +1

The first thing the script does is strip from the sources.list file any package sources that are not fully supported by Ubuntu: only uncommented lines that start with “deb” or “deb-src” and end with “main” are kept (lines for “partner”, “universe”, or “multiverse”, for example, are dropped).
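For example, given an illustrative sources.list (the mirror URL and release name here are made up for this example), the filter keeps the first two lines below and drops the rest:

# kept: uncommented deb/deb-src lines ending in "main"
deb http://archive.ubuntu.com/ubuntu precise main
deb-src http://archive.ubuntu.com/ubuntu precise main

# dropped: other components, and commented-out lines
deb http://archive.ubuntu.com/ubuntu precise universe
# deb http://archive.ubuntu.com/ubuntu precise-backports main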

Next, an apt-get update followed by an apt-get dist-upgrade will apply any OS or security updates needed to get the image up to the current patch level. Note: as time goes by, this operation takes longer and longer. You can shorten this by using a different, newer, EC2 AMI to initialize the system.

Once the updates are done, the script configures an upstart job to run second-setup.sh when the system reboots, and then it reboots the system.

Here is the upstart job file:

start on (local-filesystems and static-network-up)
script
    # run second setup script (after reboot)
    #
    /secrets/setup/second-setup.sh > /home/ubuntu/init/second-setup.sh.log 2>&1
end script
post-stop script
    # clean up
    rm /etc/init/second-setup.conf
end script

Everything first-setup.sh does is logged to /home/ubuntu/init/first-setup.sh.log (and, as the job definition above shows, the output of second-setup.sh goes to second-setup.sh.log in the same directory), so you can use those logs to troubleshoot or verify the provisioning.

second-setup.sh

second-setup.sh is invoked during the second boot of the system (the reboot scheduled by first-setup.sh). The script looks like this:

#! /bin/bash
#
apt-get install git-core --assume-yes
apt-get install git-doc --assume-yes
#
apt-get install python3 --assume-yes
apt-get install python3-dbg --assume-yes
apt-get install python3-doc --assume-yes

# Attach the data volume
#
INSTANCE_ID_=`curl -s -g http://169.254.169.254/latest/meta-data/instance-id`
DEVICE_='/dev/sdg'
MOUNT_HOST_='dev.bob.org'
MOUNT_NAME_='data'
python /secrets/setup/attachTaggedVolume.py $INSTANCE_ID_ $DEVICE_ \
    $MOUNT_HOST_ $MOUNT_NAME_
RC_=$?
if [[ "X0" != "X$RC_" ]]; then
    echo "Problem attaching volume; cannot mount."
    exit 2
fi

# create the mount point and mount the volume
#
mkdir /mnt/data
echo "Mounting data ..."
mount -t ext4 -o defaults,noatime,nodiratime /dev/xvdg /mnt/data
if [[ "X0" != "X$?" ]]; then
    echo "terminating script"
    exit 4
fi
echo "Mounted"

# Put entry in fstab so volume is remounted on reboot
#
echo "/dev/xvdg /mnt/data ext4 defaults,noatime,nodiratime 0 0" >> /etc/fstab

# Attach the dev accounts volume
#
INSTANCE_ID_=`curl -s -g http://169.254.169.254/latest/meta-data/instance-id`
DEVICE_='/dev/sdh1'
MOUNT_HOST_='dev.bob.org'
MOUNT_NAME_='dev'
python /secrets/setup/attachTaggedVolume.py $INSTANCE_ID_ $DEVICE_ \
    $MOUNT_HOST_ $MOUNT_NAME_
RC_=$?
if [[ "X0" != "X$RC_" ]]; then
    echo "Problem attaching volume; cannot mount."
    exit 2
fi

# create the mount point and mount the volume
#
mkdir /mnt/dev
echo "Mounting dev ..."
mount -t ext4 -o defaults,noatime,nodiratime /dev/xvdh1 /mnt/dev
if [[ "X0" != "X$?" ]]; then
    echo "terminating script"
    exit 4
fi
echo "Mounted"

# Put entry in fstab so volume is remounted on reboot
#
echo "/dev/xvdh1 /mnt/dev ext4 defaults,noatime,nodiratime 0 0" >> /etc/fstab

# Create the user accounts
#
. /secrets/setup/setupUsers.sh

# Add the gitolite symbolic link
#
ln -s /mnt/data/tools/gitolite/src/gitolite /usr/local/bin/gitolite
su - git --shell /bin/bash -c "ln -s /mnt/data/tools/gitolite-home/.gitolite /home/git/.gitolite"
su - git --shell /bin/bash -c "ln -s /mnt/data/tools/gitolite-home/.gitolite.rc /home/git/.gitolite.rc"
su - git --shell /bin/bash -c "ln -s /mnt/data/tools/gitolite-home/projects.list /home/git/projects.list"
su - git --shell /bin/bash -c "ln -s /mnt/data/tools/gitolite-home/repositories /home/git/repositories"
cat /secrets/users/git/.ssh/gitolite.keys >> /home/git/.ssh/authorized_keys

# Update the CNAME for dev.bob.org to point to this
# instance
#
NEW_CNAME_='dev.bob.org'
PUBLIC_HOSTNAME_=`curl -s -g http://169.254.169.254/latest/meta-data/public-hostname`
# echo public hostname is $PUBLIC_HOSTNAME_
python /secrets/setup/setr53.py $PUBLIC_HOSTNAME_ $NEW_CNAME_

The script first installs some software: packages required by the rest of the script, packages required by gitolite, and tools I simply find useful.

Once that is done, the instance ID of the system is retrieved from the EC2 instance metadata service, and the necessary AWS identifiers are read from /secrets. All of that information is used to locate, attach, and mount first the data volume, which contains the gitolite installation and repositories, and then the dev volume, which is used for local development (i.e. connecting via SSH as a normal user, which is useful when I don't want my Git repositories on a local system).
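The exact tag keys that attachTaggedVolume.py looks for are part of that script (one of the topics listed at the end of this post). Assuming, purely for illustration, that each volume is tagged with the host and mount names passed on the command line above, tagging a volume ahead of time could be done with something like the following. This uses the modern AWS CLI, which these scripts do not; the tag keys Host and Mount, and the volume ID, are hypothetical.

# Illustration only: tag keys and volume ID are hypothetical, and the
# provisioning scripts themselves use boto rather than the AWS CLI.
aws ec2 create-tags --resources vol-0123456789abcdef0 \
    --tags Key=Host,Value=dev.bob.org Key=Mount,Value=data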

Once the volumes are mounted, the various user accounts are fully configured using the setupUsers.sh script, and then the gitolite installation on the data volume is linked into the “git” user’s home directory.
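setupUsers.sh itself is not shown in this post, but given the /secrets/users layout above, its core is presumably along these lines. This is a sketch only; the account-creation options, ownership, and permission details are assumptions.

#! /bin/bash
#
# Sketch only: create the non-default accounts, then copy each user's
# prepared dotfiles and SSH material from /secrets/users into the matching
# home directory. Account options and permissions are assumptions.
adduser --disabled-password --gecos "" git
adduser --disabled-password --gecos "" bob

for u in ubuntu git bob; do
    cp -r /secrets/users/$u/. /home/$u/
    chown -R $u:$u /home/$u
    chmod 700 /home/$u/.ssh
done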

At this point the system is ready to use – except that no one can find it. To make the system discoverable by external users, I use AWS's Route53 service to set up a subdomain pointing to the system. Here again the instance metadata service is used to discover the public hostname (some long name like “ec2-54-224-40-225.compute-1.amazonaws.com” that changes with every restart), and that is mapped to the well-known name (in the example above, “dev.bob.org” – not my real domain).
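The heavy lifting here is done by setr53.py and boto (covered in a later post), but the underlying operation is a standard Route53 record change. Purely as an illustration (the scripts do not use the AWS CLI, and the CLI would need the ak/sk credentials configured separately), the same CNAME update would look roughly like this, with the hosted zone ID read from the hz file described earlier:

# Illustration only: the provisioning scripts use setr53.py/boto, not the AWS CLI.
HOSTED_ZONE_=`cat /secrets/setup/hz`
PUBLIC_HOSTNAME_=`curl -s http://169.254.169.254/latest/meta-data/public-hostname`
aws route53 change-resource-record-sets \
    --hosted-zone-id "$HOSTED_ZONE_" \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "dev.bob.org.",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [{"Value": "'"$PUBLIC_HOSTNAME_"'"}]
        }
      }]
    }'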

Next topics

That’s plenty for one post. But there is a lot more to cover. A lot more – including:

  • The Python scripts used in the automatic provisioning
  • Maintaining the /secrets volume and updating the AMI
  • Installing, configuring, and maintaining gitolite
  • Bootstrapping – how to start from an AMI and build up an autoprovisioning system