ryjo.codes

Hosting Your Own Git Server: Part 1

January 27th, 2019

Introduction

This sounds crazy. GitHub, GitLab and a myriad of other hosting services exist just so you don't have to do this. In fact, they provide a really nice interface over top of your repos. They render your READMEs with some stylish CSS. They give you pull request, issue tracker and wiki features built in. Repository forking is a pretty amazing feature, too. Oh, and don't forget about the social aspect.

All of these things are really attractive, and they lend a lot of convenience to the software development process.

You and I? We're explorers. We do things that are difficult just to say we did. In doing these difficult things, we learn. While we may find that this is not the best way to go about doing things, we may learn some things that help us in our day-to-day that we may not have known otherwise.

Before we get started, I published a repository on GitHub containing scripts based on this article. Give that a look if you'd like to skip to the end.

Alright. Let's do this.

Prerequisite

Actually, one more thing: in this article, I chose to leverage Amazon AWS EC2 infrastructure to host a virtual machine that will run my git server. I think this accurately reflects many modern-day tech companies' infrastructures, so it's a valuable tool to learn.

However, there are many options available to us, and I would be remiss to limit the reader to one non-free route. In this article, I provide several options, including:

A git "server" running on your local computer
A separate machine running Ubuntu on your local network
A virtual machine running on your local computer
Amazon AWS

Once you've completed one of the above four sections, feel free to skip ahead to the non-implementation-specific section of this article.

From Scratch, As Usual

Let's start with nothing. This section details how you can use your local machine to host your repositories. You'll need to use your imagination in some places, but pretend that 127.0.0.1 is a different machine and you'll be good to go.

I'll assume your local machine is running Ubuntu. If it's not, just keep in mind that you may have to change something slightly if it doesn't work for you. I anticipate most unix-like environments should function similarly.

The rest of the guide assumes the git server is only accessible via an ssh connection. In order to get this working locally, you'll just need to install OpenSSH Server locally. You should follow that guide as it is the official Ubuntu documentation, but here's what I did to get a local ssh server running:

sudo apt install openssh-server
printf "PasswordAuthentication no\nPubkeyAuthentication yes\n" | sudo tee -a /etc/ssh/sshd_config
sudo systemctl restart sshd.service

The first line installs openssh-server, which is what will run in the background and accept ssh connections from clients. In the second line, we edit /etc/ssh/sshd_config by adding configuration options to the end of the file. These options tell the ssh server not to accept the regular user's password as a means of authentication (PasswordAuthentication no) and to accept a generated ssh public key instead (PubkeyAuthentication yes). Keep in mind that sshd uses the first value it finds for each option, so if either option is already set (uncommented) earlier in the file, change it there instead. printf simply prints these lines to STDOUT so that we can pipe (|) the text to the tee command. We use printf instead of echo so that we can specify a newline with \n. tee -a appends these two configuration options to the /etc/ssh/sshd_config file. We'll discuss the tee command more later in this article.
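
One thing the printf line doesn't protect against: run it twice and the same options get appended twice. Here's a minimal sketch of an append that is safe to repeat. The helper name append_once is my own invention for this example, and a scratch file stands in for /etc/ssh/sshd_config so nothing real gets touched:

```shell
# append_once FILE LINE: append LINE to FILE only if that exact line is absent.
# (append_once is a hypothetical helper name used for this sketch.)
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" | tee -a "$1" > /dev/null
}

# Scratch file standing in for /etc/ssh/sshd_config.
conf="$(mktemp)"
printf 'Port 22\n' > "$conf"

# Run everything twice; each option still ends up in the file only once.
for attempt in 1 2; do
  append_once "$conf" "PasswordAuthentication no"
  append_once "$conf" "PubkeyAuthentication yes"
done

cat "$conf"
```

Against the real file you would need sudo on the tee side, just like in the original command.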

Now that we've set up our "server," we'll set up our "client." Assuming our local user's username is ryjo, we'll do the following:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/ryjo
cat ~/.ssh/ryjo.pub | tee -a ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
cat >> ~/.ssh/config << CONFIG 
Host gitservadmin
    Hostname 127.0.0.1
    IdentityFile ~/.ssh/ryjo
    IdentitiesOnly yes
CONFIG
ssh gitservadmin

First, we generate an ssh key pair and save it to ~/.ssh/ryjo. This means we now have both a ~/.ssh/ryjo and a ~/.ssh/ryjo.pub file. We then use cat to pipe the contents of the pub file to ~/.ssh/authorized_keys. This is a special file that our openssh-server process uses to authenticate users attempting to log in with their key pair. We need to chmod this file so that only the user who owns the file can read or write it. This is required by the openssh-server process.

The next line is a convenience; instead of writing ssh 127.0.0.1 every time, it might be easier to remember the string gitservadmin so that we can log in as ssh gitservadmin. We'll see this cat >> ~/.ssh/config << CONFIG syntax again later, but basically it means "output the text surrounded by the two CONFIGs into the ~/.ssh/config file."

That last ssh gitservadmin should log you in and leave you at a shell prompt. From here on, I assume your current user has sudo privileges on your machine. Once that's done, feel free to continue on to the non-implementation-specific section of this article.

A Separate, Dedicated Machine

If you've got a dedicated machine sitting in the corner of the room somewhere that's on your local network, you can follow the steps above replacing 127.0.0.1 with that machine's IP address. You can find that machine's IP address by doing the following:

ip -4 address

This will display the IPv4 addresses of the network interfaces on that machine. Find the line that starts with inet. This is the IP address on that interface. The one that's connected to your local network will probably look something like 192.168.xxx.xxx since this is how a lot of home routers set up their networks.
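
If you'd rather not pick through that output by eye, a small filter can pull out just the addresses. This is only a sketch: the function name inet_addresses is mine, and the sample text below is made up to look like ip -4 address output so the example runs anywhere:

```shell
# Print the address part of every "inet" line read from standard input.
# (inet_addresses is a hypothetical helper name used for this sketch.)
inet_addresses() {
  awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2 }'
}

# Sample text shaped like `ip -4 address` output (the addresses are invented).
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.1.42/24 brd 192.168.1.255 scope global dynamic enp3s0'

printf '%s\n' "$sample" | inet_addresses
```

On the dedicated machine itself, you would pipe the real command into it instead: ip -4 address | inet_addresses.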

Once that's done, feel free to continue on to the non-implementation-specific section of this article.

A Locally Running Virtual Machine

If you don't want to install an ssh server on your local machine and you don't have a separate dedicated machine, you can use software like VirtualBox to run a virtual server on your local machine.

If you do decide to do this, the important step will be to get the hosted virtual server on the same network as your local machine. To do this, go to the settings for your installed virtual machine, then go to the "Network" section on the left hand side. Set the network adapter "Attached to:" to "Bridged Adapter." From here, you should be able to start your server as you normally would and follow the steps above. Once that's done, feel free to continue on to the non-implementation-specific section of this article.

AWS-specific Setup

All of the above steps so far have been free of cost. Using AWS costs money unless it's your first time registering an AWS account. If this is the case, there is a free tier you can use. I'll create a virtual machine (called an EC2 instance) of type t2.micro. The Free Tier, as of the date of this article, "includes 750 hours of Linux and Windows t2.micro instances each month for one year."

You'll also need to install and configure the AWS CLI. This will let us manage our AWS resources via the command line. Sweet!

We'll follow the steps described in Amazon's official documentation. First, we'll create a key pair. This will be used to ssh into the instance as the user "ubuntu":

aws ec2 create-key-pair \
  --key-name ssh-ubuntu-user \
  --query 'KeyMaterial' \
  --output text \
  > ~/.aws/ssh-ubuntu-user.pem
chmod 400 ~/.aws/ssh-ubuntu-user.pem

Note that this process does not allow you to create a password to help protect this key file. Give this a read to learn how you can generate a key locally (with a password if you wish) and push it up to AWS instead.

Now we'll create a security group that will let us open port 22 on the server. This will be how we push to and pull from the server. When we run this command, we'll get back a JSON response. We can use the program jq to parse the response and only save the security group's ID:

aws ec2 create-security-group \
  --group-name ssh-server \
  --description "SSH Server" |
  jq ".GroupId" -r

I'll use sg-000000000 to denote the output of this command.

Now that we've created a security group, we must add an "ingress rule." This basically means machines with this security group will allow inbound connections from a given range of IP addresses on a given port. In our case, we want users in our organization to ssh into this machine in order to push/pull from the repository, so we'll need to open port 22 as a tcp port:

aws ec2 authorize-security-group-ingress \
  --group-id sg-000000000 \
  --cidr 127.0.0.1/24 \
  --port 22 \
  --protocol tcp

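As an example, I used 127.0.0.1. In practice you would use your own public IP address, usually with a /32 suffix so that only that one address is allowed; a /24 like the one above admits every address that shares its first three octets. You can find your address in many ways, one of which is simply asking another site what it sees your IP address as. Use your favorite search engine to search for "What is my IP address," and you should be rewarded.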

One final thing we need to find out is the image ID of the AMI that we want to use. Using the command described in Amazon's AWS Documentation, let's look for the latest release of Ubuntu:

aws ec2 describe-images \
  --owners 099720109477 \
  --filters 'Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-cosmic-18.10-amd64-server*' |
  jq '.Images | sort_by(.CreationDate) | last(.[]) | .ImageId' -r

In my case, the output of this was ami-05b0fe5b9e7b8b5d7.

Alright, we've got our image id, security group id as well as our key pair. The next step that we'll do is "run" an instance:

aws ec2 run-instances \
  --image-id ami-05b0fe5b9e7b8b5d7 \
  --count 1 \
  --instance-type t2.micro \
  --key-name ssh-ubuntu-user \
  --security-group-ids sg-000000000 |
  jq '.Instances | first(.[]).InstanceId' -r

This command will output an instance id (which we'll denote as i-00000000000000000). With this, we'll be able to find the public ip address of our machine once it's running:

aws ec2 describe-instances \
  --filters 'Name=instance-id,Values=i-00000000000000000' \
            'Name=instance-state-name,Values=running' |
  jq '.Reservations[].Instances | first(.[]).PublicIpAddress' -r

I'll use 0.0.0.0 to denote the output of this command.

It may take a few minutes for your ec2 instance to come up, so have patience!

Now that we know the IP address we'll use to access our git server, we have everything we need to access it:

ssh -i ~/.aws/ssh-ubuntu-user.pem ubuntu@0.0.0.0 "whoami"

Here, 0.0.0.0 again stands in for my server's IP address. We use the -i option to specify the identity file (the private key) we'll use to ssh into the server. The "whoami" at the end of that line sends a single command to be run on that machine, returns the results to your terminal and then closes the connection. This is super helpful for scripts; no need to write complicated login/logout functionality.

For example, you may wish to get the latest version of all packages before you do anything else on that ec2 instance:

ssh -i ~/.aws/ssh-ubuntu-user.pem ubuntu@0.0.0.0 "sudo apt update && sudo apt upgrade -y"

The last thing we'll do is configure our local system so that we don't have to re-type the username, key and IP address every time we want to send a command to the git server. On your local machine, make your ~/.ssh/config file look like:

Host gitservadmin
    Hostname 0.0.0.0
    IdentityFile ~/.aws/ssh-ubuntu-user.pem
    IdentitiesOnly yes
    User ubuntu

Now we should be able to send commands to the server like so:

ssh gitservadmin "whoami"
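
If the alias doesn't behave, you can ask ssh what it thinks the settings are without connecting to anything. A sketch, assuming a reasonably recent OpenSSH client (the -G option) and using a scratch config file plus the documentation-only address 203.0.113.10 in place of your server's IP:

```shell
# Write the Host block to a scratch file so the real ~/.ssh/config is untouched.
cfg="$(mktemp)"
cat > "$cfg" << CONFIG
Host gitservadmin
    Hostname 203.0.113.10
    IdentityFile ~/.aws/ssh-ubuntu-user.pem
    IdentitiesOnly yes
    User ubuntu
CONFIG

# -F picks the config file; -G prints the resolved options and exits without connecting.
ssh -F "$cfg" -G gitservadmin | grep -E '^(hostname|user|identitiesonly) '
```

Drop the -F "$cfg" part to inspect your actual ~/.ssh/config.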

Uh... So... You Said Something About git?

Check to see if git is already available:

ssh gitservadmin "git --version"

If not, you'll need to install it:

ssh gitservadmin "sudo apt install -y git"

I've read a few articles that recommend creating a single git user on the server hosting the repos. We would then simply add each user's public key to the /home/git/.ssh/authorized_keys file. Then, all users would pull/push code as the git user.

This is fine, but it prevents us from making use of the underlying user/group-based permissions in unix-like operating systems. If we instead create a new user account using useradd for every single user, we could restrict repository permissions on a per-user or per-group basis. Nice.

By default, we'll want all git users of our system to be able to access all repositories. Over time, we'll discover how we need to limit certain groups of users to only a select group of repos. With this in mind, let's create a group for our git users:

ssh gitservadmin "sudo groupadd git"

Well... that was easy. Now let's create our first user:

ssh gitservadmin \
  "sudo useradd -m -s /usr/bin/git-shell -G git ryjo"

By default, useradd creates the user with their password disabled. This is great news for us; this user will only ever login via ssh, so we don't need to create a temporary/throw-away password. -m (--create-home) is specified in order to create a directory in /home for the new user. Additionally, we use the -G (--groups) flag to specify that this user should also be in the git group.

We also use the -s (--shell) option to specify the user's default shell as something other than bash. git provides git-shell, a shell that only allows git commands to be executed by the user. It also disables the interactive shell by default; we won't be able to login to this server and get an interactive shell as this user until we add some custom commands. More on this later.

We'll need to enable this shell system-wide by adding it to the end of the /etc/shells file like so:

ssh gitservadmin \
  "echo /usr/bin/git-shell | sudo tee -a /etc/shells"

tee is a pretty nifty command; it lets us use echo with non-sudo privileges, only using heightened privileges to append the text we echo into a file. If we left off the -a (--append) option, we'd just wholesale overwrite /etc/shells.
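
To see the difference -a makes without risking /etc/shells, here's a small sketch that runs both forms against a scratch file (the file and variable names are just for this example):

```shell
# Scratch file standing in for /etc/shells.
shells="$(mktemp)"
printf '/bin/sh\n/bin/bash\n' > "$shells"

# With -a the new line is added after the existing ones.
echo /usr/bin/git-shell | tee -a "$shells" > /dev/null
appended_lines=$(grep -c '' "$shells")

# Without -a the file is truncated first, so only the new line survives.
echo /usr/bin/git-shell | tee "$shells" > /dev/null
overwritten_lines=$(grep -c '' "$shells")

echo "with -a: $appended_lines lines, without -a: $overwritten_lines line"
```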

Finally, we'll need to add the user's public key to the /home/ryjo/.ssh/authorized_keys file on the git server so that we can login using ssh. First, we'll create /home/ryjo/.ssh on the git server with the proper permissions. We could use mkdir, chown and chmod, but install lets us do all of these in a single command:

ssh gitservadmin \
  "sudo install -d -m 0700 -o ryjo -g ryjo /home/ryjo/.ssh"

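For comparison, here's a sketch of the long way next to the install way, run in a scratch directory. I left out -o and -g since changing ownership needs root, and stat -c assumes GNU coreutils (which is what Ubuntu ships):

```shell
base="$(mktemp -d)"

# The long way: create the directory, then fix its mode.
mkdir "$base/long"
chmod 0700 "$base/long"

# The short way: install -d creates it with the requested mode in one step.
install -d -m 0700 "$base/short"

# Print the octal mode and name of each directory.
stat -c '%a %n' "$base/long" "$base/short"
```
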
Next, we'll create an SSH key pair locally:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/ryjo_rsa

I decided to use -f to specify the filename for the key pair. I followed the recommended -t and -b flags as specified in GitHub's tutorial. Now, we'll put the public key in the /home/ryjo/.ssh/authorized_keys file on the git server:

ssh gitservadmin \
  "echo $(cat ~/.ssh/ryjo_rsa.pub) |
    sudo -u ryjo tee /home/ryjo/.ssh/authorized_keys"

That command might look a little funky. The stuff surrounded in $() is executed on our local machine. This way, we get the contents of the public key on the local machine with cat, then echo it on the server. We also see our old friend tee being used to put the contents of our ssh public key into that user's authorized_keys file on the server.
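
Here's a sketch that makes the "runs locally first" part visible. A throwaway file plays the public key, and sh -c plays the shell on the server; both are stand-ins I picked for the example:

```shell
# Throwaway "public key" so the sketch doesn't touch your real keys.
pubkey_file="$(mktemp)"
printf 'ssh-rsa AAAAexample ryjo@laptop\n' > "$pubkey_file"

# Inside double quotes, $(...) runs right here, before anything is sent anywhere.
remote_cmd="echo $(cat "$pubkey_file")"
printf 'command string: %s\n' "$remote_cmd"

# sh -c stands in for the shell on the server, which only ever sees the finished text.
sh -c "$remote_cmd"
```

By the time the string is handed off, the key's contents are already baked into it; the other side never runs cat at all.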

Finally, we'll change the permissions for /home/ryjo/.ssh/authorized_keys on the git server:

ssh gitservadmin \
  "sudo -u ryjo chmod 600 /home/ryjo/.ssh/authorized_keys"

We use -u (--user) to specify which user to run chmod as.

If we try to do ssh -i ~/.ssh/ryjo_rsa ryjo@gitservadmin, we'll see a ton of text that ends with: fatal: Interactive git shell is not enabled. Since we made this user's shell /usr/bin/git-shell, this is exactly what we expect. So far so good.

It'll be nice if we no longer have to reference our git server via gitservadmin or specify our ryjo user's key. We can add a second entry to our local ~/.ssh/config file:

Host gitserv
    Hostname 0.0.0.0
    IdentityFile ~/.ssh/ryjo_rsa
    IdentitiesOnly yes
    User ryjo

Now in addition to ssh gitservadmin we can do ssh gitserv. Bonus: if your local user's name is ryjo as well, you can remove the User ryjo line. Sweet.

Can It Be git Time Now, Please?

We're almost at a point where we can create an empty git repository and push/pull code to/from it. First, we need to create a directory where we will store all of the repositories. The question: "where, though?"

The Pro Git book recommends /srv. /srv, according to the Linux Filesystem Hierarchy, is meant to hold "data served by the system." I think this is intentionally vague, but sounds good enough to me! Let's create a directory that will hold all of our repos:

ssh gitservadmin \
  "sudo install -d -o ubuntu -g git -m 0770 /srv/git"

We set this to 0770 because we want users in the git group to be able to create repositories eventually. For now, we'll rely upon the ubuntu user to create a repo that we can push code to as our new user:

ssh gitservadmin \
  "sudo install -d -o ryjo -g git -m 0770 /srv/git/foo.git;
   sudo -u ryjo git init --bare --shared /srv/git/foo.git;
   sudo chgrp -R git /srv/git/foo.git"

The new directory belongs to ryjo and the git group with 0770 permissions, so the ubuntu user can't write inside it directly; that's why git init is run through sudo -u ryjo. Using --bare initializes a new git repository without any checked out source code files. Basically, it only contains the things that would normally be in the .git directory. Using --shared specifies that the repo will be shared amongst several users. Since we want all of our users in the git group to push/pull to this directory, this sounds like what we want. Finally, we need to change the group with chgrp for all files and directories within this new directory to git.
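
If you're curious what those two flags actually leave behind, you can poke at a bare, shared repo in a scratch directory. A small sketch (the path is temporary and only for this example):

```shell
repo="$(mktemp -d)/foo.git"
git init --quiet --bare --shared "$repo"

# A bare repo has no working tree; HEAD, objects/ and refs/ sit at the top level.
ls "$repo"
git -C "$repo" rev-parse --is-bare-repository

# --shared is recorded in the repo's own config as core.sharedRepository.
git -C "$repo" config core.sharedRepository
```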

We should now be able to push a local repository to this remote host. On our local machine, we'll do:

mkdir ~/foo
cd ~/foo
git init
echo "# Foo" > README.md
git add .
git commit -m "Initial commit"
git remote add origin gitserv:/srv/git/foo.git
git push -u origin --all

Woosh! Just like that, our new git repository is up and running!
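
If you want to rehearse that flow before involving a server at all, a local bare repo can play the remote. This is a sketch under that assumption: the paths are temporary, and the commit identity is a placeholder passed inline so it doesn't depend on your git config:

```shell
work="$(mktemp -d)"

# A local bare repo plays the part of gitserv:/srv/git/foo.git.
git init --quiet --bare "$work/remote.git"

git init --quiet "$work/foo"
cd "$work/foo"
echo "# Foo" > README.md
git add .
# Placeholder identity, passed inline for this sketch only.
git -c user.name="Example" -c user.email="example@example.com" \
  commit --quiet -m "Initial commit"
git remote add origin "$work/remote.git"
git push --quiet -u origin --all

# The commit is now visible from the bare repo's side.
git --git-dir="$work/remote.git" log --all --oneline
```

The only thing that changes for the real server is the remote URL.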

Power to the Developers

Asking the user with access to the ubuntu user to create a repo for us every time would get really annoying. Let's make something that'll let our developers create repositories on their own. Using git-shell, we can create a directory in a user's home directory that can host other commands outside of the restricted capabilities of git-shell. We'll add a command addrepo that our user ryjo and anyone else in the git group can use to create a repository.

This can easily be done by creating a directory git-shell-commands in a user's home directory. Every user can have a list of their own unique commands if that makes sense for your organization.

Right now, I can't think of a great reason to do this. It's more of an inconvenience to me that we can't just specify a single group of commands for every user of this system.

Well, technically, we can. Using /etc/skel, we can create a directory holding all of our commands that gets copied to every new user's home directory. This is great if we never update our existing commands or add new ones. Likely, though, these things will eventually happen. For this reason, we can create a symlink to a directory in our system that will hold all of our commands. This way, when one command updates or we add a new one, every user will have these new capabilities with no manual work on our part. Woo!

In my last article, we discussed storing files like these commands in /usr/lib. This is where we installed our rails application library files when we were installing our app as a package. In this case, we're creating these commands locally; we're not installing these commands via a pre-packaged deb file, so we'll store them in /usr/local/lib:

ssh gitservadmin \
  "sudo install -d -m 0750 -o ubuntu -g git /usr/local/lib/git"

Perhaps this isn't the best location. I considered /usr/local/bin, but our users won't be able to run these commands if we store them here since they're using git-shell. Besides, we want to group all of our commands together so we can symlink to them. For now, they remain in /usr/local/lib/git.

The permissions are 0750 so that the users in the git group can read and execute them. This is required by git-shell.

Let's add a symlink to this directory in /etc/skel:

ssh gitservadmin \
  "sudo ln -s /usr/local/lib/git /etc/skel/git-shell-commands"

Now, every new user we add will have this symlink in their home directory. We need to manually add this for our existing ryjo user. We'll use sudo -u ryjo to create the symlink as the user ryjo in order to get the correct permissions on the file:

ssh gitservadmin \
  "sudo -u ryjo ln -s /usr/local/lib/git /home/ryjo/git-shell-commands"

Finally, we'll make the command itself. This involves a little bit of funky bash syntax:

ssh gitservadmin \
  'cat > /usr/local/lib/git/addrepo << "BASH"
#!/bin/bash
reponame=$(echo "$1" | tr -cd a-zA-Z0-9\-\_)
install -d -m 0770 -g git "/srv/git/$reponame.git";
git init --bare --shared "/srv/git/$reponame.git";
chgrp -R git "/srv/git/$reponame.git"
BASH'

cat > /usr/local/lib/git/addrepo << "BASH" basically says "Redirect the output of running cat on the heredoc delimited by the word BASH into /usr/local/lib/git/addrepo." A heredoc is basically a big block of text. We choose to delimit the beginning and end of our heredoc with the word BASH, but it could be any text that we want.

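We also wrap the first BASH in double quotes. This tells the shell reading the heredoc (here, the one on the server) to leave the body alone, so $1 and $reponame are written to the file as-is instead of being replaced with whatever those variables hold at that moment. The single quotes around the whole command do the same job on our local machine. We could also skip the double quotes and instead do \$1, but I like that this way displays exactly what will be put into the file.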
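
The difference is easy to see side by side. A sketch that writes the same heredoc body twice, once with a bare delimiter and once with a quoted one (the variable and file names are just for this example):

```shell
name="expanded-by-the-shell"
out="$(mktemp -d)"

# Unquoted delimiter: the shell substitutes $name while reading the heredoc.
cat > "$out/unquoted" << BASH
repo is $name
BASH

# Quoted delimiter: the body is written out exactly as typed.
cat > "$out/quoted" << "BASH"
repo is $name
BASH

cat "$out/unquoted" "$out/quoted"
```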

One more mention for the tr command. That thing lets us say "delete all characters that aren't a to z, A to Z, 0 to 9, - or _." This may not be necessary, but it makes me feel better to limit these names to characters that I'd normally use for directory names. Besides, I like the idea of keeping our repo names looking (subjectively) tidy. tr is a nifty little tool. I highly recommend giving man tr a read.
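
To get a feel for what survives the filter, here's a sketch that runs a few made-up names through it. The function name sanitize is mine, and I put the - last in the set so there's no question of tr reading it as part of a range:

```shell
# Same idea as the filter in addrepo: keep letters, digits, "_" and "-" only.
# (sanitize is a hypothetical helper name used for this sketch.)
sanitize() {
  printf '%s' "$1" | tr -cd 'a-zA-Z0-9_-'
}

for candidate in 'rails_new' 'my repo!' '../../etc/passwd'; do
  printf '%-20s -> %s\n' "$candidate" "$(sanitize "$candidate")"
done
```

One thing worth noticing: a name made up entirely of stripped characters comes out empty, so a check like [ -n "$reponame" ] before creating anything might be a reasonable addition.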

We'll now set the proper permissions on this new file:

ssh gitservadmin \
  "sudo chgrp git /usr/local/lib/git/addrepo;
   sudo chmod 0750 /usr/local/lib/git/addrepo"

Remember when we couldn't login to the git server as our user ryjo? Well now that we've added the git-shell-commands directory to our user's home directory, that's no longer true. We'll get an interactive shell after running ssh gitserv that will look like git>, and the only command available to us is addrepo.

Let's take the repository we worked with from my last article and put it on the new gitserv we have now. First, we'll make the empty git repo on gitserv just like we did for the foo repository, but this time, we'll use addrepo:

ssh gitserv
# Very long MOTD shows,
# followed by a git> prompt
addrepo rails_new

We then logout by either typing exit or pressing the ctrl and d keys. We could also run single commands like we've been doing with the ubuntu user:

ssh gitserv "addrepo rails_new"

Now we have a new git directory initialized at /srv/git/rails_new.git:

ssh gitservadmin "sudo ls -la /srv/git/rails_new.git"

We needed to use sudo here; this directory is owned by ryjo and has a group of git. Since our ubuntu user falls under "others" and our rails_new.git permissions are 770, we have no rights to run ls on this directory as this user.

Finally, we'll clone the repo from GitHub, add our git server as a new remote and push:

git clone git@github.com:mrryanjohnston/rails_new.git
cd rails_new
git remote add gitserv gitserv:/srv/git/rails_new.git
git push -u gitserv --all

Now we can push to our git server by doing git push gitserv, and we can still git push origin to push code updates to GitHub. Side note: this gives us a little more info about how GitHub saves repositories; there's a very good chance that repos are stored in the git user's home directory. For example, the above repository could be at /home/git/mrryanjohnston/rails_new.git. Of course this could be way wrong, but it's fun to think about.

Stay Tuned!

We covered a lot of ground in this article, and we're only scratching the surface. I think it's safe to say that it'll be a little cumbersome to manually do all of this work from scratch, so I put together some scripts based on this article and published a repository on GitHub (gasp! GitHub?!) as well as a release with pre-built deb binaries to make it much easier to create an ec2 instance and install the scripts we discussed.

I'm already planning on publishing parts 2 and 3 of this article as I have a few more ideas surrounding this topic. For now, this should be plenty to help get your own ideas forming for what you'd want to see in your own git server.

- ryjo