12 January, 2011

Linux, a StrongSWAN VPN to a Cisco Router and Amazon EC2 = Reduced costs!

So, this is my first blog post.  I decided to sign up and start a blog because while doing my best to configure a Cisco 2651XM router running IOS 12.4 to connect to a StrongSWAN IPSec VPN (Linux based) I found the information to be very patchy, some wrong, some confusing and some just a load of crap I could have done without reading.  On top of that, it was also difficult to find a good, working, example.

This blog post will provide you with WORKING Cisco IOS and Linux IPSec configurations, and lots of commentary about these configurations from me, which you could maybe do without. Some of my comments will be wrong, but if you know more about it than I do then please help out by commenting on this post and I shall correct this post where I am wrong.

Before I started configuring this VPN I looked around for configuration examples I could use. Sure, there were plenty of examples of VPNs with issues, where people had posted their configs for help, and in other cases the information seemed somewhat lacking.  For example, someone would tell you how to do something without explaining why it had to be done a certain way, and in some cases it made no sense at all.

If you're only interested in how to get StrongSWAN to talk happily with the Cisco IOS IPSec implementation, just skip my ramblings and find the configs below.  But, this is a blog and so I'm going to ramble on a little.

Recently I've been re-vamping a small dial-up ISP network that is run by a local non-profit organisation (read more about them in my profile and/or google "Beyond Disability").  This is the second re-vamp I've done for them.  I've been reducing their physical hardware, reducing their running costs, and reducing the amount of time needed to perform routine maintenance on their servers.

In the first re-vamp, they were migrated from a bunch of analog POTS lines with 33.6k modems, to an ISDN PRI terminated with a Lucent PortMaster 3.  It was a decent upgrade at the time.  I entirely replaced their existing network infrastructure with new equipment.  Time has gone by, and thankfully Amazon EC2 has popped up now.

For those who don't know, and to do a disservice to Amazon, EC2 is a very cheap and reasonably reliable service that allows you to start a virtual machine and use it.  It's pretty simple once you get used to it.  I've been responsible for administering approximately 50 Amazon instances (virtual machines), and the virtual machines are just a small part of the overall offering from Amazon.

Anyway, since the last re-vamp of this network was a good 8+ years ago, lots has changed between then and now.  Basically, I realised that these guys could now use Amazon EC2 instead of running most of their current infrastructure themselves.  It would save in energy costs and cooling costs. Reliability would increase, and overall my workload would go down since there would be hardly any hardware to maintain anymore.

Of course, there were problems.  Firstly, I'd never (successfully) set up an IPSec VPN from a Linux box to a Cisco router.  I wasn't even sure how to really go about it.  I tried once a long time back and failed, so gave up.  The next problem was migrating their e-mail server into Amazon EC2.  A lot of Amazon EC2 address space seems to have a bad e-mail reputation, probably due to spammers and scammers, so it's not as easy as just setting up an SMTP server and some DNS and being done with it.  I also didn't want to use a 3rd-party, paid-for, SMTP gateway.  The idea here is to keep everything as cheap as possible.

Anyway, so as not to bore you all too much, I won't go into the details about all of the bits and pieces that had to be picked up and moved into Amazon EC2 to achieve this, but what I will say, is that I've replaced 3 physical servers with 1 Amazon EC2 "t1.micro" instance.  This costs about $20 USD per month to run (and even cheaper if you reserve your instance!)

Let it be known that I have absolutely no formal or professional experience with any Cisco gear.  In fact, the only two Cisco routers I've ever set up have been for Beyond Disability (which is the place that runs the small dial-up ISP as part of their community programme).

Here is what I've done with a Cisco 2651XM (running IOS 12.4 Advanced Security) and an Ubuntu Linux-based Amazon EC2 instance running StrongSWAN (and a bunch of other stuff):

  • Routed a subnet of live IP addresses from the current Point of Presence (ISP POP) to the Amazon EC2 instance.
  • Made the internal 10.x.x.x address of the Amazon EC2 instance directly contactable to parts of the network at the POP that need access to it (eg. for RADIUS Authentication).

So, on the StrongSWAN side, those two things above are two separate VPN connections, both going back to the Cisco 2651XM (from here on I'll refer to that box as the "Cisco router", or maybe "the router", or something like that).

The first part I tackled was the most simple, allowing the Amazon instance to be accessed via its internal 10.x.x.x IP from within the existing network.  After all, I needed this Amazon instance to appear to be accessible just like it was any other host, but I needed the communication to be encrypted, as usernames and passwords travel over this connection.  I also needed to eliminate the need for any sort of Linux box or other server at the current location, since I was migrating everything from their physical servers into Amazon EC2 in order to reduce costs (primarily).

The next part I tackled was the e-mail problem.  I couldn't migrate the e-mail server to Amazon EC2 until I was able to get that e-mail server to send from a source IP address other than an Amazon IP.  First off, my plan was to push a single IP address over to the Amazon instance via the VPN.  That worked, but not that well.  The problem I had was that outside traffic (eg. from the Internet) wouldn't ever get routed via the VPN and up to the EC2 instance, whereas internal traffic would.  So, then I decided to try routing a subnet instead (a /30).  This worked very well and now I was in a situation where internet and internal traffic was being routed via the VPN up to the Amazon EC2 instance (and the reply packets were being routed back the same way as well, so there were no reverse-path filter issues or asymmetric routing issues).  That was a big win!

One thing I found difficult to grasp was how IPSec and traditional routing work together.  Basically, they don't seem to work together at all.  IPSec seems to be implemented in such a way that your routing table is more or less bypassed.  On the Cisco box there is no special IPSec device you can point static routes to, and on the Linux box it's the same story.  This "magic" scares me, but it turns out that it's a good thing, at least in my case.

My understanding of how IPSec is implemented (on both Cisco IOS and Linux) in terms of the routing side of things is that you define what is allowed to pass via the IPSec connection.  On the Cisco box you do this by way of an access-list, and on the Linux side by specifying rightsubnet=x.x.x.x/x and leftsubnet=x.x.x.x/x for each IPSec connection you define.  Then, when you establish the IPSec connection, these connections are brought up and, as if some magic is involved, the packets that match these rules are automatically encrypted, encapsulated into IPSec packets and sent to the other end of the VPN connection.  I think the IPSec talk for this magic is "Security Associations".

Now, to be honest I still don't 100% understand how this all works, but I've managed to come up with a Cisco config that seems to work pretty well, seems to be reliable and seems to automatically re-establish itself when necessary.  Sure, I can break it easily enough, but in actual operation it seems to be solid.  Any improvements to my config will be gratefully received, though!

Below is the Cisco config.  I haven't censored this config any more than absolutely necessary.  This is mainly because if you are planning to use this config, or parts of it, and I censor it, I might inadvertently make a mistake and lead you up the wrong path, essentially making this configuration example useless, so here goes:

c2651xm#show running-config
Building configuration...

Current configuration : 12514 bytes
!
! No configuration change since last restart
! NVRAM config last updated at 23:11:27 ADST Tue Jan 11 2011 by rdavidson
!
version 12.4
!
crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 5
 lifetime 3600
crypto isakmp key aVerySecretIPSecKeyGoesHere address 175.41.134.92
crypto isakmp invalid-spi-recovery
crypto isakmp keepalive 20 periodic
!
!
crypto ipsec transform-set ipsec-bdi-ts esp-aes 256 esp-sha-hmac 
!
crypto map ipsec-bdi 1 ipsec-isakmp 
 set peer 175.41.134.92
 set transform-set ipsec-bdi-ts 
 set pfs group5
 match address ipsec-bdi
!
interface Dialer0
 ... Lots of config junk ...
 crypto map ipsec-bdi
!
ip access-list extended ipsec-bdi
 permit ip host 202.164.207.194 host 10.128.45.168
 permit ip any 202.164.207.216 0.0.0.3
!
end



Before you dissect this configuration, here is what you should know:

  1. The IP address 175.41.134.92 is the IP address of the Linux box running StrongSWAN (in Amazon EC2).
  2. The IP address 202.164.207.194 is the live IP address of the dial-up access server (a Cisco 5350) that requires a secure connection to the Linux box in Amazon EC2.
  3. The subnet 202.164.207.216/30 (referred to in Cisco speak as "202.164.207.216 0.0.0.3" in the access-list above) is the subnet that has been routed up to the Amazon EC2 instance via IPSec.
  4. The "Dialer0" interface is a permanent ADSL connection to the upstream ISP.
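If the Cisco "wildcard" notation in point 3 looks odd, it is just the netmask with its bits inverted. Here is a small Python sketch of that arithmetic, using only the standard ipaddress module. The function name wildcard_to_network is my own invention for illustration, and the sketch assumes a contiguous wildcard mask (anything else makes ip_network raise a ValueError).

```python
import ipaddress

def wildcard_to_network(address, wildcard):
    """Convert a Cisco address/wildcard pair into an ipaddress network."""
    # A wildcard mask is the bitwise inverse of the netmask.
    wildcard_bits = int(ipaddress.IPv4Address(wildcard))
    netmask = ipaddress.IPv4Address(wildcard_bits ^ 0xFFFFFFFF)
    return ipaddress.ip_network(f"{address}/{netmask}")

routed = wildcard_to_network("202.164.207.216", "0.0.0.3")
print(routed)                              # 202.164.207.216/30
print([str(ip) for ip in routed.hosts()])  # ['202.164.207.217', '202.164.207.218']
```

So the /30 leaves two usable host addresses, with .216 as the network address and .219 as the broadcast address.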


So, I guess I'll go over this configuration a little.  I'll pick out some lines/sections I feel are worth mentioning below:

First section:

crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 5
 lifetime 3600

This tells the router that for the key exchange (ISAKMP, also known as IKE phase 1) we're using 256-bit AES encryption, a pre-shared key for authentication and Diffie-Hellman group 5, and that the ISAKMP Security Association has a lifetime of 3600 seconds (1 hour).

The second section I'll pick on, is this:

crypto isakmp key aVerySecretIPSecKeyGoesHere address 175.41.134.92
crypto isakmp invalid-spi-recovery
crypto isakmp keepalive 20 periodic

Ok, so the first line defines the pre-shared key that we use for authentication, and which host this key is to be used for.

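The second line sounded like a good idea.  I'm not completely sure of the details, but as I understand it, it lets the router start a new IKE negotiation with the peer when it receives an IPSec packet carrying an SPI it doesn't recognise (for example after one side has restarted), instead of just silently dropping those packets.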

The third line is all about keep-alives.  With the "periodic" keyword, a keep-alive (a Dead Peer Detection message) is sent every 20 seconds.  I was hoping this would help the router notice a dead peer, but I'm not convinced that it really does.

Now, this section:

crypto ipsec transform-set ipsec-bdi-ts esp-aes 256 esp-sha-hmac

This defines a "transform-set" called "ipsec-bdi-ts".  I believe this tells the router what encryption algorithm to use when transforming packets from their unencrypted state to their encrypted state, and vice versa (here, ESP with 256-bit AES).  The last part (esp-sha-hmac) selects HMAC-SHA1 as the integrity check for ESP.  These settings must be matched in your StrongSWAN configuration as well (that is the "esp=aes256-sha1!" line), otherwise your VPN won't work.

Now, this section:

crypto map ipsec-bdi 1 ipsec-isakmp 
 set peer 175.41.134.92
 set transform-set ipsec-bdi-ts 
 set pfs group5
 match address ipsec-bdi

This seems to be the crux of the IPSec configuration.  The "crypto map" needs a name, in this case its name is "ipsec-bdi".  After that is defined, you can set the options below.

The "set peer" line tells the router where the other end of the IPSec connection is.  It uses this IP address to match the line further up where we defined which pre-shared key to use when connecting to this host.  So that's how the authentication seems to work.

The "transform-set" line tells the router the encryption parameters (or I believe, more specifically, how to "transform" packets from their un-encrypted state to their encrypted state - ie. which encryption algorithm to use and so on).  In this case, the "ipsec-bdi-ts" transform-set was defined just above this section in the router configuration.

The "set pfs group5" part sets the Perfect Forward Secrecy to "group5".  If you look it up, you will see that Diffie-Hellman group 5 translates to "modp1536" (a 1536-bit modulus).  This is important to know, as you need to match your StrongSWAN configuration with this, otherwise the VPN cannot be established.
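To save you the lookup, here is the mapping as a tiny Python sketch. The group numbers and modulus sizes are the standard IKE ones; the table name and the helper pfsgroup_for are hypothetical, made up just for this example, and only the commonly seen MODP groups are listed.

```python
# Standard IKE Diffie-Hellman MODP groups and the names StrongSWAN uses.
DH_MODP_NAMES = {
    1: "modp768",
    2: "modp1024",
    5: "modp1536",
    14: "modp2048",
}

def pfsgroup_for(cisco_pfs):
    """Translate a Cisco 'set pfs' argument such as 'group5' to a modp name."""
    if not cisco_pfs.startswith("group"):
        raise ValueError(f"unexpected PFS group syntax: {cisco_pfs!r}")
    return DH_MODP_NAMES[int(cisco_pfs[len("group"):])]

print(pfsgroup_for("group5"))  # modp1536
```

That result is the value that goes into the "pfsgroup" setting on the Linux side.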

The "match address" line defines the access-list which defines what traffic should be passed via IPSec.  Bah!

Right, so the next part is this very access-list.  We need to tell the router what traffic we want encrypted, and, on the Linux side we also need to match this with 1 or more IPSec connection definitions in the /etc/ipsec.conf file.  If your rightsubnet and leftsubnet settings in the IPSec config file don't fall within the scope of this access-list then, again, the VPN will not establish.

So, I'll explain the access-list:

ip access-list extended ipsec-bdi
 permit ip host 202.164.207.194 host 10.128.45.168
 permit ip any 202.164.207.216 0.0.0.3

Ok, so the first "permit" rule tells the router that we want all traffic from 202.164.207.194 that is going to 10.128.45.168 to be encrypted with IPSec.  This is the line that allows me to access the Amazon EC2 instance's internal IP address directly from the dial-up access server (which is sitting on the IP 202.164.207.194).  We also need to tell the Linux box how to return these packets, which we do using the "leftsubnet" and "rightsubnet" parameters in the ipsec.conf file (you'll see that described below).

The second "permit" line is for the 202.164.207.216/30 subnet.  This line says that ANY packet with a destination address in this subnet should be encapsulated with IPSec and sent to the Linux box running in Amazon EC2. I will explain more about this particular setup on the Linux side of things later on in this post.
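If it helps to see the two "permit" rules as plain logic, here is a rough Python model of the decision the router makes for each packet leaving Dialer0. This is only a sketch of my understanding, not how IOS actually implements it; the names CRYPTO_ACL and goes_via_ipsec are made up for the example.

```python
import ipaddress

# (source network, destination network) for each "permit" line in ipsec-bdi.
CRYPTO_ACL = [
    (ipaddress.ip_network("202.164.207.194/32"), ipaddress.ip_network("10.128.45.168/32")),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_network("202.164.207.216/30")),
]

def goes_via_ipsec(src, dst):
    """Return True if a packet from src to dst matches a permit rule."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    return any(src_ip in s and dst_ip in d for s, d in CRYPTO_ACL)

print(goes_via_ipsec("202.164.207.194", "10.128.45.168"))  # True  (access server to EC2)
print(goes_via_ipsec("8.8.8.8", "202.164.207.218"))        # True  (anyone to the /30)
print(goes_via_ipsec("202.164.207.194", "8.8.8.8"))        # False (ordinary Internet traffic)
```

Anything that returns False simply leaves the router unencrypted, as it would without the crypto map.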

Ok... now for the other side... StrongSWAN, running on Ubuntu in Amazon EC2:

config setup
        plutostart=yes
        charonstart=no
        klipsdebug=all
        plutodebug=all
        interfaces=%defaultroute
        nat_traversal=yes
        virtual_private=%v4:202.164.207.192/27,%v4:10.0.0.0/8,%v4:172.16.0.0/12,%v4:192.168.0.0/16

conn %default
        type=tunnel
        auto=route
        keyexchange=ikev1
        ikelifetime=3600
        keylife=3600
        authby=psk
        auth=esp
        ike=aes256-sha1!
        esp=aes256-sha1!
        pfs=yes
        pfsgroup=modp1536
        dpdaction=restart
        dpddelay=20
        dpdtimeout=60

conn bdi-nas
        left=10.128.45.168
        leftnexthop=%defaultroute
        leftsubnet=10.128.45.168/32
        right=202.164.206.221
        rightsubnet=202.164.207.194/32

conn bdi-mail
        left=10.128.45.168
        leftnexthop=%defaultroute
        leftsubnet=202.164.207.216/30
        right=202.164.206.221
        rightsubnet=0.0.0.0/0


Ugh. More damn config to try and explain. Here goes:

These lines:

plutostart=yes
charonstart=no

The Cisco router needs IKEv1 (not IKEv2), so we need to enable "pluto" to do IKEv1. We don't need IKEv2 so we disable the "charon" daemon that normally runs as part of StrongSWAN.

These lines:

nat_traversal=yes
virtual_private=%v4:202.164.207.192/27,%v4:10.0.0.0/8,%v4:172.16.0.0/12,%v4:192.168.0.0/16

Well, "nat_traversal=yes" is needed because Amazon maps your Live IP (in this case, 175.41.134.92) to your Amazon instance which only has a 10.x.x.x IP (RFC1918 private IP). So, there is some NAT going on here. As such, this would normally break IPSec, so we need to enable NAT traversal.

I'm not sure if I really needed the "virtual_private" setting. I'll try removing it shortly and see how that affects the VPN, if it does at all.

Ok, so the rest of the IPSec config you can look up if you care about it. Just keep in mind that you need to keep both ends (the router and the Linux box) in sync in terms of their configuration. If you don't, you'll most likely break your VPN.

At the bottom of this config we have two IPSec connection "conn" definitions. These connections inherit their configuration from the "%default" connection, which isn't really a connection... more of a template to be overridden or extended by later "conn" definitions.

So, let's begin with the "bdi-nas" connection. This connection is the IPSec connection that is responsible for encrypting the packets between the Cisco Access Server and the Linux box. I'll explain it below:

conn bdi-nas
        left=10.128.45.168
        leftnexthop=%defaultroute
        leftsubnet=10.128.45.168/32
        right=202.164.206.221
        rightsubnet=202.164.207.194/32

So, there are a few "left..." settings and a couple of "right..." settings. This configuration exists on the Linux box, so imagine you're logged in there and you're configuring this VPN. Think of "left" as "local", and "right" as "remote", ie. left is your Linux box, and right is the Cisco router. If you wish, you could think of "left" as the "Linux" box, and "right" as the "Router". Your choice.

So, on the "left", ie. the Linux box, we have the 10.128.45.168 IP address assigned to eth0, as this is what Amazon does. Note that due to the NAT going on in Amazon, this configuration doesn't know anything about the live IP of the Amazon instance. It's only aware of its 10.x.x.x IP, and this is part of the reason that IPSec NAT traversal is necessary.

Anyway, what we're basically defining here is that we have the "left" IP, ie, we have 10.128.45.168/32. We're also telling IPSec that the other end of the connection has 202.164.207.194/32, so packets from the local machine, destined to 202.164.207.194 will be encrypted with IPSec and then sent to the IPSec peer ("right") for decryption and forwarding.

In the second connection, we're working with two subnets:

conn bdi-mail
        left=10.128.45.168
        leftnexthop=%defaultroute
        leftsubnet=202.164.207.216/30
        right=202.164.206.221
        rightsubnet=0.0.0.0/0

Ok, so this is where things got slightly more tricky. The "leftsubnet" is 202.164.207.216/30, and I wanted an SMTP server to listen on 202.164.207.218, which falls within this subnet. I need this SMTP server to be able to send e-mail from this address, and I also need people to be able to connect to this SMTP server from various places, so it needs to be publicly accessible.

Note that the "rightsubnet" is 0.0.0.0/0, which basically means "anywhere". This puts in a security association and a rule which tells the Linux box that any packets with a source address falling within the 202.164.207.216/30 subnet need to be returned via the IPSec VPN. This eliminates any asymmetric routing issues and reverse-path filter problems that might crop up. When a connection to the SMTP server is encapsulated by the Cisco router and sent to the Amazon EC2 instance via IPSec, the return packets will be sent back via IPSec to the Cisco router, to then be routed back to wherever they came from in the first place.
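Since both ends have to agree on what gets encrypted, a quick sanity check is to confirm that each "conn" is the mirror image of a line in the router's access-list: the router's source is the Linux box's "rightsubnet", and the router's destination is its "leftsubnet". Below is a minimal Python sketch of that check with the values from this post typed in by hand. The names CISCO_ACL, CONNS and unmatched_conns are hypothetical, and the check assumes an exact mirror, which is stricter than merely "falling within the scope" of the access-list.

```python
import ipaddress

# (source, destination) pairs from the router's "ipsec-bdi" access-list.
CISCO_ACL = {
    ("202.164.207.194/32", "10.128.45.168/32"),
    ("0.0.0.0/0", "202.164.207.216/30"),
}

# (leftsubnet, rightsubnet) for each conn in /etc/ipsec.conf.
CONNS = {
    "bdi-nas": ("10.128.45.168/32", "202.164.207.194/32"),
    "bdi-mail": ("202.164.207.216/30", "0.0.0.0/0"),
}

def unmatched_conns(conns, acl):
    """List the conns whose (right, left) pair has no matching ACL line."""
    acl_nets = {(ipaddress.ip_network(s), ipaddress.ip_network(d)) for s, d in acl}
    return [
        name
        for name, (left, right) in conns.items()
        if (ipaddress.ip_network(right), ipaddress.ip_network(left)) not in acl_nets
    ]

print(unmatched_conns(CONNS, CISCO_ACL))  # []
```

An empty list means every connection on the Linux box has a counterpart on the router.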

In order for this to work I needed an interface on the Linux box with the 202.164.207.218 IP address assigned to it, in order to bind the SMTP server to that IP. To achieve this, I set up an interface (dummy0) with this IP, as if it were on a normal network (ie. the equivalent of: ifconfig dummy0 202.164.207.218 netmask 255.255.255.252).

After this was done and the SMTP server was bound to this IP, I could connect to the SMTP server via my home Internet connection and also from within the network at the ISP POP. To further test this, I sent a test e-mail and verified that the source IP address of the SMTP server it was relayed through was indeed 202.164.207.218. This means I now have an Amazon EC2 instance, with its normal live IP address as assigned by Amazon, but also with some additional live IP addresses available for me to run other services on, should it be necessary.

One last thing, before we wrap up. The Linux box needs to know the pre-shared key, and thus far I haven't covered that. Not that it's hard to find out, but here is the /etc/ipsec.secrets file, which contains the pre-shared key:

# This file holds shared secrets or RSA private keys for inter-Pluto
# authentication.  See ipsec_pluto(8) manpage, and HTML documentation.

# RSA private key for this host, authenticating it to any other host
# which knows the public part.  Suitable public keys, for ipsec.conf, DNS,
# or configuration of other implementations, can be extracted conveniently
# with "ipsec showhostkey".

# this file is managed with debconf and will contain the automatically created private key
#include /var/lib/strongswan/ipsec.secrets.inc
10.128.45.168 202.164.206.221 : PSK "aVerySecretIPSecKeyGoesHere"

See that very last line. It contains the IP addresses of the "left" (local) and the "right" (remote) side. StrongSWAN uses this IP combination to determine which pre-shared key (PSK) to use with the connection. On the other hand, the Cisco router simply uses the IP address of its peer, as you can see in its configuration example above.

So, the point of typing up all of this was to help you if you were in a similar situation to me, and with any luck someone will come along and help me with the bits I've got wrong or the bits I don't know much about.

People are migrating in-house IT infrastructure into Cloud Computing environments such as Amazon EC2 all the time, so it is foreseeable that other people will need or want to route an entire subnet to 1 or more Amazon EC2 instances.

That said, Amazon offers a Virtual Private Cloud service, but that costs money. I didn't want to pay Amazon for such a thing, but anyone considering doing something like this on a commercial basis should probably consider using that service instead of using StrongSWAN on a bunch of Linux boxes. Even so, StrongSWAN works very well for my purposes on the "t1.micro" instance, which is Amazon's least powerful instance at the time of writing.

Because I said "Cisco" and "Linux" so much:

All trademarks and registered trademarks are the property of their respective owners.

;)


Update


So, this VPN was working really well... for about a week.  After that, for some reason the Cisco router couldn't re-establish the VPN when it failed (not its fault as far as I can tell).  I did a nasty work-around to get around this problem (this is not something I would normally even contemplate doing in a production environment).

The work-around was to restart ipsec on the EC2 instance every hour.  This isn't a big problem for this situation since the VPN will just re-establish itself every hour.  However, the only reason I did this and didn't bother fixing the problem properly, was because the systems that need this VPN are going to be shut down very soon, permanently, so there was no point (for me) to spend a bunch of time trying to figure out the best solution.  For the time being, this will do.

2 comments:

  1. Seems a shame that I'm posting the first comment here more than two years after you wrote this.

    But that said - thank you for putting this together. I'm at best a novice at networking beyond plumbing interfaces on linux servers. I'm working with a large mobile network provider to hook up remote mobile routers with our server infrastructure. I don't have anything working yet, but having a clearly spelled out doc like this - well, at least I'm not entirely stuck in as much of a haze of IKEAES3DESISAKMPAHESP as I was before. :)

  2. Simply an excellent post..... Thanks for it
