~tore: Tore Anderson's technology blog

IPv6-only data centre RFCs published

I’m very pleased to report that my SIIT-DC RFCs were published by the IETF last week. If you’re interested in learning how to operate an IPv6-only data centre while ensuring that IPv4-only Internet users will remain able to access the services hosted in it, you should really check them out.

Start out with Stateless IP/ICMP Translation for IPv6 Data Center Environments (RFC 7755). This document describes the core functionality of SIIT-DC and the reasons why it was conceived.

If you think that you can’t possibly make your data centre IPv6-only yet because you still need to support a few legacy IPv4-only applications or devices, continue with RFC 7756. This document describes how the basic SIIT-DC architecture can be extended to support IPv4-only applications and devices, allowing them to live happily in an otherwise IPv6-only network.

The third and final document is Explicit Address Mappings for Stateless IP/ICMP Translation (RFC 7757). This extends the previously existing SIIT protocol, making it flexible enough to support SIIT-DC. This extension is not specific to SIIT-DC; other IPv6 transition technologies such as 464XLAT and IVI also make use of it. Unless you’re implementing an IPv4/IPv6 translation device, you can safely skip RFC 7757. That said, if you want a deeper understanding of how SIIT-DC works, I recommend you take the time to read RFC 7757 too.

So what is SIIT-DC, exactly?

SIIT-DC is a novel approach to the IPv6 transition that we’ve developed here at Redpill Linpro. It facilitates the use of IPv6-only data centre environments in the transition period where a significant portion of the Internet remains IPv4-only. One could quite accurately say that SIIT-DC delivers «IPv4-as-a-Service» for data centre operators.

In a nutshell, SIIT-DC works like this: when an IPv4 packet is sent to a service hosted in a data centre (such as a web site), that packet is intercepted by a device called an SIIT-DC Border Relay (BR) as soon as it reaches the data centre. The BR translates the IPv4 packet to IPv6, after which it is forwarded to the IPv6 web server just like any other IPv6 packet. The server’s reply gets routed back to a BR, where it is translated from IPv6 to IPv4, and forwarded through the IPv4 Internet back to the client. Neither the client nor the server need to know that translation between IPv4 and IPv6 is taking place; the IPv4 client thinks it’s talking to a regular IPv4 server, while the IPv6 server thinks it’s talking to a regular IPv6 client.

There are several reasons why an operator might find SIIT-DC an appealing approach. In no particular order:

  • It facilitates IPv6 deployment without accumulation of IPv4 technical debt. The operator can simply switch from IPv4 to IPv6, rather than committing to operate IPv6 in parallel with IPv4 for the foreseeable future (i.e., dual stack). This greatly reduces complexity and operational overhead.
  • It doesn’t require the native IPv6 infrastructure to be built in a certain way. Any IPv6 network is compatible with SIIT-DC. It does not touch native IPv6 traffic from IPv6-enabled users. This means that when the IPv4 protocol eventually falls into disuse, no migration project will be necessary - SIIT-DC can be safely removed without any impact to the IPv6 infrastructure.
  • It maximises the utilisation of the operator’s public IPv4 addresses. If all the operator has available is a /24, every single one of those 256 addresses can be used to provide Internet-facing services and applications. No addresses go to waste by being assigned to routers or backend servers (which do not need to communicate with the public Internet). It is no longer necessary to waste addresses by rounding up IPv4 LAN prefix sizes to the nearest power of two. Never again will it be necessary to expand a server LAN prefix, as it will be IPv6-only and thus practically infinitely large.
  • Unlike IPv4 NAT, it is completely stateless. Therefore, it scales in the same way as a standard IP router: the only metrics that matter are packets-per-second and bits-per-second. Its stateless nature makes it trivial to deploy; the BRs can be located anywhere in the IPv6 network. It is possible to spread the load between multiple BRs using standard techniques such as anycast or ECMP. High availability and redundancy are easily accomplished with the use of standard IP routing protocols.
  • Unlike some kinds of IPv4 NAT, it doesn’t hide the source address of IPv4 users. Thus, the IPv6-only application servers remain able to perform tasks which depend on the client’s source address, such as geo-location or abuse logging.
  • It allows for IPv4-only applications or devices to be hosted in an otherwise IPv6-only data centre. This is accomplished through an optional component called a SIIT-DC Edge Relay. This is what is described in RFC 7756.

The history of SIIT-DC

I think it was around the year 2008 that it dawned on me that Redpill Linpro’s IPv4 resources would not last forever. At some point in the future we would inevitably be prevented from expanding our infrastructure based on IPv4. It was clear that we needed to come up with a plan on how to deal with that situation well ahead of time. IPv6 obviously needed to be part of that plan, but exactly how wasn’t clear at all.

Conventional wisdom at the time told us that dual stack, i.e., running IPv4 in parallel with IPv6, was the solution. We did some pilot projects, but the results were discouraging. In particular, these problems quickly became apparent:

  1. It would not prevent us from running out of IPv4. After all, dual stack requires just as many IPv4 addresses as single-stack IPv4.
  2. IPv4 would continue to become an ever more entrenched part of our infrastructure. Every new IPv4-using service or application would inevitably make a future IPv4 sunsetting project even more difficult to pull off.
  3. Server and application operators simply didn’t like running two networking protocols in parallel. Dual stack greatly increased complexity: it became necessary to duplicate service configuration, firewall rules, monitoring targets, and so on, just in order to support both protocols equally well. This duplication in turn created lots of new possibilities of things going wrong, reducing reliability and uptime. And when something did go wrong, troubleshooting the issue required more time. Single stack was therefore seen as superior to dual stack.

It was clear that we needed a better approach based on single-stack IPv6, but we were unable to find an already existing one which solved all of our problems.

One of the things that we evaluated, though, was Stateless IP/ICMP Translation (RFC 6145). SIIT looked promising, but it had some significant shortcomings (which RFC 7757’s Problem Statement section elaborates on). In its then-current state, SIIT simply wasn’t flexible enough to be up to the task we had in mind for it. However, we did identify a way SIIT could be improved in order to facilitate our IPv6-only data centre use case. This improvement is what RFC 7757 ended up describing.

I believe the first time I presented the idea of SIIT-DC (under the working name «RFC 6145 ++») in public was at IIS.se’s World IPv6 Day seminar back in June 2011. In case you’re interested in a little bit of «history in the making», the slides (starting at page 34) and video (starting at 34:15) from that event are still available.

A few months later we had a working proof of concept (based on TAYGA) running. By January 2012 I had enough confidence in it to move our corporate home page www.redpill-linpro.com to it, where it has remained since. I didn’t ask for permission…but fortunately I didn’t have to ask for forgiveness either - to this day there have been zero complaints!

The solution turned out to work remarkably well, so in keeping with our open source philosophy we decided to document exactly how it worked so that the entire Internet community could benefit from it. To that end, my very first Internet-Draft, draft-anderson-siit-dc-00, was submitted to the IETF in November 2012. I must admit I greatly underestimated the amount of work that would be necessary from that point on…

The document was eventually adopted by the IPv6 Operations working group (v6ops) and split into three different documents, each covering relatively independent areas of functionality. Then began multiple cycles of peer review and feedback by the working group followed by updates and refinements. I’d especially like to thank Fred Baker, chair of the v6ops working group, for helping out a lot during the process. For a newcomer like me, the IETF procedures can certainly appear rather daunting, but thanks to Fred’s guidance it went very smoothly.

One particularly significant event happened in early 2015, when Alberto Leiva Popper from NIC México joined in the effort as a co-author of RFC 7757-to-be (which describes the specifics of the updated SIIT algorithm). Alberto is the lead developer of Jool, an open-source IPv4/IPv6 translator for the Linux kernel. Thanks to his efforts, RFC 7757-to-be (and, by extension, SIIT-DC) was quickly implemented in Jool, which really helped move things along. The IETF considers the availability of running code to be of utmost importance when considering a proposed new Internet standard, and Jool fit the bill perfectly.

For the record, we decommissioned our old TAYGA-based SIIT-DC BRs in favour of new ones based on Jool as soon as we could. This was a great success - our Jool BRs are currently handling IPv4 connectivity for hundreds of IPv6-only services and applications, and the number is rapidly growing. We’re very grateful to Alberto and NIC México for all the great work they’ve done with Jool - it’s an absolutely fantastic piece of software. I encourage anyone interested in IPv6 transition to download it and try it out.

In late 2015 the documents reached IETF consensus, after which they were sent to the RFC Editor. They did a great job with helping improve the language, fixing inconsistencies, pointing out unclear or ambiguous sentences, and so on. When that was done, the only remaining thing was to publish the documents - which, as I mentioned before, happened last week.

It feels great to have crossed the finish line with these documents, and writing them has certainly been a very interesting exercise. It is also nice to prove that it is possible for regular operators to provide meaningful contributions to the IETF - you don’t have to be an academic or work for one of the big network equipment vendors. That said, it has taken considerable effort, so I certainly look forward to being able to focus fully on my work as a network engineer again. I promise that’s going to result in more good IPv6 news in 2016…watch this space!

Norwegian IPv6 year in review

2016 is fast approaching. In this post I’ll take a look in the rear-view mirror to see how well we did in Norway with regard to IPv6 deployment in 2015. I’ll focus on the end-user side of things, that is, the extent of IPv6 deployment amongst Norwegian ISPs. This is because my employer Redpill Linpro mainly provides managed services to content providers, so the traffic entering our dual-stacked data centres can only tell a story about how the ISPs are doing.

End of 2015 status: 7-8% IPv6 adoption

Our customer VG is kind enough to let me use their web site traffic to publish graphs detailing IPv6 deployment in Norway. VG is the largest Norwegian web site, appealing to a broad audience, so the collected data gives a very good basis for accurate statistics about the Norwegian population in general. The graph below visualises this data, showing how the Norwegian IPv6 adoption rate has developed throughout 2015:

This shows that at the time of writing, 7.3% of all the traffic that reached VG in the previous week was IPv6. While this is an increase compared to the beginning of the year, it is a disappointingly small one - only about a single percentage point.

The graph usually peaks above 8.5% every weekend and drops below 7% during weekdays. This tells us that Norwegians are much more likely to have IPv6 at home than at work.

It is worth noting that other large content providers are also measuring IPv6 usage in Norway, and their measurements appear to confirm that my numbers are in the right ballpark: Akamai currently reports 7% adoption, while Google reports 7.94%.

On a global scale, Norway is actually quite average. The last few months in Google’s global IPv6 adoption graph are eerily similar to the VG data for the same period. Ranking the countries by their Google-reported IPv6 adoption percentage shows we’re #15 in the world:

  1. Belgium - 39.43%
  2. Switzerland - 26.22%
  3. United States - 22.95%
  4. Portugal - 21.57%
  5. Germany - 20.49%
  6. Greece - 19.05%
  7. Peru - 15.77%
  8. Luxembourg - 15.59%
  9. Czech Republic - 9.84%
  10. Ecuador - 9.64%
  11. Estonia - 9.60%
  12. St. Kitts & Nevis - 9.16%
  13. Malaysia - 8.88%
  14. Japan - 8.77%
  15. Norway - 7.94%

I’d say this is nothing to celebrate, except perhaps that we fare better than all of our Nordic neighbours (but probably not for long, as Finland is #16 and is climbing fast).

How did the Norwegian ISPs fare in 2015?

Norway is essentially an IPv6 duopoly. Over 80% of all IPv6 traffic originates from two ISPs: the incumbent telco Telenor and the cable ISP Get.

In the next two spots we find the NREN UNINETT and the fibre ISP Altibox. These two are each responsible for a tiny (but measurable) share of the IPv6 traffic.

The four networks I’ve mentioned account for over 90% of all the IPv6 traffic. The rest is the long tail, consisting of far too many networks to mention individually here, as they are each responsible for only a minuscule amount of IPv6 traffic.

Below I examine more closely how Telenor and Get fared in 2015. Note that the Y axis of the following graphs shows a percentage of all traffic, i.e., including IPv4 traffic and IPv6 traffic from other ISPs. It does not say anything about how many of Telenor’s or Get’s subscribers are dual-stacked. Unfortunately, I don’t have such statistics at the moment. Akamai does, however.

Zooming in on Telenor

Telenor uses two distinct IPv6 prefixes, allowing me to make two graphs: one for their mobile subscribers, and another for all their wired broadband subscribers (i.e., cable, DSL, and fibre).

The above graph is for Telenor’s wired broadband customers, which is the largest group overall. It is disappointing to see that this group has not grown at all in 2015; rather, it looks like the percentage at the end of the year will be slightly lower than it was at the start of the year! Telenor is a long way from having rolled out IPv6 to their entire customer base, so I am truly hoping that they will pick up the pace again in 2016. (In case you’re wondering, the marked drop at the end of January was caused by a critical problem in their network.)

Telenor’s mobile subscribers are doing much better. Their IPv6 traffic has more than doubled in 2015. It is, however, quite worrying to see that the trend over the last couple of months is clearly a negative one.

Telenor is by far Norway’s largest ISP. In absolute numbers, they are without question the largest source of IPv6 traffic too. However, according to Akamai, only 5.5% of Telenor’s subscribers are IPv6-capable, so there is clearly a huge potential for increased IPv6 deployment in Telenor in the future.

Zooming in on Get

Get is growing their IPv6 deployment. It’s not going very fast, but it is a steady positive trend. (The decline in the summer months is better explained by Get’s customers leaving home to go on holidays than anything Get did.)

According to Akamai, 24.1% of Get’s customers are IPv6-capable. This means Get is the ISP with the largest share of IPv6-capable customers in Norway - well done! At the same time, three out of four of their customers remain IPv4-only, so there is plenty of potential for further improvements in 2016.

Summary and hopes for 2016

If I’m being honest, I must say that 2015 turned out to be a rather disappointing year for IPv6 adoption in Norway. In the second half of 2014 I observed rapid growth, but unfortunately this trend did not continue in 2015.

I’m hoping that Get and Telenor will intensify their IPv6 deployments in 2016. Especially Telenor has a lot of potential for growth - for example, their cable customers must currently manually opt-in to get IPv6, and all the Apple devices on their mobile network remain IPv4-only. If neither of those two things change in 2016 I’ll be very disappointed.

When it comes to the other major national Norwegian ISPs, I truly hope that 2016 will be the year when I’ll start seeing significant amounts of IPv6 traffic from them. I’m thinking in particular about the likes of Altibox, NetCom, and NextGenTel here.

I’d like to end on a more positive note, though. I’ll do that by commending Difi, a.k.a. The Agency for Public Management and eGovernment, for having made significant progress towards making IPv6 support a mandatory requirement in the Norwegian public sector. In 2016 this will likely become Norwegian “law”. The Norwegian public sector is huge and wealthy, so the moment the service providers start realising that lacking IPv6 support will disqualify them from bidding on lucrative government contracts, I think we’ll see quite a few laggards scrambling to catch up.

Happy New IPv6 Year!

IPv6 network boot with UEFI and iPXE

Here at Redpill Linpro we make extensive use of network booting to provision software onto our servers. Many of our servers don’t even have local storage - they boot from the network every time they start up. Others use network boot in order to install an operating system to local storage. The days when we were running around in our data centres with USB or optical install media are long gone, and we’re definitely not looking back.

Our network boot infrastructure is currently built around iPXE, a very flexible network boot firmware with powerful scripting functionality. Our virtual servers (using QEMU/KVM) simply execute iPXE directly. Our physical servers, on the other hand, use their standard built-in PXE ROMs in order to chainload an iPXE UNDI ROM over the network.

IPv6 PXE was first included in UEFI version 2.3 (Errata D), published five years ago. However, not all servers support IPv6 PXE yet, including the ageing ones in my lab. I’ll therefore focus on virtual servers for now, and will get back to IPv6 PXE on physical servers later.

Enabling IPv6 support in iPXE

At the time of writing, iPXE does not enable IPv6 support by default. This default spills over into Linux distributions like Fedora. I’m trying to get this changed, but for now it is necessary to manually rebuild iPXE with IPv6 support enabled.

This is done by downloading the iPXE sources and then enabling NET_PROTO_IPV6 in src/config/general.h. Replace #undef with #define so that the full line reads #define NET_PROTO_IPV6.
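
In other words, the change to src/config/general.h amounts to this (only the one relevant line is shown):

/* Before, IPv6 support disabled: */
#undef NET_PROTO_IPV6

/* After, IPv6 support enabled: */
#define NET_PROTO_IPV6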

At this point, we’re ready to build iPXE. For the virtio-net driver used by our QEMU/KVM hypervisors, the correct command is make -C /path/to/ipxe/src bin/1af41000.rom. To build a UEFI image suitable for chainloading, run make -C /path/to/ipxe/src bin-x86_64-efi/ipxe.efi instead.

On RHEL7-based hypervisors, upgrading iPXE is just a matter of replacing the default 1af41000.rom file in /usr/share/ipxe with the one that was just built.
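
Putting the whole procedure together, it looks roughly like this on a RHEL7-based QEMU/KVM hypervisor (assuming the sources are fetched from the project’s git repository; adjust paths to your environment):

# Download the iPXE sources
git clone git://git.ipxe.org/ipxe.git
# Edit ipxe/src/config/general.h to enable NET_PROTO_IPV6, as shown above,
# then build the ROM for the virtio-net driver used by QEMU/KVM
make -C ipxe/src bin/1af41000.rom
# Replace the distribution-provided ROM with the freshly built one
cp ipxe/src/bin/1af41000.rom /usr/share/ipxe/1af41000.rom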

Network configuration

The network must be set up with both ICMPv6 Router Advertisements (RAs) and DHCPv6. RAs are necessary in order to provision the booting nodes with a default IPv6 router, while DHCPv6 is the only way to advertise IPv6 network boot options.

When it comes to the assignment of IPv6 addresses, you can use either SLAAC or DHCPv6 IA_NA. iPXE supports both approaches. Avoid using both at the same time, though, as doing so may trigger a bug which could lead to the boot process getting stuck halfway through.

You’ll probably want to provision the nodes with an IPv6 DNS server. This can be done using both DHCPv6 and ICMPv6 RAs. iPXE supports both approaches, so either will do just fine. That said, I recommend enabling both at the same time, as it might very well be that some UEFI implementations only support one of them.

ICMPv6 Router Advertisement configuration

protocol radv {
  # Use Google's public DNS server.
  rdnss {
    ns 2001:4860:4860::8888;
  };
  interface "vlan123" {
    managed no;       # Addresses (IA_NA) aren't found in DHCPv6
    other config yes; # "Other Configuration" is found in DHCPv6 
    prefix 2001:db8::/64 {
      onlink yes;     # The prefix is on-link
      autonomous yes; # The prefix may be used for SLAAC
    };
  };
}

The configuration above is for BIRD. It is all pretty standard stuff, but pay attention to the fact that the other config flag is enabled. This is required in order to make iPXE ask the DHCPv6 server for the Boot File URL Option.

DHCPv6 server configuration

option dhcp6.user-class code 15 = string;
option dhcp6.bootfile-url code 59 = string;
option dhcp6.client-arch-type code 61 = array of unsigned integer 16;

option dhcp6.name-servers 2001:4860:4860::8888;

if exists dhcp6.client-arch-type and
   option dhcp6.client-arch-type = 00:07 {
    option dhcp6.bootfile-url "tftp://[2001:db8::69]/ipxe.efi";
} else if exists dhcp6.user-class and
          substring(option dhcp6.user-class, 2, 4) = "iPXE" {
    option dhcp6.bootfile-url "http://boot.ipxe.org/demo/boot.php";
}

subnet6 2001:db8::/64 {}

The config above is for the ISC DHCPv6 server. The first paragraph declares the various necessary DHCPv6 options and their syntax. For some reason, ISC dhcpd does not appear to have any intrinsic knowledge of these, even though they’re standardised.

The second paragraph ensures the server can advertise an IPv6 DNS server to clients. In this example I’m using Google’s Public DNS; you’ll probably want to replace it with your own IPv6 DNS server.

The if/else statement ensures two things:

  1. If the client is a UEFI firmware performing IPv6 PXE, then we just chainload a UEFI-compatible iPXE image. (As I mentioned earlier, I haven’t been able to fully test this config due to lack of lab equipment supporting IPv6 PXE.)
  2. If the client is iPXE, then we give it an iPXE script to execute. In this example, I’m using the iPXE project’s demo service, which boots a very basic Linux system.

Finally, I declare the subnet prefix where the IPv6-only VMs live. Without this, the DHCPv6 server will not answer any requests coming from this network. Since I’m not using stateful address assignment (DHCPv6 IA_NA), I do not need to configure an IPv6 address pool.

Conclusion

Thanks to iPXE and UEFI, network boot can be made to work just as well over IPv6 as over IPv4. The only real remaining problem is that many server models still lack support for IPv6 PXE, but I am assuming this will become less of an issue over time as they upgrade their UEFI implementations to version 2.3 (Errata D) or newer.

In virtualised environments, nothing is missing. Apart from the somewhat annoying requirement to rebuild iPXE to enable IPv6 support, it Just Works. This is evident from the boot log below, which shows a successful boot of a QEMU/KVM virtual machine residing on an IPv6-only network.

[root@kvmhost ~]# virsh create /etc/libvirt/qemu/v6only --console
Domene v6only opprettet fra /etc/libvirt/qemu/v6only
Connected to domain v6only
Escape character is ^]

Google, Inc.
Serial Graphics Adapter 06/09/14
SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@) Mon Jun  9 21:33:48 UTC 2014
4 0

SeaBIOS (version seabios-1.7.5-8.el7)
Machine UUID ebe11d4a-11d4-4ae8-b249-390cdf7c79ec

iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+7FF979E0+7FEF79E0 CA00

Booting from Hard Disk...
Boot failed: not a bootable disk

Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
iPXE initialising devices...ok

iPXE 1.0.0+ (f92f) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP AoE ELF MBOOT PXE bzImage Menu PXEXT

net0: 00:16:3e:c2:16:b7 using virtio-net on PCI00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 00:16:3e:c2:16:b7).................. ok
net0: fe80::216:3eff:fec2:16b7/64
net0: 2001:db8::216:3eff:fec2:16b7/64 gw fe80::21e:68ff:fed9:d156
Filename: http://boot.ipxe.org/demo/boot.php
http://boot.ipxe.org/demo/boot.php.......... ok
boot.php : 127 bytes [script]
/vmlinuz-3.16.0-rc4... ok
/initrd.img... ok
Probing EDD (edd=off to disable)... ok

iPXE Boot Demonstration
=======================

Linux (none) 3.16.0-rc4+ #1 SMP Wed Jul 9 15:44:09 BST 2014 x86_64 unknown

Congratulations!  You have successfully booted the iPXE demonstration
image from http://boot.ipxe.org/demo/boot.php

See http://ipxe.org for more ideas on how to use iPXE.

root:/#

Evaluating DHCPv6 relays

One of the few remaining IPv4-only services here at Redpill Linpro is our provisioning infrastructure, which is based on PXE network booting. I’ve long wanted to do something about that. Now that more and more servers are shipping with UEFI support, I am finally in a position to start looking at it.

I’m starting out by figuring out which DHCPv6 relay implementation we’ll be using. This post details my evaluation process, conclusion, and choice.

Network topology

The servers we want to provision are usually located in a dedicated customer VLAN that is connected to a set of redundant routers running Linux. The routers speak VRRP on each of the VLANs in order to decide which router is the primary one serving the VLAN in question.

Furthermore, each of the routers has multiple redundant uplinks to the core network and uses a dynamic IGP to ensure optimal routing and fault tolerance. The DHCPv6 server is reached through the core network using unicast routing.

The following figure illustrates the topology:

Our desired capabilities

Our current IPv4-only network boot infrastructure relies heavily on using Ethernet MAC addresses to distinguish between clients. Being able to continue to do so will make the introduction of IPv6 support quick and easy. We would therefore like for the implementation to support the DHCPv6 Client Link-Layer Address Option.

As discussed in the previous section, the network configuration on the routers is dynamic and could change without notice. An ideal DHCPv6 relay implementation would be able to notice such changes and automatically adapt to the new environment. In our environment, this would mean being able to cope with:

  • A new VLAN interface showing up, e.g., when we provision a new customer.
  • The IP address configuration on a VLAN interface changing, e.g., due to a VRRP fail-over event.
  • The route to the DHCPv6 server changing from one uplink interface to another, e.g., due to changed route metrics in our core network.

Finally, regarding the software itself, we’d like for it to be:

  • Free and open-source software
  • Actively maintained
  • Available in Ubuntu’s software archive

Available implementations and their capabilities

From what I was able to determine, there are four available open-source DHCPv6 relay implementations. These are, in alphabetical order:

  • dhcpv6
  • Dibbler
  • ISC DHCP
  • WIDE-DHCPv6

The versions I tested are shown in parentheses.

Desired capability 1: DHCPv6 Client Link-Layer Address Option

Of the tested implementations, only Dibbler supported this feature. It is enabled by adding the line option link-layer to the configuration file.

Desired capability 2: Detecting new interfaces on the fly

Disappointingly enough, none of the tested implementations were able to do this. dhcpv6 will by default listen on all available interfaces, but it does not detect new interfaces showing up after it has started.

The other three implementations all require that the listening interfaces be configured explicitly.

Desired capability 3: Detecting IPv6 addresses changing during runtime

Only WIDE-DHCPv6 was able to do this. It appears to check what the local address on the interface is every time it relays a packet, so it always sets the link address field in the relayed DHCPv6 packet correctly.

The other three implementations read in the global address (or lack thereof) for each interface when they start, and do not notice any changes. Thus, there is a risk that the link address field in their relayed packets is set incorrectly.

Desired capability 4: Coping with route to DHCPv6 server changing

Only dhcpv6 supports this without any weirdness. The address of the DHCPv6 server is specified with the -su command line option, and packets are relayed to it using a standard routing lookup.

ISC DHCP and WIDE-DHCPv6 behave in a rather bizarre way. They both require that the interface facing the DHCPv6 server is explicitly specified on the command line, but for some reason they completely ignore it and instead use a standard routing lookup to reach the server.

Dibbler also requires that the upstream interfaces are explicitly configured. If there is no route to the DHCPv6 server on one of these interfaces, it will log the following error for each DHCPv6 request:

Low-level layer error message: Unable to send data (dst addr: 2001:db8::d)
Failed to send data to server unicast address.

That said, it is possible to simply configure both eth0 and eth1 as upstream interfaces. This ensures all requests are correctly relayed regardless of which interface has the active route to the DHCPv6 server. Even so, I’m only awarding half a point to Dibbler here, both due to the clunkiness of the workaround and the constant stream of error messages it will result in.

Desired capability 5: Free and open-source software

Yes! Every tested implementation qualifies.

Desired capability 6: Actively maintained

Only Dibbler and ISC DHCP appear to be. According to its own homepage, dhcpv6 was discontinued in 2009. WIDE-DHCPv6 has not seen any release since 2008.

Desired capability 7: Available in Ubuntu’s software archive

Only dhcpv6 is missing; the rest are an apt-get install away.

Conclusion

Out of a maximum 7 points, the final scores are as follows:

  1. Dibbler: 4.5 points
  2. ISC DHCP: 4 points
  3. WIDE-DHCPv6: 4 points
  4. dhcpv6: 2 points

Disappointingly enough, none of them are able to run continuously in a dynamic environment like ours. To work around this, we’ll probably have to devise a system that automatically generates new configuration and restarts the relay whenever a network configuration change is detected (a rough sketch follows below). Should a DHCPv6 request arrive exactly when the relay is being restarted, it will likely be retried within seconds, so this is extremely unlikely to cause any operational issues.
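
A minimal sketch of what such a system could look like, assuming Dibbler is the chosen relay. The generate-relay-conf helper and the dibbler-relay service name are placeholders for whatever the final implementation ends up using, and a real deployment would want to debounce bursts of events rather than restarting on every single one:

#!/bin/sh
# Hypothetical watchdog: regenerate the relay configuration and restart
# the relay whenever the kernel reports an IPv6 address or route change.
ip -6 monitor address route | while read -r event; do
    generate-relay-conf > /etc/dibbler/relay.conf   # site-specific helper (placeholder)
    service dibbler-relay restart
done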

This workaround will handle the lack of desired capabilities 2 through 4. After disregarding these, only Dibbler gets full score (due to its support for the Client Link-Layer Address Option). Dibbler is thus the obvious choice.

SIIT-DC support in Varnish Cache through libvmod-rfc6052

Here at Redpill Linpro we’re big fans of the Varnish Cache. We tend to put Varnish in front of almost every web site that we operate for our customers, which goes a long way toward ensuring that they respond blazingly fast - even though the applications themselves might not always be designed with speed or scalability in mind.

We’re also big fans of IPv6, which we have deployed throughout our entire network infrastructure. We’ve also pioneered a technology called SIIT-DC, which has undergone peer review in the IETF and will likely be published as an RFC any day now. SIIT-DC allows us to operate our data centre applications using exclusively IPv6, while at the same time ensuring that they remain available from the IPv4 Internet without any performance or functionality loss.

A quick introduction to SIIT-DC

SIIT-DC works by embedding the 32-bit IPv4 source address of the client into an IPv6 address. The resulting IPv6 address is located within a 96-bit translation prefix (96 + 32 = 128, the number of bits in an IPv6 address). It is easiest to explain with an example:

Assume an IPv4-only client with the address 198.51.100.42 makes an HTTP request to a web site hosted in an IPv6-only data centre. The client’s initial IPv4 packet will be routed to the nearest SIIT-DC Border Relay, which will translate the packet to IPv6. If we assume that the translation prefix in use is 64:ff9b::/96, the resulting IPv6 packet will have a source address of 64:ff9b::c633:642a. (An alternative way of representing this address is 64:ff9b::198.51.100.42, by the way.)
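
If you want to verify the arithmetic yourself, the mapping is easy to reproduce with a few lines of Python using the standard library’s ipaddress module (this is just an illustration of the address format, not part of SIIT-DC itself):

import ipaddress

# The translation prefix from the example above.
prefix = ipaddress.IPv6Network("64:ff9b::/96")
client = ipaddress.IPv4Address("198.51.100.42")

# Embed the 32-bit IPv4 address in the low-order 32 bits of the prefix.
embedded = ipaddress.IPv6Address(int(prefix.network_address) | int(client))
print(embedded)                                                      # 64:ff9b::c633:642a
print(embedded == ipaddress.IPv6Address("64:ff9b::198.51.100.42"))   # True

# The Border Relay reverses the operation for traffic in the other direction.
print(ipaddress.IPv4Address(int(embedded) & 0xffffffff))             # 198.51.100.42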

The translated IPv6 packet then gets routed through the IPv6 data centre network until it reaches the web site’s Varnish Cache. Varnish responds to it as it would with any other native IPv6 packet. The response gets routed to the nearest SIIT-DC Border Relay, where it gets translated back to IPv4 and finally routed back to the IPv4-only client. There is full bi-directional connectivity between the IPv4-only client and the IPv6-only server, allowing the HTTP request to complete successfully.

That’s the gist of it, anyway. If you’d like to learn more about SIIT-DC, you should start out by watching this presentation about it held at the RIPE69 conference in London last November.

What’s the problem, then?

From Varnish’s point of view, the translated IPv4 client looks the same as a native IPv6 one. SIIT-DC hides the fact that the client is in reality using IPv4. The implication is that the VCL variable client.ip will contain the IPv6 address 64:ff9b::c633:642a, instead of the IPv4 address 198.51.100.42.

If you don’t use the client.ip variable for anything, then there’s no problem at all. If, on the other hand, you do use client.ip for something, and that something expects to work on literal IPv4 addresses, then there’s a problem. For example, an IP geolocation library is unlikely to return anything useful when given an IPv6 address such as 64:ff9b::c633:642a to locate.

The solution: libvmod-rfc6052

Even though our example address 64:ff9b::c633:642a looks nothing like an IPv4 address, it’s important to realise that the original IPv4 address is still there - it’s just hidden in the last 32 bits of the IPv6 address, i.e., in the hexadecimal number 0xc633642a.

So all we need to do is to extract those 32 bits and transform them back to a regular IPv4 address. Doing just that is exactly the purpose of libvmod-rfc6052. It is a new Varnish Module that extends VCL with a set of functions that:

  • Check whether a Varnish sockaddr data structure (VSA) (e.g., client.ip) contains a so-called IPv4-embedded IPv6 address (cf. RFC6052 section 2.2).
  • Extract the embedded IPv4 address from an IPv6 VSA, returning a new IPv4 VSA containing the embedded IPv4 address.
  • Perform an in-place substitution of an IPv6 VSA containing an IPv4-embedded IPv6 address with a new IPv4 VSA containing the embedded IPv4 address.

The following example VCL code shows how these functions can be used to insert an X-Forwarded-For HTTP header into the request. The use of libvmod-rfc6052 ensures that the backend server will only ever see native IPv4 and IPv6 addresses.

import rfc6052;

sub vcl_init {
    # Set a custom translation prefix (/96 is implied).
    # Default: 64:ff9b::/96 (see RFC6052 section 2.1).
    rfc6052.prefix("2001:db8:46::");
}

sub vcl_recv {
    ###
    ### Alternative A: use rfc6052.extract().
    ### This leaves the "client.ip" variable intact.
    ###

    if(rfc6052.is_v4embedded(client.ip)) {
        # "client.ip" contains an RFC6052 IPv4-embedded IPv6
        # address. Set XFF to the embedded IPv4 address:
        set req.http.X-Forwarded-For = rfc6052.extract(client.ip);
    } else {
        # "client.ip" contained an IPv4 address, or a native
        # (non-RFC6052) IPv6 address. No RFC6052 extraction
        # necessary, we can just set XFF directly:
        set req.http.X-Forwarded-For = client.ip;
    }

    ##############################################################

    ###
    ### Alternative B: Use replace() to change the
    ### value of "client.ip" before setting XFF.
    ###

    rfc6052.replace(client.ip);

    # If "client.ip" originally contained an IPv4-embedded
    # IPv6 address, it will now contain just the IPv4 address.
    # Otherwise, replace() did no changes, and "client.ip"
    # still contains its original value. In any case, we can
    # now be certain that "client.ip" no longer contains an
    # IPv4-embedded IPv6 address.

    set req.http.X-Forwarded-For = client.ip;
}

We’re naturally providing libvmod-rfc6052 as free and open-source software in the hope that it will be useful to the Varnish and IPv6 communities.

If you try it out, don’t hesitate to share your experiences with us. Should you stumble across any bugs or have any suggestions, head over to the libvmod-rfc6052 GitHub repo and submit an issue.