As part of testing and demonstrating our advanced deployment automation[1] platform Deployit, we at XebiaLabs use a lot of cloud and DevOps tooling to handle all the different types of middleware we support, along with the build, CI and Ops tooling with which we integrate[2].
I was recently setting up a Vagrant[3] environment to demonstrate Deployit’s Puppet module, which automatically registers new Puppet-provisioned middleware with your deployment automation platform to enable application-tier deployments to it, and ended up wrestling for quite some time with a tricky VirtualBox problem.
The issue in question has been around for over two years now, and relates to VirtualBox’s DHCP server sometimes, under as-yet-undetermined circumstances, failing to allocate an IP address to the NAT interface.
Since all Vagrant-managed images get a NAT interface by default[4], this is more than a little inconvenient: Vagrant simply hangs during the VM configuration phase.
Since the problem doesn’t occur deterministically, one workaround is simply to avoid rebooting the image. Play the “NAT lottery” until you get lucky: if the image hangs, kill the VBoxManage process and try vagrant up again. Once the image comes up, run vagrant suspend rather than vagrant halt, and resume the images when you need them.
That will work, but I wasn’t particularly happy with this approach because, aside from me not liking the idea of repeatedly killing hypervisor processes (I’m somewhat of a pacifist in this regard ;-)), it effectively “cripples” Vagrant: the ease with which you can start, stop and re-configure images is precisely one of the things that makes Vagrant so useful!
One of the things I quickly discovered is that, if you start a Vagrant-created image via the VirtualBox UI and it experiences the problem[5], cycling the NAT adapter with ifdown eth0 && ifup eth0 fixes things: this second time, it is able to pick up an IP address from the DHCP server.
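As a rough sketch of that check-and-cycle step, the following could be run as root inside the guest. The function names has_ipv4 and cycle_if_no_ipv4 are made up for illustration, and the pattern assumes ifconfig prints the address as either "inet addr:10.0.2.15" or "inet 10.0.2.15", which may not hold for every distribution.

```shell
# Hypothetical helper: succeeds if the ifconfig output on stdin lists an IPv4 address.
# Assumes the address appears as "inet addr:a.b.c.d" or "inet a.b.c.d".
has_ipv4() {
    grep -Eq 'inet (addr:)?[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Hypothetical helper: cycle the interface only when it has no IPv4 address.
# Defaults to eth0, the NAT adapter discussed in this post.
cycle_if_no_ipv4() {
    iface="${1:-eth0}"
    if ifconfig "$iface" | has_ipv4; then
        echo "$iface already has an IPv4 address"
    else
        ifdown "$iface" && ifup "$iface"
    fi
}
```

Calling cycle_if_no_ipv4 eth0 from a root shell in the guest leaves a healthy adapter alone and only restarts one that came up without an address.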
Unfortunately, this doesn’t get you far with an image that Vagrant itself is trying to start: Vagrant creates headless sessions, so you can’t actually access them through the VirtualBox UI until you’ve killed Vagrant and the VBoxHeadless process it starts.
Edited April 24th to add: see Richard Pot’s comment for instructions on how to start boxes in GUI mode.
Luckily, VirtualBox allows you to execute commands on the guest OS without having to use the UI, via VBoxManage’s guestcontrol command. So when Vagrant was again hanging while waiting to connect to the image, the first thing I tried was
/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout ifdown eth0
/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout ifup eth0
That did, as hoped, allow the NAT adapter to pick up an IP address. Unfortunately, it also confused Vagrant, which (presumably thinking that the image had gone offline) quit.
Happily, you don’t have to bring down the adapter to request a new IP address: dhclient will do just as well. And indeed
/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout dhclient
works: the NAT adapter picks up an IP address and, after a few seconds, Vagrant continues with the image configuration.
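If you hit this often, the command can be wrapped in a small shell function so it is quick to fire from a second terminal while vagrant up is waiting. This is only a sketch: the function name renew_nat_lease and the VBOXMANAGE, GUEST_USER and GUEST_PASS variables are my own inventions, and the guestcontrol arguments simply mirror the command above, so other VirtualBox versions may expect a different syntax.

```shell
# Illustrative defaults (assumptions): point VBOXMANAGE at your own binary,
# and change the credentials if your box does not use vagrant/vagrant.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"
GUEST_USER="${GUEST_USER:-vagrant}"
GUEST_PASS="${GUEST_PASS:-vagrant}"

# Hypothetical helper: ask the named VM to request a fresh DHCP lease
# by running dhclient through sudo in the guest, as in the command above.
renew_nat_lease() {
    vm_name="$1"
    if [ -z "$vm_name" ]; then
        echo "usage: renew_nat_lease <vm-name>" >&2
        return 2
    fi
    "$VBOXMANAGE" guestcontrol "$vm_name" exec "/usr/bin/sudo" \
        --username "$GUEST_USER" --password "$GUEST_PASS" \
        --verbose --wait-stdout dhclient
}
```

With the function loaded, renew_nat_lease my-vagrant-image does the same thing as the one-liner, and setting VBOXMANAGE=/path/to/vboxmanage beforehand covers installs where the binary is not on the PATH.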
Hopefully this will help out, even if it does indeed take another couple of years to get to the bottom of the actual issue 😉
[1] Or Application Release Automation (ARA), if you follow Gartner
[2] Check the platform support page for details
[3] For those not familiar with Vagrant, it’s a powerful tool (written in Ruby) that allows you to declaratively define multiple related virtual images based on templates called ‘boxes’. Vagrant orchestrates the interaction with VirtualBox to give you a very simple way of stopping, starting and configuring a cluster of images. In that sense, it’s a little like a VirtualBox-based CloudFormation.
[4] That’s how Vagrant communicates with the image while configuring it
[5] You’ll know because ifconfig will show that the eth0 adapter does not have an IPv4 address