A database replica server just failed. No big deal. A replacement is launched automatically, gets set up, and the Elastic IP address is attached. Everything’s good to go. Except nothing’s connecting to it. The application servers are still complaining that the database is unreachable. That is, until the application is restarted; then everything starts working again. What’s going on?
It could be because of InetSocketAddress.
It’s a common pattern for Java applications to hold connection information in InetSocketAddress objects. They have convenient parsing methods, handle a variety of host string formats, and represent the canonical connection information used throughout the java.net packages. But there’s one detail of their implementation that makes them unsuitable for holding that information over long periods of time: when constructed from a hostname, they perform DNS resolution at construction, and then never again.
This means that if the DNS record of a host changes while the application is running, the change won’t be picked up. The application will be stuck with a stale IP address, even if it tries to reconnect.
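To make the behavior concrete, here is a minimal sketch that prints what an InetSocketAddress actually holds after construction. The class name ResolutionDemo, the describe helper, and the hostname db.example.internal are made up for illustration; only the java.net calls are real.

```java
import java.net.InetSocketAddress;

final class ResolutionDemo {
    /** Summarizes what an InetSocketAddress currently stores. */
    static String describe(InetSocketAddress addr) {
        String target = addr.getHostString() + ":" + addr.getPort();
        if (addr.isUnresolved()) {
            // No IP address is stored, only the hostname.
            return target + " (unresolved)";
        }
        // An IP address was captured when the object was built.
        return target + " -> " + addr.getAddress().getHostAddress();
    }

    public static void main(String[] args) {
        // The (String, int) constructor attempts resolution right away
        // and keeps whatever address it got for the life of the object.
        InetSocketAddress eager = new InetSocketAddress("localhost", 5432);
        System.out.println(describe(eager));

        // createUnresolved() skips the lookup and stores only the name.
        // "db.example.internal" is a placeholder hostname.
        InetSocketAddress lazy =
                InetSocketAddress.createUnresolved("db.example.internal", 5432);
        System.out.println(describe(lazy));
    }
}
```

Neither object ever looks the name up again: the first keeps the address it resolved at construction, and the second stays unresolved until something builds a new InetSocketAddress from it.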
In the Cloud
This is a subtle problem when using Elastic IPs in AWS. Elastic IPs are stable public IP addresses that you can move between instances. They’re really useful for things like databases and discovery systems that have to be in known locations.
AWS uses two types of IP addresses: public and private. Public addresses are visible to the outside world, whereas private addresses are only visible within an AWS datacenter. For traffic within a region, you want to use the private IP address: it’s faster and cheaper (no charges for incoming or outgoing data). However, Elastic IPs are public addresses. Conversion from an Elastic IP hostname to the private IP address is done through DNS.
So if the connection information for the database is stored in an InetSocketAddress object, the application won’t see Elastic IP reassignments. Bummer.
The fix is to hold the original connection information (the hostname and port) in memory, and only convert it to an InetSocketAddress right before creating a socket. If you need to reconnect (say, because the server at the other end failed), use the original information to create a new InetSocketAddress. Doing so will cause Java to go through DNS resolution again, and pick up any possible changes. (The JVM caches successful DNS lookups for a short time by default, 30 seconds in the OpenJDK reference implementation when no security manager is installed, so this won’t cause a flood of DNS requests. The networkaddress.cache.ttl security property controls the duration.)
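One way this could look in code is sketched below. The Endpoint class and its method names are hypothetical, not taken from our codebase or from any library; the point is only that the long-lived object stores the hostname and port, and a fresh InetSocketAddress is built for every connection attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical holder for connection info: keeps the name, not the address. */
final class Endpoint {
    private final String host;
    private final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Builds a new InetSocketAddress on every call, so the hostname is
     * looked up again (subject to the JVM's own DNS cache).
     */
    InetSocketAddress resolve() {
        return new InetSocketAddress(host, port);
    }

    /** Opens a new socket, resolving the hostname at connect time. */
    Socket connect(int timeoutMillis) throws IOException {
        Socket socket = new Socket();
        try {
            // Throws UnknownHostException if the name did not resolve.
            socket.connect(resolve(), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

A reconnect path would then call connect again on the same Endpoint instead of reusing an InetSocketAddress saved at startup, so a reassigned Elastic IP is picked up once the cached DNS entry expires.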
As part of hardening our systems at TellApart, we’ve had to patch this problem in many places, both in our code and in open-source projects we use (for example, a memcached library). Hopefully, better awareness will mean that you won’t have to do so as well.
Kevin Ballard is a Software Engineer at TellApart.