The AOL XMPP scalability challenge

It has been widely reported on the Internet that AOL is experimenting with an XMPP gateway that will allow users to connect to AIM and ICQ with a compliant XMPP (eXtensible Messaging and Presence Protocol) client.

It is still early days, but this news proves that interest in XMPP is growing at a tremendous rate. Big players, like IBM, Google, Sun, Apple, Adobe and now possibly AOL, are embracing the protocol, and it seems that the battle between XMPP and SIMPLE (Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions) is drawing to an end. The market has already chosen XMPP because it is closer to Internet technology. In comparison, SIMPLE feels like an overly complicated Telco protocol. It covers only a fraction of the scope of XMPP, and XMPP supports a very large number of extensions.

The other interesting thing is that, if XMPP wins the battle over SIMPLE, it can also win over proprietary protocols. Google chose it for Google Talk, and now AOL is considering it. Certainly, the Instant Messaging market is moving, and 2008 will probably be a very interesting year for XMPP.

For us, XMPP protocol designers and XMPP server developers, this year might present some of our biggest challenges yet. We have lots of customer deployments that prove that our XMPP server can scale to large numbers of users. Google Talk is also proving that XMPP can work on a large scale. However, the next challenge is XMPP federation on a large scale.

Federation in the XMPP world means sharing presence information, so that users can exchange messages with other users, no matter which server (identified by its domain) they have an account with. In practice, different servers need to be able to connect to one another, put users in touch and route messages.
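To make the routing decision concrete, here is a minimal sketch in Python. It is not ejabberd code: the function names `jid_domain` and `route` and the `s2s:` label are our own illustrative assumptions. It only shows the basic idea that the domain part of an address decides whether a message stays on the local server or has to travel over a server-to-server connection.

```python
def jid_domain(jid: str) -> str:
    """Return the domain part of an address such as 'user@domain/resource'."""
    # Drop the resource first: it may itself contain '@' or '/'.
    bare = jid.split("/", 1)[0]
    # Everything after the last '@' is the domain; a bare domain has no '@'.
    _, _, domain = bare.rpartition("@")
    return domain.lower()


def route(local_domain: str, to_jid: str) -> str:
    """Hypothetical routing decision: deliver locally or hand over to a remote domain."""
    domain = jid_domain(to_jid)
    if domain == local_domain.lower():
        return "local"
    # Illustrative label only: a real server would open or reuse a connection here.
    return f"s2s:{domain}"
```

For example, on a server for `jabber.org`, `route("jabber.org", "bob@gmail.com")` would hand the message to the connection for `gmail.com`, while a message to `alice@jabber.org/home` stays local.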

Google Talk is federated with quite a good number of other XMPP servers, as is Jabber.org (powered by our XMPP implementation, ejabberd, and supported by ProcessOne). Google Talk and Jabber.org probably have around the same number of connections (approximately 3000 server connections each is a good guess).

With the rapid spread of XMPP, two challenges need to be addressed:

Challenge 1: Massive number of federated servers

News of the AOL experiment will probably boost the number of XMPP servers in production. Even though Jabber.org has 3000 connections to servers, this number is still very small. What will happen when this number grows?

Netcraft counted 70 million active domains in December 2007. This is a possible target for XMPP when all domains have web, mail and Instant Messaging.

I believe that the new version of ejabberd, ejabberd 2.0, is a step in the right direction. ejabberd 2.0 includes many improvements in the management of server-to-server connections. It can be set up in clusters and is designed to handle tens of thousands of connections on a single node. Connections between servers in ejabberd are now multiplexed to cope with heavy traffic between servers. We have also added protection mechanisms to detect when another server stalls and to disconnect it gracefully.

As a final note on this topic, my personal view is that, at some point in the future of XMPP, we will need to add meta routing nodes to avoid the possibility of up to 70 million servers connecting to 70 million other servers. I will share more thoughts on this point at a later time.
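A quick back-of-the-envelope calculation shows why this matters. The sketch below compares a full mesh, where every server links to every other server, with a purely hypothetical topology in which each server connects to a single meta routing node and the meta nodes are meshed together. The topology, the function names and the figure of 1000 meta nodes are assumptions chosen for illustration, not a proposed design.

```python
def full_mesh_links(servers: int) -> int:
    """Number of distinct server-to-server links if every server talks to every other."""
    return servers * (servers - 1) // 2


def hub_links(servers: int, hubs: int) -> int:
    """Links in an assumed topology: one uplink per server, plus a full mesh between hubs."""
    return servers + hubs * (hubs - 1) // 2


if __name__ == "__main__":
    n = 70_000_000  # active domains counted by Netcraft in December 2007
    print(full_mesh_links(n))   # 2449999965000000
    print(hub_links(n, 1000))   # 70499500
```

In the worst case the full mesh needs about 2.4 million billion links, while the assumed hub layout stays close to one link per server. The cost moves to the meta nodes, which would then have to carry the routed traffic.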

Challenge 2: Massive number of users on two federated servers

The possible future developments highlighted above will lead to another potential scale limit.

What will happen when two servers (for example AIM and Google) disconnect, and one of the servers needs to restart? If those two servers have a large overlap in their user base (for example if many users from one server are linked to many users in the other), those two servers would need to synchronise presence for millions of users at once.

Again, we believe that we are leading the way in ejabberd with experimental code. We are developing priority mechanisms to synchronise presence in an incremental way when servers need to get in touch for the first time. We are also testing protocol improvements that will limit the number of presence synchronisation packets that need to be exchanged between two servers. If this works well, we will propose this optimisation as an extension to the XMPP protocol.
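As an illustration of the incremental idea only, here is a small Python sketch. It is not the experimental ejabberd code: the priority scheme, the batch size and the name `presence_batches` are assumptions made for the example. Pending presence updates are ordered by priority and released in small batches, so the most urgent contacts are synchronised first instead of everything being sent at once.

```python
import heapq
from typing import Iterator


def presence_batches(pending: dict[str, int], batch_size: int) -> Iterator[list[str]]:
    """Yield addresses in batches, most urgent first (a lower number is more urgent)."""
    # Build a min-heap of (priority, address); ties are broken by address order.
    heap = [(priority, jid) for jid, priority in pending.items()]
    heapq.heapify(heap)
    while heap:
        size = min(batch_size, len(heap))
        yield [heapq.heappop(heap)[1] for _ in range(size)]


if __name__ == "__main__":
    # Hypothetical priorities: 0 = in an active chat, 1 = online, 2 = idle.
    pending = {"a@example.org": 2, "b@example.org": 0, "c@example.org": 1}
    for batch in presence_batches(pending, batch_size=2):
        print(batch)
```

A real server would pace the batches according to the load on the link. The sketch only shows the ordering.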

Fault tolerance and high availability are very important features to consider when you select an XMPP server. This is not only because you cannot schedule downtime when your users are spread all over the world, but also because each time you restart your server, you face a possibly costly presence resynchronisation. Our Instant Messaging solution provides features to upgrade the code in a running system. This means less maintenance downtime and thus a smaller synchronisation burden on the whole XMPP network when you restart one server.

Please note that the rumoured AOL experiment is only a gateway that targets XMPP client use. At this stage, we do not know if AOL will allow communication with other XMPP servers in the future. This question is purely theoretical for the moment, but we need to make sure we are ready.

Conclusion

The big public server providers are considering switching to XMPP. This is a great opportunity for the XMPP protocol and brings new challenges. As developers of XMPP servers, we are ready to face these challenges. Let’s hope 2008 will give us even more opportunities to prove it.

Other interesting reading

On the scalability topic, you can read a previous blog post: Web 2.0: Shifting from "Get Fast" to "Get Massive"
