What should you use to build a large scalable website?

If you were required to build a high-performance website with a great deal of traffic, what technology would you use? What is the most effective way to build a website?

One’s initial thoughts are about the scripting language to use. When one looks at the highest-volume websites in the Alexa top 20, the technologies still seem to be predominantly LAMP-based, involving Python, PHP, ColdFusion and Perl. Only Microsoft sites such as MSN, www.microsoft.com and www.bing.com seem to fly the flag for ASP.NET, and the Google sites go their own way with their mix of Python and C. Outside the top twenty, there are some high-volume sites using Java (Bebo, eBay, the BBC). Ruby was, until recently, a favorite to succeed, but the enthusiasm seems to be receding. Mono is nowhere to be found amongst the giants yet, though there are large, stable Mono sites out there (e.g. www.fiducial.fr).

It seems that the hosting platform you use, the configuration of load-balancers to the web farm, the performance of the message queue, the robustness of the database and the type of virtual environment are more important for the responsiveness of a high-volume Web app than anything else. The type of web server matters a bit, but all the leading scripting environments have proved themselves in high-volume use. For the resilience of the site, diagnostics and profiling, along with good alerting in the event of problems, seem to matter more than the choice of language.

There is another problem that affects certain platforms: both Ruby and Mono have run into it when websites built on them are scaled up. In both cases the root cause has been the garbage collector. The Ruby interpreter (not JRuby or IronRuby) has its own garbage collector, which uses a primitive mark-sweep algorithm that slowly “leaks” memory over time as the heap fragments. Mono still uses the conservative Boehm garbage collector, which can lead to some memory fragmentation and has to scan the entire allocated memory pool. This means that neither is particularly suitable for large-scale, always-on applications for the time being.
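
To make the fragmentation point concrete, here is a minimal sketch of how a collector that frees memory without moving the survivors can leave plenty of free space that is still unusable for a larger object. It is a toy model only, not the actual Ruby or Boehm implementation: the ToyHeap class, its method names and the 16-cell heap are all invented for illustration, and the mark phase is simulated by simply passing in the set of objects assumed to be reachable.

```python
class ToyHeap:
    """Hypothetical fixed-size heap of unit cells with a first-fit,
    non-compacting allocator. Not modelled on any real runtime."""

    def __init__(self, size):
        self.cells = [None] * size  # None marks a free cell

    def allocate(self, obj_id, length):
        """Place obj_id in the first free run of `length` cells.
        Returns False if no contiguous run is long enough."""
        run_start, run_len = 0, 0
        for i, cell in enumerate(self.cells):
            if cell is None:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len == length:
                    for j in range(run_start, run_start + length):
                        self.cells[j] = obj_id
                    return True
            else:
                run_len = 0
        return False

    def sweep(self, live_ids):
        """Free every cell whose owner is not in live_ids.
        Survivors stay where they are, so holes are never merged."""
        self.cells = [c if c in live_ids else None for c in self.cells]

    def free_cells(self):
        return self.cells.count(None)

    def largest_free_run(self):
        best = run = 0
        for cell in self.cells:
            run = run + 1 if cell is None else 0
            best = max(best, run)
        return best


def demo():
    heap = ToyHeap(16)
    # Fill the heap with eight 2-cell objects.
    for obj_id in range(8):
        heap.allocate(obj_id, 2)
    # Assumption: the mark phase found only the even-numbered objects reachable.
    heap.sweep(live_ids={0, 2, 4, 6})
    # Half the heap is free, but only in 2-cell holes.
    return heap.free_cells(), heap.largest_free_run(), heap.allocate("big", 4)


if __name__ == "__main__":
    print(demo())
```

In this toy run, half of the heap is free after the sweep, yet a 4-cell allocation fails because no hole is longer than two cells. A real runtime in that position would typically have to ask the operating system for more memory, which is one way a long-running process can appear to “leak” even though nothing is strictly lost. A compacting or generational collector avoids this by moving live objects together.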

The message one can take from this is that, if you are focusing on the debate over the relative merits of LAMP or .NET for a website, or on the specific scripting language, then you may be looking in the wrong direction. Resilience and performance can be designed in on any platform, as long as it supports good diagnostics, effective garbage collection, quality hosting and good message-queuing.

Cheers,

Laila