10 Best Practices to Design Your Next Cloud Computing System

Moshe Kaplan lists the top ten best practices for launching a system in the cloud

In the last few years I’ve been writing, lecturing, and consulting for leading cloud and Internet firms. Today, I’m excited to write my first post for this new venue.

I’m also excited because Cloud Computing is a great way to enable your IT systems to meet rapidly changing business requirements. The ability to launch server instances, support increasing traffic, create load balancer instances, and provision terabytes of storage in seconds is crucial in the Internet world, where handling spikes, availability, and scalability are a must. Therefore, I have summarized 10 best practices that will help you launch your system, make the most of your resources, and avoid pitfalls:

  1. Back to Basics: Cloud computing is a brave new world. Yet, when you design your cloud system, keep to the same basic rules that you’d follow in regular software development: code review, security, and good coding practices.
  2. Define Your Strategy: If you are getting into cloud computing, you are probably thinking in large numbers: 1 billion users and billions of daily transactions. Let’s face it; even Facebook didn’t get there in a single day. Therefore, you must define your destination and how the system will gradually evolve to meet your long-term strategy. Make sure that you know how each part of the system will be scaled out, removed, or upgraded in the future to meet your goals.
  3. Start Fast: It’s an exciting new world and there’s no time to waste. Moreover, people love quick wins. Your board loves them, your marketing guys love them, and most importantly your customers love them. Therefore, start safe and fast. If your developers are stronger in Java than in Ruby, start with Java. If you have an existing software product, consider using it as a base for the new product. Starting a new project with new tools that you have never used before can be risky (unless you invest a lot in training and reserve the needed buffers).
  4. Control Your Immediate and Long-Term Costs: Every growing business (even the best one) burns a lot of money while warming up its growth engine. You may make a huge profit from every new customer (lucky you), but your cash flow might be negative until periodic payments are deposited. Therefore, you should go over your business plan and turn it into a technical requirements plan. This technical requirements plan can help you find your bottlenecks and cut major growth costs. Choosing the correct solution can help you get through the growth phase in better shape.
  5. Define Limitations: Since marketing in the cloud world tends to be viral, you may face a huge demand for your cloud services in the first few days. If you have a great offer, supporting that demand (“The Digg effect”) might be challenging. Therefore, you must define a limit that lets you cap the demand and keep providing good service to your existing clients. Remember that it’s better to turn some customers away than to drive them all away with a catastrophic SLA and user experience.
  6. Refactor on the Run: As a key player in a growing business, you don’t have the option of staying in the same place. Your initial 100-user system is different from a 100M-user system, and as the system grows, smaller modules that were neglected in the first phase will become more important in terms of bottlenecks, cost, or business sensitivity. You have two ways to handle it: my way, which is to refactor the system step by step to meet the business goals, or the highway, which is to rewrite the whole system from scratch time after time.
  7. Define Your Exit Strategy: You should always remember that your cloud operator, an important vendor and your best partner in the early days, is still a vendor. Terms may change when you (hopefully) become a giant. Therefore, choose your cloud provider’s tools carefully. For example, I would think twice before choosing proprietary data stores like Amazon’s SimpleDB. If you still decide to use a proprietary tool, you should wrap it in your own interface and have an exit strategy ready for when you need it.
  8. Prepare for 100% Uptime (including downtime & upgrades): Downtime will come, since at large scale everything happens, even things you didn’t expect. Make sure you can always provide service, even during upgrades.
  9. Risk Management: This is not going to be an easy task, and you should prepare yourself: choose your vendors carefully, know your exit strategy, and prepare for spikes and downtime.
  10. Always Listen to Your Customers: Your business depends on your users; pay attention to their feedback.
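
To make practice 7 a bit more concrete, here is a minimal sketch of what "your own interface" might look like. It is only an illustration under my own assumptions: the names `KeyValueStore`, `InMemoryStore`, and `UserProfiles` are hypothetical, and the in-memory backend stands in for whatever vendor data store you actually choose. No real vendor API is called here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical storage interface owned by the application, not by the vendor."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store a value under the given key."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is unknown."""


class InMemoryStore(KeyValueStore):
    """Stand-in backend for this sketch.

    A vendor-specific adapter (assumed, not shown) would implement
    the same two methods on top of the provider's client library.
    """

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


class UserProfiles:
    """Application code that depends only on the interface above."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def save_email(self, user_id: str, email: str) -> None:
        self._store.put(f"user:{user_id}:email", email)

    def email_of(self, user_id: str) -> Optional[str]:
        return self._store.get(f"user:{user_id}:email")


if __name__ == "__main__":
    # Swapping vendors means passing a different KeyValueStore here.
    profiles = UserProfiles(InMemoryStore())
    profiles.save_email("42", "user@example.com")
    print(profiles.email_of("42"))
```

The point of the design is that only the adapter class knows about the vendor. If terms change, you write one new adapter and migrate the data, rather than touching every module that reads or writes it.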

Bottom Line

Now that you better understand what you are facing, it is time to make several architecture decisions. The next post will be dedicated to that.