Five lessons I learnt after spending a year using cloud computing
14 October 2015
Everyone is always trying to push the cloud on you – but what is the reality of using cloud technology in your business? Here are five lessons from the coalface.
A great variety of startups are benefiting from the scalability of cloud computing, growing their businesses and customer offerings as a result.
Below, I offer five tips for startups either in the early stages or considering cloud technology for the first time, based on our experiences at Purplebricks.com since launching in April 2014.
My verdict? While it poses certain challenges at first, it is well worth the investment in the long run.
1) Message queuing – essential but fiddly
A message typically represents a task created by someone (the “producer”) that has to be processed by someone else (the “consumer”).
Each message has a body and some attributes – the main architectural benefit is loose coupling. A message queueing service aims to remove the traditional overhead associated with operating in-house messaging infrastructures. As well as reducing cost, queues in the cloud simplify access to messaging resources and therefore facilitate integration efforts within organisations and between them.
Queues leverage cloud computing resources such as storage, network, memory and processing capacity. By using virtually unlimited cloud resources, message queueing services provide an internet scale messaging platform.
At the start of a project it’s difficult to predict what the future needs of the project will be. By introducing a layer in between processes, message queues create an implicit, data-based interface that both processes implement. This allows you to extend and modify these processes independently, by simply ensuring they adhere to the same interface requirements.
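To make the producer and consumer idea concrete, here is a minimal sketch in Python. It uses the standard library's in-process queue.Queue as a stand-in for a cloud queueing service, which is an assumption made purely for illustration; the Message class, the "type" attribute and the run_demo helper are hypothetical names, not part of any particular cloud SDK.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Message:
    """A task with a body and some attributes (hypothetical shape)."""
    body: str
    attributes: dict = field(default_factory=dict)


def producer(q, tasks):
    # The producer only knows the message format, not who will process it.
    for task in tasks:
        q.put(Message(body=task, attributes={"type": "demo"}))
    q.put(None)  # sentinel telling the consumer there is no more work


def consumer(q, results):
    # The consumer only knows the message format, not who created it.
    while True:
        msg = q.get()
        if msg is None:
            break
        results.append(msg.body.upper())  # placeholder for real processing


def run_demo(tasks):
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, tasks)
    worker.join()
    return results
```

Because both sides depend only on the shape of Message, either function can be rewritten or moved to another machine without touching the other, which is the loose coupling described above.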
2) Optimise in order to reduce fees
This is key if you want quality performance. Microsoft Azure, or any cloud service, is built to penalise you if you use their resources poorly. The challenge is to fix this before the invoice for an unenlightened design decision arrives.
Saving money and application performance go hand in hand. When we launched in April 2014 and our first TV adverts hit the screens later that year, we made a conscious effort to scale up our servers. This meant our highest monthly bill to date was for our first month, despite that month having the lowest traffic. Through experience we have found the key is to set alerts and configure auto-scaling correctly, so that the cloud actually powers up extra resources when they are needed.
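As a rough illustration of what "alerts and auto-scale" can mean in practice, the sketch below shows a simple threshold rule and a spend projection. The function names, the CPU thresholds and the instance limits are all assumptions chosen for the example; real cloud platforms expose their own auto-scale and billing alert settings, and this is not a description of our actual configuration.

```python
def desired_instances(current, cpu_percent,
                      scale_out_at=70.0, scale_in_at=30.0,
                      minimum=1, maximum=10):
    """Return the instance count a simple threshold rule would ask for.

    All thresholds and limits are illustrative assumptions.
    """
    if cpu_percent > scale_out_at:
        target = current + 1      # busy: add one instance
    elif cpu_percent < scale_in_at:
        target = current - 1      # idle: remove one instance
    else:
        target = current          # within the comfortable band
    return max(minimum, min(maximum, target))


def projected_over_budget(spend_so_far, budget, day, days_in_month):
    """Project the month's bill linearly and flag it if it exceeds budget."""
    projected = spend_so_far / day * days_in_month
    return projected > budget
```

The point of the second function is to raise the alarm while there is still time to change the design, rather than when the invoice arrives.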
3) Architecting the cloud is a learning curve
The cloud stores information in different places and the rules are different for each. Expect a learning curve and learn from your mistakes.
When unexpected things happen in the cloud, it is worth taking the time to dig deep into what went wrong and then determine a course of action to help mitigate the issue in the future. After significant outages in both Windows Azure and Amazon EC2/S3, vendors publish a root cause analysis. It is important to read these and become familiar with them.
We constantly ask ourselves whether this is something that can happen to our code. In the same way, if your systems experience any downtime as a result of cloud outages, share your root cause analysis with your customers, especially on how you plan to prevent it from happening again in the future.
It is important to acknowledge the cloud is a constantly moving platform and new features are released weekly. Keep up to date with these and create a learning and sharing culture within the development team.
4) Logging: If it wasn’t logged, it never happened
We need to be careful about what we log and we should log everything that can help us figure out what went wrong.
When working with the cloud it’s normal to experience failure every so often. Never build an application without thinking about how you will recover from a fault and how long it will take.
Sometimes, when you start on a new project, you only have time to plan for the straightforward cases. This means that we can learn a lot from a brand new application with real users. In order to facilitate this learning process and support it properly you need logging in place.
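A minimal sketch of this kind of logging, using Python's standard logging module, might look like the following. The "orders" logger name, the process_order function and its validation rule are hypothetical and exist only to show the pattern: record the context on the way in, and record the full stack trace when something goes wrong.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("orders")  # hypothetical logger name


def process_order(order_id, amount):
    """Hypothetical operation that logs enough context to diagnose failures."""
    logger.info("processing order id=%s amount=%s", order_id, amount)
    try:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return True
    except ValueError:
        # logger.exception logs at ERROR level and attaches the stack trace.
        logger.exception("order failed id=%s amount=%s", order_id, amount)
        return False
```

Including identifiers such as the order id in every entry is what lets you reconstruct a single user's journey later from the logs alone.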
5) Prepare for failure
“The cloud never fails” – that’s what the cloud provider wants us to believe! The reality is different (and has been proven several times): even well-managed clouds will fail.
The problem is not that they fail, but that most people are unprepared for such failures, because they believe the cloud is an indestructible silver bullet.
Cloud providers do not explicitly plan for the failover of your services. They just provide the platform and the tools, and it’s your job to plan and implement your own failover system.
Cloud services are known for their accessibility, but they are still bound to Murphy’s law: “Anything that can go wrong will go wrong”. Amazon’s AWS, Microsoft’s Azure and Google Mail, amongst others, have all failed in the past and most of them will fail again in the future.
An important step for us was to lessen the reliance on our 20+ third party integrations. Our working assumption is that any third party system will fail, and if handled badly, a third party slowdown could quickly escalate and become our slowdown and impact our systems.
To mitigate this, the vast majority of our services run through an out-of-band message bus. Messages are sent to the bus in a “fire and forget” fashion. Sending a message to the bus takes an average of 2ms, regardless of the state of the third party. This mechanism allows us to handle requests in a way that does not impact the user’s experience.
All of our emails are handled by a specialist email provider, which provides an API that we use to send emails. This service has proven to be highly reliable. However, if the provider has performance issues or its service becomes unavailable, the user experience is not affected, because failed messages are stored and placed in the queue to be resent later.
This mechanism allows us to handle a complete outage from a range of providers using the same principle without having to worry about our users being impacted. Once a provider resumes service, we simply pick up the previously failed messages.
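The store-and-resend principle can be sketched in a few lines of Python. This is a simplified, in-memory illustration and not our production message bus: the Outbox and FlakyProvider classes are hypothetical, and a real system would persist pending messages durably and limit or back off retries.

```python
from collections import deque


class FlakyProvider:
    """Hypothetical third party that can be switched between down and up."""

    def __init__(self):
        self.up = False
        self.sent = []

    def send(self, message):
        if not self.up:
            raise ConnectionError("provider unavailable")
        self.sent.append(message)


class Outbox:
    """Accepts messages instantly and delivers them out of band."""

    def __init__(self, send):
        self._send = send          # callable that raises when delivery fails
        self._pending = deque()

    def publish(self, message):
        # Fire and forget: the caller never waits on the third party.
        self._pending.append(message)

    def drain(self):
        """Try each pending message once; keep failures for a later attempt."""
        delivered = 0
        for _ in range(len(self._pending)):
            message = self._pending.popleft()
            try:
                self._send(message)
                delivered += 1
            except Exception:  # broad catch keeps the sketch short
                self._pending.append(message)
        return delivered

    def pending(self):
        return len(self._pending)
```

In this sketch, publish is what the user-facing request calls, while drain would run on a background worker. When the provider comes back, the next drain simply picks up the messages that failed earlier.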
David Kavanagh is technical director at hybrid estate agency Purplebricks.com.