High availability environments for E-commerce
31 July 2020 by Jeroen Bakker in E-commerce Optimization
Time is money. If amazon.com suffered one minute of downtime, it would stand to lose $203,577. In today’s competitive world, consumers expect uninterrupted access to their favorite services. A high availability environment is a must, especially for e-commerce organizations. Every organization must analyze what is needed to increase application uptime and minimize the total cost of downtime. What should you keep in mind to achieve a high availability environment?
How to implement a high availability environment?
Contrary to what you might expect, adding more components to the platform does not by itself make it more stable or available. In fact, you risk making it worse, because every additional component is one more thing that can fail. A modern environment instead distributes workloads over multiple instances (e.g. networks, clusters) using load balancing, which helps optimize resource usage, improve performance, shorten response times, and avoid overloading any single instance. It also includes failover systems, which switch to a standby server or network in the event of a failure.
How to achieve a high availability environment?
- Remove single points of failure
- Reliable failovers
- Different geographical locations
Remove single points of failure
By adding redundancy, you can prevent the whole system from failing due to the failure of one component. An obvious solution is to implement your application across multiple servers. This allows you to distribute the load so that if one server experiences downtime or overload, another server can take over these functions.
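The effect of redundancy can be estimated with some simple probability arithmetic. The sketch below assumes that components fail independently of each other, which is rarely fully true in practice, and the 99% availability figure is purely illustrative, not a measurement. The function names are made up for this example.

```python
def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def redundant_availability(availability, copies):
    """Availability of a group that works as long as one copy is up."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    single = 0.99  # assumed availability of one server (illustrative)
    print(f"one server:       {single:.4%}")
    print(f"two redundant:    {redundant_availability(single, 2):.4%}")
    print(f"three in a chain: {serial_availability([single] * 3):.4%}")
```

Under these assumptions, a redundant pair is more available than a single server, while a chain of components that all have to work is less available than any one of them. That is why adding components only helps when they are redundant to each other.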
Reliable failovers
In systems that are designed with redundancy, the failover mechanism itself can become a single point of failure. A reliable, automatic failover relies on a good health check: a check that the load balancer performs at short, regular intervals to verify that the backend is still healthy. The health check should cover all aspects of an application, including dependencies such as the database and storage.
If the health check fails, the backend should be marked as unhealthy and no longer receive traffic. It is also advisable to involve a DevOps engineer who checks whether redundancy is automatically restored or whether it needs to be done manually.
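As a rough illustration of this behaviour, the sketch below models a load balancer that only sends traffic to backends whose health check passed. It is a simplified in-memory model, not the configuration of any real load balancer. The class names, backend names, and the individual checks are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List


@dataclass
class Backend:
    name: str
    # Each check is a zero-argument callable that returns True when healthy.
    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    healthy: bool = True

    def run_health_check(self) -> bool:
        # The backend is healthy only if every dependency check passes.
        self.healthy = all(check() for check in self.checks.values())
        return self.healthy


class LoadBalancer:
    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self._counter = count()

    def check_all(self) -> None:
        # In a real setup this would run on a timer, e.g. every few seconds.
        for backend in self.backends:
            backend.run_health_check()

    def pick(self) -> Backend:
        # Round-robin over the backends that are currently marked healthy.
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends available")
        return healthy[next(self._counter) % len(healthy)]


if __name__ == "__main__":
    database_up = {"web-2": True}
    lb = LoadBalancer([
        Backend("web-1", {"database": lambda: True, "storage": lambda: True}),
        Backend("web-2", {"database": lambda: database_up["web-2"],
                          "storage": lambda: True}),
    ])
    database_up["web-2"] = False  # simulate a database failure behind web-2
    lb.check_all()
    print([lb.pick().name for _ in range(3)])
```

Because the check covers the database and storage as well as the web process, a backend that is still running but can no longer reach its database is taken out of rotation.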
Different geographical locations
If your applications and databases all run on servers in the same physical location and something goes wrong at that location, this can still cause significant downtime. Make sure your servers are spread across different locations. This can be done by using multiple data centers or availability zones.
Best practices for high availability environments
If you want to keep system failures under control and prevent both planned and unplanned downtime, the use of a High Availability (HA) architecture is strongly recommended, especially for mission-critical applications. DevOps professionals at Cyso emphasize that each component must be well designed and thoroughly tested to be always available. The design and implementation of an HA architecture can be a challenge, given the wide range of software, hardware, and implementation options. You will need to map the requirements of the environment from both a business and a technical perspective. The chosen architecture must meet the desired levels of security, scalability, performance, and availability.
In addition to designing the architecture, organizations can take the following best practices into account for keeping their critical applications online:
1. Disaster Recovery
Which backup system is best suited for your organization mainly depends on the type of business and on your requirements for the recovery time objective (RTO, the maximum acceptable time to restore service) and the recovery point objective (RPO, the maximum acceptable amount of data loss, measured as the time between the last usable backup and the failure).
Making a reliable and consistent backup – and being sure that you get all the necessary data back when you need to restore – sounds like a given, but it is often not the case. A well-thought-out backup method saves a lot of hassle in case something goes wrong.
There are different methods for securing data: file-based backups, snapshots and image-based backups. Regardless of which cloud you use, think carefully about what guarantees should be given on the RTO and RPO. Then determine how disaster recovery should be handled. We recommend performing a restore test regularly.
Then choose the backup method that best fits your business. If you are bound by specific requirements around RTO/RPO, then it is likely that these can be best realized within a private or multi-cloud environment.
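The relationship between a backup schedule and the RTO/RPO can be checked with a few lines of arithmetic. The sketch below is a simplification: it assumes the worst-case data loss equals the interval between two backups and that the restore duration comes from an actual restore test. The plan names and all numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPlan:
    name: str
    backup_interval: timedelta   # time between two consecutive backups
    restore_duration: timedelta  # as measured in a restore test


def meets_objectives(plan: BackupPlan, rpo: timedelta, rto: timedelta) -> bool:
    # Worst case, the failure happens just before the next backup,
    # so up to one full interval of data is lost.
    worst_case_data_loss = plan.backup_interval
    return worst_case_data_loss <= rpo and plan.restore_duration <= rto


if __name__ == "__main__":
    rpo = timedelta(hours=1)  # hypothetical requirement
    rto = timedelta(hours=4)  # hypothetical requirement
    plans = [
        BackupPlan("nightly file-based backup",
                   timedelta(hours=24), timedelta(hours=6)),
        BackupPlan("snapshot every 30 minutes",
                   timedelta(minutes=30), timedelta(hours=1)),
    ]
    for plan in plans:
        verdict = "meets" if meets_objectives(plan, rpo, rto) else "misses"
        print(f"{plan.name}: {verdict} the objectives")
```

A check like this is only as good as the restore duration you feed into it, which is one more reason to run restore tests regularly.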
2. Clustering
Clustering allows for instant failover of application services in case of a malfunction or outage. A clustered application can call on resources from multiple servers and fall back to a secondary server if the main server goes offline.
A high-availability cluster contains multiple nodes. Any single node can be disconnected or shut off from the network while the rest of the cluster continues to work, as long as at least two nodes remain fully operational. This also means that each node can be individually upgraded and reconnected while the cluster is active.
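Upgrading nodes one at a time while the cluster stays active can be sketched as follows. This is a toy model under stated assumptions: the minimum of two operational nodes is taken from the description above, and the `upgrade` callback and node names are hypothetical placeholders for whatever your cluster software provides.

```python
from typing import Callable, List

MIN_OPERATIONAL_NODES = 2  # threshold taken from the description above


def rolling_upgrade(nodes: List[str], upgrade: Callable[[str], None]) -> List[str]:
    """Upgrade nodes one at a time without dropping below the minimum."""
    online = set(nodes)
    upgraded = []
    for node in nodes:
        if len(online) - 1 < MIN_OPERATIONAL_NODES:
            raise RuntimeError(
                f"cannot take {node} offline: only {len(online)} nodes online"
            )
        online.remove(node)  # disconnect the node from the cluster
        upgrade(node)        # hypothetical upgrade step
        online.add(node)     # reconnect before touching the next node
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    done = rolling_upgrade(
        ["node-1", "node-2", "node-3"],
        upgrade=lambda node: print(f"upgrading {node}"),
    )
    print("upgraded:", done)
```

With this rule, a three-node cluster can be upgraded without interruption, while the same procedure refuses to start on a two-node cluster because taking one node offline would leave only one operational.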
3. Scalable databases
A database crash can lead to data loss, which can be costly. Redundancy can be achieved by using secondary servers that can take over if the primary server becomes unavailable due to a crash or maintenance. Another option for scaling the database is sharding. A shard is a horizontal partition of a database: the rows of a single table are divided over several servers, each of which stores and serves its own subset.
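One common way to decide which shard holds a given row is to hash its key. The sketch below shows that idea in its simplest form and is not tied to any particular database product. The shard names and customer keys are hypothetical.

```python
import hashlib
from typing import List


def shard_for(key: str, shards: List[str]) -> str:
    """Map a row key to one shard using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    shards = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical servers
    for customer_id in ["customer-1", "customer-2", "customer-3"]:
        print(customer_id, "->", shard_for(customer_id, shards))
```

The same key always maps to the same shard, so every application server can find a row without a central lookup. A known drawback of this simple modulo scheme is that changing the number of shards moves most keys to a different shard. Consistent hashing is a frequently used alternative when shards are added or removed regularly.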
4. Create an (emergency) plan
Implementing best practices for high availability is essentially preparing for downtime, but that is not all that organizations can do. It is recommended to keep track of logs and other data for troubleshooting and identifying trends, which can only be done by continuously monitoring the operational workload.
An emergency recovery plan should not only be well documented, but also regularly tested to ensure its effectiveness in case of unplanned outages.
Engineers should improve their skills in designing, implementing, and maintaining high availability architectures. Additionally, there should be a security policy with procedures in case of downtime due to breaches or attacks on the system.
Does a high availability environment fit your organization?
As an example, let’s say we want to run six instances of our React application. Each instance requires approximately 128MB of RAM, and we want to set rules for when to restart the application or container. These rules can be set in Kubernetes. If our application crashes, Kubernetes will automatically restart it and bring it back to the state we specified.
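A desired state like this is written down in a Kubernetes Deployment. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. The application name, image, port, and probe settings are assumptions made for illustration. The memory value uses `128Mi` (mebibytes) as an approximation of the 128MB mentioned above.

```python
import json

APP_NAME = "react-app"                          # hypothetical name
IMAGE = "registry.example.com/react-app:1.0.0"  # hypothetical image

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP_NAME},
    "spec": {
        "replicas": 6,  # six copies of the application
        "selector": {"matchLabels": {"app": APP_NAME}},
        "template": {
            "metadata": {"labels": {"app": APP_NAME}},
            "spec": {
                "containers": [{
                    "name": APP_NAME,
                    "image": IMAGE,
                    "ports": [{"containerPort": 80}],  # assumed port
                    "resources": {
                        "requests": {"memory": "128Mi"},
                        "limits": {"memory": "128Mi"},
                    },
                    # Restart rule: the container is restarted when this
                    # HTTP check fails three times in a row.
                    "livenessProbe": {
                        "httpGet": {"path": "/", "port": 80},
                        "periodSeconds": 10,
                        "failureThreshold": 3,
                    },
                }],
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(deployment, indent=2))
```

Assuming the script is saved as `deployment.py`, the output could be applied with `python deployment.py | kubectl apply -f -`. Kubernetes then keeps six replicas running and restarts a container that crashes or fails its liveness probe.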
1. Your application must be suitable for this
Think, for example, of sessions. When the application is spread across multiple servers, consecutive requests from the same user can end up on different servers, so session data cannot live on just one of them. This requires a shared session store, such as Memcached or Redis.
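The idea of a shared session store can be illustrated without any external service. In the sketch below, a plain in-memory class stands in for Redis or Memcached, so it shows the principle only and not the API of either product. All class and server names are hypothetical.

```python
import secrets
from typing import Dict, Optional


class SharedSessionStore:
    """In-memory stand-in for an external store such as Redis or Memcached."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def save(self, session_id: str, session: Dict[str, str]) -> None:
        self._data[session_id] = dict(session)

    def load(self, session_id: str) -> Optional[Dict[str, str]]:
        session = self._data.get(session_id)
        return dict(session) if session is not None else None


class AppServer:
    """One application server; it keeps no session state of its own."""

    def __init__(self, name: str, sessions: SharedSessionStore) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, user: str) -> str:
        session_id = secrets.token_hex(16)
        self.sessions.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.sessions.load(session_id)
        return session["user"] if session else None


if __name__ == "__main__":
    store = SharedSessionStore()
    server_a = AppServer("web-1", store)
    server_b = AppServer("web-2", store)
    session_id = server_a.login("alice")  # first request lands on web-1
    print(server_b.whoami(session_id))    # next request lands on web-2
```

Because neither server keeps the session in its own memory, it does not matter which server handles the next request, and a server can be taken out of rotation without logging users out.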
If the application has shared files that are needed on all servers, you need shared storage. Think of, for example, a user’s avatar, which must be available on all servers. An alternative solution could be an object store (Swift/S3).
2. Developers must be able to handle a high availability environment
An HA environment can quickly become complex. Developers must be able to cope with a new way of deploying: no longer to a single static server, but to multiple servers, to containers, or to a Kubernetes platform.
3. Deployment strategy with CI/CD
Because a new release has to be rolled out in two, three, or more places, you need to choose a deployment strategy that fits. A CI/CD pipeline helps with this: it lets you automate deployment from a Git repository.
Conclusion: Is a high availability environment for everyone?
By setting up your environment as a high availability architecture, you can prevent downtime or disruptions from affecting the availability of your data and applications. The keyword here is redundancy. However, an HA architecture also comes with risks. An HA architecture is always more complex, and this increases the risk of downtime if the organization is not prepared for it. If the developers are not ready for a complex environment, it may be better not to choose an HA architecture.
Is a high availability environment right for your organization?
Do you have questions about how to implement best practices or what a high availability environment could look like for your organization? At Cyso, we help keep your websites, mission-critical platforms, and applications online 24×7. We are a 100% Dutch, independent managed hosting provider with roots in Alkmaar. With our personal and cloud-neutral approach, we provide hosting solutions suitable for every business case. Let’s discuss yours over a (virtual) cup of coffee. Contact us!