Traditional “big bang” approaches to software development are incompatible with the high flexibility, agility, and continuous deployment requirements of today’s cloud and DevOps software platforms.
A checklist of manual steps to execute during the production release is simply not enough. If that is how you deploy, you are not really agile, nor are you really practicing DevOps.
Blue-Green Deployment: An Overview
Blue-Green deployment is an approach to software deployment that reduces the downtime and risk of releasing new software versions by maintaining two identical environments: active (blue) and inactive (green).
The active environment is where the current version of the software is running, and users are generating production traffic. The inactive environment is where the new version of the software is deployed and tested.
Once the new version is tested and ready, traffic is switched from the active environment to the inactive environment, making it the new active environment. You can repeat this process as needed.
Blue-Green deployment fits well with the DevOps mindset and processes because it supports continuous delivery and deployment of software while minimizing downtime for production users and greatly reducing the risk of a production release failure.
Having two identical environments makes it possible to test and deploy new versions of software without affecting the current production environment. This means faster and more frequent releases, which is a key aspect of DevOps.
Additionally, the ability to switch traffic between environments quickly is a primary precondition for rapid rollback in case of issues, which is also important in a DevOps environment.
Key Principles of Blue-Green Deployment
#1. Two Identical Environments
Blue-Green deployment requires two identical environments, identical in terms of both data and processes. One is active (blue), and the other is inactive (green).
The blue environment is where production users run their day-to-day processes. The green environment is kept in sync with blue, but it is where testers run their test cases. Even though green is not the production environment, it is production-like, so the tests run under real-world conditions.
#2. Traffic Switch
Once the new version of the software is tested and ready, traffic is switched from the active environment to the inactive environment, making it the new active environment.
The switch is effectively instant, because the deployment itself has already happened by this point. There is no downtime window. Users don’t need to do anything to reach the new environment. They are all redirected automatically and at the same time.
#3. Rapid Rollback
The ability to switch traffic between environments quickly also means rapid rollback in case of issues. This ensures minimal downtime, and the application remains highly available.
If anything goes wrong with the green environment, all users are switched straight back to the stable original blue environment without any fuss.
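To make the switch and rollback principles concrete, here is a minimal sketch in Python. The `BlueGreenRouter` class and its method names are hypothetical and stand in for whatever a real load balancer or DNS layer would do. The point it illustrates is that a rollback is just the same swap applied in the opposite direction.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Tracks which environment receives production traffic (hypothetical helper)."""

    active: str = "blue"
    standby: str = "green"

    def switch(self) -> str:
        """Swap the roles: the standby environment starts receiving all traffic."""
        self.active, self.standby = self.standby, self.active
        return self.active

    def rollback(self) -> str:
        """A rollback is the same swap in the opposite direction."""
        return self.switch()

    def route(self, request_path: str) -> str:
        """Return which environment serves this request, as an 'env:path' label."""
        return f"{self.active}:{request_path}"


router = BlueGreenRouter()
router.switch()    # green now serves all users
router.rollback()  # blue serves all users again
```

Because the previous environment is left untouched by the swap, nothing has to be redeployed in order to roll back.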
#4. Automated Testing
Automated testing is a key aspect of Blue-Green deployment. It ensures that the new version of the software is thoroughly tested before it is deployed to the active environment.
If you don’t have a significant share of your tests automated (unit tests, functional tests, and regression tests at the very least), then it probably doesn’t make much sense to think about implementing Blue-Green deployment.
The lack of automated tests will slow you down dramatically. Testing the new (green) environment will take so long that by the time you are able to switch to it, it will already be “too old” from the perspective of the software development lifecycle.
#5. Continuous Delivery
Blue-Green deployment is part of a continuous delivery pipeline, which ultimately means faster and more frequent releases of software into production.
You can make the switch as soon as the new software version has passed its tests in the green environment. Since the deployment is already done and only the traffic switch remains, the release is fast enough to do every day, assuming your testing activities are just as fast.
The platform that runs Blue-Green deployment has its own lifecycle of steps and processes. This is what it usually consists of:
Build a new version of the software. This involves compiling the code, running automated tests, and creating a deployable artifact.
The next stage is where you deploy the new version of the software to the inactive (green) environment. This involves setting up the environment, deploying the artifact, and configuring any necessary settings.
Once the new version of the software is deployed to the green environment, run automated tests to ensure the new version functions correctly. This includes functional tests, regression tests, integration tests, and, if you are outstanding, even performance tests.
Switch the traffic from the active (blue) environment to the inactive (green) environment. This involves updating the load balancer or DNS settings to direct traffic to the green environment. Of course, you want to have this done via automated processes.
Once the switch is done, monitor the application to ensure it functions correctly. This includes monitoring for errors, performance issues, and other issues.
Rollback is an optional step, and you don’t want to reach it too often. But if anybody detects a substantial issue, switch the traffic back to the blue environment to perform an instant rollback, again without any downtime or disconnection for production users. Just update the load balancer or DNS settings to direct traffic to the blue environment.
Once you resolve those issues and are ready to go back to the new version, switch the traffic to the green environment again by updating the load balancer or DNS settings once more.
Finally, once the new version of the software is stable and functioning correctly, decommission the old version of the software running in the blue environment. You will need that environment to stage the next version of your system.
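The lifecycle above can be sketched as a single function. This is only an illustration under stated assumptions: the build step is assumed to have produced an artifact already, and `run_release` together with its four callable parameters are hypothetical names for whatever your pipeline tooling provides.

```python
from typing import Callable


def run_release(
    deploy_to_green: Callable[[], None],
    run_tests: Callable[[], bool],
    switch_traffic: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Run one blue-green release and return the environment left active."""
    deploy_to_green()            # deploy the new artifact to the inactive environment
    if not run_tests():          # test before any user sees the new version
        return "blue"            # tests failed: traffic never left blue
    switch_traffic("green")      # send all users to the new version
    if not is_healthy():         # monitor after the switch
        switch_traffic("blue")   # roll back by pointing traffic at blue again
        return "blue"
    return "green"               # blue can now be decommissioned and reused


# Example wiring with stand-in callables that only record what happened.
events = []
active = run_release(
    deploy_to_green=lambda: events.append("deployed to green"),
    run_tests=lambda: True,
    switch_traffic=lambda env: events.append(f"traffic -> {env}"),
    is_healthy=lambda: True,
)
```

Passing the steps in as callables keeps the ordering logic separate from the tooling, so the same flow can be exercised in a test without touching real infrastructure.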
Implementing CI/CD Pipelines
Integrating Blue-Green deployment into a DevOps CI/CD pipeline should be a natural process.
A strong prerequisite is that you already have those two identical environments in place. Since this should be an automated process, you can use an infrastructure-as-code tool such as AWS CloudFormation, or cloud-agnostic Terraform scripts, to create, recreate, and update the environments within automated pipelines.
Once you have this, a fully automated deployment process is a relatively easy next step. You reuse the existing pipelines that create the blue and green environments, but this time you also include the testing processes in the pipeline.
You can automate the traffic switch with tools like AWS Elastic Load Balancer or NGINX. This involves updating the load balancer or DNS settings to direct traffic to the green environment once the new version of the software is tested and ready.
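As one possible way to automate an NGINX-based switch, a pipeline step could regenerate the `upstream` block so that it points at the active environment. The sketch below only renders the configuration text. The function name, the upstream name `app_backend`, the port, and the host `green.internal.example` are all assumptions made for illustration.

```python
def render_upstream(active_host: str, port: int = 8080, name: str = "app_backend") -> str:
    """Render an NGINX upstream block that points at the active environment."""
    return (
        f"upstream {name} {{\n"
        f"    server {active_host}:{port};\n"
        f"}}\n"
    )


# Hypothetical host name for the green environment.
print(render_upstream("green.internal.example"))
```

A real pipeline would write this text to the included configuration file and then reload NGINX (for example with `nginx -s reload`). Both steps are left out here.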
Finally, reuse the existing pipelines for decommissioning the old blue environment as well. It’s up to you whether you first destroy all the services and components and recreate them from scratch, or update each service in the chain in place. Destroy and recreate is usually the safer option, because an in-place update has many more corner cases to consider.
Best Practices of Blue-Green Deployment
Curious about how to make the best use of Blue-Green deployment? Here are some tips from practice.
Have a Solid Database Migration Strategy
When deploying a new version of the software, it is important to ensure that the database schema is updated correctly. Use a database migration tool such as Flyway or Liquibase to manage database schema changes.
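The idea behind versioned migrations can be illustrated with Python’s built-in `sqlite3` module. This is a simplified sketch of the general technique, not how Flyway or Liquibase are implemented. The `customers` table, the `schema_version` table, and the `migrate` function are hypothetical. The second migration adds a nullable column, which is the kind of backward-compatible change that lets the old (blue) version keep working against the same schema.

```python
import sqlite3

# Hypothetical migrations, ordered by version number. Adding a nullable column
# is backward compatible: the old (blue) code can simply keep ignoring it.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply migrations that have not run yet and return the schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()
    current = row[0]
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            current = version
    conn.commit()
    return current


conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # running it again applies nothing new
```

Recording the applied version in the database itself is what makes the step safe to rerun from an automated pipeline.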
Use a Canary Analysis Tool
Even though Canary deployment is an alternative approach, you can still use some of its techniques to perfect your Blue-Green deployment.
Use a canary analysis tool such as Kayenta, which integrates with Spinnaker, to analyze the performance of the new version of the software in a real-world environment. This involves comparing the performance of the new version with that of the old version.
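At its simplest, comparing the two versions means comparing the same metric from both environments against a tolerance. The sketch below uses the error rate and a fixed threshold. It is a deliberately naive illustration and not the statistical analysis a dedicated tool performs. The function names and the default `tolerance` of one percentage point are assumptions.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def candidate_passes(
    baseline_errors: int,
    baseline_requests: int,
    candidate_errors: int,
    candidate_requests: int,
    tolerance: float = 0.01,
) -> bool:
    """True when the new version's error rate is at most `tolerance` above the old one's."""
    baseline = error_rate(baseline_errors, baseline_requests)
    candidate = error_rate(candidate_errors, candidate_requests)
    return candidate <= baseline + tolerance


# Old version: 10 errors in 1000 requests. New version: 50 errors in 1000 requests.
ok_to_switch = candidate_passes(10, 1000, 50, 1000)
```

A check like this could serve as the gate that decides whether the traffic switch goes ahead.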
Use a feature toggle framework such as Togglz to enable or disable features in the new version of the software. This allows for a gradual rollout of new features and enables rapid rollback if necessary.
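The mechanism behind a feature toggle is small enough to sketch in a few lines. The `FeatureToggles` class, the `new_checkout` feature name, and the `checkout` function below are hypothetical, and this is not the Togglz API (Togglz is a Java library). The sketch only shows how new code can ship switched off and be turned on or off at runtime.

```python
class FeatureToggles:
    """In-memory feature switches (hypothetical helper, not the Togglz API)."""

    def __init__(self) -> None:
        self._enabled = {}

    def enable(self, feature: str) -> None:
        self._enabled[feature] = True

    def disable(self, feature: str) -> None:
        self._enabled[feature] = False

    def is_enabled(self, feature: str) -> bool:
        # Unknown features default to off, so new code paths stay dark until enabled.
        return self._enabled.get(feature, False)


toggles = FeatureToggles()


def checkout(total: float) -> str:
    """Pick the checkout flow based on the current toggle state."""
    if toggles.is_enabled("new_checkout"):
        return f"new checkout flow: {total:.2f}"
    return f"legacy checkout flow: {total:.2f}"
```

Disabling the toggle again acts as a rollback of that one feature without switching environments.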
Use a Load Balancer with Health Checks
Use a load balancer such as AWS Elastic Load Balancer or NGINX with health checks to ensure that traffic is only directed to healthy instances. This ensures that the application remains highly available and that downtime is minimized.
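What a health check does can be sketched as a filter over the instance list. In the example below, `healthy_instances` and the `probe` callable are hypothetical names. A real probe would typically request a health endpoint on each instance, while here it is injected so the logic can be tried without a network.

```python
from typing import Callable, Iterable, List


def healthy_instances(
    instances: Iterable[str],
    probe: Callable[[str], bool],
) -> List[str]:
    """Return only the instances whose health probe succeeds."""
    healthy = []
    for instance in instances:
        try:
            if probe(instance):
                healthy.append(instance)
        except Exception:
            # A probe that raises (timeout, connection refused) counts as unhealthy.
            continue
    return healthy


# Stand-in probe: pretend that only "green-2" is failing its health check.
targets = healthy_instances(["green-1", "green-2"], lambda host: host != "green-2")
```

Treating a probe error the same as a failed check keeps a single unreachable instance from breaking the whole routing decision.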
Use a Rollback Plan with Automated Rollback
Have a rollback plan in place in case of issues, and automate the rollback process using a tool such as AWS CodeDeploy or Octopus Deploy. This ensures that downtime is minimized and that the application remains highly available.
The rollback plan applies mostly to the green environment, for when you discover a significant issue with the new version.
You don’t need a rollback plan for the blue environment, because the switch leaves it untouched and you can return to this stable environment instantly whenever needed.
Challenges with Blue-Green Deployment
Implementing Blue-Green deployment can present some challenges for development teams. Here are some typical challenges:
Setting up and managing two identical environments can be complex and time-consuming. This requires expertise in infrastructure as code tools such as Terraform or CloudFormation. You need to have a senior development team in place capable of coping with such technical challenges.
When deploying a new version of the software, it is important to ensure that the database schema is updated correctly. This can be challenging, especially if the database schema is complex. You need solid database deployment processes in place that can automatically and reliably handle the schema update activities.
Analyzing the performance of the new version of the software in a real-world environment can be challenging. This requires expertise in canary analysis tools such as Kayenta, typically used together with Spinnaker.
Implementing feature toggles can be challenging, especially if the application has a large number of features. This requires careful planning and coordination between development teams.
Testing the new version of the software in a real-world environment can be challenging, especially if the application has a large number of users or servers. You need to have test cases automated as much as possible. Also, your routine processes will end up including a lot of coordination between development and testing teams.
A good monitoring solution is rare in practice, but it is a must for proper DevOps operations. As soon as feasible, invest the time into building one with proven services (AWS CloudWatch, New Relic, Datadog).
Difference Between Blue-Green and Canary Deployment
While the difference from traditional deployment processes is quite obvious (traditional deployments do not run two parallel environments with different software versions), the difference from Canary deployment is a bit more interesting.
Blue-Green deployment means two environments (blue and green). But at the same time, the two environments are constantly in sync in terms of data. Once the new version is tested and considered ready, traffic is switched from the active environment to the inactive environment, making it the new active environment. You don’t spend any time deploying new code, and there is no production downtime involved. All production users work all the time on the currently active environment, and they don’t even notice the switch.
Canary deployment involves deploying a new version of the software to a small subset of users while the majority of users or servers continue to use the current version. This is a gradual deployment rather than a full switch. Testers are, in this case, direct production users, even though only a defined subset of them. This group is actively testing the new version with production processes, and when finally stable, the new version will spread to the rest of the users.
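The difference between the two strategies shows up clearly in the routing decision. In the following sketch, all function names are hypothetical, and hashing the user ID into 100 buckets is just one possible way to pick a stable subset of users for a canary.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def blue_green_target(active: str) -> str:
    """Blue-Green: every user goes to the single active environment."""
    return active


def canary_target(user_id: str, canary_percent: int) -> str:
    """Canary: only users in the first `canary_percent` buckets get the new version."""
    return "new" if bucket(user_id) < canary_percent else "current"
```

With Blue-Green the user identity plays no role in routing, while with Canary the same user consistently lands on the same version until the percentage is raised.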
So Which One Is Better?
The consultant’s answer, “it depends,” fits best here, as unhelpful as it may sound.
If your system’s priority is high availability above all, then Blue-Green deployment should be your choice.
If you prefer faster feedback and a more controlled (although slower) rollout of the new system version, then Canary deployment has advantages over Blue-Green.
The important thing is that both are agile enough to be considered a solid basis for serious DevOps work.
Netflix uses Blue-Green deployment to deploy new versions of its streaming service. By using Blue-Green deployment, Netflix can deploy new versions of its service without affecting the user experience. In fact, Netflix also uses Canary deployment in parallel for other cases, so it’s not unrealistic to combine different approaches to DevOps deployment under the same roof.
Amazon and Etsy also use Blue-Green deployment to deploy new versions of their e-commerce platforms.
Another case is LinkedIn, which uses Blue-Green deployment to deploy new versions of its social networking platform.
Last but not least, IBM uses Blue-Green deployment to deploy new versions of its cloud platform.
These companies have successfully implemented Blue-Green deployment in their platform infrastructures and serve as a good example for others.
Like Canary deployment, Blue-Green deployment aims to optimize your existing agile processes and methodologies so that new software is delivered so smoothly that nobody notices it at all. This is the ultimate goal of such approaches. You deliver constantly and very often, but nobody knows about it, nobody notices it, and in the end, nobody cares.
It might be a bit frustrating for the development team that there is no gossip around the company about their latest releases. But if you ask me, this is exactly the best service you can deliver. Nobody talks about it, but everyone uses it day to day.
Delivery-oriented architect with implementation experience in data/data warehouse solutions with telco, billing, automotive, bank, health, and utility industries. Certified for AWS Database Specialty and AWS Solution Architect…