How are large websites like Facebook, Twitter, and Amazon, or web apps like Gmail, updated in production? They must have millions (or billions?) of users and tons of machines to serve them. Yet they release new features frequently and seamlessly, without interrupting service.
How exactly do they do it? I have no idea. If you know something, or have links to anything related, please tell me.
Thinking logically, the release process has a set of requirements:
- No downtime. For enterprise software this is easier in most cases (though there are exceptions). They can wait until night or the weekend, when few users are online, shut everything down, install updates, and bring the servers back up. Websites, however, must be up and running 24/7, especially international ones, whose users are spread across every time zone.
- No broken user experience. For example, suppose a user signs up for a service through three pages, starting from the landing page. John Smith opens the landing page and fills in a form on the second page. At that moment, we release a new version of the website in which some fields of this form have moved to the third page. John presses “next” and sees the same fields he has already filled in. Confusion. In some cases, he may even get an error and be unable to complete the transaction.
- Low cost.
No downtime means you update each server while it is offline: you pull it out of the load balancer pool, update it, test it, and put it back online.
The easiest way is to update servers one at a time (or in small groups). You cannot update a large group of servers at once, because the remaining servers have to absorb its load, which can hurt the service level.
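A minimal sketch of such a rolling update, in Python. The `Server` and `LoadBalancer` classes and the `health_check` function are hypothetical in-memory stand-ins (a real setup would call the load balancer's own API and run real tests); the point is the loop: take a small batch out of rotation, update and check it, put it back, move on.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Server:
    name: str
    version: str


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real load balancer pool."""
    pool: Set[str] = field(default_factory=set)
    min_pool_size: Optional[int] = None   # smallest pool seen, to check capacity

    def remove(self, name: str) -> None:
        self.pool.discard(name)
        if self.min_pool_size is None or len(self.pool) < self.min_pool_size:
            self.min_pool_size = len(self.pool)

    def add(self, name: str) -> None:
        self.pool.add(name)


def health_check(server: Server, expected_version: str) -> bool:
    # Placeholder for the "test" step; a real check would exercise the server.
    return server.version == expected_version


def rolling_update(servers: List[Server], lb: LoadBalancer,
                   new_version: str, batch_size: int = 1) -> None:
    """Update servers a small batch at a time; the rest keep serving."""
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for s in batch:
            lb.remove(s.name)              # pull out of rotation
        for s in batch:
            s.version = new_version        # install the update
            if not health_check(s, new_version):
                # Leave the bad batch out of rotation and stop the rollout.
                raise RuntimeError(f"{s.name} failed its check")
        for s in batch:
            lb.add(s.name)                 # put back online
```

With six servers and `batch_size=2`, at most two are out of the pool at once, so the other four carry all the traffic; a bigger batch shrinks that remaining capacity, which is exactly the service-level risk described above.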
Remember that you do not want to break the user experience. The best you can do is serve the same version to the same user: if a user starts a session on version X, they stay on version X for the entire session.
One way to do this is to delegate it to the load balancer. It can remember the version for a particular session and route that session's traffic only to servers running that version.
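The load-balancer approach can be sketched as a session-to-version table in front of per-version server pools. `VersionStickyBalancer` is a hypothetical class, not any real product's API, and it keeps the table in process memory only for illustration; a real balancer would more likely carry the pinned version in a cookie or a shared store.

```python
import itertools
from collections import defaultdict


class VersionStickyBalancer:
    """Hypothetical balancer that pins each session to one release version."""

    def __init__(self, current_version: str):
        self.current_version = current_version   # version given to new sessions
        self.servers = defaultdict(list)         # version -> server names
        self._cycles = {}                        # version -> round-robin iterator
        self.session_version = {}                # session id -> pinned version

    def register(self, server: str, version: str) -> None:
        self.servers[version].append(server)
        self._cycles[version] = itertools.cycle(self.servers[version])

    def unregister(self, server: str, version: str) -> None:
        self.servers[version].remove(server)
        self._cycles[version] = itertools.cycle(self.servers[version])

    def route(self, session_id: str) -> str:
        # Pin new sessions to the current version; keep existing pins.
        version = self.session_version.setdefault(session_id, self.current_version)
        if not self.servers[version]:
            # The last server for this version is gone, so the session must move.
            version = self.current_version
            self.session_version[session_id] = version
        return next(self._cycles[version])   # round-robin within that version
```

New sessions go to whatever `current_version` is set to, while sessions already pinned to the old version keep landing on old servers until the last one is pulled out of the pool; only then are they forced onto the new version.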
Another option is to deploy a few versions on each server simultaneously, and let each web server decide which version to serve for a given session.
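With several versions deployed side by side on each server, the choice moves into the application itself. A minimal sketch, assuming the pinned version travels in a session cookie; the handler names, the `HANDLERS` table, and the cookie-based scheme are illustrative assumptions, not a specific framework's API.

```python
from typing import Callable, Dict, Optional, Tuple


# Hypothetical handlers for two releases deployed on the same web server.
def signup_v1(page: int) -> str:
    return f"v1 signup, page {page}"


def signup_v2(page: int) -> str:
    return f"v2 signup, page {page}"


HANDLERS: Dict[str, Callable[[int], str]] = {"v1": signup_v1, "v2": signup_v2}
CURRENT_VERSION = "v2"   # version handed to sessions that start from now on


def handle(page: int, cookie_version: Optional[str]) -> Tuple[str, str]:
    """Serve the version pinned in the session cookie, or pin the current one.

    Returns (response body, version to store back in the cookie).
    """
    # Unknown or retired versions (and missing cookies) fall back to current.
    version = cookie_version if cookie_version in HANDLERS else CURRENT_VERSION
    return HANDLERS[version](page), version
```

In the sign-up example, John's cookie still says `v1`, so his second page comes from the old code even after `v2` is deployed, while a user who arrives afterwards gets `v2` from the first page. Once no sessions reference `v1`, it can be removed from the server.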