In the relatively short time that it’s been around, software-defined infrastructure has changed the face of modern IT – mostly for the better, as processes are now automated and operate more efficiently than when infrastructure was in-house and hardware-based.
It has enabled many companies to save both time and money.
However, all is not perfect in the software-defined infrastructure world. Everyone is happy when the system is “healthy” and operating properly, but sometimes things go wrong. Systems can manifest a glitch or problem – it’s as if our healthy system has “caught a cold”.
Where do these colds – glitches, incompatibilities, outages, and other system traumas – come from? In my experience, they stem from several causes:
- A lack of sufficient controls and checkpoints over the systems: A modern organization can have dozens, if not hundreds, of development teams working in an environment. All the code those teams produce can be integrated, delivered, or deployed multiple times a day in some cases. Controls, checkpoints, and tests should be injected into every process, and before every change, to ensure stability, quality, and continuous improvement.
- Complexity and lack of visibility: Because these systems and environments are so large and complicated, tracking down all the elements that go into making a software-defined infrastructure work properly is extremely difficult.
- Ongoing changes: With agile development, microservices architectures, and other modern methodologies, things are constantly in flux – new code is deployed frequently, and new elements and services are adopted by many different hands. Keeping track of all those changes is almost impossible.
To ensure that all code is up to standard and that no problems surface further down the line – for example, bad code reaching production, with the defect only becoming evident when someone tries to use the feature that code is responsible for – automated testing is built into the CI/CD pipeline. Code is tested for integrity on its own and for errors in the context of the larger application, through QA sanity tests and more. Given the scale and pace involved, this testing has to be automated.
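As a rough illustration, a checkpoint in the pipeline might run a sanity suite along the lines of the sketch below. The service URL and endpoints are hypothetical placeholders, and a real pipeline would of course run far more than two checks.

```python
# sanity_check.py - a minimal, pytest-style sketch of a CI sanity checkpoint.
# The staging URL and the /health and /orders endpoints are illustrative
# assumptions; adapt them to your own environment.
import os

import requests

BASE_URL = os.environ.get("STAGING_URL", "http://staging.example.internal")


def test_service_is_healthy():
    # The deployment should come up and report healthy before anything else runs.
    response = requests.get(f"{BASE_URL}/health", timeout=5)
    assert response.status_code == 200


def test_core_endpoint_returns_valid_json():
    # A basic integrity check: the core API answers and returns well-formed JSON.
    response = requests.get(f"{BASE_URL}/orders", timeout=5)
    assert response.status_code == 200
    assert isinstance(response.json(), (list, dict))
```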
Those tests are supposed to ensure that the product coming out the other end of the CI/CD pipeline works properly. Expecting developers to take so many factors into account, including security, compliance, resiliency, and performance, before writing anything is unrealistic; no work would get done. Those issues should be validated by the checkpoints, so when problems appear, the checkpoints are the first place we should look. Due to the complexity of the system, however, that is practically impossible to handle manually with the existing toolkit.
So, just how robust and effective are your checkpoints?
Even if you think they are doing the job, they may not be; code can pass every checkpoint and still misbehave. For example, deployed code could work perfectly well in a staging environment, yet fail as soon as it is subjected to a heavy load.
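As a rough sketch of the kind of check that is often missing, the snippet below fires concurrent requests at a hypothetical staging endpoint and fails if responses error out or slow down; the URL, concurrency level, and latency threshold are illustrative assumptions, not recommendations.

```python
# load_check.py - a minimal concurrency smoke test, not a full load-testing tool.
# The staging URL and thresholds below are illustrative assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

STAGING_URL = "http://staging.example.internal/health"
CONCURRENT_REQUESTS = 50
MAX_ACCEPTABLE_SECONDS = 1.0


def timed_request(_):
    start = time.monotonic()
    response = requests.get(STAGING_URL, timeout=10)
    return response.status_code, time.monotonic() - start


def main():
    with ThreadPoolExecutor(max_workers=CONCURRENT_REQUESTS) as pool:
        results = list(pool.map(timed_request, range(CONCURRENT_REQUESTS)))

    errors = [status for status, _ in results if status != 200]
    slow = [elapsed for _, elapsed in results if elapsed > MAX_ACCEPTABLE_SECONDS]

    # Fail the checkpoint if the service degrades under even modest concurrency.
    if errors or slow:
        raise SystemExit(
            f"Load check failed: {len(errors)} errors, {len(slow)} slow responses"
        )
    print("Load check passed")


if __name__ == "__main__":
    main()
```

A dedicated load-testing tool would go much further, but even a lightweight gate like this turns “works in staging” into a slightly stronger claim.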
Shouldn’t that have been caught by one of the checkpoint tests? Does one of the checkpoints include that kind of load test? What about interactions with other software, systems, applications, layers, and services? Do the checkpoints test for how updates affect all the moving parts? Are all elements of the infrastructure tested against the code?
I’m guessing the answer to those questions is no, or at least “I don’t know.” Not good enough, I’m afraid. To solve these issues, your checkpoints need the following three elements:
- Automation: As mentioned, there is too much code for any team to take into consideration when they test their work. Checkpoints that can take all the disparate elements in the system and the environment and test them automatically are a basic requirement for any serious QA.
- Visibility: Fortunately, many checkpoints do allow for automated testing, but are they testing the right elements, or all elements? And what about the dependencies between those elements? A good checkpoint is like a pair of night-vision goggles – with it, you become aware of issues and problems already inherent in the uploaded code that wouldn’t be visible without that special equipment.
- Knowledge: Since new code is constantly being fed through the pipeline, new issues are constantly surfacing. By automating and continuously updating the testing process and its content, and ensuring it stays aligned with an up-to-date knowledge base and vendor best practices, you become aware of risks and problems before they impact your business. In addition, automatically recording the circumstances and issues those tests uncover makes it easier to resolve problems when they crop up and provides better guidance on what to avoid before deploying code to a production environment (a minimal sketch of such a record follows this list).
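To illustrate that last point, here is a minimal sketch of recording checkpoint findings to a simple JSON Lines knowledge base; the fields, file path, and example values are assumptions rather than a prescribed schema, and a real platform would likely store this in a database or its own knowledge base.

```python
# record_findings.py - a minimal sketch of persisting checkpoint findings
# so they can be consulted when similar issues resurface. The schema and
# storage location are illustrative assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

KNOWLEDGE_BASE = Path("checkpoint_findings.jsonl")


def record_finding(check_name: str, resource: str, severity: str, details: str) -> None:
    """Append one finding, with its circumstances, to the knowledge base."""
    finding = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "check": check_name,
        "resource": resource,
        "severity": severity,
        "details": details,
    }
    with KNOWLEDGE_BASE.open("a") as fh:
        fh.write(json.dumps(finding) + "\n")


# Example usage from a checkpoint script (values are hypothetical):
record_finding(
    check_name="load_check",
    resource="orders-service (staging)",
    severity="high",
    details="Latency exceeded 1s for 12 of 50 concurrent requests",
)
```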
What’s needed is a proactive approach that takes all of these elements into account, provides automated scanning and testing of the software-defined environment as well as its staging environment, and reveals the risks that need to be dealt with, while maintaining and growing the knowledge base so that today’s problems do not recur. For example, if a load balancer and its Auto Scaling group are not configured properly, the proactive approach will detect the risk and raise a flag even without any load, saving the organization from the outage or disruption that the problem would eventually have caused.
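To make that example concrete, below is a minimal sketch of such a proactive configuration check, assuming an AWS environment and boto3 credentials. It flags Auto Scaling groups that are attached to no load balancer or target group, or that have no headroom to scale; both rules are illustrative examples rather than a prescribed rule set.

```python
# asg_config_check.py - a minimal sketch of a proactive configuration checkpoint.
# Assumes an AWS environment with boto3 credentials configured; the specific
# rules below are illustrative examples only.
import boto3


def find_risky_auto_scaling_groups():
    autoscaling = boto3.client("autoscaling")
    risky = []

    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            name = group["AutoScalingGroupName"]
            attached = group.get("LoadBalancerNames", []) + group.get("TargetGroupARNs", [])

            # Risk 1: the group serves no load balancer or target group at all.
            if not attached:
                risky.append((name, "is not attached to any load balancer or target group"))

            # Risk 2: MinSize == MaxSize means the group can never actually scale out.
            if group["MinSize"] == group["MaxSize"]:
                risky.append((name, "has no scaling headroom (MinSize equals MaxSize)"))

    return risky


if __name__ == "__main__":
    findings = find_risky_auto_scaling_groups()
    for name, reason in findings:
        print(f"RISK: Auto Scaling group '{name}' {reason}")
    if findings:
        raise SystemExit(1)  # fail the checkpoint so the risk is addressed before release
```

Run at every deployment gate, a check like this surfaces the misconfiguration while it is still a flagged risk rather than a production outage.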
People and systems don’t just catch a cold; they need to be exposed to something that makes them ill. Prevention and detection are nine-tenths of a cure. For those running their business on software-defined infrastructure, preventing colds is a very smart and healthy strategy.