Case Study: Automating System Health Checks

Background

This case study describes how IDT applied automation to tests performed on a complex system of systems during installation. The automation concentrated on the computer and network health checks performed during a 72-hour longevity test that is part of the installation process. Under the manual approach, the health checks were performed every four hours for the duration of the longevity test.

Strategy/Approach

IDT initiated an automated test strategy that applied ATRT: Test Manager to perform the health check function for this system. A health check provides a snapshot of the overall health or performance of a system at a moment in time. In this case, the health check consisted of verifying that data created on one system of the network was seen by the other systems on the network. More specifically, data was entered through the GUI on one system, and the information was then verified across multiple systems throughout the network. If the data could not be viewed on the remote systems, the health check failed.
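The pass/fail rule described above can be illustrated with a minimal Python sketch. It is not ATRT: Test Manager code and does not reflect that tool's interface; the names `HealthCheckResult`, `run_health_check`, and `is_visible`, as well as the system and record identifiers, are hypothetical and chosen only for illustration. The in-memory dictionary stands in for whatever mechanism actually reads the remote displays.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class HealthCheckResult:
    """Outcome of one health check (hypothetical structure)."""
    record_id: str
    missing_on: Tuple[str, ...]  # remote systems that did not show the record

    @property
    def passed(self) -> bool:
        # The check passes only if every remote system showed the record.
        return not self.missing_on


def run_health_check(
    record_id: str,
    remote_systems: Iterable[str],
    is_visible: Callable[[str, str], bool],
) -> HealthCheckResult:
    """Verify that a record entered on one system is seen on all others.

    `is_visible(system_name, record_id)` is an assumed callback that
    reports whether the record can be viewed on the named system.
    """
    missing = tuple(
        name for name in remote_systems if not is_visible(name, record_id)
    )
    return HealthCheckResult(record_id, missing)


# Stand-in for the network: which records each remote system displays.
views = {"system_b": {"record-001"}, "system_c": set()}

result = run_health_check(
    "record-001", views, lambda name, rec: rec in views[name]
)
print(result.passed, result.missing_on)
```

In this example the record is missing on one remote system, so the check fails and the result names the system that did not display it.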

Prior to automation, the test team performing this function consisted of about 16 operators and one test director per shift. The team needed approximately 45 minutes to complete each health check and assess the results. The objectives for using ATRT: Test Manager were to reduce health check execution time, reduce overall test team manpower, and increase the frequency of health checks. ATRT: Test Manager is well suited to these objectives: it can be set up and run for the duration of a longevity test with no operator intervention, and it provides features that readily enable validation of information across multiple systems.

Conclusion

Testing highly complex systems of systems in varying test environments is a challenging and demanding effort. In this case study, we successfully applied automation to computer health check tests to determine the status of the system under test. Test team manpower was significantly reduced, and execution time was reduced from approximately 45 minutes to about 18 minutes. With the shorter execution time, the frequency of health checks increased from every four hours to every half hour. This provided higher-fidelity snapshots of system performance, greater confidence that the system was performing as expected, and faster notification to the test director when it was not. The measurable overall productivity efficiency achieved for this effort was 79%.
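The figures reported in this case study can be put side by side with a short Python calculation. This is a sketch under one stated assumption: exactly one health check is counted per interval over the 72-hour test. The case study does not state how the 79% productivity figure was calculated, so the sketch does not attempt to reproduce it; it covers only the per-check time reduction and the number of snapshots per longevity test.

```python
TEST_DURATION_HOURS = 72          # length of the longevity test
MANUAL_INTERVAL_HOURS = 4.0       # manual checks: every four hours
AUTOMATED_INTERVAL_HOURS = 0.5    # automated checks: every half hour
MANUAL_MINUTES_PER_CHECK = 45     # approximate manual execution time
AUTOMATED_MINUTES_PER_CHECK = 18  # approximate automated execution time


def checks_per_test(duration_hours: float, interval_hours: float) -> int:
    """Number of checks in one test, assuming one check per interval."""
    return int(duration_hours / interval_hours)


manual_checks = checks_per_test(TEST_DURATION_HOURS, MANUAL_INTERVAL_HOURS)
automated_checks = checks_per_test(TEST_DURATION_HOURS, AUTOMATED_INTERVAL_HOURS)

# Fractional reduction in the time needed for a single health check.
time_reduction = (
    MANUAL_MINUTES_PER_CHECK - AUTOMATED_MINUTES_PER_CHECK
) / MANUAL_MINUTES_PER_CHECK

print(f"Manual checks per test:    {manual_checks}")
print(f"Automated checks per test: {automated_checks}")
print(f"Per-check time reduction:  {time_reduction:.0%}")
```

Under that assumption, the automated approach yields eight times as many snapshots per longevity test (144 versus 18), and each one takes 60% less time to execute.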