By Tony Reinert

At some point during the software development lifecycle, actual test data are needed to run through the software’s interfaces, classes, and algorithms. Test data will most likely be loaded into a message stimulator, test harness, simulation environment, automated testing tool, or simply a unit test. Fabricating data is often labor-intensive, which can lead to cutting corners, reduced code coverage, or, even worse, testing only the nominal “go” path. So the question is: where should testers get their test data?

Test Data Sources

Customer Provision

The easiest way to obtain test data is to have the customer or stakeholder provide it. The customer may define several test data sets or use cases that must pass prior to acceptance. This “gold” data set may be a provided data file, or perhaps a set of test steps needed to get the system under test into the proper configuration to create the data. One caution with this approach is scope creep in the system requirements: new features added during development may not be accounted for in the acceptance test criteria. It’s important to identify any resulting gaps in code coverage and supplement them with the appropriate testing.
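One way to put a customer-supplied gold data set to work is to run every case through the code automatically and record pass or fail per case. The sketch below is illustrative only. It assumes a hypothetical CSV layout (case_id, speed_knots, expected) and a hypothetical function under test, classify_speed. Neither comes from a real customer data set, and a real harness would read the provided file rather than an inline string.

```python
import csv
import io

# Hypothetical "gold" data set in an assumed CSV layout: one row per
# acceptance case with an input and the expected output.
GOLD_CSV = """case_id,speed_knots,expected
AT-001,0,stationary
AT-002,12,slow
AT-003,35,fast
"""


def classify_speed(speed_knots: float) -> str:
    """Hypothetical function under test."""
    if speed_knots == 0:
        return "stationary"
    return "slow" if speed_knots < 20 else "fast"


def run_gold_cases(csv_text: str) -> list[tuple[str, bool]]:
    """Run every gold case and return (case_id, passed) pairs in file order."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = classify_speed(float(row["speed_knots"]))
        results.append((row["case_id"], actual == row["expected"]))
    return results


if __name__ == "__main__":
    for case_id, passed in run_gold_cases(GOLD_CSV):
        print(f"{case_id}: {'PASS' if passed else 'FAIL'}")
```

Failing cases point at defects. Code that no gold case reaches is the coverage gap to supplement, and a coverage tool such as coverage.py can help locate it.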

When test data are provided by the customer or taken from a production system, some additional considerations include: Does the data allow for covering the required test paths, or only a subset of them? Do you understand the coverage the provided data will include? Is the data sanitized so that any sensitive information (e.g., Social Security numbers) is handled appropriately?
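Sanitization can often be automated before the data ever reach a test environment. The following is a minimal sketch. It assumes the sensitive values are U.S. Social Security numbers written in the NNN-NN-NNNN form. Real records may use other formats or contain other kinds of sensitive fields, so treat the pattern as an assumption to adapt rather than a complete solution.

```python
import re

# Assumed format: U.S. Social Security numbers written as NNN-NN-NNNN.
# Real data may also contain undashed or spaced variants; extend as needed.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MASK = "XXX-XX-XXXX"


def sanitize_record(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of the record with every SSN-shaped substring masked."""
    return {key: SSN_PATTERN.sub(MASK, value) for key, value in record.items()}


def find_unsanitized(records: list[dict[str, str]]) -> list[int]:
    """Return the indexes of records that still contain an SSN-shaped value."""
    return [
        index
        for index, record in enumerate(records)
        if any(SSN_PATTERN.search(value) for value in record.values())
    ]


if __name__ == "__main__":
    raw = [
        {"name": "J. Doe", "notes": "SSN 123-45-6789 on file"},
        {"name": "R. Roe", "notes": "no identifiers"},
    ]
    print(find_unsanitized(raw))  # [0]
    cleaned = [sanitize_record(r) for r in raw]
    print(cleaned[0]["notes"])  # SSN XXX-XX-XXXX on file
    print(find_unsanitized(cleaned))  # []
```

Running find_unsanitized as a final check gives a quick guard against committing unmasked records. If the system validates SSN formats, a format-preserving synthetic value may be needed instead of the X mask.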

Integration Tests

If the development work includes an update to, or rehost of, an existing software component, then testing may be as simple as dropping the replacement component into the existing system and exercising it there, which is known as integration testing. This approach leverages any regression or component tests that were in place prior to the update.
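A common way to reuse existing test inputs after a rehost is back-to-back (differential) testing. The same recorded inputs are fed to the legacy component and its replacement, and any disagreement is flagged. In this hypothetical sketch, legacy_scale and rehosted_scale stand in for the two components. The replacement uses truncating conversion instead of floor division, a subtle difference of the kind that can appear when code moves between languages.

```python
from typing import Callable, Iterable


def legacy_scale(value: int) -> int:
    """Hypothetical legacy component: scales a raw count by 1.5 using floor division."""
    return value * 3 // 2


def rehosted_scale(value: int) -> int:
    """Hypothetical replacement: int() truncates toward zero, unlike floor division."""
    return int(value * 1.5)


def compare_components(
    old: Callable[[int], int],
    new: Callable[[int], int],
    inputs: Iterable[int],
) -> list[tuple[int, int, int]]:
    """Return (input, old_output, new_output) for every input where the outputs differ."""
    mismatches = []
    for x in inputs:
        expected, actual = old(x), new(x)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches


if __name__ == "__main__":
    recorded_inputs = [-3, -2, -1, 0, 1, 2, 3]  # stand-in for captured regression inputs
    for x, expected, actual in compare_components(legacy_scale, rehosted_scale, recorded_inputs):
        print(f"input={x}: legacy={expected} rehosted={actual}")
```

Running it reports only the negative odd inputs (-3 and -1), where the two components diverge. Exactly this kind of edge case is easy to miss without an existing regression set to replay.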

If the system design included any type of interface documents, they may be of assistance when developing test data. Depending on the format, these documents may contain information, such as field ranges and sizes, that can be used to define boundary tests for the data. Boundary tests are a great way to augment other types of tests and help wring out some of those elusive bugs. They are also far more efficient than trying to exercise every possible value up to the maximum size of a data structure. The main downside to boundary testing is that the data are typically not in context, so their usefulness may vary.
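Boundary value analysis turns a documented range into a small set of test values: each limit, the values just inside it, and the values just outside it. This sketch assumes a hypothetical interface-document field, range_nm, specified as an integer from 0 to 500. It also assumes a hypothetical validator, accepts, standing in for the code under test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """A numeric field as an interface document might define it (hypothetical)."""
    name: str
    minimum: int
    maximum: int


def boundary_values(spec: FieldSpec) -> dict[str, list[int]]:
    """Values at and just inside each limit (valid) and just outside it (invalid)."""
    candidates = {spec.minimum, spec.minimum + 1, spec.maximum - 1, spec.maximum}
    # The filter keeps narrow ranges (e.g. minimum == maximum) from leaking
    # out-of-range values into the valid set.
    valid = sorted(v for v in candidates if spec.minimum <= v <= spec.maximum)
    invalid = [spec.minimum - 1, spec.maximum + 1]
    return {"valid": valid, "invalid": invalid}


def accepts(spec: FieldSpec, value: int) -> bool:
    """Hypothetical validator standing in for the code under test."""
    return spec.minimum <= value <= spec.maximum


if __name__ == "__main__":
    range_field = FieldSpec("range_nm", 0, 500)
    cases = boundary_values(range_field)
    print(cases)
    assert all(accepts(range_field, v) for v in cases["valid"])
    assert not any(accepts(range_field, v) for v in cases["invalid"])
```

For range_nm this yields the valid values [0, 1, 499, 500] and the invalid values [-1, 501]. Each value exercises one field in isolation, which is the lack of context noted above.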

Data Structures

If the system accesses any APIs, the data structures involved are a good starting point for generating test data. If the external APIs are mocked, their return data can be manipulated to exercise different system behaviors. Additionally, if the system is designed using a modeling language such as UML, the models might contain default values or boundary information that could be useful.
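With Python’s standard unittest.mock module, a mocked API client’s return data can be changed per scenario to drive the code down each path, including failure paths. The client object, its get_status method, and the describe_link function are hypothetical stand-ins for a real external API and the code that consumes it.

```python
from unittest import mock


def describe_link(client) -> str:
    """Hypothetical code under test: summarizes a status dict from an external API."""
    status = client.get_status()
    if status.get("error"):
        return "link down"
    return "degraded" if status["signal_db"] < -90 else "nominal"


def exercise_link_behaviors() -> None:
    client = mock.Mock()

    # Each return_value hands back different data, steering describe_link
    # down a different branch without contacting a real service.
    client.get_status.return_value = {"signal_db": -60}
    assert describe_link(client) == "nominal"

    client.get_status.return_value = {"signal_db": -95}
    assert describe_link(client) == "degraded"

    client.get_status.return_value = {"error": "timeout"}
    assert describe_link(client) == "link down"

    # side_effect makes the mocked call raise instead of returning data.
    client.get_status.side_effect = ConnectionError("no route to host")
    try:
        describe_link(client)
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError to propagate")


if __name__ == "__main__":
    exercise_link_behaviors()
    print("all mocked scenarios behaved as expected")
```

Because the mock accepts any data, the values fed to it can come straight from the API’s documented data structures, including their boundary values.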

Wrapping Up

Before designing your next development feature, take some time to think about not only how you will test your software, but also what you will test it with. If your testing approach is not defined ahead of time, it may result in unforeseen work. Considering this issue during the planning stage of development will result in more effective testing later.

If you need guidance on acquiring test data or setting up the proper test environment for your system, Innovative Defense Technologies (IDT) may be able to help. Contact us for more information.

Tony Reinert is a senior software engineer at Innovative Defense Technologies (IDT). He is a contributing member of the ATRT: Test Manager and the ATRT: Analysis Manager development teams.