Three Major Challenges of Test Data Management


By Mush Honda, KMS Technology

Whether you’re testing new features and functions in a system or validating defects reported by a client, it’s vital that you can accurately emulate the workflows and interactions your end users encounter. To do that, your tests need correct, accurate data. Getting the right data, in the right quantity, into the system at the right time increases confidence in your test results and improves the quality of your test coverage. So, how can you make your test data better reflect real-world scenarios? Welcome to the world of test data management!

Three Major Challenges of Test Data Management

Broadly speaking, you face three major challenges when managing test data, and each comes with a host of associated risks. It’s vital to consider them at the outset and plan accordingly.

Data Validity and Consistency

Data ages and can become obsolete, so proper versioning is important. You also need end-to-end traceability, from the data’s inception through its entire life cycle, and you must maintain its integrity. Data that is allowed to age can lose its context, and if you can’t trace data and validate its integrity, troubleshooting any issues that arise may prove very difficult. And how are you going to audit it?
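One way to make versioning and integrity checks concrete is to fingerprint each test-data snapshot when it is created and record when it was taken. The Python sketch below assumes records are JSON-serializable dictionaries; `DataSnapshot`, `make_snapshot`, `is_intact`, and `is_stale` are hypothetical names for illustration, not part of any particular tool.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataSnapshot:
    """A versioned test-data snapshot (hypothetical structure)."""
    version: str
    created_at: datetime
    checksum: str
    records: tuple


def fingerprint(records) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes the same.
    payload = json.dumps(list(records), sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_snapshot(version, records, created_at=None) -> DataSnapshot:
    records = tuple(records)
    created = created_at or datetime.now(timezone.utc)
    return DataSnapshot(version, created, fingerprint(records), records)


def is_intact(snapshot: DataSnapshot) -> bool:
    # Recompute the checksum: a mismatch means the data changed after it was versioned.
    return fingerprint(snapshot.records) == snapshot.checksum


def is_stale(snapshot: DataSnapshot, max_age: timedelta, now=None) -> bool:
    # Flag data older than the allowed age so it can be refreshed before it loses context.
    now = now or datetime.now(timezone.utc)
    return now - snapshot.created_at > max_age
```

Storing the version and checksum alongside each test run is one way to trace a failure back to the exact data it used.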

Data Privacy

Many applications contain sensitive personal information, and government mandates and regulations may stipulate which data must be masked, de-identified, or encrypted. Without a solid process to protect that data, there’s a real risk that valuable personal information could leak and be used maliciously. A data breach can be extremely expensive to sort out: it can damage reputations and result in lawsuits and punitive fines.
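A minimal sketch of field-level de-identification, assuming records are Python dictionaries and that a secret key is kept outside the test environment. It uses keyed hashing (HMAC-SHA256 from the standard library) so the same input always maps to the same token; the field names, the placeholder key, and the `SENSITIVE_FIELDS` mapping are hypothetical.

```python
import hashlib
import hmac

# Assumption: in practice this key comes from a secrets store and never ships with test data.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed token: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_email(email: str) -> str:
    """Hide the local part but keep the domain, which some tests rely on."""
    local, _, domain = email.partition("@")
    return f"{pseudonymize(local)}@{domain}" if domain else pseudonymize(email)


def mask_card(number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


# Hypothetical mapping of sensitive fields to their masking rule.
SENSITIVE_FIELDS = {"name": pseudonymize, "email": mask_email, "card": mask_card}


def deidentify(record: dict) -> dict:
    # Apply a masking rule to sensitive fields; copy everything else unchanged.
    return {k: SENSITIVE_FIELDS[k](v) if k in SENSITIVE_FIELDS else v for k, v in record.items()}
```

Deterministic tokens are a design choice: because the same customer always gets the same token, relationships between tables survive masking, which random replacement values would break.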

Data Selection and Sub-setting

You need data that’s relevant to the context of what you’re testing, but how do you manage data requests effectively? You’ll typically want a smaller subset of data in a scaled-down, non-production environment, yet that subset must still mimic production. If you don’t get your selection right, your test coverage won’t be as good as it could be.
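Stratified sampling is one common way to build a subset that still mirrors production: take the same fraction from each category so the overall mix is preserved, and keep at least one row from rare categories. A minimal Python sketch, assuming rows are dictionaries and that you choose the grouping key (a hypothetical `tier` field in the usage below):

```python
import math
import random
from collections import defaultdict


def stratified_subset(rows, key, fraction, seed=42, min_per_group=1):
    """Take roughly `fraction` of each group so the subset keeps production's mix.

    `min_per_group` guarantees that rare categories still appear in the subset.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    subset = []
    for members in groups.values():
        n = max(min_per_group, math.ceil(len(members) * fraction))
        subset.extend(rng.sample(members, min(n, len(members))))
    return subset
```

For example, `stratified_subset(rows, key=lambda r: r["tier"], fraction=0.1)` keeps about a tenth of each tier while guaranteeing that a tier with a single production row is still represented.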

Five-step Process for Success

If you organize the process properly, you can overcome these challenges. If you’re using production data that contains sensitive information, make sure it is masked or de-identified as you extract it. You can work through the following five-step checklist:

  1. Data generation:
    • [  ] How do you generate data for tests?
    • [  ] Do you get it from production?
    • [  ] Is it reusable?
    • [  ] Can you define criteria for automatically generating data of the quality you need?
  2. Data de-identification:
    • [  ] How are you going to mask it?
    • [  ] Who will have access?
    • [  ] Can you scramble sensitive details?
    • [  ] Do you need to encrypt?
  3. Data planning:
    • [  ] How do you select a subset of the data?
    • [  ] How do you ensure it is relevant?
    • [  ] Have you prioritized critical data and set up a schedule to refresh it regularly?
  4. Data maintenance:
    • [  ] How do you store the data?
    • [  ] How do you handle the update or refresh process?
    • [  ] How often is it migrated into the system?
  5. Data auditing:
    • [  ] Are you analyzing the data you’re using?
    • [  ] Is it fit for the purpose?
    • [  ] Can you trace the workflow from end to end?
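For the data generation step, criteria such as value ranges and required statuses can drive a generator directly. The sketch below is illustrative only: the `Criteria` fields and the account shape are assumptions. Seeding the random generator makes the data set reproducible, which is one way to answer the reusability question, and explicitly including boundary values helps coverage.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical generation criteria for an 'account' entity."""
    count: int
    min_balance: int
    max_balance: int
    statuses: tuple = ("active", "suspended", "closed")


def generate_accounts(criteria: Criteria, seed: int = 0):
    rng = random.Random(seed)  # seeded -> the same data set can be regenerated
    # Always include the boundary values, where defects tend to hide.
    balances = [criteria.min_balance, criteria.max_balance]
    while len(balances) < criteria.count:
        balances.append(rng.randint(criteria.min_balance, criteria.max_balance))
    # Cycle through statuses so every status appears once count >= len(statuses).
    return [
        {"id": i + 1, "balance": b, "status": criteria.statuses[i % len(criteria.statuses)]}
        for i, b in enumerate(balances[: criteria.count])
    ]
```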

Self-healing Data

It’s common for test teams to find false negatives and failures in their test scripts (especially automated ones) simply because the data they use has grown obsolete: the data source or the data values may no longer be relevant or applicable. You can avoid this by creating self-healing scripts that use queries to select subsets of data on a schedule, based on a fresh set of criteria that you define for each new test cycle. In other words, you identify the data you need from the criteria for what you’re testing, before extraction.
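A minimal sketch of that query-driven selection, using Python’s built-in `sqlite3` module. The `accounts` table, its columns, and the `CYCLE_CRITERIA` values are hypothetical; scheduling (for example, running the selection at the start of each CI job) is left to whatever scheduler you already use.

```python
import sqlite3

# Criteria defined for the current test cycle (hypothetical names and values).
CYCLE_CRITERIA = {"status": "active", "min_balance": 100, "limit": 5}

SELECT_SQL = """
    SELECT id, balance FROM accounts
    WHERE status = :status AND balance >= :min_balance
    ORDER BY id
    LIMIT :limit
"""


def select_test_accounts(conn: sqlite3.Connection, criteria: dict):
    """Re-query rows matching this cycle's criteria instead of hard-coding record IDs."""
    rows = conn.execute(SELECT_SQL, criteria).fetchall()
    if not rows:
        # Report missing data as a data problem, not as a misleading test failure.
        raise LookupError(f"no test data matches {criteria}")
    return rows
```

Because the script re-queries rather than relying on fixed IDs, a deleted or changed row is simply replaced by the next match, and an empty result surfaces as a data issue instead of a product defect.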

Planning ahead should enable you to develop a streamlined process for test data management and overcome the major challenges. Identify the data you need and extract it, mask or de-identify it, migrate a subset into your system, validate the integrity, and schedule your refresh to maintain relevance. Maintain a watchful eye on the process and make tweaks where required. If you get your test data management right, you have a solid foundation for testing any system.
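Validating integrity after you migrate a subset often comes down to referential checks. A minimal sketch, assuming a hypothetical customers/orders schema held as lists of dictionaries: subset the parent rows first, pull only the child rows that reference them, and then confirm there are no orphans.

```python
def subset_with_children(customers, orders, keep_customer):
    """Subset parent rows first, then keep only the child rows that reference them."""
    kept = [c for c in customers if keep_customer(c)]
    kept_ids = {c["id"] for c in kept}
    return kept, [o for o in orders if o["customer_id"] in kept_ids]


def find_orphans(customers, orders):
    """Integrity check: every order must point at a customer present in the data set."""
    ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in ids]
```

Subsetting in parent-to-child order is what keeps the result consistent; sampling each table independently would usually leave orders pointing at customers that were never migrated.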

About the Author

Mush Honda is QA Director at KMS Technology, a provider of IT services across the software development lifecycle with offices in Atlanta and Ho Chi Minh City. He was previously a tester at Ernst & Young, Nexidia, Colibrium Partners and Connecture. KMS services include application management, testing, support, professional services and staff augmentation.
