In this tutorial, We will see how can you validate the Test in ETL Testing or DataWare House Testing.
VALIDATION Testing Levels:
There are several levels of testing that can be performed during data warehouse testing. Some examples: Constraint testing, Source to target counts, Source to target data validation, Error processing. The level of testing to be performed should be defined as part of the testing strategy.
During constraint testing, the objective is to validate unique constraints, primary keys, foreign keys, indexes, and relationships. The test script should include these validation points. Some ETL processes can be developed to validate constraints during the loading of the warehouse. If the decision is made to add constraint validation to the ETL process, the ETL code must validate all business rules and relational data requirements. Depending solely on the automation of constraint testing is risky. When the setup is not done correctly or maintained throughout the ever changing requirements process, the validation could become incorrect and will nullify the tests.
Source to Target Counts (Reconciliation):
The objective of the count test scripts is to determine if the record counts in the source match the record counts in the target. Some ETL processes are capable of capturing record count information such as records read, records written, records in error, etc. If the ETL process is being used can capture that level of detail and create a list of the counts, allow it to do so. This will save time during the validation process.
Source to Target Data Validation:
No ETL process is smart enough to perform source to target field-to-field validation. This piece of the testing cycle is the most labor intensive and requires the most thorough analysis of the data. There are a variety of tests that can be performed during source to target validation.
Below is a list of tests that are best practices:
Threshold testing – expose any truncation that may be occurring during the transformation or loading of data
Source: table1.field1 (VARCHAR40):
Stage: table2.field5 (VARCHAR25):
Target: table3.field2 (VARCHAR40):
In this example the source field has a threshold of 40, the stage field has a threshold of 25 and the target mapping has a threshold of 40. The last 15 characters will be truncated during the ETL process of the staging table. Any data that was stored in position 26-30 will be lost during the move from source to staging.
Field-to-field testing – is a constant value being populated during the ETL process? It should not be unless it is documented in the Requirements and subsequently documented in the test scripts. Do the values in the source fields match the values in the respective target fields? Below are two additional field-to-field tests that should occur.
Initialization – During the ETL process, if the code does not re-initialize the cursor (or working storage) after each record, there is a chance that fields with null values may contain data from a previous record.
Record 125: Source field1 = Red Target field1 = Red
Record 126: Source field1 = null Target field 1 = Red
Validating relationships across data sets – Validate parent/child relationship(s)
Products from Amazon.in
Price: Out of stock
Price: INR 29,999.00
Price: INR 14,999.00
Source parent: Purple.
Source child: Red and Blue.
Target parent: Purple
Target child: Red and Yellow.
Error Processing: Understanding a script might fail during data validation, may confirm the ETL process is working through process validation. During process validation, the testing team will work to identify additional data cleansing needs, as well as identify consistent error patterns that could possibly be diverted by modifying the ETL code. Taking the time to modify the ETL process will need to be determined by the project Manager, development lead, and the business integrator. It is the responsibility of the validation team to identify any and all records that seem suspect. Once a record has been both data and process validated and the script has.