Process of ETL Testing and Its Types | Devstringx Technologies
What is ETL?
ETL stands for Extract Transform and Load. ETL is a process in which data transfer takes place in multiple stages. Data transfer starts from the legacy source to the staging server, from staging to data warehouse and finally from a data warehouse to data marts and then load the data from data mart into cubes of browsing.
What is a Data Warehouse?
A data warehousing is a technique of managing data after collecting it from various sources to provide meaningful business understanding. It allows the planned use of data by combining technologies and components.
Data warehouse is an electronic storage of a huge amount of information through business. It is designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users in an appropriate manner to make a difference.
Why do we need ETL testing?
If we want to create a data warehouse that combines data from different sources, we will need a way to get that data into the warehouse. Each step in the ETL process is highly vulnerable and subject to errors or loss of data or erroneous transfer of data. This is the main reason to involve the concept of testing in the ETL cycles.
ETL Testing Process:
Different stages of ETL testing process are:-
- Understand business requirements.
- Test Planning and estimation on the basis of number of tables and volume of the data.
- Designing of test cases from all available inputs, design mapping data ,SQL scripts and preparation of test data.
- Test execution and bug reporting, execution phase includes running ETL jobs, monitoring job runs, SQL script execution, defect logging, defect retesting and regression testing.
- Summarising reports and result analysis, in this phase sign off is given to promote the job or move to the next phase.
- Test closure
These are the five pillars of ETL Testing process:
- Upstream data issues.
- Timelines and accuracy of data.
- Security of data.
- History of data and its maintenance.
- Availability if data for decision.
Difference between ETL Testing and Database Testing
It verifies whether the data is moved as per requirement or not.
It verifies whether the data is following the rules, as defined in the Data Model.
It verifies if the counts in the source and target are equal.
It verifies if foreign and primary key relation is maintained and also check for any orphan records.
It checks if foreign and primary key references are maintained during ETL process.
It verifies whether the database is ideally normalized and there are no redundant tables.
It verifies that there is no duplicacy in the data loaded into the data warehouse.
It checks for any missing data in the required columns.
Types of ETL Testing:
- Data transformation Testing: Multiple SQL queries are required to be run for each and every row to verify data is transformed correctly according to various business requirements and rules.
- Source to Target Testing: It includes matching of the count of records and data validation at the source and target system.
- End-User Testing: It includes report generation to verify if the data is as per expectation by cross checking the data in the target system for report validation.
- Retesting: It is a process of data validation by running the reports again once the bugs and defects are fixed in the target system.
- Production Validation Testing: It includes data validation in production system & compare it against the source data.
- Data Integration Testing: It make sure that the data from different sources has been loaded properly into the target system and all the threshold values are checked.
- Application Migration Testing: This testing ensures that the ETL application is working fine on moving to a new platform.
ETL tools are used for building and applying transformation logic to move data from one source to another. They are also used for mapping the schemas from source to destination that may occur in various ways. They also transform and clean data before moving data to the destination.
Some of the ETL tools are:
- Informatica Data Validation
- Datagaps ETL Validator
- Talend Open Studio for Data Integration
Creating Test cases in ETL testing:
ETL test case writing includes following documents:
- ETL Mapping sheets: This contains all information about the source and destination tables and its These sheets helps in creating SQL queries while doing testing.
- Database Schemas of Source and Destination table: ETL mapping sheets kept up to date with Database Schema for data validation.
Types of ETL Bugs:
- Calculation bugs: It is reported when the end result is deviated due to some mathematical error.
- Input/Output bugs: Accept invalid values and rejects valid/ correct values.
- User Interface bugs: It is related to the GUI of an application like color, font size , alignment etc
- Hardware bugs: When device doesn’t respond properly to the application due to some hardware issues.
- Load Condition bugs: When system doesn’t allow multiple user or customers expected load.
Originally published at https://www.devstringx.com on November 23, 2019.