This article is about cleansing and formatting data using SQL Server Data Quality Services .
In this post , I am going to show on how to cleanse & format data in the columns of SQL Server table . To do this , first we have to create Knowledge base and then Data quality project .
Creating Knowledge base is nothing but giving hint of source data to DQS and then setting rules on how data should be formatted by creating Domains for each columns .
After creating Knowledge base , create Data quality project making use of knowledge base on our source data and then after deciding on the recommendations provided by DQS , we can export the cleansed (corrected) & formatted data to new table .
Note : click on images to get enlarged view
Consider below screenshot is my sample data :
Create New Knowledge Base
Enter Knowledge Base name and click on Knowledge Discovery and then click Next
Create a domain

Create a domain for Source Column Name

Create a domain for Source Column Gender
Similarly create a domain for Source Column CityCode & State with Format Output to Upper Case

Click Next and then click on Start will Performs data discovery analysis on the selected data source .

Click Next and then set the Type as Invalid and enter the values to be corrected .
For example : For domain CityCodeKB - .> "Chennai" should be corrected as "chn" (city code instead of city name ) .


There are no values to be corrected in NameKB & StateKB , Check the values and then click on Finish and Publish the Knowledge Base :
Create New Data Quality Project :
Click Next
Click on Start will Perform Cleansing on the selected data source
Click on Export and then Finish
Clicking on Activity Monitoring will provide us the information as shown below :
In this post , I am going to show on how to cleanse & format data in the columns of SQL Server table . To do this , first we have to create Knowledge base and then Data quality project .
Creating Knowledge base is nothing but giving hint of source data to DQS and then setting rules on how data should be formatted by creating Domains for each columns .
After creating Knowledge base , create Data quality project making use of knowledge base on our source data and then after deciding on the recommendations provided by DQS , we can export the cleansed (corrected) & formatted data to new table .
Note : click on images to get enlarged view
Consider below screenshot is my sample data :
Create New Knowledge Base
Enter Knowledge Base name and click on Knowledge Discovery and then click Next
Create a domain
Create a domain for Source Column Name
Create a domain for Source Column Gender
Similarly create a domain for Source Column CityCode & State with Format Output to Upper Case
Click Next and then click on Start will Performs data discovery analysis on the selected data source .
Click Next and then set the Type as Invalid and enter the values to be corrected .
For example : For domain CityCodeKB - .> "Chennai" should be corrected as "chn" (city code instead of city name ) .
There are no values to be corrected in NameKB & StateKB , Check the values and then click on Finish and Publish the Knowledge Base :
Create New Data Quality Project :
Click Next
Click on Start will Perform Cleansing on the selected data source
Click on Export and then Finish
Clicking on Activity Monitoring will provide us the information as shown below :
No comments:
Post a Comment