An introduction to Data Quality Services

The use case for DQS is simple, a quick way to audit and correct data through an abstraction layer. This abstraction layer can be managed by someone other than a DBA or developer and help improve your database’s data quality and over all employee productivity.

First step is to install the DQS client, you can search for data quality server installer on your machine running your SQL instance.
Once you follow the prompts and install you should to able to access to DQS client.


Lets take a simple set of data as follows :

ID Fname Lname
1 Ian Fogelman
2 Bill Gates
3 Jim Brown
4 D Smith
5 J Green

Suppose that we want to enforce a business rule that each first name must be 3 characters.
We will solve this using a DQS approach.

First step is to create a knowledge base


Create New Domain

Input a name and a description for your domain


Next click domain rules and enter Length3 for name and add a rule of length equal to 3


Click publish


Now go to data quality projects, New Data Quality Project.
Name the project and select the knowledge base created, for this project we will be cleansing.


On the next screen match each field name with a domain, in this case we only have first name, click next.


Click Start


After the task is complete you can see how many rows of data passed and how many were invalid.


On the following screen we can see which records failed, because the length of the first name field was not 3.
Which we originally stated in the criteria of the domain. You can approve and reject.


On the last screen we can Export our results to a table in the database by defining the table name and clicking export.


Once clicking finish , we can view the data in the Output table in the database we specificied to see the full results


DQS can be used in combination with other technology such as SSIS/PowerBi/SSRS and the process of reporting data issues or automatically fixing them becomes much more interesting!

Ian Fogelman

Ian Fogelman

My name is Ian Fogelman. I like to develop data driven solutions with SQL Server, Python, .NET and predictive analytics.