June 7, 2021

Understanding SQL Server 2019 Data Classification



Published on June 7, 2021 by Solvaria

 

 

Understanding SQL Server 2019 Data Classification

 

SQL Server 2019 Data Discovery and Classification is a new feature in the latest SQL Server Management Studio (v17.5). SQL Server has continuously improved its security measures, from adding Dynamic Data Masking to SQL Server 2016, to now giving users the ability to discover, classify and label sensitive data that can serve as an infrastructure for meeting regulatory compliance requirements (HIPAA, PCI, etc.). 

As a security benefit, SQL Server 2019 does not store information as the extended properties, and instead adds metadata about the sensitive classified information depending on the defined columns.

Sensitive data includes protected health information such as payment history, insurance information or medical records, education transcripts, and financial information such as credit card numbers and tax forms.

There are 3 main capabilities supported in SQL Server 2019 Data Classification, including:

  1. Discovery
  2. Classifying
    1. Labels
    2. Information Types
  3. Visibility

 

Discovery

The classification engine scans your database and identifies columns that contain potentially sensitive data. It then provides you with an easy way to review and apply recommended classification

 

Classifying

The classification includes two metadata attributes:

 

  1. Labels: 
    • Sensitivity labels include:
      1. Public
      2. General
      3. Confidential
      4. Confidential - GDPR
      5. Highly Confidential
      6. Highly Confidential - GDPR
      7. [n/a]
    • The main classification attributes and are used to define the sensitivity level of the data stored in a column

 

  1. Information types: 
    • Attributes that provide more detailed information about the type of data stored in a column

 

You can also define sensitivity classification and information types using the below query as an example:

  • ADD SENSITIVITY CLASSIFICATION TO APPLICATION.[Name]

WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Financial');



Visibility:

You can view the database-classification state in a detailed dashboard in the SQL Server to Azure portal. You can also download an excel report to use for other needs such as compliance and database audit purposes.

 

How to classify data in Microsoft SQL Server

Classifying data in SQL Server 2019 can be achieved by following the steps below:

 

  1. In the SSMS Object Explorer, right click on the database that you would like to classify and choose: 
    1. Tasks > Data Discovery and Classification > Classify Data...

  2. The classification engine scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications.

  3. To view the list of recommended column classifications, click on the recommendations notification box at the top or the recommendations panel at the bottom of the window.

  4. The classification engine scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications.

  5. To view the list of recommended column classifications, click on the recommendations notification box at the top or the recommendations panel at the bottom of the window.

 

The options for Information Type are as follows: 

  • Banking
  • Contact info
  • Credentials
  • Credit card
  • Date of birth
  • Financial
  • Health
  • Name
  • National ID
  • Networking
  • SSN
  • Other
  • [n/a]

 

We can view the information about classified columns from the system view sys.sensitivity_classifications.

 

SQL Server vs Azure Data Discovery and Classification

SQL Azure is Microsoft’s cloud database service and allows organizations to store data in the cloud. SQL Azure is based on SQL Server, so they have compatibility in functionality and usage. For example, both platforms use T-SQL to parse the names of columns to identify and classify the data. 

 

There are a variety of SQL Server resources on our website for more information on the functionality of Microsoft SQL Server. However, since Azure is not the exact cloud version of SQL Server, there are several fundamental differences between Azure SQL and SQL Server. 

 

The most important difference between the two is that SQL Server can host up 32,767 databases per instance, and Azure SQL is only able to host one database per Azure SQL Instance. Azure SQL does host databases from multiple customers in a single physical platform. This means Azure SQL is a multi-tenant platform that shares its physical resources with any other clients who also use the platform.

 

There are a number of other differences that may be important enough to consider SQL Server over SQL Azure, or vice versa. First, the way Azure SQL and SQL Server communicate is different. In SQL Server, your application talks directly to the server over your LAN and uses Tabular Data Stream (TDS) protocol over TCP/IP or HTTP. Also, it can communicate on any configurable port and does not really require transport security. On the other hand, SQL Azure makes an explicit call through your network’s firewall that can reach the Azure Gateway only through the internet. The Gateway also features another firewall called SQL Azure Firewall that lets only specifically designed IP addresses get access. SQL Azure only uses TCP/IP protocol for communication. 

 

SQL Azure has many other benefits and features outside of its relation to SQL Server. Organizations can build applications on-premise and move them to SQL Azure, or build and keep them in the cloud on Windows Azure. SQL Azure can also support many SQL Server features such as query language, and allows data to be synchronized between on-prem and cloud databases. Data Discovery & Classification is built into Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. The built-in feature also includes basic capabilities for discovering, classifying, labeling, and reporting.

 

Some other notable features that Azure SQL lacks compared to SQL Server are unsupported data types and actions such as cross-database joins, and there is a maximum database size of 50 GB. The current release also does not perform scheduled backups and restores, though Microsoft has said this will be addressed in future releases.

 

Discovering and classifying your organization's sensitive data can be a time-consuming and challenging task that our expert DBAs have years of combined experience performing. If you need more answers or want more information, feel free to contact us via the form below. Or, check out our Solutions page for more common database performance issues and possible solutions.

 

Share on

 
Back to Resources
Illustration