Azure Purview

What is Microsoft Azure Purview?

In Cloud Technology, Data Governance, MS Azure by Peter

Microsoft Azure Purview is a fully managed, unified data governance service that helps you manage and govern your on-premises, multi-cloud, and SaaS data. Purview creates a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage.

Purview gives organizations a top-level view of their data landscape and lets them perform data discovery, classify data, and establish end-to-end lineage.

Azure Purview is built on the Apache Atlas open API ecosystem, with some enhancements and additions by Microsoft. Apache Atlas is an open-source project for metadata management and governance of data assets.

Azure Purview also has a data share mechanism that securely shares data with external business partners without setting up extra FTP nodes or creating redundant large datasets.

Azure Purview does not move or store customer data out of the region in which it is deployed.

Business users can easily create a unified map of data assets and their relationships with automated data discovery and sensitive data classification. We can also get insight into the location and movement of sensitive data across landscapes and empower data consumers to find valuable data through a data catalog.

Azure Purview is deeply integrated with Azure Synapse Analytics, which allows us to interact with Azure Purview assets from within Azure Synapse Studio.

Azure Purview captures the lineage relationships between data assets all the way from raw data to business insights. These relationships are captured automatically and kept up to date with turnkey integrations of Azure Purview with SQL Server, Power BI, Azure SQL, Azure Data Lake and many more. 

Purview’s automated data classification uses more than 200 prebuilt and custom classifiers to detect sensitive data types such as business terms, government IDs, names, location data and more.

Azure Purview’s integration with Microsoft Information Protection ensures that sensitivity labels defined in the Microsoft 365 Compliance Center can be applied

Prerequisites

Azure Subscription
Within your Azure Subscription, you will need administrative access permissions and the ability to create resources. The administrative access is required because you will have to register some Resource Providers if they do not already exist. Those resource providers are:

  • Microsoft.Purview
  • Microsoft.Storage
  • Microsoft.EventHub
  • Microsoft.KeyVault (Azure Key Vault)
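
To see which of these providers still need registering, we can inspect the JSON that `az provider list` prints. The sketch below is only an illustration: the dotted namespaces (for example `Microsoft.KeyVault`) are the Azure Resource Manager names assumed to correspond to the four providers listed above, and the helper names are hypothetical. It reads the `namespace` and `registrationState` fields and builds the matching `az provider register` commands without running them.

```python
import json

# Assumed Azure Resource Manager namespaces for the four providers listed above.
REQUIRED_PROVIDERS = [
    "Microsoft.Purview",
    "Microsoft.Storage",
    "Microsoft.EventHub",
    "Microsoft.KeyVault",
]


def providers_to_register(provider_list_json):
    """Return required namespaces that are not yet in the 'Registered' state.

    provider_list_json is the text printed by `az provider list --output json`.
    """
    providers = json.loads(provider_list_json)
    # Namespace comparison is done case-insensitively to be safe.
    state = {
        p["namespace"].lower(): p.get("registrationState", "")
        for p in providers
    }
    return [
        ns for ns in REQUIRED_PROVIDERS
        if state.get(ns.lower()) != "Registered"
    ]


def register_commands(namespaces):
    """Build (but do not run) the Azure CLI commands that register providers."""
    return [f"az provider register --namespace {ns}" for ns in namespaces]
```

A provider that is missing from the listing is treated the same as an unregistered one, which is the conservative choice for a pre-deployment check.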

Azure Purview Roles

In order to scan your data sources, one or more security principals need to be added to one of the predefined Data Plane roles: Purview Data Reader, Purview Data Curator, or Purview Data Source Administrator.

  • Azure Purview roles support individual users, Azure Active Directory Groups, and Service Principals.
  • By default, the creator of the Azure Purview Account will be treated as if they are in both the Purview Data Curator and Purview Data Source Administrator roles.

Role: Activities

Purview Data Reader
  • Access to Purview Portal
  • Read all content except scanning

Purview Data Curator
  • Access to Purview Portal
  • Read all content except scan bindings
  • Edit Asset information
  • Edit Classification definitions
  • Edit Glossary terms
  • Assign Classification definitions
  • Assign Glossary terms

Purview Data Source Administrator
  • No access to Purview Portal
  • Manage scan bindings information only
  • No access to non-scan related content
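
The role breakdown above can be read as a simple permission matrix. The following sketch models it in plain Python, purely to make the combinations easier to reason about; the activity names (`access_portal`, `manage_scans`, and so on) are hypothetical labels chosen for this example, not identifiers used by Purview.

```python
# Hypothetical activity labels derived from the role descriptions above.
ROLE_ACTIVITIES = {
    "Purview Data Reader": {"access_portal", "read_content"},
    "Purview Data Curator": {
        "access_portal",
        "read_content",
        "edit_asset_information",
        "edit_classification_definitions",
        "edit_glossary_terms",
        "assign_classification_definitions",
        "assign_glossary_terms",
    },
    # No portal access: this role only manages scan-related information.
    "Purview Data Source Administrator": {"manage_scans"},
}

# The account creator is treated as holding both of these roles by default.
CREATOR_DEFAULT_ROLES = (
    "Purview Data Curator",
    "Purview Data Source Administrator",
)


def allowed_activities(roles):
    """Union of the activities granted by every role a principal holds."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_ACTIVITIES[role]
    return allowed


def can(roles, activity):
    """True if any of the given roles grants the activity."""
    return activity in allowed_activities(roles)
```

For example, `can(CREATOR_DEFAULT_ROLES, "manage_scans")` is true, while a principal holding only Purview Data Reader cannot edit glossary terms.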

Azure Purview Data Map

Azure Purview Data Map provides the Data Catalog and Data Insights as a unified experience within Purview Studio.

Azure Purview Data Map is the foundation for data discovery and data governance.

It’s a cloud-native PaaS service that captures metadata about enterprise data in analytics and operational systems, both on-premises and in the cloud.

Data Map extracts metadata, lineage, and classifications from existing data stores.

Purview Data Map is automatically kept up to date with a built-in automated scanning and classification system. Users can configure and use the Purview Data Map through an intuitive UI, and developers can programmatically interact with the Data Map using open-source Apache Atlas 2.0 APIs.
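
As a rough illustration of the programmatic route, the sketch below only builds request URLs for two Apache Atlas v2 REST paths (`/entity/guid/{guid}` and `/glossary/{glossaryGuid}/terms`). The host pattern `https://<account>.purview.azure.com/catalog/api/atlas/v2` is an assumption about how a Purview account exposes these endpoints, and the function names are hypothetical. Check the endpoint shown for your own account, and note that real requests also need an Azure AD bearer token, which is not shown here.

```python
from urllib.parse import quote, urlencode


def atlas_base_url(account_name):
    """Base URL for Atlas v2 calls (assumed host pattern, verify for your account)."""
    return f"https://{account_name}.purview.azure.com/catalog/api/atlas/v2"


def entity_by_guid_url(account_name, guid):
    """URL for fetching a single entity by its GUID."""
    return f"{atlas_base_url(account_name)}/entity/guid/{quote(guid, safe='')}"


def glossary_terms_url(account_name, glossary_guid, limit=10, offset=0):
    """URL for paging through the terms of one glossary."""
    query = urlencode({"limit": limit, "offset": offset})
    path = f"/glossary/{quote(glossary_guid, safe='')}/terms"
    return f"{atlas_base_url(account_name)}{path}?{query}"
```

Keeping URL construction separate from the HTTP call makes it easy to swap in whichever HTTP client and authentication flow your environment uses.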

Sensitive data labeling is supported consistently across database servers, Azure, Microsoft 365, and Power BI.

Scanning and Classification Engine

Azure Purview can scan your data sources. While scanning, built-in and custom classifiers identify the type of data in each source and assign the right classification to it, making it easy to quickly find specific types of data, including sensitive data.
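
To make the idea of pattern-based classification concrete, here is a deliberately simplified sketch. It is not how Purview's classifiers are implemented. It assumes a classifier is just a regular expression plus a minimum share of sampled values that must match, and the labels, patterns, and threshold are all made up for illustration.

```python
import re

# Illustrative classifiers only; real systems use far more robust rules.
CLASSIFIERS = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "U.S. Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, min_match_ratio=0.6):
    """Return the labels whose pattern matches enough of the sampled values.

    values: sampled string values from one column.
    min_match_ratio: assumed threshold; share of non-empty values that must match.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    labels = []
    for label, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= min_match_ratio:
            labels.append(label)
    return labels
```

Requiring a minimum match ratio rather than a single hit keeps one stray value from labeling an entire column as sensitive.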

Azure Purview enables users to easily search for and find data assets using familiar key terms.

  • Scan your Power BI environment and Azure Synapse Analytics workspaces with a few clicks and automatically publish all assets and lineage to the Purview Data Map
  • Connect Azure Purview to Azure Data Factory instances to automatically collect data integration lineage

Azure Purview Data Catalog

The Data Catalog stores metadata about your data sources in a searchable format for end users.

  • Depending on the level of the Catalog, we get the following:
    • Business glossary
    • Lineage visualization
    • Catalog insights
    • Sensitive data identification insights

Business Glossary

  • Purview Data Catalog enables rich data discovery: users can search for business and technical terms and browse the associated technical, business, semantic, and operational metadata
  • Purview supports the following out-of-the-box attributes for any glossary term:
    • Name
    • Definition
    • Data Stewards
    • Data Experts
    • Acronym
    • Synonyms
    • Related Terms
  • We can also add our own custom attributes
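
The attribute list above maps naturally onto a small record type. The sketch below is a local model for illustration only, not the Purview or Atlas glossary schema. The field names mirror the out-of-the-box attributes, `custom_attributes` stands in for user-defined ones, and `find_terms` is a hypothetical helper showing how acronyms and synonyms help users find a term.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Illustrative model of a glossary term and its attributes."""
    name: str
    definition: str
    data_stewards: list = field(default_factory=list)
    data_experts: list = field(default_factory=list)
    acronym: str = ""
    synonyms: list = field(default_factory=list)
    related_terms: list = field(default_factory=list)
    custom_attributes: dict = field(default_factory=dict)


def find_terms(terms, query):
    """Case-insensitive lookup by name fragment, exact acronym, or synonym fragment."""
    q = query.lower()
    return [
        t for t in terms
        if q in t.name.lower()
        or q == t.acronym.lower()
        or any(q in s.lower() for s in t.synonyms)
    ]
```

For instance, a term named "Customer Lifetime Value" with acronym "CLV" and synonym "LTV" would be returned for any of the queries `lifetime`, `clv`, or `ltv`.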

Data Lineage Visualization

Data lineage is important. Purview takes all the pieces and creates a visual that helps users understand lineage better and more quickly, from raw data staged in different platforms, through the transformations performed in data transformation tools, to the data visualizations in your reports.

Purview can do this by capturing all the metadata about the data sources and transformation tools at the highest available degree of granularity.

The Data Catalog, along with information on the data source and interactive data lineage visualization, gives data scientists, engineers, and business analysts the business context they need to drive business intelligence, analytics, artificial intelligence, and machine learning initiatives.
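
Conceptually, lineage is a directed graph from sources to the assets derived from them. The sketch below is a toy model, not Purview's lineage API. The asset names are invented, and the graph is a plain dictionary of "feeds into" edges. It walks the graph backwards to list everything a report depends on, nearest ancestors first.

```python
from collections import deque


def upstream_assets(edges, asset):
    """List every asset that feeds into `asset`, nearest ancestors first.

    edges maps a source asset to the list of assets derived from it.
    """
    # Invert the edges so each asset points at its direct inputs.
    parents = {}
    for source, targets in edges.items():
        for target in targets:
            parents.setdefault(target, []).append(source)

    seen, order, queue = set(), [], deque([asset])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:  # guard against revisiting shared inputs
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Invented example: raw files -> staging tables -> warehouse table -> report.
EXAMPLE_LINEAGE = {
    "raw_sales.csv": ["staging.sales"],
    "raw_customers.csv": ["staging.customers"],
    "staging.sales": ["dw.fact_sales"],
    "staging.customers": ["dw.fact_sales"],
    "dw.fact_sales": ["Sales Report"],
}
```

Calling `upstream_assets(EXAMPLE_LINEAGE, "Sales Report")` traces the report back through the warehouse table and staging tables to the two raw files.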

Purview Data Insights

Using Purview Data Insights, data officers and security officers can get a bird’s-eye view and immediately understand what data is actively scanned, where sensitive data is, and how it moves.

Purview makes this possible through Catalog Insights. Asset Insights gives us a better understanding of what types of assets we have and how they are distributed across our data landscape.

The data governance component gives users a bird’s-eye view of the organization’s data landscape, so they can quickly determine which analytics and reports already exist. It helps stakeholders maintain and use the organization’s data efficiently instead of recreating what already exists. This view provides crucial insights such as data distribution across environments, how data is being moved, and where sensitive data is stored.

Microsoft included some great visualizations to help enhance the Data Catalog to provide a “single pane of glass” view of the data.

With Asset Insights, we can see how assets are distributed by source type, size, and classification.

  • Scan Insights provide administrators with the tools to understand the overall health of the scans that are performed.
  • Glossary Insights provide business users valuable information about what areas are being assigned terms and what areas may need more attention.
  • Classification Insights show you where your classified data lives, enabling security administrators to do their jobs more effectively and efficiently.
  • File Extension Insights show you how many different file extensions are found during scans. This is very handy in identifying data that is not under IT department control.
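
The idea behind File Extension Insights can be illustrated with a small tally. The sketch below is not Purview code. It assumes we already have a list of scanned file paths (the paths in the usage note are invented) and counts how often each extension appears, which is the kind of summary that makes unexpected file types stand out.

```python
from collections import Counter
from pathlib import PurePosixPath


def file_extension_counts(paths):
    """Count files per lower-cased extension; files without one go under '(none)'."""
    return Counter(
        PurePosixPath(p).suffix.lower() or "(none)"
        for p in paths
    )
```

For example, given `["data/sales.CSV", "data/customers.csv", "exports/report.xlsx", "README"]`, the two CSV files are counted together regardless of case, and `README` is reported under `(none)`.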

Data Sources Supported by Azure Purview (at the time of writing)

  • SQL Server on-premises
  • Azure Data Lake Storage Gen1
  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • Azure Data Explorer
  • Azure SQL DB
  • Azure SQL DB Managed Instance
  • Azure Synapse Analytics (formerly SQL DW)
  • Azure Cosmos DB
  • Power BI
  • Teradata
  • ERP sources like SAP S/4HANA and SAP ECC
  • Oracle DB
  • Amazon S3: Azure Purview customers can now scan and classify data residing in Amazon S3 with the help of automated scanning, AI-powered built-in and custom classifiers, and Microsoft Information Protection sensitivity labels

How Azure Purview helps treat data as a strategic asset

  • Data Catalog
  • Business Glossary
  • Unified Roadmap
  • Quality Control
  • Inventory
  • Enable Semantic Search Options
  • Security Compliance
  • Up To Date Information About Data in Motion

Availability

All these features can be accessed via Azure Purview Studio in any of the following web browsers:

  • Microsoft Edge
  • Safari (latest version, Mac only)
  • Chrome (latest version)
  • Firefox (latest version)

No additional applications need to be installed.

Azure Purview is Generally Available as of 09.28.2021. Microsoft announced some incredible adoption figures – 57 billion assets across 2,300 customers! Azure adoption has been massive, and it continues to accelerate.
