Data Quality:

Data Engineering Quality for Developers

Instructor Led | Data Quality | 4 Days | 10.5.1

Course Overview

Gain the skills and knowledge necessary to implement and automate a data quality assurance process using the Informatica Data Quality platform, with profile and mapping execution on a Hadoop Spark environment. In addition to learning how to cleanse, standardize, and enhance data, students will learn to test and troubleshoot their Data Quality solutions. This course is applicable to version 10.5.1.

Objectives

After successfully completing this course, students should be able to:

Describe the overall Data Quality Management Process
Illustrate and discuss Data Engineering Quality Architecture and Runtime Environments
Describe how to run mappings on the Spark and Blaze Engines
Differentiate between the Analyst and Developer Roles and Tools
Navigate the Developer tool and collaborate on projects with team members
Perform Column, Rule, Multi-object, Comparative, and Mid-Stream Profiling
Manage Reference Tables
Develop standardization, cleansing, and parsing Mappings and Mapplets
Identify duplicate records using Classic Data Matching
Create and execute Workflows to populate user inboxes with Exception and Duplicate record tasks
Describe the deployment options that are available when executing Mappings outside of Informatica Developer
Troubleshoot issues that may appear during development

Target Audience

Developer
Data Analyst
Data Scientist
Data Steward

Prerequisites

None

Agenda

Module 1: Course Introduction

Course introduction and objectives
Course agenda

Module 2: Data Quality Projects and Solution Architecture

Data Engineering Quality Architecture and Runtime Environments
Runtime Environments: Spark and Blaze
Dimensions of Data Quality
Data Quality Projects
Reporting and Gating Architectures

Module 3: Data Quality Process Overview

Data Quality Management
Dimensions of Data Quality
Developer and Analyst roles and tools
Data Quality Processes
Profiling, Standardization, Matching, and Consolidation

Module 4: Analyst Collaboration and Reference Table Management

Developer GUI
Repository, Projects, and Folders
Analyst Collaboration
Scorecards
Reference Table Management
Lab: Project Collaboration
Lab: Review the Content project
Lab: Create Reference Tables

Module 5: Working in the Developer Tool

Repository, Projects, and Folders
Data Objects
Mappings and Mapplets
Mapplets and Rules
Content Sets
Transformations
Preview Data and Mapping Execution
Lab: Create a project in the Developer tool
Lab: Connect to an Oracle table
Lab: Import a Flat File data object
Lab: Create a logical data object using core transformations and a prebuilt mapplet

Module 6: Profiling and Creating Mapplets and Rules

Developer Profiling
Profiling Rules
Column Profiling options and Runtime Settings
Profiling Results
Mid-Stream Profiling
Comparative Profiles
The Analyst Tool
Mapplets and Rules in Profiles and Scorecards
Lab: Profile the new Logical Data Object
Lab: Create a rule to measure the accuracy of data in the Company column
Lab: Apply the rule to the profile in the Developer tool
Lab: Update Scorecards in Informatica Analyst
Lab: Measure the format of the Addr4 field

Module 7: Cleansing, Standardizing, and Enhancing Data

Standardize, cleanse, and enhance data
Standardization transformations
Configure Standardization transformations
Standardization mapplets
Lab: Review the Profile_LDO_ALLCUSTOMERS object
Lab: Build a mapping to cleanse, standardize, and enhance data
Lab: Build a standardization mapplet to cleanse a numeric field

Module 8: Parsing Data

Parsing Functions
The Parser Transformation
Token-based and Pattern-based parsing
Parsing Outputs
Probabilistic Parsing
Lab: Parse the contact using the Token Parser
Lab: Use mid-stream profiling to update Reference Tables
Lab: Parse the contact using Pattern-based parsing
Lab: Build a Mapplet to enhance and standardize the contact
Lab: Complete the standardization mapping

Module 9: Matching Data

The Data Quality Matching Process
Grouping and grouping strategies
The Key Generator Transformation
Key creation strategies
Mid-Stream Profiling for group analysis
Matching Data
Matching Strategies
Match Outputs
Lab: Perform Grouping and Classic Matching on Customer data

Module 10: Managing Exception and Duplicate Records

The Exception Management Process
The Exception Management Process running Natively
The Exception Management Process running on Hadoop
Data Quality Workflows
The Exception Transformation
Lab: Identify and gate bad records
Lab: Use the Exception transformation to create and populate the bad record management table
Lab: Use the Exception transformation to create and populate the duplicate record management table

Module 11: Managing and Deploying Workflows

Workflows and workflow tasks
Workflow Objects
Building Workflows and managing Tasks
Human Tasks and Steps
Deploying Workflows and Applications
Monitoring Application Execution
Lab: Build the Workflow to run the Bad Record/Exception Mapping and assign Tasks to users in Informatica Analyst
Lab: Deploy, Execute, and Monitor the workflow
Lab: Build a workflow to run the Duplicate Records Mapping and assign Tasks to users in Informatica Analyst
Lab: Deploy, Execute, and Monitor the workflow
Lab: Verify that the Workflows ran correctly by checking that Tasks were generated in Informatica Analyst