Data Quality:

Data Engineering Quality for Developers

Instructor Led | Data Quality | 4 Days | 10.5.1

Course Overview

Gain the skills and knowledge needed to implement and automate a data quality assurance process using the Informatica Data Quality platform, with profiles and mappings executed on Spark in a Hadoop environment. In addition to learning how to cleanse, standardize, and enhance data, students will learn to test and troubleshoot their Data Quality solutions. This course is applicable to version 10.5.1.

Objectives

After successfully completing this course, students should be able to:

  • Describe the overall Data Quality Management Process
  • Illustrate and discuss Data Engineering Quality Architecture and Runtime Environments
  • Describe how to run mappings on the Spark and Blaze Engines
  • Differentiate between the Analyst and Developer Roles and Tools
  • Navigate the Developer tool and collaborate on projects with team members
  • Perform Column, Rule, Multi-Object, Comparative, and Mid-Stream Profiling
  • Manage Reference Tables
  • Develop standardization, cleansing, and parsing Mappings and Mapplets
  • Identify duplicate records using Classic Data Matching
  • Create and execute Workflows to populate user inboxes with Exception and Duplicate record tasks
  • Describe the deployment options that are available when executing Mappings outside of Informatica Developer
  • Troubleshoot issues that may appear during development

Target Audience

  • Developer
  • Data Analyst
  • Data Scientist
  • Data Steward

Prerequisites

  • None

Agenda
Module 1: Course Introduction
  • Course introduction and objectives
  • Course agenda
Module 2: Data Quality Projects and Solution Architecture
  • Data Engineering Quality Architecture and Runtime Environments
  • Runtime Environments: Spark and Blaze
  • Dimensions of Data Quality
  • Data Quality Projects
  • Reporting and Gating Architectures
Module 3: Data Quality Process Overview
  • Data Quality Management
  • Dimensions of Data Quality
  • Developer and Analyst roles and tools
  • Data Quality Processes
  • Profiling, Standardization, Matching and Consolidation
Module 4: Analyst Collaboration and Reference Table Management
  • Developer GUI
  • Repository, Projects, and Folders
  • Analyst Collaboration
  • Scorecards
  • Reference Table Management
  • Lab: Project Collaboration
  • Lab: Review the Content project
  • Lab: Create Reference Tables
Module 5: Working in the Developer Tool
  • Repository, Projects, and Folders
  • Data Objects
  • Mappings and Mapplets
  • Mapplets and Rules
  • Content Sets
  • Transformations
  • Preview Data and Mapping Execution
  • Lab: Create a project in the Developer tool
  • Lab: Connect to an Oracle table
  • Lab: Import a Flat File data object
  • Lab: Create a logical data object using core transformations and a prebuilt mapplet
Module 6: Profiling and Creating Mapplets and Rules
  • Developer Profiling
  • Profiling Rules
  • Column Profiling options and Runtime Settings
  • Profiling Results
  • Mid-Stream Profiling
  • Comparative Profiles
  • The Analyst Tool
  • Mapplets and Rules in Profiles and Scorecards
  • Lab: Profile the new Logical Data Object
  • Lab: Create a rule to measure the accuracy of data in the Company column
  • Lab: Apply the rule to the profile in the Developer tool
  • Lab: Update Scorecards in Informatica Analyst
  • Lab: Measure the format of the Addr4 field
Module 7: Cleansing, Standardizing, and Enhancing Data
  • Standardize, cleanse, and enhance data
  • Standardization transformations
  • Configure Standardization transformations
  • Standardization mapplets
  • Lab: Review the Profile_LDO_ALLCUSTOMERS object
  • Lab: Build a mapping to cleanse, standardize, and enhance data
  • Lab: Build a standardization mapplet to cleanse a numeric field
Module 8: Parsing Data
  • Parsing Functions
  • The Parser Transformation
  • Token and Pattern based parsing
  • Parsing Outputs
  • Probabilistic Parsing
  • Lab: Parse the contact using the Token Parser
  • Lab: Use mid-stream profiling to update Reference Tables
  • Lab: Parse the contact using Pattern based parsing
  • Lab: Build a Mapplet to enhance and standardize the contact
  • Lab: Complete the standardization mapping
Module 9: Matching Data
  • The Data Quality Matching Process
  • Grouping and grouping strategies
  • The Key Generator Transformation
  • Key creation strategies
  • Mid-Stream Profiling for group analysis
  • Matching Data
  • Matching Strategies
  • Match Outputs
  • Lab: Perform Grouping and Classic Matching on Customer data
Module 10: Managing Exception and Duplicate Records
  • The Exception Management Process
  • The Exception Management Process running Natively
  • The Exception Management Process running on Hadoop
  • Data Quality Workflows
  • The Exception Transformation
  • Lab: Identify and gate bad records
  • Lab: Use the Exception transformation to create and populate the bad record management table
  • Lab: Use the Exception transformation to create and populate the duplicate record management table
Module 11: Managing and Deploying Workflows
  • Workflows and workflow tasks
  • Workflow Objects
  • Building Workflows and managing Tasks
  • Human Tasks and Steps
  • Deploying Workflows and Applications
  • Monitoring Application Execution
  • Lab: Build the Workflow to run the Bad Record/Exception Mapping and assign Tasks to users in Informatica Analyst
  • Lab: Deploy, Execute and Monitor the workflow
  • Lab: Build a workflow to run the Duplicate Records Mapping and assign Tasks to users in Informatica Analyst
  • Lab: Deploy, Execute and Monitor the workflow
  • Lab: Verify that the Workflows ran correctly by checking that Tasks were generated in Informatica Analyst
Module 12: Deploying and Executing Mappings Outside of the Developer Tool
  • Running DQ Mappings in PowerCenter
  • Export Compatibility and Options
  • Schedule mappings, profiles, and scorecards
  • Monitor Application Execution
  • Lab: Running DQ mappings in a standalone environment
Module 13: Importing and Exporting Project Objects
  • Object Import Overview
  • Basic Import Wizard
  • Advanced Import Wizard
  • Dependency Resolution Options
  • Project Export
  • Lab: Import a new project
  • Lab: Import a project using the Advanced Method
  • Lab: Export a project
Module 14: Troubleshooting
  • Sample DQ Errors
  • Troubleshooting Tips
  • Administrator Logs and Monitor Jobs
  • Troubleshooting System and Mapping Issues
  • Workflow configuration issues
  • Lab (optional): Troubleshooting issues



Informatica offers programs that extend learning in convenient and economical packages. Programs include self-paced subscriptions as well as bundled instructor-led training and certifications. Each program is curated around a specific skill set to enable customer success.

365University Data Governance Annual Subscription

Informatica MasterPass Education Subscription

Informatica Learning Library

Data Governance & Privacy Journey Master