Big Data for Developers

Instructor Led | Big Data | 3 Days | Version 10.2.1

Course Overview

This course applies to software version 10.2.1. Learn to leverage Informatica Big Data Management to optimize data warehousing by offloading data processing to Hadoop. Also discover options for enhancing a data warehouse by accessing NoSQL databases and parsing complex files, and gain an understanding of how to reuse existing application logic for big data use cases.



After successfully completing this course, students should be able to:

  • Define “Big Data”
  • Identify and prioritize resource-intensive data warehouse processes to offload to Hadoop
  • Migrate PowerCenter mappings to Big Data Management and ingest data into Hadoop
  • Migrate and ingest data into Hadoop using SQOOP and SQL to Mapping
  • Describe the Informatica on Hadoop architecture
  • Transform data on Hadoop using Informatica polyglot computing
  • Differentiate the capabilities of the Informatica engines on Hadoop including Hive MR/Tez, Blaze, and Spark engines
  • Leverage the Informatica Smart Executor
  • Utilize Informatica and Hadoop monitoring and troubleshooting
  • Parse and transform complex data such as JSON, AVRO, and Parquet
  • Describe how Informatica parses, reads, and writes NoSQL data collections
  • Create and interpret PowerCenter Reuse Reports
  • Import PowerCenter mappings into the Developer tool
  • Modify imported mappings to make them Hadoop-ready
  • Describe the guidelines and limitations of importing from PowerCenter

Target Audience

  • Architect
  • Developer



Module 1: Informatica Big Data Management Overview

  • Big Data concepts
  • Big Data Management features
  • Benefits of Big Data Management
  • Big Data Management architecture
  • Big Data Management developer tasks

Module 2: Big Data Basics

  • Application Services of BDM 10.2.1
    • Metadata Access Service
    • Mass Ingestion Service
  • Hadoop file systems
  • Hive
  • Mass Ingestion
  • Mass Ingestion architecture
  • Mass Ingestion process
  • Mass Ingestion tool user interface
  • Mass ingestion to HDFS
  • Mass ingestion to Hive
  • Integrate with relational databases using SQOOP
  • SQOOP architecture
  • SQOOP optimizations
  • SQOOP for Teradata
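
One Sqoop optimization listed above is parallelism: Sqoop splits the min/max range of the `--split-by` column into one slice per mapper. A minimal Python sketch of that range-splitting idea (an illustration of the concept, not Sqoop's actual code):

```python
def split_range(lo, hi, num_mappers):
    """Split the inclusive key range [lo, hi] into roughly equal slices,
    mirroring how Sqoop derives per-mapper WHERE clauses from --split-by."""
    size = (hi - lo + 1) / num_mappers
    splits = []
    for i in range(num_mappers):
        start = lo + round(i * size)
        # The last mapper always takes the range up to the maximum key.
        end = lo + round((i + 1) * size) - 1 if i < num_mappers - 1 else hi
        splits.append((start, end))
    return splits

print(split_range(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each slice would become one mapper's query predicate, e.g. `WHERE id >= 26 AND id <= 50`.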

Module 3: Big Data Engine Strategy

  • BDM engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Transformations in the Hadoop environment
    • Expression Transformation
    • Filter Transformation
    • Lookup Transformation
    • Python Transformation
    • Router Transformation
    • Update Strategy Transformation
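
The row-level transformations listed above (Expression, Filter, Router) each map naturally onto simple functions over records. A minimal Python sketch of the concepts, not Informatica's API; field names and the 1.2 tax factor are illustrative:

```python
rows = [
    {"id": 1, "amount": 120.0, "region": "EU"},
    {"id": 2, "amount": 45.0,  "region": "US"},
    {"id": 3, "amount": 300.0, "region": "EU"},
]

# Expression transformation: derive a new field per row.
def expression(row):
    return {**row, "amount_with_tax": round(row["amount"] * 1.2, 2)}

# Filter transformation: keep only rows matching a predicate.
def keep(row):
    return row["amount"] >= 100.0

# Router transformation: split rows into named output groups.
def route(rows):
    groups = {"EU": [], "US": [], "default": []}
    for row in rows:
        groups.get(row["region"], groups["default"]).append(row)
    return groups

transformed = [expression(r) for r in rows if keep(r)]
print([r["id"] for r in route(transformed)["EU"]])  # [1, 3]
```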

Module 4: Big Data Development Process

  • Initial load
  • Dynamic mapping
  • Stateful computing and windowing
  • Data science integration using Python
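
Stateful computing and windowing, listed above, come down to carrying state across an ordered stream of rows. A stdlib-only sketch of a sliding-window average as an illustration of the idea, not BDM's window functions:

```python
from collections import deque

def sliding_average(values, window_size):
    """Yield the mean of the last `window_size` values seen so far."""
    window = deque(maxlen=window_size)  # state carried across rows
    for v in values:
        window.append(v)
        yield sum(window) / len(window)

readings = [10, 20, 30, 40, 50]
print(list(sliding_average(readings, 3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```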

Module 5: Complex File Processing

  • Big data file formats – Avro, Parquet, JSON
  • Complex file data objects
  • Complex file data types – Structs, Arrays, Maps
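
The complex types listed above surface in JSON as nested objects (structs), lists (arrays), and key/value objects (maps). A minimal stdlib sketch of navigating such a record (JSON only; Avro and Parquet require third-party readers, and the record shown is invented for illustration):

```python
import json

# "customer" is a struct, "orders" an array of structs,
# "attributes" a map of string keys to string values.
raw = '''
{
  "customer": {"id": 42, "name": "Acme"},
  "orders": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
  "attributes": {"tier": "gold", "region": "EU"}
}
'''

record = json.loads(raw)
customer_name = record["customer"]["name"]           # struct field access
total_qty = sum(o["qty"] for o in record["orders"])  # iterate the array
tier = record["attributes"].get("tier")              # map lookup
print(customer_name, total_qty, tier)  # Acme 3 gold
```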

Module 6: Monitoring Logs and Troubleshooting

  • Spark Monitoring
  • Blaze Monitoring
  • Viewing logs
  • Troubleshooting

Module 7: Performance Tuning and Best Practices

  • Native vs. Hadoop mode of execution
  • Tune performance of Spark jobs
  • Tune performance of Blaze jobs
  • Best practices when working with BDM


PowerCenter Migration to BDM

Module 12: PowerCenter Reuse Story

  • Existing DI vs Big Data
  • What is expected to change? 
  • What shouldn’t change?

Module 13: PowerCenter Classic Reuse Reports

  • PC Reuse Report Utility 
  • PC Reuse Report Formats 
  • Execute the command to generate PowerCenter Reuse Report 
  • Interpret PowerCenter Reuse Reports 
  • PC Reuse Report – Levels

Module 14: Mapping Validation

  • Review the mapping errors in the PC Reuse Report
  • Modify the imported mappings to make them Hadoop-ready
  • Validate the mappings in the Hadoop environment
