Big Data for Developers

Instructor Led | Big Data | 3 Days | Version 10.2.1

Course Overview

This course applies to software version 10.2.1. Learn to accelerate big data integration through mass ingestion, transformations, complex file processing, and data science integration using Python. Optimize big data system performance through monitoring, troubleshooting, and best practices, and learn how to reuse existing application logic for big data use cases.

Objectives

After successfully completing this course, students should be able to:

  • Mass ingest data to Hive and HDFS
  • Integrate with relational databases using Sqoop
  • Perform transformations across various engines
  • Perform initial load
  • Perform stateful computing and windowing
  • Process complex files
  • Monitor logs and troubleshoot
  • Tune the performance of Spark and Blaze jobs
  • Create and interpret PowerCenter Reuse Reports
  • Import PowerCenter mappings into the Developer tool
  • Modify the imported mappings to make them Hadoop-ready
  • Describe the guidelines and limitations of importing from PowerCenter

Target Audience

  • Architect
  • Developer

Prerequisites

Agenda

Big Data for Developers

Module 01: Informatica Big Data Management Overview

  • Big Data concepts
  • Big Data Management features
  • Benefits of Big Data Management
  • Big Data Management architecture
  • Big Data Management developer tasks

Module 02: Big Data Basics

  • Application Services of BDM 10.2.1
    • Metadata Access Service
    • Mass Ingestion Service
  • Hadoop file systems
  • Hive
  • Mass Ingestion
  • Mass Ingestion architecture
  • Mass Ingestion process
  • Mass Ingestion tool user interface
  • Mass ingestion to HDFS
  • Mass ingestion to Hive
  • Integrate with relational databases using Sqoop
  • Sqoop architecture
  • Sqoop optimizations
  • Sqoop for Teradata

Module 03: Big Data Engine Strategy

  • BDM engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Transformations in the Hadoop environment
    • Expression Transformation
    • Filter Transformation
    • Lookup Transformation
    • Python Transformation
    • Router Transformation
    • Update Strategy Transformation

Module 04: Big Data Development Process

  • Initial load
  • Dynamic mapping
  • Stateful computing and windowing
  • Data science integration using Python

Module 05: Complex File Processing

  • Big data file formats – Avro, Parquet, JSON
  • Complex file data objects
  • Complex file data types – Structs, Arrays, Maps

Module 06: Monitoring Logs and Troubleshooting

  • Spark Monitoring
  • Blaze Monitoring
  • Viewing logs
  • Troubleshooting

Module 07: Performance Tuning and Best Practices

  • Native vs. Hadoop mode of execution
  • Tune performance of Spark jobs
  • Tune performance of Blaze jobs
  • Best practices for working with BDM

PowerCenter Migration to BDM

Module 08: PowerCenter Reuse Story

  • Existing DI vs Big Data
  • What is expected to change? 
  • What shouldn’t change?

Module 09: PowerCenter Classic Reuse Reports

  • PC Reuse Report Utility 
  • PC Reuse Report Formats 
  • Execute the command to generate a PowerCenter Reuse Report
  • Interpret PowerCenter Reuse Reports 
  • PC Reuse Report – Levels

Module 10: Mapping Validation

  • Review the mapping errors in the PC Reuse Report
  • Modify the imported mappings to make them Hadoop-ready
  • Validate the mapping in the Hadoop environment
