Big Data for Developers

Instructor Led | Big Data | 3 Days | Version 10.2.2

Course Overview

This course applies to software version 10.2.2. Learn to accelerate Big Data integration through mass ingestion, incremental loads, transformations, complex file processing, and data science integration using Python. Optimize Big Data system performance through monitoring, troubleshooting, and best practices, and learn how to reuse existing PowerCenter application logic for big data use cases.

Objectives

After successfully completing this course, students should be able to:

  • Mass ingest data to Hive and HDFS
  • Perform incremental loads in mass ingestion
  • Integrate with relational databases using Sqoop
  • Perform transformations across various engines
  • Execute a mapping using JDBC in Spark mode
  • Perform stateful computing and windowing
  • Process complex files
  • Execute dynamic mappings
  • Monitor logs and troubleshoot
  • Monitor logs using the REST Operations Hub
  • Tune the performance of Spark jobs
  • Create and interpret PowerCenter Reuse Reports
  • Import PowerCenter mappings into the Developer tool
  • Modify imported mappings to make them Hadoop-ready
  • Apply the guidelines and limitations for importing from PowerCenter

Target Audience

  • Developer

Prerequisites

Agenda

Module 1: Informatica Big Data Management Overview

  • Big Data concepts
  • Big Data Management features
  • Benefits of Big Data Management
  • Big Data Management architecture
  • Big Data Management developer tasks

Module 2: Ingestion and Extraction

  • Integrating BDM with a Hadoop cluster
  • Application services of BDM 10.2.2
  • Hadoop file systems
  • Ingest data to HDFS and Hive using Sqoop
  • Mass ingestion to HDFS and Hive – initial load
  • Mass ingestion to HDFS and Hive – incremental load
  • Lab: Ingesting data from Oracle (Sqoop) to HDFS
  • Lab: Ingesting data from Oracle (Sqoop) to Hive
  • Lab: Creating mapping specifications using the Mass Ingestion service

Module 3: Big Data Engine Strategy

  • BDM engine strategy
  • Hive engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Basic BDM transformations
  • Lab: Executing a mapping using different BDM transformations in Spark mode

Module 4: Big Data Development Process

  • Advanced transformations in BDM – Python and Update Strategy
  • Hive ACID use case
  • Stateful computing and windowing
  • Lab: Python transformation
  • Lab: Update strategy and JDBC support
  • Lab: Performing Hive upserts
  • Lab: Windowing function LAG
  • Lab: Windowing function LEAD

Module 5: Complex Data Processing

  • Big data file formats – Avro, Parquet, JSON
  • Complex file data types – structs, arrays, maps
  • Dynamic mapping
  • Dynamic expression support
  • Lab: Convert a flat file data object to an Avro file
  • Lab: Convert a JSON file to Parquet format
  • Lab: Use the complex data types arrays, structs, and maps in a mapping
  • Lab: Build dynamic mappings using dynamic ports
  • Lab: Build dynamic mappings using input rules

Module 6: Monitoring Logs and Troubleshooting

  • REST Operations Hub
  • Spark monitoring
  • Blaze monitoring
  • Viewing logs
  • Troubleshooting
  • Lab: Monitor mappings using the REST Operations Hub
  • Lab: Troubleshooting – Avro Unsupported Error
  • Lab: Troubleshooting – Decimal High Precision Error
  • Lab: Troubleshooting – Dynamic Rules Error
  • Lab: Troubleshooting – Python Decimal Error

Module 7: Performance Tuning and Best Practices

  • Native vs. Hadoop mode of execution
  • Tune performance of Spark jobs
  • Tune performance of Blaze jobs
  • Best practices for working with BDM

Module 8: Introduction to PowerCenter Reuse

  • Overview of Data Integration Solutions
  • Transitioning from PowerCenter to Big Data Ecosystem
  • Steps for migrating to BDM
  • Native vs. Hadoop mode
  • Transformations in Hadoop mode

Module 9: PowerCenter Classic Reuse Report

  • Export PC mappings through the CLI
  • PC Reuse Report formats
  • Considerations in using the PC Reuse Utility
  • Lab: Assess the PowerCenter mapping to execute on BDM

Module 10: BDM Optimization

  • PowerCenter Classic Import
  • Mapping Validation
  • Lab: Import the mapping from PowerCenter 10.2.0 to BDM
  • Lab: Mapping validation