Big Data for Developers

Instructor Led | Big Data | 2 Days | Version 10.2.1

Course Overview

This course applies to users of Big Data version 10.2.1 and later. Learn to accelerate Big Data integration through mass ingestion, transformations, and processing of complex files, and to optimize Big Data system performance through monitoring, troubleshooting, and best practices.

Objectives

After successfully completing this course, students should be able to:

  • Mass ingest data to Hive and HDFS
  • Integrate with relational databases using SQOOP
  • Perform transformations across various engines
  • Perform initial load
  • Perform stateful computing and windowing
  • Process complex files
  • Monitor logs and troubleshoot
  • Tune the performance of Spark and Blaze jobs

Target Audience

  • Architect
  • Developer

Prerequisites

Agenda

Module 1: Informatica Big Data Management Overview

  • Big Data concepts
  • Big Data Management features
  • Benefits of Big Data Management
  • Big Data Management architecture
  • Big Data Management developer tasks

Module 2: Ingestion and Extraction

  • Application Services of BDM 10.2.1
    • Metadata Access Service
    • Mass Ingestion Service
  • Hadoop file systems
  • Hive
  • Mass Ingestion
  • Mass Ingestion architecture
  • Mass Ingestion process
  • Mass Ingestion tool user interface
  • Mass ingestion to HDFS
  • Mass ingestion to Hive
  • Integrate with relational databases using SQOOP
  • SQOOP architecture
  • SQOOP optimizations
  • SQOOP for Teradata

Module 3: Big Data Engine Strategy

  • BDM engine strategy
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Transformations in the Hadoop environment
    • Expression Transformation
    • Filter Transformation
    • Lookup Transformation
    • Python Transformation
    • Router Transformation
    • Update Strategy Transformation

Module 4: Big Data Development Process

  • Initial load
  • Dynamic mapping
  • Stateful computing and windowing
  • Data science integration using Python

Module 5: Complex File Processing

  • Big data file formats – Avro, Parquet, JSON
  • Complex file data objects
  • Complex file data types – Structs, Arrays, Maps

Module 6: Monitoring and Troubleshooting

  • Spark Monitoring
  • Blaze Monitoring
  • Viewing logs
  • Troubleshooting

Module 7: Performance Tuning and Best Practices

  • Native vs. Hadoop mode of execution
  • Tune performance of Spark jobs
  • Tune performance of Blaze jobs
  • Best practices for working with BDM
 