Data Engineering Integration for Developers

Instructor Led | Data Engineering | 3 Days | Version 10.5

Course Overview

This course applies to software version 10.5. Learn to accelerate Data Engineering Integration through mass ingestion, incremental loads, transformations, complex file processing, dynamic mappings, and data science integration using Python. Optimize Data Engineering system performance through monitoring, troubleshooting, and best practices, and learn how to reuse application logic for Data Engineering use cases.

Objectives

After successfully completing this course, students should be able to:

Mass ingest data to Hive and HDFS
Perform incremental loads in Mass Ingestion
Perform initial and incremental loads
Integrate with relational databases using SQOOP
Perform transformations across various engines
Execute a mapping using JDBC in Spark mode
Perform stateful computing and windowing
Process complex files
Parse hierarchical data on Spark engine
Run profiles and choose sampling options on Spark engine
Execute Dynamic Mappings
Create Audits on Mappings
Monitor logs using REST Operations Hub
Monitor logs using Log Aggregation and troubleshoot
Run mappings in Databricks environment
Create mappings to access Delta Lake tables
Tune the performance of Spark and Databricks jobs

Target Audience

Developer

Prerequisites

Informatica Developer Tool for Big Data Developers (Instructor Led OR onDemand)

Agenda
Module 1: Informatica Data Engineering Management Overview
Data Engineering concepts
Data Engineering Management features
Benefits of Data Engineering Management
Data Engineering Management architecture
Data Engineering Management developer tasks
Data Engineering Integration 10.4 new features

Module 2: Ingestion and Extraction in Hadoop
Integrating DEI with Hadoop cluster
Hadoop file systems
Data Ingestion to HDFS and Hive using SQOOP
Mass Ingestion to HDFS and Hive - Initial load
Mass Ingestion to HDFS and Hive - Incremental load
Lab: Configure SQOOP for Processing Data Between Oracle (SQOOP) to HDFS
Lab: Configure SQOOP for processing data between an Oracle database and Hive
Lab: Creating Mapping Specifications using Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy
Data Engineering Integration engine strategy
Hive Engine architecture
MapReduce
Tez
Spark architecture
Blaze architecture
Lab: Executing a mapping in Spark mode
Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process
Advanced Transformations in Data Engineering Integration Python and Update Strategy
Hive ACID Use Case
Stateful Computing and Windowing
Lab: Creating a Reusable Python Transformation
Lab: Creating an Active Python Transformation
Lab: Performing Hive Upserts
Lab: Using Windowing Function LEAD
Lab: Using Windowing Function LAG
Lab: Creating a Macro Transformation

Module 5: Complex File Processing
Data Engineering file formats - Avro, Parquet, JSON
Complex file data types - Structs, Arrays, Maps
Complex Configuration, Operators and Functions
Lab: Converting Flat File data object to an Avro file
Lab: Using complex data types - Arrays, Structs, and Maps in a mapping

Module 6: Hierarchical Data Processing
Hierarchical Data Processing
Flatten Hierarchical Data
Dynamic Flattening with Schema Changes
Hierarchical Data Processing with Schema Changes
Complex Configuration, Operators and Functions
Dynamic Ports
Dynamic Input Rules
Lab: Flattening a complex port in a Mapping
Lab: Building dynamic mappings using dynamic ports
Lab: Building dynamic mappings using input rules
Lab: Performing Dynamic Flattening of complex ports
Lab: Parsing Hierarchical Data on the Spark Engine

Module 7: Mapping Optimization and Performance Tuning
Validation Environments
Execution Environment
Mapping Optimization
Mapping Recommendations and Insight
Scheduling, Queuing, and Node Labeling
Mapping Audits
Lab: Implementing Recommendation
Lab: Implementing Insight
Lab: Implementing Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop
Hadoop Environment Logs
Spark Engine Monitoring
Blaze Engine Monitoring
REST Operations Hub
Log Aggregator
Troubleshooting
Lab: Monitoring Mappings using REST Operations Hub
Lab: Viewing and analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model
Intelligent Structure Discovery Overview
Intelligent Structure Model
Lab: Use an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview
Databricks overview
Steps to configure Databricks
Databricks clusters
Notebooks, Jobs, and Data
Delta Lakes

Module 11: Databricks Integration
Databricks Integration
Components of the Informatica and the Databricks environments
Run-time process on the Databricks Spark Engine
Databricks Integration Task Flow
Pre-requisites for Databricks integration
Cluster Workflows
Demo: Set up Databricks connection
Demo: Run a mapping with Databricks Spark engine

