Data Engineering Integration for Developers

onDemand | Data Engineering | Self-Paced | Version 10.5

Course Overview

Learn to accelerate Data Engineering Integration through mass ingestion, incremental loads, transformations, complex file processing, dynamic mappings, and data science integration using Python. Optimize Data Engineering system performance through monitoring, troubleshooting, and best practices, and learn how to reuse application logic for Data Engineering use cases.
This course applies to software version 10.5.

Objectives

After successfully completing this course, students should be able to:

Mass ingest data to Hive and HDFS
Perform incremental loads in Mass Ingestion
Perform initial and incremental loads
Integrate with relational databases using SQOOP
Perform transformations across various engines
Execute a mapping using JDBC in Spark mode
Perform stateful computing and windowing
Process complex files
Parse hierarchical data on the Spark engine
Run profiles and choose sampling options on the Spark engine
Execute dynamic mappings
Create audits on mappings
Monitor logs using REST Operations Hub
Monitor logs and troubleshoot using Log Aggregator
Run mappings in Databricks environment
Create mappings to access Delta Lake tables
Tune the performance of Spark and Databricks jobs

Target Audience

Developer

Prerequisites

Informatica Developer Tool for Big Data Developers (Instructor Led or onDemand)

Agenda

Module 1: Informatica Data Engineering Management Overview

Data Engineering concepts
Data Engineering Management features
Benefits of Data Engineering Management
Data Engineering Management architecture
Data Engineering Management developer tasks
Data Engineering Integration 10.5 new features

Module 2: Ingestion and Extraction

Integrating Data Engineering Integration with a Hadoop cluster
Application Services of Data Engineering Integration 10.4.0
Hadoop file systems
Ingest data to HDFS and Hive using SQOOP
Mass Ingestion to HDFS and Hive – Initial load
Mass Ingestion to HDFS and Hive – Incremental load
Lab: Ingesting data from Oracle (SQOOP) to HDFS
Lab: Ingesting data from Oracle (SQOOP) to Hive
Lab: Creating Mapping Specifications using a Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy

DEI engine strategy
Hive Engine architecture
MapReduce
Tez
Spark architecture
Blaze architecture
Lab: Executing a mapping in Spark mode
Lab: Connecting to a Deployed Application

Module 4: Data Engineering Development Process

Advanced Transformations in DEI – Python, Update Strategy, and Macro
Hive ACID Use Case
Stateful Computing and Windowing
Lab: Creating a Reusable Python Transformation
Lab: Creating an Active Python Transformation
Lab: Performing Hive Upserts
Lab: Using Windowing Function LEAD
Lab: Using Windowing Function LAG
Lab: Creating a Macro Transformation

Module 5: Complex File Processing

Data Engineering file formats – Avro, Parquet, JSON
Complex file data types – Structs, Arrays, Maps
Complex Configuration, Operators and Functions
Lab: Converting Flat File data object to an Avro file
Lab: Using complex data types - Arrays, Structs, and Maps in a mapping

Module 6: Hierarchical Data Processing

Hierarchical Data Processing
Flatten Hierarchical Data
Dynamic Flattening with Schema Changes
Hierarchical Data Processing with Schema Changes
Complex Configuration, Operators and Functions
Dynamic Ports
Dynamic Input Rules
Lab: Flattening a complex port in a Mapping
Lab: Building dynamic mappings using dynamic ports
Lab: Building dynamic mappings using input rules
Lab: Performing Dynamic Flattening of complex ports
Lab: Parsing Hierarchical Data on the Spark Engine

Module 7: Mapping Optimization and Performance Tuning

Validation Environments
Execution Environment
Mapping Optimization
Mapping Recommendations and Insight
Scheduling, Queuing, and Node Labeling
Mapping Audits
Lab: Implementing Recommendations
Lab: Implementing Insight
Lab: Implementing Mapping Audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

Hadoop Environment Logs
Spark Engine Monitoring
Blaze Engine Monitoring
REST Operations Hub
Log Aggregator
Troubleshooting
Lab: Monitoring Mappings using REST Operations Hub
Lab: Viewing and analyzing logs using Log Aggregator

Module 9: Intelligent Structure Model

Intelligent Structure Discovery Overview
Intelligent Structure Model
Lab: Using an Intelligent Structure Model in a Mapping

Module 10: Databricks Overview

Databricks overview
Steps to configure Databricks
Databricks clusters
Notebooks, Jobs, and Data
Delta Lake

Module 11: Databricks Integration

Databricks Integration
Components of the Informatica and the Databricks environments
Run-time process on the Databricks Spark Engine
Databricks Integration Task Flow
Pre-requisites for Databricks integration
Cluster Workflows

Power User

Axon for Community Users (Instructor Led or onDemand)

Axon Content Curation (Instructor Led)

Axon for Power Users (Instructor Led)

Axon Data Governance (Professional Certification)

Informatica offers programs that extend learning in convenient and economical packages. Programs include self-paced subscriptions as well as bundled instructor-led training and certifications. Each program is curated around a specific skill set to enable customer success.

365University Data Governance Annual Subscription

Informatica MasterPass Education Subscription

Informatica Learning Library

Data Governance & Privacy Journey Master
