Course Overview
Learn to accelerate Data Engineering Integration through mass ingestion, incremental loads, transformations, complex file processing, dynamic mappings, and data science integration with Python. Optimize Data Engineering system performance through monitoring, troubleshooting, and best practices while learning how to reuse application logic for Data Engineering use cases.
This course is applicable to software version 10.5.
Objectives
After successfully completing this course, students should be able to:
- Mass ingest data to Hive and HDFS
- Perform incremental loads in Mass Ingestion
- Perform initial and incremental loads
- Integrate with relational databases using Sqoop
- Perform transformations across various engines
- Execute a mapping using JDBC in Spark mode
- Perform stateful computing and windowing
- Process complex files
- Parse hierarchical data on the Spark engine
- Run profiles and choose sampling options on the Spark engine
- Execute dynamic mappings
- Create audits on mappings
- Monitor logs using the REST Operations Hub
- Monitor and troubleshoot logs using Log Aggregation
- Run mappings in the Databricks environment
- Create mappings to access Delta Lake tables
- Tune the performance of Spark and Databricks jobs
Target Audience
- Developer
Prerequisites
- Informatica Developer Tool for Big Data Developers (Instructor-led or onDemand)