Data Engineering Integration: Administration

onDemand | Data Engineering | Self-Paced | Version 10.5

Course Overview

Set up a live DEI environment by performing administrative tasks such as Hadoop integration, Databricks integration, security setup, Data Engineering recovery, monitoring, and performance tuning. Learn to integrate the Informatica domain with the Hadoop and Databricks ecosystems, using Hadoop's fast processing capability and Databricks' cloud analytics platform to process very large data sets. Applicable to users of software version 10.5.

Objectives

After successfully completing this course, students should be able to:

  • Describe DEI Architecture
  • List DEI Components
  • List the steps to enable SAML on the Domain
  • Create Cluster Configuration Object for Hadoop integration
  • Set up Informatica Security that includes different Authentication and Authorization mechanisms
  • Tune the performance of the system
  • Monitor, view, and troubleshoot DEI logs

Target Audience

  • Administrators

Prerequisites

  • None

Agenda

Module 1: Data Engineering Integration (DEI) Overview
  • Data Engineering and the role of DEI in the Big Data ecosystem
  • DEI Components
  • DEI architecture
  • Roles and responsibilities of Informatica DEI Administrator
  • DEI engines: Blaze, Spark, and Databricks
  • DEI 10.5 features
Module 2: SAML Authentication
  • SAML overview
  • SAML authentication in a domain
  • Steps to enable SAML on an existing Informatica domain
Module 3: Hadoop Integration
  • Cluster Integration overview
  • Data Engineering Integration Component Architecture
  • Prerequisites for Hadoop integration
  • HDP integration tasks
  • Create a Cluster Configuration
  • Integration with Hadoop
  • Lab: Create Cluster Configuration Object
  • Lab: Explore Cluster Configuration Views
  • Lab: Cluster Configuration Privileges and Permissions
Module 4: Security Overview
  • DEI security
  • Security aspects
  • Authentication overview
  • Authorization overview
Module 5: Kerberos Authentication and Ranger Authorization
  • Kerberos Authentication
  • Ranger Authorization
  • Pre-steps to run mappings in a Kerberos-Enabled Hadoop Environment
  • Run mappings on a cluster with Kerberos authentication and Ranger authorization
  • Lab: Execute Pre-steps for Running Mappings in a Kerberos-Enabled Hadoop Environment
  • Lab: Run Mappings in a Kerberos-Enabled Hadoop Environment
Module 6: Operating System Profiles
  • Operating System profiles for Data Integration Service
  • Operating System profile components
  • Configure system permissions for the Operating System profile users
  • Enable the Data Integration Service to use Operating System profiles
  • Execute a mapping using OS profiles
  • Lab: Execute a mapping using OS profiles
Module 7: HDFS and Fine-Grained Authorization
  • Authorization
  • HDFS permissions
  • Fine-Grained authorization
  • Lab: Access Directories with HDFS Permissions
  • Lab: Run a Mapping with HDFS Permissions
  • Lab: Restrict Ranger Permissions for Hive Tables and Columns
  • Lab: Run a Mapping with Fine-Grained Authorization
Module 8: Data Engineering Recovery
  • DIS processing overview
  • DIS Queuing
  • Execution Pools
  • Data Engineering recovery
  • Monitor recovered jobs
  • Lab: Recover DIS and execute a Mapping using Data Engineering Recovery
Module 9: DEI Performance Tuning
  • DEI Deployment types
  • Sizing recommendations
  • Hadoop cluster hardware tuning
  • Tune Spark performance
  • infacmd autotune command
  • Lab: Tune DIS and MRS using the infacmd autotune command
Module 10: Monitoring and Troubleshooting
  • Hadoop Environment Logs
  • Spark Engine Monitoring
  • Blaze Engine Monitoring
  • Cloud File Management Utility
  • Log Aggregation
  • Log Packer
  • File Watcher
  • Customer pain points and solutions
Module 11: Databricks Overview
  • Databricks overview
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, Jobs, and Data
  • Delta Lake
  • Sequence generator for Databricks
  • Databricks warm pool
Module 12: Databricks Integration
  • Databricks Integration
  • Components of the Informatica and the Databricks environments
  • Run-time process on the Databricks Spark Engine
  • Databricks Integration Task Flow
  • Prerequisites for Databricks integration


