Apache Iceberg Fundamentals Training Course
Apache Iceberg is an open-source table format for large-scale data sets that brings the reliability and simplicity of SQL tables to big data. It was designed to solve the challenges of managing big data in data lakes, which often involve handling complex schemas, large files, and diverse data sources.
This instructor-led, live training (online or onsite) is aimed at beginner-level data professionals who wish to acquire the knowledge and skills necessary to effectively utilize Apache Iceberg for managing large-scale datasets, ensuring data integrity, and optimizing data processing workflows.
By the end of this training, participants will be able to:
- Gain a thorough understanding of Apache Iceberg's architecture, features, and benefits.
- Learn about table formats, partitioning, schema evolution, and time travel capabilities.
- Install and configure Apache Iceberg in different environments.
- Create, manage, and manipulate Iceberg tables.
- Understand the process of migrating data from other table formats to Iceberg.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Course Outline
Introduction to Apache Iceberg
- Overview of Apache Iceberg
- Importance and use cases in modern data architecture
- Key features and benefits
Core Concepts
- Iceberg table format and architecture
- Comparison with other table formats
- Partitioning and schema evolution
- Time travel and data versioning (see the snapshot example below)
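Time travel rests on Iceberg's snapshot-based versioning, which you can inspect directly. The sketch below is a minimal, illustrative query of a table's snapshot history through Spark SQL; it assumes a SparkSession already wired to an Iceberg catalog named `local` and an existing table `local.db.events` (both names are made up here; catalog configuration is covered in the next module).

```python
# Minimal sketch: inspecting Iceberg's snapshot history via Spark SQL.
# Assumes a SparkSession already configured with an Iceberg catalog
# named "local" and an existing table local.db.events -- both names
# are illustrative (catalog setup is shown in the next module).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every commit to an Iceberg table produces a snapshot; the "snapshots"
# metadata table lists them, which is what makes time travel possible.
spark.sql("""
    SELECT snapshot_id, committed_at, operation
    FROM local.db.events.snapshots
    ORDER BY committed_at
""").show(truncate=False)
```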
Setting Up Apache Iceberg
- Installation and configuration
- Integrating Iceberg with various data processing engines
- Setting up an Iceberg environment on a local machine (see the setup sketch below)
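As one concrete way to run Iceberg locally, the sketch below configures Spark with a Hadoop-type Iceberg catalog backed by a directory on disk. The catalog name `local`, the warehouse path, and the package version are illustrative assumptions; match the iceberg-spark-runtime coordinates to your Spark version.

```python
# Local setup sketch: a Hadoop-type Iceberg catalog over a local
# directory. Catalog name, warehouse path, and version coordinates
# are illustrative -- adjust them to your environment.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-local")
    # Pull the Iceberg runtime matching your Spark/Scala version (assumed here).
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    # Iceberg's SQL extensions enable CALL procedures, ALTER TABLE ... ADD PARTITION FIELD, etc.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)
```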
Basic Operations
- Creating and managing Iceberg tables
- Writing to and reading from Iceberg tables
- Basic CRUD operations (see the example below)
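To give a flavor of these operations, the sketch below creates a partitioned Iceberg table and runs simple insert, update, delete, and read statements through Spark SQL. Table and column names are illustrative, and it assumes the session configured in the setup sketch above.

```python
# CRUD sketch against an Iceberg table (illustrative names; assumes the
# SparkSession from the setup sketch, including Iceberg's SQL extensions).
spark.sql("""
    CREATE TABLE IF NOT EXISTS local.db.events (
        id    BIGINT,
        level STRING,
        ts    TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(ts))  -- a hidden-partitioning transform on ts
""")

spark.sql("INSERT INTO local.db.events VALUES (1, 'INFO', current_timestamp())")

# Row-level updates and deletes work because Iceberg tracks data files in metadata.
spark.sql("UPDATE local.db.events SET level = 'WARN' WHERE id = 1")
spark.sql("DELETE FROM local.db.events WHERE id = 1")

spark.sql("SELECT * FROM local.db.events").show()
```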
Data Migration and Integration
- Migrating data from Hive and other systems to Iceberg
- Integration with BI tools
- Migrating a sample dataset to Iceberg (see the migration sketch below)
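One migration route exercised in this module is Iceberg's built-in stored procedures. The sketch below uses the `snapshot` procedure to create an Iceberg table that references an existing Hive table's files without modifying the source; table names are illustrative, and it assumes `spark_catalog` is configured as Iceberg's `SparkSessionCatalog` over the Hive metastore.

```python
# Migration sketch using Iceberg stored procedures (illustrative names;
# assumes spark_catalog is org.apache.iceberg.spark.SparkSessionCatalog
# wrapping the Hive metastore, and the SQL extensions are enabled).

# snapshot: builds an Iceberg table that points at the Hive table's
# existing data files, leaving the source untouched -- useful for testing.
spark.sql("""
    CALL spark_catalog.system.snapshot(
        source_table => 'db.hive_events',
        table        => 'db.hive_events_iceberg'
    )
""")

# migrate: converts the Hive table to Iceberg in place. Commented out
# because it is a one-way operation on the source table.
# spark.sql("CALL spark_catalog.system.migrate('db.hive_events')")
```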
Optimizing Performance
- Performance tuning techniques
- Optimizing queries and data scans
- Hands-on performance optimization in Iceberg (see the maintenance sketch below)
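As a sample of the maintenance work behind these techniques, the sketch below compacts small data files and expires old snapshots using Iceberg's standard stored procedures; the table name and retention timestamp are illustrative.

```python
# Maintenance sketch for performance tuning (illustrative table name and
# retention date; assumes the Iceberg catalog "local" from the setup sketch).

# Compact small data files into larger ones so scans open fewer files.
spark.sql("CALL local.system.rewrite_data_files(table => 'db.events')")

# Expire old snapshots to keep metadata lean; note this also limits how
# far back time travel can reach.
spark.sql("""
    CALL local.system.expire_snapshots(
        table      => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```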
Overview of Advanced Features
- Partition evolution and hidden partitioning
- Table evolution and schema changes
- Time travel and rollback features
- Implementing advanced features in Iceberg (see the example below)
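The sketch below touches each of these features once: a partition-spec change, a schema change, a time-travel read, and a snapshot rollback. All identifiers and the snapshot id are placeholders; real snapshot ids come from the snapshots metadata table shown earlier.

```python
# Advanced-features sketch (placeholder identifiers; assumes the table
# and session from the earlier examples).

# Partition evolution: change the spec without rewriting existing data.
# Old files keep the old layout; new writes use the new one.
spark.sql("ALTER TABLE local.db.events ADD PARTITION FIELD bucket(16, id)")

# Schema evolution: columns are tracked by id, so adds and renames are
# metadata-only operations.
spark.sql("ALTER TABLE local.db.events ADD COLUMNS (source STRING)")

# Time travel (Spark 3.3+ syntax): read the table as of an earlier
# snapshot. 1234567890 is a placeholder snapshot id.
spark.sql("SELECT * FROM local.db.events VERSION AS OF 1234567890").show()

# Rollback: make an earlier snapshot the current table state again.
spark.sql("CALL local.system.rollback_to_snapshot('db.events', 1234567890)")
```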
Summary and Next Steps
Requirements
- Familiarity with concepts such as tables, schemas, partitions, and data ingestion
- Basic knowledge of SQL
Audience
- Data engineers
- Data architects
- Data analysts
- Software developers
Open Training Courses require 5+ participants.
Testimonials (4)
Trainer had good grasp of concepts
Josheel - Verizon Connect
Course - Amazon Redshift
What I liked most was the trainer's mastery of the subject, his patience and clarity when explaining the concepts, and especially his constant willingness to answer all the questions that arose. It was a really enriching and very enjoyable learning experience.
Patricio Condado - SOKODB
analytical functions
khusboo dassani - Tech Northwest Skillnet
Course - SQL Advanced
how the trainer shows his knowledge in the subject he's teaching
john ernesto ii fernandez - Philippine AXA Life Insurance Corporation
Course - Data Vault: Building a Scalable Data Warehouse
Related Courses
SQL Advanced
14 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at intermediate-level database administrators, developers, and analysts who wish to master advanced SQL functionalities for complex data operations and database management.
By the end of this training, participants will be able to:
- Perform advanced querying techniques using unions, subqueries, and complex joins.
- Add, update, and delete data, tables, views, and indexes with precision.
- Ensure data integrity through transactions and manipulate database structures.
- Create and manage databases efficiently for robust data storage and retrieval.
Amazon Redshift
21 Hours
Amazon Redshift is a petabyte-scale cloud-based data warehouse service in AWS.
In this instructor-led, live training, participants will learn the fundamentals of Amazon Redshift.
By the end of this training, participants will be able to:
- Install and configure Amazon Redshift
- Load, configure, deploy, query, and visualize data with Amazon Redshift
Audience
- Developers
- IT Professionals
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange it.
Advanced Apache Iceberg
21 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at advanced-level data professionals who wish to optimize data processing workflows, ensure data integrity, and implement robust data lakehouse solutions that can handle the complexities of modern big data applications.
By the end of this training, participants will be able to:
- Gain an in-depth understanding of Iceberg’s architecture, including metadata management and file layout.
- Configure Iceberg for optimal performance in various environments and integrate it with multiple data processing engines.
- Manage large-scale Iceberg tables, perform complex schema changes, and handle partition evolution.
- Master techniques to optimize query performance and data scan efficiency for large datasets.
- Implement mechanisms to ensure data consistency, manage transactional guarantees, and handle failures in distributed environments.
Big Data Consulting
21 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at intermediate-level IT professionals who wish to enhance their skills in data architecture, governance, cloud computing, and big data technologies to effectively manage and analyze large datasets for data migration within their organizations.
By the end of this training, participants will be able to:
- Understand the foundational concepts and components of various data architectures.
- Gain a comprehensive understanding of data governance principles and their importance in regulatory environments.
- Implement and manage data governance frameworks such as DAMA and TOGAF.
- Leverage cloud platforms for efficient data storage, processing, and management.
Big Data & Database Systems Fundamentals
14 Hours
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Azure Data Lake Storage Gen2
14 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at intermediate-level data engineers who wish to learn how to use Azure Data Lake Storage Gen2 for effective data analytics solutions.
By the end of this training, participants will be able to:
- Understand the architecture and key features of Azure Data Lake Storage Gen2.
- Optimize data storage and access for cost and performance.
- Integrate Azure Data Lake Storage Gen2 with other Azure services for analytics and data processing.
- Develop solutions using the Azure Data Lake Storage Gen2 API.
- Troubleshoot common issues and optimize storage strategies.
Data Vault: Building a Scalable Data Warehouse
28 Hours
In this instructor-led, live training in Argentina, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Apache Druid for Real-Time Data Analysis
21 Hours
Apache Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
In this instructor-led, live course, we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.
Format of the Course
- Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Greenplum Database
14 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at administrators who wish to set up Greenplum Database for business intelligence and data warehousing solutions.
By the end of this training, participants will be able to:
- Address processing needs with Greenplum.
- Perform ETL operations for data processing.
- Leverage existing query processing infrastructures.
IBM Datastage For Administrators and Developers
35 Hours
This instructor-led, live training in Argentina (online or onsite) is aimed at intermediate-level IT professionals who wish to have a comprehensive understanding of IBM DataStage from both an administrative and a development perspective, allowing them to manage and utilize this tool effectively in their respective workplaces.
By the end of this training, participants will be able to:
- Understand the core concepts of DataStage.
- Learn how to effectively install, configure, and manage DataStage environments.
- Connect to various data sources and extract data efficiently from databases, flat files, and external sources.
- Implement effective data loading techniques.