
Fundamentals of Data Engineering: Plan and Build Robust Data Systems 1st Edition
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.
Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.
This book will help you:
- Get a concise overview of the entire data engineering landscape
- Assess data engineering problems using an end-to-end framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle
- ISBN-10: 1098108302
- ISBN-13: 978-1098108304
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: July 26, 2022
- Language: English
- Dimensions: 7 x 0.91 x 9.19 inches
- Print length: 447 pages
- Data architecture is the design of systems to support the evolving data needs of an enterprise, achieved by flexible and reversible decisions reached through a careful evaluation of trade-offs. (Highlighted by 603 Kindle readers)
- Enterprise architecture is the design of systems to support change in the enterprise, achieved by flexible and reversible decisions reached through careful evaluation of trade-offs. (Highlighted by 564 Kindle readers)
- Adopt true real-time streaming only after identifying a business use case that justifies the trade-offs against using batch. (Highlighted by 538 Kindle readers)
From the brand

Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher

From the Preface
How did this book come about? The origin is deeply rooted in our journey from data science into data engineering. We often jokingly refer to ourselves as recovering data scientists. We both had the experience of being assigned to data science projects, then struggling to execute these projects due to a lack of proper foundations. Our journey into data engineering began when we undertook data engineering tasks to build foundations and infrastructure.
With the rise of data science, companies splashed out lavishly on data science talent, hoping to reap rich rewards. Very often, data scientists struggled with basic problems that their background and training did not address—data collection, data cleansing, data access, data transformation, and data infrastructure. These are problems that data engineering aims to solve.
What This Book Isn’t
Before we cover what this book is about and what you’ll get out of it, let’s quickly cover what this book isn’t. This book isn’t about data engineering using a particular tool, technology, or platform. While many excellent books approach data engineering technologies from this perspective, these books have a short shelf life. Instead, we focus on the fundamental concepts behind data engineering.
By the end of this book you will understand:
- How data engineering impacts your current role (data scientist, software engineer, or data team)
- How to cut through the marketing hype and choose the right technologies, data architecture, and processes
- How to use the data engineering lifecycle to design and build a robust architecture
- Best practices for each stage of the data lifecycle
What This Book Is About
This book aims to fill a gap in current data engineering content and materials. While there’s no shortage of technical resources that address specific data engineering tools and technologies, people struggle to understand how to assemble these components into a coherent whole that applies in the real world. This book connects the dots of the end-to-end data lifecycle. It shows you how to stitch together various technologies to serve the needs of downstream data consumers such as analysts, data scientists, and machine learning engineers. This book works as a complement to O’Reilly books that cover the details of particular technologies, platforms, and programming languages.
The big idea of this book is the data engineering lifecycle: data generation, storage, ingestion, transformation, and serving. Since the dawn of data, we’ve seen the rise and fall of innumerable specific technologies and vendor products, but the data engineering lifecycle stages have remained essentially unchanged. With this framework, the reader will come away with a sound understanding for applying technologies to real-world business problems.
Our goal here is to map out principles that reach across two axes. First, we wish to distill data engineering into principles that can encompass any relevant technology. Second, we wish to present principles that will stand the test of time. We hope that these ideas reflect lessons learned across the data technology upheaval of the last twenty years and that our mental framework will remain useful for a decade or more into the future.
One thing to note: we unapologetically take a cloud-first approach. We view the cloud as a fundamentally transformative development that will endure for decades; most on-premises data systems and workloads will eventually move to cloud hosting. We assume that infrastructure and systems are ephemeral and scalable, and that data engineers will lean toward deploying managed services in the cloud. That said, most concepts in this book will translate to non-cloud environments.
Who Should Read This Book
Our primary intended audience for this book consists of technical practitioners, mid- to senior-level software engineers, data scientists, or analysts interested in moving into data engineering; or data engineers working in the guts of specific technologies, but wanting to develop a more comprehensive perspective. Our secondary target audience consists of data stakeholders who work adjacent to technical practitioners—e.g., a data team lead with a technical background overseeing a team of data engineers, or a director of data warehousing wanting to migrate from on-premises technology to a cloud-based solution.
Ideally, you’re curious and want to learn—why else would you be reading this book? You stay current with data technologies and trends by reading books and articles on data warehousing/data lakes, batch and streaming systems, orchestration, modeling, management, analysis, developments in cloud technologies, etc. This book will help you weave what you’ve read into a complete picture of data engineering across technologies and paradigms.
Product details
- Publisher: O'Reilly Media; 1st edition (July 26, 2022)
- Language: English
- Paperback: 447 pages
- ISBN-10: 1098108302
- ISBN-13: 978-1098108304
- Item Weight: 1.57 pounds
- Dimensions: 7 x 0.91 x 9.19 inches
- Best Sellers Rank: #10,026 in Books
- #2 in Data Mining (Books)
- #3 in Data Processing
- #3 in Data Modeling & Design (Books)
About the authors
Matt Housley is a data engineering consultant, trainer and cloud specialist. After some early programming experience with Logo, Basic, and 6502 assembly, he completed a PhD in mathematics at the University of Utah. Matt then began working in data science, eventually specializing in cloud-based data engineering. He cofounded Ternary Data with Joe Reis, where he leverages his teaching experience to train future data engineers and advise teams on robust data architecture. Matt and Joe also pontificate on all things data on The Monday Morning Data Chat.
Joe Reis is a “recovering data scientist” and a business-minded data nerd who’s worked in the data industry for 20 years, with responsibilities ranging from statistical modeling, forecasting, machine learning, data engineering, data architecture, and almost everything else in between.
He co-hosts the popular data show, The Monday Morning Data Chat, teaches at the University of Utah, speaks globally, and co-authored the wildly successful O’Reilly book, Fundamentals of Data Engineering.
When he’s not busy running a company, teaching, or creating content, Joe often finds himself rock climbing or trail running in the mountains around Salt Lake City, Utah.
Customer reviews
Customers say
Customers find the book informative and useful for learning about data engineering and development. They describe it as a great read that covers both basic and advanced concepts. The book provides helpful design considerations and ideas through a theme approach. Many readers consider it a good reference material that will stand the test of time. However, some customers feel the content is not engaging and has flaws in the prose and presentation.
AI-generated from the text of customer reviews
Customers find the book informative and useful for data engineering. It covers both basic and advanced concepts, making it a great reference with insightful chapters. They find it an essential read for data professionals.
"...The book is wonderful and a must read for any data professional. My favorite part is that the book is tool-agnostic...."
"Recommended for IT professionals and students that want to learn about all the components of Data engineering...."
"Many books in the computer science domain are informative, readable and useful...."
"...There's no doubt reading this book will give you a strong framework in how to view data engineering and a serviceable exposure to many of the terms,..."
Customers find the book easy to read and well-organized. It provides a good reference for most topics.
"...The book is wonderful and a must read for any data professional. My favorite part is that the book is tool-agnostic...."
"Many books in the computer science domain are informative, readable and useful...."
"...In summary, this is a good, necessary book that provides a great introduction to the field while also clearly being a first edition by authors who..."
"...that have a base understanding of most of the topics, it is fine to read end-to-end."
Customers find the book helpful for thinking through design considerations and themes. They appreciate the theme approach and find it has many ideas even if you're already working.
"...Another thing that I liked is the theme approach and that at the end of each chapter (who will you work with) they adopted an approach from dataops,..."
"...work at a higher level, and will help you think through some key design considerations as you’re building, queries, data, pipelines, and systems of..."
"I'm just starting to read this book and I'm already loving it, it has a lot of ideas even if you're already working in a Data Engineer job"
Customers appreciate the book's sturdiness. They say it will stand the test of time and is in perfect condition.
"This is one of those data books that will stand the test of time. Joe and Matt detail timeless data principles...."
"The book is in perfect condition..."
"POOR condition (new book)..."
Customers find the book's content lacking. They mention it has flaws in the prose and content, and that it can arrive in poor condition. The book is also criticized for being too long and not engaging.
"I found this book unnecessarily long and not very engaging in its presentation...."
"...buying this book, even if you choose new status, it can arrive in poor condition. See the pictures, it looks badly printed."
"Great Introductory Book with Flaws in Prose and Content..."
Top reviews from the United States
- Reviewed in the United States on February 27, 2025
I finished reading “Fundamentals of Data Engineering” by Joe Reis and Matt Housley. The book is wonderful and a must-read for any data professional.
My favorite part is that the book is tool-agnostic. While there are many books that teach data engineering in one specific tool or language, Fundamentals of Data Engineering succeeds in explaining data engineering concepts without being attached to a tool.
What also caught my attention is that this book is very well designed for a broad audience. There is value for anyone who is either a seasoned professional, or someone who is relatively new to the field (like me).
This field is constantly evolving at a very fast pace, and I can see how Fundamentals of Data Engineering is one of the books that will stand the test of time. I highly recommend it to anyone in the data industry.
- Reviewed in the United States on December 21, 2024
I would recommend it to anyone who wants to either standardize their knowledge or learn about the basics of data engineering.
- Reviewed in the United States on July 29, 2024
Recommended for IT professionals and students who want to learn about all the components of data engineering. Very useful for understanding current and upcoming data engineering trends and technologies.
- Reviewed in the United States on November 13, 2023
Many books in the computer science domain are informative, readable and useful. This book excels in all those categories and, in my opinion, achieves something more difficult: it is enjoyable to read. The concise and oftentimes clever analogies efficiently compress complex ideas into digestible concepts. Though the writing style is similar to other O'Reilly books in the same genre, you get a little more personality coming through the pages.
- Reviewed in the United States on May 7, 2024
I read this book end-to-end over a 2 month period, making notes and reviewing chapters as I went. I'm coming at it from the perspective of a data analyst with a reasonable amount of exposure to the concepts and practises mentioned.
One of the best attributes of this book is that it is one of, if not the, first high-level introductions that tries to remain technology agnostic. Unlike many others that define data engineering as "use of pyspark" or "use of Hadoop", this book tries and (mostly) succeeds in setting up a universally applicable framework or "lifecycle" through which the data flows. It tries to instil in the reader a respect for good architecture, security, and ROI, beyond just playing with the latest toys. In doing so it casts a wide net, giving information ranging from the very "guts" of the hardware (HDD, SSD, etc.) right up to finops and stakeholder management. There's no doubt reading this book will give you a strong framework in how to view data engineering and a serviceable exposure to many of the terms, concepts and technologies therein.
So this book is worth reading, but it's far from perfect. The first major thing is that this book is not a "holistic" product. It doesn't read end-to-end as a cohesive narrative with each chapter building on the ones prior. Some concepts are mentioned before they are described in any detail. Others are repeated multiple times across different chapters (I estimate the length of this book could be decreased by 15-20% without impacting the flow by removing redundancy). The choice of what to focus on is a bit odd. Some subjects are given enormous amounts of coverage explained in detail, while others get a paragraph. The level of technical detail also varies greatly. Some concepts are described as if it's talking to someone with no background in data/software eng., while others are a struggle to comprehend unless you've been exposed to it before.
An additional thing worth noting is that, perhaps because of the background of the authors, it definitely reads like a book aimed at those who work in "big tech". The major way this manifests is in the book's incessantly repeated assertion that streaming data is the future, and the enormous amount of ink spilt detailing it. From the position of someone working in a non-FAANG company that doesn't need down-to-the-second data, this seems far more like an opinion than a fact. If their prediction comes true then they get my praise, but if it doesn't, it's a massive and wearying drag for a topic that probably less than 5% of data engineers really need to be concerned about.
In summary, this is a good, necessary book that provides a great introduction to the field while also clearly being a first edition by authors who haven't quite perfected the art of relaying information efficiently. Recommended to people trying to break into the field and those who want to catch up on key concepts. Might be worth waiting for a 2nd or 3rd edition though.
- Reviewed in the United States on October 7, 2024
This is one of those data books that will stand the test of time. Joe and Matt detail timeless data principles. This book is a must-read for anyone who takes their craft seriously.
- Reviewed in the United States on January 31, 2023
This book does a good job of informing a person on technical subjects without reading like a reference manual. As a working data engineer, I would recommend it for people just starting out in the profession as well as seasoned professionals. People with less experience might want to skip around to subjects that are particularly interesting, but for those that have a base understanding of most of the topics, it is fine to read end-to-end.
- Reviewed in the United States on February 12, 2023
An excellent opportunity to learn about data engineering. An interesting view of the field for juniors, tech managers, data architects, and data people. This book offers a good balance between the main framework of data engineering and new concepts in the field.
It is a gift to read Joe and Matt; clearly the authors produced this book from their experience in the field.
I personally love the third chapter, about data architecture. Depending on the business maturity, in a lot of cases we, as data engineers, must understand architecture concepts to design data products.
Another thing that I liked is the theme approach and that at the end of each chapter (who will you work with) they adopted an approach from dataops, architecture, security, orchestration.
Top reviews from other countries
- Cliente Kindle
Reviewed in Brazil on April 20, 2024
5.0 out of 5 stars A great reference book
This book is a great reference if you are thinking of working in the field, or even if you just want to know the boundaries and the dos and don'ts of data engineering. It's a theoretical book, so it won't teach you a specific tool.
After reading this book I felt more confident. If someone talks about a new tool or anything related to data engineering, I feel confident that I can work out which "bucket" the tool or concept belongs to in the universe of this book.
- Juan Manuel Tellez Perez
Reviewed in Mexico on March 20, 2024
5.0 out of 5 stars Fundamentals of Data Engineering
It is an excellent book with very complete content; it arrived in good condition.
- Just Some Guy
Reviewed in the United Kingdom on January 25, 2025
5.0 out of 5 stars A Great Overview of the Data Analytics Tech Landscape (for the Uninitiated)
This is a great book for anyone who has a solid understanding of software development and cloud architecture, but doesn't have direct experience building data pipelines or data analytics products.
The authors don't get into much technical detail at a tactical level - this is not a book about actually implementing anything whatsoever. Rather, this book offers a really excellent 10,000 foot view of the current state of Data Engineering from multiple angles.
Throughout the book they spend a lot of time explaining the "people" side of things (what developers and teams actually do when building Data Eng teams, analytics pipelines, etc.) and how they interact with various other teams and stakeholders (data scientists, analysts, PMs, execs,...).
They also cover a vast amount of ground on the architectural side of things. As a developer with years of tech experience, but one which has never directly worked on data pipelines, I really enjoyed how they offered both numerous examples and stories of how projects were built and operated in the _ancient_ "big data" Hadoop era (i.e. 2010-2020, LOL!), and then how quickly the tech and related architectures have changed as significant new technologies came to the fore (i.e. Kafka, BigQuery/Athena, Snowflake/Databricks, etc...).
My 2 constructive criticisms of this book are:
1) Some will be frustrated by the lack of tactical content or technical depth. That said, what they sacrifice in depth they make up for in scope. The data analytics space is vast, and evolving at a breakneck pace. They do an admirable job of introducing and summarizing a vast topic, all grounded in practical advice and real-world anecdotes and examples (from their own professional experience).
2) They have 1 surprising blind spot, imho – which is that they don't even offer a passing nod to Domain Driven Design (DDD). Given that they do discuss topics including microservices, data models, schemas, and some aspects of "domains" in the enterprise sense, as well as the need to interact with stakeholders and experts from various other teams (aka "domain experts"), this strikes me as a surprising blind spot. I'd like to see them explore DDD in a future 2nd edition (please!).
Final word – If you're an experienced developer or architect with big data or analytics experience, this book may leave you wanting. For anyone else with a solid technical foundation and an interest in the data realm from almost any angle, this is a great read that's well worth your time.
- Nick C
Reviewed in Germany on December 16, 2024
5.0 out of 5 stars Great foundational knowledge
It does indeed provide a fundamental understanding of (almost) everything around data. It analyzes the whole data lifecycle, from ingestion to storage and analytics. Highly recommended for analysts, data scientists, less experienced engineers, and certainly for people who want to switch to DE.
- shrish kumar
Reviewed in India on May 9, 2024
5.0 out of 5 stars Book content & Binding
The content is great for getting started with data engineering; everything is explained in an organized way.
Best 👌 binding, and the page quality is great for the Indian version.