$43.99 with 45 percent savings
List Price: $79.99

Fundamentals of Data Engineering: Plan and Build Robust Data Systems 1st Edition

4.7 out of 5 stars, 669 ratings

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.

Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.

This book will help you:

  • Get a concise overview of the entire data engineering landscape
  • Assess data engineering problems using an end-to-end framework of best practices
  • Cut through marketing hype when choosing data technologies, architecture, and processes
  • Use the data engineering lifecycle to design and build a robust architecture
  • Incorporate data governance and security across the data engineering lifecycle

From the Publisher

From the Preface

How did this book come about? The origin is deeply rooted in our journey from data science into data engineering. We often jokingly refer to ourselves as recovering data scientists. We both had the experience of being assigned to data science projects, then struggling to execute these projects due to a lack of proper foundations. Our journey into data engineering began when we undertook data engineering tasks to build foundations and infrastructure.

With the rise of data science, companies splashed out lavishly on data science talent, hoping to reap rich rewards. Very often, data scientists struggled with basic problems that their background and training did not address—data collection, data cleansing, data access, data transformation, and data infrastructure. These are problems that data engineering aims to solve.

What This Book Isn’t

Before we cover what this book is about and what you’ll get out of it, let’s quickly cover what this book isn’t. This book isn’t about data engineering using a particular tool, technology, or platform. While many excellent books approach data engineering technologies from this perspective, these books have a short shelf life. Instead, we focus on the fundamental concepts behind data engineering.

By the end of this book you will understand:

  • How data engineering impacts your current role (data scientist, software engineer, or data team)
  • How to cut through the marketing hype and choose the right technologies, data architecture, and processes
  • How to use the data engineering lifecycle to design and build a robust architecture
  • Best practices for each stage of the data lifecycle

What This Book Is About

This book aims to fill a gap in current data engineering content and materials. While there’s no shortage of technical resources that address specific data engineering tools and technologies, people struggle to understand how to assemble these components into a coherent whole that applies in the real world. This book connects the dots of the end-to-end data lifecycle. It shows you how to stitch together various technologies to serve the needs of downstream data consumers such as analysts, data scientists, and machine learning engineers. This book works as a complement to O’Reilly books that cover the details of particular technologies, platforms, and programming languages.

The big idea of this book is the data engineering lifecycle: data generation, storage, ingestion, transformation, and serving. Since the dawn of data, we’ve seen the rise and fall of innumerable specific technologies and vendor products, but the data engineering lifecycle stages have remained essentially unchanged. With this framework, the reader will come away with a sound understanding for applying technologies to real-world business problems.

Our goal here is to map out principles that reach across two axes. First, we wish to distill data engineering into principles that can encompass any relevant technology. Second, we wish to present principles that will stand the test of time. We hope that these ideas reflect lessons learned across the data technology upheaval of the last twenty years and that our mental framework will remain useful for a decade or more into the future.

One thing to note: we unapologetically take a cloud-first approach. We view the cloud as a fundamentally transformative development that will endure for decades; most on-premises data systems and workloads will eventually move to cloud hosting. We assume that infrastructure and systems are ephemeral and scalable, and that data engineers will lean toward deploying managed services in the cloud. That said, most concepts in this book will translate to non-cloud environments.

Who Should Read This Book

Our primary intended audience for this book consists of technical practitioners, mid- to senior-level software engineers, data scientists, or analysts interested in moving into data engineering; or data engineers working in the guts of specific technologies, but wanting to develop a more comprehensive perspective. Our secondary target audience consists of data stakeholders who work adjacent to technical practitioners—e.g., a data team lead with a technical background overseeing a team of data engineers, or a director of data warehousing wanting to migrate from on-premises technology to a cloud-based solution.

Ideally, you’re curious and want to learn—why else would you be reading this book? You stay current with data technologies and trends by reading books and articles on data warehousing/data lakes, batch and streaming systems, orchestration, modeling, management, analysis, developments in cloud technologies, etc. This book will help you weave what you’ve read into a complete picture of data engineering across technologies and paradigms.

Editorial Reviews

About the Author

Joe Reis is a business-minded data nerd who's worked in the data industry for 20 years, with responsibilities ranging from statistical modeling, forecasting, machine learning, data engineering, data architecture, and almost everything else in between. Joe is the CEO and cofounder of Ternary Data, a data engineering and architecture consulting firm based in Salt Lake City, Utah. In addition, he volunteers with several technology groups and teaches at the University of Utah. In his spare time, Joe likes to rock climb, produce electronic music, and take his kids on crazy adventures.

Matt Housley is a data engineering consultant and cloud specialist. After some early programming experience with Logo, Basic, and 6502 assembly, he completed a PhD in mathematics at the University of Utah. Matt then began working in data science, eventually specializing in cloud-based data engineering. He cofounded Ternary Data with Joe Reis, where he leverages his teaching experience to train future data engineers and advise teams on robust data architecture. Matt and Joe also pontificate on all things data on The Monday Morning Data Chat.

Product details

  • Publisher: O'Reilly Media; 1st edition (July 26, 2022)
  • Language: English
  • Paperback: 447 pages
  • ISBN-10: 1098108302
  • ISBN-13: 978-1098108304
  • Item Weight: 1.57 pounds
  • Dimensions: 7 x 0.91 x 9.19 inches
  • Customer Reviews: 4.7 out of 5 stars, 669 ratings

Customer reviews

4.7 out of 5 stars
669 global ratings

Customers say

Customers find the book informative and useful for learning about data engineering and development. They describe it as a great read that covers both basic and advanced concepts. The book provides helpful design considerations and ideas through a theme approach. Many readers consider it a good reference material that will stand the test of time. However, some customers feel the content is not engaging and has flaws in the prose and presentation.

AI-generated from the text of customer reviews

20 customers mention "Data engineering": 20 positive, 0 negative

Customers find the book informative and useful for data engineering. It covers both basic and advanced concepts, making it a great reference with insightful chapters. They find it an essential read for data professionals.

"...The book is wonderful and a must read for any data professional. My favorite part is that the book is tool-agnostic...."

"Recommended for IT professionals and students that want to learn about all the components of Data engineering...."

"Many books in the computer science domain are informative, readable and useful...."

"...There's no doubt reading this book will give you a strong framework in how to view data engineering and a serviceable exposure to many of the terms,..."

15 customers mention "Readability": 15 positive, 0 negative

Customers find the book easy to read and well-organized. It provides a good reference for most topics.

"...The book is wonderful and a must read for any data professional. My favorite part is that the book is tool-agnostic...."

"Many books in the computer science domain are informative, readable and useful...."

"...In summary, this is a good, necessary book that provides a great introduction to the field while also clearly being a first edition by authors who..."

"...that have a base understanding of most of the topics, it is fine to read end-to-end."

3 customers mention "Design": 3 positive, 0 negative

Customers find the book helpful for thinking through design considerations and themes. They appreciate the theme approach and find it has many ideas even if you're already working.

"...Another thing that I liked is the theme approach and that at the end of each chapter (who will you work with) they adopted an approach from dataops,..."

"...work at a higher level, and will help you think through some key design considerations as you’re building, queries, data, pipelines, and systems of..."

"I'm just starting to read this book and I'm already loving it, it has a lot of ideas even if you're already working in a Data Engineer job"

3 customers mention "Sturdiness": 3 positive, 0 negative

Customers appreciate the book's sturdiness. They say it will stand the test of time and is in perfect condition.

"This is one of those data books that will stand the test of time. Joe and Matt detail timeless data principles...."

"The book is in perfect condition..."

"POOR condition (new book)..."

3 customers mention "Content quality": 0 positive, 3 negative

Customers find the book's content lacking. They mention it has flaws in the prose and content, and that it can arrive in poor condition. The book is also criticized for being too long and not engaging.

"I found this book unnecessarily long and not very engaging in its presentation...."

"...buying this book, even if you choose new status, it can arrive in poor condition. See the pictures, it looks badly printed."

"Great Introductory Book with Flaws in Prose and Content..."

Brutal
5 out of 5 stars
In summary: this is a must-read if you are in the data world (scientist, engineer, analyst) or your work is related to analytics (managers, product owners, project managers, etc.). The book is fantastic! Easy to read, with many insights in every chapter. I thought I knew of data engineering before this masterpiece, and how wrong I was. Although its content is theoretical, I am excited to apply everything I learned from this book.

Top reviews from the United States

  • Reviewed in the United States on February 27, 2025
    I finished reading “Fundamentals of Data Engineering” by Joe Reis and Matt Housley. The book is wonderful and a must read for any data professional.

    My favorite part is that the book is tool-agnostic. While there are many books that teach data engineering in one specific tool or language, Fundamentals of Data Engineering succeeds in explaining data engineering concepts without being attached to a tool.

    What also caught my attention is that this book is very well designed for a broad audience. There is value for anyone who is either a seasoned professional, or someone who is relatively new to the field (like me).

    This field is constantly evolving at a very fast pace, and I can see how Fundamentals of Data Engineering is one of the books that will stand the test of time. I highly recommend it to anyone in the data industry.
  • Reviewed in the United States on December 21, 2024
    I would recommend it to anyone who wants to either standardize their knowledge or learn about the basics of data engineering.
    One person found this helpful
  • Reviewed in the United States on July 29, 2024
    Recommended for IT professionals and students that want to learn about all the components of Data engineering. Very useful to understand current and coming Data engineering trends and technologies.
    4 people found this helpful
  • Reviewed in the United States on November 13, 2023
    Many books in the computer science domain are informative, readable and useful. This book exceeds in all those categories, and in my opinion, achieves something more difficult: it is enjoyable to read. The concise and oftentimes clever analogies efficiently compress complex ideas into digestible concepts. Though the writing style is similar to other O'Reilly books in the same genre, you get a little more personality coming through the pages.
    5 people found this helpful
  • Reviewed in the United States on May 7, 2024
    I read this book end-to-end over a 2 month period, making notes and reviewing chapters as I went. I'm coming at it from the perspective of a data analyst with a reasonable amount of exposure to the concepts and practises mentioned.

    One of the best attributes about this book is that it is one of, if not the, first high level introductions that tries to remain technology agnostic. Unlike many others that define data engineering as "use of pyspark" or "use of Hadoop", this book tries and (mostly) succeeds in setting up a universally applicable framework or "lifecycle" through which the data flows. It tries to instil in the reader a respect for good architecture, security, and ROI, beyond just playing with the latest toys. In doing so it casts a wide net, giving information ranging from the very "guts" of the hardware (HDD, SSD, etc.) right up to finops and stakeholder management. There's no doubt reading this book will give you a strong framework in how to view data engineering and a serviceable exposure to many of the terms, concepts and technologies therein.

    So this book is worth reading, but it's far from perfect. The first major thing is that this book is not a "holistic" product. It doesn't read end-to-end as a cohesive narrative with each chapter building on the ones prior. Some concepts are mentioned before they are described in any detail. Others are repeated multiple times across different chapters (I estimate the length of this book could be decreased by 15-20% without impacting the flow by removing redundancy). The choice of what to focus on is a bit odd. Some subjects are given enormous amounts of coverage explained in detail, while others get a paragraph. The level of technical detail also varies greatly. Some concepts are described as if it's talking to someone with no background in data/software eng., while others are a struggle to comprehend unless you've been exposed to it before.

    An additional thing worth noting is that, perhaps because of the background of the authors, it definitely reads like a book aimed for those who work in "big tech". The major way this manifests is in the book's incessantly repeated assertion that streaming data is the future, and the enormous amount of ink spilt detailing it. From the position of someone working in a non-FAANG company that doesn't need down-to-the-second data, this seems far more like an opinion than a fact. If their prediction comes true then they get my praise, but if it doesn't, it's a massive and wearying drag for a topic that probably less than 5% of data engineers really need to be concerned about.

    In summary, this is a good, necessary book that provides a great introduction to the field while also clearly being a first edition by authors who haven't quite perfected the art of relaying information efficiently. Recommended to people trying to break into the field and those who want to catch up on key concepts. Might be worth waiting for a 2nd or 3rd edition though.
    11 people found this helpful
  • Reviewed in the United States on October 7, 2024
    This is one of those data books that will stand the test of time. Joe and Matt detail timeless data principles. This book is a must read for anyone who takes their craft seriously.
  • Reviewed in the United States on January 31, 2023
    This book does a good job of informing a person on technical subjects without reading like a reference manual. As a working data engineer, I would recommend it for people just starting out in the profession as well as seasoned professionals. People with less experience might want to skip around to subjects that are particularly interesting, but for those that have a base understanding of most of the topics, it is fine to read end-to-end.
    8 people found this helpful
  • Reviewed in the United States on February 12, 2023
    An excellent opportunity to learn about data engineering. An interesting point of view of the field for juniors, tech managers, data architects, and data people. This book offers a good balance between the main framework of data engineering and new concepts in the field.

    It is a gift to read Joe and Matt; the authors clearly produced this book from their experience in the field.

    I personally love the third chapter, about data architecture. Depending on the business maturity, in a lot of cases we, as data engineers, must understand architecture concepts to design data products.

    Another thing that I liked is the theme approach and that at the end of each chapter (who will you work with) they adopted an approach from dataops, architecture, security, orchestration.
    7 people found this helpful

Top reviews from other countries

  • Cliente Kindle
    5.0 out of 5 stars A great reference book
    Reviewed in Brazil on April 20, 2024
    This book is a great reference for anyone thinking of working in the field, or who simply wants to know the boundaries and the dos and don'ts of data engineering. It's a theoretical book, so it won't teach you a specific tool.

    After reading this book I felt more confident. If someone talks about a new tool or anything related to data engineering, I feel confident that I can work out which "bucket" the tool or concept belongs to in the universe of this book.
  • Juan Manuel Tellez Perez
    5.0 out of 5 stars Fundamentals of Data Engineering
    Reviewed in Mexico on March 20, 2024
    An excellent book with very complete content; it arrived in good condition.
  • Just Some Guy
    5.0 out of 5 stars A Great Overview of the Data Analytics Tech Landscape (for the Uninitiated)
    Reviewed in the United Kingdom on January 25, 2025
    This is a great book for anyone who has a solid understanding of software development and cloud architecture, but doesn't have direct experience building data pipelines or data analytics products.

    The authors don't get into much technical detail at a tactical level - this is not a book about actually implementing anything whatsoever. Rather, this book offers a really excellent 10,000 foot view of the current state of Data Engineering from multiple angles.

    Throughout the book they spend a lot of time explaining the "people" side of things (what developers and teams actually do when building Data Eng teams, analytics pipelines, etc.) and how they interact with various other teams and stakeholders (data scientists, analysts, PMs, execs,...).

    They also cover a vast amount of ground on the architectural side of things. As a developer with years of tech experience, but one which has never directly worked on data pipelines, I really enjoyed how they offered both numerous examples and stories of how projects were built and operated in the _ancient_ "big data" Hadoop era (i.e. 2010-2020, LOL!), and then how quickly the tech and related architectures have changed as significant new technologies came to the fore (i.e. Kafka, BigQuery/Athena, Snowflake/Databricks, etc...).

    My 2 constructive criticisms of this book are:
    1) Some will be frustrated by the lack of tactical content or technical depth. That said, what they sacrifice in depth they make up for in scope. The data analytics space is vast, and evolving at a breakneck pace. They do an admirable job of introducing and summarizing a vast topic, all grounded in practical advice and real-world anecdotes and examples (from their own professional experience).

    2) They have 1 surprising blind spot, imho – which is that they don't even offer a passing nod to Domain Driven Design (DDD). Given that they do discuss topics including microservices, data models, schemas, and some aspects of "domains" in the enterprise sense, as well as the need to interact with stakeholders and experts from various other teams (aka "domain experts"), this strikes me as a surprising blind spot. I'd like to see them explore DDD in a future 2nd edition (please!).

    Final word – If you're an experienced developer or architect with big data or analytics experience, this book may leave you wanting. For anyone else with a solid technical foundation and an interest in the data realm from almost any angle, this is a great read that's well worth your time.
  • Nick C
    5.0 out of 5 stars Great foundational knowledge
    Reviewed in Germany on December 16, 2024
    Provides indeed the fundamental understanding of (almost) everything around data. It analyzes the whole data lifecycle from Ingestion to storage and analytics. Highly recommended for analysts, data scientists, less experienced engineers and for sure people that want to switch to DE.
  • shrish kumar
    5.0 out of 5 stars Book content & Binding
    Reviewed in India on May 9, 2024
    The content is great for getting started with data engineering; everything is explained in an organized way.
    Best 👌 The binding and page quality are great for the Indian version.