Book/Designing Data-Intensive Applications

Kleppmann, Martin (March 1, 2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media.


Martin Kleppmann teaches a course on distributed systems[1]. The author also has a video series, Distributed Systems[2], on YouTube. One may also want to read Jean Bacon's book Concurrent Systems[3].

Critical Insights

In the Summary section of Chapter 10: Batch Processing (p. 429)[4], Kleppmann writes:

In the Unix world, the uniform interface that allows one program to be composed with another is files and pipes; in MapReduce, that interface is a distributed filesystem. We saw that dataflow engines add their own pipe-like data transport mechanisms to avoid materializing intermediate state to the distributed filesystem, but the initial input and final output of a job is still usually HDFS.
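The quote's point about composition through a uniform interface can be sketched in a few lines of code. The following Python example is illustrative only and not from the book; the file names and function names are hypothetical. Two independent stages communicate solely through ordinary files, playing the role that pipes play between Unix tools and that HDFS paths play between MapReduce jobs:

# Illustrative sketch (not from the book): two independent stages that
# compose only through files, the uniform interface the quote describes.
from collections import Counter

def tokenize_stage(in_path, out_path):
    # First stage: read raw text, write one lowercase word per line.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            for word in line.split():
                dst.write(word.lower() + "\n")

def count_stage(in_path, out_path):
    # Second stage: count words. It knows nothing about the first stage
    # except the file format, much as `sort | uniq -c` only sees lines.
    with open(in_path) as src:
        counts = Counter(line.strip() for line in src)
    with open(out_path, "w") as dst:
        for word, n in counts.most_common():
            dst.write(f"{n}\t{word}\n")

if __name__ == "__main__":
    # The intermediate file words.txt acts like a pipe (Unix) or an HDFS
    # path (MapReduce): either stage can be replaced by any other program
    # that honours the same file-based interface.
    tokenize_stage("input.txt", "words.txt")
    count_stage("words.txt", "counts.txt")

As the quote notes, a dataflow engine would keep the intermediate words.txt in memory rather than materializing it, but the initial input and final output would still normally live on the shared filesystem.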


References

Related Pages

Author:Martin Kleppmann