Hadoop: The Definitive Guide

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.

  • Learn fundamental components such as MapReduce, HDFS, and YARN
  • Explore MapReduce in depth, including steps for developing applications with it
  • Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
  • Learn data formats: Avro for data serialization and Parquet for nested data
  • Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
  • Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
  • Learn the HBase distributed database and the ZooKeeper distributed configuration service
