
Notes on Hadoop: The Definitive Guide

Date: 2012-04-27 01:20:01

 

1. Introduction to HDFS
1.1. HDFS Concepts
1.1.1. Blocks
- HDFS, too, has the concept of a block, but it is a much larger unit: 64 MB by default.
- As in a filesystem for a single disk, files in HDFS are broken into block-sized chunks, which are stored as independent units.
- Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
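As a rough illustration of the block arithmetic above, the following sketch computes how many blocks a file is split into and how many bytes its last block actually holds. The class and method names are hypothetical, and the 64 MB block size is simply the default quoted in these notes.

```java
final class BlockMath {
    // Default block size quoted in the notes (an assumption; it is configurable).
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Number of blocks a file of the given length is broken into (ceiling division).
    static long blockCount(long fileLength) {
        return (fileLength + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Bytes held by the last block; a short last block uses only this much storage.
    static long lastBlockLength(long fileLength) {
        if (fileLength == 0) {
            return 0;
        }
        long remainder = fileLength % BLOCK_SIZE;
        return remainder == 0 ? BLOCK_SIZE : remainder;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024;
        System.out.println(blockCount(150 * oneMb));              // 3
        System.out.println(lastBlockLength(150 * oneMb) / oneMb); // 22
    }
}
```

A 150 MB file therefore occupies two full blocks plus one 22 MB block, rather than three full 64 MB blocks.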
1.1.2. Namenodes and Datanodes
- The namenode manages the filesystem namespace.
  - It maintains the filesystem tree and the metadata for all the files and directories in the tree.
  - This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log.
  - The namenode also knows the datanodes on which all the blocks for a given file are located. However, it does not store block locations persistently, since this information is reconstructed from the datanodes when the system starts.
- Datanodes are the workhorses of the filesystem.
  - They store and retrieve blocks when they are told to (by clients or the namenode).
  - They report back to the namenode periodically with lists of the blocks that they are storing.
- The secondary namenode
  - It does not act as a namenode.
  - Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large.
  - It keeps a copy of the merged namespace image, which can be used if the namenode fails.
Namenode directory structure
[Image: namenode directory structure]
- The VERSION file is a Java properties file that contains information about the version of HDFS that is running.
  - The layoutVersion is a negative integer that defines the version of HDFS’s persistent data structures.
  - The namespaceID is a unique identifier for the filesystem, which is created when the filesystem is first formatted.
  - The cTime property marks the creation time of the namenode’s storage.
  - The storageType indicates that this storage directory contains data structures for a namenode.
[Image omitted]
The filesystem image and edit log
- When a filesystem client performs a write operation, it is first recorded in the edit log.
- The namenode also has an in-memory representation of the filesystem metadata, which it updates after the edit log has been modified.
- The edit log is flushed and synced after every write before a success code is returned to the client.
- The fsimage file is a persistent checkpoint of the filesystem metadata. It is not updated for every filesystem write operation.
- If the namenode fails, the latest state of its metadata can be reconstructed by loading the fsimage from disk into memory and then applying each of the operations in the edit log.
- This is precisely what the namenode does when it starts up.
- The fsimage file contains a serialized form of all the directory and file inodes in the filesystem.
- The role of the secondary namenode is to produce checkpoints of the primary’s in-memory filesystem metadata.
- The checkpointing process proceeds as follows:
  - The secondary asks the primary to roll its edits file, so new edits go to a new file.
  - The secondary retrieves fsimage and edits from the primary (using HTTP GET).
  - The secondary loads fsimage into memory, applies each operation from edits, then creates a new consolidated fsimage file.
  - The secondary sends the new fsimage back to the primary (using HTTP POST).
  - The primary replaces the old fsimage with the new one from the secondary, and the old edits file with the new one it started in the first step. It also updates the fstime file to record the time at which the checkpoint was taken.
  - At the end of the process, the primary has an up-to-date fsimage file and a shorter edits file.
[Image: the checkpointing process]
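The relationship between fsimage, the edit log, and a checkpoint can be sketched with a toy model. This is only an illustration under stated assumptions: the namespace is modelled as a sorted set of paths, and the edit-log entry format ("+/path" to create, "-/path" to delete) is invented for this example and is not the real on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class CheckpointSketch {
    // Rebuilds the namespace by loading a checkpoint and applying each logged operation.
    // Hypothetical entry format: "+/path" creates a path, "-/path" deletes it.
    static TreeSet<String> replay(TreeSet<String> fsimage, List<String> edits) {
        TreeSet<String> namespace = new TreeSet<>(fsimage);
        for (String op : edits) {
            String path = op.substring(1);
            if (op.charAt(0) == '+') {
                namespace.add(path);
            } else {
                namespace.remove(path);
            }
        }
        return namespace;
    }

    public static void main(String[] args) {
        TreeSet<String> fsimage = new TreeSet<>(Arrays.asList("/a", "/b"));
        List<String> oldEdits = Arrays.asList("+/c", "-/a");

        // The primary rolls its edits file; writes made from now on land in a new log.
        List<String> newEdits = new ArrayList<>();
        newEdits.add("+/d");

        // The secondary loads fsimage, applies the old edits, and produces a merged image.
        TreeSet<String> merged = replay(fsimage, oldEdits);

        // The primary adopts the merged image and keeps only the new, shorter edit log.
        fsimage = merged;
        System.out.println(fsimage);                   // [/b, /c]

        // On restart, the namenode loads the checkpoint and applies the remaining edits.
        System.out.println(replay(fsimage, newEdits)); // [/b, /c, /d]
    }
}
```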
Secondary namenode directory structure
[Image: secondary namenode directory structure]
Datanode directory structure
[Image: datanode directory structure]
- A datanode’s VERSION file
[Image: a datanode’s VERSION file]
- The other files in the datanode’s current storage directory are the files with the blk_ prefix.
  - There are two types: the HDFS blocks themselves and the metadata for a block (with a .meta suffix).
  - A block file consists of just the raw bytes of a portion of the file being stored.
  - The metadata file is made up of a header with version and type information, followed by a series of checksums for sections of the block.
- When the number of blocks in a directory grows to a certain size, the datanode creates a new subdirectory in which to place new blocks and their accompanying metadata.
1.2. Data Flow
1.2.1. Anatomy of a File Read
[Image: a client reading data from HDFS]
- The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem (step 1).
- DistributedFileSystem calls the namenode, using RPC, to determine the locations of the first few blocks in the file (step 2).
- For each block, the namenode returns the addresses of the datanodes that have a copy of that block.
- The datanodes are sorted according to their proximity to the client.
- DistributedFileSystem returns an FSDataInputStream to the client for it to read data from.
- The client then calls read() on the stream (step 3).
- DFSInputStream connects to the first (closest) datanode for the first block in the file.
- Data is streamed from the datanode back to the client (step 4).
- When the end of the block is reached, DFSInputStream closes the connection to the datanode and then finds the best datanode for the next block (step 5).
- When the client has finished reading, it calls close() on the FSDataInputStream (step 6).
- If the client encounters an error while communicating with a datanode during a read, it tries the next closest one for that block.
- It also remembers datanodes that have failed so that it doesn’t needlessly retry them for later blocks.
- The client also verifies checksums for the data transferred to it from the datanode. If a corrupted block is found, it is reported to the namenode.
1.2.2. Anatomy of a File Write
[Image: a client writing data to HDFS]
- The client creates the file by calling create() (step 1).
- DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it (step 2).
- The namenode performs various checks to make sure the file doesn’t already exist and that the client has the right permissions to create the file. If these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException.
- DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to.
- As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue.
- The data queue is consumed by the DataStreamer, whose responsibility it is to ask the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline.
- The DataStreamer streams the packets to the first datanode in the pipeline, which stores each packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline (step 4).
- DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline (step 5).
- If a datanode fails while data is being written to it:
  - First, the pipeline is closed, and any packets in the ack queue are added to the front of the data queue.
  - The current block on the good datanodes is given a new identity by the namenode, so that the partial block on the failed datanode will be deleted if the failed datanode recovers later on.
  - The failed datanode is removed from the pipeline, and the remainder of the block’s data is written to the two good datanodes in the pipeline.
  - The namenode notices that the block is under-replicated, and it arranges for a further replica to be created on another node.
- When the client has finished writing data, it calls close() on the stream (step 6). This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete (step 7).
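The interplay of the data queue and the ack queue described above can be modelled in a few lines. This is a simplified sketch, not the DFSOutputStream implementation: packets are plain integers, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class WriteQueuesSketch {
    final Deque<Integer> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<Integer> ackQueue = new ArrayDeque<>();  // packets sent but not yet acknowledged

    // The client writes data: a new packet joins the tail of the data queue.
    void write(int packet) {
        dataQueue.addLast(packet);
    }

    // The streamer sends the oldest packet and parks it in the ack queue.
    void send() {
        ackQueue.addLast(dataQueue.removeFirst());
    }

    // Every datanode in the pipeline acknowledged the oldest in-flight packet.
    void ack() {
        ackQueue.removeFirst();
    }

    // Pipeline failure: unacknowledged packets return to the FRONT of the data
    // queue, in their original order, so that nothing is lost or reordered.
    void onDatanodeFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.removeLast());
        }
    }

    public static void main(String[] args) {
        WriteQueuesSketch q = new WriteQueuesSketch();
        for (int p = 1; p <= 4; p++) {
            q.write(p);
        }
        q.send();
        q.send();
        q.send();              // packets 1, 2, 3 are in flight
        q.ack();               // packet 1 is fully acknowledged
        q.onDatanodeFailure(); // packets 2 and 3 must be resent
        System.out.println(q.dataQueue); // [2, 3, 4]
    }
}
```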
2. Meet Map/Reduce
- MapReduce has two phases: the map phase and the reduce phase.
- Each phase has key-value pairs as input and output (the types can be specified).
  - The input key-value types of the map phase are determined by the input format.
  - The output key-value types of the map phase must match the input key-value types of the reduce phase.
  - The output key-value types of the reduce phase can be set on the JobConf.
- The programmer specifies two functions: the map function and the reduce function.
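To make the two phases concrete, here is a small in-memory sketch that does not use Hadoop at all. It assumes hypothetical input lines of the form "year,temperature"; the map function emits (year, temperature) pairs, the grouping step stands in for what the framework does between the phases, and the reduce function picks the maximum per year.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MiniMapReduce {
    // Map function: one input line "year,temperature" becomes one (year, temperature) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return new AbstractMap.SimpleEntry<>(fields[0], Integer.parseInt(fields[1]));
    }

    // Reduce function: all values for one key become a single result (here, the maximum).
    static int reduce(String key, List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Group map output by key; the TreeMap keeps keys sorted, as the framework would.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1950,0", "1950,22", "1949,111", "1949,78");
        System.out.println(run(input)); // {1949=111, 1950=22}
    }
}
```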
2.1. MapReduce logical data flow
[Image: MapReduce logical data flow]
2.2. MapReduce Code
2.2.1. The map function is represented by an implementation of the Mapper interface, which declares a map() method.
[Image omitted]
2.2.2. The reduce function is defined using a Reducer.
- The input types of the reduce function must match the output types of the map function.
[Image omitted]
2.2.3. The code that runs the MapReduce job
- An input path is specified by calling the static addInputPath() method on FileInputFormat.
  - It can be a single file, a directory, or a file pattern.
  - addInputPath() can be called more than once to use input from multiple paths.
- The output path is specified by the static setOutputPath() method on FileOutputFormat.
  - It specifies a directory where the output files from the reducer functions are written.
  - The directory must not exist before the job is run.
- The mapper and reducer classes are specified via the setMapperClass() and setReducerClass() methods.
- The setOutputKeyClass() and setOutputValueClass() methods control the output types for the map and the reduce functions, which are often the same.
  - If they are different, the map output types can be set using the methods setMapOutputKeyClass() and setMapOutputValueClass().
- The input types are controlled via the input format, which we have not explicitly set since we are using the default TextInputFormat.
[Image omitted]
2.3. Scaling Out
2.3.1. MapReduce data flow with a single reduce task
- A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information.
- Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
- There are two types of nodes that control the job execution process: a jobtracker and a number of tasktrackers.
  - The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers.
  - Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job.
  - If a task fails, the jobtracker can reschedule it on a different tasktracker.
- Hadoop divides the input to a MapReduce job into fixed-size input splits.
- Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
- Hadoop does its best to run the map task on a node where the input data resides in HDFS.
  - This is called the data locality optimization.
  - This is why the optimal split size is the same as the block size: it is the largest size of input that can be guaranteed to be stored on a single node.
- Reduce tasks don’t have the advantage of data locality.
  - The input to a single reduce task is normally the output from all mappers.
  - The output of the reduce is normally stored in HDFS for reliability.
[Image: MapReduce data flow with a single reduce task]
2.3.2. MapReduce data flow with multiple reduce tasks
- The number of reduce tasks is not governed by the size of the input, but is specified independently.
- When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task.
- There can be many keys (and their associated values) in each partition, but the records for every key are all in a single partition.
- The partitioning can be controlled by a user-defined partitioning function.
  - Normally the default partitioner is used, which buckets keys using a hash function.
  - conf.setPartitionerClass(HashPartitioner.class);
  - conf.setNumReduceTasks(1);
- The data flow between map and reduce tasks is known as “the shuffle,” as each reduce task is fed by many map tasks.
[Image: MapReduce data flow with multiple reduce tasks]
- It’s also possible to have zero reduce tasks. This can be appropriate when you don’t need the shuffle, since the processing can be carried out entirely in parallel.
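The hash-based bucketing mentioned above can be sketched as follows. The formula is believed to mirror what Hadoop’s HashPartitioner does, but treat it as an illustration rather than a copy of the library code; the class name is hypothetical.

```java
final class HashPartitionSketch {
    // Maps a key to a partition index in the range [0, numReduceTasks).
    static int partition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hash code cannot produce a negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(partition("a", 4)); // 1, because "a".hashCode() is 97
        System.out.println(partition("a", 1)); // 0, a single reducer receives everything
    }
}
```

Because the index depends only on the key, every record with the same key lands in the same partition, which is what guarantees that a single reduce task sees all the values for that key.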
3. MapReduce Types and Formats
3.1. MapReduce Types
- The map and reduce functions in Hadoop MapReduce have the following general form:
  - map: (K1, V1) → list(K2, V2)
  - reduce: (K2, list(V2)) → list(K3, V3)
[Images omitted]
- The partition function operates on the intermediate key and value types (K2 and V2), and returns the partition index.
[Images omitted]
3.1.1. Configuration of MapReduce types
[Image omitted]
- Input types are set by the input format.
  - For instance, a TextInputFormat generates keys of type LongWritable and values of type Text.
- A minimal MapReduce driver, with the defaults explicitly set:
[Image omitted]
- The default input format is TextInputFormat, which produces keys of type LongWritable (the offset of the beginning of the line in the file) and values of type Text (the line of text).
- The setNumMapTasks() call does not necessarily set the number of map tasks to the value given; it is only a hint.
  - The actual number of map tasks depends on the size of the input.
- The default mapper is IdentityMapper.
[Image omitted]
- Map tasks are run by MapRunner, the default implementation of MapRunnable, which calls the Mapper’s map() method sequentially with each record.
- The default partitioner is HashPartitioner, which hashes a record’s key to determine which partition the record belongs in.
  - Each partition is processed by a reduce task, so the number of partitions is equal to the number of reduce tasks for the job.
[Image omitted]
- The default reducer is IdentityReducer.
[Image omitted]
- Records are sorted by the MapReduce system before being presented to the reducer.
- The default output format is TextOutputFormat, which writes out records, one per line, by converting keys and values to strings and separating them with a tab character.
3.2. Input Formats
3.2.1. Input Splits and Records
- An input split is a chunk of the input that is processed by a single map.
- Each split is divided into records, and the map processes each record (a key-value pair) in turn.
[Image omitted]
- An InputSplit has a length in bytes and a set of storage locations, which are just hostname strings.
- A split doesn’t contain the input data; it is just a reference to the data.
- The storage locations are used by the MapReduce system to place map tasks as close to the split’s data as possible.
- The size is used to order the splits so that the largest get processed first.
- An InputFormat is responsible for creating the input splits and dividing them into records.
[Image omitted]
- The JobClient calls the getSplits() method, passing the desired number of map tasks as the numSplits argument.
- Having calculated the splits, the client sends them to the jobtracker, which uses their storage locations to schedule map tasks to process them on the tasktrackers.
- On a tasktracker, the map task passes the split to the getRecordReader() method on InputFormat to obtain a RecordReader for that split.
- A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function.
[Image omitted]
- The same key and value objects are used on each invocation of the map() method; only their contents are changed. If you need to keep a reference to a key or value outside the map() call, make a copy of the object you want to hold on to.
3.2.2. FileInputFormat
- FileInputFormat is the base class for all implementations of InputFormat that use files as their data source.
- It provides two things: a place to define which files are included as the input to a job, and an implementation for generating splits for the input files.
[Image omitted]
- FileInputFormat input paths may represent a file, a directory, or, by using a glob, a collection of files and directories.
[Image omitted]
- To exclude certain files from the input, you can set a filter using the setInputPathFilter() method on FileInputFormat.
[Image omitted]
- FileInputFormat splits only large files. Here “large” means larger than an HDFS block.
- Properties for controlling split size
  - The minimum split size is usually 1 byte. By setting it to a value larger than the block size, you can force splits to be larger than a block.
  - The maximum split size defaults to the maximum value that can be represented by a Java long type. It has an effect only when it is less than the block size, forcing splits to be smaller than a block.
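The way these two properties interact with the block size can be summarized by a single expression. The sketch below states the rule as max(minimumSize, min(maximumSize, blockSize)), which is consistent with the behaviour described above; the class and parameter names are hypothetical.

```java
final class SplitSizeSketch {
    // The block size wins unless the maximum is smaller or the minimum is larger.
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 64 * mb;
        // Defaults: minimum of 1 byte, maximum of Long.MAX_VALUE.
        System.out.println(splitSize(1, Long.MAX_VALUE, block) / mb);        // 64
        // A minimum above the block size forces larger splits.
        System.out.println(splitSize(128 * mb, Long.MAX_VALUE, block) / mb); // 128
        // A maximum below the block size forces smaller splits.
        System.out.println(splitSize(1, 32 * mb, block) / mb);               // 32
    }
}
```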
Small files and CombineFileInputFormat
- Hadoop works better with a small number of large files than a large number of small files.
- Where FileInputFormat creates a split per file, CombineFileInputFormat packs many files into each split so that each mapper has more to process.
- One technique for avoiding the many small files case is to merge small files into larger files by using a SequenceFile: the keys can act as filenames and the values as file contents.
3.2.3. Text Input
- TextInputFormat is the default InputFormat.
  - Each record is a line of input.
  - The key, a LongWritable, is the byte offset within the file of the beginning of the line.
  - The value is the contents of the line, excluding any line terminators, and is packaged as a Text object.
- The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks.
- A single file is broken into lines, and the line boundaries do not correspond with the HDFS block boundaries.
- Splits honor logical record boundaries. In the example shown in the figure:
  - The first split contains line 5, even though it spans the first and second block.
  - The second split starts at line 6.
- As a consequence, data-local maps will perform some remote reads.
[Image: logical records and HDFS blocks for TextInputFormat]
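One way a reader can honor line boundaries when a split starts or ends in the middle of a line is sketched below. This is a simplified model over an in-memory byte array, not Hadoop’s LineRecordReader: a reader whose split does not start at offset 0 skips forward to the first line that begins inside its range, and every reader finishes the last line it started even if that means reading past the end of its split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SplitLinesSketch {
    // Returns the lines that belong to the split covering bytes [start, end) of data.
    static List<String> linesForSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip a partial first line; the reader of the previous split handles it.
            // If the previous byte is a newline, a line begins exactly at 'start'.
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        List<String> lines = new ArrayList<>();
        // A line is ours if it STARTS before 'end', wherever it finishes.
        while (pos < end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++; // may run past 'end': this is the "remote read"
            }
            lines.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 5 falls in the middle of the line "bbbb".
        System.out.println(linesForSplit(data, 0, 5));  // [aa, bbbb]
        System.out.println(linesForSplit(data, 5, 11)); // [cc]
    }
}
```

Each line is produced by exactly one split, which is the property the notes describe.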
KeyValueTextInputFormat
- It is common for each line in a file to be a key-value pair, separated by a delimiter such as a tab character.
- You can specify the separator via the key.value.separator.in.input.line property.
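The splitting of a line into key and value can be sketched as below. The helper is hypothetical; in particular, treating a line without a separator as a key with an empty value is an assumption of this sketch.

```java
final class KeyValueLineSketch {
    // Splits a line at the FIRST occurrence of the separator into {key, value}.
    static String[] split(String line, char separator) {
        int i = line.indexOf(separator);
        if (i < 0) {
            // Assumption: no separator means the whole line is the key.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, i), line.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("line1\tOn the top\tof the hill", '\t');
        System.out.println(kv[0]); // line1
        System.out.println(kv[1]); // On the top<TAB>of the hill (later separators stay in the value)
    }
}
```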
NLineInputFormat
- If you want your mappers to receive a fixed number of lines of input, then NLineInputFormat is the InputFormat to use.
- As with TextInputFormat, the keys are the byte offsets within the file and the values are the lines themselves.
- N refers to the number of lines of input that each mapper receives.
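The effect of N on how input is divided among mappers can be illustrated with a short grouping routine. This is only a model of the idea with hypothetical names; it works on an in-memory list of lines rather than on files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NLineSketch {
    // Groups the input lines into chunks of n lines; each chunk feeds one mapper.
    static List<List<String>> splits(List<String> lines, int n) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int to = Math.min(i + n, lines.size()); // the last chunk may be shorter
            result.add(new ArrayList<String>(lines.subList(i, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(splits(lines, 2)); // [[a, b], [c, d], [e]]
    }
}
```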
3.2.4. Binary Input
SequenceFileInputFormat
- Hadoop’s sequence file format stores sequences of binary key-value pairs.
- To use data from sequence files as the input to MapReduce, you use SequenceFileInputFormat.
- The keys and values are determined by the sequence file, and you need to make sure that your map input types correspond.
- For example, if your sequence file has IntWritable keys and Text values, then the map signature would be Mapper<IntWritable, Text, K, V>.
SequenceFileAsTextInputFormat
- SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects.
SequenceFileAsBinaryInputFormat
- SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects.
- They are encapsulated as BytesWritable objects.
SequenceFile
- Writing a SequenceFile
  - To create a SequenceFile, use one of its createWriter() static methods, which returns a SequenceFile.Writer instance.
  - Specify a stream to write to (either an FSDataOutputStream or a FileSystem and Path pairing), a Configuration object, and the key and value types.
  - Once you have a SequenceFile.Writer, you write key-value pairs using the append() method.
  - When you have finished, call the close() method.
[Image omitted]
- Reading a SequenceFile
  - Reading sequence files from beginning to end is a matter of creating an instance of SequenceFile.Reader and iterating over records by repeatedly invoking one of the next() methods.
[Image omitted]
- The SequenceFile Format
  - A sequence file consists of a header followed by one or more records.
  - The first three bytes of a sequence file are the bytes SEQ, which act as a magic number, followed by a single byte representing the version number.
  - The header contains other fields, including the names of the key and value classes, compression details, user-defined metadata, and the sync marker.
  - The sync marker is used to allow a reader to synchronize to a record boundary from any position in the file.
[Image: the structure of a sequence file]
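The first four bytes described above are enough to recognize a sequence file. The following sketch checks the SEQ magic number and extracts the version byte; it does not parse the rest of the header, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class SeqHeaderSketch {
    // Returns the version number if the bytes start with the SEQ magic, otherwise -1.
    static int version(byte[] header) {
        if (header.length < 4
                || header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return -1;
        }
        return header[3] & 0xFF; // the fourth byte is the version, read as unsigned
    }

    public static void main(String[] args) {
        System.out.println(version(new byte[] { 'S', 'E', 'Q', 6 }));             // 6
        System.out.println(version("text".getBytes(StandardCharsets.US_ASCII))); // -1
    }
}
```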
3.2.5. Multiple Inputs
- The MultipleInputs class allows you to specify the InputFormat and Mapper to use on a per-path basis.
[Image omitted]
3.3. Output Formats
3.3.1. Text Output
- The default output format, TextOutputFormat, writes records as lines of text.
- Its keys and values may be of any type, since TextOutputFormat turns them into strings by calling toString() on them.
- Each key-value pair is separated by a tab character, although that may be changed using the mapred.textoutputformat.separator property.
3.3.2. Binary Output
- SequenceFileOutputFormat
- SequenceFileAsBinaryOutputFormat
- MapFileOutputFormat
Writing a MapFile
- You create an instance of MapFile.Writer, then call the append() method to add entries in order.
- Keys must be instances of WritableComparable, and values must be Writable.
[Image omitted]
- If we look at the MapFile, we see it’s actually a directory containing two files called data and index:
[Image omitted]
- Both files are SequenceFiles. The data file contains all of the entries, in order:
[Image omitted]
- The index file contains a fraction of the keys, together with a mapping from each of those keys to its offset in the data file:
[Image omitted]
Reading a MapFile
- You create a MapFile.Reader, then call the next() method until it returns false.
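The purpose of an index that holds only a fraction of the keys can be shown with a toy lookup. This is a sketch under stated assumptions, not the MapFile implementation: the sorted data file is modelled as two parallel arrays, array positions stand in for byte offsets, and the index interval is a hypothetical constructor parameter.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseIndexSketch {
    private final String[] keys;   // sorted keys, standing in for the "data" file
    private final String[] values; // values in the same order as the keys
    // The "index" file: every interval-th key mapped to its position in the data.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    SparseIndexSketch(String[] sortedKeys, String[] values, int interval) {
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.length; i += interval) {
            index.put(sortedKeys[i], i);
        }
    }

    // Looks up a key: jump to the closest indexed key at or before it, then scan forward.
    String get(String key) {
        Map.Entry<String, Integer> start = index.floorEntry(key);
        if (start == null) {
            return null; // the key sorts before every entry
        }
        for (int i = start.getValue(); i < keys.length; i++) {
            int cmp = keys[i].compareTo(key);
            if (cmp == 0) {
                return values[i];
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SparseIndexSketch file = new SparseIndexSketch(
                new String[] { "a", "b", "c", "d", "e" },
                new String[] { "1", "2", "3", "4", "5" }, 2);
        System.out.println(file.get("d"));  // 4 (found by scanning from indexed key "c")
        System.out.println(file.get("bb")); // null
    }
}
```

Keeping only every few keys in the index keeps it small enough to hold in memory, at the cost of a short sequential scan per lookup.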
3.3.3. Multiple Outputs
MultipleOutputFormat
- MultipleOutputFormat allows you to write data to multiple files whose names are derived from the output keys and values.
  - conf.setOutputFormat(StationNameMultipleTextOutputFormat.class);
[Image omitted]
MultipleOutputs
- MultipleOutputs can emit different types for each output.
[Image omitted]
4. Developing a MapReduce Application
4.1. The Configuration API
- An instance of the Configuration class (found in the org.apache.hadoop.conf package) represents a collection of configuration properties and their values.
- Configurations read their properties from resources, which are XML files.
[Image omitted]
- We can access its properties using a piece of code like this:
[Image omitted]
4.2. Configuring the Development Environment
4.2.1. Managing Configuration
- When developing Hadoop applications, it is common to switch between running the application locally and running it on a cluster.
- hadoop-local.xml
[Image omitted]
- hadoop-localhost.xml
[Image omitted]
- hadoop-cluster.xml
[Images omitted]
- With this setup, it is easy to use any configuration with the -conf command-line switch.
- For example, the following command shows a directory listing on the HDFS server running in pseudo-distributed mode on localhost:
[Image omitted]
4.2.2. GenericOptionsParser, Tool, and ToolRunner
[Images omitted]
5. How MapReduce Works
5.1. Anatomy of a MapReduce Job Run
- There are four independent entities:
  - The client, which submits the MapReduce job.
  - The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
  - The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
  - The distributed filesystem, which is used for sharing job files between the other entities.
[Image: how Hadoop runs a MapReduce job]
5.1.1. Job Submission
- The runJob() method on JobClient creates a new JobClient instance and calls submitJob() on it.
- Having submitted the job, runJob() polls the job’s progress once a second and reports the progress to the console if it has changed since the last report.
- When the job is complete, if it was successful, the job counters are displayed. Otherwise, the error that caused the job to fail is logged to the console.
The job submission process
- Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker).
- Checks the output specification of the job.
- Computes the input splits for the job.
- Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the jobtracker’s filesystem in a directory named after the job ID.
- Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker).
5.1.2. Job Initialization
- When the JobTracker receives a call to its submitJob() method, it puts the job into an internal queue from where the job scheduler will pick it up and initialize it.
- Initialization involves creating an object to represent the job being run, which encapsulates its tasks, and bookkeeping information to keep track of the tasks’ status and progress.
- To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem.
- It then creates one map task for each split.
- Tasks are given IDs at this point.
5.1.3. Task Assignment
- Tasktrackers run a simple loop that periodically sends heartbeat method calls to the jobtracker.
- As part of the heartbeat, a tasktracker indicates whether it is ready to run a new task. If it is, the jobtracker allocates it a task, which it communicates to the tasktracker using the heartbeat return value.
- Before it can choose a task for the tasktracker, the jobtracker must choose a job to select the task from. With the default scheduler, jobs are chosen by priority (set with setJobPriority()) and, within a priority, in FIFO order.
- Tasktrackers have a fixed number of slots for map tasks and for reduce tasks.
- The default scheduler fills empty map task slots before reduce task slots.
- To choose a reduce task, the jobtracker simply takes the next in its list of yet-to-be-run reduce tasks, since there are no data locality considerations.
5.1.4. Task Execution
- Now that the tasktracker has been assigned a task, the next step is for it to run the task.
- First, it localizes the job JAR by copying it from the shared filesystem to the tasktracker’s filesystem.
- It also copies any files needed by the application from the distributed cache to the local disk.
- Second, it creates a local working directory for the task and un-jars the contents of the JAR into this directory.
- Third, it creates an instance of TaskRunner to run the task.
- TaskRunner launches a new Java Virtual Machine to run each task in.
- It is, however, possible to reuse the JVM between tasks.
- The child process communicates with its parent through the umbilical interface.
5.1.5. Job Completion
- When the jobtracker receives a notification that the last task for a job is complete, it changes the status for the job to “successful.”
- Then, when the JobClient polls for status, it learns that the job has completed successfully, so it prints a message to tell the user and then returns from the runJob() method.
[Image omitted]
5.2. Failures
5.2.1. Task Failure
- The most common way for a task to fail is when user code in the map or reduce task throws a runtime exception.
  - The child JVM reports the error back to its parent tasktracker before it exits.
  - The error ultimately makes it into the user logs.
  - The tasktracker marks the task attempt as failed, freeing up a slot to run another task.
- Another failure mode is the sudden exit of the child JVM.
  - The tasktracker notices that the process has exited and marks the attempt as failed.
- Hanging tasks are dealt with differently.
  - The tasktracker notices that it hasn’t received a progress update for a while and proceeds to mark the task as failed.
  - The child JVM process is automatically killed after this timeout period.
- When the jobtracker is notified of a task attempt that has failed (by the tasktracker’s heartbeat call), it reschedules execution of the task.
  - The jobtracker tries to avoid rescheduling the task on a tasktracker where it has previously failed.
  - If a task fails four times (the default maximum number of attempts), it is not retried further.
5.2.2. Tasktracker Failure
- If a tasktracker fails by crashing or running very slowly, it will stop sending heartbeats to the jobtracker (or send them very infrequently).
- The jobtracker will notice a tasktracker that has stopped sending heartbeats and remove it from its pool of tasktrackers to schedule tasks on.
- The jobtracker arranges for map tasks that were run and completed successfully on that tasktracker to be rerun if they belong to incomplete jobs, since their intermediate output residing on the failed tasktracker’s local filesystem may not be accessible to the reduce task. Any tasks in progress are also rescheduled.
5.2.3. Jobtracker Failure
5.3. Shuffle and Sort
[Image: shuffle and sort in MapReduce]
5.3.1. The Map Side
- When the map function starts producing output, it is not simply written to disk.
- Each map task has a circular memory buffer that it writes the output to.
- When the contents of the buffer reach a certain threshold size, a background thread starts to spill the contents to disk.
- Spills are written in round-robin fashion to the directories specified by the mapred.local.dir property.
- Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to.
- Within each partition, the background thread performs an in-memory sort by key.
- Each time the memory buffer reaches the spill threshold, a new spill file is created, so after the map task has written its last output record there could be several spill files.
- Before the task is finished, the spill files are merged into a single partitioned and sorted output file.
- The output file’s partitions are made available to the reducers over HTTP.
- The number of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads property.
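The two things that happen to buffered map output before it is spilled (partitioning by target reducer, then sorting by key within each partition) can be sketched as follows. This is a simplified model that ignores values, the circular buffer, and the files themselves; the names are hypothetical and a hash-based partition function is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SpillSketch {
    // Divides buffered keys into partitions (one per reducer), then sorts each partition.
    static Map<Integer, List<String>> spill(List<String> bufferedKeys, int numReducers) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : bufferedKeys) {
            // Assumed hash-based partition function; the sign bit is masked off.
            int p = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(key);
        }
        for (List<String> keys : partitions.values()) {
            Collections.sort(keys); // in-memory sort by key within the partition
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hash codes: "a"=97, "b"=98, "c"=99, "d"=100, so with two reducers
        // "b" and "d" go to partition 0 and "a" and "c" go to partition 1.
        System.out.println(spill(Arrays.asList("d", "c", "b", "a"), 2));
        // {0=[b, d], 1=[a, c]}
    }
}
```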
5.3.2. The Reduce Side
- As map tasks complete successfully, they notify their parent tasktracker of the status update, which in turn notifies the jobtracker.
- For a given job, the jobtracker knows the mapping between map outputs and tasktrackers.
- A thread in the reducer periodically asks the jobtracker for map output locations until it has retrieved them all.
- The reduce task needs the map output for its particular partition from several map tasks across the cluster.
- The map tasks may finish at different times, so the reduce task starts copying their outputs as soon as each completes. This is known as the copy phase of the reduce task.
- The reduce task has a small number of copier threads so that it can fetch map outputs in parallel.
- As the copies accumulate on disk, a background thread merges them into larger, sorted files.
- When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs, maintaining their sort ordering.
- During the reduce phase, the reduce function is invoked for each key in the sorted output. The output of this phase is written directly to the output filesystem, typically HDFS.
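Since each map output arrives already sorted, the reduce side only needs to merge. A minimal sketch of such a multi-way merge, using a priority queue over the head of each sorted run, is shown below. It models runs as in-memory lists of keys and is not the Hadoop merge code; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class MergeSketch {
    // Merges several individually sorted runs into one sorted list.
    static List<String> merge(List<List<String>> sortedRuns) {
        // Each queue entry is {run index, position in run}, ordered by the key it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> sortedRuns.get(a[0]).get(a[1])
                        .compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) {
                heads.add(new int[] { r, 0 });
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll(); // smallest remaining key across all runs
            List<String> run = sortedRuns.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] { head[0], head[1] + 1 }); // advance within that run
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("a", "d"),
                Arrays.asList("b", "c"),
                new ArrayList<String>());
        System.out.println(merge(runs)); // [a, b, c, d]
    }
}
```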