Running Hadoop in Local Mode on Linux

Posted: 2012-04-25 17:32:31

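This turned out to be different from running Hadoop on Windows under Cygwin's emulated Linux environment. On Linux, if your permissions are insufficient, Hadoop simply will not run.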

Running as root, of course, works fine. Here is what my run looked like. I am using Hadoop 0.18.0.

First, edit the Hadoop configuration file hadoop-env.sh and set JAVA_HOME:
 

# The java implementation to use. Required.
export JAVA_HOME="/usr/java/jdk1.6.0_07"
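Before starting Hadoop, it is worth confirming that the JAVA_HOME path really points at a JDK. The sketch below is not part of Hadoop. The function name check_java_home is hypothetical, and the function only checks that `$1/bin/java` exists and is executable.

```shell
# Hypothetical helper (not part of Hadoop): succeed only if the given
# directory looks like a JDK, i.e. it contains an executable bin/java.
check_java_home() {
    home="$1"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is empty" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable java under $home/bin" >&2
        return 1
    fi
    return 0
}

# Example: check the path used in hadoop-env.sh above.
# check_java_home /usr/java/jdk1.6.0_07 && echo "JAVA_HOME looks usable"
```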

Next, switch to the root user and log in to 127.0.0.1 (localhost) via ssh:
 

[shirdrn@shirdrn hadoop-0.18.0]$ su root
Password:
[root@shirdrn hadoop-0.18.0]# ssh localhost
root@localhost's password:
Last login: Wed Sep 24 19:25:21 2008 from localhost.localdomain
[root@shirdrn ~]#

Then prepare the input data. Under the hadoop-0.18.0 directory, create a directory named my-input holding 7 TXT files whose contents are English words separated by spaces.
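To reproduce this setup, something like the following creates a my-input directory with 7 text files of space-separated words. The file names a.txt through g.txt match the ones that appear in the job log below. The words written into them are made-up placeholders, because the original files' contents are not given. The function name make_sample_input is hypothetical.

```shell
# Hypothetical helper: create my-input/ with 7 small text files
# (a.txt ... g.txt) of space-separated English words.
# The words are illustrative placeholders, not the original data.
make_sample_input() {
    dir="${1:-my-input}"
    mkdir -p "$dir"
    for name in a b c d e f g; do
        printf 'apache bash apache bash shirdrn\n' > "$dir/$name.txt"
    done
}

# Run from inside the hadoop-0.18.0 directory:
# make_sample_input my-input
```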

After that, change to the hadoop-0.18.0 directory and run the WordCount example, which counts word frequencies:
 

[root@shirdrn hadoop-0.18.0]# bin/hadoop jar hadoop-0.18.0-examples.jar wordcount my-input my-output

The run proceeds as follows:
 

08/09/25 16:32:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.JobClient: Running job: job_local_0001
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:41 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:42 INFO mapred.JobClient: map 0% reduce 0%
08/09/25 16:32:44 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:44 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:45 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:45 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:45 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:45 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:45 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:45 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/e.txt:0+1957
08/09/25 16:32:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
08/09/25 16:32:45 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000000_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:46 INFO mapred.JobClient: map 100% reduce 0%
08/09/25 16:32:46 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:46 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:46 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:46 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:46 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:46 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:46 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:46 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/a.txt:0+1957
08/09/25 16:32:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
08/09/25 16:32:46 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000001_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:47 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:47 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:47 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:47 INFO mapred.MapTask: bufstart = 0; bufend = 16845; bufvoid = 99614720
08/09/25 16:32:47 INFO mapred.MapTask: kvstart = 0; kvend = 1684; length = 327680
08/09/25 16:32:47 INFO mapred.MapTask: Index: (0, 42, 42)
08/09/25 16:32:47 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:47 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/b.txt:0+10109
08/09/25 16:32:47 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
08/09/25 16:32:47 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000002_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:47 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:47 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:48 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:48 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:48 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:48 INFO mapred.MapTask: bufstart = 0; bufend = 3312; bufvoid = 99614720
08/09/25 16:32:48 INFO mapred.MapTask: kvstart = 0; kvend = 331; length = 327680
08/09/25 16:32:48 INFO mapred.MapTask: Index: (0, 72, 72)
08/09/25 16:32:48 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:48 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/d.txt:0+1987
08/09/25 16:32:48 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
08/09/25 16:32:48 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000003_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:48 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:48 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:49 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:49 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:49 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/g.txt:0+1957
08/09/25 16:32:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
08/09/25 16:32:49 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000004_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:49 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:49 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/c.txt:0+1957
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000005_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000005_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:50 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:50 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:50 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:50 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:50 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:50 INFO mapred.MapTask: bufstart = 0; bufend = 3306; bufvoid = 99614720
08/09/25 16:32:50 INFO mapred.MapTask: kvstart = 0; kvend = 330; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 50, 50)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/shirdrn/hadoop-0.18.0/my-input/f.txt:0+1985
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000006_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000006_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.ReduceTask: Initiating final on-disk merge with 7 files
08/09/25 16:32:51 INFO mapred.Merger: Merging 7 sorted segments
08/09/25 16:32:51 INFO mapred.Merger: Down to the last merge-pass, with 7 segments left of total size: 268 bytes
08/09/25 16:32:51 INFO mapred.LocalJobRunner: reduce > reduce
08/09/25 16:32:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
08/09/25 16:32:51 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/shirdrn/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.JobClient: Job complete: job_local_0001
08/09/25 16:32:51 INFO mapred.JobClient: Counters: 11
08/09/25 16:32:51 INFO mapred.JobClient:   File Systems
08/09/25 16:32:51 INFO mapred.JobClient:     Local bytes read=953869
08/09/25 16:32:51 INFO mapred.JobClient:     Local bytes written=961900
08/09/25 16:32:51 INFO mapred.JobClient:   Map-Reduce Framework
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce input groups=7
08/09/25 16:32:51 INFO mapred.JobClient:     Combine output records=21
08/09/25 16:32:51 INFO mapred.JobClient:     Map input records=7
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce output records=7
08/09/25 16:32:51 INFO mapred.JobClient:     Map output bytes=36511
08/09/25 16:32:51 INFO mapred.JobClient:     Map input bytes=21909
08/09/25 16:32:51 INFO mapred.JobClient:     Combine input records=3649
08/09/25 16:32:51 INFO mapred.JobClient:     Map output records=3649
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce input records=21

Finally, look at the result:
 

[root@shirdrn hadoop-0.18.0]# cat my-output/part-00000
apache 1826
baketball       1
bash    1813
fax     2
find    1
hash    1
shirdrn 5
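Two quick sanity checks can be run against this output with standard Unix tools. Both function names are hypothetical.

- **Recompute the counts locally.** This splits on whitespace, roughly like the StringTokenizer the WordCount example uses, and sorts keys in byte order as Hadoop does. Handling of unusual characters such as `\r` may differ.
- **Sum the counts in part-00000.** For the output above, 1826 + 1 + 1813 + 2 + 1 + 1 + 5 = 3649. That equals the job's "Map output records=3649" counter, since each word emitted by a map task is counted exactly once.

```shell
# local_wordcount: recompute word frequencies for all files in a directory
# without Hadoop. Words are split on blanks/tabs; the result is printed as
# "word<TAB>count" in byte order (LC_ALL=C), like part-00000.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$1"/* \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# total_count: sum the last column of a WordCount output file.
# The total should equal the job's "Map output records" counter.
total_count() {
    awk '{ sum += $NF } END { print sum + 0 }' "$1"
}

# Example, from the hadoop-0.18.0 directory:
# local_wordcount my-input | diff - my-output/part-00000 && echo "counts match"
# total_count my-output/part-00000
```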

After the run finishes, you will find two directories under /tmp: hadoop-root and hsperfdata_root. hsperfdata_root is empty, but hadoop-root contains many temporary directories and files:
 

[root@shirdrn /]# ls -l -R tmp/hadoop-root
tmp/hadoop-root:
total 8
drwxr-xr-x 4 root root 4096 09-24 19:43 mapred

tmp/hadoop-root/mapred:
total 16
drwxr-xr-x 4 root root 4096 09-24 19:43 local
drwxr-xr-x 2 root root 4096 09-25 16:32 system

tmp/hadoop-root/mapred/local:
total 16
drwxr-xr-x 2 root root 4096 09-25 16:32 localRunner
drwxr-xr-x 3 root root 4096 09-24 19:43 taskTracker

tmp/hadoop-root/mapred/local/localRunner:
total 8
-rw-r--r-- 1 root root 104 09-25 16:32 split.dta

tmp/hadoop-root/mapred/local/taskTracker:
total 8
drwxr-xr-x 3 root root 4096 09-24 19:43 jobcache

tmp/hadoop-root/mapred/local/taskTracker/jobcache:
total 8
drwxr-xr-x 10 root root 4096 09-25 16:32 job_local_0001

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001:
total 64
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000000_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000001_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000002_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000003_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000004_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000005_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_m_000006_0
drwxr-xr-x 2 root root 4096 09-25 16:32 attempt_local_0001_r_000000_0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000001_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000002_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000003_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000004_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000005_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000006_0:
total 0

tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_local_0001/attempt_local_0001_r_000000_0:
total 0

tmp/hadoop-root/mapred/system:
total 0
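The directory names come from Hadoop's defaults. As far as I know, hadoop-default.xml in this release sets three properties:

- hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}.
- mapred.local.dir defaults to ${hadoop.tmp.dir}/mapred/local.
- mapred.system.dir defaults to ${hadoop.tmp.dir}/mapred/system.

This matches the listing above. The hsperfdata_root directory is created by the JVM itself, not by Hadoop. The sketch below only prints where these directories would be for a given user. The function name is hypothetical, and the paths assume the defaults have not been overridden in the configuration.

```shell
# Hypothetical helper: print the default per-user Hadoop directories,
# assuming hadoop.tmp.dir keeps its default /tmp/hadoop-${user.name}.
hadoop_default_dirs() {
    user="${1:-$(id -un)}"
    tmp_dir="/tmp/hadoop-$user"
    echo "hadoop.tmp.dir    = $tmp_dir"
    echo "mapred.local.dir  = $tmp_dir/mapred/local"
    echo "mapred.system.dir = $tmp_dir/mapred/system"
}

# hadoop_default_dirs root       # the /tmp/hadoop-root tree listed above
# hadoop_default_dirs shiyanjun  # a separate tree for another user
```

Because the path includes the user name, each user who runs a job gets their own tree under /tmp, separate from root's.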

When not running as root, many problems come up, which I believe are permission-related.

Switch from root to the user shiyanjun:
 

[root@shirdrn /]# su shiyanjun

Set up key authentication and log in to 127.0.0.1 (localhost) via ssh:
 

[shiyanjun@shirdrn hadoop-0.18.0]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/shiyanjun/.ssh/id_rsa):
/home/shiyanjun/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/shiyanjun/.ssh/id_rsa.
Your public key has been saved in /home/shiyanjun/.ssh/id_rsa.pub.
The key fingerprint is:
76:7d:0c:8c:77:81:6c:eb:d9:7e:b2:d2:87:d0:ac:61 shiyanjun@shirdrn
[shiyanjun@shirdrn hadoop-0.18.0]$ ssh localhost
shiyanjun@localhost's password:
Last login: Wed Sep 24 16:30:17 2008
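In the transcript above, ssh localhost still asks for a password. Generating a key pair is not enough on its own: the public key must also be listed in ~/.ssh/authorized_keys. sshd typically ignores that file if it or ~/.ssh is writable by group or others. (Jobs run with the local runner, as here, do not need ssh at all. It matters for the start/stop scripts in a distributed setup.)

The sketch below assumes the default OpenSSH file names. The function name authorize_own_key is hypothetical.

```shell
# Hypothetical helper: authorize an existing public key for password-less
# ssh to localhost. Assumes the key pair was created by ssh-keygen with
# the default name id_rsa, as in the transcript above.
authorize_own_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    pub="$ssh_dir/id_rsa.pub"
    keys="$ssh_dir/authorized_keys"
    if [ ! -f "$pub" ]; then
        echo "missing $pub; run ssh-keygen first" >&2
        return 1
    fi
    # Append the key only if that exact line is not already present.
    touch "$keys"
    grep -qxF "$(cat "$pub")" "$keys" || cat "$pub" >> "$keys"
    # sshd rejects keys when these are writable by group or others.
    chmod 700 "$ssh_dir"
    chmod 600 "$keys"
}

# authorize_own_key && ssh localhost   # should no longer prompt for a password
```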

Now, with the input data files in place, run the WordCount tool:
 

[shiyanjun@shirdrn hadoop-0.18.0]$ bin/hadoop jar hadoop-0.18.0-examples.jar wordcount my-input my-output

It always fails with an exception while creating the output directory:
 

08/09/25 17:14:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/25 17:14:24 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 17:14:24 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 17:14:25 INFO mapred.JobClient: Running job: job_local_0001
08/09/25 17:14:25 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 17:14:25 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 17:14:25 ERROR mapred.LocalJobRunner: Mkdirs failed to create file:/home/shiyanjun/hadoop-0.18.0/my-output/_temporary
08/09/25 17:14:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: The directory file:/home/shiyanjun/hadoop-0.18.0/my-output/_temporary doesnt exist
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:148)
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
        at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)


To be resolved...
 

I am new to Linux and not yet comfortable with some of this configuration. If you know how to set it up, please share your advice. Thanks.

Friday, September 26, 2008

Resolved. See the article "Simple permission configuration for users newly created with useradd on Linux" (link: http://hi.baidu.com/shirdrn/blog/item/c8ba563f049213c47d1e715b.html).
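The error "Mkdirs failed to create file:/home/shiyanjun/hadoop-0.18.0/my-output/_temporary" means the job could not create its output directory under /home/shiyanjun/hadoop-0.18.0. That points to the user running bin/hadoop lacking write permission there. One way this can happen, and this is an assumption, is that the Hadoop directory was copied or unpacked as root.

The sketch below checks for this before submitting a job. It does not reproduce the linked article. The function name check_output_dir is hypothetical. The chown in the usage comment is one possible fix, run as root, if the directory turns out to be owned by root.

```shell
# Hypothetical helper: check that the current user can create the job's
# output directory. The output path (e.g. my-output) is created inside its
# parent, so the parent must exist and be writable and searchable.
check_output_dir() {
    out="$1"
    parent=$(dirname "$out")
    if [ ! -d "$parent" ]; then
        echo "parent directory $parent does not exist" >&2
        return 1
    fi
    if [ ! -w "$parent" ] || [ ! -x "$parent" ]; then
        echo "$(id -un) cannot create entries in $parent" >&2
        ls -ld "$parent" >&2
        return 1
    fi
    return 0
}

# As user shiyanjun, from /home/shiyanjun/hadoop-0.18.0:
# check_output_dir my-output || echo "e.g. as root: chown -R shiyanjun /home/shiyanjun/hadoop-0.18.0"
```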
