Hadoop (1): Getting Started

This post covers JDK setup and a single-node (pseudo-distributed) Hadoop deployment.

Posted by PsycheLee on 2015-10-08


Create the user and directories

[root@hadoop001 ~]# useradd bigdata
[root@hadoop001 ~]# id bigdata
uid=1000(bigdata) gid=1000(bigdata) groups=1000(bigdata)

[root@hadoop001 ~]# su - bigdata
[bigdata@hadoop001 ~]$ mkdir tmp sourcecode software shell log lib data app

Install and deploy the JDK
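A minimal sketch of the JDK deployment, assuming the Oracle JDK 8 tarball jdk-8u181-linux-x64.tar.gz (name and location are assumptions) has already been copied to the server; the target path matches the JAVA_HOME configured later:

[root@hadoop001 ~]# mkdir -p /usr/java
[root@hadoop001 ~]# tar -xzvf jdk-8u181-linux-x64.tar.gz -C /usr/java/   # assumed tarball name
[root@hadoop001 ~]# chown -R root:root /usr/java/jdk1.8.0_181            # fix ownership after extraction
[root@hadoop001 ~]# /usr/java/jdk1.8.0_181/bin/java -version             # verify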

Extract Hadoop and create a symlink

[bigdata@hadoop001 software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
[bigdata@hadoop001 app]$ ll
total 4
drwxr-xr-x 14 bigdata bigdata 4096 Jun 3 2019 hadoop-2.6.0-cdh5.16.2
[bigdata@hadoop001 app]$ ln -s hadoop-2.6.0-cdh5.16.2 hadoop
[bigdata@hadoop001 app]$ ll
total 4
lrwxrwxrwx 1 bigdata bigdata 23 Nov 28 10:28 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 bigdata bigdata 4096 Nov 28 10:29 hadoop-2.6.0-cdh5.16.2

Why use a symlink

1. Version switching
/home/hadoop/app/hadoop
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
Without the symlink, an upgrade (say, from 2.x to 3.x) means carefully checking and editing every script that hard-codes the versioned path.

With the symlink set up in advance, scripts only ever reference hadoop and never care which version sits behind it; a sketch of such an upgrade follows.
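A minimal sketch of switching versions through the symlink (the 3.x tarball and directory names are hypothetical):

[bigdata@hadoop001 app]$ tar -xzvf hadoop-3.0.0.tar.gz -C .    # hypothetical new release
[bigdata@hadoop001 app]$ rm hadoop                             # removes only the symlink itself
[bigdata@hadoop001 app]$ ln -s hadoop-3.0.0 hadoop             # repoint; scripts keep using app/hadoop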

2. Swapping a small disk for a big one
Suppose the root (/) disk is only 20G and the /app/log/hadoop-hdfs directory has grown to 18G, while /data01 is a larger mounted disk. Move the directory and symlink the old path back:

mv /app/log/hadoop-hdfs /data01/                 # now lives at /data01/hadoop-hdfs
ln -s /data01/hadoop-hdfs /app/log/hadoop-hdfs   # the old path keeps working

Configure environment variables

[bigdata@hadoop001 app]$ cd hadoop/etc/hadoop
[bigdata@hadoop001 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_181
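A note on why this is set here rather than relying on the shell: the Hadoop scripts start daemons over SSH, and a non-interactive SSH session may not load your login profile, so hadoop-env.sh must point at the JDK explicitly.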

Configure passwordless SSH

# Edit the hosts file. The first two lines must not be changed; 172.22.212.16 is the internal IP
[root@hadoop001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.22.212.16 hadoop001 hadoop001
# Remove any existing .ssh directory first; in production it is safer to mv it aside (rename) instead of deleting
[bigdata@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/bigdata/.ssh/id_rsa):
Created directory '/home/bigdata/.ssh'.
Enter passphrase (empty for no passphrase): # just press Enter
Enter same passphrase again: # just press Enter
Your identification has been saved in /home/bigdata/.ssh/id_rsa.
Your public key has been saved in /home/bigdata/.ssh/id_rsa.pub.
The key fingerprint is:
f7:00:3a:37:09:b0:24:97:eb:d2:6d:82:35:27:fd:6d bigdata@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
| . +. |
| +.o |
| .o. . |
| = oo o |
| = =o.S.o |
| o + oo.oEo |
| . o . . |
| |
| |
+-----------------+
[bigdata@hadoop001 ~]$ cd .ssh
[bigdata@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[bigdata@hadoop001 .ssh]$ chmod 0600 ~/.ssh/authorized_keys
### verify
[bigdata@hadoop001 .ssh]$ ssh hadoop001 date
The authenticity of host 'hadoop001 (172.22.212.16)' can't be established.
ECDSA key fingerprint is d1:f6:16:e4:8b:e8:86:68:1f:75:ab:8f:1b:03:4f:4b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,172.22.212.16' (ECDSA) to the list of known hosts.
Sat Nov 28 10:37:51 CST 2020
[bigdata@hadoop001 .ssh]$ ssh hadoop001 date
Sat Nov 28 10:37:54 CST 2020

Modify the configuration so that all three HDFS daemons (nn, dn, snn) start on hadoop001

NameNode (nn) configuration

[bigdata@hadoop001 ~]$ cd app/hadoop/etc/hadoop
[bigdata@hadoop001 hadoop]$ vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/tmp/</value>
    </property>
</configuration>

SecondaryNameNode (snn) configuration

[bigdata@hadoop001 hadoop]$ vi hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop001:9868</value>
    </property>

    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>hadoop001:9869</value>
    </property>
</configuration>

DataNode (dn) configuration

[bigdata@hadoop001 hadoop]$ vi slaves
hadoop001

Format the NameNode

[bigdata@hadoop001 hadoop]$ pwd
/home/bigdata/app/hadoop
[bigdata@hadoop001 hadoop]$ bin/hdfs namenode -format
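Format only once: bin/hdfs namenode -format initializes the NameNode metadata under hadoop.tmp.dir and generates a fresh clusterID, so re-running it on an existing cluster leaves the DataNodes with a mismatched clusterID and they will fail to register.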

Start HDFS

[bigdata@hadoop001 hadoop]$ pwd
/home/bigdata/app/hadoop
[bigdata@hadoop001 hadoop]$ sbin/start-dfs.sh
20/11/28 11:02:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-namenode-hadoop001.out
hadoop001: starting datanode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-secondarynamenode-hadoop001.out
20/11/28 11:02:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[bigdata@hadoop001 hadoop]$ jps
17733 Jps
17624 SecondaryNameNode # snn: the second-in-command; by default it checkpoints the boss's metadata once an hour
17465 DataNode # dn: the worker that actually stores the data blocks
17340 NameNode # nn: the boss that decides where data blocks are stored
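Beyond jps, a quick sanity check with standard HDFS commands (a sketch; run from the same directory):

[bigdata@hadoop001 hadoop]$ bin/hdfs dfsadmin -report   # the DataNode should be listed as live, with capacity figures
[bigdata@hadoop001 hadoop]$ bin/hdfs dfs -ls /          # empty for now, but proves the NameNode is answering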

Web UI

http://106.14…23.129:50070/dfshealth.html#tab-overview

  • Port 50070 must be opened in the cloud security group

  • Changing the default port 50070 to 50071:

    [bigdata@hadoop001 hadoop]$ vi hdfs-site.xml
    ## add one more property
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop001:50071</value>
    </property>
    [bigdata@hadoop001 hadoop]$ stop-dfs.sh
    [bigdata@hadoop001 hadoop]$ start-dfs.sh

Create HDFS directories

[root@hadoop001 ~]# vi /etc/profile
export HADOOP_HOME=/home/bigdata/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
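Reload the profile so the new variables take effect in the current shell:

[bigdata@hadoop001 ~]$ source /etc/profile
[bigdata@hadoop001 ~]$ which hdfs   # should resolve to /home/bigdata/app/hadoop/bin/hdfs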
[bigdata@hadoop001 ~]$ hdfs dfs -mkdir /user
[bigdata@hadoop001 ~]$ hdfs dfs -ls /
20/11/28 11:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - bigdata supergroup 0 2020-11-28 11:58 /user
[bigdata@hadoop001 hadoop]$ hdfs dfs -mkdir -p /user/bigdata/input

Example

Prepare input files and upload them

[bigdata@hadoop001 hadoop]$ mkdir input
[bigdata@hadoop001 hadoop]$ cd input/
[bigdata@hadoop001 input]$ vi 1.log
[bigdata@hadoop001 input]$ vi 2.log
# upload
[bigdata@hadoop001 hadoop]$ hdfs dfs -put input /user/bigdata/input
20/11/28 12:14:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[bigdata@hadoop001 hadoop]$ hdfs dfs -ls
20/11/28 12:14:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - bigdata supergroup 0 2020-11-28 12:14 input

Run the example jar

[bigdata@hadoop001 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar grep input output 'dfs[a-z.]+'
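The grep example submits two MapReduce jobs (a search job, then a sort job) and writes each regex match with its count to output. A quick peek without downloading (standard HDFS command):

[bigdata@hadoop001 hadoop]$ hdfs dfs -cat output/part-r-00000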

Check the results

[bigdata@hadoop001 hadoop]$ hdfs dfs -get output output # download
[bigdata@hadoop001 hadoop]$ cd output/
[bigdata@hadoop001 output]$ ll
total 4
-rw-r--r-- 1 bigdata bigdata 10 Nov 28 12:16 part-r-00000
-rw-r--r-- 1 bigdata bigdata 0 Nov 28 12:16 _SUCCESS
[bigdata@hadoop001 output]$ cat part-r-00000
3 dfsssss

Notes

Storing data under /tmp/hadoop-hadoop is a bad idea: files and directories under /tmp that go unaccessed for 30 days are purged by the system's cleanup rules, so never keep production data in /tmp.

[bigdata@hadoop001 tmp]$ mv /tmp/hadoop-hadoop/* /home/hadoop/tmp/
[bigdata@hadoop001 tmp]$ ll /home/hadoop/tmp/
total 0
drwxrwxr-x 5 bigdata bigdata 48 Nov 21 21:30 dfs
drwxrwxr-x 4 bigdata bigdata 32 Nov 21 21:56 mapred

Add to core-site.xml:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/bigdata/tmp</value>
</property>

Restart DFS:

stop-dfs.sh
start-dfs.sh

The official defaults for the namenode, datanode, and checkpoint (secondarynamenode) directories all hang off this temp dir:
dfs.namenode.name.dir --> file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir --> file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir --> file://${hadoop.tmp.dir}/dfs/namesecondary

So once hadoop.tmp.dir is changed to /home/bigdata/tmp, the namenode, datanode, and checkpoint storage locations all move with it.
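A quick way to confirm where those paths resolve (hdfs getconf is a standard utility; the output shown assumes the config above):

[bigdata@hadoop001 ~]$ hdfs getconf -confKey dfs.namenode.name.dir
file:///home/bigdata/tmp/dfs/name
[bigdata@hadoop001 ~]$ hdfs getconf -confKey dfs.datanode.data.dir
file:///home/bigdata/tmp/dfs/data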