Hadoop (1): Getting Started
Create the user and directories
[root@hadoop001 ~]# useradd bigdata
[root@hadoop001 ~]# id bigdata
uid=1000(bigdata) gid=1000(bigdata) groups=1000(bigdata)
[root@hadoop001 ~]# su - bigdata
[bigdata@hadoop001 ~]$ mkdir tmp sourcecode software shell log lib data app
Install and deploy the JDK
Extract Hadoop and create the symlink
[bigdata@hadoop001 software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
[bigdata@hadoop001 app]$ ll
total 4
drwxr-xr-x 14 bigdata bigdata 4096 Jun  3  2019 hadoop-2.6.0-cdh5.16.2
[bigdata@hadoop001 app]$ ln -s hadoop-2.6.0-cdh5.16.2 hadoop
[bigdata@hadoop001 app]$ ll
total 4
lrwxrwxrwx  1 bigdata bigdata   23 Nov 28 10:28 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 bigdata bigdata 4096 Nov 28 10:29 hadoop-2.6.0-cdh5.16.2
Why use a symlink
1. Version switching
/home/bigdata/app/hadoop
/home/bigdata/app/hadoop-2.6.0-cdh5.16.2
To upgrade (say from 2.x to 3.x), every piece of code and every script that hard-codes the versioned path has to be carefully reviewed and changed.
But if the symlink is set up in advance, code and scripts only ever reference hadoop and never care which version sits behind it.
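With the symlink in place, an upgrade reduces to re-pointing one link. A minimal sketch (the hadoop-3.0.0 directory below is hypothetical, purely to illustrate the swap):

# unpack the new release next to the old one
[bigdata@hadoop001 app]$ tar -xzvf ../software/hadoop-3.0.0.tar.gz -C .
# re-point the symlink; -n stops ln from descending into the old link's target directory
[bigdata@hadoop001 app]$ ln -sfn hadoop-3.0.0 hadoop
# every script that references /home/bigdata/app/hadoop now runs the new version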
2. Swapping a small disk for a big one
The root (/) partition was provisioned small, say 20 GB, and the /app/log/hadoop-hdfs directory has grown to 18 GB.
/data01 is the mount point of the larger disk.
mv /app/log/hadoop-hdfs /data01/    # ==> /data01/hadoop-hdfs
ln -s /data01/hadoop-hdfs /app/log/hadoop-hdfs
Configure environment variables
[bigdata@hadoop001 app]$ cd hadoop/etc/hadoop
[bigdata@hadoop001 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_181
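A quick sanity check (assuming the JDK really is installed under /usr/java/jdk1.8.0_181 as above): any Hadoop command fails immediately if JAVA_HOME points at a non-existent directory.

[bigdata@hadoop001 hadoop]$ cd ~/app/hadoop
[bigdata@hadoop001 hadoop]$ bin/hadoop version   # should print the Hadoop 2.6.0-cdh5.16.2 version banner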
Configure passwordless SSH
# Edit the hosts file; the first two lines must not be touched. 172.22.212.16 is the internal IP.
[root@hadoop001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.22.212.16 hadoop001 hadoop001
# Remove any existing .ssh first; in production, prefer mv (rename) over deletion
[bigdata@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/bigdata/.ssh/id_rsa):
Created directory '/home/bigdata/.ssh'.
Enter passphrase (empty for no passphrase):    # just press Enter
Enter same passphrase again:                   # just press Enter
Your identification has been saved in /home/bigdata/.ssh/id_rsa.
Your public key has been saved in /home/bigdata/.ssh/id_rsa.pub.
The key fingerprint is:
f7:00:3a:37:09:b0:24:97:eb:d2:6d:82:35:27:fd:6d bigdata@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|          . +.   |
|           +.o   |
|          .o. .  |
|         = oo o  |
|        = =o.S.o |
|       o + oo.oEo|
|        . o .  . |
|                 |
|                 |
+-----------------+
[bigdata@hadoop001 ~]$ cd .ssh
[bigdata@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[bigdata@hadoop001 .ssh]$ chmod 0600 ~/.ssh/authorized_keys
### Verify
[bigdata@hadoop001 .ssh]$ ssh hadoop001 date
The authenticity of host 'hadoop001 (172.22.212.16)' can't be established.
ECDSA key fingerprint is d1:f6:16:e4:8b:e8:86:68:1f:75:ab:8f:1b:03:4f:4b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,172.22.212.16' (ECDSA) to the list of known hosts.
Sat Nov 28 10:37:51 CST 2020
[bigdata@hadoop001 .ssh]$ ssh hadoop001 date
Sat Nov 28 10:37:54 CST 2020
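The append-and-chmod steps above can also be done in one shot with ssh-copy-id, which creates authorized_keys with the right permissions (it prompts once for the account password):

[bigdata@hadoop001 ~]$ ssh-copy-id bigdata@hadoop001
[bigdata@hadoop001 ~]$ ssh hadoop001 date   # should return the date without a password prompt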
Modify the configuration so that all three HDFS processes start on hadoop001
NameNode (nn) configuration
[bigdata@hadoop001 ~]$ cd app/hadoop/etc/hadoop
[bigdata@hadoop001 hadoop]$ vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/tmp/</value>
    </property>
</configuration>
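To confirm the value actually being picked up, hdfs getconf can read a single key back out of the effective configuration (hdfs is not on the PATH yet at this point, hence the explicit path):

[bigdata@hadoop001 hadoop]$ ~/app/hadoop/bin/hdfs getconf -confKey fs.defaultFS
hdfs://hadoop001:9000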
SecondaryNameNode (snn) configuration
[bigdata@hadoop001 hadoop]$ vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop001:9868</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>hadoop001:9869</value>
    </property>
</configuration>
DataNode (dn) configuration
[bigdata@hadoop001 hadoop]$ vi slaves
hadoop001
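The slaves file simply lists the hosts on which start-dfs.sh will launch a DataNode, one per line. In this single-node setup it holds only hadoop001; on a real cluster it would name every DataNode host, for example (hostnames below are purely illustrative):

# etc/hadoop/slaves on a hypothetical 3-DataNode cluster
hadoop002
hadoop003
hadoop004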
Format the NameNode
[bigdata@hadoop001 hadoop]$ pwd
/home/bigdata/app/hadoop
[bigdata@hadoop001 hadoop]$ bin/hdfs namenode -format
Start HDFS
[bigdata@hadoop001 hadoop]$ pwd
/home/bigdata/app/hadoop
[bigdata@hadoop001 hadoop]$ sbin/start-dfs.sh
20/11/28 11:02:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-namenode-hadoop001.out
hadoop001: starting datanode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /home/bigdata/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-bigdata-secondarynamenode-hadoop001.out
20/11/28 11:02:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[bigdata@hadoop001 hadoop]$ jps
17733 Jps
17624 SecondaryNameNode   # snn, the deputy: by default it checkpoints the boss's metadata at one-hour granularity
17465 DataNode            # dn, the worker that actually stores the data
17340 NameNode            # nn, the boss in charge of allocating data storage
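Besides jps, the cluster state can be cross-checked with dfsadmin; with this configuration it should report exactly one live DataNode:

[bigdata@hadoop001 hadoop]$ bin/hdfs dfsadmin -report
# expect a "Live datanodes (1)" section listing hadoop001's capacity and usage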
Web UI
http://106.14 …23.129:50070/dfshealth.html#tab-overview
Port 50070 has to be opened in the cloud security group first.
Change the default port 50070 to 50071
[bigdata@hadoop001 hadoop]$ vi hdfs-site.xml
## add one more property
<property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop001:50071</value>
</property>
[bigdata@hadoop001 hadoop]$ stop-dfs.sh
[bigdata@hadoop001 hadoop]$ start-dfs.sh
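After the restart, a generic way to confirm the NameNode UI moved to the new port (curl here is an ordinary HTTP check, nothing Hadoop-specific):

[bigdata@hadoop001 hadoop]$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop001:50071/dfshealth.html
# expect 200 once the NameNode is back up; the old port 50070 should now refuse connections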
Create HDFS directories
[root@hadoop001 ~]# vi /etc/profile
export HADOOP_HOME=/home/bigdata/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
[root@hadoop001 ~]# source /etc/profile   # reload so the new PATH takes effect in the current shell
[bigdata@hadoop001 ~]$ hdfs dfs -mkdir /user
[bigdata@hadoop001 ~]$ hdfs dfs -ls /
20/11/28 11:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - bigdata supergroup          0 2020-11-28 11:58 /user
[bigdata@hadoop001 hadoop]$ hdfs dfs -mkdir -p /user/bigdata/input
Worked example
[bigdata@hadoop001 hadoop]$ mkdir input
[bigdata@hadoop001 hadoop]$ cd input/
[bigdata@hadoop001 input]$ vi 1.log
[bigdata@hadoop001 input]$ vi 2.log
# Upload
[bigdata@hadoop001 hadoop]$ hdfs dfs -put input /user/bigdata/input
20/11/28 12:14:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[bigdata@hadoop001 hadoop]$ hdfs dfs -ls
20/11/28 12:14:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - bigdata supergroup 0 2020-11-28 12:14 input
Run the example jar
[bigdata@hadoop001 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar grep input output 'dfs[a-z.]+'
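The grep example scans every file under input for strings matching the regex dfs[a-z.]+ and writes each match with its count to output (under the hood it chains two MapReduce jobs, a count and a sort). The result can also be read in place, without downloading it first:

[bigdata@hadoop001 hadoop]$ hdfs dfs -cat output/part-r-00000
# prints one "count<TAB>match" line per matched string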
Check the results
[bigdata@hadoop001 hadoop]$ hdfs dfs -get output output   # download
[bigdata@hadoop001 hadoop]$ cd output/
[bigdata@hadoop001 output]$ ll
total 4
-rw-r--r-- 1 bigdata bigdata 10 Nov 28 12:16 part-r-00000
-rw-r--r-- 1 bigdata bigdata  0 Nov 28 12:16 _SUCCESS
[bigdata@hadoop001 output]$ cat part-r-00000
3       dfsssss
Note
By default Hadoop keeps its data under /tmp/hadoop-${user.name} (here /tmp/hadoop-bigdata), which is unreasonable:
files and directories under /tmp that go unaccessed for 30 days are deleted by the system's cleanup policy,
so in production never leave data in the /tmp directory.
[bigdata@hadoop001 tmp]$ mv /tmp/hadoop-bigdata/* /home/bigdata/tmp/
[bigdata@hadoop001 tmp]$ ll /home/bigdata/tmp/
total 0
drwxrwxr-x 5 bigdata bigdata 48 Nov 21 21:30 dfs
drwxrwxr-x 4 bigdata bigdata 32 Nov 21 21:56 mapred

Add to core-site.xml:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/bigdata/tmp</value>
</property>

Restart DFS:
stop-dfs.sh
start-dfs.sh
The NameNode, DataNode, and checkpoint (SecondaryNameNode) storage locations all default to subdirectories of hadoop.tmp.dir:
dfs.namenode.name.dir        --> file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir        --> file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir  --> file://${hadoop.tmp.dir}/dfs/namesecondary
So once hadoop.tmp.dir is changed to /home/bigdata/tmp, the NameNode, DataNode, and checkpoint storage locations all follow it automatically.
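If you would rather not hang all three on hadoop.tmp.dir, each location can also be pinned explicitly in hdfs-site.xml; a sketch using the same /home/bigdata/tmp layout as above (the values simply mirror the defaults, relocated):

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/bigdata/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/bigdata/tmp/dfs/data</value>
</property>
<property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/bigdata/tmp/dfs/namesecondary</value>
</property>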