hadoop-0.20.2 fs command guide

hehaibolocal:hadoop-0.20.2-tmp hehaibo$ hadoop fs -help

hadoop fs is the command to execute fs commands. The full syntax is: 

hadoop fs [-fs <local | file system URI>] [-conf <configuration file>]

 [-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>]

 [-dus <path>] [-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>]

 [-rmr [-skipTrash] <src>] [-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>]

 [-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>]

 [-getmerge <src> <localdst> [addnl]] [-cat <src>]

 [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>]

 [-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>]

 [-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>]

 [-tail [-f] <path>] [-text <path>]

 [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

 [-chown [-R] [OWNER][:[GROUP]] PATH...]

 [-chgrp [-R] GROUP PATH...]

 [-count[-q] <path>]

 [-help [cmd]]

-fs [local | <file system URI>]:  Specify the file system to use.

  If not specified, the current configuration is used, 

  taken from the following, in increasing precedence: 

   core-default.xml inside the hadoop jar file 

   core-site.xml in $HADOOP_CONF_DIR 

  'local' means use the local file system as your DFS. 

  <file system URI> specifies a particular file system to 

  contact. This argument is optional but if used must appear

  first on the command line.  Exactly one additional

  argument must be specified. 

-ls <path>:  List the contents that match the specified file pattern. If

  path is not specified, the contents of /user/<currentUser>

  will be listed. Directory entries are of the form 

   dirName (full path) <dir> 

  and file entries are of the form 

   fileName(full path) <r n> size 

  where n is the number of replicas specified for the file 

  and size is the size of the file, in bytes.

-lsr <path>:  Recursively list the contents that match the specified

  file pattern.  Behaves very similarly to hadoop fs -ls,

  except that the data is shown for all the entries in the

  subtree.

-du <path>:  Show the amount of space, in bytes, used by the files that 

  match the specified file pattern.  Equivalent to the unix

  command "du -sb <path>/*" in case of a directory, 

  and to "du -b <path>" in case of a file.

  The output is in the form 

   name(full path) size (in bytes)

-dus <path>:  Show the amount of space, in bytes, used by the files that 

  match the specified file pattern.  Equivalent to the unix

  command "du -sb"  The output is in the form 

   name(full path) size (in bytes)

-mv <src> <dst>:   Move files that match the specified file pattern <src>

  to a destination <dst>.  When moving multiple files, the 

  destination must be a directory. 

-cp <src> <dst>:   Copy files that match the file pattern <src> to a 

  destination.  When copying multiple files, the destination

  must be a directory. 

-rm [-skipTrash] <src>:  Delete all files that match the specified file pattern.

  Equivalent to the Unix command "rm <src>"

  -skipTrash option bypasses trash, if enabled, and immediately

deletes <src>

-rmr [-skipTrash] <src>:  Remove all directories which match the specified file 

  pattern. Equivalent to the Unix command "rm -rf <src>"

  -skipTrash option bypasses trash, if enabled, and immediately

deletes <src>

-put <localsrc> ... <dst>:  Copy files from the local file system 

  into fs. 

-copyFromLocal <localsrc> ... <dst>: Identical to the -put command.

-moveFromLocal <localsrc> ... <dst>: Same as -put, except that the source is

  deleted after it's copied.

-get [-ignoreCrc] [-crc] <src> <localdst>:  Copy files that match the file pattern <src> 

  to the local name.  <src> is kept.  When copying multiple

  files, the destination must be a directory. 

-getmerge <src> <localdst>:  Get all the files in the directories that 

  match the source file pattern and merge and sort them to only

  one file on local fs. <src> is kept.

-cat <src>:  Fetch all files that match the file pattern <src> 

  and display their content on stdout.

-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>:  Identical to the -get command.

-moveToLocal <src> <localdst>:  Not implemented yet 

-mkdir <path>:  Create a directory in specified location. 

-setrep [-R] [-w] <rep> <path/file>:  Set the replication level of a file. 

  The -R flag requests a recursive change of replication level 

  for an entire tree.

-tail [-f] <file>:  Show the last 1KB of the file. 

  The -f option shows appended data as the file grows. 

-touchz <path>: Write a timestamp in yyyy-MM-dd HH:mm:ss format

  in a file at <path>. An error is returned if the file exists with non-zero length

-test -[ezd] <path>: If the file exists (-e), has zero length (-z), or is

  a directory (-d), then return 0, else return 1.

-text <src>:  Takes a source file and outputs the file in text format.

  The allowed formats are zip and TextRecordInputStream.

-stat [format] <path>: Print statistics about the file/directory at <path>

  in the specified format. Format accepts filesize in blocks (%b), filename (%n),

  block size (%o), replication (%r), modification date (%y, %Y)

-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...

  Changes permissions of a file.

  This works similar to shell's chmod with a few exceptions.

 -R modifies the files recursively. This is the only option

  currently supported.

 MODE Mode is same as mode used for chmod shell command.

  Only letters recognized are 'rwxX'. E.g. a+r,g-w,+rwx,o=r

 OCTALMODE Mode specified in 3 digits. Unlike shell command,

  this requires all three digits.

  E.g. 754 is same as u=rwx,g=rx,o=r

  If none of 'augo' is specified, 'a' is assumed and unlike

  shell command, no umask is applied.

-chown [-R] [OWNER][:[GROUP]] PATH...

  Changes owner and group of a file.

  This is similar to shell's chown with a few exceptions.

 -R modifies the files recursively. This is the only option

  currently supported.

  If only owner or group is specified then only owner or

  group is modified.

  The owner and group names may only consist of digits, letters,

  and any of '-_.@/' i.e. [-_.@/a-zA-Z0-9]. The names are case

  sensitive.

  WARNING: Avoid using '.' to separate user name and group though

  Linux allows it. If user names have dots in them and you are

  using local file system, you might see surprising results since

  shell command 'chown' is used for local files.

-chgrp [-R] GROUP PATH...

  This is equivalent to -chown ... :GROUP ...

-count[-q] <path>: Count the number of directories, files and bytes under the paths

  that match the specified file pattern.  The output columns are:

  DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME or

  QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA 

        DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME

-help [cmd]:  Displays help for given command or all commands if none

  is specified.

hehaibolocal:hadoop-0.20.2-tmp hehaibo$ 
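
To make the reference above concrete, here are a few minimal examples run against the single-node setup described below; the /user/hehaibo/demo path and the file names are hypothetical:

% hadoop fs -mkdir /user/hehaibo/demo

% hadoop fs -put localfile.txt /user/hehaibo/demo/

% hadoop fs -ls /user/hehaibo/demo

% hadoop fs -cat /user/hehaibo/demo/localfile.txt

% hadoop fs -get /user/hehaibo/demo/localfile.txt copy.txt

% hadoop fs -rm /user/hehaibo/demo/localfile.txt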

Installing a single-node Hadoop 0.20.2 environment on macOS

1 Install the Java environment

Omitted…

hehaibolocal:~ hehaibo$ java -version

java version "1.8.0_91"

Java(TM) SE Runtime Environment (build 1.8.0_91-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

hehaibolocal:~ hehaibo$ 

2 Install Hadoop

Download https://archive.apache.org/dist/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz to /Users/hehaibo/hadoop/.

Run:

hehaibolocal:hadoop hehaibo$ tar xvf hadoop-0.20.2.tar.gz

After extraction, the install directory is: /Users/hehaibo/hadoop/hadoop-0.20.2
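
To confirm the extraction, list the top-level directory; it should contain at least bin/ (the launch scripts) and conf/ (the site configuration edited below):

% ls /Users/hehaibo/hadoop/hadoop-0.20.2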

 

3 Configure Hadoop environment variables

sudo vi /etc/profile

Add:

export HADOOP_HOME=/Users/hehaibo/hadoop/hadoop-0.20.2

export PATH="$PATH:/usr/local/bin:$JAVA_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:$HADOOP_HOME/bin"

4 Verify the installed version

hehaibolocal:hadoop-0.20.2 hehaibo$ hadoop version

Hadoop 0.20.2

Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707

Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

hehaibolocal:hadoop-0.20.2 hehaibo$ 

5 Configure Hadoop

5.1 Configure conf/core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<!-- Host and port of the NameNode; the port here is 9000 -->

<name>fs.default.name</name>

<value>hdfs://localhost:9000/</value>

</property>

</configuration>
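
With fs.default.name set, bare paths passed to hadoop fs resolve against HDFS at hdfs://localhost:9000 instead of the local filesystem. A quick check once the daemons from step 9 are up:

% hadoop fs -ls /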

5.2 Configure conf/hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<!-- Directories where the DataNode stores HDFS data blocks; multiple comma-separated entries are allowed -->

<property>

<name>dfs.data.dir</name>

<value>/Users/hehaibo/hadoop/hadoop-0.20.2-tmp/hadoop-data</value>

</property>

<!-- Directories where the NameNode stores HDFS filesystem metadata; multiple comma-separated entries are allowed -->

<property>

<name>dfs.name.dir</name>

<value>/Users/hehaibo/hadoop/hadoop-0.20.2-tmp/hadoop-name</value>

</property>

<property>

<!-- Block replication factor; the default is 3. With fewer than 3 DataNodes, lower it to 1 or 2 accordingly -->

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>
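
The format and startup steps below create these directories on demand, but creating them up front is a harmless way to rule out permission problems (paths exactly as configured above):

% mkdir -p /Users/hehaibo/hadoop/hadoop-0.20.2-tmp/hadoop-name

% mkdir -p /Users/hehaibo/hadoop/hadoop-0.20.2-tmp/hadoop-data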

5.3 Configure conf/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<!-- Host and port of the JobTracker; the port here is 8021 -->

<name>mapred.job.tracker</name>

<value>localhost:8021</value>

</property>

</configuration>
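
Clients read mapred.job.tracker to locate the JobTracker at localhost:8021. Once the daemons from step 9 are running, a minimal connectivity check is to list jobs:

% hadoop job -list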

6 Configure passwordless SSH login

% sudo apt-get install ssh    # Linux only; macOS ships sshd, just enable Remote Login in System Preferences > Sharing

% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa 

% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
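
If ssh localhost still prompts for a password, check the key file permissions; OpenSSH ignores an authorized_keys file that is group- or world-writable (standard sshd behavior, not part of the original steps):

% chmod 700 ~/.ssh

% chmod 600 ~/.ssh/authorized_keys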

-- passwordless login succeeded:
hehaibolocal:~ hehaibo$ ssh localhost

Last login: Thu Jul 19 16:30:48 2018

hehaibolocal:~ hehaibo$ 


7 Edit conf/hadoop-env.sh to add the Java environment variable

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/
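
On macOS, the JDK path can also be resolved with the stock /usr/libexec/java_home helper instead of hard-coding it (an equivalent alternative):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)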

8 Format the HDFS filesystem

% hadoop namenode -format


hehaibolocal:~ hehaibo$ hadoop namenode -format

18/07/19 16:50:25 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = hehaibolocal.local/172.17.11.24

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.2

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

18/07/19 16:50:26 INFO namenode.FSNamesystem: fsOwner=hehaibo,staff,access_bpf,everyone,localaccounts,_appserverusr,admin,_appserveradm,_lpadmin,_appstore,_lpoperator,_developer,_analyticsusers,com.apple.access_ftp,com.apple.access_screensharing,com.apple.access_ssh-disabled

18/07/19 16:50:26 INFO namenode.FSNamesystem: supergroup=supergroup

18/07/19 16:50:26 INFO namenode.FSNamesystem: isPermissionEnabled=true

18/07/19 16:50:26 INFO common.Storage: Image file of size 97 saved in 0 seconds.

18/07/19 16:50:26 INFO common.Storage: Storage directory /Users/hehaibo/hadoop/hadoop-0.20.2-tmp/hadoop-name has been successfully formatted.

18/07/19 16:50:26 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hehaibolocal.local/172.17.11.24

************************************************************/

9 Start Hadoop

9.1 Start the daemons

hehaibolocal:~ hehaibo$ start-dfs.sh

namenode running as process 5375. Stop it first.

localhost: starting datanode, logging to /Users/hehaibo/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hehaibo-datanode-hehaibolocal.local.out

localhost: starting secondarynamenode, logging to /Users/hehaibo/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hehaibo-secondarynamenode-hehaibolocal.local.out

hehaibolocal:~ hehaibo$ start-mapred.sh 

starting jobtracker, logging to /Users/hehaibo/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hehaibo-jobtracker-hehaibolocal.local.out

localhost: starting tasktracker, logging to /Users/hehaibo/hadoop/hadoop-0.20.2/bin/../logs/hadoop-hehaibo-tasktracker-hehaibolocal.local.out

9.2 Check the running processes

hehaibolocal:~ hehaibo$ jps

5603 DataNode

5669 SecondaryNameNode

5770 TaskTracker

5710 JobTracker

5375 NameNode

9.3 Open the web UIs in a browser:

http://localhost:50070/dfshealth.jsp

http://localhost:50030/jobtracker.jsp
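
The same pages can be probed from the terminal if a browser is not handy (any HTTP client works; curl is shown as an example):

% curl -s http://localhost:50070/dfshealth.jsp | head

% curl -s http://localhost:50030/jobtracker.jsp | head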

10 Stop the Hadoop services

hehaibolocal:~ hehaibo$ stop-dfs.sh 

stopping namenode

localhost: stopping datanode

localhost: stopping secondarynamenode

hehaibolocal:~ hehaibo$ stop-mapred.sh 

stopping jobtracker

localhost: stopping tasktracker

hehaibolocal:~ hehaibo$