Amazon S3 is cloud storage provided by Amazon Web Services (AWS). Amazon S3 exposes a set of web service interfaces, on top of which many third-party commercial services and client applications have been built. This tutorial describes how to access Amazon S3 cloud storage from the Linux command line.
The best-known Amazon S3 command-line client is s3cmd, written in Python. As a simple AWS S3 command-line tool, s3cmd is designed for scripted cron jobs, such as daily backups.
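For example, a daily backup can be a single crontab entry. A minimal sketch, assuming a bucket named s3://mybackups and an already-configured /root/.s3cfg (both names are placeholders; configuration is covered in section 2 below):

# /etc/cron.d/s3-backup -- sync /var/www to S3 every night at 02:30
30 2 * * * root /usr/bin/s3cmd -c /root/.s3cfg sync /var/www/ s3://mybackups/www/ >> /var/log/s3-backup.log 2>&1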
Using s3cmd
1. Installation
# Install s3cmd on Ubuntu or Debian
sudo apt-get install s3cmd
# On CentOS
yum install s3cmd
# On Gentoo
emerge -av s3cmd
# Install from an RPM package
rpm -ivh http://s3tools.org/repo/RHEL_6/x86_64/s3cmd-1.0.0-4.1.x86_64.rpm
# Install from source
git clone https://github.com/s3tools/s3cmd
cd s3cmd
python setup.py install
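Whichever method you use, a quick sanity check confirms the installation:

# Should print the installed s3cmd version
s3cmd --version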
Command Reference
Usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

Options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool. Optionally use as '--configure s3://some-bucket' to test access to a specific bucket instead of attempting to list them all.
  -c FILE, --config=FILE  Config file name. Defaults to /home/mludvig/.s3cfg
  --dump-config         Dump current configuration after parsing config files and command line options and exit.
  --access_key=ACCESS_KEY  AWS Access Key
  --secret_key=SECRET_KEY  AWS Secret Key
  -n, --dry-run         Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands)
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for [get] command).
  --continue-put        Continue uploading partially uploaded files or multipart upload parts. Restarts files/parts that don't have matching size and md5. Skips files/parts that do. Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk.
  --upload-id=UPLOAD_ID  UploadId for Multipart Upload, in case you want to continue an existing upload (equivalent to --continue-put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI.
  --skip-existing       Skip over files that exist at the destination (only for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync]. (default)
  --no-check-md5        Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you only.
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID  Grant stated permission to a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID  Revoke stated permission for a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all
  -D NUM, --restore-days=NUM  Number of days to keep restored file available (only for 'restore' command).
  --delete-removed      Delete remote objects with no corresponding local file [sync]
  --no-delete-removed   Don't delete remote objects.
  --delete-after        Perform deletes after new uploads [sync]
  --delay-updates       Put all updated files into place at end [sync]
  --max-delete=NUM      Do not delete more than NUM files. [del] and [sync]
  --add-destination=ADDITIONAL_DESTINATIONS  Additional destination for parallel uploads, in addition to last arg. May be repeated.
  --delete-after-fetch  Delete remote objects after fetching to local file (only for [get] and [sync] commands).
  -p, --preserve        Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included even if previously excluded by one of --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression) instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --ignore-failed-copy  Don't exit unsuccessfully because of missing keys
  --files-from=FILE     Read list of source-file names from FILE. Use - to read from stdin.
  --bucket-location=BUCKET_LOCATION  Datacentre to create bucket in. As of now the datacenters are: US (default), EU, ap-northeast-1, ap-southeast-1, sa-east-1, us-west-1 and us-west-2
  --reduced-redundancy, --rr  Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX  Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog] commands)
  --default-mime-type=DEFAULT_MIME_TYPE  Default MIME-type for stored objects. Application default is binary/octet-stream.
  -M, --guess-mime-type  Guess MIME-type of files by their extension or mime magic. Fall back to default MIME-Type as specified by --default-mime-type option
  --no-guess-mime-type  Don't guess MIME-type and use the default type instead.
  --no-mime-magic       Don't use mime magic when guessing MIME-type.
  -m MIME/TYPE, --mime-type=MIME/TYPE  Force MIME-type. Override both --default-mime-type and --guess-mime-type.
  --add-header=NAME:VALUE  Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or 'Cache-Control' headers (or both) using this option.
  --server-side-encryption  Specifies that server-side encryption will be used when putting objects.
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding (character set). Autodetected: UTF-8
  --add-encoding-exts=EXTENSIONs  Add encoding to these comma delimited extensions i.e. (css,js,html) when uploading to S3
  --verbatim            Use the S3 name as given on the command line. No pre-processing, encoding, etc. Use with caution!
  --disable-multipart   Disable multipart upload on files bigger than --multipart-chunk-size-mb
  --multipart-chunk-size-mb=SIZE  Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded-multipart, smaller files are uploaded using the traditional method. SIZE is in Mega-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB.
  --list-md5            Include MD5 sums in bucket listings (only for 'ls' command).
  -H, --human-readable-sizes  Print sizes in human readable form (eg 1kB instead of 1234).
  --ws-index=WEBSITE_INDEX  Name of index-document (only for [ws-create] command)
  --ws-error=WEBSITE_ERROR  Name of error-document (only for [ws-create] command)
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --enable              Enable given CloudFront distribution (only for [cfmodify] command)
  --disable             Disable given CloudFront distribution (only for [cfmodify] command)
  --cf-invalidate       Invalidate the uploaded files in CloudFront. Also see [cfinval] command.
  --cf-invalidate-default-index  When using Custom Origin and S3 static website, invalidate the default index file.
  --cf-no-invalidate-default-index-root  When using Custom Origin and S3 static website, don't invalidate the path to the default index file.
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME  Remove given CNAME from a CloudFront distribution (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT  Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (1.5.0-beta1) and exit.
  -F, --follow-symlinks  Follow symbolic links as if they are regular files
  --cache-file=FILE     Cache FILE containing local source MD5 values
  -q, --quiet           Silence output on stdout
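Many of these options combine naturally on a single put. A sketch using only flags documented above (the bucket path and file name are made up for illustration):

# Publicly readable upload with reduced-redundancy storage and a one-day cache header
s3cmd put --acl-public --guess-mime-type --reduced-redundancy \
    --add-header='Cache-Control: max-age=86400' \
    logo.png s3://icyboy/site/logo.png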
2. Initialization
The first time you run s3cmd, configure it with the following command:
s3cmd --configure
It will ask you a series of questions:
- Your AWS S3 access key and secret key
- An encryption password for data transferred to and from AWS S3
- The path to the GPG program used to encrypt data (e.g., /usr/bin/gpg)
- Whether to use the HTTPS protocol
- The host name and port, if an HTTP proxy is used
The configuration is saved in plain text in ~/.s3cfg.
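The exact keys vary between s3cmd versions, but the file looks roughly like the sketch below (all values are placeholders, not real credentials):

[default]
access_key = AKIAIOSFODNN7EXAMPLE
secret_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
use_https = True
gpg_command = /usr/bin/gpg
gpg_passphrase = your-encryption-password
proxy_host =
proxy_port = 0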
Since the file contains your credentials, restrict its permissions:

chmod 600 ~/.s3cfg

3. Basic Usage
#0. List all objects in all buckets
s3cmd la
#1. List all existing buckets in your account
s3cmd ls
#2. Create a new bucket named icyboy
s3cmd mb s3://icyboy
#3. Upload files to an existing bucket
s3cmd put 1.png 2.png 3.png s3://icyboy
# Uploaded files are private by default: only you can access them, using valid access and secret keys.
#4. Upload a file to an existing bucket with public access (see the setacl example after this block for changing the ACL of an already-uploaded object)
s3cmd put --acl-public 4.png s3://icyboy
# A file uploaded with public access can be viewed by anyone in a browser at http://icyboy.s3.amazonaws.com/4.png
#5. List the contents of an existing bucket
s3cmd ls s3://icyboy
#6. Download files from an existing bucket (e.g., all .png files)
s3cmd get s3://icyboy/*.png
#7. Delete files from an existing bucket
s3cmd del s3://icyboy/*.png
#8. Get information about an existing bucket, including its location and access control list (ACL)
s3cmd info s3://icyboy
#9. Encrypt a file before uploading it to an existing bucket
s3cmd -e put encrypt.png s3://icyboy
# When downloading an encrypted file, s3cmd detects the encryption and decrypts it during the download, so encrypted files are fetched and accessed exactly like ordinary ones
s3cmd get s3://icyboy/encrypt.png
#10. Delete an existing bucket
s3cmd rb s3://icyboy
# Note: you cannot delete a non-empty bucket.
#11. Show the total size of a bucket
s3cmd du s3://icyboy
#12. Copy
s3cmd cp s3://icyboy/1.txt s3://xupeng/1.txt_copy
#13. Move
s3cmd mv s3://icyboy/1.txt s3://xupeng/1.txt_copy
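Related to item #4 above: an object that was uploaded as private does not need to be re-uploaded to change its visibility; s3cmd's setacl command switches the ACL in place. A sketch against this tutorial's icyboy bucket:

# Make an existing object publicly readable, then private again
s3cmd setacl --acl-public s3://icyboy/4.png
s3cmd setacl --acl-private s3://icyboy/4.png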
4. Advanced Usage

#1. Upload a directory
# -r is short for --recursive; without a trailing slash the directory name itself is included in the remote path
xupeng@icyboy ~ $ s3cmd put -r dir1 s3://icyboy/some/path/
dir1/file1-1.txt -> s3://icyboy/some/path/dir1/file1-1.txt  [1 of 2]
dir1/file1-2.txt -> s3://icyboy/some/path/dir1/file1-2.txt  [2 of 2]
# With a trailing slash on the source, only the directory's contents are uploaded, without the dir1/ prefix
xupeng@icyboy ~ $ s3cmd put -r dir1/ s3://icyboy/some/path/
dir1/file1-1.txt -> s3://icyboy/some/path/file1-1.txt  [1 of 2]
dir1/file1-2.txt -> s3://icyboy/some/path/file1-2.txt  [2 of 2]
#2. Sync
xupeng@icyboy ~ $ s3cmd sync ./ s3://icyboy/some/path/
dir2/file2-1.log -> s3://icyboy/some/path/dir2/file2-1.log  [1 of 2]
dir2/file2-2.txt -> s3://icyboy/some/path/dir2/file2-2.txt  [2 of 2]
xupeng@icyboy ~ $ s3cmd sync --dry-run --delete-removed ~/demo/ s3://icyboy/some/path/
delete: s3://icyboy/some/path/file1-1.txt
delete: s3://icyboy/some/path/file1-2.txt
upload: ~/demo/dir1/file1-2.txt -> s3://icyboy/some/path/dir1/file1-2.txt
WARNING: Exiting now because of --dry-run
xupeng@icyboy ~ $ s3cmd sync --dry-run --skip-existing --delete-removed ~/demo/ s3://icyboy/some/path/
delete: s3://icyboy/some/path/file1-1.txt
delete: s3://icyboy/some/path/file1-2.txt
WARNING: Exiting now because of --dry-run
xupeng@icyboy ~ $ s3cmd sync --dry-run --exclude '*.txt' --include 'dir2/*' . s3://icyboy/demo/
exclude: dir1/file1-1.txt
exclude: dir1/file1-2.txt
exclude: file0-2.txt
upload: ./dir2/file2-1.log -> s3://icyboy/demo/dir2/file2-1.log
upload: ./dir2/file2-2.txt -> s3://icyboy/demo/dir2/file2-2.txt
upload: ./file0-1.msg -> s3://icyboy/demo/file0-1.msg
upload: ./file0-3.log -> s3://icyboy/demo/file0-3.log
WARNING: Exiting now because of --dry-run
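The --exclude/--include patterns shown above can also live in a file passed via --exclude-from (documented in the option list), which keeps scripted sync jobs readable. A minimal sketch; the file name backup.excludes is made up for illustration:

# backup.excludes -- one GLOB pattern per line
*.tmp
*.log
.cache/*

# Then reference it from the sync command:
s3cmd sync --exclude-from backup.excludes ~/demo/ s3://icyboy/some/path/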