danilop / yas3fs

YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE) interface to Amazon S3. It was inspired by s3fs but rewritten from scratch to implement a distributed cache synchronized by Amazon SNS notifications. A web console is provided to easily monitor the nodes of a cluster.

Home Page: http://danilop.github.io/yas3fs

License: MIT License

Python 98.26% Shell 1.66% Vim Snippet 0.08%

yas3fs's Introduction

Yet Another S3-backed File System: yas3fs

Join the chat at https://gitter.im/danilop/yas3fs

YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE) interface to Amazon S3. It was inspired by s3fs but rewritten from scratch to implement a distributed cache synchronized by Amazon SNS notifications. A web console is provided to easily monitor the nodes of a cluster through the YAS3FS Console project.

If you use YAS3FS please share your experience on the wiki, thanks!

  • It allows you to mount an S3 bucket (or a part of it, if you specify a path) as a local folder.
  • It works on Linux and Mac OS X.
  • For maximum speed all data read from S3 is cached locally on the node, in memory or on disk, depending on the file size.
  • Parallel multi-part downloads are used if there are reads in the middle of the file (e.g. for streaming).
  • Parallel multi-part uploads are used for files larger than a specified size.
  • With buffering enabled (the default) files can be accessed during the download from S3 (e.g. for streaming).
  • It can be used on more than one node to create a "shared" file system (i.e. a yas3fs "cluster").
  • SNS notifications are used to update other nodes in the cluster that something has changed on S3 and they need to invalidate their cache.
  • Notifications can be delivered to HTTP or SQS endpoints.
  • If the cache grows to its maximum size, the least recently accessed files are removed.
  • Signed URLs are provided through Extended file attributes (xattr).
  • AWS credentials can be passed using AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
  • In an EC2 instance an IAM role can be used to give access to S3/SNS/SQS resources.
  • It is written in Python (2.6) using boto and fusepy.

This is a personal project. No relation whatsoever exists between this project and my employer.

License

Copyright (c) 2012-2014 Danilo Poccia, http://danilop.net

This code is licensed under the MIT License (MIT). Please see the LICENSE file that accompanies this project for the terms of use.

Introduction

This is the logical architecture of yas3fs:

yas3fs Logical Architecture

I strongly suggest starting yas3fs for the first time with the -df (debug + foreground) options, to see if there are any errors. When everything works it can be interrupted (with ^C) and restarted to run in the background (the default without the -f option).

To mount an S3 bucket without using SNS (i.e. for a single node):

yas3fs s3://bucket/path /path/to/mount

To persist file system metadata such as attr/xattr, yas3fs uses S3 user metadata. To mount an S3 bucket without actually writing metadata to it (e.g. because it is a bucket you mainly use as a repository and not as a file system), you can use the --no-metadata option.
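
For example:

yas3fs s3://bucket/path /path/to/mount --no-metadata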

To mount an S3 bucket using SNS and listening to an SQS endpoint:

yas3fs s3://bucket/path /path/to/mount --topic TOPIC-ARN --new-queue

To mount an S3 bucket using SNS and listening to an HTTP endpoint (on EC2):

yas3fs s3://bucket/path /path/to/mount --topic TOPIC-ARN --use-ec2-hostname --port N

On EC2 the security group must allow inbound traffic from SNS on the selected port.

On EC2 the command line doesn't need any information about the actual server, so it can easily be used within an Auto Scaling group.
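
For example, a minimal (hypothetical) EC2 user-data script could mount the file system at boot, assuming yas3fs is already installed in the AMI and an IAM role grants the needed S3/SNS/SQS permissions; the documented --mkdir option creates the mount point if missing:

#!/bin/bash
yas3fs s3://BUCKET/PATH /mnt/data --topic TOPIC-ARN --new-queue --mkdir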

Quick Installation

WARNING: PIP installation is no longer supported. Use "git clone" instead.

Requires Python 2.6 or higher. The legacy installation used pip:

pip install yas3fs

If it fails, follow the "git clone" based CentOS 6 installation steps below.

If you want to do a quick test here's the installation procedure depending on the OS flavor (Linux or Mac):

  • Create an S3 bucket in the AWS region you prefer.
  • You don't need to create anything in the bucket as the initial path (if any) is created by the tool on the first mount.
  • If you want to use an existing S3 bucket you can use the --no-metadata option to not use user metadata to persist file system attr/xattr.
  • If you want to have more than one node in sync, create an SNS topic in the same region as the S3 bucket and write down the full topic ARN (you need it to run the tool if more than one client is connected to the same bucket/path).
  • Create an IAM role that gives access to the S3 and SNS/SQS resources you need, or pass the AWS credentials to the tool using environment variables (see -h and the example below).
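
If you are not using an IAM role, you can export the credentials before starting yas3fs; a minimal example with placeholder values:

export AWS_ACCESS_KEY_ID=YOUR-ACCESS-KEY-ID
export AWS_SECRET_ACCESS_KEY=YOUR-SECRET-ACCESS-KEY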

On Amazon Linux

sudo yum -y install fuse fuse-libs
sudo easy_install pip
sudo pip install yas3fs # assume root installation
sudo sed -i'' 's/^# *user_allow_other/user_allow_other/' /etc/fuse.conf # uncomment user_allow_other
yas3fs -h # See the usage
mkdir LOCAL-PATH
# For single host mount
yas3fs s3://BUCKET/PATH LOCAL-PATH
# For multiple hosts mount
yas3fs s3://BUCKET/PATH LOCAL-PATH --topic TOPIC-ARN --new-queue

On Ubuntu Linux

sudo apt-get update
sudo apt-get -y install fuse python-pip 
sudo pip install yas3fs # assume root installation
sudo sed -i'' 's/^# *user_allow_other/user_allow_other/' /etc/fuse.conf # uncomment user_allow_other
sudo chmod a+r /etc/fuse.conf # make it readable by anybody, it is not the default on Ubuntu
yas3fs -h # See the usage
mkdir LOCAL-PATH
# For single host mount
yas3fs s3://BUCKET/PATH LOCAL-PATH
# For multiple hosts mount
yas3fs s3://BUCKET/PATH LOCAL-PATH --topic TOPIC-ARN --new-queue

On a Mac with OS X

Install FUSE for OS X from http://osxfuse.github.com.

sudo pip install yas3fs # assume root installation
mkdir LOCAL-PATH
# For single host mount
yas3fs s3://BUCKET/PATH LOCAL-PATH
# For multiple hosts mount
yas3fs s3://BUCKET/PATH LOCAL-PATH --topic TOPIC-ARN --new-queue

On CentOS 6

sudo yum -y install fuse fuse-libs centos-release-scl
sudo yum -y install python27
# upgrade setuptools
scl enable python27 -- pip install setuptools --upgrade
# grab the latest sources
git clone https://github.com/danilop/yas3fs.git
cd yas3fs
scl enable python27 -- python setup.py install
scl enable python27 -- yas3fs -h # See the usage
mkdir LOCAL-PATH
# For single host mount
scl enable python27 -- yas3fs s3://BUCKET/PATH LOCAL-PATH
# For multiple hosts mount
scl enable python27 -- yas3fs s3://BUCKET/PATH LOCAL-PATH --topic TOPIC-ARN --new-queue

/etc/fstab support

# Copy contrib/mount.yas3fs to /usr/local/sbin and create the symlink
chmod +x /usr/local/sbin/mount.yas3fs
cd /sbin; sudo ln -s /usr/local/sbin/mount.yas3fs.centos6 # replace centos6 with amzn1 for an Amazon Linux installation
# Add the contents of contrib/fstab.snippet to /etc/fstab and modify accordingly
# Try to mount
mount /mnt/mybucket
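
For illustration only, an /etc/fstab entry for the mount above might look like the line below; the exact fields are an assumption, the authoritative template is in contrib/fstab.snippet:

s3://mybucket/path /mnt/mybucket yas3fs noauto,_netdev 0 0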

Workaround to unmount yas3fs correctly during host shutdown or reboot

sudo cp contrib/unmount-yas3fs.init.d /etc/init.d/unmount-yas3fs
sudo chmod +x /etc/init.d/unmount-yas3fs
sudo chkconfig --add unmount-yas3fs
sudo chkconfig unmount-yas3fs on
sudo /etc/init.d/unmount-yas3fs start

To listen to SNS HTTP notifications (I usually suggest using SQS instead) on a Mac, you need to install the Python M2Crypto module; download the most suitable "egg" from http://chandlerproject.org/Projects/MeTooCrypto#Downloads.

sudo easy_install M2Crypto-*.egg

If something does not work as expected you can use the -df options to run in foreground and in debug mode.

Unmount

To unmount the file system on Linux:

fusermount -u LOCAL-PATH
or
umount LOCAL-PATH

The latter works if the /etc/fstab support steps (see above) were completed.

To unmount the file system on a Mac you can use umount.

rsync usage

rsync must be used with the --inplace option to avoid S3 busy events.
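
For example (local and mounted paths are placeholders):

rsync -av --inplace /local/source/ /path/to/mount/destination/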

Full Usage

yas3fs -h

usage: yas3fs [-h] [--region REGION] [--topic ARN] [--new-queue]
              [--new-queue-with-hostname] [--queue NAME] 
              [--queue-wait N] [--queue-polling N] [--nonempty]
              [--hostname HOSTNAME] [--use-ec2-hostname] [--port N]
              [--cache-entries N] [--cache-mem-size N] [--cache-disk-size N]
              [--cache-path PATH] [--recheck-s3] [--cache-on-disk N] [--cache-check N]
              [--s3-num N] [--download-num N] [--prefetch-num N] [--st-blksize N]
              [--buffer-size N] [--buffer-prefetch N] [--no-metadata]
              [--prefetch] [--mp-size N] [--mp-num N] [--mp-retries N]
              [--s3-retries N] [--s3-retries-sleep N] 
              [--s3-use-sigv4] [--s3-endpoint URI]
              [--aws-managed-encryption] 
              [--no-allow-other]
              [--download-retries-num N] [--download-retries-sleep N]
              [--read-retries-num N] [--read-retries-sleep N]
              [--id ID] [--mkdir] [--uid N] [--gid N] [--umask MASK]
              [--read-only] [--expiration N] [--requester-pays]
              [--with-plugin-file FILE] [--with-plugin-class CLASS]
              [-l FILE] 
              [--log-mb-size N] [--log-backup-count N] [--log-backup-gzip]
              [-f] [-d] [-V]
              S3Path LocalPath

YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE)
interface to Amazon S3. It allows to mount an S3 bucket (or a part of it, if
you specify a path) as a local folder. It works on Linux and Mac OS X. For
maximum speed all data read from S3 is cached locally on the node, in memory
or on disk, depending of the file size. Parallel multi-part downloads are used
if there are reads in the middle of the file (e.g. for streaming). Parallel
multi-part uploads are used for files larger than a specified size. With
buffering enabled (the default) files can be accessed during the download from
S3 (e.g. for streaming). It can be used on more than one node to create a
"shared" file system (i.e. a yas3fs "cluster"). SNS notifications are used to
update other nodes in the cluster that something has changed on S3 and they
need to invalidate their cache. Notifications can be delivered to HTTP or SQS
endpoints. If the cache grows to its maximum size, the less recently accessed
files are removed. Signed URLs are provided through Extended file attributes
(xattr). AWS credentials can be passed using AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables. In an EC2 instance a IAM role can
be used to give access to S3/SNS/SQS resources. AWS_DEFAULT_REGION environment
variable can be used to set the default AWS region.

positional arguments:
  S3Path               the S3 path to mount in s3://BUCKET/PATH format, PATH
                       can be empty, can contain subfolders and is created on
                       first mount if not found in the BUCKET
  LocalPath            the local mount point

optional arguments:
  -h, --help           show this help message and exit
  --region REGION      AWS region to use for SNS and SQS (default is eu-
                       west-1)
  --topic ARN          SNS topic ARN
  --new-queue          create a new SQS queue that is deleted on unmount to
                       listen to SNS notifications, overrides --queue, queue
                       name is BUCKET-PATH-ID with alphanumeric characters
                       only
  --new-queue-with-hostname
                       create a new SQS queue with hostname in queuename,
                       overrides --queue, queue name is BUCKET-PATH-ID with
                       alphanumeric characters only
  --queue NAME         SQS queue name to listen to SNS notifications, a new
                       queue is created if it doesn't exist
  --queue-wait N       SQS queue wait time in seconds (using long polling, 0
                       to disable, default is 20 seconds)
  --queue-polling N    SQS queue polling interval in seconds (default is 0
                       seconds)
  --hostname HOSTNAME  public hostname to listen to SNS HTTP notifications
  --use-ec2-hostname   get public hostname to listen to SNS HTTP notifications
                       from EC2 instance metadata (overrides --hostname)
  --port N             TCP port to listen to SNS HTTP notifications
  --cache-entries N    max number of entries to cache (default is 100000
                       entries)
  --cache-mem-size N   max size of the memory cache in MB (default is 128 MB)
  --cache-disk-size N  max size of the disk cache in MB (default is 1024 MB)
  --cache-path PATH    local path to use for disk cache (default is
                       /tmp/yas3fs-BUCKET-PATH-random)
  --recheck-s3         Cache ENOENT results in forced recheck of S3 for new file/directory
  --cache-on-disk N    use disk (instead of memory) cache for files greater
                       than the given size in bytes (default is 0 bytes)
  --cache-check N      interval between cache size checks in seconds (default
                       is 5 seconds)
  --s3-endpoint        the S3 endpoint URI, only required if using --s3-use-sigv4
  --s3-num N           number of parallel S3 calls (0 to disable writeback,
                       default is 32)
  --s3-retries N       number of retries for s3 write operations (default 3)
  --s3-retries-sleep N  number of seconds between retries for s3 write operations (default 1)
  --s3-use-sigv4       use signature version 4 signing process, required to connect
                       to some newer AWS regions. --s3-endpoint must also be set
  --download-num N     number of parallel downloads (default is 4)
  --download-retries-num N max number of retries when downloading (default is 60)
  --download-retries-sleep N how long to sleep in seconds between download retries (default is 1)
  --read-retries-num N max number of retries when read() is invoked (default is 10)
  --read-retries-sleep N how long to sleep in seconds between read() retries (default is 1)
  --prefetch-num N     number of parallel prefetching downloads (default is 2)
  --st-blksize N       st_blksize to return to getattr() callers in bytes, optional
  --nonempty           allows mounts over a non-empty file or directory
  --buffer-size N      download buffer size in KB (0 to disable buffering,
                       default is 10240 KB)
  --buffer-prefetch N  number of buffers to prefetch (default is 0)
  --no-metadata        don't write user metadata on S3 to persist file system
                       attr/xattr
  --prefetch           download file/directory content as soon as it is
                       discovered (doesn't download file content if download
                       buffers are used)
  --mp-size N          size of parts to use for multipart upload in MB
                       (default value is 100 MB, the minimum allowed by S3 is
                       5 MB)
  --mp-num N           max number of parallel multipart uploads per file (0 to
                       disable multipart upload, default is 4)
  --mp-retries N       max number of retries in uploading a part (default is
                       3)
  --aws-managed-encryption  Enable AWS managed encryption (sets header x-amz-server-side-encryption = AES256)
  --no-allow-other     do not allow other users to access this bucket
  --id ID              a unique ID identifying this node in a cluster (default
                       is a UUID)
  --mkdir              create mountpoint if not found (and create intermediate
                       directories as required)
  --uid N              default UID
  --gid N              default GID
  --umask MASK         default umask
  --read-only           mount read only
  --expiration N       default expiration for signed URL via xattrs (in
                       seconds, default is 30 days)
  --requester-pays     requester pays for S3 interactions, the bucket must
                       have Requester Pays enabled
  --with-plugin-file FILE
                       YAS3FSPlugin file
  --with-plugin-class CLASS
                       YAS3FSPlugin class, if this is not set it will 
                       take the first child of YAS3FSPlugin from exception 
                       handler file
  -l FILE, --log FILE  filename for logs
  --log-mb-size N       max size of log file
  --log-backup-count N  number of backups log files
  --log-backup-gzip     flag to gzip backup files

  -f, --foreground     run in foreground
  -d, --debug          show debug info
  -V, --version        show program's version number and exit

Signed URLs

You can dynamically generate signed URLs for any file on yas3fs using Extended File attributes.

The default expiration is used (30 days or the value, in seconds, of the '--expiration' option).

You can specify per file expiration with the 'yas3fs.expiration' attribute (in seconds).

On a Mac you can use the 'xattr' command to list the 'yas3fs.*' attributes:

$ xattr -l file
yas3fs.bucket: S3 bucket
yas3fs.key: S3 key
yas3fs.URL: http://bucket.s3.amazonaws.com/key
yas3fs.signedURL: https://bucket.s3.amazonaws.com/... (for default expiration)
yas3fs.expiration: 2592000 (default)

$ xattr -w yas3fs.expiration 3600 file # Sets signed URL expiration for the file to 1h
$ xattr -l file
yas3fs.bucket: S3 bucket
yas3fs.key: S3 key
yas3fs.URL: http://bucket.s3.amazonaws.com/key
yas3fs.signedURL: https://bucket.s3.amazonaws.com/... (for 1h expiration)
yas3fs.expiration: 3600

$ xattr -d yas3fs.expiration file # File specific expiration removed, the default is used again

Similarly on Linux you can use the 'getfattr' and 'setfattr' commands:

$ getfattr -d -m yas3fs file
# file: file
user.yas3fs.URL="http://bucket.s3.amazonaws.com/key"
user.yas3fs.bucket="S3 bucket"
user.yas3fs.expiration="2592000 (default)"
user.yas3fs.key="S3 key"
user.yas3fs.signedURL="https://bucket.s3.amazonaws.com/..." (for default expiration)

$ setfattr -n user.yas3fs.expiration -v 3600 file # Sets signed URL expiration for the file to 1h
$ getfattr -d -m yas3fs file
# file: file
user.yas3fs.URL="http://bucket.s3.amazonaws.com/key"
user.yas3fs.bucket="S3 bucket"
user.yas3fs.expiration="3600"
user.yas3fs.key="S3 key"
user.yas3fs.signedURL="https://bucket.s3.amazonaws.com/..." (for 1h expiration)

$ setfattr -x user.yas3fs.expiration file # File specific expiration removed, the default is used again
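
As a sketch of how these attributes can be consumed on Linux, the signed URL can be read with getfattr (using its --only-values option) and passed to any HTTP client; the file name and output path below are placeholders:

URL=$(getfattr --only-values -n user.yas3fs.signedURL file)
curl -o /tmp/file-copy "$URL"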

Notification Syntax & Use

You can use the SNS topic for other purposes than keeping the cache of the nodes in sync. These are some sample use cases:

  • You can listen to the SNS topic to be updated on changes on S3 (if done through yas3fs).
  • You can publish on the SNS topic to manage the overall "cluster" of yas3fs nodes.

The SNS notification syntax is based on JSON (JavaScript Object Notation):

[ "node_id", "action", ... ]

The following action(s) are currently implemented:

  • mkdir (new directory): [ "node_id", "mkdir", "path" ]
  • rmdir (remove directory): [ "node_id", "rmdir", "path" ]
  • mknod (new empty file): [ "node_id", "mknod", "path" ]
  • unlink (remove file): [ "node_id", "unlink", "path" ]
  • symlink (new symbolic link): [ "node_id", "symlink", "path" ]
  • rename (rename file or directory): [ "node_id", "rename", "old_path", "new_path" ]
  • upload (new or updated file): [ "node_id", "upload", "path", "new_md5" ] (path and new_md5 are optional)
  • md (updated metadata, e.g. attr/xattr): [ "node_id", "md", "path", "metadata_name" ]
  • reset (reset cache): [ "node_id", "reset", "path" ] (path is optional)
  • cache (change cache config): [ "node_id", "cache" , "entries" or "mem" or "disk", new_value ]
  • buffer (change buffer config): [ "node_id", "buffer", "size" or "prefetch", new_value ]
  • prefetch (change prefetch config): [ "node_id", "prefetch", "on" or "off" ]
  • url (change S3 url): [ "node_id", "url", "s3://BUCKET/PATH" ]

Every node will listen to notifications coming from a node_id different from its own id. As an example, if you want to reset the cache of all the nodes in a yas3fs cluster, you can send the following notification to the SNS topic (assuming there is no node with id equal to all):

[ "all", "reset" ]

To send the notification you can use the SNS web console or any command line tool that supports SNS, such as AWS CLI.
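
For example, to publish the reset notification above with the AWS CLI (the topic ARN is a placeholder):

aws sns publish --topic-arn arn:aws:sns:us-east-1:123456789012:yas3fs-topic --message '[ "all", "reset" ]'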

In the same way, if you uploaded a new file (or updated an old one) directly on S3, you can invalidate the caches of all the nodes in the yas3fs cluster for that path by sending this SNS notification:

[ "all", "upload", "path" ]

The path is the relative path of the file system (/ corresponding to the mount point) and doesn't include any S3 path (i.e. prefix) as given in the --url option.
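
As an illustrative example (bucket and key names are made up): if the file system is mounted from s3://bucket/data and the object s3://bucket/data/images/photo.jpg is updated directly on S3, the notification to publish would be:

[ "all", "upload", "/images/photo.jpg" ]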

To change the size of the memory cache on all nodes, e.g. to bring it from 1GB (the current default) to 10GB, you can publish (the size is in MB as in the corresponding command line option):

[ "all", "cache", "mem", 10240 ]

To change the size of the disk cache on all nodes, e.g. to bring it from 10GB (the current default) to 1TB, you can publish (the size is in MB as in the corresponding command line option):

[ "all", "cache", "disk", 1048576 ]

To change the buffer size used to download the content (and make it available for reads) from the default of 10MB (optimized for a full download speed) to 256KB (optimized for a streaming service) you can use (the size is in KB, as in the corresponding command line option):

[ "all", "buffer", "size", 256 ]

To change buffer prefetch from the default of 0 to 1 (optimized for sequential access) you can publish:

[ "all", "buffer", "prefetch", 1 ]

Similarly, to activate download prefetch of all files on all nodes you can use:

[ "all", "prefetch", "on" ]

To change the multipart upload size to 100MB:

[ "all", "multipart", "size", 102400 ]

To change the maximum number of parallel threads to use for multipart uploads to 16:

[ "all", "multipart", "num", 16 ]

To change the maximum number of retries for multipart uploads to 10:

[ "all", "multipart", "retries", 10 ]

You can even dynamically change the mounted S3 URL (i.e. the bucket and/or the path prefix):

[ "all", "url", "s3://BUCKET/PATH" ]

To check the status of all the yas3fs instances listening to a topic you can use:

[ "all", "ping" ]

All yas3fs instances will answer the previous message by publishing a message on the topic with this content:

[ "id", "status", hostname, number of entries in cache, cache memory size,
  cache disk size, download queue length, prefetch queue length, S3 queue length ]

Loading files into S3

Have to load a massive number of files into an S3 bucket that you intend to front through yas3fs? Check out s3-bucket-loader for massively parallel imports to S3.

Testing

Use this tool to test a YAS3FS install: yas3fs-test

It will run through a slew of common commands on one or more nodes; adjust its settings.py file to match what you imagine your production environment will look like.

It is INVALUABLE for making changes to the yas3fs code base.

More tests are always being added.

You can use this tool to test a YAS3FS cluster: yas3fs-cluster-tester

It is a test harness suite to induce file I/O and validate YAS3FS cluster activity across N peer-nodes.

This may be useful to anyone who wants to validate/test YAS3FS to see how it behaves under load and with N peers all managing files in the same S3 bucket. This has been used to test YAS3FS against a several node "cluster" with each node generating hundreds of files.

IAM Policy Permissions

S3
{
  "Effect": "Allow",
  "Action": [
      "s3:GetBucketLocation",
      "s3:DeleteObject",
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:ListBucket",
      "s3:PutObject"
  ],
  "Resource": [
      "arn:aws:s3:::bucketname",
      "arn:aws:s3:::bucketname/*"
  ]
}
SNS
{
  "Effect": "Allow",
  "Action": [
      "sns:ConfirmSubscription",
      "sns:GetTopicAttributes",
      "sns:Publish",
      "sns:Subscribe",
      "sns:Unsubscribe"
  ],
  "Resource": [
      "arn:aws:sns:region:acct:topicname"
  ]
}
SQS
{
  "Effect": "Allow",
  "Action": [
  	  "sqs:CreateQueue",
      "sqs:DeleteMessage",
      "sqs:GetQueueAttributes",
      "sqs:GetQueueUrl",
      "sqs:ReceiveMessage",
      "sqs:SetQueueAttributes",
      "sqs:SendMessage"
  ],
  "Resource": [
      "arn:aws:sqs:region:acct:queuename"
  ]
}
IAM
{
  "Effect": "Allow",
  "Action": "iam:GetUser",
  "Resource": [
      "*"
  ]
}
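
The four statements above can be combined into a single IAM policy document; a minimal skeleton (actions elided, to be filled in from the statements above) looks like:

{
  "Version": "2012-10-17",
  "Statement": [
      { "Effect": "Allow", "Action": [ "s3:..." ], "Resource": [ "arn:aws:s3:::bucketname", "arn:aws:s3:::bucketname/*" ] },
      { "Effect": "Allow", "Action": [ "sns:..." ], "Resource": [ "arn:aws:sns:region:acct:topicname" ] },
      { "Effect": "Allow", "Action": [ "sqs:..." ], "Resource": [ "arn:aws:sqs:region:acct:queuename" ] },
      { "Effect": "Allow", "Action": "iam:GetUser", "Resource": "*" }
  ]
}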

Happy File Sharing!

yas3fs's People

Contributors

bilts, bitdeli-chef, bitsofinfo, cyrusmaher, dacut, danilop, ewah, gitter-badger, jazzl0ver, keithcallenberg, liath, longwave, mojodna, paulo-nascimento-mw, sam-wouters, superman32432432, takuti, thkrmr, timor-raiman


yas3fs's Issues

OpenVZ container support

I want to run yas3fs in an OpenVZ container. OpenVZ cannot load any modules locally, so the "fuse" module is supported via http://openvz.org/FUSE.

It appears that yas3fs requires the "fuse" module to be loaded locally though. Hence when running inside the container, it returns the following error:

./yas3fs -h
Traceback (most recent call last):
File "./yas3fs", line 45, in
from fuse import FUSE, FuseOSError, Operations, LoggingMixIn, fuse_get_context
ImportError: No module named fuse

Is it possible to modify the code so that it can support openvz containers?

Multipart upload retries can result in incorrect bytes sent in final file

There are certain conditions where boto gets a low-level connection error that is thrown, causing yas3fs to attempt a retry of a given "part" for a multi-part upload. In these conditions the PartOfFSData's position (pos) is not reset to zero, resulting in the wrong number of bytes being sent on the next part upload retry.

chgrp & chown permissions incorrectly set

Once mounted, if I cd to the mount and issue the command:

chown user:user dir

Permissions are assigned correctly.

if I use the command
chgrp abc temp or chown abc temp

I see result similar to:

drwxr-xr-x 1 4294967295 xyz 4096 Nov 16 05:00 temp
drwxr-xr-x 1 xyz 4294967295 4096 Nov 16 05:00 temp

Any help is appreciated! TY

SNS HTTP notifications help

I'm trying to get SNS HTTP notifications to work, but run into problems. I have included the errors below, replacing my ip address with aaa.bbb.ccc.ddd. yas3fs version 2.2.12.

Any pointers on what I may have missed? Thanks.

root@localhost:/# yas3fs s3://bucket /s3 --topic arn:aws:sns:us-east-1:631229848200:sns-test --hostname aaa.bbb.ccc.ddd --port 80 -df

2014-06-16 22:59:36,862 INFO Listening on: 'http://aaa.bbb.ccc.ddd:80/sns'
2014-06-16 22:59:36,863 DEBUG check_cache_size
2014-06-16 22:59:36,863 DEBUG check_cache_size get_memory_usage
2014-06-16 22:59:36,864 DEBUG check_status
send: u'GET /?Action=Subscribe&ContentType=JSON&Endpoint=http%3A%2F%2Faaa.bbb.ccc.ddd%3A80%2Fsns&Protocol=http&TopicArn=arn%3Aaws%3Asns%3Aus-east-1%3A631229848200%3Asns-test&Version=2010-03-31 HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: sns.us-east-1.amazonaws.com\r\nAuthorization: AWS4-HMAC-SHA256 Credential=TKLMNTNUHP3MCOQDGCPQ/20140616/us-east-1/sns/aws4_request,SignedHeaders=host;x-amz-date,Signature=e644401e8267fce96eecfc06c7718296d0d8c409c0c81ccae80889a38bac9e0d\r\nX-Amz-Date: 20140616T125936Z\r\nUser-Agent: Boto/2.29.1 Python/2.6.6 Linux/2.6.32-27\r\n\r\n'
2014-06-16 22:59:36,866 INFO entries, mem_size, disk_size, download_queue, prefetch_queue, s3_queue: 0, 0, 0, 0, 0, 0
2014-06-16 22:59:36,866 DEBUG new_locks, unused_locks: 0, 0
2014-06-16 22:59:36,866 DEBUG gc count0/threshold0, count1/threshold1, count2/threshold2: 234/700, 11/10, 3/10
2014-06-16 22:59:36,867 DEBUG check_threads 'False'
2014-06-16 22:59:36,867 DEBUG Restarting HTTP listen thread
2014-06-16 22:59:36,867 INFO Listening on: 'http://aaa.bbb.ccc.ddd:80/sns'
2014-06-16 22:59:36,867 ERROR Uncaught Exception in Thread
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/yas3fs/init.py", line 2342, in run
super(TracebackLoggingThread, self).run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.*target(_self.__args, _self.__kwargs)
File "/usr/lib/python2.6/site-packages/yas3fs/__init
.py", line 913, in listen_for_messages_over_http
self.httpd = server_class(server_address, handler_class)
File "/usr/lib64/python2.6/SocketServer.py", line 412, in init
self.server_bind()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 108, in server_bind
SocketServer.TCPServer.server_bind(self)
File "/usr/lib64/python2.6/SocketServer.py", line 423, in server_bind
self.socket.bind(self.server_address)
File "", line 1, in bind
error: [Errno 98] Address already in use
Exception in thread Thread-44:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in bootstrap_inner
self.run()
File "/usr/lib/python2.6/site-packages/yas3fs/__init
.py", line 2342, in run
super(TracebackLoggingThread, self).run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.*target(_self.__args, _self.__kwargs)
File "/usr/lib/python2.6/site-packages/yas3fs/__init
.py", line 913, in listen_for_messages_over_http
self.httpd = server_class(server_address, handler_class)
File "/usr/lib64/python2.6/SocketServer.py", line 412, in init
self.server_bind()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 108, in server_bind
SocketServer.TCPServer.server_bind(self)
File "/usr/lib64/python2.6/SocketServer.py", line 423, in server_bind
self.socket.bind(self.server_address)
File "", line 1, in bind
error: [Errno 98] Address already in use

reply: 'HTTP/1.1 200 OK\r\n'
header: x-amzn-RequestId: b5279622-9f69-5f09-8e88-da3a73103394
header: Content-Type: application/json
header: Content-Length: 156
header: Date: Mon, 16 Jun 2014 12:59:36 GMT

2014-06-16 22:59:38,238 DEBUG downloading certificate

Exception happened during processing of request from ('72.21.217.192', 25353)
Traceback (most recent call last):
File "/usr/lib64/python2.6/SocketServer.py", line 293, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib64/python2.6/SocketServer.py", line 319, in process_request
self.finish_request(request, client_address)
File "/usr/lib64/python2.6/SocketServer.py", line 332, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib64/python2.6/SocketServer.py", line 627, in init
self.handle()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 329, in handle
self.handle_one_request()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 323, in handle_one_request
method()
File "/usr/lib/python2.6/site-packages/yas3fs/init.py", line 513, in do_POST
cert = M2Crypto.X509.load_cert_string(self.certificate)
NameError: global name 'M2Crypto' is not defined

symlink problem

Is anyone else having problems reading symlinks on remote nodes ?

Operations on symlinks are OK on the local node (where I created the symlink), but yas3fs on all the remote nodes hangs every time I read them, e.g. by running the "ls" command.

The strange thing is: non-read operations (e.g. "rm" and "mv") on remote nodes are ok.

support for unicoded file names

mkdir £
mkdir: cannot create directory `£': Bad address

encoding issues in lots of places.

I am able to create these objects via AWS S3 console.

Creeping memory usage

Hey, appreciate all the work on this library!

I'm seeing creeping memory usage when using this library; we start fine and over time (a day or two) eventually usage gets high enough that the system locks up and yas3fs needs to be killed and restarted. I've disabled the memory-based cache by setting the cache size to 0, but that hasn't seemed to resolve it. I'm serving a lot of small web assets.

Any ideas? Even something like a clean way to restart would help me out a lot.

ERROR S3 bucket not found

I give up....

My S3 bucket is correct and exists
My yas3fs IAM user has r/w access to my s3 bucket, SQS, and SNS.

My same mount command works for other s3 buckets:

sudo -E yas3fs s3://xxx-com-yyy
/mnt/vhosts/xxx.com_yyy
--region us-east-1
--topic arn:aws:sns:us-east-1:205821040441:yyy
--new-queue
--download-num=10
--cache-on-disk=0
--cache-mem-size=1
--nonempty

Why would I be getting the error "ERROR S3 bucket not found"? TY

Support python3 (NameError: name 'execfile' is not defined)

Was getting the below error for "pip install yas3fs".
NameError: name 'execfile' is not defined
Realized I was using python 3.3. Since python3 is (finally) getting more widespread, it would be nice if yas3fs were compatible. In the meantime, maybe add a note on the install page that python3 is not supported.

Many S3 objects

I have a bucket with about 30,000 files. When I first do an ls, I understand why it takes some time to actually list, because S3 will only list 1000 objects per request. However, once I have listed them and then go on to do something like cat a file, the debug info shows a lot of activity even though the file is already in the local cache. Can you tell me what is going on and how I can optimize for what is a read-only use case with a lot of files in the bucket?

pip upgrade from 2.2.12a to latest: error

Was previously running 2.2.12a and did a pip upgrade; now I get these errors on the upgrade, shown below along with the output of yas3fs --version after the upgrade failure:

# pip install yas3fs --upgrade
Downloading/unpacking yas3fs from https://pypi.python.org/packages/source/y/yas3fs/yas3fs-2.2.16.tar.gz#md5=d405065704425ddad0a2a9c639917e2e
  Downloading yas3fs-2.2.16.tar.gz
  Running setup.py egg_info for package yas3fs
Downloading/unpacking distribute from https://pypi.python.org/packages/source/d/distribute/distribute-0.7.3.zip#md5=c6c59594a7b180af57af8a0cc0cf5b4a (from yas3fs)
  Downloading distribute-0.7.3.zip (145kB): 145kB downloaded
  Running setup.py egg_info for package distribute
Downloading/unpacking boto>=2.25.0 from https://pypi.python.org/packages/source/b/boto/boto-2.29.1.tar.gz#md5=b752db4e5a37bfa061be38e7ed0c255e (from yas3fs)
  Downloading boto-2.29.1.tar.gz (7.1MB): 7.1MB downloaded
  Running setup.py egg_info for package boto
    warning: no files found matching 'boto/mturk/test/*.doctest'
    warning: no files found matching 'boto/mturk/test/.gitignore'
Requirement already up-to-date: fusepy>=2.0.2 in /usr/lib/python2.6/site-packages (from yas3fs)
Requirement already up-to-date: argparse in /usr/lib/python2.6/site-packages (from yas3fs)
Downloading/unpacking setuptools>=0.7 (from distribute->yas3fs)
  Downloading setuptools-5.2.tar.gz (807kB): 807kB downloaded
  Running setup.py egg_info for package setuptools
Installing collected packages: yas3fs, distribute, boto, setuptools
  Found existing installation: yas3fs 2.2.12a
    Uninstalling yas3fs:
      Successfully uninstalled yas3fs
  Running setup.py install for yas3fs
    Installing yas3fs script to /usr/bin
  Found existing installation: distribute 0.6.10
    Uninstalling distribute:
      Successfully uninstalled distribute
  Running setup.py install for distribute
  Found existing installation: boto 2.27.0
    Uninstalling boto:
      Successfully uninstalled boto
  Running setup.py install for boto
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named setuptools
    Complete output from command /usr/bin/python2.6 -c "import setuptools;__file__='/tmp/pip-build-root/boto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-P_NrY7-record/install-record.txt --single-version-externally-managed:
    Traceback (most recent call last):

  File "<string>", line 1, in <module>

ImportError: No module named setuptools

----------------------------------------
  Rolling back uninstall of boto
Command /usr/bin/python2.6 -c "import setuptools;__file__='/tmp/pip-build-root/boto/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-P_NrY7-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip-build-root/boto
Storing complete log in /root/.pip/pip.log
# yas3fs --version
Traceback (most recent call last):
  File "/usr/bin/yas3fs", line 5, in <module>
    from pkg_resources import load_entry_point
ImportError: No module named pkg_resources

Setting file system size

I was browsing through the code, noticed the YAS3FS.statfs method, and saw that it just returns 1 PB every time.

I just happen to have a situation where I need to arbitrarily set a folders size limit, and being able to do that on a S3 backed one would basically itch all my itches.

So, would it be too specific a feature to implement some actual size measuring, and the ability to set a maximum size?

Symbolic Links fail to create

I cd to the yas3fs mount and issue the command
sudo ln -s /tmp tmp
ln: creating symbolic link `tmp': Bad address

I'm using sudo to ensure it's not a permissions problem. The same command works fine in any other directory but the yas3fs mount. Any help is appreciated. TY

chown/chmod's invoked while new file being uploaded are ack'd and reflected in mem cache but never get to S3 and are lost after restart

For a given file where FSData 'change' is currently set to True, if a chown/chmod is invoked concurrently, the chown/chmod() methods fetch the current metadata for the path and, if there is a difference, alter the attr['varName'] reference and then delegate to set_metadata, which does nothing due to the line below, which prevents any metadata from being persisted while data has 'change' = True. Even after the 'change' flag is set to False, subsequent chown/chmod calls have no effect because those methods now detect no difference between the new value and that within the current metadata attrs state (which is in memory only)... after a restart the file permissions/ownership disappear.

if self.write_metadata and (key or (not data) or (data and not data.has('change'))):

approach:

I think FSData could be altered to note that we need to invoke set_metadata(path, 'attr') AFTER 'change' is set to False. set_metadata can then flag this "todo" when it cannot proceed at the current moment because 'change' is currently True.

Under yas3fs S3 "directories" show up as files rather than folders.

I have a very large S3 bucket that includes many subdirectories and many files in those subdirectories. I'm having an issue where yas3fs displays a majority of those directories as files so one cannot access the files included inside them. I'd be happy to get any debug information you need to fix this, but this is a big problem for us. I'm using the --no-metadata and --mkdir flag to see if it changes anything, but it doesn't.

SQS queues created with --new-queue are not being deleted when unmounting with fusermount -u

The help says that the --new-queue parameter creates a queue that's deleted on unmount, but this is not happening for me. The queues keep showing up in the SQS panel after either killing the process or unmounting with fusermount -u (which also terminates the process). Does it just take some time for them to disappear, like terminated EC2 instances, or is there something wrong here?

Index out of range error in process_message()

This appears to occur when an 'upload' message arrives where there is no 'etag' and the code makes an assumption that element 3 of the array exists.

Possibly 'upload_to_s3' around line 2234 is not adding the etag (despite a comment indicating that it should....)

pub = [ 'upload', path ] # Add Etag before publish

2014-09-18 09:08:23,365 ERROR Uncaught Exception in Thread
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/yas3fs/init.py", line 2479, in run
super(TracebackLoggingThread, self).run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.*target(_self.__args, _self.__kwargs)
File "/usr/lib/python2.6/site-packages/yas3fs/__init
.py", line 978, in listen_for_messages_over_sqs
self.process_message(message)
File "/usr/lib/python2.6/site-packages/yas3fs/init.py", line 1018, in process_message
self.invalidate_cache(c[2], c[3])
IndexError: list index out of range

utime() should be utimens()

from fuse.py
('utime', c_voidp), # Deprecated, use utimens

should/could be:
def utimens(self, path, times=None):
    return self.utime(path, times)

Music files not opening in iTunes

I have an S3 Bucket of music files that I'm trying to open with iTunes, and unfortunately none of them are playing.

I can open the files perfectly fine with another player (VLC), so the issue doesn't seem to be on that end. Does iTunes expect something different from the file that other players may not? Is there a way to fix that?

TypeError: cannot deepcopy this pattern object

I have a yas3fs mount as my data directory on a Owncloud installation, on EC2 t1.micro instances. Every now and then, I get this error, when I try to log in on the Owncloud running in this intances:

2014-06-02 18:38:00,043 DEBUG write '/owncloud.log' '161' '295' '0'
2014-06-02 18:38:00,044 DEBUG enqueue_download_data '/owncloud.log' 0 0
2014-06-02 18:38:00,044 DEBUG get_key from cache '/owncloud.log'
2014-06-02 18:38:00,044 DEBUG write wait '/owncloud.log' '161' '295' '0'
2014-06-02 18:38:00,072 DEBUG download_data '/owncloud.log' 0-10485759 [thread 'Thread-34']
2014-06-02 18:38:00,073 DEBUG get_key from cache '/owncloud.log'
2014-06-02 18:38:00,074 ERROR Uncaught Exception in Thread
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/yas3fs/__init__.py", line 2341, in run
    super(TracebackLoggingThread, self).run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/yas3fs/__init__.py", line 1568, in download
    self.download_data(path, start, end)
  File "/usr/local/lib/python2.7/dist-packages/yas3fs/__init__.py", line 1580, in download_data
    key = copy.deepcopy(self.get_key(path))
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 298, in _deepcopy_inst
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
TypeError: cannot deepcopy this pattern object

It apparently happens when trying to read the file owncloud.log, as you can see form the logs I pasted above. Any process that tries to read this file or, I just noted this, any other txt file, changes to state D (on htop), uninterruptible disk sleep, and can't be killed, not even with -9. Any ideas what might've been causing it?

yas3fs - use the 'nonempty' mount option

So I'm attempting to restart yas3fs, but it complains the directory is not empty. The fuse error recommends the 'nonempty' flag but I'm not sure it can be set while using yas3fs. The non-empty directory is what's left over after shutting down a prior release of yas3fs.

2014-05-30 19:05:35,621 INFO Version: 2.2.7
2014-05-30 19:05:35,621 INFO S3 bucket: 'redacted'
2014-05-30 19:05:35,622 INFO S3 prefix (can be empty): ''
2014-05-30 19:05:35,622 INFO AWS region for SNS and SQS: 'us-west-2'
2014-05-30 19:05:35,622 INFO SNS topic ARN: 'redacted'
2014-05-30 19:05:35,622 INFO SQS queue wait time (in seconds): '20'
2014-05-30 19:05:35,622 INFO SQS queue polling interval (in seconds): '0'
2014-05-30 19:05:35,622 INFO Cache entries: '10000'
2014-05-30 19:05:35,622 INFO Cache memory size (in bytes): '393216000'
2014-05-30 19:05:35,622 INFO Cache disk size (in bytes): '1073741824'
2014-05-30 19:05:35,622 INFO Cache on disk if file size greater than (in bytes): '0'
2014-05-30 19:05:35,622 INFO Cache check interval (in seconds): '5'
2014-05-30 19:05:35,622 INFO Number of parallel S3 threads (0 to disable writeback): '0'
2014-05-30 19:05:35,623 INFO Number of parallel donwloading threads: '4'
2014-05-30 19:05:35,623 INFO Number of parallel prefetching threads: '2'
2014-05-30 19:05:35,623 INFO Download buffer size (in KB, 0 to disable buffering): '10485760'
2014-05-30 19:05:35,623 INFO Number of buffers to prefetch: '0'
2014-05-30 19:05:35,623 INFO Write metadata (file system attr/xattr) on S3: 'True'
2014-05-30 19:05:35,623 INFO Download prefetch: 'False'
2014-05-30 19:05:35,623 INFO Multipart size: '5368709120'
2014-05-30 19:05:35,623 INFO Multipart maximum number of parallel threads: '8'
2014-05-30 19:05:35,623 INFO Multipart maximum number of retries per part: '3'
2014-05-30 19:05:35,623 INFO Default expiration for signed URLs via xattrs: '2592000'
2014-05-30 19:05:35,623 INFO Cache path (on disk): '/mnt/s3fs-cache'
2014-05-30 19:05:35,984 INFO Unique node ID: 'redacted'
2014-05-30 19:05:36,294 INFO SQS queue name (new): 'redacted'
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
2014-05-30 19:05:36,295 ERROR Uncaught Exception
None

Path is not created and file is not uploaded

I just tried this on an EC2 instance with Ubuntu Precise (12.04), but it's not working. I mounted the bucket just like the README says, but the path I passed was not created in the bucket, and none of the files I created in the folder were uploaded. No error message was issued, though. Am I doing something wrong?

Call for user experience information on write performance

I'd be interested in getting some stats from users out there w/ regards to the write performance people are getting w/ yas3fs. In some of my tests throughput has been quite slow. (i.e. yas3fs writes to local cache first, then uploads) so the timing information I am looking for from folks is basically how long your program that writes to a yas3fs mount point takes for various sized files. I.E. with the 'cp' command, or whatever.

Please add your comments below for the following sized files

  • 10mb
  • 50mb
  • 100mb

Thanks!

Cache behavior

Noticed this behavior

(amended: note I just saw this issue, #5, which sort of confirms this, but I guess I'm asking about the possibility of adding fallback checks for the case of explicit requests for specific files)

a) Copy a file to a yas3fs mounted S3 bucket, named a.txt

b) rename that file via some other application, to b.txt

c) ls the mount point, yas3fs still shows the file listed as "a.txt"

d) try to copy a.txt, it works (because getting from local cache)

e) try to copy b.txt explicitly, fails, yas3fs says it does not exist (again relying totally on cache)

f) I manually forcibly remove the local disk cache directory that yas3fs is using and it still reports file does not exist and reports the cached directory listing

g) the only way I can fix it is to fusermount -u the moint point and restart yas3fs

Now I understand this is because of the caching mechanism in yas3fs; however, maybe an option could be added so that, in the instance of a direct/explicit request for a file that does not exist in the cache, yas3fs is forced to check S3 for that file before telling the caller it does not exist. (Again, this fallback check would be optional; not all folks would want this behavior.)

Seems like this would be useful for not only this use case (some non-yas3fs process adding/renaming a file in the bucket), but failures in the SQS/SNS notification system, which could be susceptible to transient and non-transient errors, leading to situations like this. Such an option might make it a bit more robust.

Errors

Is this something to be concerned about? I'm using 1.0.15 for now because the 2.x will require more retrofitting than I was prepared to do in the chef recipes.

2014-04-03 20:05:40,603 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 192, 0, 7334593, 0, 0
2014-04-03 20:05:44,103 ERROR [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/opt/yas3fs/yas3fs", line 1911, in flush
k.set_contents_from_file(data.content, headers={'Content-Type': mimetype})
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1246, in set_contents_from_file
chunked_transfer=chunked_transfer, size=size)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 725, in send_file
chunked_transfer=chunked_transfer, size=size)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 914, in _send_file_internal
query_args=query_args
File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 571, in make_request
retry_handler=retry_handler
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1030, in make_request
retry_handler=retry_handler)
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 907, in _mexe
request.body, request.headers)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 815, in sender
http_conn.send(chunk)
File "/usr/lib/python2.7/httplib.py", line 790, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/ssl.py", line 229, in sendall
v = self.send(data[count:])
File "/usr/lib/python2.7/ssl.py", line 198, in send
v = self._sslobj.write(data)
error: [Errno 104] Connection reset by peer
2014-04-03 20:05:45,104 INFO flush '/internal_attachments/1836/1836021-002ced50f265b00f6002a5783ea6bc73.data' '0' '<Key: firefall-forum-data,internal_attachments/1836/1836021-002ced50f265b00f6002a5783ea6bc73.data>' 'application/octet-stream' S3 retry 1
2014-04-03 20:05:45,604 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 193, 0, 7334593, 0, 0
2014-04-03 20:05:50,605 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 195, 0, 7334593, 0, 0

Testing yas3fs mount and remounting

Every now and again, my mount is "lost" or I get a read error and I have to unmount (maybe, if not already unmounted) and remount s3. Is there a script that exists to check and remount a yas3fs mount?

TY

large file issues?

Hi,
Sort of puzzled by this.

a) created 10mb, 100mb, and 1gb test files locally (e.g. dd if=/dev/zero of=1000mb_file bs=1024 count=1024000, etc.)

b) mount my s3 bucket.

c) copy 10mb, 100mb files no problem into the bucket

d) when I copy the 1GB file, the cp command comes back as if it was successful, I see the 1gb file in the local yas3fs cache, directory, however the file is never uploaded to S3 or visible there. (after 30min, 1 hour, and hours later)

e) I tried again running in the foreground w/ debugging on and see this after the file is copied locally to the cache (multipart upload initialization?), followed by no other debug output regarding the upload

2014-04-10 10:46:58,492 DEBUG multipart_upload '1000mb11' '<yas3fs.FSData instance at 0x19c9f80>' '{'Content-Type': 'application/octet-stream'}'
2014-04-10 10:46:58,493 DEBUG part from 104857600 for 104857600
2014-04-10 10:46:58,493 DEBUG part from 209715200 for 104857600
2014-04-10 10:46:58,493 DEBUG part from 314572800 for 104857600
2014-04-10 10:46:58,493 DEBUG part from 419430400 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 524288000 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 629145600 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 734003200 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 838860800 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 943718400 for 104857600
2014-04-10 10:46:58,494 DEBUG part from 1048576000 for 104857600
2014-04-10 10:46:58,494 DEBUG initiate_multipart_upload '1000mb11' '{'Content-Type': 'application/octet-stream'}'
2014-04-10 10:46:58,682 DEBUG new thread!
2014-04-10 10:46:58,682 DEBUG multipart_upload thread '0' started
2014-04-10 10:46:58,682 DEBUG trying to get a part from the queue
2014-04-10 10:46:58,691 DEBUG begin upload of part 1 retry 0
2014-04-10 10:46:58,692 DEBUG seek '0' '2'
2014-04-10 10:46:58,692 DEBUG seek '0' '0'
2014-04-10 10:46:58,692 DEBUG read '8192' at '0' starting from '0' for '104857600'
2014-04-10 10:46:58,693 DEBUG new thread!
2014-04-10 10:46:58,693 DEBUG multipart_upload thread '1' started
2014-04-10 10:46:58,693 DEBUG trying to get a part from the queue
2014-04-10 10:46:58,694 DEBUG begin upload of part 2 retry 0
2014-04-10 10:46:58,694 DEBUG seek '0' '2'
2014-04-10 10:46:58,694 DEBUG seek '0' '0'
2014-04-10 10:46:58,694 DEBUG read '8192' at '0' starting from '104857600' for '104857600'
2014-04-10 10:46:58,694 DEBUG new thread!
2014-04-10 10:46:58,694 DEBUG multipart_upload thread '2' started
2014-04-10 10:46:58,694 DEBUG trying to get a part from the queue
2014-04-10 10:46:58,695 DEBUG begin upload of part 3 retry 0
2014-04-10 10:46:58,695 DEBUG seek '0' '2'
2014-04-10 10:46:58,695 DEBUG seek '0' '0'
2014-04-10 10:46:58,696 DEBUG read '8192' at '0' starting from '209715200' for '104857600'
2014-04-10 10:46:58,696 DEBUG new thread!
2014-04-10 10:46:58,696 DEBUG trying to get a part from the queue
2014-04-10 10:46:58,696 DEBUG begin upload of part 4 retry 0
2014-04-10 10:46:58,696 DEBUG seek '0' '2'
2014-04-10 10:46:58,696 DEBUG seek '0' '0'
2014-04-10 10:46:58,696 DEBUG read '8192' at '0' starting from '314572800' for '104857600'
2014-04-10 10:46:58,697 DEBUG multipart_upload thread '3' started
2014-04-10 10:46:58,697 DEBUG multipart_upload all threads started '1000mb11' '<yas3fs.FSData instance at 0x19c9f80>' '{'Content-Type': 'application/octet-stream'}'

Following this (above), all I see are the following kinds of debug messages over and over, with only the gc count0/threshold0 number incrementing with each message:

2014-04-10 10:57:41,161 DEBUG check_threads 'False'
2014-04-10 10:57:46,118 DEBUG check_cache_size get_memory_usage
2014-04-10 10:57:46,166 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 1, 0, 1048576000, 0, 0
2014-04-10 10:57:46,167 DEBUG new_locks, unused_locks: 0, 0
2014-04-10 10:57:46,167 DEBUG gc count0/threshold0, count1/threshold1, count2/threshold2: 487/700, 7/10, 3/10
2014-04-10 10:57:46,167 DEBUG check_threads 'False'
2014-04-10 10:57:51,123 DEBUG check_cache_size get_memory_usage
2014-04-10 10:57:51,173 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 1, 0, 1048576000, 0, 0
2014-04-10 10:57:51,173 DEBUG new_locks, unused_locks: 0, 0
2014-04-10 10:57:51,173 DEBUG gc count0/threshold0, count1/threshold1, count2/threshold2: 487/700, 7/10, 3/10
2014-04-10 10:57:51,173 DEBUG check_threads 'False'
2014-04-10 10:57:54,485 DEBUG Got 0 messages from SQS
2014-04-10 10:57:56,127 DEBUG check_cache_size get_memory_usage
2014-04-10 10:57:56,179 INFO entries, mem_size, disk_size, download_queue, prefetch_queue: 1, 0, 1048576000, 0, 0
2014-04-10 10:57:56,179 DEBUG new_locks, unused_locks: 0, 0
2014-04-10 10:57:56,179 DEBUG gc count0/threshold0, count1/threshold1, count2/threshold2: 488/700, 7/10, 3/10
2014-04-10 10:57:56,180 DEBUG check_threads 'False'

If I try to unmount it with fusermount -u, it says the resource is busy.
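
For reference, the part offsets in the debug log above are consistent with a fixed 100 MB part size. A small generic sketch (illustrative only, not the yas3fs code) of how such (offset, length) boundaries are derived:

part_size = 104857600            # 100 MB, matching the offsets in the debug log
file_size = 1048576000           # the size reported for the cached 1 GB file
parts = [(start, min(part_size, file_size - start))
         for start in range(0, file_size, part_size)]
# each tuple is the (byte offset, length) of one multipart upload part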

yas3fs as maildir?

Hi, I came across yas3fs and have a question... maybe you could help me if it is not too much trouble!

Is yas3fs a suitable solution for DIRECT maildir storage? By direct I mean that it is not a backup: IMAP should be able to access it as if it were a "regular" maildir.

What I'm trying to accomplish is something like this:

  • Mirrored email servers
  • Messages can arrive from any of those servers
  • If any of the servers go down, it will not affect my clients
  • Both servers would mount the SAME bucket using yas3fs

I have some questions:

  1. I see that yas3fs can be mounted on several EC2 instances at the same time. How does yas3fs handle the concurrency of several EC2 instances writing to the same bucket?
  2. Do you see any drawbacks in the architecture I described above?

Background info: my company provides a mail attachment processing service, and I currently have about 6 TB of stored emails across various servers. Those servers receive about 5 emails per second (all servers combined), which I consider a very high load, and some of the messages are very large (~30 MB).

Thanks!!

SNS subscription

Hi - First, may I say: Great work on yas3fs!
I have formed some policies to restrict IAM roles suitably so that yas3fs can only access the nominated SNS ARNs, and I have also used the yas3fs SQS naming convention to restrict its SQS access for specific mounts. It's not perfect, but I do try to restrict as much as possible.
In debugging, I notice that when unmounting, yas3fs sends an unsubscribe request to ARN * for SNS under the account, rather than a specific unsubscribe for the subscription created on mount. It may be that I'm seeing a mis-reported error, but if not, I wondered whether this was by design for some reason. It currently seems to require unsubscribe permission on ARN * for the account in question.

-ic
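
For what it's worth, a minimal boto sketch (an assumed flow for illustration, not the yas3fs code; the region and the shape of the response dict are assumptions) of unsubscribing only the specific subscription created at mount time, which would not need sns:Unsubscribe on ARN *:

import boto.sns

sns = boto.sns.connect_to_region('us-east-1')             # region is an assumption
# At mount time, the subscribe call returns the new subscription's ARN
# (nested inside boto's parsed JSON response).
resp = sns.subscribe('TOPIC-ARN', 'sqs', 'QUEUE-ARN')
subscription_arn = resp['SubscribeResponse']['SubscribeResult']['SubscriptionArn']
# At unmount time, unsubscribe just that ARN:
sns.unsubscribe(subscription_arn)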

recovery plugin assumes that the mounted bucket name is the name of the cache dir

The name of the cache dir is different from the name of the bucket that is mounted:

[Errno 2] No such file or directory: '/var/lib/yas3fs/BUCKETNAME/files/path/to/file.txt'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yas3fs/__init__.py", line 1937, in do_cmd_on_s3_now
    key.set_contents_from_file(data.get_content(),**kargs)
  File "/usr/lib/python2.6/site-packages/yas3fs/__init__.py", line 261, in get_content
    return open(filename, mode='rb+')

Production Use?

Has yas3fs been used in production?

I'm in need of a shared filesystem with one write node and multiple read nodes. The application requires local filesystem access to files, but I want them stored on S3 for long-term centralized storage. I'm trying to avoid an NFS share, as that would require EC2 and EBS backing. Ultimately I want to use the S3 bucket for a CDN. Files are mostly images and are rarely deleted or modified; they just need to be accessible from all read nodes in a "reasonable" amount of time, and since the data is mostly static, changes are really a non-issue. Volume of files is my main concern: does the number of files affect performance in any way? I attempted using s3fs, but it proved unreliable; if a network interruption occurred, the bucket became unreadable and had to be unmounted and remounted. How is yas3fs affected by network outages? Once the network is restored, will it "reset" itself? Does my scenario seem like a good use of your program? TY

Add support to return configurable st_blksize on getattr

By returning a value for st_blksize on calls to getattr, one can potentially reduce the number of write invocations that occur against FSData (the maximum, I guess, could only be 131072 due to FUSE's max_write ceiling). Whether or not it has an effect on a "caller" (writer) is up to the caller; for example, cp seems to increase its write block size when this is set, but some other programs do not. See the sketch below.
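
A minimal fusepy sketch (hypothetical and simplified, not the yas3fs implementation; the hard-coded attributes are placeholders) showing how a configurable st_blksize could be returned from getattr. fusepy copies any st_* key it recognizes from the returned dict into the kernel stat structure:

import stat
import time
from fuse import Operations

class BlksizeOperations(Operations):
    """Toy example: every file reports a configurable st_blksize."""
    def __init__(self, blksize=131072):
        self.blksize = blksize  # 131072 = the FUSE max_write ceiling mentioned above

    def getattr(self, path, fh=None):
        now = time.time()
        attr = {'st_mode': stat.S_IFREG | 0o644, 'st_nlink': 1, 'st_size': 0,
                'st_ctime': now, 'st_mtime': now, 'st_atime': now}
        attr['st_blksize'] = self.blksize  # hint writers (e.g. cp) to use larger blocks
        return attr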

timing issue in replacing files.

I am rsyncing files from local disk to S3 (without queueing or listening).

First it unlinks the file and removes the local cache:

25905:2014-05-15 13:18:44,449 DEBUG get_key from cache '/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25906:2014-05-15 13:18:44,449 DEBUG unlink '/public/img/l/2006/0604xx_feria/060429m_06.jpg' '<Key: s3.140507,public/img/l/2006/0604xx_feria/060429m_06.jpg>' S3
25907:2014-05-15 13:18:44,450 DEBUG unlink cache file '/tmp/yas3fs/s3.140507/files/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25908:2014-05-15 13:18:44,451 DEBUG remove_empty_dirs_for_file '/tmp/yas3fs/s3.140507/files/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25911:2014-05-15 13:18:44,453 DEBUG unlink cache etag file '/tmp/yas3fs/s3.140507/etags/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25912:2014-05-15 13:18:44,454 DEBUG remove_empty_dirs_for_file '/tmp/yas3fs/s3.140507/etags/public/img/l/2006/0604xx_feria/060429m_06.jpg'

Then it creates a new file (but fetches the key of the not-yet-deleted file):

25919:2014-05-15 13:18:44,460 DEBUG create '/public/img/l/2006/0604xx_feria/060429m_06.jpg' '33152' 'None'
25920:2014-05-15 13:18:44,461 DEBUG open '/public/img/l/2006/0604xx_feria/060429m_06.jpg' '33152'
25921:2014-05-15 13:18:44,462 DEBUG check_data '/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25922:2014-05-15 13:18:44,462 DEBUG get_key from S3 #1 '/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25923:2014-05-15 13:18:44,548 DEBUG get_key to cache '/public/img/l/2006/0604xx_feria/060429m_06.jpg'
25924:2014-05-15 13:18:44,549 DEBUG creating new cache file '/tmp/yas3fs/s3.140507/files/public/img/l/2006/0604xx_feria/060429m_06.jpg'

Eventually it deletes it on S3:

63119:2014-05-15 13:29:00,436 DEBUG do_on_s3_now action 'delete' key '<Key: s3.140507,public/img/l/2006/0604xx_feria/060429m_06.jpg>' args 'None' kargs 'None'

but then it tries to write the file with the previous key:

63120:2014-05-15 13:29:00,683 DEBUG do_on_s3_now action 'copy' key '<Key: s3.140507,public/img/l/2006/0604xx_feria/060429m_06.jpg>' args '['s3.140507', u'public/img/l/2006/0604xx_feria/060429m_06.jpg', {'attr': '{"st_ctime": 1400192033.0, "st_mtime": 1313697410.0, "st_nlink": 1, "st_gid": 501, "st_size": 91483, "st_mode": 33188, "st_uid": 501, "st_atime": 1400174325.0}'}]' kargs '{'preserve_acl': False}'
63122:<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>public/img/l/2006/0604xx_feria/060429m_06.jpg</Key><RequestId>87CC324F2F78EC02</RequestId><HostId>1aFtchwp5sm24C+nGY+SXfuQ9xCpos4zdDwCOe70e3tJWljnWDYK1wbupm1XG8YSdzEZAaIBR6I=</HostId></Error>

download_data() retries forever when S3 returns 404

This will repeat forever if a file really does not exist and cannot be downloaded:

2014-09-24 15:27:33,586 DEBUG check_cache_size get_memory_usage
2014-09-24 15:27:33,690 DEBUG get_key from cache '/0/0/31/63/46/2223d.jpg/2223d_1.0.jpg'
2014-09-24 15:27:33,691 DEBUG download_data range '/0/0/31/63/46/2223d.jpg/2223d_1.0.jpg' '{}' [thread 'Thread-34']
2014-09-24 15:27:33,752 ERROR S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>0/0/31/63/46/2223d.jpg/2223d_1.0.jpg</Key><RequestId>222222</RequestId><HostId>232+bnQo3Yb/222</HostId></Error>
Traceback (most recent call last):
  File "/my/p/python2.6/site-packages/yas3fs/__init__.py", line 1732, in download_data
    bytes = key.get_contents_as_string()
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 1730, in get_contents_as_string
    response_headers=response_headers)
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 1603, in get_contents_to_file
    response_headers=response_headers)
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 1435, in get_file
    query_args=None)
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 1467, in _get_file_internal
    override_num_retries=override_num_retries)
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 325, in open
    override_num_retries=override_num_retries)
  File "/my/p/python2.6/site-packages/boto/s3/key.py", line 273, in open_read
    self.resp.reason, body)
S3ResponseError: S3ResponseError: 404 Not Found
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>0/0/31/63/46/2223d.jpg/2223d_1.0.jpg</Key><RequestId>2222</RequestId><HostId>2323222+bnQo3Yb/22</HostId></Error>
2014-09-24 15:27:33,753 INFO download_data error '/0/0/31/63/46/2223d.jpg/2223d_1.0.jpg' 0-10485759 [thread 'Thread-34'] -> retrying
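
A minimal sketch (an assumption of how it could be handled, not the actual yas3fs code) of a download retry loop that treats a 404 as a permanent failure instead of retrying forever; boto's S3ResponseError carries the HTTP status:

from boto.exception import S3ResponseError

def download_with_retries(key, max_retries=3):
    """Return the key's contents, or None if the download permanently fails."""
    for attempt in range(max_retries):
        try:
            return key.get_contents_as_string()
        except S3ResponseError as e:
            if e.status == 404:
                # The key really does not exist: give up immediately so the
                # caller can report ENOENT instead of looping forever.
                return None
            # Otherwise assume a transient error and retry.
    return None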

Unable to view directory contents due to UnicodeEncodeError

We're running yas3fs on our production instances and are unable to access a directory that we've been able to access in the past, due to the error below:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 414, in _wrapper
    return func(*args, **kwargs) or 0
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 422, in getattr
    return self.fgetattr(path, buf, None)
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 668, in fgetattr
    attrs = self.operations('getattr', path.decode(self.encoding), fh)
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 881, in __call__
    ret = getattr(self, op)(path, *args)
  File "./yas3fs", line 1140, in getattr
    attr = self.get_metadata(path, 'attr')
  File "./yas3fs", line 1049, in get_metadata
    key = self.get_key(path)
  File "./yas3fs", line 1026, in get_key
    dirs = self.readdir(parent_path)
  File "./yas3fs", line 1185, in readdir
    d = k.name.encode('ascii')[len(full_path):]
UnicodeEncodeError: Error: 'ascii' codec can't encode character u'\u2019' in position 28: ordinal not in range(128)
2013-12-14 15:45:44,847 DEBUG getattr -> '/application_attachments' 'None'
2013-12-14 15:45:44,847 DEBUG get_metadata -> '/application_attachments' 'attr' 'None'
2013-12-14 15:45:44,847 DEBUG get_metadata <- '/application_attachments' 'attr' 'None' '{u'st_ctime': 1375841900.0, u'st_mtime': 1375841900.0, u'st_gid': 1001, 'st_size': 0, u'st_mode': 16895, u'st_uid': 2000, u'st_atime': 1375841900.0}'
2013-12-14 15:45:44,848 DEBUG getattr <- '/application_attachments' 'None' '{'st_ctime': 1375841900.0, 'st_mtime': 1375841900.0, 'st_nlink': 1, 'st_gid': 1001, 'st_size': 4096, 'st_atime': 1375841900.0, 'st_uid': 2000, 'st_mode': 16895}'
2013-12-14 15:45:44,849 DEBUG getattr -> '/application_attachments/.git' 'None'
2013-12-14 15:45:44,849 DEBUG get_metadata -> '/application_attachments/.git' 'attr' 'None'
2013-12-14 15:45:44,849 DEBUG readdir '/application_attachments' 'None'
2013-12-14 15:45:44,850 DEBUG readdir '/application_attachments' 'None' no cache
2013-12-14 15:45:44,850 DEBUG readdir '/application_attachments' 

We can cd into the directory, but an attempt to ls -la it is met with this:

ls: reading directory .: Bad address
total 0
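
The traceback points at the ASCII encode in readdir (d = k.name.encode('ascii')[...]). A minimal sketch (an assumed workaround for illustration, not the project's actual fix) of a more tolerant way to build a directory entry name under Python 2:

def entry_name(key_name, full_path):
    """Slice the unicode key name first and keep non-ASCII characters intact."""
    name = key_name[len(full_path):]
    if isinstance(name, unicode):
        name = name.encode('utf-8')  # avoids UnicodeEncodeError on characters like u'\u2019'
    return name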

yas3fs behavior when async background upload fails after 3 attempts

Piggybacking on #17

You noted that yas3fs by default attempts 3 times to upload the file in the background after committing locally to the cache (and reporting to the writer/caller that the write succeeded).

However, what is the behavior if all 3 retries fail? Does yas3fs delete the locally cached file?

If not, could the following options be exposed?

a) deleteCachedFileOrphanAfterUploadFile, i.e. enable purging the locally cached file if the S3 upload process has exhausted all retries

b) Some sort of option to log locally a list of all files (paths) that were written OK to the local cache but failed to upload to S3. This would permit integrations with calling applications, so they could consult this file to clean up metadata that now points to orphaned files (i.e. files that yas3fs said were OK because they were written locally, but that never truly made it to S3 in the background). A sketch of such a log follows below.
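
A hypothetical sketch of option (b): append a record for every path whose background upload exhausted all retries, so a calling application can reconcile its metadata later. The log path, function name, and record fields are all assumptions:

import json
import threading
import time

_failed_log_lock = threading.Lock()

def log_failed_upload(log_path, fs_path, attempts):
    """Append one JSON line describing a file that is cached locally but never reached S3."""
    record = {'path': fs_path, 'attempts': attempts, 'time': time.time()}
    with _failed_log_lock:
        with open(log_path, 'a') as f:
            f.write(json.dumps(record) + '\n')

# Example usage after the last retry fails:
# log_failed_upload('/var/log/yas3fs-failed-uploads.log', '/some/file.txt', 3)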

Support standard mount -o option format

It is possible to mount FUSE filesystems from fstab.

For example:

yas3fs#s3://a-bucket               /srv/mountpoint        fuse    defaults        0 0

This works by calling mount.fuse, which in turn runs the program named before the # sign in the following manner:

yas3fs s3://a-bucket /srv/mountpoint -o rw,suid,dev

There doesn't appear to be a way to disable the addition of these options, and yas3fs fails on the unknown options.

I wrote a wrapper (I am asking my client for permission to submit it to you), but I think this would be handled better within the main program itself; a rough sketch of the idea is below.

Thank you.
Jeff
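
A hypothetical wrapper sketch (not part of yas3fs; which options, if any, are worth translating is an assumption): strip the mount(8)-style -o option string that mount.fuse appends so it never reaches yas3fs as unknown arguments, then exec the real program:

#!/usr/bin/env python
import os
import sys

def main(argv):
    args, i = [], 1
    while i < len(argv):
        if argv[i] == '-o' and i + 1 < len(argv):
            # Drop the mount-style option string entirely; any options that really
            # matter would have to be translated into yas3fs's own flags (not shown).
            i += 2
        else:
            args.append(argv[i])
            i += 1
    os.execvp('yas3fs', ['yas3fs'] + args)  # replace this process with yas3fs

if __name__ == '__main__':
    main(sys.argv)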
