Using Amazon S3 with a local drive as a cache


icanus
Posts: 478
Joined: Mon Aug 13, 2007 1:19 pm UTC
Location: in England now abed


Post by icanus » Mon Feb 15, 2016 4:05 pm UTC

I'm working with Moodle (in fact a number of separate Moodles installed on the same EC2 server), which stores user-uploaded files in a data folder and accesses them through a built-in filestorage class, backed by a database that records which files should be where. The vast majority of these files are infrequently accessed but need to be kept long term (until a user chooses to delete them).

S3 would be ideal for this volume of data, but using it is quite a bit slower than a mounted drive, so I've been tossing around the idea of storing uploaded files both locally and on S3. Files not accessed in the last X days would be deleted automatically from the mounted drive, and the filestorage class would be modified so that when a file is requested, it first checks the mounted drive; if the file is missing, it downloads it from S3 and stores it locally again.
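A minimal sketch of that read-through lookup, written in Python rather than Moodle's PHP purely to show the control flow. ReadThroughFileCache, fetch_remote and store_remote are hypothetical names, not part of Moodle's file storage API: the two callables stand in for the S3 download and upload, and keys are assumed to be flat file names such as content hashes.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


class ReadThroughFileCache:
    """Serve files from a local directory, falling back to a remote store.

    Hypothetical sketch: fetch_remote/store_remote stand in for the S3
    download/upload; keys are assumed to be flat names (e.g. content hashes).
    """

    def __init__(self, cache_dir: str,
                 fetch_remote: Callable[[str], bytes],
                 store_remote: Callable[[str, bytes], None]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_remote = fetch_remote
        self.store_remote = store_remote

    def put(self, key: str, data: bytes) -> None:
        # Upload to the durable store first, then keep a local copy.
        self.store_remote(key, data)
        self._write_local(key, data)

    def get(self, key: str) -> bytes:
        try:
            # Fast path: the file is still on the local drive.
            return (self.cache_dir / key).read_bytes()
        except FileNotFoundError:
            pass  # expired (or never cached locally)
        # Slow path: download from the remote store and repopulate the cache.
        data = self.fetch_remote(key)
        self._write_local(key, data)
        return data

    def _write_local(self, key: str, data: bytes) -> None:
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self.cache_dir / key)
```

For a quick try-out, a dict can play the bucket: ReadThroughFileCache("/tmp/cache", remote.__getitem__, remote.__setitem__). Uploading before writing locally means a crash between the two steps leaves the file only in S3, where it will simply be re-fetched, rather than only on the local drive. Catching FileNotFoundError instead of checking exists() first also covers a file being expired between the check and the read.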

My thinking is that this will allow me to have the benefits of fast access from a mounted drive for recently accessed files, with cheap, expandable S3 storage for less frequently accessed files (at the cost of a performance hit on the first access after the expiry period while it's copied back to the mounted drive).
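The expiry side could be a periodic sweep, for example run daily from cron. This sketch (evict_stale is a hypothetical name) deletes local files whose most recent access or modification is older than the cutoff, leaving the S3 copy untouched. It relies on the filesystem recording access times: Linux's default relatime option still updates atime at least once a day, which is enough for a threshold measured in days, but on a volume mounted noatime the cache would have to update a timestamp itself on each hit.

```python
import os
import time
from typing import List, Optional


def evict_stale(cache_dir: str, max_age_days: float,
                now: Optional[float] = None) -> List[str]:
    """Delete local cached files not used within max_age_days.

    Hypothetical sketch: only the local copy is removed; S3 stays the
    source of truth. Returns the names of the removed files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(cache_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        # Use the newer of last read (atime) and last write (mtime), so a
        # freshly re-downloaded file is not evicted straight away.
        last_used = max(st.st_atime, st.st_mtime)
        if last_used < cutoff:
            try:
                os.remove(entry.path)
                removed.append(entry.name)
            except FileNotFoundError:
                pass  # already gone, e.g. removed by a concurrent sweep
    return removed
```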

The question is, what obvious factors have I missed that make this a horrible idea?

Xanthir
My HERO!!!
Posts: 5327
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Using Amazon S3 with a local drive as a cache

Post by Xanthir » Mon Feb 15, 2016 5:53 pm UTC

I'm not an expert on this sort of thing, but it sounds fine to me, and a good idea.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Using Amazon S3 with a local drive as a cache

Post by DaveInsurgent » Tue Feb 16, 2016 4:26 am UTC

icanus wrote: "The question is, what obvious factors have I missed that make this a horrible idea?"


Cache invalidation can be tricky. I'm curious why you believe S3 is slower. Does your application interact with the files, or just serve them to users?

If the latter, you can generate S3 signed URLs and send the users' clients directly to those files. This saves application server resources, which is a good thing. You could also have a separate S3 bucket that acts as your 'cache' and use CloudFront to distribute those files.

You can also interface with S3 over HTTP, so you could put a caching layer in front of it purely at the application protocol level. That would give you an LRU-type cache without the application having to be aware of it. I try to avoid combining storage with application servers (it's an additional dimension to scaling and deployment that doesn't need to be there).
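In practice a caching layer like that would usually be an off-the-shelf HTTP caching proxy (nginx or Varnish, for example) sitting in front of the bucket. To show what LRU eviction means here, this is a minimal in-memory sketch bounded by total bytes; LRUByteCache and the fetch callable are hypothetical, with fetch standing in for an HTTP GET against S3.

```python
from collections import OrderedDict
from typing import Callable


class LRUByteCache:
    """Keep the most recently used objects, up to max_bytes in total."""

    def __init__(self, max_bytes: int, fetch: Callable[[str], bytes]) -> None:
        self.max_bytes = max_bytes
        self.fetch = fetch  # hypothetical: e.g. an HTTP GET against S3
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.size = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        data = self.fetch(key)
        if len(data) > self.max_bytes:
            return data  # too big to cache at all; pass it straight through
        self.items[key] = data
        self.size += len(data)
        while self.size > self.max_bytes:
            # Evict from the front: the least recently used entry.
            _, evicted = self.items.popitem(last=False)
            self.size -= len(evicted)
        return data
```

Because eviction is driven purely by recency and a size budget, the application never decides what to expire, which is the property being described: the cache can sit in front of S3 without the application knowing it is there.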

