Gih's Blog


Delete all but the most recent N files (objects) from S3.

2014-08-22 by gihnius, tagged as ruby

When I back up servers and save the data to AWS S3, I usually just upload entire files to S3. This is done automatically by the backup script, and the upload process looks simple:

S3_PUT backup_data_a s3://org-sys-backup/
S3_PUT backup_data_b s3://org-sys-backup/
S3_PUT backup_data_c s3://org-sys-backup/
S3_PUT backup_data_d s3://org-sys-backup/
S3_PUT backup_data_e s3://org-sys-backup/
S3_PUT backup_data_f s3://org-sys-backup/

## or 
S3_PUT backup_data_20140801xxxx s3://org-sys-backup/
S3_PUT backup_data_20140802xxxx s3://org-sys-backup/
S3_PUT backup_data_20140803xxxx s3://org-sys-backup/
S3_PUT backup_data_20140804xxxx s3://org-sys-backup/
S3_PUT backup_data_20140805xxxx s3://org-sys-backup/

I don't care about backup data older than one week, so to free up space in S3 I delete those old data files (called objects in S3) and keep just the most recent 5 copies. There may be ready-made tools available, but I wanted to roll my own, so I wrote a Ruby script to do it. Here is the source code:

#!/usr/bin/env ruby

## Delete all but the most recent N files from s3://bucket
## supply a prefix and count of copies and the bucket name

require 'right_aws'
require 'optparse'
require 'time'   # for Time.parse

## keep 1 copy by default if -c is not given
opts = { :count => 1 }

OptionParser.new do |o|
  o.on("-a", "--access-key-id X", "aws access key id") do |x|
    opts[:aws_access_key_id] = x
  end
  o.on("-s", "--secret-access-key X", "aws secret access key") do |x|
    opts[:aws_secret_access_key] = x
  end
  o.on("-b", "--bucket X", "bucket name") do |x|
    opts[:bucket] = x
  end
  o.on("-p", "--prefix X", "object name prefix") do |x|
    opts[:prefix] = x
  end
  o.on("-c", "--count N", Integer, "how many copies to keep") do |x|
    opts[:count] = x
  end
end.parse!

## refuse to run without a bucket and prefix, or with a count that
## would delete every copy
if opts[:bucket].to_s.empty? || opts[:prefix].to_s.empty?
  puts "bucket or prefix not valid!"
  exit 1
end
if opts[:count] < 1
  puts "count must be at least 1!"
  exit 1
end

s3 = RightAws::S3.new(opts[:aws_access_key_id], opts[:aws_secret_access_key])

## object key => last modified time (string)
object_keys = {}

## list matching objects; the block is called once per page of results
s3.interface.incrementally_list_bucket(opts[:bucket], {'prefix' => opts[:prefix], 'delimiter' => '/'}) do |item|
  item[:contents].each { |c| object_keys[c[:key]] = c[:last_modified] }
end

## sort obj keys by time (:last_modified), oldest first
sorted_keys = object_keys.sort_by { |k, v| Time.parse(v).to_i }.map { |x| x[0] }
puts "Found #{sorted_keys.size} objects with prefix: #{opts[:prefix]}"

if sorted_keys.size > opts[:count]
  ## everything except the newest opts[:count] objects
  old_keys = sorted_keys.first(sorted_keys.size - opts[:count])
  n = old_keys.inject(0) do |a, k|
    s3.interface.delete(opts[:bucket], k)
    a + 1
  end
  puts "Deleted #{n} old objects."
end

The script lists the objects whose names match the given prefix, sorts them by last-modified time, and deletes all but the newest N of them (N is the -c count).
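The keep-the-newest-N step can be pulled out of the S3 calls and tried offline. Below is a minimal sketch, assuming a hypothetical helper named keys_to_delete (not part of the original script) that takes the same key => last-modified hash the script builds in object_keys. The keys and timestamps in the demo are made up.

```ruby
require 'time'

# Hypothetical helper (not in the original script): given a hash of
# object key => last-modified timestamp string, return the keys that are
# older than the newest `keep` objects, oldest first.
def keys_to_delete(object_keys, keep)
  raise ArgumentError, "keep must be at least 1" if keep < 1

  # oldest first, same ordering as the script's sorted_keys
  sorted = object_keys.sort_by { |_key, ts| Time.parse(ts) }.map(&:first)
  return [] if sorted.size <= keep

  sorted.first(sorted.size - keep)
end

if __FILE__ == $0
  # made-up keys and timestamps for illustration
  objects = {}
  (1..8).each do |day|
    objects["backup_data_201408%02d" % day] = "2014-08-%02dT03:00:00.000Z" % day
  end
  old = keys_to_delete(objects, 5)
  puts "Would delete #{old.size} objects: #{old.join(', ')}"
end
```

Running it prints "Would delete 3 objects: backup_data_20140801, backup_data_20140802, backup_data_20140803", the same 8-objects, keep-5 situation as in the example run of the real script. Keeping the selection separate from the delete calls also makes it easy to print the list first as a dry run.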

Save it as, e.g., s3_cleanup.rb.

How to run it?

First install the right_aws gem with gem install right_aws, then run:

ruby s3_cleanup.rb -a 'access-key-id' -s 'secret-access-key' -b org-sys-backup -p 'backup_data_' -c 5

Example output:

Found 8 objects with prefix: backup_data_
Deleted 3 old objects.

This script is available on GitHub.

You may want to read about Object Expiration before using the script. It may be the officially recommended practice.

Amazon S3 Announces Object Expiration

Amazon S3 announced a new feature, Object Expiration that allows you to schedule the deletion of your objects after a pre-defined time period. Using Object Expiration to schedule periodic removal of objects eliminates the need for you to identify objects for deletion and submit delete requests to Amazon S3. You can define Object Expiration rules for a set of objects in your bucket. Each Object Expiration rule allows you to specify a prefix and an expiration period in days. The prefix field (e.g. "logs/") identifies the object(s) subject to the expiration rule, and the expiration period specifies the number of days from creation date (i.e. age) after which object(s) should be removed. Once the objects are past their expiration date, they will be queued for deletion. You will not be billed for storage for objects on or after their expiration date.

Another related document: Managing Lifecycle Configuration.
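For comparison, an expiration rule matching this post's setup (prefix backup_data_, one week) could be set from Ruby. This is a sketch under assumptions: it uses the newer aws-sdk-s3 gem (not the right_aws gem used above) and its Aws::S3::Client#put_bucket_lifecycle_configuration call; the helper expiration_rule, the rule id "expire-old-backups", the 7-day period and the --apply flag are all illustrative choices of this sketch, not anything from the original script.

```ruby
# Hypothetical helper: build a lifecycle rule that expires objects
# under `prefix` once they are `days` days old.
def expiration_rule(prefix, days)
  {
    id: "expire-old-backups",      # illustrative rule name
    status: "Enabled",
    filter: { prefix: prefix },
    expiration: { days: days }
  }
end

# Only talk to S3 when run as: ruby set_lifecycle.rb --apply
# (the --apply flag is part of this sketch only)
if ARGV.first == "--apply"
  # Assumes the aws-sdk-s3 gem is installed; region and credentials are
  # read from the standard AWS environment variables or config files.
  require 'aws-sdk-s3'

  client = Aws::S3::Client.new
  # Note: this call replaces the bucket's entire lifecycle configuration.
  client.put_bucket_lifecycle_configuration(
    bucket: "org-sys-backup",
    lifecycle_configuration: { rules: [expiration_rule("backup_data_", 7)] }
  )
  puts "Lifecycle rule applied."
end
```

The two approaches behave differently when backups stop: an age-based rule keeps deleting until nothing under the prefix is left, while the count-based script always keeps the newest N copies no matter how old they are.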