S3: abort pending multipart uploads in batch

If you plan to upload large files to an S3 bucket, you should probably have a look at the Multipart Upload API. Multipart uploads allow you to split your data into chunks and upload each part to the bucket independently. This is really useful when uploading large files because:

  • you're creating a backup tool and you want to recover from network issues without having to reupload the whole file
  • you have access to high bandwidth and you'd like to upload parts in parallel
  • you're performing long-running computational tasks, such as astronomical calculations, and you'd like to stream the output of these calculations to S3
  • whatever
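To make the chunking concrete, here's a minimal sketch of how a file could be split into part-sized byte ranges before uploading. The 5 MB figure is S3's minimum size for every part except the last; the helper name is mine:

```ruby
# Sketch: compute the byte ranges for a multipart upload.
# S3 requires every part except the last to be at least 5 MB,
# so we use that as the illustrative part size.
MEGABYTE  = 1024 ** 2
PART_SIZE = 5 * MEGABYTE

def part_ranges(total_size, part_size = PART_SIZE)
  (0...total_size).step(part_size).map do |offset|
    offset...[offset + part_size, total_size].min
  end
end

# A 12 MB object splits into two full parts and a 2 MB tail:
part_ranges(12 * MEGABYTE).each_with_index do |range, index|
  puts "part #{index + 1}: bytes #{range.first}-#{range.last - 1}"
end
```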

To upload an object in chunks, you basically create an S3 object, initiate a multipart upload, upload each part, then mark the operation as completed. If, like me, you create a lot of large S3 objects using the multipart upload API and sometimes cancel them without explicitly aborting the upload, you'll end up with gigabytes of incomplete parts. These objects don't appear in the AWS S3 console, but Amazon charges you for the corresponding storage.
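For reference, that create/initiate/upload/complete sequence looks roughly like this with the v1 SDK. As far as I can tell, the block form of S3Object#multipart_upload takes care of initiating and completing (or aborting on error) the upload around the block, which is why a normal run shouldn't leave pending parts; the method and variable names below are illustrative:

```ruby
begin
  require 'aws-sdk'
rescue LoadError
  # Gem not installed; the method below still illustrates the flow.
end

# Illustrative sketch of the multipart flow with the v1 aws-sdk:
# initiate the upload, add each part, then complete it.
def upload_in_chunks(s3, bucket_name, key, chunks)
  object = s3.buckets[bucket_name].objects[key]
  object.multipart_upload do |upload|
    chunks.each { |chunk| upload.add_part(chunk) }
  end
end
```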

I wrote a little Ruby script, based on the AWS SDK for Ruby, which lists and aborts pending uploads:

#!/usr/bin/env ruby

require 'rubygems'  
require 'aws-sdk'

# An incredibly smart option parser
unless (bucket_name = ARGV[0]) && (region = ARGV[1])
  puts "Usage: abort-pending-uploads <bucket> <region> [--dry-run]"
  exit 1
end

dry_run = ARGV[2] == '--dry-run'

MEGABYTE = 1024 ** 2

# Assumes credentials defined in env variables:
# - AWS_ACCESS_KEY_ID=access_key_id
# - AWS_SECRET_ACCESS_KEY=secret_access_key

AWS::S3.new(region: region).tap do |s3|
  # Get the bucket
  bucket = s3.buckets[bucket_name]
  bucket.multipart_uploads.each do |operation|
    size = operation.parts.reduce(0) { |sum, part| sum + part.size }
    puts "#{operation.object.key}, #{operation.parts.count} parts, " + \
      "#{size / MEGABYTE} MB"
    # Abort the upload
    operation.abort unless dry_run
  end
end

Install Ruby and the aws-sdk gem (gem install aws-sdk-v1), save the script as abort-pending-uploads, and make it executable:

chmod a+x abort-pending-uploads  

Export your AWS credentials as environment variables:

export AWS_ACCESS_KEY_ID=<your-aws-access-key-id>  
export AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>  

Then run the script specifying the name of the bucket as the first argument, and the name of the S3 region (us-west-1, eu-central-1, etc.) as the second one:

./abort-pending-uploads <your-bucket-name> <s3-region>

If you'd like to verify which uploads the script will abort before actually aborting them (and I strongly advise you to do so), use the --dry-run argument:

./abort-pending-uploads a-bucket eu-central-1 --dry-run
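Since each dry-run line ends with the size in MB, you can get a rough total of the reclaimable storage by summing that last number. The sample lines below stand in for real output; the line format comes from the script's puts:

```shell
# Sum the "MB" column of lines shaped like "<key>, <n> parts, <size> MB".
# The printf output is made-up sample data standing in for a dry run.
printf 'backups/db.dump, 3 parts, 120 MB\nlogs/archive.tar, 2 parts, 64 MB\n' \
  | awk '{ total += $(NF - 1) } END { print total " MB pending" }'
```

Against a real bucket you'd pipe the dry run in instead: ./abort-pending-uploads a-bucket eu-central-1 --dry-run | awk '…'.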

This "software" is provided "as is" under the Do What The Fuck You Want To Public License:

           DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
                   Version 2, December 2004

Copyright (C) 2015 Jef Mathiot 

Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.

           DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
  TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

 0. You just DO WHAT THE FUCK YOU WANT TO.