AWS Batch Operations

May 27, 2020

For mass operations on files in S3 buckets, such as a migration, S3 Batch Operations is a good option. Thanks to a great guide by Alex DeBrie, we recently ran such a batch operation ourselves.

To quote Alex, this is what you will need:

There are four core elements to an S3 Batch Operation:

Manifest: A file indicating which objects should be processed in a Batch job
Operation: The task to be performed on each object in the job
Report: An output file that summarizes the results of your job
Role Arn: An IAM role assumed by the batch operation.

Following his guide worked out great; this brief overview is just a reminder for the next time we need to run such an operation.

Job Goal

Update the ACL on all existing files in a specific bucket folder.

Manifest

The manifest CSV needs one bucket,key line per file to be processed. This can be generated with the AWS CLI (append the folder, e.g. s3://BUCKET_NAME/uploads/, to list only that folder):

aws s3 ls s3://BUCKET_NAME --recursive | awk '{print "BUCKET_NAME," $4}' > manifest.csv

This produces a CSV like this:

files.mywebsite.com,uploads/cat.jpg
files.mywebsite.com,uploads/dog.jpg

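Be sure to install and configure the AWS CLI first. Also note that $4 only holds the part of a key before its first space, so keys containing spaces end up truncated in the manifest.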
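The one-liner takes the fourth whitespace-separated column as the key, and the S3 documentation asks for object keys in a CSV manifest to be URL-encoded. A slightly more careful sketch, assuming the usual aws s3 ls --recursive output of date, time, size and key, strips the first three columns instead and percent-encodes the rest. The helper name build_manifest is hypothetical, and leaving "/" unencoded is our own assumption.

```shell
#!/usr/bin/env bash
# Sketch: turn `aws s3 ls --recursive` output (on stdin) into an
# S3 Batch Operations CSV manifest with URL-encoded keys.
# Example: aws s3 ls s3://BUCKET_NAME/uploads/ --recursive | build_manifest BUCKET_NAME > manifest.csv
build_manifest() {
  local bucket="$1"
  # LC_ALL=C makes awk work on bytes, so multi-byte UTF-8 characters
  # are percent-encoded one byte at a time.
  LC_ALL=C awk -v bucket="$bucket" '
    BEGIN {
      # Lookup table from a single byte to its numeric value.
      for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
      # Characters kept as-is: unreserved characters plus "/".
      safe = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~/"
    }
    function urlencode(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        out = out (index(safe, c) ? c : sprintf("%%%02X", ord[c]))
      }
      return out
    }
    {
      key = $0
      # Drop the date, time and size columns; the rest (spaces included) is the key.
      sub(/^[^ ]+ +[^ ]+ +[^ ]+ /, "", key)
      # Skip zero-byte "folder" placeholder objects such as "uploads/".
      if (substr(key, length(key)) == "/") next
      print bucket "," urlencode(key)
    }'
}
```

Folder placeholder objects (keys ending in /) are skipped, since they are not actual files.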

Batch Operation

The AWS console wizard is largely self-explanatory, but here are some key points:

go to https://s3.console.aws.amazon.com/s3/jobs to create a new job
under ‘Manifest’, choose the CSV option and point it to your manifest.csv, uploaded to a bucket you can access
choose an operation type; in our case it was ‘Replace access control list (ACL)’
if you want a job report, supply the bucket path where the report should be placed. Note that the IAM role you choose or create in the next step needs write access to that path as well
choose an existing IAM role, or create a new one using the two policy templates the wizard outputs. The trust policy goes under the ‘Trust relationships’ tab on the role’s edit page.
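The two templates the wizard outputs can also be written by hand. This is a minimal sketch, assuming an inline policy on a dedicated role: the trust policy lets batchoperations.s3.amazonaws.com assume the role, and the permissions policy only allows ACL changes under the target folder, reading the manifest and writing the report. The bucket names, prefix, role name and the helper names write_batch_policies and create_batch_role are hypothetical placeholders, and the action list is modelled on the console templates as we recall them, so compare it against what the wizard shows before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: write a trust policy and a narrowly scoped permissions policy for an
# S3 Batch Operations job that replaces object ACLs.
# write_batch_policies TARGET_BUCKET TARGET_PREFIX MANIFEST_BUCKET REPORT_BUCKET (all hypothetical)
write_batch_policies() {
  local target_bucket="$1" target_prefix="$2" manifest_bucket="$3" report_bucket="$4"

  # Trust policy: lets the S3 Batch Operations service assume the role.
  cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "batchoperations.s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

  # Permissions policy: change ACLs only under the target prefix,
  # read the manifest, and write the job report.
  cat > permissions-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObjectAcl", "s3:PutObjectVersionAcl"],
      "Resource": "arn:aws:s3:::${target_bucket}/${target_prefix}*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::${manifest_bucket}/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${report_bucket}/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketLocation",
      "Resource": ["arn:aws:s3:::${manifest_bucket}", "arn:aws:s3:::${report_bucket}"]
    }
  ]
}
EOF
}

# create_batch_role ROLE_NAME: create the role and attach the policy inline.
create_batch_role() {
  local role_name="$1"
  aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document file://trust-policy.json
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name "${role_name}-policy" \
    --policy-document file://permissions-policy.json
}
```

Scoping s3:PutObjectAcl to the folder prefix, rather than attaching AmazonS3FullAccess, limits the role to the objects the job is meant to touch.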

After creating the job, you still need to start it yourself, but the wizard makes that clear.
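For a repeat run, the wizard steps map onto a single aws s3control create-job call, with the four elements from the quote as its --manifest, --operation, --report and --role-arn arguments, followed by aws s3control update-job-status to start it. This is a hedged sketch: the canned ACL public-read is an assumption (the post does not say which ACL we set), and the account ID, ARNs, ETag, the batch-reports prefix, the priority and the helper names create_acl_job and start_job are placeholders. Setting AWS_CMD=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: create and start the ACL job from the CLI instead of the console.
# Set AWS_CMD=echo to print the commands instead of running them.
AWS_CMD="${AWS_CMD:-aws}"

# create_acl_job ACCOUNT_ID MANIFEST_ARN MANIFEST_ETAG REPORT_BUCKET_ARN ROLE_ARN
# Prints the new job ID. All values are placeholders you supply.
create_acl_job() {
  local account_id="$1" manifest_arn="$2" manifest_etag="$3" report_bucket_arn="$4" role_arn="$5"
  local manifest report
  # Manifest: our bucket,key CSV, located by its ARN and ETag.
  manifest=$(printf '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"%s","ETag":"%s"}}' \
    "$manifest_arn" "$manifest_etag")
  # Report: a CSV covering all tasks, under a hypothetical "batch-reports" prefix.
  report=$(printf '{"Bucket":"%s","Prefix":"batch-reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
    "$report_bucket_arn")
  # Operation: replace each object ACL with a canned ACL (public-read is an assumption).
  # --client-request-token is left out on the assumption that the CLI generates one.
  "$AWS_CMD" s3control create-job \
    --account-id "$account_id" \
    --operation '{"S3PutObjectAcl":{"AccessControlPolicy":{"CannedAccessControlList":"public-read"}}}' \
    --manifest "$manifest" \
    --report "$report" \
    --priority 10 \
    --role-arn "$role_arn" \
    --confirmation-required \
    --query JobId --output text
}

# start_job ACCOUNT_ID JOB_ID: a job created with --confirmation-required waits until set to Ready.
start_job() {
  "$AWS_CMD" s3control update-job-status \
    --account-id "$1" --job-id "$2" --requested-job-status Ready
}
```

For example, JOB_ID=$(create_acl_job ...) followed by start_job ACCOUNT_ID "$JOB_ID". MANIFEST_ETAG is the manifest object's ETag as reported by aws s3api head-object; drop the double quotes S3 wraps around it, since the function pastes it into a JSON string.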

Security note

The granularity of the IAM role and permission options can be overwhelming, but avoid granting excessive permissions. It’s tempting to just attach AmazonS3FullAccess, but that can bite you down the road. If you do use it, run the job and delete the IAM role and its permissions right afterwards.
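Cleaning up afterwards can be scripted as well. IAM refuses to delete a role that still has policies, so this sketch deletes inline policies and detaches managed ones (such as AmazonS3FullAccess) before deleting the role itself. The helper name cleanup_batch_role and the role name you pass in are hypothetical; AWS_CMD again allows swapping in a stub for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: remove the batch job's IAM role once the job has finished.
AWS_CMD="${AWS_CMD:-aws}"

# cleanup_batch_role ROLE_NAME: delete inline policies, detach managed ones, then delete the role.
cleanup_batch_role() {
  local role_name="$1" policy arn
  # Inline policies must be deleted before the role can be removed.
  for policy in $("$AWS_CMD" iam list-role-policies --role-name "$role_name" \
      --query 'PolicyNames[]' --output text); do
    "$AWS_CMD" iam delete-role-policy --role-name "$role_name" --policy-name "$policy"
  done
  # Managed policies such as AmazonS3FullAccess must be detached first, too.
  for arn in $("$AWS_CMD" iam list-attached-role-policies --role-name "$role_name" \
      --query 'AttachedPolicies[].PolicyArn' --output text); do
    "$AWS_CMD" iam detach-role-policy --role-name "$role_name" --policy-arn "$arn"
  done
  "$AWS_CMD" iam delete-role --role-name "$role_name"
}
```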