Automating the AWS S3 Lifecycle using Lambda

Efficient data lifecycle management in Amazon S3 is crucial for optimizing costs and performance in the cloud. In this article, we'll explore how to automate the AWS S3 lifecycle using Lambda functions, which can be useful when the lifecycle rules offered by S3 itself don't meet your needs or don't work as expected, providing a dynamic and flexible solution for data management.

[Image: aws s3 lifecycle]

The need

This week I needed to enable ALB access logs in AWS in order to troubleshoot an incident, so I forwarded the access logs to a bucket in S3.

The main advantage of activating ALB logs is the ability to troubleshoot problems and identify performance bottlenecks. By analyzing the logs, you can detect traffic patterns, identify errors and anomalies, and take corrective measures to improve application efficiency.

In addition, ALB logs are also valuable for compliance and security purposes. They record information such as IP addresses, URLs accessed and HTTP status codes, which can be crucial for investigating suspicious activity, identifying possible attacks and carrying out compliance audits.

Costs involved

However, it is important to bear in mind that storing logs can have a significant cost, especially in high-demand environments or with many requests. Logs can take up considerable space in your data storage, which can result in additional charges.

To mitigate the costs associated with ALB logs, it is advisable to implement an efficient management strategy. This can include configuring retention policies to limit how long logs are stored, compressing logs to reduce file sizes, and using data analysis services to process and filter logs more efficiently.

It is also important to consider the proper configuration of access permissions to ALB logs. Ensuring that only the relevant teams and individuals have access to the logs can help prevent leaks of sensitive information and minimize security risks.

In summary, enabling ALB logs on AWS is key to monitoring and analyzing network traffic and ensuring proper application performance and security. However, it is important to be aware of the costs associated with storing logs and to adopt efficient management practices to optimize usage and minimize unnecessary expenses.

Understanding the AWS S3 Lifecycle

What is the AWS S3 Lifecycle?

The AWS S3 lifecycle is a powerful feature that lets you define rules to automatically manage the lifecycle of objects stored in S3. These rules can include transitioning objects to cheaper storage classes or deleting old objects.

Why use Lambda with S3 Lifecycle?

Integrating AWS Lambda with the S3 lifecycle offers a programmatic and highly customizable approach to managing the lifecycle of objects. This makes it possible to create complex logic that goes beyond the static rules of the S3 lifecycle.
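
As an illustration of logic that the native rules can't express, the sketch below deletes only objects whose key matches a regular expression; this is a hypothetical example (bucket name, prefix and pattern are illustrative), not the final solution built later in the article:

import re
import boto3

# Hypothetical example of logic native lifecycle rules cannot express:
# delete only objects whose key matches a regular expression.
s3 = boto3.client('s3')
pattern = re.compile(r'.*elasticloadbalancing.*\.log\.gz$')  # illustrative pattern

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='AWSLogs/'):
    for obj in page.get('Contents', []):
        if pattern.match(obj['Key']):
            # Remove matching objects one by one (fine for a small illustration)
            s3.delete_object(Bucket='my-bucket', Key=obj['Key'])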

Problem

As many of you know, storing a massive amount of logs in S3 is expensive, so I decided to enable S3's own lifecycle, scoped to a specific path (where the ALB logs are), to delete logs older than 2 days.
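
For reference, a prefix-scoped expiration rule like the one I configured can be expressed with boto3 roughly as follows (a sketch; it mirrors the bucket and prefix used later in this article):

import boto3

s3 = boto3.client('s3')

# Sketch of the native prefix-scoped expiration rule (names are illustrative)
s3.put_bucket_lifecycle_configuration(
    Bucket='devops-mind',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-alb-logs',
                'Filter': {'Prefix': 'tshoot-incidente-alb/AWSLogs/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 2},
            }
        ]
    },
)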

However, two days passed and the items were still in the bucket, generating unnecessary costs.

In another bucket, where I had activated the lifecycle for the entire bucket, the process worked as expected.

[Image: lambda lifecycle]

Solution

Since the solution offered by AWS wasn’t working as expected, I decided to take a different approach by creating a Python script that performs this lifecycle process.

Requirements

In order to create a structure where the lifecycle process runs automatically, similar to what I would achieve through the S3 configuration, I needed:

  • S3 Bucket
  • AWS Lambda in Python
  • Policy for the Lambda role
  • Bucket policy
  • Cron in Amazon EventBridge

Creating the Lambda function

The first step is to access the AWS console and go to the Lambda service.

In Lambda, access:

  • Functions
  • Create function

In the window that opens, leave the Author from scratch option checked. For Function name I used "s3-lifecycle" (choose whatever name suits you best), and for Runtime choose "Python 3.9".

Below is an example image:

[Image: aws lambda lifecycle]

Leave the rest as it is and click Create function.

A screen like this will be displayed:

[Image: lifecycle s3]

In the section containing the example Python code, remove the existing lines and add our code:

import boto3
from datetime import datetime, timedelta, timezone

def delete_objects(bucket_name, prefix, days):
    s3 = boto3.client('s3')
    # Objects last modified before this date are considered expired
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=days)

    objects_to_delete = []

    # List every object under the prefix, paginating through the results
    paginator = s3.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    for page in page_iterator:
        if 'Contents' in page:
            for obj in page['Contents']:
                key = obj['Key']
                last_modified = obj['LastModified']
                if last_modified < cutoff_date:
                    objects_to_delete.append({'Key': key})

    if objects_to_delete:
        # delete_objects accepts at most 1000 keys per call, so delete in batches
        for i in range(0, len(objects_to_delete), 1000):
            batch = objects_to_delete[i:i + 1000]
            s3.delete_objects(Bucket=bucket_name, Delete={'Objects': batch})
        print(f'{len(objects_to_delete)} objects deleted.')
    else:
        print('No objects found to delete.')

def lambda_handler(event, context):
    bucket_name = 'devops-mind'
    prefix = 'tshoot-incidente-alb/AWSLogs/'
    days = 2

    delete_objects(bucket_name, prefix, days)

In this script, you only need to adjust the following fields:

  • bucket_name
    • Name of the bucket where the objects are stored.
  • prefix
    • Prefix/path where the objects are stored (the logs in our case).
  • days
    • Number of days after which objects are considered old and can be deleted.
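
Before deploying, you can optionally run a quick local smoke test of the same handler (this assumes your local AWS credentials can access the bucket):

# Optional local smoke test of the handler above (assumes local AWS
# credentials with access to the bucket); not needed for the Lambda deploy.
if __name__ == '__main__':
    lambda_handler({}, None)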

After adjusting the code, a "Changes not deployed" message will appear; click Deploy.

[Image: aws s3 bucket lifecycle]

Before proceeding with using Lambda or configuring EventBridge, we need to adjust the permissions, both on the S3 Bucket and on the role used by Lambda.

Adjusting Role and Bucket policies

In order for the whole process to take place properly, permissions related to the S3 bucket are required.

Bucket Policy

Assuming you already have a bucket in S3 (I won’t cover the bucket creation part in this article), go to:

  • Permissions tab
  • In the section on Bucket Policy, click Edit.

[Image: s3 bucket]

Let’s add the following policy:

{
    "Version": "2008-10-17",
    "Id": "Policy1335892530063",
    "Statement": [
        {
            "Sid": "DevOps-Mind-lambda",
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.us-east-1.amazonaws.com"
            },
            "Action": [
                "s3:*"
            ],
            "Resource": "arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "123456"
                }
            }
        }
    ]
}

Edit the fields according to your account and the structure of your objects in S3.

It is also necessary to edit the SourceAccount; the intention is to grant permissions only to resources of a specific account.

For Action we allowed everything (s3:*); we could grant more specific actions, but to keep the article simple we'll allow them all.

[Image: policy s3]

Once you have finished adjusting the policy, click on Save changes.

Policy for the Role

When we created the Lambda function, a role was created along with it.

In my case, the role is s3-lifecycle-role-fggxxkgz:

[Image: policy role s3 lifecycle]

We need to create a policy and link it to this role to ensure that it has the necessary privileges in our S3 bucket.

Access the IAM service on AWS, click on Policies and then on the Create policy button, as highlighted in yellow in the image below:

[Image: iam policy]

Let’s use the following code to create our policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DevOpsMindBucket1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        },
        {
            "Sid": "DevOpsMindBucket2",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*",
                "arn:aws:s3:::devops-mind"
            ]
        }
    ]
}

Our screen will look like this:

[Image: permissions s3]

Define a name for the policy and click Create policy:

[Image: policy s3]

Once this is done, search for the role in IAM.

[Image: role s3 lifecycle]

Open the role and click Add permissions.

Then click on Attach policies:

[Image: s3 lifecycle role]

Select the policy in the search.

Click Add permissions.

[Image: permissions s3 role]

Expected result:

[Image: s3 role]
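
If you prefer to do this step from code rather than the console, the same policy can be created and attached with boto3; a sketch, where the policy name is hypothetical and the role name is the one created with my Lambda:

import json
import boto3

iam = boto3.client('iam')

# The same policy document used in the console (resources are illustrative)
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "DevOpsMindBucket1", "Effect": "Allow",
         "Action": "s3:ListAllMyBuckets", "Resource": "*"},
        {"Sid": "DevOpsMindBucket2", "Effect": "Allow", "Action": "s3:*",
         "Resource": ["arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*",
                      "arn:aws:s3:::devops-mind"]},
    ],
}

# Create the managed policy and attach it to the Lambda execution role
response = iam.create_policy(
    PolicyName='s3-lifecycle-policy',          # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_role_policy(
    RoleName='s3-lifecycle-role-fggxxkgz',     # role created with the Lambda function
    PolicyArn=response['Policy']['Arn'],
)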

Setting up automation

For the lifecycle process to take place as expected, we need to set up a trigger in our Lambda.

The easiest way is to go to the AWS console, access our Lambda again and click on Add trigger, as highlighted in yellow:

[Image: aws lambda lifecycle]

Configure the event source to use EventBridge.

Define a name for the rule and add a description.

In Schedule expression, add the values:

cron(0 12 * * ? *)

The expression cron(0 12 * * ? *) in an Amazon EventBridge rule defines an event schedule that takes place every day at 12:00 (noon) UTC.

Let’s look at the cron expression in detail:

  • The first field (0) is the minute. Here, 0 means the event fires at minute 0 of the specified hour.
  • The second field (12) is the hour. Here, 12 means the event fires at 12 noon.
  • The third field (*) is the day of the month. Here, * means the event can occur on any day of the month.
  • The fourth field (*) is the month. Here, * means the event can occur in any month.
  • The fifth field (?) is the day of the week. EventBridge requires a ? in either the day-of-month or day-of-week field; here it means no specific weekday is set.
  • The sixth field (*) is the year. Here, * means the event can occur in any year.

Therefore, the event rule with this cron expression will be triggered every day at exactly 12:00 (noon) UTC.

After configuring all the fields, we’ll have something like this:

[Image: aws lambda trigger]

Click on Add to finish the process.

Expected result:

[Image: aws lambda s3 lifecycle]
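
Alternatively, the same schedule and trigger can be wired up with boto3; a sketch, where the rule name, account ID and function ARN are placeholders:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

function_arn = 'arn:aws:lambda:us-east-1:123456:function:s3-lifecycle'  # placeholder ARN

# Rule that fires every day at 12:00 UTC
rule = events.put_rule(
    Name='s3-lifecycle-daily',
    ScheduleExpression='cron(0 12 * * ? *)',
    State='ENABLED',
)

# Point the rule at the Lambda function
events.put_targets(
    Rule='s3-lifecycle-daily',
    Targets=[{'Id': 's3-lifecycle', 'Arn': function_arn}],
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName='s3-lifecycle',
    StatementId='eventbridge-s3-lifecycle',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)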

Best Practices for AWS S3 Lifecycle with Lambda

  1. Monitoring: Use CloudWatch to monitor the execution of your Lambda function and the impact of lifecycle rules.
  2. Test: Always test your configurations in a development environment before applying them in production (see the dry-run sketch after this list).
  3. Versioning: Consider enabling S3 versioning for greater security when managing the lifecycle of objects.
  4. Cost Optimization: Regularly analyze your usage patterns and adjust lifecycle rules to maximize savings.
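
One simple way to follow the testing recommendation above is to add a dry-run mode to the function, so it only reports what it would delete; a sketch based on the script shown earlier, where the dry_run flag is my addition:

import boto3
from datetime import datetime, timedelta, timezone

def delete_objects(bucket_name, prefix, days, dry_run=True):
    # Same listing logic as the main script, plus a dry_run switch for safe testing
    s3 = boto3.client('s3')
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=days)
    objects_to_delete = []

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['LastModified'] < cutoff_date:
                objects_to_delete.append({'Key': obj['Key']})

    if dry_run:
        # Only report what would be removed
        for obj in objects_to_delete:
            print(f"[dry-run] would delete {obj['Key']}")
    elif objects_to_delete:
        s3.delete_objects(Bucket=bucket_name, Delete={'Objects': objects_to_delete})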

Common use cases

  1. Log Archiving: Automatically move old logs to cheaper storage classes (see the sketch after this list).
  2. Backup Management: Implement a retention policy for backups, deleting old versions after a defined period.
  3. Data Compliance: Ensure that sensitive data is deleted after a specific period to comply with regulations.
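
For the archiving use case above, the same Lambda pattern can transition objects instead of deleting them, by copying each object over itself with a cheaper storage class; a sketch with illustrative names:

import boto3

s3 = boto3.client('s3')

bucket = 'devops-mind'                               # illustrative bucket
key = 'tshoot-incidente-alb/AWSLogs/example.log.gz'  # illustrative key

# Re-copy the object onto itself, changing only its storage class
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={'Bucket': bucket, 'Key': key},
    StorageClass='GLACIER',
    MetadataDirective='COPY',
)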

FAQ

  1. Q: Can I use Lambda to apply different lifecycle rules to specific objects? A: Yes, you can implement custom logic in Lambda to apply rules based on metadata or object naming standards.
  2. Q: How often should I run my Lambda function to manage the lifecycle? A: It depends on your needs. For frequent changes, consider a daily run. For less dynamic scenarios, a weekly run may be sufficient.
  3. Q: Is it possible to reverse a lifecycle action? A: Some actions, such as deletions, are irreversible. That’s why it’s crucial to test your settings carefully.

Conclusion

When the S3 lifecycle doesn’t work properly, it can be frustrating to deal with unnecessary object storage or a lack of deletion of expired items. Fortunately, by using AWS Lambdas, you can create a customized process to manage the object lifecycle in an automated and efficient way.

With Lambdas, you can solve specific lifecycle problems in S3, ensuring that objects are correctly transitioned or deleted according to your business rules. In addition, Lambdas offer flexibility and scalability, allowing you to adapt the process according to your changing needs.

By using AWS Lambdas to automate the S3 lifecycle process, you can keep your storage optimized and reduce unnecessary costs, and Julius would certainly approve of these actions.

[Image: julius aws]

References

Cover image by fullvector on Freepik
