Efficient data lifecycle management in Amazon S3 is crucial for optimizing costs and performance in the cloud. In this article, we’ll explore how to automate the AWS S3 lifecycle using Lambda functions, which can be useful when the lifecycle feature offered by S3 itself doesn’t meet your needs or doesn’t behave as expected, providing a dynamic and flexible solution for data management.
This week I needed to activate ALB access logs in AWS in order to troubleshoot an incident, so I forwarded the logs to a bucket in S3.
The main advantage of activating ALB logs is the ability to troubleshoot problems and identify performance bottlenecks. By analyzing the logs, you can detect traffic patterns, identify errors and anomalies, and take corrective measures to improve application efficiency.
In addition, ALB logs are also valuable for compliance and security purposes. They record information such as IP addresses, URLs accessed and HTTP status codes, which can be crucial for investigating suspicious activity, identifying possible attacks and carrying out compliance audits.
However, it is important to bear in mind that storing logs can have a significant cost, especially in high-demand environments or with many requests. Logs can take up considerable space in your data storage, which can result in additional charges.
To mitigate the costs associated with ALB logs, it is advisable to implement an efficient management strategy, such as the Lambda-based lifecycle described in this article. This can include configuring retention policies to limit how long logs are stored, compressing logs to reduce file sizes, and using data analysis services to process and filter logs more efficiently.
It is also important to consider the proper configuration of access permissions to ALB logs. Ensuring that only the relevant teams and individuals have access to the logs can help prevent leaks of sensitive information and minimize security risks.
In summary, enabling ALB logs on AWS is key to monitoring and analyzing network traffic and ensuring proper application performance and security. However, it is important to be aware of the costs associated with storing logs and to adopt efficient management practices to optimize usage and minimize unnecessary expenses.
AWS S3 lifecycle is a powerful tool that allows you to define rules to automatically manage the lifecycle of objects stored in S3. These rules can include transitioning objects to more economical storage classes or deleting old objects.
The integration of AWS Lambda with the S3 lifecycle offers a programmatic and highly customizable approach to managing the lifecycle of objects. This makes it possible to create complex logic that goes beyond the static rules of the S3 lifecycle.
As many of you know, the cost of storing a massive amount of logs in S3 can be enormous, so I decided to activate S3’s own lifecycle, targeting the specific path where the ALB logs live so that it would delete logs older than 2 days.
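For reference, the rule I had in mind corresponds roughly to the configuration below (the rule ID is just illustrative; the bucket and prefix are the ones used throughout this article):

aws s3api put-bucket-lifecycle-configuration \
  --bucket devops-mind \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-alb-logs",
        "Filter": { "Prefix": "tshoot-incidente-alb/AWSLogs/" },
        "Status": "Enabled",
        "Expiration": { "Days": 2 }
      }
    ]
  }'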
However, two days passed and the items were still in the bucket, generating unnecessary costs.
In another bucket, where I activated the lifecycle for the entire bucket, the process worked as expected.
Since the solution offered by AWS wasn’t working as expected, I decided to take a different approach by creating a Python script that performs this lifecycle process.
In order to create a structure where the lifecycle process takes place in an automated way, similar to what I would achieve through the configuration of S3, it was necessary to: create a Lambda function containing the deletion script, adjust the permissions on the S3 bucket and on the Lambda execution role, and configure an EventBridge trigger to run the function on a schedule.
The first step is to access the AWS console and go to the Lambda service.
In Lambda, click on Create function.
In the window that opens, leave the Author from scratch option checked. In Function name I put “s3-lifecycle” (you can choose the name that suits you best), and in Runtime choose “Python 3.9”.
Below is an example image:
Leave the rest as it is and click on Create function.
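If you prefer the CLI, a roughly equivalent function could be created with the command below; the execution role ARN is a placeholder (the console creates a role for you automatically, while the CLI requires an existing one), and the script would need to be packaged as function.zip:

aws lambda create-function \
  --function-name s3-lifecycle \
  --runtime python3.9 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::123456:role/s3-lifecycle-role \
  --zip-file fileb://function.zip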
A screen like this will be displayed:
In the code editor section, remove the example Python code and add our code:
import boto3
from datetime import datetime, timedelta, timezone


def delete_objects(bucket_name, prefix, days):
    s3 = boto3.client('s3')

    # Anything older than this date is eligible for deletion
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=days)

    objects_to_delete = []

    # Paginate through every object under the prefix
    paginator = s3.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    for page in page_iterator:
        if 'Contents' in page:
            for obj in page['Contents']:
                key = obj['Key']
                last_modified = obj['LastModified']
                if last_modified < cutoff_date:
                    objects_to_delete.append({'Key': key})

    if len(objects_to_delete) > 0:
        # delete_objects accepts at most 1000 keys per call, so delete in batches
        for i in range(0, len(objects_to_delete), 1000):
            batch = objects_to_delete[i:i + 1000]
            s3.delete_objects(Bucket=bucket_name, Delete={'Objects': batch})
        print(f'{len(objects_to_delete)} objects deleted.')
    else:
        print('No objects found to delete.')


def lambda_handler(event, context):
    # Adjust these three values for your bucket, prefix and retention period
    bucket_name = 'devops-mind'
    prefix = 'tshoot-incidente-alb/AWSLogs/'
    days = 2
    delete_objects(bucket_name, prefix, days)
In this script, you only need to adjust the following fields in lambda_handler: bucket_name (the target bucket), prefix (the path whose objects should expire) and days (how many days to keep objects before deletion).
After adjusting the code, a “Changes not deployed” message will appear; click Deploy.
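If you maintain the code outside the console, the same deploy step can also be done with the CLI, assuming the script is packaged as function.zip:

aws lambda update-function-code \
  --function-name s3-lifecycle \
  --zip-file fileb://function.zip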
Before proceeding with using Lambda or configuring EventBridge, we need to adjust the permissions, both on the S3 Bucket and on the role used by Lambda.
In order for the whole process to take place properly, permissions related to the S3 bucket are required.
Assuming you already have a bucket in S3 (I won’t cover the bucket creation part in this article), go to the bucket’s Permissions tab, then Bucket policy, and click Edit.
Let’s add the following policy:
{
    "Version": "2008-10-17",
    "Id": "Policy1335892530063",
    "Statement": [
        {
            "Sid": "DevOps-Mind-lambda",
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.us-east-1.amazonaws.com"
            },
            "Action": [
                "s3:*"
            ],
            "Resource": "arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "123456"
                }
            }
        }
    ]
}
Edit the fields according to the names and structure of your bucket and objects in S3.
It is also necessary to edit the SourceAccount; I chose to allow access only from resources in a specific account.
For Action we allowed everything (s3:*) to keep the article simpler, but we could grant more specific actions, as in the sketch below.
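If you want to tighten the policy, a narrower statement could grant only what the script actually uses (listing the bucket and deleting objects). A sketch along those lines, keeping the same principal and condition:

{
    "Sid": "DevOps-Mind-lambda-minimal",
    "Effect": "Allow",
    "Principal": {
        "Service": "lambda.us-east-1.amazonaws.com"
    },
    "Action": [
        "s3:ListBucket",
        "s3:DeleteObject"
    ],
    "Resource": [
        "arn:aws:s3:::devops-mind",
        "arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*"
    ],
    "Condition": {
        "StringEquals": {
            "aws:SourceAccount": "123456"
        }
    }
}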
Once you have finished adjusting the policy, click on Save changes.
When we created the Lambda function, an execution role was created along with it.
In my case, the role is s3-lifecycle-role-fggxxkgz:
We need to create a policy and link it to this role to ensure that it has the necessary privileges in our S3 bucket.
Access the IAM service on AWS, click on Policies and then on the Create policy button, as highlighted in yellow in the image below:
Let’s use the following code to create our policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DevOpsMindBucket1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        },
        {
            "Sid": "DevOpsMindBucket2",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::devops-mind/tshoot-incidente-alb/AWSLogs/*",
                "arn:aws:s3:::devops-mind"
            ]
        }
    ]
}
Our screen will look like this:
Define a name for the policy and click Create policy:
Once this is done, search for the role in IAM.
Go to Role and click on Add permissions.
Then click on Attach policies:
Select the policy in the search.
Click on Add permissions.
Expected result:
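For reference, the policy creation and attachment from the console steps above could also be scripted with the CLI; the policy name here is just an example, while the role name and account ID are the ones used in this article:

aws iam create-policy \
  --policy-name s3-lifecycle-policy \
  --policy-document file://s3-lifecycle-policy.json

aws iam attach-role-policy \
  --role-name s3-lifecycle-role-fggxxkgz \
  --policy-arn arn:aws:iam::123456:policy/s3-lifecycle-policy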
For the lifecycle process to take place as expected, we need to set up a trigger in our Lambda.
The easiest way is to go to the AWS console, access our Lambda again and click on Add trigger, as highlighted in yellow:
Configure the event source to use EventBridge.
Define a name for the rule and add a description.
In Schedule expression, add the values:
cron(0 12 * * ? *)
The expression cron(0 12 * * ? *) in an Amazon EventBridge rule defines an event schedule that takes place every day at 12:00 (noon) UTC.
Let’s look at the cron expression in detail: EventBridge cron expressions have six fields, so cron(0 12 * * ? *) means minute 0, hour 12, any day of the month (*), any month (*), no specific day of the week (?), and any year (*).
Therefore, the event rule with this cron expression will be triggered every day at exactly 12:00 (noon) UTC.
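If you’d rather script the schedule than use the console trigger dialog, a roughly equivalent setup with the CLI would look like the commands below; the rule name is just an example, and the ARNs assume the us-east-1 region and the account ID used earlier in this article:

aws events put-rule \
  --name s3-lifecycle-daily \
  --schedule-expression "cron(0 12 * * ? *)"

aws events put-targets \
  --rule s3-lifecycle-daily \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456:function:s3-lifecycle"

aws lambda add-permission \
  --function-name s3-lifecycle \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456:rule/s3-lifecycle-daily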
After configuring all the fields, we’ll have something like this:
Click on Add to finish the process.
Expected result:
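You don’t have to wait for the next noon UTC run to validate the setup; since the handler ignores the incoming event, a manual invocation with no payload is enough, and the print output will show up in the function’s CloudWatch logs:

aws lambda invoke --function-name s3-lifecycle response.json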
When the S3 lifecycle doesn’t work properly, it can be frustrating to deal with unnecessary object storage or a lack of deletion of expired items. Fortunately, by using AWS Lambdas, you can create a customized process to manage the object lifecycle in an automated and efficient way.
With Lambdas, you can solve specific lifecycle problems in S3, ensuring that objects are correctly transitioned or deleted according to your business rules. In addition, Lambdas offer flexibility and scalability, allowing you to adapt the process according to your changing needs.
By using AWS Lambdas to automate the S3 lifecycle process, you can keep your storage optimized and reduce unnecessary costs, and Julius would certainly approve of these actions.