Amazon EMR (Elastic MapReduce) is a managed big data platform that allows users to process large amounts of data using open-source tools such as Apache Hadoop, Spark, and Hive. Amazon EC2 (Elastic Compute Cloud) is a web service that provides resizable compute capacity in the cloud. Amazon CloudWatch Logs is a monitoring service that allows users to monitor, store, and access log files from Amazon EC2 instances, AWS CloudTrail, and other sources.
In this article, we will discuss how to transfer Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch Logs.
Step 1: Create an Amazon S3 bucket
The first step is to create an Amazon S3 bucket where the EMR step logs will be stored. To create an S3 bucket, follow these steps:
1. Log in to the AWS Management Console.
2. Navigate to the S3 service.
3. Click on the “Create bucket” button.
4. Enter a unique name for your bucket and select the region where you want to create it.
5. Leave the default settings for the rest of the options and click on the “Create bucket” button.
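The console steps above can also be scripted. The following is a minimal sketch using boto3 (the AWS SDK for Python), assuming boto3 is installed and AWS credentials are already configured. The function names, bucket name, and region are placeholders chosen for illustration and do not come from this article.

```python
def build_create_bucket_args(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    args = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a
    # LocationConstraint; every other region has to be named explicitly.
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def create_log_bucket(bucket_name, region):
    """Create the bucket. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**build_create_bucket_args(bucket_name, region))


if __name__ == "__main__":
    # Hypothetical bucket name and region, shown only to illustrate the call.
    print(build_create_bucket_args("my-emr-step-logs", "eu-west-1"))
```

Keeping the argument construction separate from the API call makes it possible to check the request before anything is created in the account.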
Step 2: Configure EMR to write step logs to S3
The next step is to configure EMR to write step logs to the S3 bucket that you created in step 1. To do this, follow these steps:
1. Log in to the AWS Management Console.
2. Navigate to the EMR service.
3. Click on the “Create cluster” button.
4. Enter a name for your cluster and select the region where you want to create it.
5. Select the appropriate software configuration for your cluster.
6. Under “Edit software settings”, expand “Advanced options”.
7. In the “Classification” field, enter “emrfs-site”.
8. In the “Properties” field, enter the following:
fs.s3.consistent.retryPeriodSeconds: 10
fs.s3.consistent: true
fs.s3.consistent.retryCount: 5
fs.s3.consistent.metadata.tableName: emrfs-metadata
fs.s3.consistent.metadata.region: us-east-1
fs.s3.consistent.retryPolicyType: exponential
9. Under “Edit software settings”, expand “Bootstrap actions”.
10. Click on the “Add bootstrap action” button.
11. Enter a name for your bootstrap action and select “Custom action”.
12. In the “Script location” field, enter the following URL:
s3://elasticmapreduce/bootstrap-actions/configure-hadoop
13. In the “Arguments” field, enter the following:
--mapred-config-file
s3://<bucket-name>/emrfs-site.xml
14. Replace “<bucket-name>” with the name of the S3 bucket that you created in step 1.
15. Click on the “Create cluster” button.
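When a cluster is created programmatically rather than through the console, the classification and properties from steps 7 and 8 are usually expressed as a JSON configuration object. The sketch below builds that object in Python. The `Classification` and `Properties` layout follows the EMR configuration format, and the property values are copied from step 8. The `LogUri` field is an assumption added for illustration (boto3's `run_job_flow` accepts a `LogUri` parameter naming the S3 location for cluster logs), as are the function names and the cluster and bucket names. The result is only a fragment of a full cluster request, not a complete one.

```python
import json

# Property values copied from step 8; EMR expects them as strings.
EMRFS_PROPERTIES = {
    "fs.s3.consistent.retryPeriodSeconds": "10",
    "fs.s3.consistent": "true",
    "fs.s3.consistent.retryCount": "5",
    "fs.s3.consistent.metadata.tableName": "emrfs-metadata",
    "fs.s3.consistent.metadata.region": "us-east-1",
    "fs.s3.consistent.retryPolicyType": "exponential",
}


def build_emrfs_configuration(properties=None):
    """Return the emrfs-site classification as a list of configurations."""
    return [
        {
            "Classification": "emrfs-site",
            "Properties": dict(properties or EMRFS_PROPERTIES),
        }
    ]


def build_cluster_fragment(cluster_name, bucket_name):
    """Return a partial cluster request: name, log location, configuration."""
    return {
        "Name": cluster_name,
        "LogUri": f"s3://{bucket_name}/",  # assumed log destination
        "Configurations": build_emrfs_configuration(),
    }


if __name__ == "__main__":
    # Hypothetical names, printed as JSON for inspection.
    print(json.dumps(build_cluster_fragment("my-cluster", "my-emr-step-logs"), indent=2))
```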
Step 3: Configure CloudWatch Logs agent on EC2 instances
The next step is to configure the CloudWatch Logs agent on the EC2 instances that are running your EMR cluster. To do this, follow these steps:
1. Log in to the EC2 instance that you want to configure.
2. Download and install the CloudWatch Logs agent by running the following commands:
sudo yum install -y awslogs
sudo service awslogs start
3. Edit the CloudWatch Logs agent configuration file by running the following command:
sudo nano /etc/awslogs/awslogs.conf
4. Add the following lines to the end of the file:
[/var/log/hadoop/steps/*]
datetime_format = %Y-%m-%d %H:%M:%S,%f
file = /var/log/hadoop/steps/application.log
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = <log-group-name>
5. Replace “<log-group-name>” with the name of the CloudWatch Logs log group that you want to use.
6. Save and close the file.
7. Restart the CloudWatch Logs agent by running the following command:
sudo service awslogs restart
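If several instances need the same agent configuration, the stanza from step 4 can be generated instead of typed by hand. The sketch below renders it in Python. The keys and default values are copied from step 4; the function name and the example log group name are hypothetical.

```python
def render_awslogs_stanza(
    log_group_name,
    log_file="/var/log/hadoop/steps/application.log",
    section="/var/log/hadoop/steps/*",
):
    """Render one awslogs.conf stanza as text, ending with a newline."""
    if not log_group_name:
        raise ValueError("log_group_name must not be empty")
    settings = [
        ("datetime_format", "%Y-%m-%d %H:%M:%S,%f"),
        ("file", log_file),
        ("buffer_duration", "5000"),
        # Left literal: the agent substitutes the instance ID itself.
        ("log_stream_name", "{instance_id}"),
        ("initial_position", "start_of_file"),
        ("log_group_name", log_group_name),
    ]
    lines = [f"[{section}]"]
    lines += [f"{key} = {value}" for key, value in settings]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical log group name; append the output to /etc/awslogs/awslogs.conf.
    print(render_awslogs_stanza("emr-step-logs"), end="")
```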
Step 4: Verify logs are being transferred to CloudWatch Logs
The final step is to verify that the EMR step logs are being transferred to CloudWatch Logs. To do this, follow these steps:
1. Log in to the AWS Management Console.
2. Navigate to the CloudWatch service.
3. Click on the “Logs” menu item.
4. Select the log group that you specified in step 3.
5. Verify that log streams are being created for each EC2 instance in your EMR cluster.
6. Click on a log stream to view the EMR step logs.
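The same check can be done without the console. The sketch below lists the log streams in a log group and reports which expected instances have no stream yet. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in and that streams are named after instance IDs, as configured in step 3. The function names and the IDs in the usage note are hypothetical.

```python
def list_log_stream_names(logs_client, log_group_name):
    """Collect every log stream name in the group, following pagination."""
    names = []
    kwargs = {"logGroupName": log_group_name}
    while True:
        page = logs_client.describe_log_streams(**kwargs)
        names.extend(s["logStreamName"] for s in page.get("logStreams", []))
        token = page.get("nextToken")
        if not token:
            return names
        kwargs["nextToken"] = token


def missing_instances(stream_names, instance_ids):
    """Return the instance IDs that do not yet have a log stream."""
    present = set(stream_names)
    return [instance_id for instance_id in instance_ids if instance_id not in present]
```

For example, with credentials configured, `missing_instances(list_log_stream_names(boto3.client("logs"), "emr-step-logs"), ["i-0abc", "i-0def"])` would return the IDs that still lack a stream. An empty list suggests that every listed instance is shipping logs.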
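In conclusion, transferring Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch Logs involves configuring EMR to write step logs to an S3 bucket, installing and configuring the CloudWatch Logs agent on the cluster's EC2 instances, and verifying that the log streams appear in CloudWatch Logs.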