{"id":2535439,"date":"2023-04-07T11:43:11","date_gmt":"2023-04-07T15:43:11","guid":{"rendered":"https:\/\/platoai.gbaglobal.org\/platowire\/how-to-transfer-amazon-emr-step-logs-from-amazon-ec2-instances-to-amazon-cloudwatch-logs\/"},"modified":"2023-04-07T11:43:11","modified_gmt":"2023-04-07T15:43:11","slug":"how-to-transfer-amazon-emr-step-logs-from-amazon-ec2-instances-to-amazon-cloudwatch-logs","status":"publish","type":"platowire","link":"https:\/\/platoai.gbaglobal.org\/platowire\/how-to-transfer-amazon-emr-step-logs-from-amazon-ec2-instances-to-amazon-cloudwatch-logs\/","title":{"rendered":"How to Transfer Amazon EMR Step Logs from Amazon EC2 Instances to Amazon CloudWatch Logs"},"content":{"rendered":"

Amazon EMR (Elastic MapReduce) is a managed big data platform that allows users to process large amounts of data using open-source tools such as Apache Hadoop, Spark, and Hive. Amazon EC2 (Elastic Compute Cloud) is a web service that provides resizable compute capacity in the cloud. Amazon CloudWatch Logs is a monitoring service that allows users to monitor, store, and access log files from Amazon EC2 instances, AWS CloudTrail, and other sources.<\/p>\n

In this article, we will discuss how to transfer Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch Logs.<\/p>\n

Step 1: Create an Amazon S3 bucket<\/p>\n

The first step is to create an Amazon S3 bucket where the EMR step logs will be stored. To create an S3 bucket, follow these steps:<\/p>\n

1. Log in to the AWS Management Console.<\/p>\n

2. Navigate to the S3 service.<\/p>\n

3. Click on the “Create bucket” button.<\/p>\n

4. Enter a unique name for your bucket and select the region where you want to create it.<\/p>\n

5. Leave the default settings for the rest of the options and click on the “Create bucket” button.<\/p>\n
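
The console steps above can also be scripted. The following is a minimal Python sketch that only builds the arguments for the S3 create_bucket request and does not contact AWS. The helper name create_bucket_params and the bucket name my-emr-step-logs are hypothetical, and the commented call assumes the boto3 SDK is installed with credentials configured.<\/p>\n

```python
def create_bucket_params(bucket_name, region):
    '''Build keyword arguments for the S3 create_bucket call.

    S3 rejects an explicit LocationConstraint of us-east-1, so the
    configuration block is only added for other regions.
    '''
    params = {'Bucket': bucket_name}
    if region != 'us-east-1':
        params['CreateBucketConfiguration'] = {'LocationConstraint': region}
    return params


if __name__ == '__main__':
    # Hypothetical bucket name; real bucket names must be globally unique.
    params = create_bucket_params('my-emr-step-logs', 'eu-west-1')
    print(params)
    # With boto3 installed and credentials configured, the call would be:
    #   boto3.client('s3', region_name='eu-west-1').create_bucket(**params)
```

Keeping the region handling in one small function avoids the common failure where a script written for one region breaks when pointed at us-east-1.<\/p>\n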

Step 2: Configure EMR to write step logs to S3<\/p>\n

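The next step is to configure EMR to write step logs to the S3 bucket that you created in step 1. EMR takes the S3 destination for cluster and step logs from the cluster's log URI (the --log-uri option of aws emr create-cluster, or LogUri in the API), so make sure it points at that bucket; the “emrfs-site” properties below configure EMRFS consistent view rather than the log destination. To create the cluster, follow these steps:<\/p>\n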

1. Log in to the AWS Management Console.<\/p>\n

2. Navigate to the EMR service.<\/p>\n

3. Click on the “Create cluster” button.<\/p>\n

4. Enter a name for your cluster and select the region where you want to create it.<\/p>\n

5. Select the appropriate software configuration for your cluster.<\/p>\n

6. Under “Edit software settings”, expand “Advanced options”.<\/p>\n

7. In the “Classification” field, enter “emrfs-site”.<\/p>\n

8. In the “Properties” field, enter the following:<\/p>\n

fs.s3.consistent.retryPeriodSeconds: 10<\/p>\n

fs.s3.consistent: true<\/p>\n

fs.s3.consistent.retryCount: 5<\/p>\n

fs.s3.consistent.metadata.tableName: emrfs-metadata<\/p>\n

fs.s3.consistent.metadata.region: us-east-1<\/p>\n

fs.s3.consistent.retryPolicyType: exponential<\/p>\n

9. Under “Edit software settings”, expand “Bootstrap actions”.<\/p>\n

10. Click on the “Add bootstrap action” button.<\/p>\n

11. Enter a name for your bootstrap action and select “Custom action”.<\/p>\n

12. In the “Script location” field, enter the following URL:<\/p>\n

s3:\/\/elasticmapreduce\/bootstrap-actions\/configure-hadoop<\/p>\n

13. In the “Arguments” field, enter the following:<\/p>\n

--mapred-config-file<\/p>\n

s3:\/\/your-bucket-name\/emrfs-site.xml<\/p>\n

14. Replace “your-bucket-name” with the name of the S3 bucket that you created in step 1.<\/p>\n

15. Click on the “Create cluster” button.<\/p>\n
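
The classification and properties entered in steps 7 and 8 correspond to a JSON configuration object that EMR also accepts when a cluster is created programmatically. The sketch below, in Python, renders that object from the values listed in this article; the helper name emrfs_site_configuration is hypothetical.<\/p>\n

```python
import json

# Property values copied from step 8 of this article.
EMRFS_SITE_PROPERTIES = {
    'fs.s3.consistent': 'true',
    'fs.s3.consistent.retryPeriodSeconds': '10',
    'fs.s3.consistent.retryCount': '5',
    'fs.s3.consistent.metadata.tableName': 'emrfs-metadata',
    'fs.s3.consistent.metadata.region': 'us-east-1',
    'fs.s3.consistent.retryPolicyType': 'exponential',
}


def emrfs_site_configuration(properties):
    '''Wrap properties in the classification structure used by EMR.'''
    # EMR expects every property value to be a string.
    return [{
        'Classification': 'emrfs-site',
        'Properties': {key: str(value) for key, value in properties.items()},
    }]


if __name__ == '__main__':
    print(json.dumps(emrfs_site_configuration(EMRFS_SITE_PROPERTIES), indent=2))
```

The printed JSON could be saved to a file and supplied through the --configurations option of aws emr create-cluster, alongside --log-uri for the log bucket.<\/p>\n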

Step 3: Configure CloudWatch Logs agent on EC2 instances<\/p>\n

The next step is to configure the CloudWatch Logs agent on the EC2 instances that are running your EMR cluster. To do this, follow these steps:<\/p>\n

1. Log in to the EC2 instance that you want to configure.<\/p>\n

2. Download and install the CloudWatch Logs agent by running the following commands:<\/p>\n

sudo yum install -y awslogs<\/p>\n

sudo service awslogs start<\/p>\n

3. Edit the CloudWatch Logs agent configuration file by running the following command:<\/p>\n

sudo nano \/etc\/awslogs\/awslogs.conf<\/p>\n

4. Add the following lines to the end of the file:<\/p>\n

[\/var\/log\/hadoop\/steps\/*]<\/p>\n

datetime_format = %Y-%m-%d %H:%M:%S,%f<\/p>\n

file = \/var\/log\/hadoop\/steps\/application.log<\/p>\n

buffer_duration = 5000<\/p>\n

log_stream_name = {instance_id}<\/p>\n

initial_position = start_of_file<\/p>\n

log_group_name = your-log-group-name<\/p>\n

5. Replace “your-log-group-name” with the name of the CloudWatch Logs log group that you want to use.<\/p>\n

6. Save and close the file.<\/p>\n

7. Restart the CloudWatch Logs agent by running the following command:<\/p>\n

sudo service awslogs restart<\/p>\n
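
When several nodes need the same settings, the stanza from step 4 can be generated rather than typed by hand. The sketch below uses Python's standard configparser module to render it. The function name render_awslogs_section and the log group name emr-step-logs are hypothetical; all other values are the ones listed in step 4.<\/p>\n

```python
import configparser
import io


def render_awslogs_section(log_group_name):
    '''Return the awslogs.conf stanza for EMR step logs as text.'''
    # interpolation=None keeps the % signs in datetime_format literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser['/var/log/hadoop/steps/*'] = {
        'datetime_format': '%Y-%m-%d %H:%M:%S,%f',
        'file': '/var/log/hadoop/steps/application.log',
        'buffer_duration': '5000',
        'log_stream_name': '{instance_id}',
        'initial_position': 'start_of_file',
        'log_group_name': log_group_name,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == '__main__':
    # Hypothetical log group name, for illustration only.
    print(render_awslogs_section('emr-step-logs'))
```

The output can be appended to the agent configuration file before the agent is restarted.<\/p>\n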

Step 4: Verify logs are being transferred to CloudWatch Logs<\/p>\n

The final step is to verify that the EMR step logs are being transferred to CloudWatch Logs. To do this, follow these steps:<\/p>\n

1. Log in to the AWS Management Console.<\/p>\n

2. Navigate to the CloudWatch service.<\/p>\n

3. Click on the “Logs” menu item.<\/p>\n

4. Select the log group that you specified in step 3.<\/p>\n

5. Verify that log streams are being created for each EC2 instance in your EMR cluster.<\/p>\n

6. Click on a log stream to view the EMR step logs.<\/p>\n
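
The check in step 5 can be automated by comparing the instance IDs of the cluster with the log stream names in the group, since the agent configuration in step 3 names each stream after its instance. A minimal Python sketch follows; missing_log_streams and fetch_stream_names are hypothetical helpers, the instance IDs are made up, and fetch_stream_names assumes the boto3 SDK and valid credentials are available.<\/p>\n

```python
def missing_log_streams(instance_ids, stream_names):
    '''Return the instance IDs that have no matching log stream, sorted.'''
    return sorted(set(instance_ids) - set(stream_names))


def fetch_stream_names(log_group_name):
    '''List stream names in a log group (needs boto3 and credentials).'''
    import boto3  # imported lazily so the comparison helper works without it

    paginator = boto3.client('logs').get_paginator('describe_log_streams')
    names = []
    for page in paginator.paginate(logGroupName=log_group_name):
        names.extend(stream['logStreamName'] for stream in page['logStreams'])
    return names


if __name__ == '__main__':
    # Made-up instance IDs and stream names, for illustration only.
    expected = ['i-0aaa', 'i-0bbb', 'i-0ccc']
    present = ['i-0aaa', 'i-0ccc']
    print(missing_log_streams(expected, present))  # prints ['i-0bbb']
```

An empty result means every instance has a stream; any ID that is returned points to a node where the agent is not running or has not yet shipped any log lines.<\/p>\n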

In conclusion, transferring Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch Logs is a straightforward process that involves configuring EMR to write step logs to an<\/p>\n