Ensure AWS Elastic MapReduce (EMR) clusters capture detailed log data to Amazon S3

An EMR cluster can be configured to periodically archive the log files stored on the master node to Amazon S3. This ensures that the log files remain available after the cluster terminates, whether through a normal shutdown or because of an error. Amazon EMR archives the log files to Amazon S3 at 5-minute intervals.

Risk Level: Low
Cloud Entity: EMR Cluster
CloudGuard Rule ID: D9.AWS.LOG.17
Covered by Spectral: Yes
Category: Analytics

GSL LOGIC

EmrCluster should have logUri
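
As an illustration of the condition this rule evaluates, the following minimal Python sketch applies the same test to a cluster description shaped like the Cluster object returned by aws emr describe-cluster. The function name has_log_uri and the sample cluster IDs are hypothetical, and CloudGuard's own evaluation engine is not shown here.

```python
def has_log_uri(cluster: dict) -> bool:
    """Return True when the cluster description carries a non-empty LogUri."""
    log_uri = cluster.get("LogUri")
    return isinstance(log_uri, str) and log_uri.strip() != ""


# Hypothetical cluster descriptions for illustration only.
compliant = {"Id": "j-EXAMPLE1", "Name": "with-logs",
             "LogUri": "s3n://my-emr-logs/test/example/"}
non_compliant = {"Id": "j-EXAMPLE2", "Name": "no-logs"}

print(has_log_uri(compliant))      # True
print(has_log_uri(non_compliant))  # False
```

A cluster whose description has no LogUri field, or an empty one, would fail the rule.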

REMEDIATION

From Portal

  1. Sign in to the AWS Management Console, and open the Amazon EMR console at https://console.aws.amazon.com/emr/.
  2. Under EMR on EC2 in the left navigation pane, choose Clusters, and then choose Create cluster.
  3. Under Cluster logs, select the Publish cluster-specific logs to Amazon S3 check box.
  4. In the Amazon S3 location field, type (or browse to) an Amazon S3 path to store your logs. If you type the name of a folder that doesn't exist in the bucket, Amazon S3 creates it.
    Note: When you set this value, Amazon EMR copies the log files from the EC2 instances in the cluster to Amazon S3. This prevents the log files from being lost when the cluster ends and Amazon EC2 terminates the instances hosting the cluster. These logs are useful for troubleshooting. For more information, see View log files.
  5. Optionally, select the Encrypt cluster-specific logs check box. Then, select an AWS KMS key from the list, enter a key ARN, or create a new key. This option is only available with Amazon EMR version 5.30.0 and later, excluding version 6.0.0. To use this option, add permission to AWS KMS for your EC2 instance profile and Amazon EMR role. For more information, see To encrypt log files stored in Amazon S3 with an AWS KMS customer managed key.
  6. Choose any other options that apply to your cluster.
  7. To launch your cluster, choose Create cluster.
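
The version constraint in step 5 (Amazon EMR 5.30.0 and later, excluding 6.0.0) can be checked before choosing the encryption option. The sketch below is one possible way to express that check in Python; the function name supports_log_encryption is hypothetical, and it assumes release labels of the form emr-X.Y.Z. It follows the wording of step 5 literally and makes no claim about other patch releases.

```python
def supports_log_encryption(release_label: str) -> bool:
    """Apply the step 5 rule: 5.30.0 and later, excluding 6.0.0."""
    # Strip the assumed "emr-" prefix, then compare numeric version parts.
    version_text = release_label[4:] if release_label.startswith("emr-") else release_label
    parts = [int(p) for p in version_text.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad short labels such as "6.1"
    version = tuple(parts)
    return version >= (5, 30, 0) and version != (6, 0, 0)


for label in ("emr-5.29.0", "emr-5.30.0", "emr-6.0.0", "emr-6.1.0"):
    print(label, supports_log_encryption(label))
```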

From TF

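resource "aws_emr_cluster" "example" {
  # Other cluster arguments (name, release_label, service_role, and so on) go here.

  # S3 location where Amazon EMR archives the cluster log files.
  log_uri = "s3n://my-emr-logs/test/example"
}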
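
To catch a missing log_uri before applying a configuration, a plan exported as JSON can be inspected. The following Python sketch assumes the JSON layout produced by terraform show -json (planned_values, root_module, resources, child_modules); the function name emr_clusters_missing_log_uri and the sample plan are hypothetical.

```python
def emr_clusters_missing_log_uri(plan: dict) -> list:
    """Return addresses of aws_emr_cluster resources in a plan that lack log_uri."""
    missing = []

    def walk(module: dict) -> None:
        for resource in module.get("resources", []):
            if resource.get("type") != "aws_emr_cluster":
                continue
            if not (resource.get("values") or {}).get("log_uri"):
                missing.append(resource.get("address"))
        # Nested modules carry their own resources.
        for child in module.get("child_modules", []):
            walk(child)

    walk(plan.get("planned_values", {}).get("root_module", {}))
    return missing


# Hypothetical, heavily trimmed plan used only to exercise the function.
sample_plan = {
    "planned_values": {"root_module": {"resources": [
        {"address": "aws_emr_cluster.with_logs", "type": "aws_emr_cluster",
         "values": {"log_uri": "s3n://my-emr-logs/test/example"}},
        {"address": "aws_emr_cluster.no_logs", "type": "aws_emr_cluster",
         "values": {"log_uri": None}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "values": {}},
    ]}}
}

print(emr_clusters_missing_log_uri(sample_plan))  # ['aws_emr_cluster.no_logs']
```

In practice the input would come from running terraform plan -out=tfplan, then terraform show -json tfplan, and loading the result with json.load.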

From Command Line

  1. To archive log files to Amazon S3 using the AWS CLI, run the create-cluster command and specify the Amazon S3 log path with the --log-uri parameter. In the following command, replace myKey with the name of your EC2 key pair and replace the uppercase placeholders with your own values.
aws emr create-cluster --name CLUSTER_NAME --release-label EMR_RELEASE_VERSION --log-uri S3_LOCATION --applications Name=Hadoop Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-type EC2_INSTANCE_TYPE --instance-count VALUE
    Note: When you specify the instance count without using the --instance-groups parameter, a single primary node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.

Note: If you have not previously created the default Amazon EMR service role and EC2 instance profile, run aws emr create-default-roles to create them before running the create-cluster command.
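
The same setting can be supplied through the AWS SDK. The sketch below builds a request for the Amazon EMR RunJobFlow API with LogUri set, mirroring the CLI command above. The function name build_run_job_flow_request and all sample values are hypothetical; the role names assume the defaults created by aws emr create-default-roles. Sending the request would additionally require boto3 and configured AWS credentials, so that call is left as a comment.

```python
def build_run_job_flow_request(name, release_label, log_uri, instance_type,
                               instance_count, key_name, kms_key_id=None):
    """Build RunJobFlow parameters that always include a log location."""
    if not log_uri:
        raise ValueError("log_uri must be set so that cluster logs are archived to S3")
    request = {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Pig"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default roles, as created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if kms_key_id:
        # Optional log encryption with an AWS KMS key.
        request["LogEncryptionKmsKeyId"] = kms_key_id
    return request


# Hypothetical sample values for illustration only.
request = build_run_job_flow_request(
    name="example-cluster",
    release_label="emr-6.15.0",
    log_uri="s3://my-emr-logs/test/example/",
    instance_type="m5.xlarge",
    instance_count=3,
    key_name="myKey",
)
print(request["LogUri"])

# To launch the cluster (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Rejecting an empty log location at build time keeps a non-compliant cluster from being requested in the first place.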

References

  1. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-debugging.html
  2. https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/ensure-amazon-emr-logging-to-amazon-s3-is-enabled-at-launch.html
  3. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
  4. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr/create-cluster.html

EMR Cluster

Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Compliance Frameworks

  • AWS CloudGuard Best Practices
  • AWS CloudGuard SOC2 based on AICPA TSC 2017
  • AWS HITRUST
  • AWS HITRUST v11.0.0
  • AWS ISO27001:2022
  • AWS ITSG-33
  • AWS MITRE ATT&CK Framework v10
  • AWS MITRE ATT&CK Framework v11.3
  • AWS NIST 800-53 Rev 5
  • CloudGuard AWS All Rules Ruleset