Ensure EMR cluster nodes do not have public IP addresses

An Amazon EMR cluster is a collection of Amazon Elastic Compute Cloud (Amazon EC2) instances. Each instance in the cluster is called a node. This rule flags clusters in which an instance (AwsEmrInstance) has an associated public IP address. Although a public IP address makes it possible to set up secure access through an SSH tunnel, assigning one directly to the master node, rather than launching the cluster in a private subnet of a VPC, does not follow security best practices.

Risk Level: High
Cloud Entity: EMR Cluster
CloudGuard Rule ID: D9.AWS.NET.93
Covered by Spectral: No
Category: Analytics

GSL LOGIC

EmrCluster should not have instances with [ isPublic=true ]
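
The rule above can be approximated outside CloudGuard. The following is a minimal sketch, not CloudGuard's implementation: it assumes the node list has the shape of the `Instances` list returned by the EMR ListInstances API, where a node with a public address carries a `PublicIpAddress` field. The function name `public_nodes` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def public_nodes(instances: List[Dict[str, Any]]) -> List[str]:
    """Return the EC2 instance IDs of nodes that have a public IP address."""
    return [
        inst.get("Ec2InstanceId", "unknown")
        for inst in instances
        if inst.get("PublicIpAddress")  # missing or empty means no public IP
    ]


# Hypothetical sample shaped like the "Instances" list of a ListInstances response.
sample_instances = [
    {
        "Ec2InstanceId": "i-0aaa1111",
        "PrivateIpAddress": "10.0.1.10",
        "PublicIpAddress": "203.0.113.5",
    },
    {"Ec2InstanceId": "i-0bbb2222", "PrivateIpAddress": "10.0.1.11"},
]

violations = public_nodes(sample_instances)
print("Nodes with a public IP:", violations)
```

With boto3, the input could be taken from `boto3.client("emr").list_instances(ClusterId=...)["Instances"]`. Larger clusters return paginated results, which this sketch does not handle.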

REMEDIATION

From Portal
It is recommended to create a new cluster in a VPC private subnet, because a public IPv4 address cannot be manually disassociated from an instance after launch. Use the following steps to create a new cluster in a VPC private subnet.

  1. Sign in to the AWS Management Console, and open the Amazon EMR console at https://console.aws.amazon.com/emr.
  2. Under EMR on EC2 in the left navigation pane, choose Clusters, and then choose Create cluster.
  3. Under Networking, go to the Virtual private cloud (VPC) field. Enter the name of your VPC or choose Browse to select your VPC. Alternatively, choose Create VPC to create a VPC that you can use for your cluster.
  4. Choose any other options that apply to your cluster.
  5. To launch your cluster, choose Create cluster.
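
When choosing the network in step 3, it can help to confirm that the target subnet is actually private. In AWS, a subnet is public when its associated route table contains a route to an internet gateway. The sketch below checks a list of routes for such an entry. It assumes the routes have the shape of the `Routes` list returned by the EC2 DescribeRouteTables API, where internet gateway targets appear as a `GatewayId` beginning with `igw-`. The function name and sample routes are hypothetical.

```python
from typing import Any, Dict, List


def routes_to_internet_gateway(routes: List[Dict[str, Any]]) -> bool:
    """Return True if any route targets an internet gateway (igw-...)."""
    return any(
        str(route.get("GatewayId", "")).startswith("igw-") for route in routes
    )


# Hypothetical route lists for illustration only.
public_subnet_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0"},
]
private_subnet_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]

print("public subnet:", routes_to_internet_gateway(public_subnet_routes))
print("private subnet:", routes_to_internet_gateway(private_subnet_routes))
```

A subnet without an explicit route table association uses the VPC's main route table, so that table is the one to inspect in that case.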

From TF
Use the following code to create a new cluster in a VPC. Set subnet_id to the ID of a private subnet in that VPC (it is a subnet ID, not the VPC ID).

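resource "aws_emr_cluster" "example" {
  name          = "emr-test-arn"
  release_label = "release_version_name"

  # Other required arguments, such as service_role and
  # ec2_attributes.instance_profile, are omitted for brevity.
  ec2_attributes {
    # ID of a private subnet in the target VPC
    subnet_id = "aws_VPC_subnet_id"
  }
}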

From Command Line
Use the following example command to create a cluster in an Amazon VPC subnet. Instance group details may vary according to your requirements.

aws emr create-cluster --ec2-attributes SubnetId=VPC_subnet_id --release-label release_version_name --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large --auto-terminate

**References**
1. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-vpc-launching-job-flows.html
2. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
3. https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr/create-cluster.html 


## EMR Cluster
Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

## Compliance Frameworks
 - AWS CloudGuard Best Practices
 - AWS NIST 800-53 Rev 5
 - CloudGuard AWS All Rules Ruleset
 - CloudGuard AWS Default Ruleset