Ensure that Dataproc Cluster is encrypted using Customer-Managed Encryption Key

When you use Dataproc, cluster and job data is stored on Persistent Disks (PDs) associated with the Compute Engine VMs in your cluster and in a Cloud Storage staging bucket. By default, this PD and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). The customer-managed encryption key (CMEK) feature allows you to create, use, and revoke the KEK, while Google still controls the DEK. More generally, Google Cloud services can protect their data with encryption keys that the customer manages in Cloud KMS; these keys are called customer-managed encryption keys (CMEK). When you protect data in Google Cloud services with CMEK, the key is within your control.

Risk Level: High
Cloud Entity: GCP Dataproc Cluster
CloudGuard Rule ID: D9.GCP.CRY.15
Covered by Spectral: No
Category: Database

GSL LOGIC

DataprocCluster should not have config.encryptionConfig isEmpty()
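
The rule above can be illustrated outside CloudGuard with a minimal Python sketch. It assumes the cluster is available as a JSON document in the shape returned by `gcloud dataproc clusters describe CLUSTER_NAME --region=REGION --format=json`; the function name `has_cmek` and the sample documents are hypothetical and only mirror the fields the rule inspects. This is not CloudGuard's own implementation.

```python
from typing import Any, Mapping


def has_cmek(cluster: Mapping[str, Any]) -> bool:
    """Return True when config.encryptionConfig is present and non-empty."""
    config = cluster.get("config") or {}
    encryption_config = config.get("encryptionConfig") or {}
    return bool(encryption_config)


# Hypothetical sample documents, trimmed to the fields the rule looks at.
compliant = {
    "clusterName": "cmek-cluster",
    "config": {
        "encryptionConfig": {
            "gcePdKmsKeyName": "projects/p/locations/us-central1/keyRings/r/cryptoKeys/k"
        }
    },
}
non_compliant = {
    "clusterName": "default-cluster",
    "config": {"encryptionConfig": {}},
}

for cluster in (compliant, non_compliant):
    status = "PASS" if has_cmek(cluster) else "FAIL"
    print(f"{cluster['clusterName']}: {status}")
```

A cluster with a missing or empty `encryptionConfig` is reported as failing, which matches the intent of the `isEmpty()` check.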

REMEDIATION

From Portal

  1. Log in to the GCP Console and navigate to the Dataproc Clusters page at https://console.cloud.google.com/dataproc/clusters.
  2. Select the project from the projects dropdown list.
  3. On the Dataproc Clusters page, click Create Cluster to create a new cluster with customer-managed encryption keys.
  4. On the Create a cluster page, perform the steps below:
    a) Inside the Set up cluster section, perform the steps below:
    i) In the Name textbox, provide a name for your cluster.
    ii) From Location, select the location in which you want to deploy the cluster.
    iii) Configure the other settings as per your requirements.
    b) Inside the Configure Nodes and Customize cluster sections, configure the settings as per your requirements.
    c) Inside the Manage security section, perform the steps below:
    i) From Encryption, select Customer-managed key.
    ii) Select a customer-managed key from the dropdown list.
    iii) Ensure that the selected KMS key has the Cloud KMS CryptoKey Encrypter/Decrypter role assigned to the Dataproc cluster service account ("serviceAccount:service-<project_number>@compute-system.iam.gserviceaccount.com").
    iv) Click Create to create the cluster.
    d) Once the cluster is created, migrate all your workloads from the older cluster to the new cluster and delete the old cluster by performing the steps below:
    i) On the Clusters page, select the old cluster and click Delete cluster.
    ii) In the Confirm deletion window, click Confirm to delete the cluster.
    iii) Repeat the steps above for the other Dataproc clusters in the selected project.
    e) Change the project from the projects dropdown list and repeat the remediation procedure for the Dataproc clusters in other projects.

From Command Line

  1. Before creating the cluster, ensure that the selected KMS key has the Cloud KMS CryptoKey Encrypter/Decrypter role assigned to the Dataproc cluster service account ("serviceAccount:service-<project_number>@compute-system.iam.gserviceaccount.com"). Then run the clusters create command to create a new cluster with a customer-managed key:
gcloud dataproc clusters create CLUSTER_NAME --region=us-central1 --gce-pd-kms-key=KEY_RESOURCE_NAME

The above command creates a new cluster in the specified region.
  2. Once the cluster is created, migrate all your workloads from the older cluster to the new cluster, then run the clusters delete command to delete the old cluster:
gcloud dataproc clusters delete CLUSTER_NAME --region=us-central1
  3. Repeat steps 1 and 2 for the other Dataproc clusters in the project. Then change the project by running the command below and repeat the remediation procedure for other projects:
gcloud config set project PROJECT_ID
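
The command-line steps above depend on two strings that are easy to mistype: the full KMS key resource name and the service account member. The following is a minimal Python sketch that assembles them and prints the corresponding gcloud commands for review; it does not execute anything. All function names and the sample values (project ID, project number, key ring, key, cluster name) are hypothetical. The role ID `roles/cloudkms.cryptoKeyEncrypterDecrypter` corresponds to the Cloud KMS CryptoKey Encrypter/Decrypter role named in step 1, and the `gcloud kms keys add-iam-policy-binding` command is one assumed way to grant it.

```python
import shlex

# Role ID for "Cloud KMS CryptoKey Encrypter/Decrypter".
ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def kms_key_name(project_id: str, location: str, key_ring: str, key: str) -> str:
    """Build the full Cloud KMS key resource name."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def service_account_member(project_number: str) -> str:
    """Build the IAM member string for the service account named in step 1."""
    return (
        f"serviceAccount:service-{project_number}"
        "@compute-system.iam.gserviceaccount.com"
    )


def grant_role_command(project_number: str, location: str, key_ring: str, key: str) -> list:
    """Arguments for granting the Encrypter/Decrypter role on the key."""
    return [
        "gcloud", "kms", "keys", "add-iam-policy-binding", key,
        f"--keyring={key_ring}",
        f"--location={location}",
        f"--member={service_account_member(project_number)}",
        f"--role={ROLE}",
    ]


def create_cluster_command(cluster_name: str, region: str, key_name: str) -> list:
    """Arguments for creating a cluster whose disks use the given key."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--gce-pd-kms-key={key_name}",
    ]


# Hypothetical values for illustration only.
key_name = kms_key_name("my-project", "us-central1", "my-keyring", "my-key")
print(shlex.join(grant_role_command("123456789012", "us-central1", "my-keyring", "my-key")))
print(shlex.join(create_cluster_command("my-cluster", "us-central1", key_name)))
```

Printing the commands instead of running them keeps the sketch safe to try; the output can be reviewed and pasted into a shell once the values are confirmed.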

From Terraform

  1. In your template, use the google_dataproc_cluster resource and set the cluster_config.encryption_config.kms_key_name argument to assign the KMS key:
resource "google_dataproc_cluster" "mycluster" {
	name     = "mycluster"
	region   = "us-west1"
	...
	
	cluster_config {
		...
		encryption_config {
			kms_key_name = "projects/projectId/locations/region/keyRings/keyRingName/cryptoKeys/keyName"
		}
		...
	}
	...
}

References

  1. https://workbench.cisecurity.org/sections/811635/recommendations/1775176
  2. https://cloud.google.com/sdk/gcloud/reference/dataproc/clusters/create
  3. https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_cluster

GCP Dataproc Cluster

Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks. Use Dataproc for data lake modernization, ETL, and secure data science, at planet scale, fully integrated with Google Cloud, at a fraction of the cost.

Compliance Frameworks

  • CloudGuard GCP All Rules Ruleset
  • GCP CIS Controls V 8
  • GCP CIS Foundations v. 1.3.0
  • GCP CIS Foundations v. 2.0
  • GCP CloudGuard Best Practices
  • GCP MITRE ATT&CK Framework v12.1
  • GCP NIST 800-53 Rev 5