Overview

The following are architectural highlights of Amazon Web Services (AWS) managed Kubernetes clusters, also known as Elastic Kubernetes Service (EKS), provisioned by Palette:

  1. Cluster resources such as Virtual Machines (VMs) can be provisioned into existing infrastructure (gateways, VPCs, subnets, etc.) as part of static provisioning, or into new dedicated infrastructure as part of dynamic provisioning.
  1. Full support for EKS Fargate profiles.
  1. Spot instance support.

    eks_cluster_architecture.png

Prerequisites

The following prerequisites must be met before deploying an EKS workload cluster:

  1. You must have an active AWS cloud account with all the permissions listed in the AWS Cloud Account Permissions section below.
  1. You must register your AWS cloud account in Palette as described in the Creating an AWS Cloud Account section below.
  1. You must have an infrastructure cluster profile for EKS already created in Palette.
  1. The desired AWS region must have sufficient capacity for the creation of the following resources:
    • vCPU
    • VPC
    • Elastic IP
    • Internet Gateway
    • Elastic Load Balancers
    • NAT Gateway

AWS Cloud Account Permissions

The following four policies include all the required permissions for provisioning clusters through Palette:

Controller Policy

Last Update: May 25, 2022

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:ReplaceRoute",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:ModifyVpcAttribute",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteNetworkInterface",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ec2:DeleteVpc",
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeImages",
"ec2:DescribeKeyPairs",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVolumes",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:DisassociateAddress",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:ModifySubnetAttribute",
"ec2:ReleaseAddress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"tag:GetResources",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:ConfigureHealthCheck",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
"elasticloadbalancing:DescribeTags",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:RemoveTags",
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeInstanceRefreshes",
"ec2:CreateLaunchTemplate",
"ec2:CreateLaunchTemplateVersion",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DeleteLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"autoscaling:CreateAutoScalingGroup",
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:CreateOrUpdateTags",
"autoscaling:StartInstanceRefresh",
"autoscaling:DeleteAutoScalingGroup",
"autoscaling:DeleteTags"
],
"Resource": [
"arn:*:autoscaling:*:*:autoScalingGroup:*:autoScalingGroupName/*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "autoscaling.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "spot.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": [
"arn:*:iam::*:role/*.cluster-api-provider-aws.sigs.k8s.io"
]
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:TagResource"
],
"Resource": [
"arn:*:secretsmanager:*:*:secret:aws.cluster.x-k8s.io/*"
]
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameter"
],
"Resource": [
"arn:*:ssm:*:*:parameter/aws/service/eks/optimized-ami/*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "eks.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/eks-nodegroup.amazonaws.com/AWSServiceRoleForAmazonEKSNodegroup"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "eks-nodegroup.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:aws:iam::*:role/aws-service-role/eks-fargate-pods.amazonaws.com/AWSServiceRoleForAmazonEKSForFargate"
],
"Condition": {
"StringLike": {
"iam:AWSServiceName": "eks-fargate.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:ListOpenIDConnectProviders",
"iam:CreateOpenIDConnectProvider",
"iam:AddClientIDToOpenIDConnectProvider",
"iam:UpdateOpenIDConnectProviderThumbprint",
"iam:DeleteOpenIDConnectProvider"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:GetRole",
"iam:ListAttachedRolePolicies",
"iam:DetachRolePolicy",
"iam:DeleteRole",
"iam:CreateRole",
"iam:TagRole",
"iam:AttachRolePolicy"
],
"Resource": [
"arn:*:iam::*:role/*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:GetPolicy"
],
"Resource": [
"arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
]
},
{
"Effect": "Allow",
"Action": [
"eks:DescribeCluster",
"eks:ListClusters",
"eks:CreateCluster",
"eks:TagResource",
"eks:UpdateClusterVersion",
"eks:DeleteCluster",
"eks:UpdateClusterConfig",
"eks:UntagResource",
"eks:UpdateNodegroupVersion",
"eks:DescribeNodegroup",
"eks:DeleteNodegroup",
"eks:UpdateNodegroupConfig",
"eks:CreateNodegroup"
],
"Resource": [
"arn:*:eks:*:*:cluster/*",
"arn:*:eks:*:*:nodegroup/*/*/*"
]
},
{
"Effect": "Allow",
"Action": [
"eks:AssociateIdentityProviderConfig",
"eks:ListIdentityProviderConfigs"
],
"Resource": [
"arn:aws:eks:*:*:cluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"eks:DisassociateIdentityProviderConfig",
"eks:DescribeIdentityProviderConfig"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"eks:ListAddons",
"eks:CreateAddon",
"eks:DescribeAddonVersions",
"eks:DescribeAddon",
"eks:DeleteAddon",
"eks:UpdateAddon",
"eks:TagResource",
"eks:DescribeFargateProfile",
"eks:CreateFargateProfile",
"eks:DeleteFargateProfile"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": [
"*"
],
"Condition": {
"StringEquals": {
"iam:PassedToService": "eks.amazonaws.com"
}
}
}
]
}
Note: All of the above permissions are required by Cluster API. The policy is derived from the output of clusterawsadm bootstrap iam print-policy.

Restricting Palette Static Minimum Permissions for Existing VPC

You can choose to have Palette work in a static or dynamic environment. You can also restrict or allow Palette to create an AWS cluster in an existing VPC. The following policy allows Palette to operate while adhering to the Principle of Least Privilege.

This policy is for those who want to restrict Palette to a single VPC and not give Palette access to create or delete VPCs.

Minimum Dynamic Permissions

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:DescribeInstances",
"iam:RemoveRoleFromInstanceProfile",
"ec2:AttachInternetGateway",
"iam:AddRoleToInstanceProfile",
"ec2:DeleteRouteTable",
"ec2:AssociateRouteTable",
"ec2:DescribeInternetGateways",
"ec2:CreateRoute",
"ec2:CreateInternetGateway",
"ec2:DescribeVolumes",
"ec2:DescribeKeyPairs",
"ec2:DescribeNetworkAcls",
"ec2:DescribeRouteTables",
"ec2:CreateTags",
"ec2:CreateRouteTable",
"ec2:RunInstances",
"ec2:ModifyInstanceAttribute",
"ec2:TerminateInstances",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:RevokeSecurityGroupIngress",
"ec2:DescribeIpv6Pools",
"ec2:DeleteVpc",
"ec2:CreateSubnet",
"ec2:DescribeSubnets",
"iam:CreateInstanceProfile",
"ec2:DisassociateAddress",
"ec2:DescribeAddresses",
"ec2:CreateNatGateway",
"ec2:DescribeRegions",
"ec2:CreateVpc",
"ec2:DescribeDhcpOptions",
"ec2:DescribeVpcAttribute",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:CreateSecurityGroup",
"ec2:ModifyVpcAttribute",
"iam:DeleteInstanceProfile",
"ec2:ReleaseAddress",
"iam:GetInstanceProfile",
"ec2:DescribeTags",
"ec2:DeleteRoute",
"ec2:DescribeNatGateways",
"ec2:DescribeIpamPools",
"ec2:AllocateAddress",
"ec2:DescribeSecurityGroups",
"ec2:DescribeImages",
"ec2:DescribeVpcs",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:DescribeTags",
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:TagResource",
"secretsmanager:GetSecretValue",
"autoscaling:StartInstanceRefresh",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"eks:DescribeCluster",
"eks:ListClusters",
"cloudformation:CreateStack",
"cloudformation:DescribeStacks",
"cloudformation:UpdateStack",
"ecr:GetAuthorizationToken",
"iam:PassRole",
"elasticloadbalancing:ConfigureHealthCheck",
"elasticloadbalancing:DescribeTargetHealth",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:BatchGetImage",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteNetworkInterface",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ssm:UpdateInstanceInformation",
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel",
"pricing:GetProducts",
"sts:AssumeRole",
"ec2:ReplaceRoute",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:AssociateAddress",
"tag:GetResources",
"ec2:ModifySubnetAttribute"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"iam:PassRole"
],
"Resource": [
"arn:*:iam::*:role/*.cluster-api-provider-aws.sigs.k8s.io"
]
}
]
}
Ensure that the role you create includes all the policies defined above.
These policies cannot be used as inline policies, because they exceed the 2048 non-whitespace character limit that AWS imposes on inline policies.
The following warning is expected and can be ignored:

These policies define some actions, resources, or conditions that do not provide permissions. To grant access, policies must have an action that has an applicable resource or condition.
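
The size restriction mentioned above can be checked before you try to attach a policy. The following is a minimal sketch, assuming the policy is available as a Python dictionary and that the 2048-character figure quoted in this section applies to your case. The function names and the sample policy are hypothetical and are not part of Palette or the AWS SDK.

```python
import json

# Limit quoted in the text above. Treat it as an assumption and confirm it
# against the current AWS IAM quotas for the entity you attach the policy to.
INLINE_POLICY_LIMIT = 2048


def non_whitespace_length(policy: dict) -> int:
    """Count the characters of the serialized policy, ignoring whitespace."""
    compact = json.dumps(policy, separators=(",", ":"))
    return sum(1 for ch in compact if not ch.isspace())


def fits_inline(policy: dict, limit: int = INLINE_POLICY_LIMIT) -> bool:
    """Return True when the policy is small enough to be used inline."""
    return non_whitespace_length(policy) <= limit


# Hypothetical, shortened policy used only to demonstrate the check.
small_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["eks:ListClusters"], "Resource": ["*"]}
    ],
}

print(fits_inline(small_policy))
```

Loading one of the full policies from a file with json.load and passing it to fits_inline gives a quick indication of whether it has to be created as a managed policy instead.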

Creating an AWS Cloud Account

AWS Account Creation Using the Access Credentials Method

To create an AWS cloud account using access credentials, follow the steps below:

  1. In the AWS console, create the four policies listed above. Assign these policies to the root user or the IAM user to be used from Palette.
  1. Log in to the Palette console as Tenant Admin.
  1. Go to Tenant Settings > Cloud Accounts and click +Add AWS Account.
  1. In the cloud account creation wizard, provide the following information:
    • Account Name
    • Description
    • Credentials:
      • AWS Access key
      • AWS Secret access key
  1. Validate the credentials. Once the credentials are validated, an Add IAM Policies toggle appears. When you enable the toggle, a drop-down lists all the policies attached in the AWS console. Select the policies you want from the drop-down to attach them to the cloud account being created.

Global Role Additional Policies:

There may be situations where additional node-level policies must be added to your deployment. To add node-level policies, switch to the Tenant Admin project and click Tenant Settings on the Main Menu. Click Cloud Accounts and add an account if one does not exist. After the cloud credentials are validated, ensure that Add IAM Policies is enabled. You can then specify the additional policies to attach. The attached policies are applied to all the clusters launched with this specific AWS cloud account.

Disable OIDC Associate Provider for AWS Service Account

Palette customers can optionally disable the OIDC Associate Provider if the service provider restricts cluster deployment while the OIDC Associate Provider is in the enabled state. Customize the EKS Kubernetes pack values as follows:


# Disable creation of an identity provider for the controller
# for use with IAM roles for service accounts.
# Changes to this field after the cluster is provisioned
# will have no effect on the associated OIDC identity provider.
# Defaults to false
#disableAssociateOIDCProvider: true

## Controlplane Logging
logging:
  # Setting to toggle Kubernetes API Server logging (kube-apiserver)
  apiServer: false


Deploying an EKS Cluster

The following steps need to be performed to provision a new EKS cluster:

  1. Provide the basic cluster information like Name, Description, and Tags. Tags on a cluster are propagated to the VMs deployed on the cloud/data center environments.
  1. Select the Cluster Profile created for the EKS cloud. The profile definition will be used as the cluster construction template.
  1. Review and override pack parameters, as desired. By default, parameters for all packs are set with values defined in the cluster profile.
  1. Provide the AWS Cloud account and configure the cluster.

    Parameter | Description
    Cloud Account | Select the desired cloud account. AWS cloud accounts with AWS credentials need to be preconfigured in project settings.
    Static Placement | By default, Palette uses dynamic placement, wherein a new VPC with a public and private subnet is created to place cluster resources for every cluster. These resources are fully managed by Palette and deleted when the corresponding cluster is deleted. Turn on the Static Placement option to place resources into preexisting VPCs and subnets.
    Region | Choose the AWS region where you would like the clusters to be provisioned.
    SSH Key Pair Name | Choose the desired SSH key pair. SSH key pairs need to be preconfigured on AWS for the desired regions. The selected key is inserted into the provisioned VMs.
    Cluster Endpoint Access | Select Private, Public, or Private & Public, based on how you want to communicate with the endpoint of the managed Kubernetes API server and your cluster.
    Public Access CIDR | For Public or Private & Public endpoint access, provide the CIDR values.
    Enable Encryption | Enable secret encryption by toggling the Enable Encryption option. Provide the provider KMS key ARN to complete the wizard.
    Worker Pool Update | Optionally enable the option to update the worker pool in parallel.

The following tags should be added to the public subnet to enable automatic subnet discovery for integration with the AWS load balancer service:

kubernetes.io/role/elb = 1
sigs.k8s.io/cluster-api-provider-aws/role = public
kubernetes.io/cluster/[ClusterName] = shared
sigs.k8s.io/cluster-api-provider-aws/cluster/[ClusterName] = owned
  1. Configure the Node Pools.

    • Configure one or more worker node pools. A single worker node will be configured by default.

      Parameter | Description
      Name | A descriptive name for the node pool.
      Size | Choose the minimum, maximum, and desired sizes for the worker pool. The size of the worker pool scales between the minimum and maximum size under varying workload conditions.
      Taints | Optionally apply taints so that only pods that tolerate them are scheduled on this set of nodes.
      Labels | Optionally apply labels to constrain a pod to run only on a particular set of nodes.
      Instance Type | Select the AWS instance type to be used for all nodes in the node pool.
    • Cloud Configuration settings:

      Parameter | Description
      Instance Option | Select the instance allocation: On Demand or Spot.
      Instance Type | Select the instance type.
      Availability Zones | Select at least one availability zone within the VPC.
      Disk Size | Choose the disk size as per requirement.
    • Optionally, create one or more Fargate profiles to provision on-demand, optimized compute capacity for the workload clusters.

      Parameter | Description
      Name | Provide a name for the Fargate profile.
      Subnets | Pods running on Fargate profiles are not assigned public IP addresses, so only private subnets (with no direct route to an Internet Gateway) are accepted for this parameter. For dynamic provisioning, this input is not required and subnets are automatically selected.
      Selectors | Define a pod selector by providing a target namespace and, optionally, labels. Pods with a matching namespace and labels are scheduled to run on dynamically provisioned compute nodes. You can have up to five selectors in a Fargate profile, and a pod only needs to match one selector to run using the Fargate profile.
  1. Configure the Cluster Management options as per user requirements.
  1. Review the settings and deploy the cluster. Provisioning status with details of ongoing provisioning tasks is available to track progress.
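
The public subnet tags listed under step 4 depend on the cluster name, so it can help to generate them rather than type them by hand. The following is a minimal sketch under that assumption. The helper names are hypothetical, and the tag keys and values are copied from the list in step 4.

```python
def public_subnet_tags(cluster_name: str) -> dict:
    """Build the four public subnet tags listed in step 4 for one cluster."""
    if not cluster_name:
        raise ValueError("cluster_name must not be empty")
    return {
        "kubernetes.io/role/elb": "1",
        "sigs.k8s.io/cluster-api-provider-aws/role": "public",
        f"kubernetes.io/cluster/{cluster_name}": "shared",
        f"sigs.k8s.io/cluster-api-provider-aws/cluster/{cluster_name}": "owned",
    }


def missing_tags(existing: dict, cluster_name: str) -> dict:
    """Return the required tags that are absent or have a different value."""
    required = public_subnet_tags(cluster_name)
    return {key: value for key, value in required.items()
            if existing.get(key) != value}


# Hypothetical cluster name and current subnet tags.
current = {"kubernetes.io/role/elb": "1"}
for key, value in missing_tags(current, "demo-cluster").items():
    print(f"{key} = {value}")
```

The sketch only computes the tags. Applying them to the subnet is left to the AWS console or whichever tooling you already use.
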
New worker pools may be added to customize certain worker nodes to run specialized workloads. For example, the default worker pool may be configured with the m3.large instance type for general-purpose workloads, and another worker pool with the g2.2xlarge instance type can be configured to run GPU workloads.
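
The Fargate selector rule described in the node pool step (up to five selectors per profile, and a pod needs to match only one of them) can be illustrated with a short sketch. It assumes exact matching on the namespace and on every label named in a selector. The class and function names are hypothetical, and any wildcard support that EKS may offer is deliberately left out.

```python
from dataclasses import dataclass, field

MAX_SELECTORS = 5  # limit stated in the Selectors description


@dataclass
class Selector:
    namespace: str
    labels: dict = field(default_factory=dict)


def selector_matches(selector: Selector, namespace: str, pod_labels: dict) -> bool:
    """A selector matches when the namespace and all of its labels match."""
    if selector.namespace != namespace:
        return False
    return all(pod_labels.get(k) == v for k, v in selector.labels.items())


def runs_on_fargate(selectors: list, namespace: str, pod_labels: dict) -> bool:
    """A pod is eligible when at least one selector of the profile matches."""
    if len(selectors) > MAX_SELECTORS:
        raise ValueError("a Fargate profile accepts at most five selectors")
    return any(selector_matches(s, namespace, pod_labels) for s in selectors)


# Hypothetical profile with two selectors.
profile = [Selector("batch", {"app": "etl"}), Selector("web")]
print(runs_on_fargate(profile, "web", {"app": "frontend"}))  # True
print(runs_on_fargate(profile, "batch", {"app": "api"}))     # False
```

A selector without labels, such as the one for the web namespace above, matches every pod in that namespace.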

EKS Cluster Secrets Encryption

Palette encourages using AWS Key Management Service (KMS) to provide envelope encryption of Kubernetes secrets stored in Amazon Elastic Kubernetes Service (EKS) clusters. This encryption is a defense-in-depth security strategy that protects sensitive data such as passwords, Docker registry credentials, and TLS keys stored as Kubernetes Secrets.

Prerequisites:

  • KMS key created in the AWS console.
  • Key Type: Symmetric
  • Key Usage: Encrypt and Decrypt

AWS Cloud Account Permissions:

The AWS permissions listed below need to be configured in the AWS account to enable key creation:

kms:CreateGrant

Enable secret encryption at step 4 of EKS cluster creation by toggling the Enable Encryption button and entering the ARN of the encryption key in the wizard.
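
Because the wizard expects a key ARN rather than a key ID or alias, a quick format check before pasting the value can catch simple mistakes. The following is a rough sketch, assuming the usual ARN layout of arn:partition:kms:region:account-id:key/key-id. The pattern and function name are hypothetical, and the check does not confirm that the key exists, is symmetric, or is usable by the account.

```python
import re

# Loose pattern for a KMS key ARN. It is intentionally permissive about the
# key ID and only covers the partitions named here (an assumption).
KMS_KEY_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+"
)


def looks_like_kms_key_arn(value: str) -> bool:
    """Return True when the value has the shape of a KMS key ARN."""
    return KMS_KEY_ARN.fullmatch(value.strip()) is not None


# Placeholder account ID and key ID, for illustration only.
sample = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
print(looks_like_kms_key_arn(sample))
```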

AWS Instance Type and Pod Capacity

Choose the instance type and the number of instances to launch according to the number of pods required for the workload. Calculate the number of pods that can be scheduled on the nodes of the chosen instance type beforehand; otherwise, cluster creation cannot complete, because the pods cannot come up on the target cluster due to insufficient resources.

The following section describes how to calculate the pod capacity for individual AWS instance types. This helps you choose an appropriate desired size for the worker pool during cluster creation. We recommend selecting an instance that can support at least 30 pods.

Formula for Pod Calculation

Number of pods = N * (M-1) + 2

Where:

  • N is the number of Elastic Network Interfaces (ENI) of the instance type (Maximum network interfaces).
  • M is the number of IP addresses of a single ENI (Private IPv4 addresses per interface/IPv6 addresses per interface).
  • Values for N and M for each instance type can be found in this document.

Example Calculation:

  • For instance type = t3.medium
  • For values of N = 3 and M = 6 (values derived from the AWS document)
  • N * (M-1) + 2 = 3 * (6-1) + 2 = 17 pods per instance
  • In this example, we need at least two (2) t3.medium instances to reach the minimum threshold of 30 pods.

Select the type and number of instances to support a minimum of 30 pods.

Hence, when setting the desired size of the worker pool, choose it according to the pod requirement. In the example above, a minimum of two (2) t3.medium instances must be launched to satisfy the resource requirement of an EKS cluster.
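
The pod capacity formula and the instance count from the example above can be reproduced with a few lines of code. This is a minimal sketch. The function names are hypothetical, and the values of N and M still have to be looked up for each instance type.

```python
import math


def max_pods(enis: int, ips_per_eni: int) -> int:
    """Pod capacity of one instance: N * (M - 1) + 2."""
    return enis * (ips_per_eni - 1) + 2


def instances_needed(target_pods: int, enis: int, ips_per_eni: int) -> int:
    """Smallest number of instances whose combined capacity reaches the target."""
    return math.ceil(target_pods / max_pods(enis, ips_per_eni))


# t3.medium values from the example: N = 3, M = 6.
print(max_pods(3, 6))              # 17
print(instances_needed(30, 3, 6))  # 2
```

Passing a different target to instances_needed gives the desired worker pool size for workloads that need more than the recommended minimum of 30 pods.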

Troubleshooting

If your EKS cluster worker pool ends up in a Failed, Create Failed, or Error nodes failed to join state, refer to this Amazon-provided runbook.

Deleting an EKS Cluster

Deleting an EKS cluster removes all Virtual Machines and associated Storage Disks created for the cluster. Perform the following tasks to delete an EKS cluster:

  1. Select the cluster to be deleted from the Cluster View page and navigate to the Cluster Overview page.
  1. Invoke a delete action available on the page: Cluster > Settings > Cluster Settings > Delete Cluster.
  1. Click Confirm to delete.

The cluster status is updated to Deleting while cluster resources are being deleted. The provisioning status is updated with the ongoing progress of the delete operation. Once all resources are successfully deleted, the cluster status changes to Deleted and the cluster is removed from the list of clusters.

Force Delete a Cluster

A cluster stuck in the Deleting state can be force deleted through the user interface. Force deletion is available only if the cluster has been stuck in the Deleting state for at least 15 minutes. Palette enables cluster force delete from the tenant admin and project admin scope.

To force delete a cluster:

  1. Log in to the Palette Management Console.
  1. Navigate to the Cluster Details page of the cluster stuck in deletion.

    • If the deletion is stuck for more than 15 minutes, click the Force Delete Cluster button from the Settings dropdown.

    • If the Force Delete Cluster button is not enabled, wait for 15 minutes. The Settings dropdown will give the estimated time for the auto-enabling of the force delete button.

If any of the cluster's resources remain in the cloud, clean up those resources before force deleting the cluster.