Overview

Following are some of the architectural highlights of Kubernetes clusters provisioned by Spectro Cloud on VMware:

  • Kubernetes nodes can be distributed across multiple compute clusters which serve as distinct fault domains.
  • Support for static IP as well as DHCP allocation.
  • IP pool management for assigning blocks of IPs dedicated to clusters or projects.
  • In order to facilitate communication between the Spectro Cloud management platform and vCenter installed in the private datacenter, a Private Cloud Gateway needs to be set up within the environment.
  • Private Cloud Gateway (PCG) is Spectro Cloud's on-prem component that enables support for isolated private cloud or datacenter environments. Once installed on-prem, the gateway registers itself with Spectro Cloud's SaaS portal and enables secure communication between the SaaS portal and the private cloud environment. The gateway enables installation and end-to-end lifecycle management of Kubernetes clusters in private cloud environments from Spectro Cloud's SaaS portal.

vmware_arch_oct_2020.png

Prerequisites

The following prerequisites must be met before deploying a Kubernetes cluster in VMware:

  • vSphere 6.7U3 or later (recommended).
  • NTP configured on all Hosts.
  • You must have an active vCenter account with all the permissions listed below in the "VMware Cloud Account Permissions" section.
  • You should have an Infrastructure cluster profile created in Spectro Cloud for VMware.
  • You should install a Private Cloud Gateway for VMware as described in the "Creating a VMware cloud gateway" section below. Installing the Private Cloud Gateway automatically registers a cloud account for VMware in Spectro Cloud. You can register additional VMware cloud accounts in Spectro Cloud as described in the "Creating a VMware cloud account" section below.
  • Egress access to the internet (direct or via proxy):
    • For proxy: HTTP_PROXY, HTTPS_PROXY (both required).
    • Outgoing internet connection on port 443 to api.spectrocloud.com.
  • Private cloud gateway IP requirements:
    • 1 node - 1 IP or 3 nodes - 3 IPs.
    • 1 Kubernetes control-plane VIP.
    • 1 Kubernetes control-plane extra.
  • IPs for application workload services (e.g.: LoadBalancer services).
  • Subnet with egress access to the internet (direct or via proxy):
    • For proxy: HTTP_PROXY, HTTPS_PROXY (both required).
    • Outgoing internet connection on port 443 to api.spectrocloud.com.
  • DNS to resolve public internet names (e.g.: api.spectrocloud.com).
  • Shared Storage between vSphere hosts.
  • Configuration Requirements - A Resource Pool needs to be configured across the hosts onto which the workload clusters will be provisioned. Every host in the Resource Pool needs access to shared storage, such as vSAN, to make use of high-availability control planes. Network Time Protocol (NTP) must be configured on each of the ESXi hosts.
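
The egress and proxy prerequisites above can be spot-checked from a machine on the target subnet before any installation is attempted. The following is a minimal sketch, not an official Spectro Cloud tool: the function names are made up for illustration, and it assumes `curl` is installed. It only confirms that both proxy variables are set together and that an HTTPS connection on port 443 can be opened.

```bash
#!/bin/bash
# Pre-flight sketch for the egress prerequisites.
# Function names are illustrative and not part of any Spectro Cloud tooling.

# Succeeds when HTTP_PROXY and HTTPS_PROXY are either both set or both empty,
# because both are required when a proxy is used.
proxy_vars_consistent() {
  local http_proxy_value="$1"
  local https_proxy_value="$2"
  if [ -n "$http_proxy_value" ] && [ -n "$https_proxy_value" ]; then
    return 0   # proxy mode: both values present
  fi
  if [ -z "$http_proxy_value" ] && [ -z "$https_proxy_value" ]; then
    return 0   # direct mode: no proxy configured
  fi
  return 1     # only one of the two is set
}

# Opens an HTTPS connection (port 443) to the given host.
# curl picks up HTTPS_PROXY from the environment when it is set.
# The HTTP status code is ignored; only reachability is tested.
check_egress() {
  local host="${1:-api.spectrocloud.com}"
  curl --silent --show-error --max-time 10 --output /dev/null "https://$host"
}
```

A typical run would be `proxy_vars_consistent "$HTTP_PROXY" "$HTTPS_PROXY" && check_egress api.spectrocloud.com`. A non-zero exit status from `check_egress` points to a DNS, routing, or proxy problem to resolve before deploying the gateway.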

VMware Cloud Account Permissions

The vSphere user account used in the various Spectro Cloud tasks must have the minimum vSphere privileges required to perform the task. The Administrator role provides super-user access to all vSphere objects. For users without the Administrator role, one or more custom roles can be created based on the tasks being performed by the user.

Privileges under root-level role

The root-level role privileges are applied to the root object and datacenter objects only.
Each vSphere object is listed with its privileges nested beneath it:

  • Cns
    • Searchable
  • Datastore
    • Browse datastore
  • Host
    • Configuration
      • Storage partition configuration
  • vSphere Tagging
    • Create vSphere Tag
    • Edit vSphere Tag
  • Network
    • Assign network
  • Sessions
    • Validate session
  • Profile-driven storage
    • Profile-driven storage view
  • Storage views
    • View

Privileges under spectro role

The Spectro role privileges are applied to hosts, clusters, virtual machines, templates, datastore and network objects.
Each vSphere object is listed with its privileges nested beneath it:

  • Cns
    • Searchable
  • Datastore
    • Allocate space
    • Browse datastore
    • Low level file operations
    • Remove file
    • Update virtual machine files
    • Update virtual machine metadata
  • Folder
    • Create folder
    • Delete folder
    • Move folder
    • Rename folder
  • Host
    • Local operations
    • Reconfigure virtual machine
  • vSphere Tagging
    • Assign or Unassign vSphere Tag
    • Create vSphere Tag
    • Delete vSphere Tag
    • Edit vSphere Tag
  • Network
    • Assign network
  • Resource
    • Apply recommendation
    • Assign virtual machine to resource pool
    • Migrate powered off virtual machine
    • Migrate powered on virtual machine
    • Query vMotion
  • Sessions
    • Validate session
  • Profile-driven storage
    • Profile-driven storage view
  • Storage views
    • Configure service
    • View
  • Tasks
    • Create task
    • Update task
  • vApp
    • Export
    • Import
    • View OVF environment
    • vApp application configuration
    • vApp instance configuration
  • Virtual machines
    • Change Configuration
      • Acquire disk lease
      • Add existing disk
      • Add new disk
      • Add or remove device
      • Advanced configuration
      • Change CPU count
      • Change Memory
      • Change Settings
      • Change Swapfile placement
      • Change resource
      • Configure Host USB device
      • Configure Raw device
      • Configure managedBy
      • Display connection settings
      • Extend virtual disk
      • Modify device settings
      • Query Fault Tolerance compatibility
      • Query unowned files
      • Reload from path
      • Remove disk
      • Rename
      • Reset guest information
      • Set annotation
      • Toggle disk change tracking
      • Toggle fork parent
      • Upgrade virtual machine compatibility
    • Edit Inventory
      • Create from existing
      • Create new
      • Move
      • Register
      • Remove
      • Unregister
    • Guest operations
      • Guest operation alias modification
      • Guest operation alias query
      • Guest operation modifications
      • Guest operation program execution
      • Guest operation queries
    • Interaction
      • Console interaction
      • Power off
      • Power on
    • Provisioning
      • Allow disk access
      • Allow file access
      • Allow read-only disk access
      • Allow virtual machine download
      • Allow virtual machine files upload
      • Clone template
      • Clone virtual machine
      • Create template from virtual machine
      • Customize guest
      • Deploy template
      • Mark as template
      • Mark as virtual machine
      • Modify customization specification
      • Promote disks
      • Read customization specifications
    • Service configuration
      • Allow notifications
      • Allow polling of global event notifications
      • Manage service configurations
      • Modify service configuration
      • Query service configurations
      • Read service configuration
    • Snapshot management
      • Create snapshot
      • Remove snapshot
      • Rename snapshot
      • Revert to snapshot
    • vSphere Replication
      • Configure replication
      • Manage replication
      • Monitor replication
  • vSAN
    • Cluster
    • ShallowRekey

Creating a VMware cloud gateway

For the self-hosted version, a system gateway is provided out of the box, and installing a Private Cloud Gateway is typically not required. However, additional gateways can be created as required to support provisioning into remote datacenters that do not have a direct incoming connection from the management console.
  • Minimum capacity required for a Private Cloud Gateway:
    • 1 node - 2 vCPU, 4GB memory, 30GB storage.
    • 3 nodes - 6 vCPU, 12GB memory, 70GB storage.
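
The sizing rules can be turned into a small planning helper. This is a sketch under two stated assumptions: the function names are hypothetical, and the IP count reads the "Private cloud gateway IP requirements" prerequisite as one IP per node plus one control-plane VIP plus one extra control-plane address.

```bash
#!/bin/bash
# Planning sketch for a Private Cloud Gateway; function names are illustrative.

# Prints the number of IP addresses to reserve for the gateway:
# one per node, plus one control-plane VIP, plus one extra control-plane address.
pcg_required_ips() {
  local nodes="$1"
  case "$nodes" in
    1|3) echo $((nodes + 2)) ;;
    *)   echo "gateway size must be 1 or 3" >&2; return 1 ;;
  esac
}

# Prints the minimum capacity quoted in this document for the given gateway size.
pcg_min_capacity() {
  case "$1" in
    1) echo "2 vCPU, 4GB memory, 30GB storage" ;;
    3) echo "6 vCPU, 12GB memory, 70GB storage" ;;
    *) echo "gateway size must be 1 or 3" >&2; return 1 ;;
  esac
}
```

For example, `pcg_required_ips 3` prints `5`, the number of addresses to set aside in a static IP pool for a 3-node gateway under the assumption above.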

Setting up a cloud gateway involves initiating the install from the tenant portal, deploying the gateway installer VM in vSphere, and launching the cloud gateway from the tenant portal.

Tenant Portal - Initiate Install

  • As a tenant administrator, navigate to the Private Cloud Gateway page under settings and invoke the dialogue to create a new private cloud gateway.
  • Note down the link to the Spectro Cloud Gateway Installer OVA and PIN displayed on the dialogue.

vSphere - Deploy Gateway Installer

  • Initiate deployment of a new OVF template by providing a link to the installer OVA as the URL.
  • Proceed through the OVF deployment wizard by choosing the desired name, placement, compute, storage, and network options.
  • At the 'Customize Template' step, specify Spectro Cloud properties as follows:
    • Installer Name - Desired Spectro Cloud Gateway name. The name is used to identify the gateway instance. Typical environments may require only a single gateway; however, multiple gateways might be required for managing clusters across multiple vCenters. Choose a name that easily identifies the environment that this gateway instance is being configured for.
    • Console endpoint - URL of the Spectro Cloud management platform portal (https://console.spectrocloud.com by default).
    • Pairing Code - PIN displayed on the Spectro Cloud management platform portal's 'Create a new gateway' dialogue.
    • SSH Public Key - Optional key, useful for troubleshooting purposes (recommended). Enables SSH access to the VM as the 'ubuntu' user.
    • Pod CIDR - Optional IP range exclusive to pods. This range must not overlap with your network CIDR.
    • Service cluster IP range - Optional IP range in CIDR format exclusive to the service clusters. This range must not overlap with either the pod CIDR or your network CIDR.

The following additional properties need to be set only in a proxy environment. The proxy properties may or may not share the same value, but all three are mandatory.

    • HTTP PROXY - The endpoint for the HTTP proxy server. This setting will be propagated to all the nodes launched in the proxy network, e.g., http://USERNAME:PASSWORD@PROXYIP:PROXYPORT
    • HTTPS PROXY - The endpoint for the HTTPS proxy server. This setting will be propagated to all the nodes launched in the proxy network, e.g., http://USERNAME:PASSWORD@PROXYIP:PROXYPORT
    • NO Proxy - A comma-separated list of vCenter server, local network CIDR, hostnames, and domain names that should be excluded from proxying. This setting will be propagated to all the nodes to bypass the proxy server, e.g., vcenter.company.com, .company.org, 10.10.0.0/16
  • Finish the OVF deployment wizard and wait for the OVA to be imported and Virtual Machine to be deployed.
  • Power on the Virtual Machine.
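
The Pod CIDR and Service cluster IP range set at the 'Customize Template' step must not overlap with each other or with the network CIDR, and that is easy to verify before filling in the wizard. The sketch below is illustrative only: the function names are hypothetical, it handles IPv4 only, and the CIDR values in the usage note are placeholders rather than Spectro Cloud defaults.

```bash
#!/bin/bash
# IPv4 CIDR overlap check; function names are illustrative.

# Converts a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Prints "first last": the integer bounds of a CIDR block such as 10.10.0.0/16.
cidr_bounds() {
  local addr="${1%/*}" prefix="${1#*/}"
  local ip mask first last
  ip=$(ip_to_int "$addr")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( ip & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  echo "$first $last"
}

# Succeeds (exit status 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  a_first=$1 a_last=$2 b_first=$3 b_last=$4
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}
```

For instance, `cidrs_overlap 192.168.0.0/16 10.10.0.0/16 || echo "no overlap"` prints `no overlap`, while a pod range of `10.10.128.0/17` against a network CIDR of `10.10.0.0/16` is reported as overlapping and should be changed.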

Tenant Portal - Launch Cloud Gateway

  • Close the 'Create New Gateway' dialogue if still open or navigate to the Private Cloud Gateway page under settings in case you have navigated away or been logged out.
  • Wait for a gateway widget to be displayed on the page and for the "Configure" option to be available. The IP address of the installer VM is displayed on the gateway widget. This may take a few minutes after the virtual machine is powered on. If the installer fails to register with the Spectro Cloud management platform portal within 10 minutes of powering on the virtual machine in vSphere, an error may have occurred. Follow the steps in the "Troubleshooting" section below to identify and resolve the issue.
  • Click on the "Configure" button to invoke the Spectro Cloud Configuration dialogue. Provide vCenter credentials and proceed to the next configuration step.
  • Choose the desired values for Datacenter, Compute Cluster, Datastore, Network, Resource pool, and Folder. Optionally provide one or more SSH Keys and/or NTP server addresses.
  • Choose the IP Allocation Scheme - Static IP or DHCP. If static IP is selected, an option to create an IP pool is enabled. Proceed to create an IP pool by providing an IP range (start and end IP addresses) or a subnet. The IP addresses from this IP Pool will be assigned to the gateway cluster. By default, the IP Pool is available for use by other tenant clusters. This can be prevented by enabling the "Restrict to a single cluster" button. A detailed description of all the fields involved in the creation of an IP pool can be found in the "IP Address Management" section below.
  • Click on Confirm to initiate provisioning of the gateway cluster. The status of the cluster on the UI should change to 'Provisioning' and eventually to 'Running' when the gateway cluster is fully provisioned. This process might take several minutes (typically 8 to 10 minutes). You can observe a detailed provisioning sequence on the cluster details page by clicking on the gateway widget on the UI. If provisioning of the gateway cluster runs into errors or gets stuck, relevant details can be found on the summary tab or the events tab of the cluster details page. In cases where provisioning of the gateway cluster is stuck or has failed due to invalid configuration, the process can be reset from the Cloud Gateway widget on the UI.
  • Once the Gateway transitions to the 'Running' state, it is fully provisioned and ready to bootstrap tenant cluster requests.
Gateway cluster installation automatically creates a cloud account behind the scenes using the credentials entered at the time of deploying the gateway cluster. This account may be used for the provisioning of clusters across all tenant projects.

vSphere - Clean up installer

  • Power off the installer VM that was imported from the OVA at the start of this installation process.

Troubleshooting

Gateway installer - Unable to register with the tenant portal

The installer VM, when powered on, goes through a bootstrap process and registers itself with the tenant portal. This process typically takes 5 to 10 mins. Failure of the installer to register with the tenant portal within this duration might be indicative of a bootstrapping error. SSH into the installer virtual machine using the key provided during OVA import and inspect the log file located at '/var/log/cloud-init-output.log'. This log file will contain error messages in the event there are failures with connecting to the Spectro Cloud management platform portal, authenticating, or downloading installation artifacts. A common cause for these errors is that the Spectro Cloud management platform console endpoint or the pairing code is typed incorrectly. Ensure that the tenant portal console endpoint does not have a trailing slash. If these properties were incorrectly specified, power down and delete the installer VM and re-launch with the correct values.
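
Two of the checks described above lend themselves to a quick script: confirming that the console endpoint has no trailing slash, and scanning the cloud-init output log for failures. This is a rough sketch. The function names are hypothetical and the keyword list is a guess, not an official list of Spectro Cloud error strings.

```bash
#!/bin/bash
# Troubleshooting helpers for the installer VM; function names are illustrative.

# Succeeds when the console endpoint does not end with a slash.
endpoint_has_no_trailing_slash() {
  case "$1" in
    */) return 1 ;;
    *)  return 0 ;;
  esac
}

# Prints log lines that mention likely failures, prefixed with line numbers.
# The keywords are an assumption; widen or narrow the pattern as needed.
scan_cloud_init_log() {
  local log="${1:-/var/log/cloud-init-output.log}"
  grep -inE 'error|fail|denied|timed out' "$log"
}
```

On the installer VM, `scan_cloud_init_log` with no argument reads '/var/log/cloud-init-output.log'. An empty result does not prove that bootstrapping succeeded, so the full log is still worth reading when registration does not complete.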

Another potential issue is a lack of outgoing connectivity from the VM. The installer VM needs to have outbound connectivity directly or via a proxy. Adjust proxy settings (if applicable) to fix the connectivity or power down and delete the installer VM and relaunch in a network that enables outgoing connections.

If the above steps do not resolve your issues, copy the following script to the installer VM and execute to generate a logs archive. Open a support ticket and attach the logs archive to the ticket to allow the Spectro Cloud Support team to troubleshoot and provide further guidance:

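#!/bin/bash
# Collects gateway installer logs into a single archive under /tmp.
DESTDIR="/tmp/"
CONTAINER_LOGS_DIR="/var/log/containers/"
CLOUD_INIT_OUTPUT_LOG="/var/log/cloud-init-output.log"
CLOUD_INIT_LOG="/var/log/cloud-init.log"
KERN_LOG="/var/log/kern.log"
KUBELET_LOG="/tmp/kubelet.log"
SYSLOGS="/var/log/syslog*"
# Zero-padded timestamp so archive names are unambiguous and sort correctly.
FILENAME="spectro-logs-$(date +%Y%m%d-%H%M%S).tgz"

# Export the kubelet unit's journal so it can be archived with the other logs.
journalctl -u kubelet > "$KUBELET_LOG"

# -h follows symlinks; $SYSLOGS is left unquoted on purpose so the glob expands.
tar --create --gzip -h --file="$DESTDIR$FILENAME" "$CONTAINER_LOGS_DIR" "$CLOUD_INIT_LOG" "$CLOUD_INIT_OUTPUT_LOG" "$KERN_LOG" "$KUBELET_LOG" $SYSLOGS
retVal=$?
# Treat any non-zero exit status from tar as a failure.
if [ $retVal -ne 0 ]; then
  echo "Error creating spectro logs package"
else
  echo "Successfully extracted spectro cloud logs: $DESTDIR$FILENAME"
fi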

Gateway Cluster - Provisioning stalled/failure

Installation of the gateway cluster may run into errors or get stuck in the provisioning state for a variety of reasons, such as a lack of infrastructure resources, unavailable IP addresses, or an inability to perform NTP sync. While these are the most common causes, other issues might be related to the underlying VMware environment. The Cluster Details page, which can be accessed by clicking anywhere on the gateway widget, contains details of every orchestration step, including an indication of the current task being executed. Any intermittent errors are displayed on this page next to the relevant orchestration task. The events tab on this page is also a useful resource for looking at lower-level operations being performed for the various orchestration steps. If you think that the orchestration is stuck or has failed due to an invalid selection of infrastructure resources or an intermittent problem with the infrastructure, you may reset the gateway by clicking the 'Reset' button on the gateway widget. This resets the gateway state to 'Pending', allowing you to reconfigure the gateway and start provisioning a new gateway cluster. If the problem persists, please contact Spectro Cloud support via the Service Desk.

Upgrading a VMware cloud gateway

Spectro Cloud maintains the OS image and all configurations for the cloud gateway. Periodically, the OS images, configurations, or other components need to be upgraded to resolve security or functionality issues. Spectro Cloud releases such upgrades when required and announces them through an upgrade notification on the gateway.

Administrators should review the changes and apply them at a suitable time. Upgrading a cloud gateway does not result in any downtime for the tenant clusters. During the upgrade process, the provisioning of new clusters might be temporarily unavailable. New cluster requests are queued while the gateway is being upgraded, and are processed as soon as the gateway upgrade is complete.

Deleting a VMware cloud gateway

The following steps need to be performed to delete a cloud gateway:

  • As a tenant administrator, navigate to the Private Cloud Gateway page under settings.
  • Invoke the ‘Delete’ action on the cloud gateway instance that needs to be deleted.
  • The system performs a validation to ensure there are no running tenant clusters associated with the gateway instance being deleted. If such instances are found, the system presents an error. Delete relevant running tenant clusters and retry the deletion of the cloud gateway.
  • Delete the gateway Virtual Machines from vSphere.

Resizing a VMware cloud gateway

A cloud gateway can be set up as a 1-node or a 3-node cluster. For production environments, a 3-node setup is recommended. A cloud gateway can be initially set up with 1 node and resized to 3 nodes at a later time. The following steps need to be performed to resize a 1-node cloud gateway cluster to a 3-node gateway cluster:

  • As a tenant administrator, navigate to the Private Cloud Gateway page under settings.

  • Invoke the resize action for the relevant cloud gateway instance.

  • Update the size from 1 to 3.

  • The gateway upgrade begins shortly after the update. Two new nodes are created on vSphere and the gateway is upgraded to a 3-node cluster.

Scaling a 3-node cluster down to a 1-node cluster is not permitted.

A load balancer instance is launched even for a 1-node gateway to support future expansion.

IP Address Management

Spectro Cloud supports DHCP as well as static IP based allocation strategies for the VMs that are launched during cluster creation. IP Pools can be defined using a range or a subnet. Administrators can define one or more IP pools linked to a private cloud gateway. Clusters created using a private cloud gateway can select from the IP pools linked to the corresponding private cloud gateway. By default, IP Pools are shared across multiple clusters, but can optionally be restricted to a single cluster. The following is a description of the various IP Pool properties:

  • Name - Descriptive name for the IP Pool. This name is displayed for IP Pool selection when static IP is chosen as the IP allocation strategy.
  • Network Type - Select 'Range' to provide a start and an end IP address; IPs within this range become part of the pool. Alternatively, select 'Subnet' to provide the IP range in CIDR format.
  • Start - First IP address for a range-based IP Pool, e.g., 10.10.183.1
  • End - Last IP address for a range-based IP Pool, e.g., 10.10.183.100
  • Subnet - CIDR to allocate a set of IP addresses for a subnet-based IP Pool, e.g., 10.10.183.64/26
  • Subnet Prefix - Network subnet prefix, e.g., /18
  • Gateway - Network gateway, e.g., 10.128.1.1
  • Nameserver addresses - A comma-separated list of name servers, e.g., 8.8.8.8
  • Restrict to a Single Cluster - Select this option to reserve the pool for the first cluster that uses this pool. By default, IP pools can be shared across clusters.
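
When sizing an IP pool, it helps to know how many addresses a range or a subnet covers. The sketch below computes both for IPv4 using the example values from the property descriptions. The function names are hypothetical, and the document does not say whether Spectro Cloud reserves any addresses within a pool (for example, network or broadcast addresses), so treat the results as upper bounds.

```bash
#!/bin/bash
# IP pool sizing helpers (IPv4 only); function names are illustrative.

# Converts a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Prints the number of addresses in an inclusive start-end range.
range_size() {
  local start end
  start=$(ip_to_int "$1")
  end=$(ip_to_int "$2")
  if [ "$end" -lt "$start" ]; then
    echo "end address precedes start address" >&2
    return 1
  fi
  echo $(( end - start + 1 ))
}

# Prints the total number of addresses covered by a CIDR block.
subnet_size() {
  local prefix="${1#*/}"
  echo $(( 1 << (32 - prefix) ))
}
```

With the example values, `range_size 10.10.183.1 10.10.183.100` prints `100` and `subnet_size 10.10.183.64/26` prints `64`. Compare these numbers with the count of node VMs and control-plane addresses that the clusters sharing the pool are expected to need.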

Creating a VMware cloud account

Configuring the private cloud gateway is a prerequisite task. A default cloud account is created when the private cloud gateway is configured. This cloud account can be used to create a cluster.
Enterprise version users should choose the "Use System Gateway" option.

In addition to the default cloud account already associated with the private cloud gateway, new user cloud accounts can be created for the different vSphere users.

  • Account Name - Custom name for the cloud account.
  • Private cloud gateway - Reference to a running cloud gateway.
  • vCenter Server - IP or FQDN of the vCenter server.
  • Username - vCenter username.
  • Password - vCenter password.

Deploying a VMware Cluster

The following steps need to be performed to provision a new VMware cluster:

  • Provide basic cluster information like name, description, and tags. Tags are currently not propagated to the VMs deployed on the cloud/data center environments.

  • Select a cluster profile created for the VMware environment. The profile definition will be used as the cluster construction template.

  • Review and override pack parameters as desired. By default, parameters for all packs are set with values defined in the cluster profile.

  • Provide a vSphere Cloud account and placement information.

    • Cloud Account - Select the desired cloud account. VMware cloud accounts with credentials need to be pre-configured in project settings. An account is auto-created as part of the cloud gateway setup and is available for provisioning of tenant clusters if permitted by the administrator.
    • Datacenter - The vSphere datacenter where the cluster nodes will be launched.
    • Folder - The vSphere VM Folder where the cluster nodes will be launched.
    • SSH Keys (Optional) - Public key to configure remote SSH access to the nodes (User: spectro).
    • NTP Server (Optional) - Set up time synchronization for all the running nodes.
    • IP Allocation strategy - DHCP or Static IP.
  • Configure the master and worker node pools. A master and a worker node pool are configured by default.

    • Name - A descriptive name for the node pool.
    • Size - Number of nodes to be provisioned for the node pool. For the master pool, this number can be 1, 3, or 5.
    • Allow worker capability (master pool) - Allows workloads to be provisioned on master nodes.
    • CPU - Number of CPUs to be allocated to the nodes.
    • Memory - Amount of memory in GB to be allocated to the nodes.
    • Disk - Storage disk size in GB to be attached to the node.
    • One or more placement domains. VMs are distributed across multiple placement domains on a round-robin basis. Currently, only one placement domain is supported for a master pool.
      • Compute Cluster - A Compute cluster under the selected Datacenter.
      • Datastore - The vSphere storage in the selected Datacenter.
      • Network - The vSphere Network in the selected Datacenter, to enable connectivity for the cluster nodes.
      • Resource Pool - The vSphere resource pool where the cluster nodes will be launched.
      • IP Pool - An IP pool to be used for allocating IP addresses to cluster VMs. Required only for Static IP allocation. IP pools need to be predefined for private cloud gateways.
  • Review settings and deploy the cluster. Provisioning status with details of ongoing provisioning tasks is available to track progress.
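
Two of the node pool rules above can be illustrated in a few lines: the master pool size must be 1, 3, or 5, and VMs are spread across placement domains on a round-robin basis. The sketch below only models that behavior. The function names and the `node-N` labels are hypothetical, and the exact order in which Spectro Cloud cycles through placement domains is an assumption.

```bash
#!/bin/bash
# Node pool planning helpers; names and output format are illustrative.

# Succeeds when the master pool size is one of the allowed values.
valid_master_pool_size() {
  case "$1" in
    1|3|5) return 0 ;;
    *)     return 1 ;;
  esac
}

# Prints one "node-<index> <domain>" line per node, cycling through the
# placement domains given after the node count.
assign_round_robin() {
  local count="$1" i=0 domain
  shift
  if [ "$#" -eq 0 ]; then
    echo "at least one placement domain is required" >&2
    return 1
  fi
  while [ "$i" -lt "$count" ]; do
    for domain in "$@"; do
      [ "$i" -lt "$count" ] || break
      echo "node-$i $domain"
      i=$(( i + 1 ))
    done
  done
}
```

Running `assign_round_robin 3 cluster-a cluster-b` lists `node-0` and `node-2` under `cluster-a` and `node-1` under `cluster-b`, which shows how a worker pool of three nodes would be split across two compute clusters under this model.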

New worker pools may be added to customize certain worker nodes for specialized workloads. As an example, the default worker pool may be configured with 4 CPUs and 8GB of memory for general-purpose workloads, and another worker pool with 8 CPUs and 16GB of memory for advanced workloads that demand larger resources.

Deleting a VMware Cluster

The deletion of a VMware cluster results in the removal of all Virtual machines and associated storage disks created for the cluster. The following tasks need to be performed to delete a VMware cluster:

  • Select the cluster to be deleted from the cluster view and navigate to the cluster overview page.
  • Invoke the delete action available on the page.
  • Confirm the delete action.
  • Cluster status is updated to ‘Deleting’ while cluster resources are being deleted. Provisioning status is updated with the ongoing progress of the delete operation. Once all resources are successfully deleted, the cluster status changes to ‘Deleted’ and it is removed from the list of clusters.
The delete action is only available for clusters that are fully provisioned. For clusters that are still in the process of being provisioned, the ‘Abort’ action is available to stop provisioning and delete all resources.