Nutanix Cluster Shutdown and Start-Up

Hi all –

In this post, we will go through the steps involved in shutting down and starting up a Nutanix Cluster.

We rarely need to bring down an entire Nutanix Cluster. For most host \ node level maintenance activities, we can take one (or) two hosts at a time, depending on the redundancy factor (RF) set.

However, some DC infra and network layer activities can disrupt the network availability and power state of all the nodes \ hosts in a Nutanix Cluster. In those scenarios, we need to shut down the Nutanix Cluster before the maintenance and bring it back up once the maintenance is complete.

Steps to shut down the Nutanix Cluster:

  1. Ensure that the “Data resiliency” status of the target cluster is Normal and that there are no active critical alerts.
  2. Shut down the User VMs residing on the Nutanix provided datastores.
  3. Log in to any one of the CVMs through SSH.
  4. Run the command “cluster stop” to bring down the Nutanix Cluster.
  5. After executing the above command, ensure that all the services except Zeus and Scavenger are down on all the CVMs. You can check this by running the command “cluster status”.
  6. Once the cluster is down, the Nutanix provided datastores mounted on the ESXi hosts will become inaccessible.
  7. Now you can gracefully shut down all the CVMs associated with that cluster from the vSphere Client (or) Web Client [using the “Shut Down Guest” option].
  8. All the VMs and the Nutanix Cluster are now brought down properly, so you can place the ESXi hosts in maintenance mode.
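
If we want to script the check from step 5 instead of reading the “cluster status” output by eye, a minimal shell sketch could look like the one below. It assumes that every service line in the output has the form “<ServiceName>  UP|DOWN  [pids]”. The function name services_still_up is only an illustration and is not part of any Nutanix tool, so please verify the output format on your own AOS version first.

```shell
# Sketch: list services that are still UP after "cluster stop",
# ignoring Zeus and Scavenger, which are expected to keep running.
# ASSUMPTION: each service line of "cluster status" looks like
#     <ServiceName>   UP|DOWN   [pid, pid, ...]
# Reads the "cluster status" text on stdin and prints each offending
# service name once.
services_still_up() {
    awk '$2 == "UP" && $1 != "Zeus" && $1 != "Scavenger" { print $1 }' | sort -u
}

# Example, run on a CVM:
#   cluster status 2>/dev/null | services_still_up
```

An empty result means that only Zeus and Scavenger are reported as UP, which is the state we expect before shutting down the CVMs.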

Steps to start up the Nutanix Cluster:

  1. Exit the ESXi hosts from maintenance mode.
  2. Power on the CVMs and wait for them to finish booting.
  3. Once all the CVMs are up, log in to any one of them via SSH.
  4. Run the command “cluster start” to bring up the Nutanix Cluster.
  5. After executing the above command, ensure that all the services have come up by running the command “cluster status”.
  6. Within a few minutes, the PRISM portal will become accessible. You should see the “Data resiliency” status reporting Normal again.
  7. All the Nutanix provided datastores will become accessible to the ESXi hosts again. Now you can power on the required User VMs.
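
The services can take a while to come up after “cluster start”, so the check in step 5 often has to be repeated. Below is a small polling sketch in shell. It makes the same assumption about the “cluster status” line format (“<ServiceName>  UP|DOWN  [pids]”), and the function name wait_for_services_up, the attempt count and the sleep interval are all illustrative choices of mine.

```shell
# Sketch: poll a status command until no service reports DOWN.
# Usage: wait_for_services_up <max_attempts> <sleep_seconds> <status command...>
# ASSUMPTION: service lines look like "<ServiceName>  UP|DOWN  [pids]".
wait_for_services_up() {
    max_attempts=$1
    sleep_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Count UP and DOWN service lines in one pass over the output.
        counts=$("$@" 2>/dev/null | awk '
            $2 == "UP"   { up++ }
            $2 == "DOWN" { down++ }
            END          { print up + 0, down + 0 }')
        up=${counts% *}
        down=${counts#* }
        # Require at least one UP line so that empty output is not
        # mistaken for a healthy cluster.
        if [ "$up" -gt 0 ] && [ "$down" -eq 0 ]; then
            echo "all services UP after $attempt attempt(s)"
            return 0
        fi
        echo "attempt $attempt: $down service(s) DOWN, $up UP"
        sleep "$sleep_seconds"
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, run on a CVM (30 attempts, 10 seconds apart):
#   wait_for_services_up 30 10 cluster status
```

The status command is passed in as arguments, so the same loop can be tried against saved output or a test stub before pointing it at a real cluster.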

Intended Audience – Administrators of Nutanix Virtual Computing Platform with vSphere ESXi.

Thanks for reading the post and do share your views 🙂


Never Stop Learning !

Nutanix Cluster – Enabling Maintenance mode on ESXi Host

Let's have an overview of the Nutanix Virtual Computing Platform before jumping directly into the steps for enabling maintenance mode.

Hyper-Converged Infra (HCI)

HCI is a software-based architecture that tightly integrates compute, storage, network and virtualization. The vital part is that the local storage of the physical servers in the cluster is converged and presented as a pool of shared storage, which the virtualization features can then use.

Nutanix Virtual Computing Platform

Nutanix Virtual Computing Platform is a Hyper-Converged Infra. Nutanix uses its own NDFS (Nutanix Distributed File System) to converge the storage resources.

In general, the physical hosts in a Nutanix Cluster are installed with a standard hypervisor (ESXi in this case), and each has its own hardware resources such as processor, memory, storage and network.

Nutanix places a Controller VM (CVM) on each host \ node of the cluster. The CVM is responsible for forming the unified shared storage resource and serving the I/O requests from the hypervisor. So, the CVM is the key component that enables storage level convergence.

Logically, we can say:

  • Compute level clustering is handled by vSphere HA & DRS of the hypervisor.
  • Storage level convergence \ clustering is handled by the Nutanix CVM.

So, when taking a host \ node down for an activity, two levels of maintenance have to be applied:

  • Hypervisor level maintenance
  • CVM level maintenance

Consider a Nutanix Cluster with 5 ESXi hosts where the redundancy factor is set to withstand a single node failure. In that case, we can safely take one node at a time for maintenance activity.

Steps to enable maintenance mode:

It is good practice to collect the NCC logs and have them verified by Nutanix to ensure that there are no existing critical issues in the cluster.

Verify the “Data Resiliency status” of the Nutanix Cluster in the PRISM portal. It should be Normal before starting the activity.

As a first step, migrate all the User VMs residing on the target host (except the CVM) to the other hosts available in the cluster.

Connect to the CVM of the target ESXi host via SSH and execute the command below to find its UUID.

ncli host ls | grep -C7 [IP-Address of CVM]
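
Instead of picking the UUID out of the grep context by eye, we can also extract it with a small awk filter. The sketch below assumes that “ncli host ls” prints one “Key : Value” pair per line and that each host block shows its “Uuid” line before its “Controller VM Address” line. Those field names and the function name host_uuid_for_cvm are assumptions on my side, so compare them with the real output on your cluster.

```shell
# Sketch: print the UUID of the host whose CVM address equals $1.
# Reads "ncli host ls" text on stdin.
# ASSUMPTION: the output holds one "Key : Value" pair per line, and each
# host block lists its "Uuid" line before its "Controller VM Address" line.
host_uuid_for_cvm() {
    awk -F ' *: *' -v ip="$1" '
        {
            key = $1; sub(/^[ \t]+/, "", key)       # trim leading blanks
            val = $2; sub(/[ \t\r]+$/, "", val)     # trim trailing blanks
        }
        key == "Uuid" { uuid = val }                # remember the latest Uuid
        key == "Controller VM Address" && val == ip { print uuid; exit }
    '
}

# Example, run on a CVM:
#   ncli host ls | host_uuid_for_cvm 169.254.20.1
```

Unlike a plain grep, the filter compares the whole address, so searching for 10.0.0.1 will not accidentally match 10.0.0.11.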

Place the CVM in maintenance mode using the UUID fetched in the previous step.

ncli host edit id=[UUID] enable-maintenance-mode="true"

Verify that the CVM has been placed in maintenance mode using the following command. At this stage, CVM level maintenance mode is enabled and confirmed.

cluster status | grep CVM
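
When this check is part of a script, it helps to pull out the state of one specific CVM rather than reading all the lines. Here is a minimal sketch, assuming that the CVM header lines in the “cluster status” output look like “CVM: <ip> <State>” (for example a state of “Maintenance” or “Up”). The exact state strings and the function name cvm_state are assumptions, not confirmed Nutanix behaviour.

```shell
# Sketch: print the state that "cluster status" reports for one CVM.
# Reads "cluster status" text on stdin.
# ASSUMPTION: CVM header lines look like "CVM: <ip> <State>", where
# <State> is a single word such as "Maintenance" or "Up".
cvm_state() {
    awk -v ip="$1" '
        $1 == "CVM:" && $2 == ip {
            state = $3
            sub(/,$/, "", state)    # drop a trailing comma if more text follows
            print state
            exit
        }
    '
}

# Example, run on a CVM:
#   cluster status 2>/dev/null | cvm_state 169.254.20.1
```

If the printed state is not the maintenance state we expect, it is better to stop here and investigate before shutting down the CVM.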

Now shut down the CVM using the command below.

cvm_shutdown -h now

Now it is safe to enable maintenance mode at the hypervisor level, since all the User VMs have been migrated to other nodes and the CVM has been brought down gracefully in the previous steps.

Place the target ESXi host in maintenance mode and proceed with your maintenance activity.

Steps to exit from maintenance mode:

Once the maintenance activity is complete, we have to add the node back to the cluster.

Exit the ESXi host from maintenance mode and power on the CVM.

Connect to a neighboring CVM in the cluster via SSH.

Check the status of the CVM that we powered on. At this stage, it should still be reported as being in maintenance mode.

ncli host ls | grep -C7 [IP-Address of CVM]

Take the CVM out of maintenance mode using its UUID, which is shown in the output of the previous step.

ncli host edit id=[UUID] enable-maintenance-mode="false"

Verify that the CVM is out of maintenance mode using the following command.

cluster status | grep CVM

The CVM is now out of maintenance mode.

Ensure that the “Data resiliency and Meta-data sync status” returns to Normal in the PRISM portal. It may take a few minutes to reflect.

Note: In the given commands, the parameters in brackets [ ] should be replaced with the appropriate values.

For example –

ncli host ls | grep -C7 [IP-Address of CVM]   -->   ncli host ls | grep -C7 169.254.20.1

Intended Audience – Administrators of Nutanix Virtual Computing Platform with vSphere ESXi.
