Wednesday, October 28, 2015

vSphere on UCS design - networking

Recently, I have had a number of customers who wanted recommendations on how to setup their networking when using vSphere on the Cisco UCS Platform.

One customer called Cisco and got this recommendation from the vendor:

- Set up multiple vNICs on UCS - one for each FI (A and B) for vMotion, one for Management, and then one or more for other VLANs for maximum flexibility. If we then group the vNICs attached to FI-A together for vMotion in the same Distributed Switch, vMotion traffic stays within the FI and we avoid northbound (external switch) traffic. QoS/traffic shaping can also be performed at the Cisco UCS level.

It makes sense that the hardware vendor would suggest a hardware solution that highlights and uses the capabilities of their platform.

However, I don't think this is the optimum solution.

I would just leave the vNICs at two - one per FI - and not set them for hardware failover, because if a path fails, I want to see that failure at the vSphere level and not have it hidden from me. vSphere will already be set up to handle failures, so I don't need it twice.

Then, just have more portgroups - one for Management, two for vMotion (one per FI/vNIC - so all vNICs that are tied to FI-A are together in one portgroup and all vNICs that are tied to FI-B are together in another portgroup), and then additional portgroups as needed for VMs.
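Purely to illustrate the portgroup idea, here is a minimal sketch using a standard vSwitch and made-up names (vSwitch0, with vmnic0 assumed to map to FI-A and vmnic1 to FI-B); on a Distributed Switch you would set the same active/standby teaming on each distributed portgroup:

# create one vMotion portgroup per FI and pin each to the matching uplink
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch0 --portgroup-name=vMotion-A
esxcli network vswitch standard portgroup policy failover set --portgroup-name=vMotion-A --active-uplinks=vmnic0 --standby-uplinks=vmnic1
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch0 --portgroup-name=vMotion-B
esxcli network vswitch standard portgroup policy failover set --portgroup-name=vMotion-B --active-uplinks=vmnic1 --standby-uplinks=vmnic0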

Adding additional vNICs does not change the amount of bandwidth available or the number of physical connections coming out of a blade - it is still limited to one connection to each FI; the rest is semantics. Adding portgroups is effectively the same as adding vNICs.

Plus, using the VMware method allows you to use a single pane of glass to control and manage all those features.

In reality, it makes little difference whether you use vNICs or portgroups - all that changes is the point at which you separate the traffic. In the vNIC case, you separate it before vSphere sees it; in the portgroup case, you allow vSphere to handle the separation.

The decision comes down to whether your business prefers to control this from the hardware or the software side, and whether the features you require are available in UCS or in vSphere. If you really want to use NIOC or other Distributed vSwitch features, then choose the vSphere method.

I would definitely not do both - don't enable QOS or Traffic Shaping at the vSphere level and then also do it at the UCS level. You are adding complexity without adding benefit. Plus diagnosing issues becomes much more difficult.

Thursday, July 31, 2014

Change ESXi Hostname/IP when using a vSphere Distributed Switch (VDS)

Recently, I had a client that wanted to re-organize their ESXi servers.  They wanted all the IP’s nicely lined up and the host names changed.


But they were also using a VDS. This was my solution.


The only way I know of to accomplish this is to remove the host from the cluster and re-add it. Because the host is part of a VDS, you have to remove it from the VDS first before you can remove it from the cluster.

I had previously setup the cluster to use Host Profiles – THIS IS IMPORTANT!  It will save tons of time.

These steps are for the vSphere Client.

So…here are the steps:

  1. Put the host into maintenance mode.
  2. Host->Configuration->vSphere Distributed Switch->Manage Physical Adapters->Remove one adapter (you do have more than one right?!)
  3. vSphere Standard Switch->Add Networking->Virtual Machine Type->Click through to Finish (we are going to migrate the VMK port, so we don’t need a new one)
  4. vSphere Distributed Switch->Manage Virtual Adapters-><the vmk# port> –>Migrate->The vSwitch you just created->enter the right VLAN!
  5. vSphere Distributed Switch->Manage Physical Adapters->Remove other physical ports
  6. Home->Networking->DVSwitch->Hosts Tab->right-click the host in maintenance mode->Remove from vSphere Distributed Switch
  7. Hosts and Clusters->Right Click the host in maintenance mode->Remove
  8. Go to the KVM console of the host and change the IP and hostname (a command-line sketch for this step follows the list)
  9. Back to the vSphere client->Cluster->Add host
  10. Host->Attach Profile (Manage Profile)
  11. Host Profiles->Apply Profile
  12. Host Profiles->Check Compliance
  13. Exit Maintenance mode
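For step 8, if you prefer the ESXi Shell over the DCUI, here is a minimal sketch (the hostname, domain, IP, netmask and gateway below are made-up example values - substitute your own):

# rename the host
esxcli system hostname set --host=esx01 --domain=lab.local
# re-address the management VMkernel port (vmk0 assumed) and reset the default gateway
esxcli network ip interface ipv4 set --interface-name=vmk0 --type=static --ipv4= --netmask=
esxcli network ip route ipv4 add --gateway= --network=default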

That should do it!

Saturday, July 26, 2014

Just For Laughs review

So my friends and I spent thousands of dollars to stay for 5 days in Montreal and go to the Just For Laughs shows.

We figured the best shows to see would be the Galas with the big names.

But, the way it works is this: the headliner only hosts the show. They just introduce the comics. And, you can't find out beforehand who you are actually gonna see. So, you buy tickets based on the big name.

All the headliners (Chevy Chase, Andy Samberg) didn't really do anything except read introductions from a teleprompter. I coulda done that.

Most of the comics weren't funny and we had lights shining in our eyes the whole show.

So disappointing.

Out of 5 Galas and 2 other shows in 5 nights, we might have seen a total of 5-6 good comics.

There were a few bright spots, but mostly it was just crude, lewd and racist humour.

Ron White was pretty good, but his intro comic was actually funnier.

Oh well, lesson learned.


Just saw the Gala hosted by Russell Peters - now that is how it should be done!  What a great show! George Wallace, Godfrey...actually all the comics were funny! Russell himself did a set too and was good.

Super glad they weren't all weak.

Tuesday, June 17, 2014

vSphere 5.5 bug with Traffic Shaping causes PSOD

I was recently doing a new vSphere 5.5 installation on IBM PureFlex x240s.

It consisted of 9 blades at one site and 9 at a 2nd location for DR.

These blades have 10 Gb NICs.

As per my normal practice, I configured Traffic Shaping on the vMotion ports on the vDS.

Next morning I had a PSOD!

Ouch. I updated to the latest firmware and vSphere patch release, but the problem did not go away.

After a call to VMware support, I was directed to this KB article:

Apparently, although 5.1 has been patched, 5.5 hasn't yet (as of this writing - June 17, 2014).

The workaround is to turn off traffic shaping.

Something to watch out for.

Friday, June 13, 2014


I really like the new VMware VSAN technology. However, I can only think of one real use case.


VMware VSAN is a way to turn the commodity hard disks that sit inside your ESXi servers into a distributed storage array. It requires at least 1 SSD drive in each server, which it uses to increase the performance of the array.


It also requires a minimum of 3 hosts. This is where I start having problems with the product.


I have installed vSphere for many SMBs and also for large enterprise customers.


SMBs will typically purchase vSphere Essentials or Essentials Plus – it is a perfect fit for them, providing enough capacity while being inexpensive enough to offer good value.


However, in order to make VSAN work, you really need 4 or maybe 5 hosts.  I want N+2 redundancy for my clients: I want to be able to take a host down for maintenance and still be able to function. But then what if a production host fails? So I require that the system can handle a failure while a server is in maintenance.

For an SMB client with 3 servers, this isn’t going to work… VSAN will fail if it goes down to 1 server. But in a shared storage environment (and I sell a lot of the little EMC 3150 storage arrays), this would not be a problem.

And an SMB customer doesn’t want to purchase extra servers just so VSAN can function – better to spend the money on a shared storage array that is dedicated to the purpose and has redundant service processors to handle failures, online updates, etc.


For large enterprise customers, you just won’t be able to fit their storage requirements into servers. Many of these customers are purchasing blades (Cisco UCS is a product that I install a lot and love!) and you simply cannot fit a lot of storage inside these units.  In fact, I am now recommending booting from either a) an Auto Deploy server, or b) USB sticks (or a combination of both, using the Stateless Caching feature of Auto Deploy).


So…where do I see this technology being useful?  In a VDI (Horizon View) deployment.  I would still see the customer using a SAN, but it makes sense to store the Replicas (copies of the Golden Master images) on the VSAN for close and fast access.


This is just my opinion – let me know what you think.



Thursday, October 24, 2013

VMware Web Client vs vSphere Client

As a VMware consultant, I am constantly using the VMware products. As most of you know, the vSphere Client (or “thick” client) is going away and we will be left with only the Web Client.

I have several problems with this.

I have heard other people say this, and many times it is just, “Well, I don’t like the web client” or “I’m not comfortable with it because it is not familiar”. I thought I would give some rather more concrete reasons.

The biggest problem is that it is going to make my job harder. What do I mean?

- First, screen real estate.  On the exact same laptop, the vSphere Client lets me see way more stuff.  Look at these examples, taken on the same laptop (my home lab):

[Screenshots: vSphere Client vs. Web Client views of the same host, taken on the same laptop]

On the vSphere Client screen, I can see stats, datastores and networks all on one pane. With the Web Client, I have to click to another screen to see either the datastores or the networks. Plus, the Web Client just feels more crowded – I believe the fonts are bigger and so everything takes up more room. And I couldn’t find any way to move the Tasks pane to the bottom like the vSphere Client – though you can hide it completely to give your middle pane more room.


- Tasks take more clicks and are sometimes difficult to find. Let's take an example that I do all the time. I often have to set up a small installation consisting of 2-3 hosts and 1 EMC NAS (often a VNXe). The VNXe is typically hooked up via iSCSI, and I often have only 4 NICs in the machine. This means I now have little choice but to use 2 of the iSCSI NICs for vMotion as well. I find that the best way to do this is to make a single vSwitch with 2 NICs for the iSCSI traffic, but then change the failover order of the individual NICs so that they have only one NIC per VMkernel port (which is a requirement in order to do iSCSI multipathing on VMware).

So assuming that the vSwitch was already created with 2 NICs……to configure this in the vSphere Client, click the host in question, then the Configuration tab, then Networking, then click the VMKernel port, then Edit, then NIC Teaming and finally check the Override switch failover order and move the other adapters down to Unused adapters…like this:


How about the Web Client?  Well…it took me a long time to figure out how to do it, but I did finally find the way.  Click the host in question, then click the Manage tab, then the Networking tab, then you would think that it is under the VMKernel Adapters, but it's not, you have to click the Virtual Switches, and then scroll down till you find the VMKernel ports, then click the VMKernel port till the box is highlighted and then click the Edit link (the pencil), and then go to Teaming and Failover, check the Override box and adjust the adapters till only one is Active and the rest unused….like this:


So…for the vSphere Client, it was 7 clicks to get to the screen where I could make the NIC changes, and for the Web Client it was 8 clicks plus some scrolling.  It may not seem like much, but if things are not logically laid out, then it makes it more difficult. Even while writing this article, I found myself looking around for the right spots again in the Web Client. To make matters worse, I am supposedly a VMware expert….I can’t imagine how daunting this must feel to someone who is new to VMware.
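Incidentally, the same end state can be scripted from the ESXi command line – a rough sketch, assuming vmk1/vmk2 are the iSCSI VMkernel ports, iSCSI-1/iSCSI-2 are their portgroups, vmnic2/vmnic3 are the two NICs, and vmhba33 is the software iSCSI adapter (all made-up names):

# pin each iSCSI VMkernel portgroup to a single active uplink (the unlisted NIC becomes unused)
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-1 --active-uplinks=vmnic2
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-2 --active-uplinks=vmnic3
# bind both VMkernel ports to the software iSCSI adapter for multipathing
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2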


- The Web Client is slower – considerably slower. At everything. Slower to refresh, slower to draw, slower to respond to inputs.

- The Web Client crashes. The browser window just becomes unresponsive and the only thing you can do is close the browser, start it back up and log in again. I am not saying this is necessarily VMware’s fault – browser issues, Adobe issues, add-ons that are enabled – but it isn’t the user's (my) fault either.  I almost never have the vSphere Client crash, and ultimately I have to compare the usefulness of the products. The fact that VMware wants to use a web-based client is their technology decision; it is THEIR fault that they have chosen to allow many more variables into the mix.

- I can’t find a way to pass the username/password to the Web Client as I could with the vSphere Client. I log into lots of different systems and I use a program called mRemoteNG to handle all my connections. With the vSphere Client, I can safely store the username/password in mRemote and then just double-click the link and voila!, I am logged in and connected to either an ESXi or vCenter server.  No such luck with the Web Client. This means I have to go look up the username/password for each client and take extra time to log in.

On that note, is there going to be a Web Client for ESXi?  Once the vSphere Client is gone, how do I connect directly to an ESXi server to manage it if the vCenter server isn’t running?


All of these things add up to wasting my time.  In my industry, my time IS my money. The more efficient I am with my time, the more money I make per hour.

If VMware had just made the Web Client with the same interface as the vSphere Client, I think I would have a lot less to complain about. It would still be familiar, I would know where to find things. But they decided to change things. I don’t mind change, but it is supposed to make my life easier, not harder.

And this change definitely makes my life harder – and that’s not a good thing.


Monday, April 9, 2012

High Availability Windows Share Using Linux Samba/DRBD/CTDB and GFS


This combination should allow me to have a single DNS name that maps to a single Virtual IP that can move back and forth between two Virtual Machines (running on my vSphere environment). This way I can have them on different storage (i.e., one on local storage and the other on iSCSI storage) and be able to turn one storage unit off for maintenance and still allow access to the Windows share. This setup automatically handles the replication between the nodes.

I was trying to follow these instructions: – but there are a number of errors in the instructions, plus some things weren’t clear to me.

We will be using Samba running on top of the GFS clustered filesystem, using CTDB (the clustered version of the TDB database used by Samba) and DRBD to handle all the replication duties.

We will have Static IPs for each machine, in my case

smb1 - and

smb2 - and

The 2nd IP on each VM is for the DRBD interface for replication

Lastly, there will be two Virtual IPs that are shared between them – and

- First, install the 32-bit version of CentOS 5.5 with no options checked. Single hard drive only – 20 GB, thin provisioned. Set a static IP. Give it 2 NICs (one for network connectivity and the other for DRBD replication).

- Run setup – turn off SELinux
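(If you would rather not use the setup tool, this is one way to do it – a small sketch, not from the original steps:)

# permanently disable SELinux at the next boot, and switch to permissive right now
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0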

- Next, perform a yum update, then reboot. This updates to CentOS 5.8 (as of this writing).

- Install VMware Tools.

- Clone the machine

- Add 2nd hard drive, add 2nd NIC to both VMs

- Boot the VM, fdisk /dev/sdb, n (new), p (primary partition), 1, accept defaults for cylinder start/end, w (write)

- no need to create filesystem or format the partition as we will get DRBD to use the new partition directly (/dev/sdb1)
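(A quick sanity check before handing the partition to DRBD – not in the original steps:)

# confirm /dev/sdb1 now exists on both VMs
fdisk -l /dev/sdb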

- Make sure that the /etc/hosts file only has localhost.localdomain localhost and not the hostname of the machine

- Adjust /etc/hosts on 2nd VM

- Adjust IP of 2nd NIC on both VMs (setup and then service network restart – you might want to erase ifcfg-eth0.bak from /etc/sysconfig/network-scripts on the 2nd VM)
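For reference, a CentOS 5 ifcfg file for the replication NIC looks roughly like this (the addresses here are made-up examples – use your own DRBD subnet) – /etc/sysconfig/network-scripts/ifcfg-eth1:

DEVICE=eth1
BOOTPROTO=static
IPADDR=
NETMASK=
ONBOOT=yes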

- Confirm that you can ping both machines from each other using the private IP as well as the Public IP

- allow root login (edit /etc/ssh/sshd_config and change PermitRootLogin to yes, we are behind a firewall right?!), then service sshd restart

- yum -y install drbd82 kmod-drbd82 samba joe autoconf automake gcc-c++

(I like the joe editor)

- yum -y groupinstall "Cluster Storage" "Clustering"

- Adjust /etc/drbd.conf on both nodes:

global {
    usage-count yes;
}

common {
  syncer {
    rate 100M;
    al-extents 257;
  }
}

resource r0 {

  protocol C;

  startup {
    become-primary-on both;              ### For Primary/Primary ###
    degr-wfc-timeout 60;
    wfc-timeout  30;
  }

  disk {
    on-io-error   detach;
  }

  net {
    allow-two-primaries;                 ### For Primary/Primary ###
    cram-hmac-alg sha1;
    shared-secret "mysecret";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }

  on smb1.yniw.local {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address;
    meta-disk  internal;
  }

  on smb2.yniw.local {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address;
    meta-disk  internal;
  }
}

On both nodes do:

drbdadm create-md r0

To put the two nodes as primary, on both nodes do:

drbdsetup /dev/drbd0 primary -o

On both nodes (at almost the same time), do:

service drbd start

Make drbd service start automatically at boot:

chkconfig --level 35 drbd on

Check on status of the drbd replication:

cat /proc/drbd or you can do:

service drbd status

Next we configure the GFS filesystem. Put the following into the /etc/cluster/cluster.conf on each system.

<?xml version="1.0"?>
<cluster name="cluster1" config_version="3">

<cman two_node="1" expected_votes="1"/>

<clusternodes>
<clusternode name="smb1.yniw.local" votes="1" nodeid="1">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr=""/>
                </method>
        </fence>
</clusternode>

<clusternode name="smb2.yniw.local" votes="1" nodeid="2">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr=""/>
                </method>
        </fence>
</clusternode>
</clusternodes>

<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>

<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>

</cluster>

Next, we start cman on both systems:

service cman start

And then we start the other services:

service clvmd start
service gfs start
service gfs2 start

chkconfig --level 35 cman on
chkconfig --level 35 clvmd on
chkconfig --level 35 gfs on
chkconfig --level 35 gfs2 on

Then we format the cluster filesystem (only on one node):

gfs_mkfs -p lock_dlm -t cluster1:gfs -j 2 /dev/drbd0

Then we create the mount point and mount the drbd device (on both nodes):

mkdir /clusterdata
mount -t gfs /dev/drbd0 /clusterdata

Then we insert the following line into the /etc/fstab (mine is different than the instructions):

/dev/drbd0              /clusterdata            gfs

I found that the gfs argument was necessary – however, do not add the other items (default 1 1), as this will cause the auto-mounting system to try to fsck the filesystem at startup, which won’t work.

Now…you should be able to copy data onto that /clusterdata mount point on one node and have it show up on the other automatically.
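A quick way to confirm the replication is working (a simple test, not from the original instructions):

# on smb1
touch /clusterdata/hello.txt
# on smb2 – the file should appear almost immediately
ls -l /clusterdata/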

Next we configure samba. Again, my working file is different than the original instructions.  Edit /etc/samba/smb.conf

[global]
clustering = yes
idmap backend = tdb2
private dir = /clusterdata/ctdb
fileid:mapping = fsname
use mmap = no
nt acl support = yes
ea support = yes
security = user
map to guest = Bad Password
max protocol = SMB2

[public]
comment = public share
path = /clusterdata/public
public = yes
writeable = yes
only guest = yes
guest ok = yes

Next, create the directories needed by samba:

mkdir /clusterdata/ctdb
mkdir /clusterdata/public
chmod 777 /clusterdata/public
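Before layering CTDB on top, it doesn't hurt to check the smb.conf syntax (a quick optional check, not in the original steps):

testparm -s /etc/samba/smb.conf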

Follow the same instructions to install CTDB:

First, we need to download it:

cd /usr/src
rsync -avz .
cd ctdb/

Then we can compile it:

cd /usr/src/ctdb/
./autogen.sh && ./configure    # assumed standard CTDB source-build steps (autoconf/automake/gcc-c++ were installed earlier)
make
make install

Creating the init scripts and config links to /etc:

cd /usr/src/ctdb
cp config/ctdb.sysconfig /etc/sysconfig/ctdb
cp config/ctdb.init /etc/rc.d/init.d/ctdb
chmod +x /etc/init.d/ctdb
ln -s /usr/local/etc/ctdb/ /etc/ctdb
ln -s /usr/local/bin/ctdb /usr/bin/ctdb
ln -s /usr/local/sbin/ctdbd /usr/sbin/ctdbd

Next, we need to config /etc/sysconfig/ctdb on both nodes:

joe /etc/sysconfig/ctdb

Again…there are mistakes in the example originally given and I have provided my copy:

CTDB_RECOVERY_LOCK="/clusterdata/ctdb.lock"
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
ulimit -n 10000
CTDB_NODES=/etc/ctdb/nodes
CTDB_LOGFILE=/var/log/log.ctdb

Now, config /etc/ctdb/public_addresses on both nodes:

vi /etc/ctdb/public_addresses

Then, config /etc/ctdb/nodes on both nodes:

vi /etc/ctdb/nodes
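For reference, the formats of those two files are simple – one entry per line (the addresses below are made-up examples; use your own virtual IPs and node IPs, in the same order on both nodes):

# /etc/ctdb/public_addresses – one "IP/mask interface" per virtual IP eth0 eth0

# /etc/ctdb/nodes – the static internal IP of each cluster node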

Then, config /etc/ctdb/events.d/11.route on both nodes:

vi /etc/ctdb/events.d/11.route

#!/bin/sh

. /etc/ctdb/functions
loadconfig ctdb

cmd="$1"
shift

case $cmd in
    takeip)
         # we ignore errors from this, as the route might be up already when we're grabbing
         # a 2nd IP on this interface
         /sbin/ip route add $CTDB_PUBLIC_NETWORK via $CTDB_PUBLIC_GATEWAY dev $1 2> /dev/null
         ;;
esac

exit 0

Set the execute (+x) permission on the script:

chmod +x /etc/ctdb/events.d/11.route

Next…start the ctdb service:

service ctdb start
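Once ctdb is running on both nodes, a quick health check (assuming the ctdb symlink created above is on your PATH):

ctdb status     # both nodes should eventually show OK
ctdb ip         # shows which node currently holds each public/virtual IP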

Here is one place I differ from those other instructions – he says to have the samba service auto-start, but the ctdb service handles the starting and stopping of samba, so you don’t do that.

Plus, I couldn’t make everything start properly when done from init.d (using the chkconfig --level commands). The problem is that the GFS filesystem tries to be mounted from the fstab before the other services are ready, so it doesn’t work. So I wrote the following lines into /etc/rc.local:

service drbd start
mount -a
service ctdb start

Also….to stop one of the servers and take it offline:

service ctdb stop
umount /clusterdata
service drbd stop

You should now have a working active/active Windows style share available that is fully redundant.

You can get to it by using a Windows PC and going to \\virtualip\public

So…in my example:  \\\public
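You can also test the share from one of the Linux nodes (assuming the samba-client package providing smbclient is installed; the address below is a made-up stand-in for your virtual IP):

smbclient -N -L //             # list the shares anonymously
smbclient -N // -c 'dir'  # non-interactive listing of the public share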