Over the last few months I have been designing and building a solution for vRealize Automation 6 for a customer. (I know, I know, it’s not the latest and greatest version with all the whiz-bang features of vRA7, but the only solace I take is that the VCAP-CMA exams are currently based on vRA 6, so hopefully it is good practice.)
You will have noticed that my last few posts have been about the VMware vSphere Platform Services Controller (PSC) and how to install it (and automate it). This is because I am using the PSC as the SSO solution instead of the Identity Appliance (which is not highly available and therefore not suitable for production environments) or the vSphere 5.5 SSO (outdated and replaced by the vSphere PSC).
The SSO Design
The majority of the design was “simple”:
- Separate from the vCenter PSC, to remove a single point of failure for SSO and so that the vCenter SSO solution could avoid the restrictions placed on the PSC by vRA6.
- Create a two-node, appliance-based cluster in the Primary DC.
- Load balance using an F5 Load Balancer to make it “Highly Available”.
- A Single SSO domain (vsphere.local)*
- A Single SSO site (Default-First-Site)*
- Add the PSC to Active Directory to allow integrated authentication for the Default Tenant.
- Protect the PSC cluster via VMware SRM for failover to the Secondary DC (stretched VLAN, so the same IP).
- Backup using VMware Backup Appliance (which is then backed up via EMC Networker).
*Constraint of the vRA6 Architecture.
The only real design decision to make was whether or not to create the PSC as a VMware Certificate Authority (VMCA) Subordinate CA.
Now for those who don’t know (or have been marooned on a desert island for the last 12 months), with vSphere 6 VMware introduced the ability to manage Certificate Services for the vSphere infrastructure. For a quick overview of the options, head on over to @AtherBeg’s post vSphere 6: VMware Certificate Authority (VMCA): Design Decisions.
Where it went wrong
I originally decided it would be overkill to make the PSC a VMCA SubCA; after all, there were no vSphere components requiring certificates to be managed (these would be connected to the other vSphere PSC solution through vCenter Server). It would work as a SubCA, but as it wasn’t also being used in a vSphere environment, it wasn’t needed. Design decision made and documented. I then did something I don’t recall ever doing before in 15 years of IT: I started to question my own design decision. No one prompted me or said anything; I was simply given too much time to think about it. I broke one of my cardinal rules of architecture and design:
Never question a decision unless you find hard evidence that means you know you’ve made a mistake.
Don’t get me wrong, with the benefit of hindsight, there have been a few design decisions that I would change if I could. But not because they were wrong, just knowing what I know now, there may have been a better way. I think they call that development and learning.
Anyway, for some unknown reason I changed my mind and decided to make both PSCs VMCA Subordinate CAs, as I didn’t think it would be a major problem. It’s basically all about the SSL Certificates after all…
When deploying the PSC as a VMCA there are certain things that you need to think about:
1 – You have to wait 24 hours before you can assign the certificate to the VMCA
It seems that when using the VMware VMCA as a Subordinate CA, the solution enforces the minimum recommended Staging Period detailed in RFC 6489 for a new Certificate Authority. This is a good thing, unless the team responsible for creating certificates takes days to do so …
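To see how the 24-hour wait fits into a build plan, the earliest usable time can be worked out with simple date arithmetic. This is only a minimal sketch: the function names are hypothetical, and it assumes the 24 hours are counted from the start of the certificate's validity period (its notBefore timestamp), which is my assumption rather than something I have confirmed.

```python
from datetime import datetime, timedelta

# Minimum wait described above; assumed here to be a fixed 24 hours.
STAGING_PERIOD = timedelta(hours=24)


def earliest_assignment(not_before: datetime,
                        staging: timedelta = STAGING_PERIOD) -> datetime:
    """Return the earliest moment the certificate could be assigned to the VMCA.

    `not_before` is the certificate's validity start. Counting the wait from
    this timestamp is an assumption made for illustration.
    """
    return not_before + staging


def can_assign(not_before: datetime, now: datetime) -> bool:
    """Return True once the staging period has fully elapsed."""
    return now >= earliest_assignment(not_before)
```

So a certificate whose validity starts at 09:00 on a Monday would, under this assumption, be usable from 09:00 on the Tuesday.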
2 – You will need more than one SSL certificate!
Ok, so this isn’t a massive problem when you have your own Enterprise CA, but if you’re buying SSL Certificates from a third party you need 3 SSL Certs when using the VMCA SubCA model rather than just 1 with the External CA model. You then have to manage separate certificates and replace them based on whatever time frame you have decided upon. You will need an issuing SSL Certificate for each node of the PSC Cluster, plus an additional SSL SAN Certificate.
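The arithmetic behind "3 rather than 1" can be written down as a small helper. This is a sketch, not anything VMware provides: the function name and the model labels are invented for illustration, and extending the count beyond a two-node cluster is my own assumption.

```python
def certificates_required(psc_nodes: int, model: str) -> int:
    """Count the certificates to buy and manage for a PSC cluster.

    `model` is "vmca_subca" or "external_ca" (labels invented for this sketch).
    """
    if psc_nodes < 1:
        raise ValueError("a PSC cluster needs at least one node")
    if model == "vmca_subca":
        # One issuing (subordinate CA) certificate per node,
        # plus one additional SAN certificate.
        return psc_nodes + 1
    if model == "external_ca":
        # A single SAN certificate, as described for the External CA model.
        return 1
    raise ValueError(f"unknown model: {model!r}")
```

For the two-node cluster in this design, that gives 3 certificates for the VMCA SubCA model against 1 for the External CA model.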
3 – There is work to be done for VAMI!
When you replace the Machine SSL Certificate with a Subject Alternative Name (SAN) Certificate using the VMware Certificate Manager, you can then run the following command to automatically replace the VMware Appliance Management Interface (VAMI) certificate:
/usr/lib/applmgmt/support/scripts/postinstallscripts/lighttpd-vecs-integration.sh
After that, logging into https://psc.fqdn:5480 presents a trusted certificate. When you replace the VMCA Root Certificate, this doesn’t work, so you will need another SSL certificate (hence the need for the additional SAN certificate). You can obviously choose to leave the VAMI certificate untrusted; it just doesn’t look very professional.
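If you want to confirm whether the VAMI endpoint is actually presenting a trusted certificate without opening a browser, something like the following would do it. It is a minimal sketch using only the Python standard library; the function names and the optional CA bundle path are hypothetical, and https://psc.fqdn:5480 is the placeholder address used above.

```python
import socket
import ssl
from typing import Optional, Tuple
from urllib.parse import urlsplit


def vami_endpoint(url: str) -> Tuple[str, int]:
    """Split a VAMI URL such as https://psc.fqdn:5480 into (host, port)."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to 5480, the VAMI port used in the text, if none is given.
    return parts.hostname, parts.port or 5480


def presents_trusted_certificate(url: str, ca_file: Optional[str] = None,
                                 timeout: float = 5.0) -> bool:
    """Return True if the TLS handshake verifies against the trusted CAs.

    `ca_file` is an optional PEM bundle holding your enterprise root CA
    (hypothetical path). Without it, the system trust store is used.
    Connection errors (refused, timed out) are deliberately not caught.
    """
    host, port = vami_endpoint(url)
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        # Untrusted chain or host name mismatch.
        return False
```

Calling presents_trusted_certificate("https://psc.fqdn:5480") before and after the certificate replacement should show whether the VAMI certificate has really changed.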
4 – There is MORE work to be done if you want HA! (And who doesn’t want HA?)
VMware has a one-liner in their article Replace VMCA Root Certificate with Custom Signing Certificate and Replace All Certificates, stating:
If company policy requires that you replace all certificates, replace the vmdir root certificate. See Replace the VMware Directory Service Certificate.
This seems a pretty innocuous statement. However, what they really mean (amongst other things) is that if you want to configure your Platform Services Controller within an externally load-balanced cluster (to provide HA) and you want that cluster to have a completely trusted certificate chain, then you need to replace the VMware Directory Service (vmdir) certificate with a trusted SAN SSL Certificate. This is because the script used to generate the keys for the Load Balancer (see VMware KB2113315), which is run from the SSH shell as
python gen-lb-cert.py --primary-node --lb-fqdn=load_balanced_fqdn
will use the vmdir certificate, which, when generated as part of the PSC installation, is not trusted.
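Before handing a SAN certificate to the load-balanced cluster, it is worth checking that it lists every name clients will use. The sketch below assumes the certificate should cover the load-balanced FQDN plus each PSC node's FQDN; that requirement is my assumption, and the function names are hypothetical. The certificate is expected in the dict layout returned by Python's ssl.SSLSocket.getpeercert(), and matching is exact (no wildcard handling).

```python
from typing import Dict, Iterable, Set


def dns_names(cert: Dict) -> Set[str]:
    """Extract lower-cased DNS names from a certificate's SAN extension.

    `cert` uses the dict layout returned by ssl.SSLSocket.getpeercert().
    """
    return {value.lower()
            for kind, value in cert.get("subjectAltName", ())
            if kind == "DNS"}


def missing_names(cert: Dict, required: Iterable[str]) -> Set[str]:
    """Return the required host names that the certificate does not list."""
    return {name.lower() for name in required} - dns_names(cert)
```

An empty result means every required name is present; anything returned is a name that would trigger a certificate warning when accessed through the load balancer or directly on a node.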
So what happened next?
Well, I think you can guess what happened next… I began to question my design decision on moving to VMCA rather than External CA. I spoke to the Lead Architect on the project and explained that I felt the VMCA just added complexity and provided no tangible benefit, and recommended we revert to the original design. Luckily, the SSL SAN Certificate already created could be re-used, so there was no delay from that side. It was just a quick (and automated) rebuild of the Platform Services Controller Cluster and a reconfiguration to use an External CA rather than VMCA.
Was it more work? Yes. Was it necessary? Probably not. Was it a better solution? Yes. Is it easier to support from the customer perspective? Yes. Will I make the same mistake again? I hope not!
Deploying the PSC and configuring it as a Subordinate CA in a vSphere environment is definitely the way to go. It is more complicated than an External CA, but you only have to manage 3 certificates rather than a certificate for each of your VMware ESXi hosts. When using the PSC as an identity source solely for vRA6, if you are given the option, I would stick with the External CA.