
Sprint to VCIX (Part 4) — Determine Requirements, Constraints, Assumptions, and Risks

We are going to wrap up the conceptual design by categorizing the assessment findings into requirements, constraints, assumptions, and risks (RCARs).

Section 1 – Create a vSphere 6.5 Conceptual Design

Objective 1.3 for the VCAP-DCV Design exam is to Determine Requirements, Constraints, Assumptions, and Risks. vBrownBag hosted a 52-minute discussion on Objective 1.3 for the Design exam:

I think Rebecca Fitzhugh did a great job in the presentation. Personally, I find this the most difficult part of the material to really lock down. Throughout the vBrownBag series, many of the presenters run short quizzes on determining whether a statement is a requirement, constraint, assumption, or risk. Even as I write this article, I still spend quite a bit of time thinking about it. I have a management and design background, and I still have to pay attention to the specific definitions of each concept.

“Determining the requirements, making and proving assumptions, determining constraints, and identifying risks forms the conceptual design and provides the foundation to build on for the logical design. Business and technical design factors that are identified as part of the conceptual design will be mapped to the resources that are necessary to satisfy them during the logical design process. The conceptual design phase is highlighted within the overall design and implementation flow diagram:”

VMware vSphere 6.7 Data Center Design Cookbook

The conceptual design is the combination of documented requirements, constraints, assumptions, and risks. It organizes design factors and qualities into a comprehensive reference for the rest of the design process. With a high-level diagram, the conceptual design is the foundation for design decisions.

Differentiate between the concepts of Requirements, Constraints, Assumptions, and Risks

Requirements

There are two types of requirements: functional requirements and nonfunctional requirements.

Hersey Cartwright, one of the authors of VMware vSphere 6.7 Data Center Design Cookbook, outlined a simple explanation of functional vs. nonfunctional requirements in a Reddit reply:

“From a very high level the way I look at it is a functional requirement is the WHAT and a non-functional requirement (or constraint) is the HOW the WHAT has to be accomplished. Functional requirements are the things the design must do. A non-functional requirement would be a constraint on HOW the design meets the functional requirements. A customer specifically noting something does not change the type of requirement (functional or non-functional). A design must satisfy both the functional requirements and non-functional requirements.

Providing N+2 redundancy to support a failure during a maintenance operations (one thing in maintenance and one thing fails) is functionality the design must provide. Splitting a hair or two here, the better way the requirement might be listed is: Provide the redundancy to support a failure during a maintenance operation(WHAT). Then you might list N+2(HOW) as a constraint due to the level of redundancy the design must provide (WHAT). In a customer design I (and this is a personal thing) probably would not separate it like that – though it would probably be more correct when aligning directly to the VMware Design methodology.”

Hersey Cartwright, Reddit
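
To make the N+2 idea from the quote concrete, here is a minimal sketch of how the WHAT (survive a failure during maintenance) turns into a host count. The function name and the capacity numbers are hypothetical, and real sizing would also account for things like memory overhead and admission control policy.

```python
import math


def hosts_required(workload_demand: float, host_capacity: float, spare_hosts: int = 2) -> int:
    """Hosts needed to carry the workload plus spare hosts (N+2 by default).

    workload_demand and host_capacity must use the same unit,
    for example GB of RAM. Both values here are illustrative.
    """
    # N: the minimum number of hosts that can carry the workload on their own
    n = math.ceil(workload_demand / host_capacity)
    # +2: one host in maintenance and one host failed at the same time
    return n + spare_hosts


if __name__ == "__main__":
    # Hypothetical cluster: 900 GB of VM memory on hosts with 256 GB each
    print(hosts_required(900, 256))
```

With these made-up numbers, four hosts carry the load and two more cover the maintenance-plus-failure case.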

Functional

WHAT must be done? A functional requirement specifies a function that a system or component must accomplish. Rebecca gives an interesting perspective in the video above: “The business can’t function…without meeting this requirement.”

  • Business goals, rules, or policies
  • Legal, regulatory, certification, and compliance requirements
  • Reporting requirements
  • Data Retention
  • Audit trails

For example:

  • The design will store and transact customer PHI so it must meet HIPAA compliance
  • The design should allow for auditing of all user actions within the compute systems
  • The design must allow for backups

Non-Functional

HOW must the WHAT be done? A nonfunctional requirement defines how the design accomplishes the functional requirements. These are descriptions, usually categorized as a design quality.

  • Availability
  • Manageability
  • Performance
  • Recoverability
  • Security

For example:

  • The patient PAC and EMR databases need to store up to 1000 patients’ records
  • The auditing system must be able to retain logs for at least 90 days
  • The backup storage system must be able to de-duplicate the data by at least 3x

Constraints

“Constraints limit design choices. If multiple options are not available to make a design decision, then it’s a constraint.”

VCAP6-DCV Design Objective 1.3 with Rebecca Fitzhugh

Pre-existing vendor relationship/contract seems like a common stated constraint in the vBrownBag series and other resources. The vendor lock-in eliminates many design possibilities and features that could be made with freedom to go with another vendor.

Personal Thought: To me, most constraints are rooted in cost. If there was an unlimited budget, then you could forgo most constraints. This is how I differentiate between a nonfunctional requirement (often a constraint-ish statement desiring a specific quality — AMPRS) and a true constraint — the cost origin. The nonfunctional requirement is usually a constraint on the design with a feature or quality requirement that has room for decisions, whereas a constraint is a fixed limitation due to some unacceptable cost to the business.

In my mind, that alludes to the largest constraint, project budget. Every project will have a budget and the design will be constrained by how much money can be spent on the solution.

Other constraints I’ve seen have to do with business policies that dictate very specific technical methods: must use VPN to access internal resources, encryption standard/key lengths.

This is where I think people talk themselves into fitting things into one category or another while applying their own assumptions about a statement without being able to ask questions. As a test-taking strategy, I think it is important to make sure you do not read into the question too much and get yourself confused.

  • I think, on the test, the statement “External access must be through the standard corporate VPN,” by itself, is a constraint. Take it at face value and do not add your assumptions or experience.
  • In an actual customer engagement, I would ask if this is a security policy, compliance issue, or another factor that may give insight as to why we are being constrained in this way. The business is making this decision for a reason and the reason may be an arbitrary interpretation of a policy, standard, etc. turning this into a requirement that we can design around. If the reason is they don’t have the resources to have a new solution or it is out of scope (for whatever reason), then it is a constraint.

Assumptions

Assumptions are statements about the environment or design that are necessary to continue through the design process but have not been validated to be true. Many things at the beginning of the design will be assumed to be a certain way but need to be validated as true or false before implementation.

We need to document each assumption and who we talked to about it. Continuous assessment during the design process, ideally, gets us an assumption-free implementation.

For example:

  • The organization has sufficient bandwidth and low enough latency between sites to support replication.
  • The server closet HVAC can cool the new infrastructure.

Risks

Anything that may prevent you from meeting business or application requirements is a risk. All risks should be mitigated and documented in your design.

As we work through the design process and identify requirements, constraints, and assumptions, inherent risks will be identified. Limited choices and unknown variables introduce risk. These could be not having the resources to implement a best practice or not having the information required to design properly.

The vBrownBag had a great discussion on this risk: the organization’s main datacenter contains only a single core router, which is a single point of failure. This was identified in discovery; it poses a risk to the virtual infrastructure and should be mitigated. The best practice would be to add core networking components, but that may be out of scope/budget. Identify, document, and reengage with stakeholders to try to mitigate the risk. Maybe they have a support contract with their vendor to do a 1-for-1 swap in less than 48 hours. Is that an acceptable amount of downtime for their datacenter? Perhaps the DC is in a COLO and they have a support SLA with them to mitigate lost revenue during downtime.

Given a statement, determine whether it is a risk, requirement, constraint, or an assumption

I think this is a pure test-taking objective. If we are solid in our understanding of RCARs, then we should be able to classify a statement we read, or drag and drop statements into the right category, on the test.

Analyze the impact of VMware best practices to identified risks, constraints, and assumptions

We need to be familiar with the VMware best practices so we can identify which constraints, assumptions, and risks will have an impact on our design and know what we can do to mitigate those risks. This is from the VMware Networking Best Practices concerning recommended service bandwidth and limiting broadcast domain traffic:

Constraint: The project scope and budget do not allow for the purchasing of new networking devices. The in-place 1 Gbps network must be incorporated into the design.

Risk: VMware services and VM traffic will be limited to the 1 Gbps physical network.

Impact: Possible network resource contention.

Risk mitigation: LACP/LAGs, greater port density on hosts, teaming/failover, Network I/O Control, etc.

Conceptual Design Diagram

Some of the vBrownBags use a rainbow blob as their conceptual diagram but the one above is pulled from the VMware SDDC Architecture Overview Documentation. I have given similar diagrams and briefs to Directors and non-technical leaders. I would add blocks representing each requirement and be prepared to discuss constraints, assumptions, and risks with business leaders.

I found another example from Craig Kilborn at VMFOCUS

This conceptual design will have the requirements represented as well as documented with the constraints, assumptions, and risks. This can sometimes be in a spreadsheet.
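
As a rough illustration of the spreadsheet idea, here is a minimal sketch of an RCAR register that exports to CSV. The class names, the ID scheme (R01, C01, and so on), and the column layout are my own assumptions, not something from the exam guide; the statements are taken from the examples earlier in this article.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    REQUIREMENT = "Requirement"
    CONSTRAINT = "Constraint"
    ASSUMPTION = "Assumption"
    RISK = "Risk"


@dataclass
class DesignFactor:
    factor_id: str      # hypothetical ID scheme, e.g. R01 or C01
    category: Category
    statement: str
    source: str = ""    # who stated it, or who needs to validate it


def to_csv(factors) -> str:
    """Render the register as CSV text that opens in any spreadsheet tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID", "Category", "Statement", "Source"])
    for factor in factors:
        writer.writerow([factor.factor_id, factor.category.value, factor.statement, factor.source])
    return buffer.getvalue()


REGISTER = [
    DesignFactor("R01", Category.REQUIREMENT, "The design must allow for backups"),
    DesignFactor("C01", Category.CONSTRAINT, "External access must be through the standard corporate VPN"),
    DesignFactor("A01", Category.ASSUMPTION, "The server closet HVAC can cool the new infrastructure"),
    DesignFactor("K01", Category.RISK, "The main datacenter contains only a single core router"),
]

if __name__ == "__main__":
    print(to_csv(REGISTER))
```

Keeping the source column filled in is what lets us go back and validate assumptions with the right person later.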

Section 1 Conclusion

This finishes up Section 1 from the VMware VCAP-DCV Design Exam guide. Next, we will start with Section 2 and start mapping conceptual design components to logical design components.

Sprint to VCIX (Part 3) — Gather and Analyze Application Requirements

We are still working with the conceptual design but now looking at application requirements. What does the business want to accomplish with their applications, what requirements do they have for each application, and what dependencies exist that need to be accounted for in our design?

Section 1 – Create a vSphere 6.5 Conceptual Design

Objective 1.2 for the VCAP-DCV Design exam is to gather and analyze application requirements. vBrownBag hosted a 45-minute discussion on Objective 1.2 for the Design exam:

In the first five minutes, Mark Gabryjelski states the most important thing to keep in mind is that the design is not always about what is cool, but meeting business requirements.

Scope

During interviews and workshops, we need to define which applications are in scope for the vSphere design. The business may not want to virtualize certain parts of the business, it may have a migration plan in place where some apps are going to the cloud, or some applications may be in the process of being replaced by a SaaS offering.

Application Functional Requirements

Identify the functional requirements the applications support: WHAT do the applications do for the business? Do they run backup software that supports their environment? What processes and tools do they use for collaboration? Are they a hybrid cloud shop and, if so, what business connections and integration requirements exist?

HOW should business applications run?

“Of course, they should run [great, fast, responsive, securely, etc.].” Well, those are subjective assessments from a business leader. What they are describing is a user experience or function of the application. We need to drill down and identify specific metrics that will help us identify nonfunctional requirements for their application.

These specific metrics are going to be identified through a few techniques, most of which are similar to what we discussed in Objective 1.1. We will gain insights into the application requirements when we engage in interviews/workshops with the subject matter experts (SMEs) that own, operate, and administer the business’s specific applications. Existing documentation and the amount of detail done in the current state analysis will capture performance and data characteristics that will drive our design decisions.

Metrics gathered will inform our decisions when it comes to designing for Availability, Manageability, Performance, Recoverability, and Security (AMPRS). Keep in mind that I will be separating concepts and features into the design qualities, but they are all interdependent. We need to be sure to keep a holistic view when designing for applications.

Availability

In most of the resources I have read through and watched, availability requirements usually boil down to some type of uptime SLA. These could be for business-critical applications like a revenue-generating service, internal applications, or application dependencies. The business will have an idea of what availability means to them and each application will be different based on its function. This is also going to be tied to cost. The business justification for running a fault-tolerant application with 99.999% (five-nines) availability must be rooted in a quantifiable loss to the business. This is usually stated as an SLA that they are obligated to meet for their external or internal customers. This consideration will be unique to each application as well. The availability requirement for a business’s eCommerce web server is different from that of its internal knowledge base server.

Availability   Downtime/Year   Downtime/Month   Downtime/Week
99%            3.65 days       7.2 hours        1.68 hours
99.9%          8.76 hours      43.2 minutes     10.1 minutes
99.99%         52.6 minutes    4.32 minutes     1.01 minutes
99.999%        5.26 minutes    25.9 seconds     6.05 seconds
99.9999%       31.5 seconds    2.59 seconds     0.605 seconds
Uptime Service Level Agreements (SLAs)
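
If you want to work out a downtime budget for an SLA that is not in the table, the arithmetic is simple. This is a minimal sketch; the function name is my own, and the 365-day year and 30-day month are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in days. The 365-day year and 30-day month are assumptions.
PERIOD_DAYS = {"year": 365, "month": 30, "week": 7}


def allowed_downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the downtime budget, in seconds, for one period at the given SLA."""
    unavailable_fraction = 1 - availability_percent / 100
    return PERIOD_DAYS[period] * SECONDS_PER_DAY * unavailable_fraction


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999, 99.9999):
        budgets = ", ".join(
            f"{allowed_downtime_seconds(sla, period):.1f} s/{period}" for period in PERIOD_DAYS
        )
        print(f"{sla}%: {budgets}")
```

Divide the result by 60, 3600, or 86400 to get minutes, hours, or days when talking to the business.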

If SMEs do not have SLAs or cannot agree, I like Mark Gabryjelski’s recommendation to make your own SLA. The business will most likely refine but at least we were able to get the conversation started. Having business SLAs and business requirements will help us justify availability design decisions and what type of components may be put into the design.

As we have discussions and workshops to identify the level of availability per application, we should start to think of the tools vSphere provides to support those requirements. vMotion and Storage vMotion allow us to reduce planned downtime and enable transparent host maintenance. vSphere HA leverages multiple ESXi hosts configured as a cluster to provide rapid recovery from outages. vSphere Fault Tolerance provides continuous availability. vCenter High Availability (vCenter HA) protects not only against host and hardware failures but also against vCenter Server application failures. Proactive HA works with DRS to provide early detection of degraded hosts and VM evacuation to a healthy host.

Availability considerations can also be impacted by hardware choices and single points of failure. We need to identify component, host, cluster, rack, and datacenter levels of availability to meet uptime SLAs. Current hardware configurations could be constraints or risks to the design.

vCenter

Manageability

Manageability requirements come in many forms and can impact our design. The business may need to manage all the virtual infrastructure in one place. vCenter is the core product for managing vSphere and controls datacenter, cluster, and host resources. Integrating with VMware Site Recovery Manager (SRM) centralizes part of their Business Continuity Plan (BCP).

Platform management is becoming more important with hybrid cloud deployments and VMware Validated Designs for SDDCs. If the business needs unified management to provision, configure, and administer its multi-cloud platforms, then the vRealize Suite may become part of the design.

Provisioning, configuration, and automation tools may be in place such as Ansible, Terraform, or SaltStack. How these integrate into vSphere will be part of the design.

Distributed and cloud-native applications bring unique management problems. If a business is currently using something like Kubernetes, the management of the virtual infrastructure, K8s management, and cluster provisioning can get blurred between business roles. Platform and resource consumption by application clusters should be considered. Not part of the 6.5 exam, but vSphere 7 brings Tanzu to solve some of these challenges for Kubernetes.

VMware SDDC

Performance

Like availability, most performance requirements will come from some type of metric. This time it will be linked to your datacenter resources in the form of a non-functional requirement for compute, storage, or networking performance. These are all performance requirements: customer-facing API latency needs to be less than 500 ms, SharePoint needs to support 1000 users, storage will exceed 100,000 IOPS for short periods of time, and top-of-rack switches routinely pass 80+ Gbps on uplinks.

These requirements will be rooted in a business justification either providing a level of service to customers or defined by the performance monitoring and current state analysis we did before. Remember to consider the growth of performance needs in your design. A business may be serving one million web requests today but what will they look like in 5 years?
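
One simple way to start the growth conversation is to compound today's number forward. This is only a sketch: the function name is mine and the 20% annual growth rate is a hypothetical placeholder that should come from the business or from trend data in the current state analysis.

```python
def project_demand(current: float, annual_growth_rate: float, years: int) -> float:
    """Compound the current demand forward at a fixed annual growth rate."""
    return current * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # One million web requests today, hypothetical 20% growth per year
    for year in range(0, 6):
        print(year, round(project_demand(1_000_000, 0.20, year)))
```

A fixed rate is a simplification, but it gives stakeholders a number to react to and refine.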

Some applications may have compute, storage, or network requirements that need special attention such as latency-sensitive applications. Take a look at Deploying Extremely Latency-Sensitive Applications in VMware vSphere.

Cluster design will directly impact the performance of the application. Is the application monolithic and in need of scaling up, or is it distributed across 1000 small VMs on many hosts? This will inform your cluster scaling strategy.

Business policies can also drive performance requirements. Perhaps management and development environments should not impact production performance. This may mean that we use Network/Storage IO Control and Resource Pools to guarantee resources to production workloads or those management/development workloads reside in different clusters altogether.

Different business units have different needs and depending on the organization you can use logical structures like vApps and Resource Pools to manage resources. We will be digging into the resource management guide in Sections 2 and 3.

Disaster Recovery Timeline

Recoverability

The disaster recovery timeline is a critical part of the business as is understanding their business continuity plan. Every business needs a plan to continue operations if their primary site has a disaster. Business goals and application requirements need to be quantified into the main segments in the DR timeline so we can design a proper vSphere environment. This Disaster Recovery 101 post has a great outline of the timeline above and its elements.

Failures happen and we need to plan for them. Based on 2019 revenue data from online store sales, Amazon.com would lose $4,480 of revenue every second their website was unavailable. That does not include 3rd party stores hosted on Amazon or subscription services. Availability and Recoverability are usually tied due to cost constraints. Every business has limited resources and 100% uptime is very expensive. Application availability and recovery should be driven by a business objective and sized appropriately.
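
To tie availability to cost, we can multiply a downtime budget by the revenue lost per second. This is a back-of-the-envelope sketch using the $4,480 per second figure above; the function name and the 365-day year are assumptions, and it ignores factors like time of day and lost customers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def revenue_at_risk(availability_percent: float, revenue_per_second: float = 4480, days: int = 365) -> float:
    """Revenue lost if the full downtime budget of an SLA is used up."""
    downtime_seconds = days * SECONDS_PER_DAY * (1 - availability_percent / 100)
    return downtime_seconds * revenue_per_second


if __name__ == "__main__":
    for sla in (99.9, 99.99, 99.999):
        print(f"{sla}%: ${revenue_at_risk(sla):,.0f} per year")
```

Comparing that number against the cost of the next "nine" is one way to justify, or push back on, an availability requirement.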

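The Recovery Point Objective (RPO), Recovery Time Objective (RTO), Work Recovery Time (WRT), and Maximum Tolerable Downtime (MTD) will dictate your backup schedule and location, which applications are covered, and the method of failover and failback. Identify any third-party backup tools already in place.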
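
Here is a minimal sketch of how those objectives can be sanity-checked against a proposed plan. It assumes the common convention that RTO plus WRT must fit within the MTD and that the backup or replication interval must not exceed the RPO; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryObjectives:
    rpo_minutes: float  # maximum tolerable data loss
    rto_minutes: float  # time to restore the systems
    wrt_minutes: float  # time to verify data and resume normal work
    mtd_minutes: float  # maximum tolerable downtime


def check_recovery_plan(objectives: RecoveryObjectives, backup_interval_minutes: float) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the objectives."""
    problems = []
    # Data written since the last backup is lost, so the interval bounds the data loss
    if backup_interval_minutes > objectives.rpo_minutes:
        problems.append("backup interval exceeds RPO")
    # Assumed convention: restoring plus verifying must finish within the MTD
    if objectives.rto_minutes + objectives.wrt_minutes > objectives.mtd_minutes:
        problems.append("RTO + WRT exceeds MTD")
    return problems


if __name__ == "__main__":
    objectives = RecoveryObjectives(rpo_minutes=60, rto_minutes=120, wrt_minutes=60, mtd_minutes=240)
    print(check_recovery_plan(objectives, backup_interval_minutes=90))
```

Running a check like this per application makes it obvious which ones need replication instead of nightly backups.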

VMware Site Recovery Manager is the primary VMware product providing automated disaster recovery failover, planned migration and disaster avoidance, and seamless workflow automation with centralized recovery plans. The technical overview of SRM is a great way to get familiar with the technology.

vSphere Replication is another tool included in vSphere Essentials Plus and higher that provides flexible recovery options, ensures consistent application and virtual machine data, and integrates with the VMware product stack.

Security

Security has a huge impact on the conceptual design of a vSphere environment. Application security requirements range from business policies on OS configurations and installed security software to the ports and protocols allowed through subnets. We need to identify application-specific requirements that the business needs to accomplish its goals.

Remembering Confidentiality, Integrity, and Availability can help us engage with SMEs about securing their virtual environments. VM encryption, policies on network segmentation, and workload separation at the host or cluster level may all be requirements.

Backups are part of the recovery plan but also a way to mitigate some security risk in the case of ransomware or other data loss.

The business may follow standardized security frameworks or be responsible for meeting security compliance standards like PCI DSS, HIPAA, or DISA STIGs. All these requirements for applications, data flow, and resource placement will have heavy impacts on the design.

Hybrid and multi-cloud deployments face even more difficult challenges as more data moves out of and between on-premises datacenters and cloud IaaS, PaaS, and SaaS providers.

Application Dependencies

What infrastructure services do business applications depend on: AD, DNS, DHCP, NTP? These should be part of the current state analysis. Listing application dependencies as well as vSphere dependencies will lay a foundation for a successful design. If you introduce new services or components, make sure they are identified.

Some vendors and software require access to the internet for updates and management while some security policies and compliance demand applications are air-gapped. We need to make sure we are identifying those applications and develop a plan to address those requirements.

Clustered VMs and distributed applications are becoming more common as the drive for higher availability continues. Clustered platforms will have internal dependencies that we need to understand.

Some applications may need specific hardware to function: hardware tokens, PCI passthrough devices, direct I/O connections, and GPUs are just a few.

vSphere Upgrades and Migrations

I’ve seen other blog posts reviewing the DCV 6 and 6.5 exams stress that you need to know the upgrade paths between vSphere versions and component interoperability. I would place this in the conceptual design: we need to know what versions of vSphere the business is on and whether they need to upgrade for any of the design quality reasons.

In our logical design, we will look at the specific vSphere version upgrades and component changes like vCenter and the PSC. In line with this, we should know what licensing the business has, which will inform us of any constraints on the design, and engage with the business about any license upgrades the design needs.

VMware Product Upgrade path and Interoperability of VMware Products

Design Impacts

Once all application requirements are identified we can assess the impact they will have on the design. Many requirements will have second and third-order effects that need to be addressed. Below are some major vendor application virtualization best practices to start identifying:

If you only have time for one then I recommend this whitepaper: Virtualizing Business-Critical Applications on vSphere

When engaging business leaders, having vSphere ROI and adoption trends discussions may be beneficial to helping them understand the business value of a VMware solution:

Use this whitepaper, Business and Financial Benefits of Virtualization, and the VMware TCO calculator to understand the financial benefits of virtualizing applications.

Prepare your Conceptual Design

In the next article we will take the business and application requirements from Objectives 1.1 and 1.2; identify requirements, constraints, assumptions, and risks; and develop our conceptual design.