ENow Exchange & Office 365 Solutions Engine Blog (ESE)

Alternative Architecture for Exchange On-Premises (Medium to Large Environments)

Posted by Andrew Higginbotham on Aug 8, 2017 6:00:00 AM

In my first article in this series, I discussed Alternative Architecture options for Small Businesses who choose to stay on-premises. My intent was to ensure that if a business chose to remain on-premises but did not wish to implement Microsoft’s Preferred Architecture for Exchange, they would at least deploy in a way that reduces complexity and increases uptime of the solution. While the first article focused on options for small businesses, this article will begin to discuss common deployment options seen in medium to large environments. We’ll focus on popular storage technologies found in this space: RAID and advanced storage solutions (SAN/NAS/hyper-converged). As the architectures found in this space are so varied, we’ll focus more on sound design principles and best practices.

RAID

Types of RAID

The most common slight (but impactful) deviation from the Preferred Architecture I come across is customers who feel more comfortable deploying local RAID storage instead of JBOD (in practice, many single-disk RAID 0s acting in a JBOD fashion). They instead utilize RAID 1, RAID 10, or RAID 5 for their storage. Here we find our first mistake: using RAID 5. Certain types of RAID carry write penalties which affect the write performance of the storage. With RAID 1 and RAID 10 the write penalty is 2, meaning for every write IO operation at the application level, 2 operations must occur at the RAID storage level. When using RAID, Microsoft recommends either RAID 1 or RAID 10 for Exchange storage. Comparatively, RAID 5 has a write penalty of 4, so RAID 5 will require twice as many disk operations as RAID 1 or 10 for the same write workload. Of course, the real-world impact depends on the workload being measured and whether it is read or write intensive. As this blog post indicates, RAID 5 and RAID 10 will deliver roughly the same performance if the workload is 100% reads. However, if the workload is 100% writes, then RAID 5 will require almost twice the number of disks to achieve the same IOPS, as its write penalty of 4 is at its most impactful. Therefore, if using traditional RAID for Exchange (which is a write-intensive workload) you should follow Microsoft’s guidance and utilize either RAID 1 or RAID 10 for best performance.
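The write-penalty arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration of the general formula (backend IOPS = reads + writes × penalty); the workload figures are hypothetical and not Exchange sizing guidance.

```python
# Back-of-the-envelope RAID sizing: backend disk IOPS required to service a
# given frontend workload, using the standard write-penalty factors.
# Write penalty: RAID 1/10 = 2, RAID 5 = 4.

WRITE_PENALTY = {"RAID 1/10": 2, "RAID 5": 4}

def backend_iops(frontend_iops: float, write_pct: float, raid: str) -> float:
    """Backend IOPS = reads + (writes * write penalty)."""
    reads = frontend_iops * (1 - write_pct)
    writes = frontend_iops * write_pct
    return reads + writes * WRITE_PENALTY[raid]

# A hypothetical write-heavy workload: 1000 frontend IOPS, 60% writes.
for raid in WRITE_PENALTY:
    print(raid, backend_iops(1000, 0.60, raid))
# RAID 1/10: 400 reads + 600*2 writes = 1600 backend IOPS
# RAID 5:    400 reads + 600*4 writes = 2800 backend IOPS
```

At 100% writes the gap reaches the full factor of two the article describes (2000 vs 4000 backend IOPS), while at 100% reads the two RAID types are identical.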

Controller Caching

The caching settings of the controller are equally important. In this previous ENow post I detailed the importance of proper controller caching settings in a Preferred Architecture environment. Those recommendations still apply when using traditional RAID. Your RAID controller should have a dedicated cache (typically 512 MB-2 GB) and should be configured with the proper settings. In the “Exchange Storage Configuration Options” article from Microsoft, a 100% write cache setting is recommended for direct attached storage, while a 75% write/25% read configuration is preferred for SAN solutions. However, not all vendors use a percentage model for controller caching. For instance, Dell RAID controllers have either a write cache enabled (Write Back) setting or a write cache disabled (Write Through) setting. Contact your vendor for their recommended settings; Dell, for example, recommends enabling the read cache (Read Ahead) as well as the write cache for best Exchange server performance.

Stripe Size

A RAID stripe size of at least 256 KB is also recommended for optimal performance, simply because Exchange Background Database Maintenance churns through the database in 256 KB chunks. I’ve found in testing that some controllers see slightly better performance at 512 KB. Ultimately this setting will have less impact than the caching settings or RAID type.

Getting each of these settings correct before you deploy is crucial, as changing the RAID type or stripe size will likely involve data loss via a rebuild of your RAID arrays. Changing the caching settings of the controller, however, is not destructive and should not require a reboot.

Advanced Storage Solutions

SAN vs NAS

Let us move on to advanced storage solutions (SAN, NAS, hyper-converged, etc.), starting with a quick primer on the differences between Storage Area Networks (SAN) and Network Attached Storage (NAS). With a SAN, the storage is presented to the server as raw blocks, typically via iSCSI or Fibre Channel. This allows the server to format the storage with a file system such as NTFS, ReFS, or ext4. Think of it as physically connecting a new disk to your machine, detecting it in Disk Management, and choosing to format it; however, instead of a SAS connection, you’re using iSCSI or FC over a dedicated storage network. In contrast, with NAS a remote server formats its own storage and then shares it over the network to your server as file-level storage via SMB or NFS. In this context, “file-level storage” is simply storage which already has a file system maintained by a remote server. You can liken this to mapping a network share to a drive letter. One of the most common use cases for NAS solutions is virtual machine storage.

This differentiation is important for Exchange because the Exchange Product Team does not support Exchange on NAS solutions. An exception to this rule is an Exchange virtual machine on Hyper-V 2012 or newer stored on an SMBv3 share. As Microsoft owns the code for SMB, they are comfortable supporting Exchange virtual machines on SMBv3, a protocol over which they have full development control. This ensures that every hop in the IO chain is handled by a Microsoft product they feel comfortable supporting (Exchange/Windows/Hyper-V/SMB). My recommendation here is that while alternative NAS solutions may technically work, it is up to each organization to weigh the risks of their design. If Microsoft does not support the solution Exchange is installed on, the resulting support delays are a risk you have to consider. It should be noted that Storage Spaces Direct, Microsoft’s hyper-converged solution, utilizes SMBv3 and is therefore supported by Microsoft for storing Exchange virtual machines.

Hyper-Converged

Speaking of hyper-converged solutions and Exchange, supportability comes down to the file system in play. As previously stated, the Exchange Product Team has chosen to support SMBv3 storage for Exchange virtual machines. As Storage Spaces Direct utilizes SMBv3, it is a supported platform for virtualized Exchange servers. If a solution uses a different file system such as NFS anywhere in the storage chain, then it is technically unsupported by Microsoft. In practice, however, Microsoft Support will only make this an issue if the system is experiencing performance or stability issues on the storage. If you’re having a mail flow or calendaring issue, there’s no reason why the underlying storage would ever matter.

Tiered Storage

Tiered storage found in SANs, Storage Spaces Direct, and similar solutions can also pose challenges. Tiered storage solutions provide the capability to create a single LUN made up of multiple types of storage media. Flash storage can be used for “hot blocks” of data while slower 7.2K NL SAS disks serve as your high-capacity tier, all within a single LUN presented to a server. While this technology is extremely useful for many workloads, it’s not recommended by the Exchange Product Team. Their reasoning is that at any given time, any block within a mailbox database could be accessed, and if your databases are several hundred GB or even over a TB in size, you won’t have enough tier 1 (SSD) storage to accommodate the database files. However, I’ve heard different feedback from storage vendors, who have claimed Exchange has no performance issues on their storage solutions. Ultimately, a customer must understand that by using an advanced storage solution they’re making a good-faith decision that the performance and capacity claims offered by their storage vendor will hold true. If you choose this route, be sure to work closely with the storage vendor to perform Jetstress testing, as there may be certain caveats to the process.
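The Product Team’s concern above is at heart a capacity comparison: if any block of any database could be hot, the flash tier would need to hold the full set of database files. A trivial sketch of that sanity check (all figures are hypothetical examples, not sizing guidance):

```python
# Sanity check for tiered storage under the "any block could be hot" view:
# could the flash tier hold every mailbox database file outright?
# All sizes below are hypothetical examples.

def flash_tier_sufficient(ssd_tier_gb: float, database_sizes_gb: list) -> bool:
    """True if the SSD tier is at least as large as all databases combined."""
    return ssd_tier_gb >= sum(database_sizes_gb)

# e.g. a 2 TB flash tier against four roughly 1 TB mailbox databases
print(flash_tier_sufficient(2048, [900, 1100, 1000, 950]))  # False
```

In this example the flash tier covers only about half the database footprint, which is exactly the scenario the Product Team warns about; a vendor claiming good performance here is betting that the true working set is far smaller than the full databases.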

Storage Connectivity

Lastly, storage connectivity issues are extremely common and time-consuming to troubleshoot. Ensure you work closely with the storage provider during the sizing and deployment phases of the project. All vendor guidance should be followed and a detailed design document should be maintained for future reference. The following should be considered:

  • Logical Network Layout
  • Physical Network Layout
  • Driver/Firmware versions for Switches, Storage Controllers, and NIC/HBAs
  • Regular updating of Windows (Storport.sys, Msiscsi.sys, Mpio.sys, etc.)
  • Vendor guidance for Jumbo Frames, RSS, MPIO, etc.

Most storage connectivity support issues I’ve worked on were resolved by getting the customer aligned to best practices in each of these areas. A sound design and ongoing maintenance are therefore critical for a properly performing storage network. While SAN solutions typically have a dedicated network fabric, NAS solutions typically share the same network as user traffic. This can reduce complexity but also introduce performance issues without proper planning. Ensure the vendor you’re working with has answers for each of these challenges and offers guidance for the performance and scalability of the solution.

In the next article in this series we’ll discuss Exchange Virtualization and architecture mistakes to avoid.

Topics: Exchange, Exchange 2016
