
These three principles of High Availability (HA) describe its value and purpose in any network.

  1. Elimination of single points of failure. Building redundancy into the system so that the failure of one component does not mean failure of the entire system.
  2. Reliable failover. Ensuring the crossover point is not itself a single point of failure.
  3. Detection of failures as they occur. If the first two principles are observed, users may never see a failure; however, monitoring must still surface issues as they happen so they can be repaired (see the sketch below).
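
To make that third principle concrete, here is a minimal sketch of heartbeat-style failure detection. It is illustrative only: the node addresses are made up, and a plain TCP probe stands in for the richer health checks a real HA product would use.

```python
import socket
import time

# Hypothetical node addresses; substitute your own primary/standby pairs.
NODES = {"primary": ("10.0.0.10", 445), "standby": ("10.0.0.11", 445)}


def is_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Probe a node with a plain TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(interval: float = 5.0) -> None:
    """Report each node's state every cycle, so failures surface as they occur."""
    while True:
        for name, (host, port) in NODES.items():
            state = "UP" if is_alive(host, port) else "DOWN"
            print(f"{time.strftime('%H:%M:%S')} {name} {host}:{port} {state}")
        time.sleep(interval)


if __name__ == "__main__":
    monitor()
```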

High availability is designed as a failsafe against infrastructure failures and cloud operational problems, and depending on your situation, it may be essential to your organization’s uninterrupted file access. But not all cloud file systems offer HA, claiming instead that node rebuilds can be carried out in 20 to 60 minutes. We say: what a hassle, and how insufficient for today’s pace of business, especially if it’s your job to look after data resilience, security, and speedy access. How are you going to feel having to report that data has been lost, or will be inaccessible for hours or even days, because a server went down?

In contrast, consider the following HA capabilities for local and global uptime.

How Panzura CloudFS HA Works

With Panzura, for every 25 terabytes of global read/write archive, you get 5 node (filer) licenses with a choice of how each is configured: (1) active read/write, (2) local site standby, or (3) global standby.

Active Read/Write

Primary nodes allow storing, editing, and sharing data no matter where it is physically stored. Robust security, including encryption, access controls, and authentication, protects sensitive data from unauthorized access. And versioning allows users to track changes and recover earlier versions of files.

Local HA

Located in the same data center, campus, or region as a primary node, a CloudFS local HA node provides automatic failover if the primary node fails. No human intervention or traffic rerouting is needed. The local HA node enables this seamless switch because it maintains the primary’s full working state: all files, metadata, and file locks.

You might want a local HA node for a busy site with hundreds or thousands of users accessing files from the active primary.

If the primary server goes down, the standby CloudFS node will have retained its network profile, the site’s cached data, and the file locks, so it takes over immediately. When the failed server comes back up, its node recognizes that it is no longer the primary; it can begin functioning as the new HA standby, or it can fail back to its original role. In other words, failover and failback between a local primary and an HA standby are automatic, seamless, and fast.
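
As a conceptual sketch only (not Panzura’s implementation), that role swap can be modeled as a small state machine; the node names and fields below are hypothetical:

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    STANDBY = "standby"


class Node:
    """Toy model of a filer that mirrors files, metadata, and file locks."""

    def __init__(self, name: str, role: Role):
        self.name = name
        self.role = role
        self.in_sync = True  # the standby continuously mirrors the primary's state


def fail_over(primary: Node, standby: Node) -> None:
    """Promote the standby; no data movement is needed because it is already in sync."""
    assert standby.in_sync, "standby must hold current files, metadata, and locks"
    standby.role = Role.PRIMARY
    # When the old primary recovers, it sees it is no longer primary and
    # either becomes the new standby (as here) or fails back to its old role.
    primary.role = Role.STANDBY


ny_a = Node("ny-01", Role.PRIMARY)
ny_b = Node("ny-02", Role.STANDBY)
fail_over(ny_a, ny_b)
print(ny_a.role, ny_b.role)  # Role.STANDBY Role.PRIMARY
```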

However, with Panzura CloudFS, there’s one more trick up your sleeve: DFS Namespaces. DFS-N is a service role in Windows Server that lets you group shared folders on multiple servers into one or more logically structured namespaces, giving users a single virtual view of, and access to, shared folders spread across multiple servers.

This means that if New York and Austin share a namespace and the New York cloud or Panzura node goes down, New York users can still see and open their files, whether or not their site has a local HA node, because the namespace refers them to the surviving Austin targets.
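
Here is a simplified illustration of the referral idea behind DFS-N, with hypothetical server and share names. Real DFS-N is configured in Windows Server, not written in application code; this sketch only shows why a shared namespace keeps files reachable when one target goes down:

```python
# Hypothetical namespace: one logical share backed by targets at two sites.
NAMESPACE = {
    r"\\corp\projects": [r"\\ny-node\projects", r"\\austin-node\projects"],
}


def resolve(path: str, down_servers: set[str]) -> str:
    """Return the first reachable target for a namespace path, skipping failed servers."""
    for target in NAMESPACE[path]:
        server = target.split("\\")[2]  # e.g. "ny-node"
        if server not in down_servers:
            return target
    raise RuntimeError(f"no targets available for {path}")


# New York's node is down; users are transparently referred to Austin.
print(resolve(r"\\corp\projects", down_servers={"ny-node"}))
# -> \\austin-node\projects
```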

Global HA

A global HA node supports any or all nodes and can be located anywhere in the network. Until called into service, a global standby node holds all of the network’s metadata, and only the metadata. This design prepares it to support the entire network if necessary: if every other node fails, a global HA can still serve all file access needs.

To see how this works in practice, let’s say one or more local nodes fail and you determine a failover is needed. A network administrator performs a two-step process (sketched just after this list):

  1. Adjusts a setting so that the global HA node assumes the affected nodes’ full images and identities. All the file locks then migrate, and the global HA takes ownership of all the files the down machines held. Users can then connect and start working.
  2. Updates DNS so client traffic flows to the global standby.
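
A hedged sketch of that two-step process is below. Every name in it (assume_identity, update_dns, the node names and addresses) is a hypothetical stand-in for illustration, not a Panzura command or API:

```python
# Conceptual sketch of the two-step global failover described above.


def assume_identity(global_ha: str, failed_nodes: list[str]) -> None:
    """Step 1: the global standby takes on each failed node's image and identity.
    It already holds the network's metadata, so file locks and ownership migrate to it."""
    for node in failed_nodes:
        print(f"{global_ha}: assuming image and identity of {node}; migrating locks")


def update_dns(records: dict[str, str], failed_nodes: list[str], ha_ip: str) -> None:
    """Step 2: repoint DNS so client traffic flows to the global standby."""
    for node in failed_nodes:
        records[node] = ha_ip


dns = {"ny-node": "10.0.0.10", "austin-node": "10.1.0.10"}
assume_identity("global-ha", ["ny-node"])
update_dns(dns, ["ny-node"], ha_ip="10.9.0.10")
print(dns)  # {'ny-node': '10.9.0.10', 'austin-node': '10.1.0.10'}
```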

Cloud File System Performance, User Experience, and Your Job

The essential thing about HA is that business continuity is preserved. Even if your network has 100 nodes and 99 of them fail, with a global HA all users will still be able to access their data. This is possible because of the way file access is spread over time, and because of the network’s speed.

Remember that when the changeover first occurs, the global HA node holds only the metadata, not any cached data. Because of the file system’s unique architecture and superior performance, however, this is not a problem.
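
As a toy illustration of why this works (again, not Panzura’s actual code), a metadata-only standby can serve any file by fetching its data on demand from the back-end object store and caching it; the class and store here are hypothetical:

```python
class GlobalStandby:
    """Toy model: holds metadata for every file, pulls data on first access."""

    def __init__(self, metadata: dict[str, str], object_store: dict[str, bytes]):
        self.metadata = metadata          # file path -> object key (always current)
        self.object_store = object_store  # stand-in for the cloud object store
        self.cache: dict[str, bytes] = {} # empty at the moment of failover

    def read(self, path: str) -> bytes:
        if path not in self.cache:        # first access: a short download
            self.cache[path] = self.object_store[self.metadata[path]]
        return self.cache[path]           # later accesses: served from cache


standby = GlobalStandby(
    metadata={"/projects/plan.dwg": "obj-123"},
    object_store={"obj-123": b"...file bytes..."},
)
print(standby.read("/projects/plan.dwg"))  # fetched once, then cached
```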

Let’s say a site serves 800 users, and each user has 10 files they’re currently working on. Not all 800 users will be accessing all of their files at the same time; in practice, perhaps only 100 users need 4 or 5 of their files at any given moment, so only about 500 files must be served simultaneously, not 8,000.
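
The back-of-the-envelope arithmetic, using the assumed numbers above:

```python
users = 800
open_files_per_user = 10
total_open = users * open_files_per_user       # 8,000 files open in total

active_users = 100
active_files_each = 5
concurrent = active_users * active_files_each  # only ~500 served at once

print(f"{total_open:,} files open, but only ~{concurrent} needed simultaneously")
```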

After a failover to a global HA, the user experience will be something like this: a user clicks on one of their open files, and it appears they’ve lost the connection. They double-click to reopen it, and because of the speed of the CloudFS network, the file is downloaded and presented within a few seconds.

How much better that experience is than having to report either that data has been lost, or that access to important files may take days.