A leading video game developer had 1.5 PB of build files scattered across 30 offices. Using Panzura, they consolidated that down to 45 TB in the cloud, a 97% reduction in total storage footprint. They also took advantage of cloud economics: they now pay about $4,000 a month for all tiers of storage, rather than spending millions of dollars on enterprise NAS systems, mirroring, and backups.
Three technologies in the Panzura Cloud File System (PCFS) make this possible: global deduplication, compression, and active data caching.
Other storage systems deduplicate data locally, but can’t manage data deduplication across sites. Using patented technology, the PCFS deduplicates data before it is ever stored: each unique block of a file is stored exactly once, and duplicate blocks anywhere in the file system are replaced with metadata references. It’s the only system that can efficiently deduplicate data across a global enterprise. Here’s how our deduplication works:
- It’s inline. Data is deduplicated inline when files are created or changed. If a file block already exists, metadata references are created so the data doesn’t need to be written.
- It’s global. Deduplication data is stored with the metadata, so each controller has a full record of the deduplication tables. Even if a file block isn’t stored on a local Panzura Freedom Family instance, the deduplication engine uses it to make the entire system more efficient.
- It’s always current. Deduplication tables are updated instantly, both through metadata from other controllers and by local write activity.
- It scales to petabytes of data. Our patented technology minimizes lookups for unique data, making it efficient even for large volumes of data.
Each Freedom Filer appliance in the network benefits from data seen by all other Freedom appliances, ensuring even greater capacity reduction, guaranteeing all data in the cloud is unique, and driving down cloud storage and network capacity (and cost) consumed by the enterprise.
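The inline, block-level deduplication described above can be sketched in a few lines. This is a toy illustration, not Panzura’s actual implementation: the class and block size are hypothetical, and SHA-256 stands in for whatever fingerprinting the PCFS uses.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; the PCFS's actual size may differ


class DedupStore:
    """Toy inline deduplication: each unique block is stored exactly once."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block data (stands in for cloud storage)
        self.files = {}   # filename -> list of fingerprints (metadata references)

    def write_file(self, name, data):
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # If the block already exists, only a metadata reference is created;
            # the data itself is never written a second time.
            self.blocks.setdefault(fp, block)
            refs.append(fp)
        self.files[name] = refs

    def read_file(self, name):
        return b"".join(self.blocks[fp] for fp in self.files[name])


store = DedupStore()
payload = b"x" * 8192
store.write_file("a.bin", payload)
store.write_file("b.bin", payload)            # duplicate file: no new blocks stored
assert store.read_file("b.bin") == payload
assert len(store.blocks) == 1                  # one unique block, referenced many times
```

Because the fingerprint table travels with the metadata, every controller can run this check locally, which is what makes the deduplication global rather than per-site.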
Deduplication works by taking advantage of redundancy across files and creating metadata cross references. The PCFS also compresses file data as it’s created, taking advantage of redundancy within files and making the individual file smaller. Here’s how it works:
- The PCFS uses a lossless compression algorithm.
- As each file is created, it’s broken into blocks.
- Each block is then compressed inline, in memory, as it’s created.
- Blocks compress differently depending on their content.
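The steps above can be sketched with any lossless codec; zlib is used here purely for illustration, and the block size is an assumption, not the PCFS’s actual parameter.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative; the PCFS's real block size is not stated here


def compress_file(data: bytes) -> list[bytes]:
    """Split a file into blocks and compress each block inline, losslessly."""
    return [zlib.compress(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def decompress_file(blocks: list[bytes]) -> bytes:
    """Reverse the per-block compression to recover the original file."""
    return b"".join(zlib.decompress(b) for b in blocks)


redundant = b"build artifact " * 512               # redundancy *within* the file
blocks = compress_file(redundant)
assert decompress_file(blocks) == redundant        # lossless: original recovered exactly
assert sum(len(b) for b in blocks) < len(redundant)  # redundant blocks shrink
```

Note that each block is compressed independently, so a block can be decompressed (or served from cache) without touching the rest of the file.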
Caching Hot Data Locally
We’ve talked about metadata and why global metadata is always cached locally in flash. What about the file data? To make sure users have a fast file access experience, we use several techniques:
- Cache hot data on each Freedom Filer based on read and write frequency
- Provide policy-based caching based on file types, folders, and other criteria
- Optimize metadata to get maximum utilization of the Freedom appliance
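A minimal sketch of frequency-based hot-data caching, assuming a simple least-frequently-accessed eviction policy. The class names and policy are hypothetical stand-ins, not Panzura’s actual caching design.

```python
from collections import Counter


class HotDataCache:
    """Toy cache: keeps the most frequently accessed blocks in local flash."""

    def __init__(self, capacity, cloud):
        self.capacity = capacity
        self.cloud = cloud      # authoritative store (stands in for the cloud tier)
        self.flash = {}         # locally cached hot blocks
        self.hits = Counter()   # access frequency per block

    def read(self, block_id):
        self.hits[block_id] += 1
        if block_id in self.flash:
            return self.flash[block_id]     # fast local access
        data = self.cloud[block_id]         # cache miss: fetch from the cloud
        self._admit(block_id, data)
        return data

    def _admit(self, block_id, data):
        if len(self.flash) >= self.capacity:
            # Evict the least frequently accessed block to make room.
            coldest = min(self.flash, key=lambda b: self.hits[b])
            del self.flash[coldest]
        self.flash[block_id] = data


cloud = {f"blk{i}": f"data{i}".encode() for i in range(4)}
cache = HotDataCache(capacity=2, cloud=cloud)
for _ in range(3):
    cache.read("blk0")          # frequent reads make blk0 hot
cache.read("blk1")
cache.read("blk2")              # cache is full: the colder block is evicted
assert "blk0" in cache.flash    # hot data stays local
```

A policy-based layer, as described above, would simply pre-admit or pin blocks matching configured file types or folders instead of relying on access frequency alone.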
Bonus: Built-In WAN Acceleration
All of this makes your WAN much more efficient in several ways:
- Global deduplication and compression mean less data is sent over your network in the first place. Only new, unique file data is sent to the cloud when it’s created.
- Only active data is cached locally. That means your network isn’t cluttered with synchronization traffic for files that aren’t being accessed.
- Global file locking means file operations always happen locally. Application data doesn’t need to cross the WAN each time a user opens, saves, or closes a file.
- Latency is no longer a problem since file operations are local.
As a result, you can reduce or completely eliminate expensive WAN optimization appliances. You can also eliminate costly MPLS or other private networks between offices, and use regular internet connections instead.
All of these capabilities are critical to maximizing the performance and efficiency of a distributed cloud file system. To learn more about how our deduplication, compression, and WAN acceleration work, read our white paper about the distributed cloud file system architecture.
NEXT: Read more about Global Metadata.