TECHNOLOGY DEEP DIVE
Global Deduplication & Compression
Reduce your data footprint by up to 80% and slow down your data growth with the most granular global deduplication in the industry.

(Massively) Reduce Storage Costs
Eliminate redundant data across all sites to drastically reduce the amount of storage you need.
Reduce Bandwidth Demands
Send only unique data blocks to the object store, minimizing network traffic while boosting speed.

Optimize Data Operations
Move the smallest amount of data over the shortest distance for immediate global file sync and data recovery.
File data typically contains a LOT of duplication. Users make endless copies, so you end up storing similar or even identical files multiple times. Backup and disaster recovery processes make even more copies. That creates storage bloat: a win for your storage vendor, and a painfully expensive (and continually growing) storage bill for you.
The CloudFS hybrid cloud file platform takes a radically different approach. It's laser-focused on reducing the amount of data stored by squeezing out every last redundant byte of data, everywhere. Even at enterprise scale, that can be up to 80% of the total volume of data you're currently paying to store and back up.
Global Scope: Other deduplication solutions operate only within a single site or storage volume. That's better than nothing but it's far from as efficient as it could be. With CloudFS, deduplication is global. This means it identifies and eliminates duplicate data across all locations and in the central object storage.
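To make the difference in scope concrete, here's a toy Python comparison. It's a sketch under simplifying assumptions (SHA-256 fingerprints, in-memory tables, one block per file), not CloudFS internals: with per-site deduplication each location keeps its own table, so a block that exists at two sites is stored twice; with one global table, it's stored once.

    import hashlib

    def dedupe(blocks, table):
        """Store a block only if its fingerprint isn't already in the given table."""
        stored = 0
        for block in blocks:
            fp = hashlib.sha256(block).hexdigest()
            if fp not in table:
                table[fp] = True
                stored += 1
        return stored

    site_a = [b"Q3-report", b"logo"]
    site_b = [b"Q3-report", b"video"]   # the same report also lives at site B

    # Per-site deduplication: separate tables, so the shared block is stored at both sites.
    per_site_total = dedupe(site_a, {}) + dedupe(site_b, {})                      # 4 blocks stored

    # Global deduplication: one table spans every site, so the shared block is stored once.
    global_table = {}
    global_total = dedupe(site_a, global_table) + dedupe(site_b, global_table)    # 3 blocks stored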
Inline Deduplication: We like to say that only immediate is fast enough, and that applies to everything we do with data. CloudFS performs deduplication inline — as data is being written and changed, it's compared to data that already exists. This is more efficient than post-process deduplication, which processes data after it's been written, because it prevents redundant data from ever being stored in the first place.
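To illustrate the timing difference, the sketch below (hypothetical, not CloudFS code) contrasts an inline write path, which checks a block's fingerprint before anything is persisted, with a post-process approach that lands everything on storage first and reclaims duplicates later.

    import hashlib

    reference_table = {}   # fingerprint -> block location (simplified, in-memory stand-in)
    block_store = {}       # location -> block bytes

    def write_inline(block: bytes) -> str:
        """Inline dedup: compare the fingerprint *before* anything is stored."""
        fp = hashlib.sha256(block).hexdigest()
        if fp not in reference_table:
            location = f"block/{fp}"
            block_store[location] = block        # only unique data is ever written
            reference_table[fp] = location
        return reference_table[fp]               # duplicates just get a pointer

    def write_post_process(block: bytes, staging: list) -> None:
        """Post-process dedup: store first, then scan and reclaim duplicates later."""
        staging.append(block)                    # redundant copies hit storage first
        # ...a background job would later hash, compare, and delete the duplicates.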
Block-Level Deduplication: CloudFS doesn't just look for duplicate files. It translates files into data blocks that are a tiny 128 KB in size (that's as small as it gets) and creates metadata pointers that record which blocks make up the file at that moment. It then compares these individual blocks. If an identical block already exists anywhere within the global file system, it creates and stores a metadata pointer to that block. Instead of storing redundant megabytes, you're adding lightweight metadata.
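As a rough illustration of how block-level pointers work, the hypothetical sketch below chunks a file into 128 KB blocks, fingerprints each one, and records the file as a list of pointers. SHA-256 and the in-memory dictionaries are assumptions made for the example, not CloudFS implementation details.

    import hashlib

    BLOCK_SIZE = 128 * 1024  # 128 KB blocks, per the granularity described above

    def chunk_file(data: bytes):
        """Split a file's bytes into 128 KB blocks."""
        for offset in range(0, len(data), BLOCK_SIZE):
            yield data[offset:offset + BLOCK_SIZE]

    def build_manifest(data: bytes, reference_table: dict, block_store: dict) -> list:
        """Return the file's metadata pointers, storing only blocks never seen before."""
        manifest = []
        for block in chunk_file(data):
            fp = hashlib.sha256(block).hexdigest()
            if fp not in reference_table:            # not seen anywhere in the file system
                reference_table[fp] = f"block/{fp}"
                block_store[reference_table[fp]] = block
            manifest.append(fp)                      # lightweight pointer, not the data itself
        return manifest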
Shared Metadata and Deduplication Reference Table: We do really smart things with metadata, and this is the piece that lets you squeeze out every last redundant byte across your entire file system. Our deduplication reference table, which tracks unique data blocks and their locations, is embedded in the metadata that is instantly shared among all Panzura nodes. This means that every location has an up-to-date view of all unique data blocks stored globally.
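Conceptually, that reference table behaves like a dictionary of fingerprints to block locations that every node converges on. The sketch below simulates the effect with plain Python dictionaries and made-up fingerprints; the actual metadata sharing mechanism is, of course, more involved than a dictionary merge.

    # Hypothetical per-node views of the global deduplication reference table.
    # Each entry maps a block fingerprint to where that block lives in the object store.
    node_a_table = {"3f7a": "blocks/3f7a"}
    node_b_table = {"91cc": "blocks/91cc"}

    def merge_reference_tables(local: dict, incoming: dict) -> dict:
        """Simulate the metadata sync: after a merge, a node sees every unique block."""
        merged = dict(local)
        merged.update(incoming)    # fingerprints first recorded elsewhere become visible locally
        return merged

    # After the (simulated) sync, node A knows about the block first written at node B,
    # so it will never re-upload it.
    node_a_table = merge_reference_tables(node_a_table, node_b_table)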
Object Store as the Single Source of Truth: Panzura consolidates all data into a single, authoritative dataset in your chosen object storage: cloud services such as AWS S3, Azure Blob Storage, or Google Cloud Storage, or on-premises object storage such as Nutanix Objects or Cloudian. This object storage then acts as the central repository for the deduplicated data.
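One way to picture this is a content-addressed bucket, where a block's fingerprint doubles as its object key, so any given block is uploaded exactly once. The boto3 sketch below assumes a hypothetical S3 bucket named cloudfs-data and a blocks/<fingerprint> key scheme; it illustrates the idea, not CloudFS's actual object layout.

    import hashlib
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "cloudfs-data"   # hypothetical bucket name for the example

    def upload_if_unique(block: bytes, reference_table: dict) -> str:
        """Content-addressed write: the block's fingerprint doubles as its object key."""
        fp = hashlib.sha256(block).hexdigest()
        if fp not in reference_table:
            s3.put_object(Bucket=BUCKET, Key=f"blocks/{fp}", Body=block)
            reference_table[fp] = f"blocks/{fp}"
        return reference_table[fp]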
The deduplication data flow
Segmentation: When a file is created or modified, CloudFS breaks it into variable-length data blocks as small as 128 KB (the full flow is sketched after these steps).
Hashing and Comparison: Each data block is put through a hash algorithm to generate a unique fingerprint. That's compared against the global deduplication reference table.
Decision: If the block is unique, it's compressed, encrypted, and sent to the object store. The global deduplication reference table is updated with information about this new block. If it's a duplicate, CloudFS simply stores a metadata pointer to the existing block.
Global Updates: The deduplication reference table is instantly shared as part of the metadata across all Panzura nodes, so every location immediately benefits from data deduplicated by any other node in the network. That's why only unique data ever makes it to the object store.
Local Caching: Intelligent local caching at the edge provides local-feeling performance for users, even though the authoritative data resides in the object store. The cache is also aware of the global deduplication, further optimizing performance and reducing cloud egress costs by serving cached, deduplicated data locally.
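Putting the five steps together, here is a compact, hypothetical end-to-end sketch in Python. Compression uses zlib, encryption is omitted for brevity, and the reference table, object store, and cache are in-memory stand-ins; none of this reflects CloudFS's real data structures.

    import hashlib
    import zlib

    BLOCK_SIZE = 128 * 1024

    reference_table = {}    # fingerprint -> object key (shared via metadata in CloudFS)
    object_store = {}       # object key -> compressed block (the single source of truth)
    local_cache = {}        # fingerprint -> raw block bytes (node-local edge cache)
    file_metadata = {}      # path -> list of fingerprints (the file as pointers)

    def write_file(path: str, data: bytes) -> None:
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):              # 1. segmentation
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()                  # 2. hashing and comparison
            if fp not in reference_table:                           # 3. decision: unique block
                key = f"blocks/{fp}"
                object_store[key] = zlib.compress(block)            #    compress (encryption omitted)
                reference_table[fp] = key                           # 4. update the shared table
            local_cache[fp] = block                                 # 5. keep a hot copy at the edge
            pointers.append(fp)                                     #    duplicates become pointers only
        file_metadata[path] = pointers

    def read_file(path: str) -> bytes:
        blocks = []
        for fp in file_metadata[path]:
            if fp in local_cache:                                   # served locally, no egress
                blocks.append(local_cache[fp])
            else:                                                   # fetch, decompress, then cache
                block = zlib.decompress(object_store[reference_table[fp]])
                local_cache[fp] = block
                blocks.append(block)
        return b"".join(blocks)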
Why Global Deduplication and Compression Matter
Leading Video Game Developer
Problem
- 1.5 PB of build files across 30 offices
- Spending millions of dollars on enterprise NAS systems, mirroring, and backups
Results
- Consolidated file data down to 45 TB of storage (99% reduction)
- Cloud economics enabled them to pay $4000 a month for all tiers of storage
Accelerate digital transformation with a powerful hybrid cloud file platform that:
Modernizes Storage Architecture
Allows organizations to significantly reduce storage costs by consolidating dispersed file data into a unified, deduplicated, compressed, and secured data set in the cloud or on premises.
Unlocks Organizational Productivity
Enables file data to look and feel local to users and processes everywhere. It uniquely empowers users to harness their collective skills by working collaboratively, regardless of location.

Delivers Seamless Cloud File Services
Turns public or private cloud storage into a high-performance, immutable global file system that flawlessly delivers file data to people, processes, and AI, and makes it resilient to damage.
Reduces Operational Complexity
Lets IT teams turn their attention to innovation with less file data and infrastructure to manage and protect, fewer storage refreshes to plan for, and less worry about recovering from file damage.