How and Why We Took a New Approach to File Synchronization

Rsync turned 20 last month. It has been the backbone of file synchronization for most of that time and is as relevant today as it was when it was first published. Rsync synchronizes files between two locations in a three-step process:

  • First it breaks both the source and destination copies of a file into blocks and computes checksums on those blocks.
  • Then it compares the checksums on blocks that are supposed to be the same.
  • Finally, for any blocks that do not have matching checksums between the source and destination file, rsync will update the destination file block.
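
The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not rsync's actual implementation: it compares blocks at fixed offsets only and uses SHA-1 simply because it produces a 20-byte digest. The function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # 128 KB blocks, an assumed default for this sketch


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: split data into fixed-size blocks and checksum each block."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()  # 20-byte digest
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source: bytes, dest: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Step 2: compare checksums of blocks at the same position."""
    src = block_checksums(source, block_size)
    dst = block_checksums(dest, block_size)
    return [i for i in range(len(src)) if i >= len(dst) or src[i] != dst[i]]


def sync_blocks(source: bytes, dest: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Step 3: overwrite only the destination blocks whose checksums differ."""
    out = bytearray(dest[:len(source)])  # drop any trailing data the source lacks
    for i in changed_blocks(source, dest, block_size):
        start = i * block_size
        out[start:start + block_size] = source[start:start + block_size]
    return bytes(out)
```

Calling sync_blocks(source, dest) returns a copy of the destination that matches the source, having copied only the blocks reported by changed_blocks.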

That may seem like a lot of work, so let’s see why it is so valuable.

Over a network, rsync uses local system CPU and IO on both endpoints to calculate the checksums. Those checksums are 20 bytes each and typically cover 128 KB blocks of a file. So when it sends checksum data across the network, a typical 5 MB file turns into 780 bytes of checksum data. Once that’s complete, it sends only the blocks that differ (rather than the entire 5 MB file). Rsync provides a whopping 6000:1 network savings on that comparison at the cost of 2x the IO (10 MB, since both copies are read) and 2x the CPU! Until more than half of the file has changed, rsync is ahead of the game. No wonder it has been the backbone of network file synchronization.
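
The arithmetic behind those figures can be checked with a short calculation. This sketch assumes a 5 MB file means 5,000,000 bytes and a 128 KB block means 131,072 bytes; the function name is made up for illustration.

```python
import math


def checksum_overhead(file_bytes: int, block_bytes: int = 128 * 1024,
                      checksum_bytes: int = 20) -> tuple[int, int, float]:
    """Return (block count, total checksum bytes, file-to-checksum ratio)."""
    blocks = math.ceil(file_bytes / block_bytes)  # a partial last block still needs a checksum
    total = blocks * checksum_bytes
    return blocks, total, file_bytes / total


blocks, total, ratio = checksum_overhead(5_000_000)
print(blocks, total, round(ratio))  # 39 780 6410
```

So 39 blocks produce 780 bytes of checksum data, a ratio of about 6400:1, which the article rounds to 6000:1.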

At Panzura, we’ve taken the efficient transfer model of rsync and married it with the famously strong checksum system of ZFS in SMART Sync. In ZFS, each file block already has a checksum calculated and stored in the file system. Unlike rsync, where the checksums must be calculated for each transfer, SMART Sync uses the ZFS checksums without ever reading the data directly. This has a number of benefits:

  • SMART Sync reads 32 checksum bytes from disk for every 128 KB of data on the source and destination, a roughly 4000:1 reduction in IO;
  • SMART Sync doesn’t use the CPU to calculate checksums, saving 100% of that CPU cost;
  • Due to the 4000:1 IO reduction and the eliminated CPU work, SMART Sync reduces the local latency by roughly 4000:1, making synchronization extremely fast; and
  • For a cloud-based storage system where data is kept in the cloud, SMART Sync allows Panzura to skip the cloud data read that would have been required to calculate the checksums, saving significant WAN bandwidth.
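
The idea behind these benefits, comparing checksums that already exist instead of reading and hashing file data, can be sketched as follows. This is an illustration only, not Panzura's implementation: the lists of stored checksums stand in for whatever the file system already holds, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # bytes of file data covered by one checksum
CHECKSUM_SIZE = 32       # bytes per stored checksum


def changed_block_indices(src_sums: list[bytes], dst_sums: list[bytes]) -> list[int]:
    """Compare precomputed per-block checksums; no file data is read."""
    n = max(len(src_sums), len(dst_sums))
    return [
        i for i in range(n)
        if i >= len(src_sums) or i >= len(dst_sums) or src_sums[i] != dst_sums[i]
    ]


def metadata_bytes_read(src_sums: list[bytes], dst_sums: list[bytes]) -> int:
    """Bytes of checksum metadata read in order to make the comparison."""
    return CHECKSUM_SIZE * (len(src_sums) + len(dst_sums))


# Stand-in checksums for an 8-block (1 MB) file; block 5 is stale on the destination.
src = [hashlib.sha256(b"block-%d" % i).digest() for i in range(8)]
dst = list(src)
dst[5] = hashlib.sha256(b"stale").digest()

print(changed_block_indices(src, dst))  # [5]
print(metadata_bytes_read(src, dst))    # 512
print(BLOCK_SIZE // CHECKSUM_SIZE)      # 4096
```

In this example the comparison reads 512 bytes of checksums rather than 2 MB of file data, which is the 4096:1 ratio (32 bytes per 128 KB) that the list above rounds to 4000:1.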

The result is all of the value of rsync without the CPU or IO overhead. SMART Sync can synchronize a 100 MB file in under a second on a 10 Mbps WAN link. Even better, it can sync that same file across 20 sites in under a second when 2% of the data has changed. In our customers’ typical file systems, an rsync-style synchronization solution, for all its 20 years of proven value, would regularly take more than a minute for files over 50 MB. Single small files of 1 MB worked fine, as did small clusters of small files. But when it comes to scaling the file size or the number of files for enterprise-level performance, SMART Sync’s marriage of rsync’s principles to ZFS’s checksums is a scalable, all-around solution to global file synchronization.

In a future post, I’ll take a closer look at how a multi-site topology changes the equation and why a low-latency sync approach matters.