A Better Way to Manage Machine Generated Data


Managing rapidly growing volumes of data is hardly a new problem. It has been a challenge for IT for as long as there has been IT. And yet we have kept dealing with the problem in the same way – by making storage bigger. First with bigger drives, then with denser arrays, and finally with scale-out clusters. That may have been the right solution in the past, but we can no longer afford to deal with the problem in that way. We don’t need something bigger. What we need is something better.

The way we manage storage growth has to change, and that change is being driven by two factors. First is the rate at which data is growing. Data growth has finally reached the point where managing it using traditional on-premises storage is no longer practical. The datasets are too big and they are growing too fast. The second factor is how different types of data need to be managed. It is that second factor that I will explore in this blog.

Data Growth is Driving Workloads to the Cloud

As data volumes have grown unmanageable for on-premises storage, user workloads have migrated to the cloud. We can see this everywhere in the adoption of cloud-based resources such as Google Docs, Office 365, Gmail, and similar applications.

These are traditional user-generated workloads, such as documents and graphics, that use the SMB protocol to transfer data. Workloads like these have historically generated the massive volumes of data that IT has to manage, and they have been the first to move to the cloud.

The Face of Data Growth is Changing

There are also increasing volumes of machine-generated data to manage. These workloads use the NFS protocol instead of SMB and include content such as log files, IoT data, Splunk data, and more. Common estimates put the growth of machine-generated data at roughly 50 times the rate of traditional business-generated data.

[Figure: The Exponential Growth of Data]

Yet for all the rapid growth of machine-generated data, this particular data type has not yet made the jump to the cloud in the same way that SMB data has. Why is that?

The answer is simple. The focus on data movement to the cloud has been on user-generated data, which typically uses the SMB protocol. The applications that use NFS for machine-generated data, such as Hadoop or Splunk, can quickly consume terabytes or even petabytes of storage, and they need to ingest that data as rapidly as possible to perform real-time analytics on those large datasets. Getting the local performance they require has typically meant caching on flash storage for speed, backed by some form of local NAS for capacity.
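
To make that tiering pattern concrete, here is a minimal sketch of a read-through flash cache in front of a capacity tier. Every name in it is a hypothetical illustration of the general pattern, not Panzura's implementation:

```python
# Tiered read path: hot data is served from a small, fast flash cache;
# misses fall through to a large, slower capacity tier (local NAS or
# cloud object storage) and are promoted into the cache on the way back.

class TieredStore:
    def __init__(self, capacity_tier: dict):
        self.flash_cache: dict = {}          # fast, small (e.g. NVMe flash)
        self.capacity_tier = capacity_tier   # slow, large (NAS or cloud)

    def read(self, key: str) -> bytes:
        if key in self.flash_cache:          # cache hit: flash latency
            return self.flash_cache[key]
        data = self.capacity_tier[key]       # cache miss: capacity-tier latency
        self.flash_cache[key] = data         # promote so the next read is fast
        return data

store = TieredStore(capacity_tier={"block-0": b"sensor log data"})
store.read("block-0")  # miss: fetched from the capacity tier, then cached
store.read("block-0")  # hit: served from flash
```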

Reading the massive volumes of data that these applications require from the cloud is not an option; the latency inherent in cloud reads is simply too high. So when this data does go to the cloud, it is more for long-term archive than for active use.

The challenge now is that enterprises are inundated with NFS data that they need to act on: enormous datasets that must be stored, accessed, and analyzed to extract actionable information. Continuing to store these massive machine-generated datasets using the traditional, on-premises storage model is simply not practical. The data is growing too rapidly, making the costs of storing, managing, and backing it up too high.

Panzura Freedom Has Made NFS Performance a Priority

Panzura Freedom NAS is the first hybrid cloud NAS solution that has been specifically engineered to deliver exceptional performance for both SMB and NFS workloads in the enterprise.

As the leader in NFS performance, Freedom NAS was the first, and to date the only, hybrid cloud NAS solution to design in an NVMe Separate Intent Log (SLOG) device. A SLOG is similar in concept to a write cache for NFS data (and it certainly performs that function), but it does more than that: synchronous writes are acknowledged as soon as they are committed to the low-latency log device, and the log can be replayed after a crash, so the SLOG enhances data integrity as well as speed. By taking advantage of the latest technology, such as NVMe, Freedom NAS can deliver the performance enterprises need for their growing volume of machine-generated data.
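
Conceptually, the intent-log mechanism works something like the following sketch, a simplified, hypothetical illustration of the general SLOG idea rather than Panzura's code:

```python
# Intent-log write path: a synchronous write is acknowledged once it is
# durably recorded in a fast, dedicated log device, and applied to the
# main pool afterwards. If the system crashes before the flush, the log
# is replayed on restart, so no acknowledged write is lost.

class IntentLog:
    def __init__(self):
        self.log = []    # stands in for the fast NVMe log device
        self.pool = {}   # stands in for the slower main storage pool

    def sync_write(self, key: str, data: bytes) -> None:
        self.log.append((key, data))  # durable at log (NVMe) latency
        # ...acknowledge the client here, without waiting for the pool

    def flush(self) -> None:
        # Apply logged writes to the main pool in batches, off the
        # latency-critical path.
        for key, data in self.log:
            self.pool[key] = data
        self.log.clear()

    def replay_after_crash(self) -> None:
        # On restart, re-apply anything still in the log, preserving
        # every write that was acknowledged before the crash.
        self.flush()
```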

The benefits to NFS performance in Freedom NAS are not limited to hardware; NFS performance has been maximized in virtual instances as well. The latest version of CloudFS, the file system that underpins Panzura Freedom NAS, has been highly optimized for NFS workloads. Delivering unmatched performance for both NFS and SMB workloads, in both physical and virtual environments, was a primary goal of the current CloudFS release.

Only Freedom NAS combines this cutting-edge hardware acceleration with advanced software to deliver exceptional performance for both NFS and SMB workloads.

The result is that Freedom NAS delivers maximum performance across the network. Each Freedom NAS filer can fully saturate 20Gbps of network bandwidth. To be clear, unlike other solutions, that is not an aggregate figure across multiple Freedom instances, nor a one-time burst you might see on your network once: an individual Freedom NAS instance can fully saturate a 20Gbps connection and sustain that level of performance.

Summary

Applications that consume and process the vast amounts of machine-generated data being created need the performance of local storage and they need to access that data using the NFS protocol. Panzura uses intelligent caching, next-generation hardware, and advanced software to deliver LAN-speed performance while leveraging the scalability and durability benefits of the cloud. The data these applications need is both available locally for fast access and securely stored in the cloud as a single source of truth.

It is now possible for large distributed enterprises to store vast amounts of IoT data, machine logs, 3D medical images, 4K video, and other machine-generated data in the cloud, while still achieving the extreme local performance that applications such as Splunk and Hadoop demand.
