
Alluxio In-memory File System – Industry Snapshot

Published October 27th, 2016. John Webster, Sr. Analyst, gives a brief summary of Alluxio In-Memory File System in this free Industry Snapshot Report.

Alluxio Delivers Community and Enterprise Editions of the Alluxio In-memory FS

The advance of Big Data analytics and the need for real-time results in application environments such as IoT are driving the need for new approaches to storage. An example of this trend can be seen in the Alluxio In-memory File System. Alluxio has been available as a free open source download. More robust versions are now also available at cost from Alluxio. These include a Community Edition that adds a management interface and an Enterprise Edition that offers Kerberos security authentication, data replication and support.

Background

In 2012, researchers at the UC Berkeley AMPLab open-sourced a memory-centric, fault-tolerant virtual distributed storage system called Tachyon. Because of its memory-centric design, it found early acceptance when coupled with Big Data analytics platforms, such as Apache Spark and Storm, that are built to deliver real-time or near-real-time results. The project was later renamed Alluxio.

With Alluxio, working data sets are loaded into Alluxio’s in-memory file system, where they can be accessed simultaneously and at memory speed by multiple applications. Alluxio’s tiered storage framework pre-loads data into distributed cluster memory from a unified, persistent storage layer. Accessed through an API, the long-term persistent storage layer can include both local storage (SSD, disk array, etc.) and distributed file stores, including the Hadoop Distributed File System (HDFS), Amazon S3, and Swift object stores. The pre-loading process can occur automatically or be done manually. When it is automated, the user defines policies for data allocation and eviction.
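The relationship between the memory tier and the persistent layer can be illustrated with a minimal Python sketch. It is not Alluxio code and does not use the Alluxio API. The class `MemoryTier`, its methods, and the plain dictionary standing in for the persistent store are all hypothetical, and least-recently-used eviction is assumed here only as one example of a policy a user might define.

```python
from collections import OrderedDict


class MemoryTier:
    """Toy memory tier in front of a slower persistent store (hypothetical)."""

    def __init__(self, under_store, capacity):
        self.under_store = under_store  # dict standing in for HDFS, S3, etc.
        self.capacity = capacity        # maximum number of items held in memory
        self.cache = OrderedDict()      # key order doubles as recency order

    def preload(self, keys):
        """Manually pull selected data sets into memory ahead of use."""
        for key in keys:
            self._admit(key, self.under_store[key])

    def read(self, key):
        """Serve from memory when possible, else load from the under store."""
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        value = self.under_store[key]
        self._admit(key, value)
        return value

    def _admit(self, key, value):
        """Place an item in memory, evicting old items if over capacity."""
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used item


under_store = {"a": 1, "b": 2, "c": 3}
tier = MemoryTier(under_store, capacity=2)
tier.preload(["a", "b"])   # manual pre-loading
tier.read("a")             # "a" becomes the most recently used item
tier.read("c")             # loads "c" automatically and evicts "b"
cached_keys = list(tier.cache)
```

In this sketch, an evicted item is never lost: it remains in the persistent layer and is simply reloaded on the next read.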

Evaluator Group Comments

We believe that the following combination of attributes makes Alluxio unique:

  • Distributed node-based memory, rather than solid state disk or rotating disk, forms the short-term persistent storage layer for computation.
  • Re-computation of log data (referred to as lineage) is used to provide fault tolerance, as opposed to creating three distributed copies of data on ingest as is typical with distributed file systems. The result is an ability to rebuild data sets from a point in time before a failure occurred, as well as increased overall file system performance.
  • A distributed, in-memory file system manages memory-speed data access, facilitates data sharing and tracks data locality between jobs.
  • Long-term persistent storage resources can include Amazon S3, traditional storage arrays (Dell/EMC, NetApp, etc.), OpenStack Swift, and Red Hat GlusterFS. As such, Alluxio can serve as a unifying layer for any number of varied data sources.
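The lineage approach described in the second point above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Alluxio's implementation. The class `LineageStore` and its methods are hypothetical, and the sketch assumes that the original inputs survive in a durable store and that each derivation is deterministic.

```python
class LineageStore:
    """Toy store that rebuilds lost in-memory data from recorded lineage."""

    def __init__(self, persistent):
        self.persistent = persistent  # durable inputs, assumed to survive failures
        self.memory = {}              # derived data sets held only in memory
        self.lineage = {}             # name -> (function, names of its inputs)

    def derive(self, name, func, inputs):
        """Compute a data set and record how it was produced."""
        self.lineage[name] = (func, tuple(inputs))
        self.memory[name] = func(*[self.get(i) for i in inputs])

    def get(self, name):
        """Return a data set, re-computing it from lineage if it was lost."""
        if name in self.memory:
            return self.memory[name]
        if name in self.persistent:
            return self.persistent[name]
        func, inputs = self.lineage[name]
        value = func(*[self.get(i) for i in inputs])
        self.memory[name] = value
        return value

    def lose(self, name):
        """Simulate a node failure that drops an in-memory data set."""
        self.memory.pop(name, None)


store = LineageStore({"raw": [3, 1, 2]})
store.derive("sorted", sorted, ["raw"])
store.derive("top", lambda xs: xs[-1], ["sorted"])

store.lose("sorted")
store.lose("top")
recovered = store.get("top")  # rebuilt by replaying the recorded steps
```

Only one copy of each derived data set is written, which is where the performance benefit over triple replication comes from. The trade-off is the time spent re-computing after a failure.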
