Imagine a day when you open your favorite social-networking page and find that some cherished photographs have irrevocably vanished! Or you are looking for data to make a time-critical decision concerning an urgent flight reservation or an immediate stock purchase, only to be told that the data is “temporarily unavailable”! Feels nightmarish? It is certainly possible. But Professor Vijay Kumar and his Codes and Signal Design Lab at IISc, Bangalore, are working on ways to ensure that this never happens.
Any communication system (telephone, computer connected to the internet, mobile phone) is prone to errors during the transmission, storage and retrieval of data. Most of these devices store data as binary symbols represented by ones and zeros. These symbols are then converted into electrical signals and sent over the “wire”. Any error in the electrical signals or the binary symbols could result in the transmission, storage or reception of erroneous data.
Large amounts of data are usually distributed and stored in “storage nodes”. A storage node is a physical machine with one or more storage disks. Storage nodes make data easily accessible and retrievable. A node may fail due to a fault, be taken offline for routine maintenance, or simply be busy serving competing requests for other data stored on it. In each of these cases, redundant information stored on the other nodes can be used to reconstruct the lost data or retrieve the unavailable data.
Correcting errors and erasures is a daunting task. Coding theory is a branch of study that focuses on mechanisms designed to accomplish precisely this objective. Specific “error-correcting codes (ECC)” or signals are used to represent data so as to make them resistant to errors and erasures in transmission or storage. The key is to carefully craft redundant data that is stored (or transmitted) along with the message data. Reed–Solomon, Low-Density Parity-Check, Turbo, Bose–Chaudhuri–Hocquenghem, Hamming, Golay and Reed–Muller codes are some of the famous ECCs in use today.
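To make the idea of crafted redundancy concrete, here is a minimal sketch of the simplest erasure-recovery scheme, a single XOR parity block. It is an illustration only and not one of the codes named above; the function names and the sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one redundant parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


# Hypothetical message split into three blocks, one per storage node.
data = [b"phot", b"ogra", b"phs!"]
stored = encode(data)            # four nodes: three data blocks, one parity
rebuilt = recover(stored, 1)     # pretend the second node has failed
```

Any single lost block can be rebuilt this way, but only one; practical codes such as Reed–Solomon add more redundancy so that several simultaneous failures can be tolerated.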
How does this work when dealing with “Big Data”, that is, data on the scale of petabytes (10¹⁵ bytes)? Across how many nodes should data be distributed? How much bandwidth should the communication links between nodes in the data centre have? How does one handle the inevitable failure of a storage node? “The sheer volume of data that needs to be reliably stored today, calls for streamlining every aspect of the process of recovery of data that is lost or unavailable”, says Prof. Vijay Kumar. A new class of error-correcting codes called “regenerating codes” has been designed to address the problems posed by storing Big Data.
“Regenerating codes are a recently invented class of erasure-recovery codes, that minimize the amount of data transfer from the remaining code symbols needed to ensure recovery of the erased symbol,” says Prof. Vijay Kumar. This class of codes uses far fewer redundant code symbols to regenerate lost data and offers a trade-off between the amount of data stored and the communication cost of regenerating a data fragment. Thus, the codes provide considerable flexibility in designing a distributed storage system.
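The trade-off between storage and repair traffic can be put into numbers. The sketch below uses the standard formulas from the regenerating-codes literature for the two extreme points of the trade-off (minimum storage and minimum bandwidth); these formulas are not stated in this article, and the file size and the parameters k and d are hypothetical. Here k is the number of nodes sufficient to recover the file and d is the number of helper nodes contacted to repair one failed node.

```python
from fractions import Fraction


def conventional_repair_download(file_size, k):
    """Classical repair: fetch k fragments of size file_size / k, i.e. the whole file."""
    return k * Fraction(file_size, k)


def minimum_storage_point(file_size, k, d):
    """(storage per node, total repair download) at the minimum-storage extreme."""
    storage = Fraction(file_size, k)
    download = Fraction(d * file_size, k * (d - k + 1))
    return storage, download


def minimum_bandwidth_point(file_size, k, d):
    """(storage per node, total repair download) at the minimum-bandwidth extreme."""
    download = Fraction(2 * d * file_size, k * (2 * d - k + 1))
    return download, download  # at this extreme a node stores exactly what it downloads


# Hypothetical setting: a 1000 MB file, any k = 10 nodes recover it, d = 13 helpers.
file_mb, k, d = 1000, 10, 13
classical = conventional_repair_download(file_mb, k)
msr_storage, msr_download = minimum_storage_point(file_mb, k, d)
mbr_storage, mbr_download = minimum_bandwidth_point(file_mb, k, d)

print(f"classical repair download : {float(classical):.1f} MB")
print(f"minimum-storage point     : store {float(msr_storage):.1f} MB, download {float(msr_download):.1f} MB")
print(f"minimum-bandwidth point   : store {float(mbr_storage):.1f} MB, download {float(mbr_download):.1f} MB")
```

Under these assumed parameters, storing a little more on each node lowers the repair download further, which is the flexibility the paragraph above refers to.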
During this process of regeneration, “repair bandwidth” (the volume of data downloaded) and “repair degree” (the number of storage nodes accessed during repair) are important parameters. The best repair mechanism would have both low repair bandwidth and low repair degree. Conventional regenerating codes focus on the repair bandwidth and lower it by introducing carefully designed redundancy. Yet, to lower the repair degree, a second set of codes becomes necessary. Can a single set of codes do both? The answer is now a “yes”, thanks to Professor Vijay Kumar and his team.
The team at the Codes and Signal Design Lab has been working on designing a new set of codes called “codes with local regeneration”. In this mechanism, the lost data is regenerated from a combination of surviving data fragments belonging to a small subset of “local” nodes. Further, the amount of data downloaded is kept to a minimum. Thus, these codes have both low repair degree and small repair bandwidth.
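The locality idea can be sketched with simple XOR parities computed over small groups of nodes. This is only an illustration of low repair degree under assumed names and toy data; it is not the team's construction, and it does not attempt the additional bandwidth savings that codes with local regeneration provide.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_with_local_parity(data_blocks, group_size):
    """Split the blocks into local groups and append one XOR parity to each group."""
    groups = []
    for start in range(0, len(data_blocks), group_size):
        group = list(data_blocks[start:start + group_size])
        groups.append(group + [xor_blocks(group)])
    return groups


def local_repair(groups, group_index, lost_position):
    """Rebuild one lost block using only the other nodes in its own local group."""
    group = groups[group_index]
    helpers = [b for i, b in enumerate(group) if i != lost_position]
    return xor_blocks(helpers), len(helpers)


# Hypothetical layout: six data blocks in two local groups of three.
data = [b"aa", b"bb", b"cc", b"dd", b"ee", b"ff"]
groups = encode_with_local_parity(data, group_size=3)

# The node holding b"ff" fails; only its three group neighbours are contacted.
repaired, repair_degree = local_repair(groups, group_index=1, lost_position=2)
```

Here the repair touches three nodes instead of all six data nodes, at the cost of storing one extra parity block per group.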
The ever-increasing demand for data poses fresh challenges to the data storage industry. “The new classes of error-correcting codes that have arisen to meet this challenge include regenerating codes, locally repairable codes as well as variations and combinations of these two classes. These recent developments represent an important new direction for the rich field of error-correcting codes,” concludes Prof. Vijay Kumar. The team is on a mission to ensure that your precious data is safe and available when you need it! Here is wishing them the best.
Professor Vijay Kumar can be contacted at firstname.lastname@example.org