Mambo: Running Analytics on Enterprise Storage (presented at ATC 2015)

by Xing Lin, Gokul Soundararajan, Jingxin Feng, et al.

Summary: Big data broadly refers to large datasets, often in unstructured or varied formats, that cannot be processed effectively by traditional database systems. Businesses have turned to big data analytics tools such as Apache Hadoop to store and analyze these datasets. Apache Hadoop is a framework that enables distributed processing of large, varied datasets across clusters of computers using simple programming models. Its Hadoop Distributed File System (HDFS) provides high-throughput access to application data. Hadoop also offers integration points that can enhance specific workloads, storage efficiency, and data management.
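To make "simple programming models" concrete, below is a minimal sketch of the classic MapReduce word-count job written against Hadoop's Java API. This is standard illustrative background, not code from the Mambo paper; the input and output HDFS paths (/data/input, /data/output) are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each word in the input split, emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Hypothetical HDFS locations; a deployment would supply real paths.
    FileInputFormat.addInputPath(job, new Path("/data/input"));
    FileOutputFormat.setOutputPath(job, new Path("/data/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this ordinarily reads its input splits from HDFS; the question the paper's title raises is how the same analytics can run against data resident on enterprise storage instead.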