Scalable Data Management
My main research focus at the moment is the design of distributed and parallel data management systems for next-generation data center and enterprise cluster hardware. For example, high-speed RDMA-capable networks such as InfiniBand FDR/EDR used to be a very expensive technology that was deployed only in high-performance computing (HPC) clusters. However, InfiniBand has recently become cost-competitive with Ethernet and is becoming an attractive alternative for future data centers and enterprise clusters. Our initial results from building an InfiniBand-optimized system called I-Store show that this trend towards high-speed networks enables a new breed of distributed data management systems that achieve major performance gains over existing systems for both analytical and transactional workloads.
I have also been working on research problems that arise in large-scale data management scenarios in the cloud. To that end, my research group and I have been developing an open-source distributed data management system called XDB. XDB is designed to analyze data in parallel on cloud deployments composed of commodity hardware and slow networks. One of the major contributions of XDB is a locality-aware and elastic partitioning scheme that minimizes communication costs for data-intensive analytical workloads, resulting in a significant speed-up compared to existing partitioning schemes. Our research results in this area received a best paper award at the IEEE Big Data 2014 conference and a best demo award at the SIGMOD 2014 conference.
Interactive Data Exploration
Another line of research in the lab focuses on data management techniques that better support human-in-the-loop data exploration workloads over large amounts of data.
To that end, we are collaborating with the data management labs at Brown and MIT on a system called Vizdom, which allows users to visually compose and execute complex analytical workflows on interactive whiteboards (e.g., a Microsoft Surface device). The backend of the system extends state-of-the-art approximate query processing techniques to leverage perceptual effects in order to run computations at interactive speeds. This work received a best demo award at the VLDB 2015 conference.
Moreover, recent advances in deep learning have led to a new generation of natural language interfaces. However, understanding natural language questions and translating them accurately into SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. We are therefore working on DBPal, a novel data exploration tool with a robust natural language interface. DBPal leverages recent advances in deep neural network models to make query understanding more robust.
Other Research Projects
Other research projects in the areas of data integration as well as benchmarking and testing complement this profile.