Large-scale Data Management

My main research focus is currently the design of distributed and parallel data management systems for next-generation data center and enterprise cluster hardware. For example, high-speed RDMA-capable networks such as InfiniBand FDR/EDR used to be a very expensive technology that was deployed only in high-performance computing (HPC) clusters. However, InfiniBand has recently become cost-competitive with Ethernet and is becoming an interesting alternative for future data centers and enterprise clusters. Our initial results from building an InfiniBand-optimized system called I-Store show that this trend towards high-speed networks enables a new breed of distributed data management systems that achieve major performance gains over existing systems for both analytical and transactional workloads.

I have also been working on research problems that arise in large-scale data management scenarios in the cloud. To that end, my research group and I have been developing an open-source distributed data management system called XDB. XDB is designed to analyze data in parallel on cloud deployments composed of commodity hardware and slow networks. One of the major contributions of XDB is a locality-aware and elastic partitioning scheme that minimizes the communication costs for data-intensive analytical workloads, resulting in a significant speed-up compared to existing partitioning schemes. We received a best paper award at the IEEE Big Data 2014 conference and a best demo award at the SIGMOD 2014 conference for our research results in this area.

Human-in-the-loop Data Management

Another line of research at Brown focuses on data management techniques that better support human-in-the-loop data exploration and machine learning (ML) workloads over large amounts of data.

To that end, we have started to build a system called Vizdom that allows users to visually compose and execute complex analytical workflows on interactive whiteboards (e.g., a Microsoft Surface device). The backend of our system extends state-of-the-art approximate query processing techniques to leverage perceptual effects in order to run the computations at interactive speeds. This work received a best demo award at the VLDB 2015 conference.

Moreover, recent advances in automatic speech recognition and natural language processing have led to a new generation of robust voice-based interfaces. Yet, there is very little work on using voice-based interfaces to query database systems. In fact, one might even wonder who in her right mind would want to query a database system using voice commands! With this project, we make the case for querying database systems using a voice-based interface, a new querying and interaction paradigm we call Query-by-Voice (QbV). We also want to demonstrate the practicality and utility of QbV for relational DBMSs using a proof-of-concept system called EchoQuery.

Other Research Projects

Other research projects in the areas of data integration and of benchmarking and testing complement my profile.

Data Integration:

  • RODI/IncMap: Relational-to-Ontology Data Integration
  • U2: Query Processing over Unknown Unknowns

Benchmarking and Testing:

  • Benchmarking Cloud Databases
  • Test Data Generation