
Improving Data Integration Techniques

A Programmatic Grant

Helping Researchers Harness Multiple Data Sources 

As the number and diversity of available datasets rapidly increase, researchers often combine multiple data sources to answer important questions. The opportunities to link information across sources such as voting records, campaign contributions, IRS tax records, student test scores, court records, and electronic medical records are vast, and this kind of integration is a vital component of much cutting-edge empirical research.

The goal of our work is to enable researchers to integrate information from multiple datasets when no unique identifier links records across them. This problem is widely understood to be inherently uncertain and computationally difficult, and protecting the privacy of matched records can be critically important. Our team's interdisciplinary expertise and related experience will allow us to propose more principled solutions.
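To make the problem concrete, here is a toy sketch, not the team's method: two small hypothetical datasets share no unique identifier, so candidate pairs are scored on quasi-identifiers (name and birth year) using Python's standard-library `difflib.SequenceMatcher`. The records, the field names, the 0.7/0.3 weights, and the 0.8 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical toy datasets; neither carries a shared unique identifier.
voters = [
    {"name": "Jonathan Smith", "birth_year": 1980},
    {"name": "Maria Garcia", "birth_year": 1975},
]
donors = [
    {"name": "Jon Smith", "birth_year": 1980},
    {"name": "Mario Garcia", "birth_year": 1975},
]


def similarity(a, b):
    """Score in [0, 1] comparing two records; the weights are illustrative assumptions."""
    name_score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    year_score = 1.0 if a["birth_year"] == b["birth_year"] else 0.0
    return 0.7 * name_score + 0.3 * year_score


def candidate_links(left, right, threshold=0.8):
    """Return (left_index, right_index, score) for every pair scoring at or above threshold."""
    links = []
    for i, a in enumerate(left):
        # Comparing all pairs costs len(left) * len(right) comparisons.
        for j, b in enumerate(right):
            score = similarity(a, b)
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


if __name__ == "__main__":
    for i, j, score in candidate_links(voters, donors):
        print(voters[i]["name"], "<->", donors[j]["name"], score)
```

Even this small example reflects the difficulties described above. "Maria Garcia" and "Mario Garcia" are linked although they may be different people, so every match is uncertain. The all-pairs loop also grows with the product of the dataset sizes, which becomes expensive for large administrative records.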

We will base our work on four pillars: 

  1. Accuracy
  2. Computational efficiency
  3. Privacy protection
  4. Algorithmic fairness 

To raise awareness of data integration across the academic and research communities at Washington University, our team will also host an interdisciplinary speaker series in which experts in the field share their views and work on this topic.

Faculty Leads