Improving Data Integration Techniques
As the amount and diversity of available datasets rapidly increase, researchers often harness multiple data sources to answer important questions. The opportunities to combine information from these varied sources (voting records, campaign contributions, IRS tax records, student test scores, court records, electronic medical records, and so on) are limitless, and combining them is a vital component of cutting-edge empirical research.
Our goal is to enable researchers to integrate information from multiple datasets when no unique identifier linking records across them is available. This problem is widely understood to be inherently uncertain and computationally difficult, and preserving the privacy of matched records can be critically important. Our team’s interdisciplinary nature and related experience will allow us to propose more principled solutions.
Our work is based on the following pillars: accuracy, computational efficiency, privacy protection, and algorithmic fairness.
Faculty Leads: Kunal Agrawal, Ted Enamorado, and Soumendra Lahiri