Talk Title: Evaluating the Impact of Entity Resolution in Social Network Metrics

Watch Abby’s Research Lightning Talk

Talk Abstract: Modern databases are filled with potential duplicate entries caused by misspellings, changes of address, or differences in abbreviations. The probabilistic disambiguation of such entries is often referred to as entity resolution.

Entity resolution of individuals (nodes) in relational datasets is often viewed as a pre-processing step in network analysis. Studies in bibliometrics have indicated that entity resolution changes network properties in citation networks, but little research has used real-world social networks that vary in size and type. We present a novel perspective on entity resolution in networks, in which we propagate error from the entity resolution process into downstream network inferences. We also seek to understand how match thresholds in unsupervised entity resolution affect both global and local network properties, such as the degree distribution, centrality, transitivity, and motifs such as stars and triangles. We propose a calibration of these network metrics given measures of entity resolution quality, such as node “splitting” errors (one individual left as several nodes) and “lumping” errors (distinct individuals merged into one node).

We use a respondent-driven sample of people who use drugs (PWUD) in Appalachia and a longitudinal network study of Chicago-based young men who have sex with men (YMSM) to demonstrate the implications of entity resolution error for social and public health policy.
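
As a rough illustration of the threshold effect described in the abstract, the toy sketch below merges records whose pairwise match score meets a threshold and then recomputes a few network metrics. It is not the method presented in the talk. The record names, match scores, edges, and thresholds are all invented for illustration, and a simple union-find merge stands in for an unsupervised entity resolution procedure.

```python
from itertools import combinations

# Hypothetical records: "ann_1" and "ann_2" are assumed to be the same person,
# while "bob" and "dee" are assumed to be different people.
RECORDS = ["ann_1", "ann_2", "bob", "cy", "dee"]
EDGES = [("ann_1", "bob"), ("ann_2", "cy"), ("bob", "cy"), ("cy", "dee")]
SCORES = {("ann_1", "ann_2"): 0.9, ("bob", "dee"): 0.4}  # invented match scores


def resolve(records, scores, threshold):
    """Map each record to an entity by merging pairs with score >= threshold."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return {r: find(r) for r in records}


def collapse(edges, mapping):
    """Rebuild the network on resolved entities, dropping self-loops."""
    adj = {entity: set() for entity in set(mapping.values())}
    for a, b in edges:
        u, v = mapping[a], mapping[b]
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def count_triangles(adj):
    """Count closed triples of mutually connected entities."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def transitivity(adj):
    """3 * triangles / connected triples (0.0 when there are no triples)."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * count_triangles(adj) / triples if triples else 0.0


def summary(threshold):
    """Return (number of nodes, number of triangles, transitivity)."""
    adj = collapse(EDGES, resolve(RECORDS, SCORES, threshold))
    return len(adj), count_triangles(adj), transitivity(adj)


if __name__ == "__main__":
    for t in (0.95, 0.8, 0.3):
        print(t, summary(t))
```

In this toy case, a strict threshold (0.95) leaves the two "ann" records split, giving 5 nodes and no triangles. A threshold of 0.8 merges them and reveals one triangle, with transitivity 0.6. A loose threshold (0.3) also lumps "bob" and "dee" together, shrinking the network to 3 nodes and pushing transitivity to 1.0.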

Bio: Abby Smith is a Ph.D. Candidate in Statistics at Northwestern University. Her work centers on evaluating the impact of entity resolution error in social network inferences. She is particularly interested in collaborative research and data science for social good applications, and most recently served as a Solve for Good consultant at the mHealth nonprofit Medic Mobile. Abby is passionate about building community for women in statistics and data science in Chicago, and serves as a WiDS Ambassador and R-Ladies: Chicago board member. She holds a Master's in Statistical Practice and a B.S. in Mathematics, both from Carnegie Mellon.