Large-scale author name disambiguation using approximate network structures

Zeng, Tong1,2, Acuna, Daniel E.2,*

1School of Information Science, Nanjing University, China
2School of Information Studies, Syracuse University, Syracuse, NY, USA


Properly identifying the author of a scientific article is important for giving credit, tracking progress, and tracing the lineage of ideas. Publications and citations usually do not provide unique identifiers for authors, only the raw character-string representation of their name and affiliation. The fundamental problem is twofold: the same author may appear under different strings because of changes in name spelling (e.g., removing accents) or journal limitations (e.g., allowing only the first letter of the first name), and different authors may share the same name. Several researchers have proposed methods to solve this problem, but most of these methods do not scale well and are not open to the community. In this work, we develop a scalable author name disambiguation method for large-scale publication collections and make it publicly available.

