Predicting the longevity of resources shared in scientific publications

Daniel E. Acuna1,*, Jian Jian2, Tong Zeng3, Lizhen Liang4, Han Zhuang4

1The University of Colorado Boulder
2NetApp, Inc.
3Virginia Tech
4Syracuse University


Research has shown that most resources shared in articles (e.g., URLs to code or data) are not kept up to date and mostly disappear from the web after some years (Zeng et al., 2019). Little is known, however, about the factors that differentiate and predict the longevity of these resources. This article explores a range of explanatory features related to the publication venue, the authors, the references, and where the resource is shared. We analyze an extensive repository of publications and, through web archival services, reconstruct how the shared resources looked at different points in time. We find that the most important factors relate to where and how the resource is shared, while the authors' reputation and the prestige of the journal matter surprisingly little. Based on the places where long-lasting resources are shared, we suggest that it is critical to educate researchers on modern sharing technologies. Finally, we discuss implications for reproducibility and for acknowledging scientific datasets as first-class citizens of science.

Full Text

Please click here to read the full paper.

Code and Data

Please visit our GitHub repo for the code and data.