iConference 2021 workshop

Machine Learning and Artificial Intelligence for Science of Science and Computational Discovery: Principles, Applications, and Future Opportunities

Daniel E. Acuna¹, Tong Zeng², Han Zhuang¹, Lizhen Liang¹
¹School of Information Studies, Syracuse University, Syracuse, NY, USA
²School of Information Science, Nanjing University, China

Background

With the development of the Internet, scientific literature has been transformed into digital formats that are indexed, linked, and readily available. Together with other large-scale datasets produced by the scientific process, they form the “big scholarly data”. Recently, there has been an unprecedented release of these digital artifacts for researchers to explore, including the PubMed Open Access full-text dataset, the Microsoft Academic Graph citation dataset, the Crossref metadata dataset, and the Federal RePORTER funding dataset. These datasets offer tremendous opportunities to find relationships between various entities (e.g., funding agencies, institutions, researchers, citizens) and activities (e.g., grant applications, research workforce, publication).

To fully exploit these newly available scientific datasets, we need modern Machine Learning (ML) and Artificial Intelligence (AI) techniques to discover latent patterns and forecast future trends. ML and AI aim to develop algorithms that allow computers to learn from data without being explicitly programmed. These techniques can learn patterns in text, images, video, and audio, which makes them well suited to analyzing the large datasets that the Science of Science uses. They can also help scientists discover new ideas, predict future innovations, and validate results. However, ML and AI techniques and their applications remain largely unfamiliar to a portion of the researchers attending the iConference. This workshop aims to raise awareness of ML and AI among these researchers.

Purpose and Intended Audience

The Science of Science (SciSci) studies science itself with the scientific method. It investigates various aspects of the scientific process using quantitative methods to understand the organization, mechanism, evolution, impact, and improvement of scientific activities. Many of the guiding ideas of SciSci research can be traced back to the 1930s, taking inspiration from other fields such as Meta-Science, Meta-Knowledge, and Bibliometrics. The distinctive feature of SciSci is its use of large, heterogeneous datasets about the doing of science, including large citation networks, full-text articles, mentorship networks, and success measures (Fortunato et al., 2018; Acuna et al., 2012).

Similarly, advancements in computational techniques and datasets about science have allowed researchers to develop methods for Computational Discovery (CD): the partial automation of processes traditionally done by scientists, such as knowledge discovery, evaluation of ideas, and validation of results (Evans and Rzhetsky, 2010; more recently Tshitoyan et al., 2019).

To close the gap in ML and AI awareness, this half-day workshop will teach principles and techniques to a broad set of attendees. We will pay special attention to including historically under-represented disciplinary and demographic audiences. By the end of the workshop, attendees will have a good understanding of SciSci and CD, and will also grasp their limitations and the opportunities for future research.

The purpose of this workshop is to:

Introduce researchers to the Science of Science (SciSci) and Computational Discovery (CD) research communities
Demonstrate Machine Learning and Artificial Intelligence techniques and help interested researchers get started with them
Give practitioners of SciSci and CD multiple opportunities to interact and network with the organizers and their peers

Intended Audience:

Researchers from all areas who work on critical information issues that affect contemporary society.
These researchers include Information Scientists, Network Scientists, Data Scientists, Computer Scientists, and Librarians.
Programming experience is preferred but not required.

Workshop Schedule

10:00 AM - 10:10 AM: Welcoming: goals, format, speakers, and schedule of the workshop

10:10 AM - 11:00 AM: Introduction to Science of Science: A broad overview of the scale and growth of science vs. scientists, biases, novelty, and problems in peer review, issues of false results, and non-reproducible science

11:00 AM - 11:50 AM: Introduction to ML and AI: Models, Ideas and Applications

11:50 AM - 12:00 PM: Short Break

12:00 PM - 1:00 PM: Application of Machine Learning in Science of Science: Four Use Cases

1:00 PM - 2:00 PM: Discussion, Flash Talk, and Conclusion

Date

Wednesday, March 17

References

Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., … & Vespignani, A. (2018). Science of science. Science, 359(6379).
Evans, J., & Rzhetsky, A. (2010). Machine science. Science, 329(5990), 399-400.
Tshitoyan, V., et al. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95-98.