Organized by the University of Chicago’s Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship Program.

4:30pm – 5:15pm: Presentation
5:15pm – 5:30pm: Q&A
5:30pm – 6:00pm: Reception

Abstract: Many common AI algorithms rely on distance calculations. For example, Gaussian process regression makes predictions by weighting nearby points, k-nearest neighbors classifies a point based on the known labels of its neighbors, and uniform manifold approximation and projection reduces dimensionality while attempting to preserve relative distances in the lower-dimensional space. In chemistry, distance can be mapped directly to chemical similarity, so knowledge of chemical similarity can be used to find other molecules with similar properties or similar synthetic routes. For small molecules, the chemical structure is typically well defined, which enables a variety of established methods for computing similarity. In contrast, synthetic polymers are typically stochastic in nature: each polymer is characterized by an ensemble with distributions across molecular mass, topology, and sequence. Furthermore, these ensembles are not always known. Here, we consider two approaches to chemical similarity for polymers, depending on how much information is known. In the first case, where only a drawn structure is known, we represent the polymer as a graph and decompose that graph into three parts: repeat units, end groups, and topology. Similarity scores for the three parts are computed and linearly combined to yield a pairwise chemical similarity for polymers that can be tuned to the needs of the user. In the second case, where information about the distributions, such as molecular mass, is known, we adopt the Earth Mover’s Distance along with explicit calculation of distances between all constituents via Graph Edit Distance. In both cases, we demonstrate through a variety of case studies that our methods provide quantitative and intuitive values for polymer similarity. Ultimately, this similarity can be used for enhanced search in data resources such as the Community Resource for Innovation in Polymer Technology and to accelerate machine-learning-enabled polymer design through common AI algorithms for regression, classification, and dimensionality reduction.
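
As a rough illustration of the first approach described in the abstract, the linear combination of the three component scores might look like the sketch below. This is not the speaker's implementation: the function name `combined_similarity`, the default equal weights, the assumption that each component score lies in [0, 1], and the normalization by the sum of the weights are all hypothetical choices made here for illustration. How the component scores themselves are computed from the polymer graph is not covered.

```python
def combined_similarity(repeat_unit, end_group, topology,
                        weights=(1.0, 1.0, 1.0)):
    """Weighted linear combination of three component similarity scores.

    Assumptions (not from the abstract): each score lies in [0, 1], the
    weights are non-negative, and the result is normalized by the sum of
    the weights so that it also lies in [0, 1].
    """
    scores = (repeat_unit, end_group, topology)
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each component score must lie in [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    total = sum(weights)
    # Linear combination: the user-chosen weights tune how much each
    # part (repeat units, end groups, topology) contributes.
    return sum(w * s for w, s in zip(weights, scores)) / total


# Example: emphasize repeat units twice as much as the other two parts.
score = combined_similarity(0.8, 0.5, 1.0, weights=(2.0, 1.0, 1.0))
print(round(score, 3))
```

Changing the weights is what would make such a score tunable: a user who cares mainly about backbone chemistry could raise the repeat-unit weight, while one comparing architectures could raise the topology weight.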

Bio: Dr. Debra J. Audus is the project leader of the Polymer Analytics Project in the Materials Science and Engineering Division at the National Institute of Standards and Technology (NIST). She came to NIST in 2013 after receiving her PhD in Chemical Engineering from the University of California, Santa Barbara, and her BS from Cornell University. Her research focuses on machine learning, polymer databases, and the use of both theory and simulation to understand polymer physics. She is also involved in the polymer property predictor and database and the community resource for innovation in polymer technology.


Meeting location
William Eckhardt Research Center, Room 401
5640 S Ellis Avenue, Chicago, IL 60637

Campus North Parking
5505 S Ellis Ave