Discovery of Novel STEM Documents
Author: Tofara Moyo
Abstract
We present a scientific document discovery system inspired by molecular chemistry and AI-driven drug discovery. Our approach treats document tokens as atomic units, which are combined to form "molecular" representations of mathematical documents. We employ a probabilistic framework that maximizes the likelihood of forming coherent mathematical documents while minimizing the probability of random token combinations and of tokens from non-STEM documents. To achieve this, we develop a token embedding scheme that maps property vectors onto a musical keyboard, so that each token is represented as a musical chord. We further distinguish STEM from non-STEM documents by introducing a harmonic constraint on adjacent nodes in document graphs: in STEM documents, the vectors of adjacent nodes harmonize polyphonically, whereas in non-STEM documents they stand in dissonant relationships. Our system combines a graph neural network with a transformer decoder, trained end-to-end to generate STEM documents from input graphs. This approach has the potential to advance scientific document discovery and retrieval.
https://www.researchgate.net/publication/385937370_Discovery_of_Novel_STEM_Documents
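The chord embedding and the harmonic constraint described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify the encoding. The sketch assumes that a property vector is binary with one entry per key of a single 12-note octave, and that "harmonization" can be approximated by the fraction of note pairs that form a conventionally consonant interval. The names `chord_from_properties`, `consonance`, `graph_harmony`, and the set `CONSONANT_INTERVALS` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Assumption: intervals (in semitones, modulo 12) treated as consonant.
# Unison, minor/major thirds, perfect fourth/fifth, minor/major sixths.
CONSONANT_INTERVALS = {0, 3, 4, 5, 7, 8, 9}


def chord_from_properties(props):
    """Map a binary property vector to a chord (a set of pitch classes).

    Assumption: entry i of the vector corresponds to key i on the keyboard,
    folded into one octave with i % 12.
    """
    return {i % 12 for i, bit in enumerate(props) if bit}


def consonance(chord_a, chord_b):
    """Fraction of cross-chord note pairs that form a consonant interval."""
    pairs = list(product(chord_a, chord_b))
    if not pairs:
        return 0.0
    good = sum(1 for a, b in pairs if (a - b) % 12 in CONSONANT_INTERVALS)
    return good / len(pairs)


def graph_harmony(chords, edges):
    """Mean consonance over adjacent node pairs of a document graph.

    chords: dict mapping node id to its chord (set of pitch classes)
    edges:  iterable of (u, v) pairs of adjacent node ids
    """
    edges = list(edges)
    if not edges:
        return 0.0
    total = sum(consonance(chords[u], chords[v]) for u, v in edges)
    return total / len(edges)
```

Under these assumptions, a graph whose adjacent nodes all carry the same major triad scores 1.0, while a graph whose adjacent nodes are single notes a semitone apart scores 0.0. A score of this kind could serve as the quantity a harmonic constraint pushes up for STEM documents and down for non-STEM documents, although the abstract does not state how the constraint is actually formulated.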