This is a small repository exploring how Word2Vec models can be used to process data from gamma-ray spectroscopy experiments. It was adapted from TensorFlow's skip-gram tutorial: https://github.com/tensorflow/docs/blob/master/site/en/tutorials/text/word2vec.ipynb
In this project, a skip-gram language model is trained on spectroscopic data instead of natural language. Typically, a skip-gram model is trained on sets of "positive skip-grams" and "negative skip-grams": pairs of words that, respectively, do or do not co-occur within a short context window in written text.
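As a rough illustration of the natural-language case, the sketch below builds positive and negative skip-grams from a short token list using only the Python standard library. The function names `positive_skipgrams` and `negative_skipgrams`, the `window_size` and `num_samples` parameters, and the example sentence are hypothetical and are not taken from the notebook or the TensorFlow tutorial, which may sample negatives differently.

```python
import random


def positive_skipgrams(tokens, window_size):
    """Return (target, context) pairs whose positions differ by at most window_size."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def negative_skipgrams(tokens, positives, num_samples, seed=0):
    """Sample up to num_samples (target, context) pairs that never appear as positives.

    Assumption: negatives are drawn uniformly from all non-co-occurring pairs.
    Real Word2Vec implementations usually draw them from a frequency-based distribution.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    positive_set = set(positives)
    candidates = [
        (t, c) for t in vocab for c in vocab
        if t != c and (t, c) not in positive_set
    ]
    rng.shuffle(candidates)
    return candidates[:num_samples]


# Hypothetical example sentence, already split into tokens.
sentence = ["the", "wide", "road", "shimmered"]
positives = positive_skipgrams(sentence, window_size=1)
negatives = negative_skipgrams(sentence, positives, num_samples=3)
```

With a window size of 1, only adjacent words form positive pairs; every other pair of distinct words is a candidate negative.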
In gamma-ray spectroscopy, scientists use detectors to measure gamma rays emitted from a source nucleus and use those measurements to infer structural information about the nucleus. Among other data, these detectors can record the energy of each gamma ray and the time window in which it was emitted. Scientists then study which gamma-ray energies occurred in the same time window as other gamma-ray energies, and use these coincidences to make inferences.
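To make the idea of "energies that share a time window" concrete, here is a minimal sketch that groups detector events into windows. It assumes each event is a hypothetical `(timestamp, energy)` tuple and that windows are fixed-width bins of the timestamp. The actual data format and windowing used in the notebook may differ, and the numbers below are illustrative values in arbitrary units, not measurements.

```python
from collections import defaultdict


def group_by_time_window(events, window_width):
    """Group energies by fixed-width time window, in chronological window order.

    Assumption: an event belongs to window number floor(timestamp / window_width).
    """
    windows = defaultdict(list)
    for timestamp, energy in events:
        windows[int(timestamp // window_width)].append(energy)
    return [windows[index] for index in sorted(windows)]


# Illustrative (timestamp, energy) events; made up for this example.
events = [(0.1, 511), (0.4, 1274), (1.2, 662), (1.3, 511), (5.0, 1461)]
coincidence_windows = group_by_time_window(events, window_width=1.0)
```

Each inner list plays the role of a "sentence" whose "words" are the energies recorded together.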
This repository contains a notebook, 'alpha.ipynb', that illustrates one way that spectroscopic data of this type can be used to train a skip-gram language model, as well as how that model can be inspected. It is primarily intended as a proof of concept. In this example, we draw an analogy between words that occur in close proximity in written language and gamma rays that occur within the same time window. Consequently, we can form positive and negative skip-grams based on which gamma-ray energies are measured in close temporal proximity to other energies. Within this framework, we can train a skip-gram (Word2Vec) model that represents the full spectrum of detected gamma rays in an embedding space. Analysis of this embedding space may highlight significant relationships among gamma rays of different energies, which can be used to infer information about the atomic nucleus that generated the gamma rays.
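The analogy can be sketched in a few lines of Python. This is not the notebook's implementation: the function names, the choice to treat every ordered pair of energies in a window as a positive skip-gram, and the uniform sampling of negatives are all assumptions made for illustration, and the energy values are made up.

```python
import random
from itertools import permutations


def coincidence_positive_pairs(windows):
    """Treat every ordered pair of energies in the same time window as a positive skip-gram."""
    pairs = []
    for energies in windows:
        pairs.extend(permutations(energies, 2))
    return pairs


def sample_negative_pairs(windows, num_negatives_per_target, seed=0):
    """For each energy, sample energies that were never observed in the same window.

    Assumption: negatives are drawn uniformly from the non-coincident energies.
    """
    rng = random.Random(seed)
    vocab = sorted({energy for energies in windows for energy in energies})
    seen = set(coincidence_positive_pairs(windows))
    negatives = []
    for target in vocab:
        candidates = [c for c in vocab if c != target and (target, c) not in seen]
        k = min(num_negatives_per_target, len(candidates))
        negatives.extend((target, context) for context in rng.sample(candidates, k))
    return negatives


# Illustrative time windows, each listing the energies recorded together (made-up values).
windows = [[511, 1274], [662, 511], [1461]]
positive_pairs = coincidence_positive_pairs(windows)
negative_pairs = sample_negative_pairs(windows, num_negatives_per_target=1)
```

Pairs like these could then be labeled 1 (positive) or 0 (negative) and used to train the embedding in the same way word pairs are used in the text setting.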