Strathclyde Mechanical and Aerospace Research Toolboxes for Natural Language Processing
The code stored in this repository was used to generate the results of the paper "Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis", presented at the 2020 AIAA SciTech Forum in Orlando, USA in the Invited Session on Cognitive Assistants. The paper is available on researchgate.
The code allows to automatically extract a domain-specific lexicon from a domain-specific corpus, in this case, the corpus is related to space mission design. The lexicon items are embedded with word2vec and similar concepts are identified with cosine similarity.
The code stored in this repository was used to generate results presented at the International Astronautical Congress 2019, in Washington DC (USA), in the session on 'Knowledge Management for space activities in the digital era'. The original paper 'The automatic categorisation of space mission requirements for the Design Engineering Assistant' (IAC-19,D5,2,7,x51013) is available on researchgate.
The code allows to train and evaluate unsupervised, supervised and updated LDA models, a common Topic Modeling method, from a wikipedia-based space mission design corpus. The models can be evaluated via the categorisation of space mission requirements, extracted from freely-available European Space Agency mission documents.
The repository includes
- The functions to train an unsupervised, semi-supervised and updated LDA models,
- The functions to perform 5 cross-fold validation to optimise a model topic number,
- A first 'Space Mission Design' wikipedia corpus, the requirements corpus and the Natural Language Processing pipeline used to process them,
- Examples of unsupervised and semi-supervised topics dictionaries, labeled and validated by Human annotators,
- The 'Space' lexical priors, or seed words, used to train the semi-supervised LDA model, validated by Human annotators.
The code stored in this repository was used to generate results presented at the 9th International Systems & Concurrent Engineering for Space Applications Conference (SECESA 2020), a digital event held in October 2020. The paper 'From engineering models to knowledge graph: delivering new insights into models' is available on researchgate.
The code allows to automatically migrate Engineering Models based on the ECSS-E-TM-10-25A TM (in our case exported from the RHEA CDP4-CE platform), to a Grakn Knowledge Graph (KG). Code is also provided to infer a new type of relationship isIncludedInMassBudget within the graph, and automatically generate a dry mass budget for each design option. Finally, a pipeline to train a doc2vec model with the Gensim Python library, embed requirements sets found in the populated KG and assess their similarity with cosine similarity is provided.
Comment: The authors warmly thank Sabrina Mirtcheva and Serge Valera from ESA for kindly providing the ECSS requirements data set used to train the doc2vec model. This data set is based on the EARM_ECSS_export(DOORS-v0.7_May2019).xlsx document that can be found here.