Flows to fine-tune transformer models for a variety of downstream tasks using job adverts
This repo contains metaflows that train transformer models for both domain adaptation and a variety of downstream tasks using job adverts from Nesta's Open Jobs Observatory. With the permission of job board sites, we have been collecting online job adverts since 2021 and building algorithms to extract and structure information. We have collected millions of job adverts since the project's inception.
Although we are unable to share the raw data openly, we aim to build open source tools, algorithms and models that anyone can use for their own research and analysis. For example, we have built an open-source Skills Extractor library and have an open locations extraction repo.
This repo contains the metaflows used to fine-tune transformer models with job adverts for a variety of downstream tasks, including:
- next-sentence prediction
- masked language modelling
- skill semantic similarity
- named entity recognition
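To illustrate one of these tasks: masked language modelling fine-tunes a model to recover randomly hidden tokens. Below is a minimal, library-free sketch of how such training examples could be built from a job-advert sentence; the sentence, masking rate, and function name are illustrative, not taken from this repo's flows.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK]; return masked tokens and labels."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must predict the original token
        else:
            masked.append(tok)
            labels.append(None)  # unmasked positions are ignored in the loss
    return masked, labels

tokens = "We are hiring a senior data engineer in London".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

Real MLM pipelines (e.g. in the `transformers` library) work on subword tokens and sometimes substitute random tokens instead of `[MASK]`, but the training signal is the same: predict the hidden token from its context.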
The fine-tuned models (and their associated model cards) can be accessed via the Hugging Face Hub:
To run the flows, you will need to:

- Meet the data science cookiecutter requirements, in brief:
  - Install `direnv` and `conda`
  - Run `make install` to configure the development environment:
    - Set up the conda environment
    - Configure `pre-commit`
- Download the spaCy model: `python -m spacy download en_core_web_sm`
- Install PyTorch: `conda install pytorch torchvision -c pytorch` (if you are using macOS 13.4, use `pip install torch` instead)
- Set up batch processing with Metaflow
- Sign in to the Hugging Face Hub so that you can push models to it
- Run `export LC_ALL="en_GB.UTF-8"` in your terminal
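The prerequisite and locale steps above can be sanity-checked with a short Python sketch; the tool list is illustrative, and note that setting `os.environ` only affects the current process and its children, unlike the shell `export` above.

```python
import os
import shutil

# Check that the tools from the setup steps are on PATH (illustrative list)
for tool in ("direnv", "conda"):
    if shutil.which(tool) is None:
        print(f"missing: {tool}")

# Equivalent of `export LC_ALL="en_GB.UTF-8"` for this process only
os.environ["LC_ALL"] = "en_GB.UTF-8"
print(os.environ["LC_ALL"])
```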
However, if you simply want to use the models, please refer to the 💘 Using fine-tuned model checkpoints section.
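As a rough sketch of what using a checkpoint looks like, the `transformers` library can load a model and tokenizer directly from the Hub. The checkpoint id below is a placeholder, not a real model name; see the model cards on the Hub for the actual ids.

```python
# A minimal sketch of loading a fine-tuned checkpoint for inference.
# Assumes the `transformers` library is installed; the checkpoint id
# below is a placeholder, not a real model on the Hub.
try:
    from transformers import AutoModel, AutoTokenizer
except ImportError:  # transformers is an optional dependency here
    AutoModel = AutoTokenizer = None

CHECKPOINT = "nestauk/your-chosen-checkpoint"  # placeholder id

def load_checkpoint(name: str = CHECKPOINT):
    """Return (tokenizer, model) for a Hub checkpoint."""
    if AutoModel is None:
        raise RuntimeError("Install transformers first: pip install transformers")
    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)
```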
Technical and working style guidelines
Project based on Nesta's data science project template (Read the docs here).