Our most powerful recommendation engine for profiles and job offers.
Scoring technology helps you predict success and detect hidden gems: it lets you score your profiles and jobs by relevance across all your pools.
We developed a fair-by-design technology that leverages external benchmarks, market best practices, and internal knowledge about your company to help you build tailor-made, bias-free prediction models across all your jobs in any industry. These models also get smarter with every interaction and piece of feedback.
- **Tokenization:** Our tokenization method makes our models less prone to typing errors by leveraging information on subwords and multigrams (see the sketch after this list).
- **Vectorization:** Our vectorization method consists of a hierarchy of levels: a first one at the word level, using our in-house word embeddings pre-trained on the world's largest dataset of HR entities, and a second one that encodes paragraphs (sections in documents) using state-of-the-art Natural Language Processing models.
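To make the subword idea concrete, here is a minimal Python sketch of character n-gram ("multigram") tokenization. It is an illustration only, not our production tokenizer; the function name and default parameters are invented for the example.

```python
# Illustrative only: toy character n-gram ("multigram") tokenization.
def subword_tokens(word: str, n_min: int = 3, n_max: int = 5) -> set:
    """Split a word into character n-grams so that a typo only corrupts
    a few subwords instead of the whole token."""
    padded = f"<{word}>"  # boundary markers distinguish prefixes/suffixes
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }

# "developer" and the typo "develoepr" still share about a third of their
# subwords, whereas whole-word tokens would not match at all.
a, b = subword_tokens("developer"), subword_tokens("develoepr")
print(len(a & b) / len(a | b))  # Jaccard overlap despite the typo
```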
The scoring dataset plays a key role in the success of retraining a scoring engine. Here are the requirements on its form and content.
The folder structure of a scoring training dataset must look like this:
```
<dataset_dir>/
├── resumes/
│   ├── 00.<resume_ext>
│   ├── 00.json
│   ├── 01.<resume_ext>
│   ├── 01.json
│   └── ...
├── jobs/
│   ├── 00.json
│   ├── 01.json
│   └── ...
├── hiring_process.<ext>
└── status.json
```
- The subfolder _resumes_ contains:
  - Resumes in any supported format (PDF, DOCX, images, and more), saved as <resume_id>.<resume_ext>
  - Any relevant additional information (metadata) about a resume (e.g. its category, if it has been categorized), stored in JSON format as <resume_id>.json
- The subfolder _jobs_ contains job objects stored as JSON files
- The hiring_process.<ext> file fully describes the various statuses an application can go through. A diagram showing how application statuses are linked to each other is essential.
- The status.json file records the relationships between the resumes and the job offers; a quick layout check is sketched below.
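As a quick sanity check on this layout, a few lines of Python can flag missing pieces. This is a minimal sketch under the folder structure described above, not an official validation tool; the function name is ours.

```python
# Minimal sketch: flag deviations from the dataset layout described above.
from pathlib import Path

def check_dataset_layout(dataset_dir: str) -> list:
    root = Path(dataset_dir)
    problems = []
    if not (root / "status.json").is_file():
        problems.append("missing status.json")
    if not list(root.glob("hiring_process.*")):
        problems.append("missing hiring_process.<ext>")
    for subfolder in ("resumes", "jobs"):
        if not (root / subfolder).is_dir():
            problems.append(f"missing subfolder {subfolder}/")
    resumes = root / "resumes"
    if resumes.is_dir():
        # every resume file needs a matching <resume_id>.json metadata file
        for resume in resumes.iterdir():
            if resume.suffix != ".json" and not resume.with_suffix(".json").is_file():
                problems.append(f"no metadata JSON for {resume.name}")
    return problems

print(check_dataset_layout("<dataset_dir>"))  # [] means the layout looks fine
```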
Given the following textbook hiring process: screening -> interview -> hiring, status.json might look like this:
```json
[
  {"job_id": "00", "resume_id": "00", "status": "screening"},
  {"job_id": "00", "resume_id": "01", "status": "discarded_after_screening"},
  {"job_id": "00", "resume_id": "02", "status": "interview"},
  {"job_id": "00", "resume_id": "03", "status": "discarded_after_interview"},
  {"job_id": "00", "resume_id": "04", "status": "hired"},
  ...
]
```
The first elements of this example show five different applications to the same job offer (id "00").
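Assuming the status.json format shown above, tallying applications per job and per status gives a useful first look at a dataset:

```python
# Sketch: tally application statuses per job offer from status.json.
import json
from collections import Counter, defaultdict

with open("status.json") as f:
    applications = json.load(f)

per_job = defaultdict(Counter)
for app in applications:
    per_job[app["job_id"]][app["status"]] += 1

for job_id, counts in sorted(per_job.items()):
    print(job_id, dict(counts))
# e.g. 00 {'screening': 1, 'discarded_after_screening': 1, ...}
```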
Particular attention must be paid to the statistical biases of the dataset. These biases originate from the overrepresentation of some categories of data (e.g. a dataset containing mostly IT profiles, mostly senior profiles, or more male than female candidates).
Without any safety precautions, a deep learning model naturally tends to inadvertently leverage such biases to better fit the data.
In order to successfully retrain an unbiased scoring engine, the requirements are:
- Provide all the relevant side information about the applicants in the _<resume_id>.json_ files
- Provide all the relevant side information about the jobs; it should be available in the job objects (metadata)
- Provide enough diversity of data with regard to all the side information available (a quick way to inspect this is sketched after this list)
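A practical way to spot overrepresentation is to print the distribution of each side-information field. The sketch below does this for the resume metadata files; the field name "seniority" is purely hypothetical and should be replaced by the fields your <resume_id>.json files actually contain.

```python
# Sketch: distribution of one (hypothetical) metadata field across resumes.
import json
from collections import Counter
from pathlib import Path

def field_distribution(dataset_dir: str, field: str) -> Counter:
    counts = Counter()
    for meta_file in Path(dataset_dir, "resumes").glob("*.json"):
        metadata = json.loads(meta_file.read_text())
        counts[metadata.get(field, "<missing>")] += 1
    return counts

# "seniority" is a placeholder field name, not part of the required schema.
print(field_distribution("<dataset_dir>", "seniority"))
```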
In addition, when retraining a scoring engine, a minimum of:
- 10k unique candidates
- 20k unique jobs
- 3k applications in hired status
is highly recommended. A quick way to check these volumes against status.json is sketched below.
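Assuming the same status.json format as above, these volumes can be checked in a few lines (unique jobs could equally be counted from the jobs/ subfolder):

```python
# Sketch: check the recommended minimum volumes from status.json.
import json

with open("status.json") as f:
    applications = json.load(f)

unique_candidates = {app["resume_id"] for app in applications}
unique_jobs = {app["job_id"] for app in applications}
hired = sum(1 for app in applications if app["status"] == "hired")

print(f"unique candidates:  {len(unique_candidates)} (>= 10k recommended)")
print(f"unique jobs:        {len(unique_jobs)} (>= 20k recommended)")
print(f"hired applications: {hired} (>= 3k recommended)")
```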