Skill Extraction Green
Annotations of relevant entities related to different skill categories
Description
This release is an adaptation of the Green dataset (https://github.com/acp19tag/skill-extraction-dataset). The original dataset annotates relevant entities in several skill-related categories, such as Skill, Qualification, Experience, Occupation, and Domain. We adapted it by remapping the existing annotations onto only two categories, Skill and Occupation, in order to improve the model’s decision process.
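The remapping described above can be pictured with a minimal sketch. It assumes BIO-style tags such as `B-Skill` and `I-Skill`, which may not match the actual file format. The mapping table is hypothetical: this description does not state which original categories were folded into Skill, which into Occupation, and which (if any) were dropped. The names `EXAMPLE_MAPPING` and `remap_tags` are illustrative only and are not part of the release.

```python
from typing import Dict, List

# Hypothetical mapping, for illustration only: the dataset description does not
# specify how each original category was remapped.
EXAMPLE_MAPPING: Dict[str, str] = {
    "Skill": "Skill",
    "Qualification": "Skill",
    "Experience": "Skill",
    "Domain": "Skill",
    "Occupation": "Occupation",
}


def remap_tags(tags: List[str], mapping: Dict[str, str]) -> List[str]:
    """Remap BIO tags onto a reduced category set.

    Tags whose category is missing from `mapping`, and tags without a
    BIO prefix, are turned into "O" (outside any entity).
    """
    remapped = []
    for tag in tags:
        if "-" not in tag:
            # Covers "O" and any tag that does not follow the B-/I- scheme.
            remapped.append("O")
            continue
        prefix, category = tag.split("-", 1)
        target = mapping.get(category)
        remapped.append(f"{prefix}-{target}" if target else "O")
    return remapped


example = ["B-Occupation", "I-Occupation", "O", "B-Qualification", "I-Qualification"]
print(remap_tags(example, EXAMPLE_MAPPING))
```

Keeping the `B-`/`I-` prefix unchanged means that two adjacent entities from different original categories stay separate spans even when both end up in the same target category.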
Reference
If you use this dataset, please cite the following publications:
Vásquez-Rodríguez, L., Audrin, B., Michel, S., Galli, S., Rogenhofer, J., Negro Cusa, J., & van der Plas, L. (2025). Skill Extraction from Resumes and Job Offers across Six Languages. (Submitted to EACL 2026).
@INPROCEEDINGS{Vasquez-Rodriguez_RECSYSINHR24_2024,
author = {V{\'{a}}squez-Rodr{\'{\i}}guez, Laura and Audrin, Bertrand and Michel, Samuel and Galli, Samuele and Rogenhofer, Julneth and Negro Cusa, Jacopo and van der Plas, Lonneke},
projects = {Idiap, SEM24},
month = jul,
title = {Hardware-effective Approaches for Skill Extraction in Job Offers and Resumes},
journal = {CEUR Workshop Proceedings},
booktitle = {The 4th Workshop on Recommender Systems for Human Resources, in conjunction with the 18th ACM Conference on Recommender Systems},
volume = {3788},
year = {2024},
url = {https://ceur-ws.org/Vol-3788/RecSysHR2024-paper_9.pdf},
pdf = {https://publications.idiap.ch/attachments/papers/2024/Vasquez-Rodriguez_RECSYSINHR24_2024.pdf}
}