Skill Extraction
A collection of hard skill entities extracted from a corpus of resumes
Description
This dataset is a collection of hard skill entities extracted from a corpus of resumes. It is designed to benchmark how skill extraction by human annotators differs from skill extraction by automatic systems. The resource contains two types of labels:
- Human-Annotated Labels: Created during an organized student workshop at the EHL Business School. Multiple annotations per CV were collected to establish a reliable consensus for the ground truth.
- Automatic System Labels: Generated by a state-of-the-art supervised machine learning system and by a conversational LLM (see the publication listed under Reference).
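As a rough illustration of how the two label types could be compared, the sketch below builds a consensus from several human annotations of one resume and scores a system's output against it. Everything in it is an assumption made for illustration: the majority-vote rule, the lowercase normalization, the function names `consensus_skills` and `precision_recall_f1`, and the toy skill lists are all hypothetical. They do not reflect the dataset's actual file format or the consensus procedure described in the related paper.

```python
from collections import Counter


def consensus_skills(annotations, min_votes=None):
    """Return the skills selected by at least `min_votes` annotators.

    `annotations` is a list with one list of skill strings per annotator.
    By default a strict majority is required (an assumed rule, not
    necessarily the one used to build this dataset).
    """
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1
    # Normalize and deduplicate per annotator so each annotator
    # contributes at most one vote per skill.
    counts = Counter(
        skill
        for annotator in annotations
        for skill in {s.strip().lower() for s in annotator}
    )
    return {skill for skill, votes in counts.items() if votes >= min_votes}


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of predicted skills against gold."""
    predicted = {s.strip().lower() for s in predicted}
    gold = {s.strip().lower() for s in gold}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical annotations for a single resume (not taken from the dataset).
human_annotations = [
    ["Python", "SQL", "Excel"],
    ["python", "SQL"],
    ["Python", "Tableau"],
]
# Hypothetical output of an automatic system for the same resume.
system_labels = ["Python", "SQL", "Java"]

gold = consensus_skills(human_annotations)
precision, recall, f1 = precision_recall_f1(system_labels, gold)
print(sorted(gold), precision, recall, f1)
```

Exact string matching after lowercasing is the simplest possible comparison. Partial or span-level matching would give different scores and may be closer to what a real evaluation needs.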
Reference
If you use this dataset, please cite the following publication:
Vásquez-Rodríguez, L., Audrin, B., Michel, S., Galli, S., Rogenhofer, J., Negro Cusa, J., & Van Der Plas, L. (2025). A human perspective to AI-based candidate screening. Proceedings of the 58th Hawaii International Conference on System Sciences.