Tools

Job Searcher: A Distilled Model for Resume-Aware Job Matching

Hugging Face released a fine-tuned 8B model that filters LinkedIn job postings by matching them against candidate resumes, using structured reasoning from a larger teacher model.

Last verified:

Graduate job searches often devolve into mechanical clicking—posting after posting reviewed with diminishing care as weeks pass. Hugging Face released Job Searcher, a fine-tuned model that inverts this workflow: instead of candidate-driven screening, the model filters and ranks opportunities based on structured reasoning about fit. According to the Hugging Face Blog, the system combines a larger teacher model (DeepSeek V4 Pro) for data labeling with a smaller inference-time student (Qwen3-8B quantized to Q4_K_M) that runs on a single GPU slice.

Closed-Loop Data Construction Avoids Distribution Shift

The engineering insight underlying Job Searcher is deliberate: the training corpus was built in a resume-aware, end-to-end loop that mirrors real job-search behavior. The Hugging Face post describes the process: 2,500 resumes from the Divyaamith/Kaggle-Resume dataset fed into DeepSeek V4 Pro, which generated LinkedIn-shaped search queries. JobSpy then scraped the actual results for those queries, yielding approximately 10,000 real postings. This design ensures that the student model trains on jobs that genuinely surface in response to candidate profiles—not arbitrary postings—reducing the risk of distribution mismatch between training and deployment.

Five-Dimensional Scoring with Inline Reasoning

According to the Hugging Face Blog, the model assigns each (resume, job) pair a fit score across five axes: skills match, experience relevance, education and certifications, industry and domain fit, and seniority alignment. Each dimension includes one sentence of reasoning, making the shortlist transparent rather than opaque. The output is not a ranked list of fifty candidates but a curated shortlist with defensible explanations—the model articulates why it prefers the second-ranked position over the third.

Training Methodology and Deployment

The Hugging Face post reports that training consisted of two LoRA fine-tuning runs on a single A100 GPU via Modal. Adapter configuration used rank-16 projections across attention and multi-layer perceptron layers, dropout disabled, one epoch per task, with mid-epoch checkpoints every 200 steps. The resulting Qwen3-8B adapter ships in safetensors format and a quantized Q4_K_M GGUF variant, both available at build-small-hackathon/job-searcher-qwen3-8B, enabling single-slice deployment.

Why This Matters

Job-matching products built in-house or via third-party API face a choice: license a large closed-source model (with latency and cost overhead) or adopt an open-weights alternative. Job Searcher demonstrates that an 8B distilled model, trained on structured reasoning from a larger peer and fine-tuned on resume-aware data, can replicate the scoring logic of larger systems while running on commodity GPU infrastructure. For AI engineers building applicant-tracking integrations or candidate-sourcing tools, this release reduces the barrier to deploying reasoning-enabled filtering without external API dependencies. The open-weights nature also allows custom re-training on proprietary resume corpora or domain-specific job taxonomies—a degree of control that closed-API alternatives do not permit.

Frequently Asked Questions

What models does Job Searcher use?

According to the Hugging Face Blog, the system employs DeepSeek V4 Pro as an offline label generator and Qwen3-8B as the inference-time model. The Qwen model is quantized to Q4_K_M and runs on a single ZeroGPU slice.

How was the training data constructed?

The Hugging Face post describes a closed-loop pipeline: 2,500 resumes from a public Kaggle dataset, synthetic LinkedIn-shaped queries generated by DeepSeek V4 Pro, ~10,000 real job postings scraped via JobSpy, and five-dimensional labels scored by DeepSeek for each (resume, job) pair.

What are the five evaluation dimensions?

According to the blog, the model scores jobs on skills alignment, experience relevance, education and certifications, industry and domain fit, and seniority alignment. Each dimension includes one sentence of reasoning.

How much compute was required to train the model?

The Hugging Face Blog reports that two LoRA fine-tuning runs used a single A100 GPU via Modal, with rank-16 adapters, one epoch per task, and mid-epoch checkpoints every 200 steps.

#model-distillation #job-matching #fine-tuning #qwen #deepseek