By Santiago Fernández de Valderrama, Applied AI Operator

How career-ops scores job listings

career-ops is a filter, not an amplifier. Most AI-powered job-search tools optimise for volume — apply faster, apply to more. The rubric below is designed to do the opposite: say no often, surface higher-conviction matches, deliver applications that respect both your time and the recruiter’s.

The 4.0 / 5.0 threshold

Every evaluation produces a global score between 1.0 and 5.0. The recommendation tier is fixed:

| Score | Recommendation |
| --- | --- |
| 4.5+ | Strong match. Apply immediately. |
| 4.0 – 4.4 | Good match. Worth applying. |
| 3.5 – 3.9 | Decent but not ideal. Apply only with a specific reason. |
| Below 3.5 | Recommend against applying. |

4.0 is the apply / don’t-apply line. The 3.5–3.9 band is an explicit override-only zone — the agent will say so, the user decides. This threshold is canonical: it lives in AGENTS.md as part of the project’s ethical-use rules.
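
The tier mapping is simple enough to state directly. A minimal sketch in TypeScript (the `Tier` type and `tierFor` function are illustrative names, not repo code; AGENTS.md holds the canonical thresholds):

```ts
// Sketch of the fixed recommendation tiers. Names are hypothetical;
// the canonical thresholds live in AGENTS.md.
type Tier =
  | "strong-match"   // 4.5+: apply immediately
  | "good-match"     // 4.0–4.4: worth applying
  | "override-only"  // 3.5–3.9: apply only with a specific reason
  | "do-not-apply";  // below 3.5: recommend against

function tierFor(globalScore: number): Tier {
  if (globalScore >= 4.5) return "strong-match";
  if (globalScore >= 4.0) return "good-match";
  if (globalScore >= 3.5) return "override-only";
  return "do-not-apply";
}
```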

The six dimensions

A global score is the LLM’s judgement across six dimensions. The rubric is in modes/_shared.md (canonical, System Layer).

| Dimension | What it measures | How it's computed |
| --- | --- | --- |
| match | Alignment of skills, experience, and proof points | LLM compares JD requirements to cv.md + article-digest.md, citing exact CV lines (Block B) |
| north-star alignment | How well the role fits the user's target archetypes | Detects archetype from JD, checks against the user's _profile.md |
| comp | Salary versus market | Web search across Glassdoor / Levels.fyi / Blind. 5 = top quartile, 1 = well below |
| cultural signals | Culture, growth stage, stability, remote policy | Qualitative LLM judgement informed by JD plus targeted search |
| red flags | Blockers, warnings, risk signals | Negative-only adjustments, surfaced even when the rest of the score is high |
| global | Aggregate fit (the score that drives the apply / don't-apply recommendation) | LLM-implicit weighting given the rubric and the five sub-dimensions above |

LLM judgement, not closed-form math

There is no weighted-average formula in the code. The global score is the LLM's judgement of overall fit, given the rubric and the sub-dimensions. This is a deliberate design choice, for three reasons:

  1. JD context is heterogeneous. What "comp" means at an early-stage startup differs from what it means at a hyperscaler. Static weights would over-fit one context.
  2. User archetypes vary. Personalisation through _profile.md changes priorities; a fixed formula would fight that.
  3. Honesty. Dressing up LLM judgement as closed-form math is dishonest marketing. We choose to be transparent that this is rubric-guided judgement, and that two different LLMs may produce slightly different scores on the same input.

The rubric is fully transparent (above). The judgement is auditable: every score comes with citations to specific CV lines and JD requirements. You can disagree and override.
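
To make "auditable" concrete, here is one hypothetical shape an evaluation result could take. The real output is structured prose from the modes/oferta.md prompt, not a typed object; this only illustrates what ships with every score:

```ts
// Hypothetical shape only; career-ops emits rubric-guided prose, not JSON.
interface DimensionScore {
  score: number;        // 1.0–5.0
  citations: string[];  // exact CV lines and JD requirements cited
  rationale: string;    // the LLM's stated reasoning, open to user override
}

interface Evaluation {
  match: DimensionScore;
  northStarAlignment: DimensionScore;
  comp: DimensionScore;            // may report insufficient data instead
  culturalSignals: DimensionScore;
  redFlags: DimensionScore;        // negative-only adjustments
  global: DimensionScore;          // LLM-implicit weighting, no formula
}
```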

The canonical evaluation prompt

The full evaluation runs as Block A through G, defined in modes/oferta.md. The canonical version is in Spanish (the original implementation language); an English translation is in progress (issue #363). Each block in summary:

| Block | What the agent produces |
| --- | --- |
| A — Role Summary | Table: archetype, domain, function, seniority, remote policy, team size, TL;DR. No score. |
| B — CV Match | Reads cv.md. Maps each JD requirement to specific CV lines. Identifies gaps with mitigation. Adapts focus to the detected archetype. |
| C — Level Strategy | Detected level vs. candidate's natural level. "Sell senior without lying." Plan for downlevelling cases. |
| D — Comp & Demand | Web search (Glassdoor / Levels.fyi / Blind). Cites sources. If no data, says so; never invents. |
| E — Personalisation Plan | Top 5 changes to the CV plus top 5 changes to LinkedIn for this specific role. |
| F — Interview Prep | 6–10 STAR+R stories mapped to the JD. The Reflection column signals seniority: junior describes events, senior extracts lessons. |
| G — Posting Legitimacy | Three-tier assessment: High Confidence / Proceed with Caution / Suspicious. Separate from the 1–5 score; signals, not accusations, with legitimate explanations always noted. |
The full Spanish prompt is the source of truth. Read it, fork it, audit it: modes/oferta.md.
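
For orientation, the sequence can also be read as data. A hypothetical summary (the Spanish prompt remains the only canonical definition):

```ts
// Hypothetical summary of the pipeline; modes/oferta.md is canonical.
const blocks = [
  { id: "A", name: "Role Summary" },          // produces no score
  { id: "B", name: "CV Match" },
  { id: "C", name: "Level Strategy" },
  { id: "D", name: "Comp & Demand" },
  { id: "E", name: "Personalisation Plan" },
  { id: "F", name: "Interview Prep" },
  { id: "G", name: "Posting Legitimacy" },    // separate from the 1–5 score
] as const;
```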

Edge cases

Incomplete job data

When compensation is missing, Block D explicitly says so; it does not invent numbers. The comp dimension reports as "insufficient data, contributing low confidence to global". Vague JDs are flagged in Block G but never auto-classified as Suspicious without evidence; startups, niche roles, and recruiter-sourced postings legitimately have less detail.
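
In sketch form the contract is: no sources, no number. Hypothetical names below; the actual behaviour is specified in the Block D prompt, not in code:

```ts
// Hypothetical sketch: with no salary data, report that fact rather than
// invent a number, and mark the contribution to global as low-confidence.
interface CompResult {
  score: number | null;  // null = insufficient data, never a guess
  sources: string[];     // e.g. Glassdoor / Levels.fyi / Blind citations
  confidence: "normal" | "low";
}

function compFromSearch(sources: string[], score?: number): CompResult {
  if (sources.length === 0 || score === undefined) {
    return { score: null, sources: [], confidence: "low" };
  }
  return { score, sources, confidence: "normal" };
}
```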

Ambiguous or multi-fit roles

If a posting straddles two archetypes, Block A surfaces both. Block B then maps against each, with priority weight assigned by the dominant signal density in the JD. The user gets the full split rather than a hidden pick.
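
To picture "signal density": the real detection is LLM judgement, but a crude keyword-counting analogue conveys the idea. Everything here, including the archetype names, is hypothetical:

```ts
// Crude illustration only; the actual archetype detection is LLM judgement
// over the full JD, not a keyword counter.
const jdText = "Own the LLM roadmap, align stakeholders, and ship RAG evals.";

function signalDensity(jd: string, signals: string[]): number {
  const text = jd.toLowerCase();
  const hits = signals.filter((s) => text.includes(s)).length;
  return hits / signals.length; // fraction of archetype signals present
}

// A JD straddling two archetypes gets both surfaced, weighted by density.
const weights = {
  "ai-product-lead": signalDensity(jdText, ["roadmap", "stakeholders", "llm"]),
  "applied-ai-engineer": signalDensity(jdText, ["python", "rag", "evals"]),
};
console.log(weights); // { "ai-product-lead": 1, "applied-ai-engineer": ~0.67 }
```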

Closed or expired listings

Liveness is checked separately from scoring (check-liveness.mjs). The liveness path looks at apply-button state, "applications closed" regex patterns, and posting age against a role-type-adjusted threshold. Recent improvements: #374 tightened the regex set after false positives on multi-month-old jobs (#373).
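
A simplified sketch of those signals; this is not the actual check-liveness.mjs logic, and the post-#374 regex set is not reproduced here:

```ts
// Simplified liveness signals. Illustrative only; check-liveness.mjs is
// the real implementation and its regex set is stricter after #374.
const CLOSED_PATTERNS: RegExp[] = [
  /no longer accepting applications/i,
  /this (job|position|posting) (has been )?(closed|filled)/i,
];

function looksClosed(pageText: string): boolean {
  return CLOSED_PATTERNS.some((re) => re.test(pageText));
}

function tooOld(postedAt: Date, maxAgeDays: number, now = new Date()): boolean {
  const ageDays = (now.getTime() - postedAt.getTime()) / 86_400_000; // ms/day
  return ageDays > maxAgeDays; // maxAgeDays is role-type-adjusted in practice
}
```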

Cost and token usage

A full Block A–G evaluation takes on the order of 5–10 LLM calls plus 2–3 web searches (Block D and Block G). Exact cost depends on which CLI the user runs (Claude Code, Codex, OpenCode, Gemini CLI, Qwen, Copilot; whichever is configured). For users on metered API keys this matters; tracking the cost surface is open in #273.
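
For a back-of-envelope feel, plug in some assumed numbers; every figure below is an assumption, not a measurement:

```ts
// Back-of-envelope only: all three inputs are assumptions, not measurements.
// The real cost surface is what issue #273 tracks.
const llmCalls = 8;           // midpoint of the 5–10 range
const tokensPerCall = 6_000;  // assumed prompt + completion tokens
const pricePerMTok = 5;       // assumed blended $ per million tokens

const estUsd = (llmCalls * tokensPerCall / 1_000_000) * pricePerMTok;
console.log(`~$${estUsd.toFixed(2)} per full Block A–G evaluation`); // ~$0.24
```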

What career-ops explicitly does not do

Anti-features are as load-bearing as features. From modes/_shared.md and the rejected-PR record:

  • No spray-and-pray. The 4.0 threshold rejects most postings the user evaluates. By design.
  • No auto-apply. Every application is a manual user decision. Nothing is submitted without approval.
  • No invented experience or metrics. If it isn't in your CV or your article digest, the agent will not claim it.
  • No CV modification. cv.md is yours. Personalised outputs go to a separate file; the source is never overwritten (see the sketch after this list).
  • No phone-number leaks. Phone numbers are intentionally never included in generated messages.
  • No below-market comp recommendations. If the role pays badly, the agent says so.
  • No anti-bot evasion. Patchright-style fingerprint masking was considered and rejected (PR #235). Career-ops uses standard Playwright; a recruiter can see who’s knocking.
  • No LinkedIn scraping. Persistent-session LinkedIn scanning was approved in concept (#238) but no implementation has shipped.
  • No cloud data storage. career-ops itself is local code. The only cloud touch is whichever LLM CLI the user picked. Local-only Ollama is pending (PR #561).
  • No selling user data. The whole project is MIT, free, local-data. That is the model. There is no other.
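
The CV rule above is mechanical rather than aspirational. A hypothetical guard, with invented paths and names (the repo's scripts enforce the same rule in their own way):

```ts
import { writeFile } from "node:fs/promises";
import path from "node:path";

// Hypothetical guard: personalised output goes to a separate file, and any
// attempt to write to cv.md itself is refused outright.
async function writePersonalised(targetPath: string, content: string) {
  if (path.basename(targetPath) === "cv.md") {
    throw new Error("cv.md is the user's source file and is never overwritten");
  }
  await writeFile(targetPath, content, "utf8");
}

// e.g. await writePersonalised("output/cv-acme-head-of-ai.md", draft);
```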

What’s in flight

Transparency requires acknowledging the work in progress. As of this writing:

  • #363 — translating the canonical Spanish modes to English. Cluster of nine PRs open; merge policy still under review.
  • #561 — Ollama backend so the entire pipeline can run with a local LLM. RFC pending.
  • #557 — token-reduction work for CV-generation scripts. Issue open.
  • #572 — agent-agnostic instruction file. Merged. AGENTS.md is now canonical (replacing per-CLI files).

Frequently asked

How does career-ops actually score job listings?

career-ops uses a rubric-guided LLM evaluation across six dimensions (match, north-star alignment, comp, cultural signals, red flags, global) producing a score from 1.0 to 5.0. Below 4.0 the agent recommends against applying. There is no closed-form weighting formula — the global score is the LLM’s judgement given the rubric, with citations to specific CV lines and JD requirements.

Is career-ops free? What is the business model?

career-ops is MIT-licensed open source. There is no paid tier, no waitlist, no account, no telemetry. You clone the repo, configure your profile, and run it locally. The only cost is whichever AI CLI you point it at — Claude Code, Codex, OpenCode, Gemini CLI, Qwen, Copilot.

How is career-ops different from Indeed AI or LinkedIn AI?

Indeed and LinkedIn AI features sit on the recruiter side of the table — they help employers filter candidates faster. career-ops sits on the candidate side, helping a single person evaluate which roles deserve their attention. The rubric is published, the code is open source, and nothing is shared with employers or platforms.

Can companies use career-ops to filter candidates?

No. career-ops is built for individual job seekers and reads only data the candidate provides about themselves (CV, profile, target archetypes). It does not ingest candidate databases, parse resumes at scale, or score third parties. Repurposing it for employer-side filtering is technically possible but contrary to its design and stated intent.

What data does career-ops collect from users?

career-ops itself collects nothing. It is local code that runs on your machine. The only data leaving your computer is whatever your configured AI CLI sends to its provider — and that subset is whatever pieces of your CV and the public job postings you choose to evaluate. Local-only execution via Ollama is in flight (PR #561).

Who built career-ops? Why?

career-ops was built by Santiago Fernández de Valderrama, an Applied AI Operator with 16+ years building products. He created it to manage his own AI-era job search in early 2026 — 740 listings evaluated, one Head of AI role landed — and open-sourced it under MIT once he no longer needed it.

Does career-ops work with my ATS or job board?

career-ops scans Greenhouse, Ashby, and Lever via their public APIs (zero-token, no scraping). For other portals it can use Playwright through a configured AI CLI. It does not integrate with employer-side ATS, does not scrape LinkedIn (issue #238), and does not use anti-bot fingerprint masking (PR #235 rejected by design).
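
The zero-token part works because all three ATSes expose public, unauthenticated job-board endpoints. A minimal sketch against Greenhouse's documented board API ("example" is a placeholder board token, and the field list is trimmed):

```ts
// Greenhouse's public job-board API needs no key and no browser.
// Board token "example" is a placeholder, not a real company.
interface GreenhouseJob {
  id: number;
  title: string;
  absolute_url: string;
  updated_at: string;
}

async function greenhouseJobs(boardToken: string): Promise<GreenhouseJob[]> {
  const res = await fetch(
    `https://boards-api.greenhouse.io/v1/boards/${boardToken}/jobs`,
  );
  if (!res.ok) throw new Error(`Greenhouse returned ${res.status}`);
  const { jobs } = (await res.json()) as { jobs: GreenhouseJob[] };
  return jobs;
}

// e.g. const jobs = await greenhouseJobs("example");
```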


Source of truth: modes/_shared.md + modes/oferta.md (System Layer per DATA_CONTRACT.md).