The EU AI Act, in its calmer moments, can feel like a sprawling document
written by people who do not deploy hiring tools for a living. Annex IV
is the part that becomes very concrete very quickly. It sets out the
technical documentation that providers of high-risk AI systems must keep
under Article 11, and that deployers will want to read before they sign.
If you are about to sit through a vendor pitch for an AI hiring tool, the
fastest way to find out whether the vendor is serious about compliance is
to ask what they have against each of Annex IV's nine sections. Here is
the walkthrough.
Section 1 — General description of the AI system
This is the section vendors find easiest. A general description includes
the intended purpose, the version, the hardware on which the system runs,
the user interface, and the instructions for use. Most vendor product
pages, system cards, and "how it works" documents satisfy this section
in spirit.
What to ask: Show me the system card or equivalent.
What you are likely to get: A product page. That's actually fine if
the product page covers intended purpose and user interface concretely.
Section 2 — Detailed description of system development
This is where the gap opens. Annex IV §2 calls for:
- Methods and steps performed for development, including pre-trained
systems or tools used.
- Design specifications and key design choices.
- Description of system architecture and data processing.
- Computational resources and training time.
The good vendors publish a model card, a whitepaper, or a peer-reviewed
paper that touches each of these. Pymetrics (the FAccT '21 paper) does
this best in the peer set we score. Most vendors stop at "we use machine
learning to score candidates" and call it a day.
What to ask: What's the architecture? Where did the training data
come from? How many decisions does the production model make per day?
What you are likely to get: Hand-waves. That is the gap.
Section 3 — Accuracy, robustness, and cybersecurity (Article 15)
This section covers performance metrics, including accuracy, robustness,
and cybersecurity. It is the section most likely to embarrass a vendor
that built marketing copy on "97% accurate" without explaining what
accurate means.
For hiring AI, the metric a deployer cares about most is performance
across demographic subgroups. Overall accuracy of a candidate-ranker is
nearly meaningless; subgroup accuracy is what predicts whether the system
will get the deployer sued.
What to ask: Subgroup performance across race, sex, age, and
disability — with confidence intervals.
What you are likely to get: From the better vendors, the impact-ratio
table from their NYC Local Law 144 bias audit. From the rest, "we tested
for bias and found none."
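To make the request concrete, here is a minimal sketch of the two numbers worth asking for per subgroup: a selection rate with a confidence interval, and an impact ratio (each group's selection rate divided by the highest group's rate). The group names and counts are hypothetical, and the Wilson interval is one reasonable choice of interval, not something the Act or any audit rule prescribes.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def impact_ratios(selected, totals):
    """Selection rate of each group divided by the highest group's rate."""
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any selections")
    return {g: rates[g] / best for g in rates}


# Hypothetical counts, for illustration only.
totals = {"group_a": 400, "group_b": 250}
selected = {"group_a": 120, "group_b": 50}

ratios = impact_ratios(selected, totals)
for group in totals:
    low, high = wilson_interval(selected[group], totals[group])
    rate = selected[group] / totals[group]
    print(f"{group}: rate={rate:.3f} CI=({low:.3f}, {high:.3f}) "
          f"impact_ratio={ratios[group]:.3f}")
```

A vendor table that reports only the ratio, without group sizes or intervals, leaves out the information you need to judge whether a gap is noise or signal.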
Section 4 — Risk-management system (Article 9)
A risk-management system is a living process: identify foreseeable
risks, estimate their likelihood and severity, design mitigations, test
the mitigations, monitor the residual risk. The Article 9 process should
result in an artifact a deployer can read.
For hiring AI vendors, the most relevant risks are protected-class
disparate impact, candidate misclassification, model drift on new
populations, and adversarial gaming of the system. The risk-management
documentation should name these risks explicitly.
What to ask: Show me the residual-risk register for the highest-
ranked risks.
What you are likely to get: "It's in our internal documentation."
ISO 42001-certified vendors (Eightfold, Beamery, Workday) have a more
defensible answer here than non-certified peers.
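As a rough illustration of what a readable residual-risk register could look like, the sketch below scores the four risks named above before and after mitigation and sorts them by residual score. The 1-to-5 scales, the likelihood-times-severity scoring, the mitigations, and every number are assumptions made for the example; Article 9 does not prescribe a scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int           # 1 (rare) to 5 (frequent), hypothetical scale
    severity: int             # 1 (minor) to 5 (severe), hypothetical scale
    mitigation: str
    residual_likelihood: int  # after mitigation
    residual_severity: int    # after mitigation

    @property
    def inherent_score(self) -> int:
        return self.likelihood * self.severity

    @property
    def residual_score(self) -> int:
        return self.residual_likelihood * self.residual_severity


# Illustrative entries only; a real register comes from the vendor.
register = [
    Risk("protected-class disparate impact", 4, 5,
         "subgroup testing before each release", 2, 5),
    Risk("candidate misclassification", 3, 4,
         "human review of rejections", 2, 3),
    Risk("model drift on new populations", 3, 4,
         "scheduled monitoring on live data", 2, 4),
    Risk("adversarial gaming of the system", 2, 3,
         "input validation", 1, 3),
]

# Highest residual risk first: this is the view a deployer should ask for.
ranked = sorted(register, key=lambda r: r.residual_score, reverse=True)
for risk in ranked:
    print(f"{risk.residual_score:>2}  (was {risk.inherent_score:>2})  "
          f"{risk.name}: {risk.mitigation}")
```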
Section 5 — Data governance (Article 10)
Annex IV §5 traces back to Article 10. The deployer wants to know:
- Where the training data came from.
- How it was labelled, cleaned, and validated.
- What proxies for protected classes might have leaked in.
- What data the model retains, for how long, and how candidates can
exercise their GDPR / state-law data rights.
For sourcing tools that aggregate public profile data, this section is
especially load-bearing — see, for example, the SeekOut profile in the
directory, where the absence of public data-governance disclosure
materially lowers their score.
What to ask: Variable list, plus the exclusion list (what the model
does not see).
What you are likely to get: Variable lists are increasingly public
(Beamery, HiredScore). Full provenance is still rare.
Section 6 — Human oversight (Article 14)
A vendor should be able to point at the override controls in the
product, the confidence thresholds the deployer can configure, and the
audit log that records every consequential decision. "Humans are in the
loop" is not a Section 6 answer; "candidates can be tracked, scores can
be disabled per jurisdiction, recruiters see the full chain of inputs
in-product" is.
What to ask: Walk me through the override and the audit log in the
product.
What you are likely to get: From Phenom and HiredScore, concrete
in-product controls. From most others, philosophy.
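For a sense of what a usable audit log entry might contain, here is a minimal sketch of a decision record that captures the model's recommendation, the configurable threshold, and any human override with its reason. The field names, the `record_decision` helper, and the "advance"/"reject" labels are hypothetical and do not describe any vendor's product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    candidate_id: str
    model_version: str
    score: float
    threshold: float
    model_recommendation: str
    final_decision: str
    reviewer: Optional[str] = None
    override_reason: Optional[str] = None
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def overridden(self) -> bool:
        return self.final_decision != self.model_recommendation


def record_decision(log, candidate_id, model_version, score, threshold,
                    reviewer=None, reviewer_decision=None,
                    override_reason=None):
    """Append one decision to the log; overrides need a reviewer and a reason."""
    recommendation = "advance" if score >= threshold else "reject"
    final = reviewer_decision if reviewer_decision is not None else recommendation
    if final != recommendation and not (reviewer and override_reason):
        raise ValueError("an override needs a named reviewer and a reason")
    record = DecisionRecord(candidate_id, model_version, score, threshold,
                            recommendation, final, reviewer, override_reason)
    log.append(json.dumps(asdict(record)))  # one JSON line per decision
    return record


audit_log = []
auto = record_decision(audit_log, "cand-001", "v1.2.0", 0.81, threshold=0.70)
manual = record_decision(
    audit_log, "cand-002", "v1.2.0", 0.64, threshold=0.70,
    reviewer="recruiter-17", reviewer_decision="advance",
    override_reason="relevant experience missed by the CV parser",
)
print(len(audit_log), auto.overridden, manual.overridden)
```

The point of the sketch is the shape of the record: if a vendor's log cannot show who overrode what, against which model version and threshold, the demo is philosophy.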
Section 7 — Predetermined changes and continuous learning
This section asks: how does the model change after deployment? If it
continues to learn, what triggers retraining? If it's static, when does
the next version ship?
HireVue's explainability statement is unusually direct on this — the
models are described as static and deterministic post-deployment. That
is a defensible Section 7 answer because it is testable. "Our model
adapts" is a much harder answer to substantiate.
What to ask: Static, deterministic, or learning? If learning, what's
the trigger and the review cycle?
What you are likely to get: A surprising number of vendors do not
have a confident answer here, which is its own signal.
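A "static and deterministic" claim is testable in roughly the following way: fingerprint the deployed model artifact, and check that the same input yields the same score on repeated calls. The sketch below assumes you can obtain the artifact bytes and call a scoring function; `toy_score` and its feature names are hypothetical stand-ins, not any vendor's API.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """SHA-256 of the model artifact; changes if the model changes."""
    return hashlib.sha256(artifact).hexdigest()


def is_deterministic(score_fn, inputs, runs=3):
    """True if every input gets an identical score on each repeated call."""
    for x in inputs:
        first = score_fn(x)
        if any(score_fn(x) != first for _ in range(runs - 1)):
            return False
    return True


def toy_score(features):
    # Hypothetical stand-in for a vendor's scoring endpoint.
    return round(0.4 * features["years_experience"]
                 + 0.6 * features["skills_match"], 4)


samples = [
    {"years_experience": 3, "skills_match": 0.8},
    {"years_experience": 10, "skills_match": 0.5},
]

baseline = fingerprint(b"model-weights-as-deployed")  # placeholder bytes
print(is_deterministic(toy_score, samples))
print(fingerprint(b"model-weights-as-deployed") == baseline)
```

Repeating the fingerprint check at each audit interval gives the deployer evidence that no silent retraining happened between versions.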
Section 8 — Harmonised standards
This section calls for evidence of compliance with relevant harmonised
standards. For AI hiring, the most relevant emerging standard is
ISO/IEC 42001:2023 for AI management systems. The vendors holding
ISO 42001 as of mid-2026 — Eightfold, Beamery, Workday — get a
defensible Section 8 answer almost for free.
NIST AI RMF profiles, while not a harmonised European standard, often
appear here. They are useful evidence of process maturity, not a
substitute for the standard.
What to ask: Which standards do you certify against?
What you are likely to get: ISO 27001 (security, not AI), SOC 2
(security, not AI), and increasingly ISO 42001 (the right one).
Section 9 — EU declaration of conformity (Article 47)
The DoC is the single legal artefact attesting that the system conforms
to the Act. It is dated, signed, and kept by the provider for ten years
after the system is placed on the market. The deployer typically does
not see it directly; the relevant question for the deployer is whether
the vendor can show one on request.
What to ask: Can you produce the DoC under NDA?
What you are likely to get: From mature vendors, yes. From
early-stage vendors, "we're working on it."
What to take from this
If you are evaluating an AI hiring vendor against the AI Act, the nine
Annex IV sections are the most useful structured interview you can run.
A vendor who can answer six or seven concretely is a serious actor; a
vendor who pivots to brand-safe language on more than three or four of
them is not yet ready for the August 2026 deadline.
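The structured interview can be tallied with something as simple as the sketch below. It records, per section, whether the vendor gave a concrete answer, then applies a cutoff. Treating six or more concrete answers as "serious" and anything fewer as "not ready" is one reading of the rule of thumb above, and the short section labels are shorthand for this example.

```python
SECTIONS = {
    1: "General description",
    2: "System development",
    3: "Performance metrics",
    4: "Risk management",
    5: "Data governance",
    6: "Human oversight",
    7: "Predetermined changes",
    8: "Harmonised standards",
    9: "Declaration of conformity",
}


def assess(answers, serious_min=6):
    """Count concrete answers and list the sections where the vendor pivoted.

    answers maps section number -> True if the answer was concrete.
    serious_min is an assumed cutoff, adjustable to taste.
    """
    missing = sorted(set(SECTIONS) - set(answers))
    if missing:
        raise ValueError(f"no answer recorded for sections {missing}")
    concrete = sum(1 for n in SECTIONS if answers[n])
    gaps = [SECTIONS[n] for n in SECTIONS if not answers[n]]
    verdict = "serious" if concrete >= serious_min else "not ready"
    return concrete, gaps, verdict


# Hypothetical interview notes for one vendor.
answers = {1: True, 2: False, 3: True, 4: False, 5: True,
           6: True, 7: True, 8: True, 9: False}

concrete, gaps, verdict = assess(answers)
print(f"{concrete}/9 concrete, verdict: {verdict}")
print("Follow up on:", ", ".join(gaps))
```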
See our methodology for how Annex IV completeness flows
into the Article 11 Technical Documentation category, and the
vendor directory for cited evidence per vendor.