Annex IV section-by-section: what AI hiring vendors must document

The EU AI Act, in its calmer moments, can feel like a sprawling document written by people who do not deploy hiring tools for a living. Annex IV is the part that becomes very concrete very quickly. It sets out the technical documentation that providers of high-risk AI systems must keep under Article 11, and that deployers will want to read before they sign.

If you are about to sit through a vendor pitch for an AI hiring tool, the fastest way to find out whether the vendor is serious about compliance is to ask what they have against each of Annex IV's nine sections. Here is the walkthrough.

Section 1 — General description of the AI system

This is the section vendors find easiest. A general description includes the intended purpose, the version, the hardware on which the system runs, the user interface, and the instructions for use. Most vendor product pages, system cards, and "how it works" documents satisfy this section in spirit.

What to ask: Show me the system card or equivalent. What you are likely to get: A product page. That's actually fine if the product page covers intended purpose and user interface concretely.

Section 2 — Detailed description of system development

This is where the gap opens. Annex IV §2 calls for:

Methods and steps performed for development, including pre-trained systems or tools used.
Design specifications and key design choices.
Description of system architecture and data processing.
Computational resources and training time.

The good vendors publish a model card, a whitepaper, or a peer-reviewed paper that touches each of these. Pymetrics (the FAccT '21 paper) does this best in the peer set we score. Most vendors stop at "we use machine learning to score candidates" and call it a day.

What to ask: What's the architecture? Where did the training data come from? How many decisions does the production model make per day? What you are likely to get: Hand-waves. That is the gap.

Section 3 — Detailed information about monitoring, functioning, and control

This section covers performance metrics, including accuracy, robustness, and cybersecurity. It is the section most likely to embarrass a vendor that built marketing copy on "97% accurate" without explaining what accurate means.

For hiring AI, the metric a deployer cares about most is performance across demographic subgroups. Overall accuracy of a candidate-ranker is nearly meaningless; subgroup accuracy is what predicts whether the system will get the deployer sued.

What to ask: Subgroup performance across race, sex, age, and disability, with confidence intervals. What you are likely to get: From the better vendors, the impact-ratio table from their NYC LL 144 audit. From the rest, "we tested for bias and found none."
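To make the subgroup ask concrete, here is a minimal sketch of the kind of table a deployer could request: per-group selection rates, impact ratios in the NYC LL 144 sense (each group's selection rate divided by the highest group's rate), and a Wilson score interval on each rate. The group names and counts are hypothetical, and the function names are ours, not any vendor's or auditor's.

```python
import math

def wilson_interval(selected, total, z=1.96):
    """Wilson score interval for a selection rate (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = selected / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

def impact_ratios(counts):
    """counts maps group -> (selected, total); returns rate, impact ratio, and CI per group."""
    rates = {g: sel / tot for g, (sel, tot) in counts.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections")
    return {
        g: {
            "rate": rates[g],
            "impact_ratio": rates[g] / top,  # relative to the most-selected group
            "ci": wilson_interval(*counts[g]),
        }
        for g in counts
    }

# Hypothetical counts, for illustration only.
example = impact_ratios({"group_a": (50, 100), "group_b": (30, 100)})
for group, row in example.items():
    lo, hi = row["ci"]
    print(f"{group}: rate={row['rate']:.2f} ratio={row['impact_ratio']:.2f} ci=({lo:.2f}, {hi:.2f})")
```

Note that the interval here is on each group's selection rate, not on the ratio itself. Even so, a wide interval on a small subgroup is a useful warning that a comfortable-looking impact ratio may not mean much.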

Section 4 — Risk-management system (Article 9)

A risk-management system is a living process: identify foreseeable risks, estimate their likelihood and severity, design mitigations, test the mitigations, monitor the residual risk. The Article 9 process should result in an artifact a deployer can read.

For hiring AI vendors, the most relevant risks are protected-class disparate impact, candidate misclassification, model drift on new populations, and adversarial gaming of the system. The risk-management documentation should name these risks explicitly.

What to ask: Show me the residual-risk register for the highest-ranked risks. What you are likely to get: "It's in our internal documentation." ISO 42001-certified vendors (Eightfold, Beamery, Workday) have a more defensible answer here than non-certified peers.
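A residual-risk register does not need to be elaborate to be readable. The sketch below shows one plausible shape, using the four risks named above. The 1 to 5 scales, the likelihood-times-severity scoring, the mitigations, and every number are assumptions for illustration; Article 9 does not prescribe a scoring scheme, and a real register would carry far more detail.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int           # 1 (rare) to 5 (almost certain), hypothetical scale
    severity: int             # 1 (minor) to 5 (severe), hypothetical scale
    mitigation: str
    residual_likelihood: int  # likelihood after the mitigation is in place
    residual_severity: int    # severity after the mitigation is in place

    @property
    def inherent_score(self):
        return self.likelihood * self.severity

    @property
    def residual_score(self):
        return self.residual_likelihood * self.residual_severity

def top_residual(register, n=3):
    """Return the n risks with the highest residual score, highest first."""
    return sorted(register, key=lambda r: r.residual_score, reverse=True)[:n]

# Illustrative entries only; the scores are made up.
register = [
    Risk("protected-class disparate impact", 4, 5, "subgroup testing before release", 2, 5),
    Risk("candidate misclassification", 3, 3, "human review of borderline scores", 2, 3),
    Risk("model drift on new populations", 3, 4, "scheduled re-validation", 3, 3),
    Risk("adversarial gaming of the system", 3, 3, "input anomaly checks", 2, 2),
]

for risk in top_residual(register, n=2):
    print(f"{risk.name}: inherent={risk.inherent_score} residual={risk.residual_score}")
```

The point of asking for the top residual risks is that they show what the vendor believes is still wrong after mitigation, which is a harder thing to fake than a list of mitigations.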

Section 5 — Data governance (Article 10)

Annex IV §5 traces back to Article 10. The deployer wants to know:

Where the training data came from.
How it was labelled, cleaned, and validated.
What proxies for protected classes might have leaked in.
What data the model retains, for how long, and how candidates can exercise their GDPR / state-law data rights.

For sourcing tools that aggregate public profile data, this section is especially load-bearing — see, for example, the SeekOut profile in the directory, where the absence of public data-governance disclosure materially lowers their score.

What to ask: Variable list, plus the exclusion list (what the model does not see). What you are likely to get: Variable lists are increasingly public (Beamery, HiredScore). Full provenance is still rare.
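Once a vendor hands over both lists, the first check is mechanical. The sketch below compares a variable list against an exclusion list and flags inputs that a reviewer has marked as likely proxies for a protected class. All feature names and the proxy mapping are hypothetical; deciding what counts as a proxy is a judgment call that code cannot make for you.

```python
def check_features(model_inputs, exclusion_list, known_proxies):
    """Compare a model's variable list against an exclusion list.

    Returns (violations, proxy_flags):
      violations  - inputs that appear on the exclusion list
      proxy_flags - inputs a reviewer has marked as proxies, mapped to the
                    protected class they may stand in for
    """
    inputs = {name.lower() for name in model_inputs}
    excluded = {name.lower() for name in exclusion_list}
    violations = sorted(inputs & excluded)
    proxy_flags = {f: known_proxies[f] for f in sorted(inputs) if f in known_proxies}
    return violations, proxy_flags

# Hypothetical lists, for illustration only.
model_inputs = ["years_experience", "skills", "zip_code", "graduation_year"]
exclusion_list = ["gender", "date_of_birth", "zip_code"]
known_proxies = {"graduation_year": "age", "zip_code": "race or national origin"}

violations, proxy_flags = check_features(model_inputs, exclusion_list, known_proxies)
print("On the exclusion list but used anyway:", violations)
print("Possible proxies:", proxy_flags)
```

A clean result only shows that the two published lists agree with each other. It says nothing about provenance, which is why the full answer is still rare.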

Section 6 — Human oversight (Article 14)

A vendor should be able to point at the override controls in the product, the confidence thresholds the deployer can configure, and the audit log that records every consequential decision. "Humans are in the loop" is not a Section 6 answer; "candidates can be tracked, scores can be disabled per jurisdiction, recruiters see the full chain of inputs in-product" is.

What to ask: Walk me through the override and the audit log in the product. What you are likely to get: From Phenom and HiredScore, concrete in-product controls. From most others, philosophy.
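For a sense of what a concrete audit-log answer could look like, here is a minimal sketch of a log that records each consequential decision, captures the reason when a recruiter overrides the model, and chains entries by hash so that later edits are detectable. The field names are assumptions, not any vendor's schema, and a production system would add access control and durable storage.

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """In-memory, hash-chained log of decisions (illustrative only)."""

    def __init__(self):
        self.entries = []

    def record(self, candidate_id, model_score, decision, actor, override_reason=None):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "candidate_id": candidate_id,
            "model_score": model_score,
            "decision": decision,
            "actor": actor,                      # "model" or a recruiter identifier
            "override_reason": override_reason,  # filled in when a human overrides
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True if no entry has been altered or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if entry["prev_hash"] != prev or hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.record("cand-001", 0.42, "reject", actor="model")
log.record("cand-001", 0.42, "advance", actor="recruiter-17",
           override_reason="relevant experience not captured by the score")
print(log.verify())
```

When a vendor walks you through the product, the things to look for are the same as the fields here: who decided, what the model said, and why a human disagreed.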

Section 7 — Predetermined changes and continuous learning

This section asks: how does the model change after deployment? If it continues to learn, what triggers retraining? If it's static, when does the next version ship?

HireVue's explainability statement is unusually direct on this — the models are described as static and deterministic post-deployment. That is a defensible Section 7 answer because it is testable. "Our model adapts" is a much harder answer to substantiate.

What to ask: Static, deterministic, or learning? If learning, what's the trigger and the review cycle? What you are likely to get: A surprising number of vendors do not have a confident answer here, which is its own signal.
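"Static and deterministic" is testable in a fairly direct way, which is what makes it a strong answer. The sketch below shows two checks a deployer could ask a vendor to run: fingerprint the deployed model artifact so that any change is visible, and score the same inputs repeatedly to confirm the outputs match. The scoring function is a stand-in; nothing here reflects how any particular vendor exposes its models.

```python
import hashlib

def artifact_fingerprint(artifact_bytes):
    """SHA-256 of the serialized model; a changed fingerprint means a changed model."""
    return hashlib.sha256(artifact_bytes).hexdigest()

def is_deterministic(score_fn, inputs, runs=3):
    """Score the same inputs `runs` times and check every run matches the first."""
    baseline = [score_fn(x) for x in inputs]
    return all([score_fn(x) for x in inputs] == baseline for _ in range(runs - 1))

# Stand-in scoring function and artifact, for illustration only.
def fixed_scorer(features):
    return round(0.1 * features["years_experience"] + 0.05 * features["skills_matched"], 4)

candidates = [
    {"years_experience": 3, "skills_matched": 4},
    {"years_experience": 7, "skills_matched": 2},
]

print(is_deterministic(fixed_scorer, candidates))
print(artifact_fingerprint(b"model-v1-weights") == artifact_fingerprint(b"model-v1-weights"))
```

Recording the fingerprint at contract signing and again at each audit turns "the model has not changed" from a claim into something you can compare.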

Section 8 — Standards and conformity assessment

This section calls for evidence of compliance with relevant harmonised standards. For AI hiring, the most relevant emerging standard is ISO/IEC 42001:2023 for AI management systems. The vendors holding ISO 42001 as of mid-2026 — Eightfold, Beamery, Workday — get a defensible Section 8 answer almost for free.

NIST AI RMF profiles, while not a harmonised European standard, often appear here. They are useful evidence of process maturity, not a substitute for the standard.

What to ask: Which standards do you certify against? What you are likely to get: ISO 27001 (security, not AI), SOC 2 (security, not AI), and increasingly ISO 42001 (the right one).

Section 9 — EU declaration of conformity (Article 47)

The declaration of conformity (DoC) is the single legal artifact attesting that the system conforms to the Act. It is dated, signed, and kept by the provider for ten years after the system is placed on the market. The deployer typically does not see it directly; the relevant question for the deployer is whether the vendor can show one on request.

What to ask: Can you produce the DoC under NDA? What you are likely to get: From mature vendors, yes. From early-stage vendors, "we're working on it."

What to take from this

If you are evaluating an AI hiring vendor against the AI Act, the nine Annex IV sections are the most useful structured interview you can run. A vendor who can answer six or seven concretely is a serious actor; a vendor who pivots to brand-safe language on more than three or four of them is not yet ready for the August 2026 deadline.
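If you want to run the interview as an actual tally, a minimal sketch follows. It records, for each of the nine sections as this walkthrough names them, whether the vendor gave a concrete answer, and applies a cut-off of six. That cut-off is our reading of "six or seven" above and is an assumption you should adjust; the example answers are made up.

```python
ANNEX_IV_SECTIONS = [
    "General description",
    "System development",
    "Monitoring, functioning, and control",
    "Risk-management system",
    "Data governance",
    "Human oversight",
    "Predetermined changes and continuous learning",
    "Standards and conformity assessment",
    "EU declaration of conformity",
]

def assess(answers, serious_at=6):
    """answers maps section name -> True if the vendor answered concretely.

    Returns (number of concrete answers, verdict). The threshold is an
    assumption, not something the Act defines.
    """
    missing = [s for s in ANNEX_IV_SECTIONS if s not in answers]
    if missing:
        raise ValueError(f"no answer recorded for: {missing}")
    concrete = sum(1 for s in ANNEX_IV_SECTIONS if answers[s])
    verdict = "serious" if concrete >= serious_at else "not yet ready"
    return concrete, verdict

# Made-up interview result, for illustration only.
answers = {section: True for section in ANNEX_IV_SECTIONS}
answers["System development"] = False
answers["Data governance"] = False

print(assess(answers))
```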

See our methodology for how Annex IV completeness flows into the Article 11 Technical Documentation category, and the vendor directory for cited evidence per vendor.