EU AI Act • Regulation 2024/1689

AI Act Risk Classification and Training Data

How EU AI Act risk categories affect training data requirements

The EU AI Act (Regulation 2024/1689) classifies AI systems into risk categories. Classification determines regulatory obligations — including requirements for training data governance under Article 10.

Risk Framework

The Four Risk Categories

The AI Act defines four levels of risk. Regulatory obligations increase with risk level.

Unacceptable Risk — Prohibited

AI systems that pose clear threats to safety, livelihoods, or fundamental rights. These are banned outright. Examples include social scoring systems, real-time remote biometric identification in publicly accessible spaces for law enforcement (with narrow exceptions), and AI that exploits vulnerabilities or uses subliminal manipulation.

Training data is not relevant for prohibited systems — they cannot be deployed.

High Risk — Regulated

AI systems in safety-critical applications or sensitive use cases. These require conformity assessment, technical documentation, and ongoing compliance obligations including training data governance.

Training data requirements under Article 10 apply.

Limited Risk — Transparency Obligations

AI systems that interact directly with people but do not pose significant risks. Requirements focus on disclosure: users must know they are interacting with AI or viewing AI-generated content. Examples include chatbots and AI-generated content such as deepfakes.

Training data governance is not mandated, though good practice applies.

Minimal Risk — Unregulated

AI systems with negligible or no impact on rights or safety. No mandatory requirements. Examples include spam filters and AI-enabled games.

Classification Mechanisms

How High-Risk Classification Works

An AI system is classified as high-risk through two pathways defined in Article 6.

Pathway 1: Safety Component (Annex I)

The AI system is a safety component of a product — or is itself a product — covered by EU harmonization legislation listed in Annex I.

  • Medical devices
  • Automotive systems
  • Aviation equipment
  • Machinery
  • Lifts
  • Radio equipment
  • Toys (where safety-relevant)

If the product requires third-party conformity assessment and the AI is integral to safety, it is high-risk.

Pathway 2: High-Risk Use Cases (Annex III)

The AI system falls within one of eight areas defined in Annex III:

  1. Biometrics
  2. Critical Infrastructure
  3. Education and Vocational Training
  4. Employment
  5. Essential Services
  6. Law Enforcement
  7. Migration and Border Control
  8. Justice and Democratic Processes

Automatic High-Risk Classification

An AI system in an Annex III area that profiles natural persons (automated processing of personal data to evaluate or predict aspects of a person's work performance, economic situation, health, preferences, or behavior) is always classified as high-risk; the exemptions below do not apply to profiling systems.

Exemptions from High-Risk (Limited)

AI systems in Annex III areas may be exempt if they:

  • Perform narrow procedural tasks
  • Improve results of previously completed human activity
  • Detect decision-making patterns without replacing human judgment
  • Perform preparatory tasks only

These exemptions do not apply if the system profiles individuals.
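
To make the classification flow concrete, here is a minimal decision-procedure sketch in Python. It is an informal illustration only: the AISystem fields are hypothetical names of our own, and an actual determination requires legal analysis of the system and its context.

  from dataclasses import dataclass

  @dataclass
  class AISystem:
      # Hypothetical inputs; a real determination needs legal analysis.
      annex_i_safety_component: bool         # safety component under Annex I legislation
      third_party_assessment_required: bool  # product needs third-party conformity assessment
      annex_iii_area: bool                   # falls within one of the eight Annex III areas
      profiles_natural_persons: bool         # profiling as defined in the Act
      exemption_applies: bool                # narrow procedural / preparatory task, etc.

  def is_high_risk(s: AISystem) -> bool:
      # Pathway 1: Annex I safety component in a product subject to
      # third-party conformity assessment
      if s.annex_i_safety_component and s.third_party_assessment_required:
          return True
      # Pathway 2: Annex III use case; profiling overrides the exemptions
      if s.annex_iii_area:
          if s.profiles_natural_persons:
              return True
          return not s.exemption_applies
      return False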

Article 10 Obligations

Training Data Requirements for High-Risk Systems

Article 10 of the AI Act establishes data governance requirements for high-risk AI systems. These requirements apply to training, validation, and testing datasets.

Quality Requirements (Article 10.3)

  • Relevant to intended purpose
  • Sufficiently representative
  • Free of errors to the best extent possible
  • Complete for intended purpose
  • Statistically appropriate for populations

Governance Requirements (Article 10.2)

  • Design choices & collection processes
  • Data preparation (annotation, labeling)
  • Formulation of assumptions
  • Assessment of availability & suitability
  • Examination for biases
  • Identification of data gaps
  • Measures to address identified issues

Contextual Requirements (Article 10.4)

Datasets must reflect the specific context of deployment:

  • Geographic setting
  • Behavioral context
  • Functional environment
  • Affected populations
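
One way to operationalize these obligations is to capture them as structured documentation for each dataset. The sketch below is a hypothetical Python schema, with field names of our own invention (the Act prescribes the practices, not any particular structure), mapping Article 10.2 governance practices, 10.3 quality criteria, and 10.4 contextual factors to documentation fields:

  from dataclasses import dataclass

  @dataclass
  class DatasetGovernanceRecord:
      # Hypothetical schema: the Act prescribes the practices below,
      # not this structure or these field names.
      # Article 10.2 -- governance and management practices
      design_choices: str
      collection_process: str
      preparation_steps: str           # annotation, labeling, cleaning
      assumptions: str
      suitability_assessment: str
      bias_examination: str
      data_gaps: list[str]
      mitigation_measures: list[str]
      # Article 10.3 -- quality criteria (record evidence, not just flags)
      relevance_rationale: str
      representativeness_evidence: str
      error_review_summary: str
      completeness_assessment: str
      # Article 10.4 -- deployment context the data must reflect
      geographic_setting: str
      behavioral_context: str
      functional_environment: str
      affected_populations: list[str]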

Documentation Requirements

High-risk system providers must maintain technical documentation demonstrating Article 10 compliance. This documentation is subject to review during conformity assessment and may be requested by competent authorities.

Supply Chain

What This Means for Training Data Procurement

Organizations that place high-risk AI systems on the market or put them into service ("providers" in the Act's terminology) must demonstrate that training data meets Article 10 requirements. This obligation sits with the AI system provider, not with the training data vendor. However, training data sourcing decisions directly affect compliance outcomes.

Supports Compliance

  • Documented provenance & collection methodology
  • Transparent sampling & representativeness info
  • Bias assessment & limitations disclosure
  • Version control & reproducibility

Creates Compliance Risk

  • Lack of traceability to data sources
  • No documentation of collection/preparation
  • No assessment of representativeness/gaps
  • No governance artifacts suitable for audit

The question during conformity assessment is not whether training data is available, but whether training data governance can be demonstrated.
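
In practice, governance evidence is easiest to demonstrate when it travels with the data. Below is a minimal sketch, assuming a JSON provenance manifest kept under version control alongside each dataset release; the Act mandates no particular format, and every name and path here is illustrative.

  import json

  # Hypothetical provenance manifest; all field names and paths are
  # illustrative, not prescribed by the AI Act.
  manifest = {
      "dataset": "example-speech-corpus",
      "version": "1.2.0",
      "sources": [
          {"origin": "consented field recordings", "collected": "2024-Q3"}
      ],
      "collection_methodology": "docs/collection.md",
      "sampling_and_representativeness": "docs/sampling.md",
      "bias_assessment": "docs/bias_review.md",
      "known_limitations": ["limited coverage of regional dialects"],
  }

  with open("provenance.json", "w") as f:
      json.dump(manifest, f, indent=2)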

Vendor Responsibility

YPAI's Role in AI Act Compliance

YPAI is a speech and language data provider. YPAI does not classify AI systems, perform conformity assessments, or issue compliance certifications.

What YPAI Provides

  • European speech datasets with documented governance
  • Provenance records & collection methodology
  • Sampling methodology & representativeness info
  • Bias assessment & known limitations disclosure
  • Technical documentation for Article 10 support

What YPAI Does Not Provide

  • System risk classification
  • Conformity assessment services
  • Legal advice on regulatory interpretation
  • Certification of compliance

How this supports customer compliance: Organizations deploying high-risk AI systems can use YPAI's documentation to demonstrate training data governance during conformity assessment. The documentation is structured to address Article 10 requirements.

Whether training data meets the specific requirements for a given AI system depends on the system's intended purpose, deployment context, and affected populations. This assessment is the responsibility of the deploying organization.

Legal Disclaimer

Classification Is the Customer's Responsibility

YPAI does not determine whether a customer's AI system is high-risk. That determination depends on the system's intended purpose, whether it falls under Annex I or Annex III, whether exemptions apply, and whether the system profiles natural persons.

Organizations uncertain about classification should consult legal counsel or refer to guidance from the European Commission and national competent authorities.

The Commission is required to publish guidelines with practical examples of high-risk and non-high-risk systems by February 2026.

Timeline Reference

Key Dates

  Date            Milestone
  August 2024     AI Act entered into force
  February 2025   Prohibited practices in effect
  August 2025     GPAI model obligations in effect
  August 2026     High-risk system obligations in effect (Annex III)
  August 2027     High-risk system obligations in effect (Annex I embedded products)

Organizations deploying high-risk AI systems should ensure training data governance is in place before August 2026.

Related Resources

Need AI Act–compliant training data?

Our team can help you navigate AI Act requirements and source compliant speech data for your high-risk AI systems.

Request AI Act–Ready Speech Data