Contributors: (ordered alphabetically) Arthit Suriyawongkul (ADAPT Centre, Trinity College Dublin),
Daniel Doherty (Trinity College Dublin),
Delaram Golpayegani (ADAPT Centre, Trinity College Dublin),
Georg P. Krog (Signatu AS),
Harshvardhan J. Pandit (AI Accountability Lab (AIAL), Trinity College Dublin),
Julian Flake (University of Koblenz),
Scott Kellum (Typetura).
NOTE: The affiliations are informative, do not represent formal endorsements, and may be outdated as this list is generated automatically from existing data.
The AI extension extends the [[[DPV]]] and its [[[TECH]]] extension to represent AI techniques, applications, risks, and mitigations. The namespace for terms in ai is https://w3id.org/dpv/ai#. The suggested prefix for the namespace is ai. The AI vocabulary and its documentation are available on GitHub.
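For instance, a Turtle document using this extension would declare the namespace and prefix as follows (the `ex` namespace is a hypothetical example):

```turtle
@prefix ai:  <https://w3id.org/dpv/ai#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <https://example.com/ns#> .

# a hypothetical system described using terms from this extension
ex:MySystem a ai:AISystem .
```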
DPV Specifications: The [[DPV]] is the core specification that is extended by specific extensions. A [[PRIMER]] introduces the concepts and modelling of DPV specifications, and [[GUIDES]] describe the application of DPV to specific use-cases. The Search Index page provides a searchable hierarchy of all concepts. The Data Privacy Vocabularies and Controls Community Group (DPVCG) develops and manages these specifications through GitHub. For meetings, see the DPVCG calendar.
Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requesting information and applications, and addressing open issues. See contributing guide for further information.
Core Concepts
Overview of AI extension
The [[[AI]]] extension further extends the [[TECH]] extension to represent concepts specifically associated with development, use, and operation of AI, and provides:
Techniques such as machine learning and natural language processing
Capabilities such as image recognition and text generation
AI Systems and Models such as expert systems, or general purpose AI models (GPAI)
Data such as for training, testing, and validation
Development Phases such as training
Risks such as data poisoning, statistical noise and bias, etc.
Risk Measures to address the AI specific risks
Lifecycle such as data collection, training, fine-tuning, etc.
Documentation such as Datasheets and Model Cards
Actors such as AI Developer and AI Deployer
Status associated with AI development
The following sources have been used to model the concepts defined in this specification:
[[[ISO-22989]]] -- this standard defines the terminology for AI and represents an international consensus.
[[[AIAct]]] -- as the world's first regulation to govern Artificial Intelligence, the AI Act's terminologies and definitions have been instrumental in identifying which concepts should be modelled and to develop taxonomies for their detailed representation. The [[EU-AIAct]] extension defines the concepts explicitly linked/derived from the law by building on the general AI concepts provided in this specification.
[[[EUVOC-AI-Taxonomy]]] is an authoritative taxonomy of AI concepts provided by the Publications Office of the European Union.
[[[AIRO]]] and [[[VAIR]]], which represent comprehensive research-based artefacts to represent AI and AI Act related concepts, have been incorporated into the DPV vocabularies.
The following sources represent potential sources for future concepts:
CREATION OF A TAXONOMY FOR THE EUROPEAN AI ECOSYSTEM: A report of the Cross-KIC Activity “Innovation Impact Artificial Intelligence” provides an overview of taxonomies used in the context of AI, and provides a harmonised or aligned view of them to evaluate the different perspectives on AI terminologies. While the report is comprehensive, it does not provide clear definitions or criteria for how concepts are modelled, and therefore is being considered as a potential source of concepts for future enhancements to the DPV's AI extension.
Artificial Intelligence (AI) is a category of Technology that exhibits or satisfies specific behaviour. While the exact definition of what constitutes 'AI' continues to be a subject of debate and regulation, we focus on the generally understood use of 'AI technologies'. This extension represents information about developing and using such technologies, and about other relevant aspects of AI such as the specific risks involved and relevant mitigations and measures, documentation, involved data, and a description of the underlying technology itself in terms of specific operations and functions. As there is no consistent vocabulary or standard used uniformly within this domain, the concepts provided in this extension represent the specific way the DPVCG has chosen to represent information about AI technologies.
The AI extension is based on the modelling of technologies in DPV vocabularies. For this reason, it extends the [[TECH]] extension, and only provides AI-specific concepts in this extension. For example, the entity that is the developer of an AI system is represented by the same concept as the developer of any technology through the `tech:Developer` concept. If and when we identify AI-specific actors and roles, those will be defined in this extension by extending the relevant DPV and TECH entities.
Conceptual Model
Overview of the conceptual model for how AI is described as a technology in DPV and the AI extension. The notes provide an example showing how the process of unlocking a phone for identity authentication is described using DPV, and the details of how this functions at a technical level are provided through the concepts in the AI extension.
The concept [=AI=] and its corresponding relation [=hasAI=] represent the broad and generic concept of 'AI' and its use in different contexts. For example, AI might be used to refer to a specific technical algorithm (e.g. conventional computer science use of AI), or a way of automating specific tasks (e.g. business process use of AI), or to describe a process where AI is used in part (e.g. marketing use of AI). To explicitly and accurately describe what is involved in 'AI', we provide further granular additional concepts based on a 'three-layer approach' consisting of [=Technique=] and [=Capability=] for describing the technical implementation and goals, and 'Purpose' (represented by `dpv:Purpose`) for describing the broader aim of the process.
[=Technique=] represents the underlying 'technique' or 'algorithm', for example [=MachineLearning=] or its specific forms [=NeuralNetwork=] and [=SupervisedLearning=]. It is a technical detail that does not have a specific goal or purpose in the implementation, and which is applied in different contexts to achieve different outcomes.
[=Capability=] refers to the use of a technique to achieve or perform a (technical) goal or objective. It describes what the technology is 'capable' of doing in terms of a 'technical goal'. For example, [=FaceRecognition=] is a capability for using some underlying [=Technique=] to achieve its goal of recognising faces. However, by itself, we still don't know why facial recognition is being used or developed within the process. This is where `dpv:Purpose` then describes the broader goal or aim for not just the use of AI but also other contextual information such as data, people, entities - such as to state this is being done for identity verification and enforcement of security.
The separation of concepts in this manner also allows for an efficient and accurate representation of how AI technologies are developed and applied in practice. For example, _Entity1_ develops an algorithmic framework to ingest data and perform some statistical operations on it - this is represented as a [=Technique=]. This framework is then taken by _Entity2_, who uses it towards generating content - this is represented as a [=Capability=]. _Entity2_ then puts this on the market as a product. _Entity3_ then uses this product to provide a service to its customers in terms of recommendations - this is represented as a `dpv:Purpose`. In its knowledge graph, _Entity3_ records that it uses a technology with the relevant AI capability, while the knowledge graph of _Entity2_ represents that it uses the framework produced by _Entity1_.
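As an illustrative, non-normative sketch of this three-layer approach, the records above could be expressed in Turtle (all names under the `ex` prefix are hypothetical):

```turtle
@prefix ai:  <https://w3id.org/dpv/ai#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <https://example.com/ns#> .

# Entity3's record: a service with a broad Purpose, implemented using AI
ex:RecommendationService a dpv:Process ;
    dpv:hasPurpose ex:ProvideRecommendations ;   # broader aim (Purpose layer)
    ai:hasAI ex:ContentGenerator .

# Entity2's record: the product's technical goal and implementation
ex:ContentGenerator a ai:AI ;
    ai:hasCapability ai:ContentGeneration ;      # technical goal (Capability layer)
    ai:hasTechnique ai:MachineLearning .         # implementation (Technique layer)
```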
Techniques
[=Technique=] represents the underlying technical implementation, and is associated using [=hasTechnique=]. It represents the lowest level of technical detail within the conceptual model used in this extension to describe 'AI technology'. It is useful to describe how the AI technology works in terms of specific algorithms and methodologies being used. By itself, a technique is not sufficient to describe what the AI technology is being used for, i.e. [=Technique=] is distinct from [=Capability=] and `dpv:Purpose`.
An implementation of AI technology can be developed based only on a technique - for example as a library or as a framework that can be reused by others. Therefore, a technique can act as a component of a larger AI system where it represents a particular method for implementing something. A technique can also involve the use of other techniques in a composite or combined manner.
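For example, a reusable library implementing only a technique, and a larger system whose implementation uses that technique, could be sketched as follows (all `ex:` names are hypothetical):

```turtle
@prefix ai:   <https://w3id.org/dpv/ai#> .
@prefix tech: <https://w3id.org/dpv/tech#> .
@prefix ex:   <https://example.com/ns#> .

# A library that only implements a technique, with no capability of its own
ex:VisionToolkit a tech:Technology ;
    ai:hasTechnique ai:ConvolutionalNeuralNetwork .

# A larger AI system whose implementation uses the same technique
ex:InspectionSystem a ai:AISystem ;
    ai:hasTechnique ai:ConvolutionalNeuralNetwork .
```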
ai:KnowledgeTechnique: Techniques based on the use of knowledge bases
go to full definition
ai:InductiveProgramming: An algorithm or program featuring recursive calls or repetition control structures
go to full definition
ai:KnowledgeRepresentation: Encoding knowledge in a formal language or in a form that can be used for computer-based problem solving
go to full definition
ai:RuleBasedTechnique: Artificial intelligence approach governed by human-defined rules that explicitly dictate behaviour, relying on logical statements (rules) to determine actions in specific situations
go to full definition
ai:HeuristicProgramming: Programming approach designed to tackle problems that lack a systematic or optimised approach, frequently used in expert systems
go to full definition
ai:SymbolicReasoning: Reasoning based on the knowledge encoded in a formal language
go to full definition
ai:MachineLearning: Process of optimising model parameters through computational techniques, such that the model's behaviour reflects the data or experience
go to full definition
ai:DecisionTree: Technique for which inference is encoded as paths from the root to a leaf node in a tree structure
go to full definition
ai:DeepLearning: Approach to creating rich hierarchical representations through the training of neural networks with many hidden layers
go to full definition
ai:TransferLearning: a technique in machine learning in which knowledge learned from a task is re-used in order to boost performance on a related task
go to full definition
ai:FrugalMachineLearning: Machine learning techniques that aim to make models more efficient, cost-effective, and accessible while maintaining or even improving their performance
go to full definition
ai:GeneticAlgorithm: Algorithm which simulates natural selection by creating and evolving a population of individuals (solutions) for optimization problems
go to full definition
ai:NeuralNetwork: Network of one or more layers of neurons connected by weighted links with adjustable weights, which takes input data and produces an output
go to full definition
ai:ConvolutionalNeuralNetwork: Feed forward neural network using convolution in at least one of its layers
go to full definition
ai:FeedForwardNeuralNetwork: Neural network where information is fed from the input layer to the output layer in one direction only
go to full definition
ai:LongShortTermMemory: Type of recurrent neural network that processes sequential data with a satisfactory performance for both long and short span dependencies
go to full definition
ai:RecurrentNeuralNetwork: Neural network in which outputs from both the previous layer and the previous processing step are fed into the current layer
go to full definition
ai:ReinforcementLearning: Learning of an optimal sequence of actions to maximise a reward through interaction with an environment
go to full definition
ai:SelfSupervisedLearning: Machine learning approach that uses unsupervised learning for tasks that typically require supervision, generating implicit labels from unstructured data, where models are trained on a task using the data itself to provide supervisory signals, often used in neural networks to exploit inherent structures or relationships within input data to generate training signals
go to full definition
ai:SemiSupervisedLearning: Machine learning that makes use of both labelled and unlabelled data during training
go to full definition
ai:SupervisedLearning: Machine learning that makes use of only labelled data during training
go to full definition
ai:SupportVectorMachine: A machine learning algorithm that finds decision boundaries with maximal margins
go to full definition
ai:UnsupervisedLearning: Machine learning that makes use of only unlabelled data during training
go to full definition
ai:BayesianNetwork: Probabilistic technique that uses Bayesian inference for probability computations using a directed acyclic graph
go to full definition
ai:BayesianOptimisation: Refers to Bayesian optimisation technique
go to full definition
ai:TrainingTechnique: Process to determine or to improve the parameters of a machine learning model based on a machine learning algorithm by using training data
go to full definition
Capabilities
[=Capability=] represents the use of a [=Technique=] to achieve a particular technical goal or objective, which is indicated using the [=hasCapability=] relation. In this sense, it forms the middle level of technical detail within the conceptual model used in this extension. By itself, a capability only describes a 'goal' or 'purpose' limited to the technical context, i.e. what the technology is trying to do or perform. This is distinct from the goal or purpose of the process within which the AI technology is used, which forms the highest level of detail. This distinction keeps the actual purpose of an activity, in the sense of a high-level or broad goal (expressed using `dpv:Purpose`), distinct and separate from the specific reason for using the technology towards achieving this goal.
As with techniques, specific software and services can provide capabilities for inclusion in an AI system or technology. For example, [=ImageRecognition=] can be integrated in an AI system through an API service, or [=NamedEntityRecognition=] can be implemented by using software libraries. This implies that AI technologies need not be developed with specific capabilities, and that such capabilities can be added or combined later within the AI system.
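Continuing the sketch (all `ex:` names are hypothetical), a system's capabilities can be stated regardless of whether they were built in or integrated later:

```turtle
@prefix ai: <https://w3id.org/dpv/ai#> .
@prefix ex: <https://example.com/ns#> .

# Capabilities integrated from an external API and a software library
ex:DocumentAnalyser a ai:AISystem ;
    ai:hasCapability ai:ImageRecognition ,        # e.g. via an API service
                     ai:NamedEntityRecognition .  # e.g. via a software library
```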
ai:AudioCapability: Capabilities related to the processing and generation of audio
go to full definition
ai:FaceRecognition: Capability involving automatic pattern recognition for comparing stored images of human faces with the image of an actual face, indicating any matching, if it exists, and any data, if they exist, identifying the person to whom the face belongs
go to full definition
ai:GestureRecognition: Capability for recognising human gestures
go to full definition
ai:ImageClassification: Capability of categorising and labelling groups of pixels or vectors within an image based on particular rules, involving the assignment of images to predefined classes or categories
go to full definition
ai:ImageRecognition: Capability for image classification process that classifies object(s), pattern(s) or concept(s) in an image
go to full definition
ai:MotionAnalysis: Capability of deriving meaningful information about motion from visual data, including tracking objects across frames, analysing trajectories, estimating velocity and acceleration, and interpreting the meaning of motion patterns
go to full definition
ai:ObjectDetection: computer vision technology that detects instances of semantic objects in digital images and videos, covering domains like face detection and pedestrian detection, with applications spanning image retrieval and video surveillance
go to full definition
ai:PatternRecognition: Capability for automated identification of patterns and regularities in data, utilising algorithms to detect patterns or regularities for categorising data into distinct groups, encompassing diverse applications such as image analysis, speech processing, and biometric authentication
go to full definition
ai:PerceptionBasedAI: Capability of using agents to interpret and understand information from their environment through sensory inputs
go to full definition
ai:RemoteSensing: Capability for acquisition of information about an object or phenomenon without physical contact, typically using aerial sensor technologies
go to full definition
ai:VisualRecognition: Capability that identifies and categorises objects, scenes, activities, and other visual elements in images or video, and includes image classification, object detection, scene understanding, and visual pattern recognition
go to full definition
ai:ContentGeneration: Capability to generate new content that is distinct from merely deriving or transforming existing content
go to full definition
ai:HumanOrientedCapability: Capabilities that are inherently about humans or oriented towards human characteristics and activities
go to full definition
ai:BehaviourAnalysis: Capability of a system in analysing people's behaviour
go to full definition
ai:BiometricCapability: Capability involving processing of biometric data or related to biometrics
go to full definition
ai:BiometricCategorisation: Capability involving assigning natural persons to specific categories based on their biometric data
go to full definition
ai:BiometricIdentification: Capability involving automated recognition of physical, physiological and behavioural human features such as the face, eye movement, body shape, voice, prosody, gait, posture, heart rate, blood pressure, odour, keystrokes characteristics, for the purpose of establishing an individual’s identity by comparing biometric data of that individual to stored biometric data of individuals in a reference database, irrespective of whether the individual has given its consent or not
go to full definition
ai:LocalBiometricIdentification: Capability involving biometric identification carried out locally
go to full definition
ai:PostTimeBiometricIdentification: Capability involving biometric identification carried out later or not in real-time or non-instantaneously
go to full definition
ai:RealTimeBiometricIdentification: Capability involving biometric identification carried out in real-time or instantaneously
go to full definition
ai:RemoteBiometricIdentification: Capability involving biometric identification carried out remotely
go to full definition
ai:FaceRecognition: Capability involving automatic pattern recognition for comparing stored images of human faces with the image of an actual face, indicating any matching, if it exists, and any data, if they exist, identifying the person to whom the face belongs
go to full definition
ai:GestureRecognition: Capability for recognising human gestures
go to full definition
ai:ComputationalCreativity: computer systems that emulate human creative processes and produce artistic, design output that simulates innovation and originality
go to full definition
ai:EmotionRecognition: Capability for identifying and categorising emotions expressed in a piece of text, speech, video or image or combination thereof
go to full definition
ai:BiometricEmotionRecognition: Capability for recognising emotions based on biometrics information
go to full definition
ai:HumanIdentification: Capability of a system that identifies a human whether at an individual or group level
go to full definition
ai:BiometricIdentification: Capability involving automated recognition of physical, physiological and behavioural human features such as the face, eye movement, body shape, voice, prosody, gait, posture, heart rate, blood pressure, odour, keystrokes characteristics, for the purpose of establishing an individual’s identity by comparing biometric data of that individual to stored biometric data of individuals in a reference database, irrespective of whether the individual has given its consent or not
go to full definition
ai:LocalBiometricIdentification: Capability involving biometric identification carried out locally
go to full definition
ai:PostTimeBiometricIdentification: Capability involving biometric identification carried out later or not in real-time or non-instantaneously
go to full definition
ai:RealTimeBiometricIdentification: Capability involving biometric identification carried out in real-time or instantaneously
go to full definition
ai:RemoteBiometricIdentification: Capability involving biometric identification carried out remotely
go to full definition
ai:LieDetection: Capability to detect lies in the context of human speech, behaviour, information, or activities
go to full definition
ai:PersonalityTraitAnalysis: Capability for determining and analysing people's personality traits
go to full definition
ai:Profiling: Capability where AI is used to construct a profile of an individual (human) or a group of individuals
go to full definition
ai:SentimentAnalysis: Capability for computationally identifying and categorising opinions expressed in a piece of text, speech or image, to determine a range of feeling such as from positive to negative
go to full definition
ai:SpeakerRecognition: Capability of recognising speaker(s) in audio recordings
go to full definition
ai:SpeechRecognition: Capability of converting a speech signal to a representation of the content of the speech
go to full definition
ai:InformationRetrieval: Capability for retrieving relevant documents or parts of documents from a dataset, typically based on keyword or natural language queries
go to full definition
ai:AutomaticSummarisation: Capability for shortening a portion of content such as text while retaining important semantic information
go to full definition
ai:ContentBasedRetrieval: Capability for retrieval of information using the actual content to identify, select, filter, and provide results
go to full definition
ai:ContextAwareRetrieval: Capability for retrieval of information that takes into account the user's context such as e.g., location, time, device, or activity to provide more relevant results
go to full definition
ai:MultiModalRetrieval: Capability for retrieval of information using multiple modalities such as text, images, audio, and video and supporting cross-modal queries such as taking text as input to search images
go to full definition
ai:MusicInformationRetrieval: Capability for retrieving, analysing, and categorising music-related information such as audio files, melodies, or lyrics using audio features, metadata, and user queries
go to full definition
ai:AutomaticSummarisation: Capability for shortening a portion of content such as text while retaining important semantic information
go to full definition
ai:DialogueManagement: Capability for managing the flow and state of a dialogue or conversation between a system and its user(s)
go to full definition
ai:MachineTranslation: Capability for automated translation of text or speech from one natural language to another using a computer system
go to full definition
ai:NamedEntityRecognition: Capability for recognising and labelling the denotational names of entities and their categories for sequences of words in a stream of text or speech
go to full definition
ai:NaturalLanguageGeneration: Converting data carrying semantics into natural language
go to full definition
ai:NaturalLanguageProcessing: Capability enabling computers to understand and communicate with human language
go to full definition
ai:ChatbotCapability: Capability to simulate human-like conversation with a user through messaging platforms, websites, mobile apps, or telephone systems, often employing natural language processing and machine learning to engage in conversation mimicking human interaction
go to full definition
ai:TextClassification: Capability of assigning predefined labels to text data in order to automatically categorise it into groups
go to full definition
ai:TextDataMining: Capability of selecting and analysing large amounts of text or data resources to discover patterns, relationships, and semantic insights that provide valuable information for research and decision-making
go to full definition
ai:PartOfSpeechTagging: Capability for assigning a category (e.g. verb, noun, adjective) to a word based on its grammatical properties
go to full definition
ai:QuestionAnswering: Capability for determining the most appropriate answer to a question provided in natural language
go to full definition
ai:RelationshipExtraction: Capability for identifying relationships among entities mentioned in a text
go to full definition
ai:SentimentAnalysis: Capability for computationally identifying and categorising opinions expressed in a piece of text, speech or image, to determine a range of feeling such as from positive to negative
go to full definition
AI Systems and Models
[=AISystem=] is defined by ISO/IEC 22989:2022 as "An engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives", and by OECD as "A machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment." Or simply, it represents a 'system' which uses 'AI technologies'.
The property [=hasAISystem=] associates the use of an AI system in context. It is a specialised form of `dpv:isImplementedUsingTechnology` which indicates that a process is being implemented through the use of the stated technology. The components of an AI system can be described through the use of concepts provided in this ([[AI]]) extension as well as through the [[TECH]] extension.
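For example, the phone-unlocking process from the conceptual model could be associated with an AI system as follows (instance names under `ex:` are hypothetical; `dpv:IdentityVerification` is assumed here to be the relevant purpose concept from [[DPV]]):

```turtle
@prefix ai:  <https://w3id.org/dpv/ai#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <https://example.com/ns#> .

# ai:hasAISystem specialises dpv:isImplementedUsingTechnology
ex:PhoneUnlocking a dpv:Process ;
    dpv:hasPurpose dpv:IdentityVerification ;
    ai:hasAISystem ex:FaceUnlockSystem .

ex:FaceUnlockSystem a ai:AISystem ;
    ai:hasCapability ai:FaceRecognition .
```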
[=Model=] is defined as "a physical, mathematical or otherwise logical representation of a system, entity, phenomenon, process or data involving the use of AI techniques". Or simply, it represents a 'model' of something using 'AI technologies'. The property [=hasModel=] associates a model with a context, such as to indicate a particular AI system utilises the specified model. To specifically represent General-Purpose AI (GPAI) models, the concept [=GPAIModel=] and relation [=hasGPAIModel=] are provided.
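A sketch of associating models with systems, including the GPAI-specific concepts (all `ex:` names are hypothetical):

```turtle
@prefix ai: <https://w3id.org/dpv/ai#> .
@prefix ex: <https://example.com/ns#> .

# A system using a task-specific model
ex:FaceUnlockSystem a ai:AISystem ;
    ai:hasModel ex:FaceEmbeddingModel .

ex:FaceEmbeddingModel a ai:Model .

# A system built on a general-purpose AI model
ex:ChatAssistant a ai:GPAI ;
    ai:hasGPAIModel ex:FoundationModel .

ex:FoundationModel a ai:GPAIModel .
```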
The below taxonomy provides additional concepts based on categorisation of [=AISystem=] and [=Model=] in different contexts.
ai:AGI: Type of AI system that addresses a broad range of tasks with a satisfactory level of performance
go to full definition
ai:CognitiveComputing: Category of AI systems that enables people and machines to interact more naturally
go to full definition
ai:CyberphysicalSystem: Engineered system built from, and depending upon, the seamless integration of computation and physical components
go to full definition
ai:EdgeAI: The deployment and execution of AI and ML models on Edge devices, including smartphones, IoT sensors, industrial controllers, and other resource-constrained devices located at the Edge of the network and closer to the data sources
go to full definition
ai:ExpertSystem: AI system that accumulates, combines and encapsulates knowledge provided by a human expert or experts in a specific domain to infer solutions to problems (ISO/IEC 22989:2022 definition); Artificial intelligence system emulating human expert decision-making abilities, addressing complex problems through reasoning across knowledge bases primarily represented as if-then rules, and comprising two sub-systems: an inference engine for applying rules to known facts and deducing new facts, and a knowledge base containing facts and rules; potentially featuring explanation and debugging capabilities (EU Vocabularies' AI Taxonomy definition)
go to full definition
ai:FrugalAISystem: AI systems intended to be more efficient, cost-effective, and accessible while maintaining or even improving their performance
go to full definition
ai:GPAI: Artificial intelligence system based on a general-purpose model and capable of serving a variety of purposes, either in direct use or integrated with other AI systems
go to full definition
ai:GPAIModel: A model that displays generality in terms of capabilities and potential applications
go to full definition
ai:IntelligentControlSystem: Category of AI systems which implement intelligent control principles for real-world applications by using AI capabilities and techniques
go to full definition
ai:LLM: Deep learning model that uses artificial neural networks trained on vast amounts of data to understand and generate natural language and other types of content to perform a wide range of tasks
go to full definition
ai:MachineLearningModel: Mathematical construct that generates an inference or prediction based on input data or information
go to full definition
ai:MachineLearningPlatform: Technology platform for developing, deploying, and managing machine learning models and resources
go to full definition
ai:NarrowAI: Type of AI system that is focused on defined tasks to address a specific problem i.e. it addresses a narrow scope of tasks and problems
go to full definition
ai:ReasoningSystem: Artificial intelligence system that uses available information to generate predictions, make inferences and draw conclusions, involving the representation of data in machine-processable form and application of logic to arrive at decisions
go to full definition
ai:Robot: An automation system with actuators that performs intended tasks in the physical world, by means of sensing its environment and a software control system
go to full definition
ai:IndustrialRobot: A robot or robotic system for use in industrial automation applications
go to full definition
ai:ServiceRobot: A robot or robotic system in personal use or professional use that performs useful tasks for humans or equipment
go to full definition
ai:SocialRobot: A robot or robotic system with social interaction functions
go to full definition
ai:RoboticProcessAutomation: Technology focused on automation of repetitive, routine, rule-based human tasks
go to full definition
ai:Robotics: The science of designing, engineering and using robots, i.e. machines controlled by computers which perform jobs automatically
go to full definition
ai:FineTunedModel: Model resulting from fine-tuning of a pre-trained model
go to full definition
AI Agents
The concept [=AIAgent=] represents the generic notion of 'AI agents', which are software that represents an entity or acts on its behalf. To reflect this, [=AIAgent=] is declared as both [=AI=] and `tech:SoftwareAgent`. Currently, there is no property to directly state that AI Agents are being used, though the AI-related property [=hasAI=] can be used to associate this concept in a specific context. Additional generic properties such as `dpv:hasEntity`, `tech:hasSoftware`, and `dpv:isImplementedUsingTechnology` may also be relevant to associate AI Agents in specific roles or functions.
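For example, an AI agent could be associated with a process through the generic [=hasAI=] property as follows (all `ex:` names are hypothetical):

```turtle
@prefix ai:  <https://w3id.org/dpv/ai#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <https://example.com/ns#> .

# Associating a hypothetical AI agent with a process via ai:hasAI
ex:TravelBooking a dpv:Process ;
    ai:hasAI ex:BookingAssistant .

ex:BookingAssistant a ai:AIAgent ;
    ai:hasCapability ai:ChatbotCapability .
```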
Data
The concept [=Data=] is a broad generic term for describing data involved in the context of AI technology. At the moment, this includes three categories - [=TrainingData=], [=ValidationData=], and [=TestingData=]. The DPVCG welcomes proposals and participation to further enhance this taxonomy.
[=Data=] extends `dpv:Data`, and can be associated with the property [=hasData=], which is a specialised form of `dpv:hasData`, to indicate the specified data is involved in the context of an AI technology. To specifically indicate the contextual involvement of data within AI development, the properties [=hasTrainingData=], [=hasTestingData=], and [=hasValidationData=] are provided.
To indicate the involvement of personal data, the concept `dpv:PersonalData` should be used along with its relation `dpv:hasPersonalData`. The [[DPV]] taxonomy contains specific concepts to model sensitive data - including confidential and IP-related data - and the [[PD]] extension provides a taxonomy of personal data categories that can be used to indicate involvement in AI technologies.
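For example, the data involved in developing a model, including personal data, could be recorded as follows (all `ex:` names are hypothetical; `pd:Picture` is assumed here to be the relevant category from the [[PD]] extension):

```turtle
@prefix ai:  <https://w3id.org/dpv/ai#> .
@prefix dpv: <https://w3id.org/dpv#> .
@prefix pd:  <https://w3id.org/dpv/pd#> .
@prefix ex:  <https://example.com/ns#> .

# Data involved in developing a model, including personal data
ex:FaceEmbeddingModel a ai:Model ;
    ai:hasTrainingData ex:FaceImageDataset ;
    ai:hasTestingData  ex:HeldOutImages .

ex:FaceImageDataset a ai:TrainingData ;
    dpv:hasPersonalData pd:Picture .
```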
ai:Data: Data involved in the development and use of an AI system or model
go to full definition
ai:TestingData: Data involved in the testing of an AI system or model
go to full definition
ai:TrainingData: Data involved in the training of an AI system or model
go to full definition
ai:ValidationData: Data involved in the validation of an AI system or model
go to full definition
Model Development Phases
[=ModelTraining=] refers to the process through which [=TrainingData=] is transformed to produce a [=TrainedModel=]. [=ModelFineTuning=] represents a further refinement phase where a prior [=TrainedModel=] is refined using additional, smaller (and usually more targeted) [=TrainingData=] to produce a [=FineTunedModel=]. Here, the concepts for _training_ are defined by extending the generic concept `Processing` in [[DPV]], which represents the processing of data.
As part of the development and training of models, data usually undergoes several key steps. These are represented by the concept [=DataOperation=], which broadly involves [=DataCollection=] and [=DataPreparation=], and optionally [=DataAggregation=]. These are also defined by extending relevant processing operations from [[DPV]], for example to represent where they collect, obtain, or transform data.
ai:DataOperation: Processing of data for the development or use of AI models
go to full definition
ai:DataCollection: Processing operation where data is collected, e.g. in a raw or unrefined form
go to full definition
ai:DataPreparation: Processing operation where data is prepared, e.g. organising and transforming it to make it ready for use
go to full definition
ai:DataAggregation: Processing operation where data is aggregated, e.g. by combining multiple records into one
go to full definition
ai:DataAnnotation: Processing operation where data is annotated, e.g. by adding additional metadata or context to make it useful
go to full definition
ai:DataLabelling: Processing operation where data annotation is carried out through labelling, e.g. by assigning tags or categories
go to full definition
ai:DataCleaning: Processing operation where data is cleaned, e.g. by detecting and removing errors or unwanted records
go to full definition
ai:DataEnrichment: Processing operation where data is enriched, e.g. by adding additional data that increases its value or usefulness
go to full definition
ai:DataUpdating: Processing operation where data is updated, e.g. by adding the latest records
go to full definition
ai:ModelTraining: Process to determine or to improve the parameters of a machine learning model based on a machine learning technique by using training data
go to full definition
ai:ModelFineTuning: Process where a pre-trained model is further refined through the use of a smaller training dataset
go to full definition
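A sketch of how these phases could be described together (prefix declarations omitted; `ex:` names are illustrative, and using `ai:hasTrainingData` on the training processes themselves is an assumption made for illustration):

```turtle
# Training a base model on a large corpus,
# then fine-tuning it on a smaller domain-specific dataset
ex:BaseTraining a ai:ModelTraining ;
    ai:hasTrainingData ex:LargeCorpus .

ex:DomainTuning a ai:ModelFineTuning ;
    ai:hasTrainingData ex:DomainCorpus .

ex:DomainModel a ai:FineTunedModel .

# Data operations carried out before training
ex:CorpusPreparation a ai:DataPreparation, ai:DataCleaning .
```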
Risk Concepts
The concept [=RiskConcept=] in this extension extends dpv:RiskConcept
to represent risk sources, risks, consequences, and impacts specific to the development, use, or operation of AI. As with the [[RISK]] extension, the risk concepts presented here can take on different roles in different use-cases; for example, what is a risk source in one scenario could be the consequence in another. The relations risk:hasRiskSource, dpv:hasRisk, dpv:hasConsequence, and dpv:hasImpact are useful to indicate the specific interpretation and role of AI risk concepts in a scenario.
The AI Risk Concepts are broadly categorised according to the following:
[=DataRisk=] - Risk associated with data used or produced or otherwise involved in the context of AI
[=SecurityAttack=] - Risks or issues associated with security attacks related to AI technologies, models, and systems
[=ModelRisk=] - Risks associated with AI Models
[=AISystemRisk=] - Risks associated with AI Systems
[=UserRisk=] - Risks associated with Users of AI Systems
[=AIBias=] - Bias associated with development, use, or other activities involving an AI technology or system
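For example, the same concepts can be assigned different roles depending on the scenario. The following sketch treats input data bias as the risk source and a model risk as its consequence (prefix declarations omitted; `ex:` names are illustrative, and the choice of subject and roles is an assumption for this scenario):

```turtle
# In this scenario, InputDataBias is the risk source
# and ModelBias is the consequence arising from it
ex:LoanScoring a ai:AISystem ;
    risk:hasRiskSource ai:InputDataBias ;
    dpv:hasConsequence ai:ModelBias .
```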
Data Risks
[=DataRisk=] represents risks associated with the data involved in AI technologies. To represent these risks in the context of the role the data plays (training, testing, validation), the same set of data risks is expressed for each of the three data categories so as to accurately represent both the origin and occurrence of the risk.
ai:InputDataRisk: Risks and risk concepts related to input data
go to full definition
ai:InputDataBias: Concept representing input data containing or potentially containing bias
go to full definition
ai:InputDataInaccurate: Concept representing input data being inaccurate
go to full definition
ai:InputDataInappropriate: Concept representing input data being inappropriate
go to full definition
ai:InputDataIncomplete: Concept representing input data being incomplete
go to full definition
ai:InputDataInconsistent: Concept representing input data being inconsistent
go to full definition
ai:InputDataMisclassified: Concept representing input data being misclassified
go to full definition
ai:InputDataMisinterpretation: Concept representing input data being misinterpreted
go to full definition
ai:InputDataNoise: Concept representing input data being noisy
go to full definition
ai:InputDataOutdated: Concept representing input data being outdated
go to full definition
ai:InputDataSelectionError: Concept representing an error in input data selection
go to full definition
ai:InputDataSparse: Concept representing input data being sparse
go to full definition
ai:InputDataUnrepresentative: Concept representing input data being unrepresentative
go to full definition
ai:InputDataUnstructured: Concept representing input data being unstructured
go to full definition
ai:InputDataUnverified: Concept representing input data being unverified
go to full definition
ai:TestingDataRisk: Risks and risk concepts related to testing data
go to full definition
ai:TestingDataBias: Concept representing testing data containing or potentially containing bias
go to full definition
ai:TestingDataInaccurate: Concept representing testing data being inaccurate
go to full definition
ai:TestingDataInappropriate: Concept representing testing data being inappropriate
go to full definition
ai:TestingDataIncomplete: Concept representing testing data being incomplete
go to full definition
ai:TestingDataInconsistent: Concept representing testing data being inconsistent
go to full definition
ai:TestingDataMisclassified: Concept representing testing data being misclassified
go to full definition
ai:TestingDataMisinterpretation: Concept representing testing data being misinterpreted
go to full definition
ai:TestingDataNoise: Concept representing testing data being noisy
go to full definition
ai:TestingDataOutdated: Concept representing testing data being outdated
go to full definition
ai:TestingDataSelectionError: Concept representing an error in testing data selection
go to full definition
ai:TestingDataSparse: Concept representing testing data being sparse
go to full definition
ai:TestingDataUnrepresentative: Concept representing testing data being unrepresentative
go to full definition
ai:TestingDataUnstructured: Concept representing testing data being unstructured
go to full definition
ai:TestingDataUnverified: Concept representing testing data being unverified
go to full definition
ai:ValidationDataRisk: Risks and risk concepts related to validation data
go to full definition
ai:ValidationDataBias: Concept representing validation data containing or potentially containing bias
go to full definition
ai:ValidationDataInaccurate: Concept representing validation data being inaccurate
go to full definition
ai:ValidationDataInappropriate: Concept representing validation data being inappropriate
go to full definition
ai:ValidationDataIncomplete: Concept representing validation data being incomplete
go to full definition
ai:ValidationDataInconsistent: Concept representing validation data being inconsistent
go to full definition
ai:ValidationDataMisclassified: Concept representing validation data being misclassified
go to full definition
ai:ValidationDataMisinterpretation: Concept representing validation data being misinterpreted
go to full definition
ai:ValidationDataNoise: Concept representing validation data being noisy
go to full definition
ai:ValidationDataOutdated: Concept representing validation data being outdated
go to full definition
ai:ValidationDataSelectionError: Concept representing an error in validation data selection
go to full definition
ai:ValidationDataSparse: Concept representing validation data being sparse
go to full definition
ai:ValidationDataUnrepresentative: Concept representing validation data being unrepresentative
go to full definition
ai:ValidationDataUnstructured: Concept representing validation data being unstructured
go to full definition
ai:ValidationDataUnverified: Concept representing validation data being unverified
go to full definition
Bias
The bias concepts represented here are specific to AI; generic bias concepts, as well as discrimination impact concepts, are provided in the [[RISK]] extension. While we are interested in further expanding these concepts, the following external source should be of interest:
DocBiasO - an ontology-driven approach to support the documentation of bias in data, which has a larger, more expansive categorisation of bias and provides additional concepts and properties to model specifics such as ethnicities and measurements, which are useful in bias measurement and documentation.
ai:AutomationBias: Bias that occurs due to propensity for humans to favour suggestions from automated decision-making systems and to ignore contradictory information made without automation, even if it is correct
go to full definition
ai:DataBias: Bias that occurs due to unaddressed data properties that lead to AI systems that perform better or worse for different groups
go to full definition
ai:DataAggregationBias: Bias that occurs from aggregating data covering different groups of objects that might have different statistical distributions which introduce bias into the data used to train AI systems
go to full definition
ai:DataLabellingProcessBias: Bias that occurs due to the labelling process itself introducing societal or cognitive biases
go to full definition
ai:DistributedTrainingBias: Bias that occurs due to distributed machines having different sources of data that do not have the same distribution over the feature space
go to full definition
ai:MissingFeaturesBias: Bias that occurs when features are missing from individual training samples
go to full definition
ai:NonRepresentativeSamplingBias: Bias that occurs if a dataset is not representative of the intended deployment environment, where the model learns biases based on the ways in which the data is non-representative
go to full definition
ai:EngineeringDecisionBias: Bias that occurs due to machine learning model architectures - encompassing all model specifications, parameters and manually designed features
go to full definition
ai:AlgorithmSelectionBias: Bias that occurs from the selection of machine learning algorithms built into the AI system which introduce unwanted bias in predictions made by the system because the type of algorithm used introduces a variation in the performance of the ML model
go to full definition
ai:FeatureEngineeringBias: Bias that occurs from steps such as encoding, data type conversion, dimensionality reduction and feature selection which are subject to choices made by the AI developer and introduce bias in the ML model
go to full definition
ai:HyperparameterTuningBias: Bias that occurs from hyperparameters defining how the model is structured and which cannot be directly trained from the data like model parameters, where hyperparameters affect the model functioning and accuracy of the model
go to full definition
ai:InformativenessBias: Bias that occurs when, for some groups, the mapping between inputs present in the data and outputs is more difficult to learn, and where a model that only has one feature set available can be biased against the group whose relationships are difficult to learn from available data
go to full definition
ai:ModelBias: Bias that occurs when ML uses functions like a maximum likelihood estimator to determine parameters, and there is data skew or under-representation present in the data, where the maximum likelihood estimation tends to amplify any underlying bias in the distribution
go to full definition
ai:ModelInteractionBias: Bias that occurs from the structure of a model to create biased predictions
go to full definition
ai:ModelExpressivenessBias: Bias that occurs from the number and nature of parameters in a model as well as the neural network topology which affect the expressiveness of the model and any feature that affects model expressiveness differently across groups
go to full definition
Security Attacks
ai:AdversarialAttack: Inputs designed to cause the model to make a mistake
go to full definition
ai:DataPoisoning: Attack trying to manipulate the training dataset
go to full definition
ai:ModelEvasion: An input which seems normal to a human but is wrongly classified by ML models
go to full definition
ai:ModelInversion: A type of attack on AI models in which access to a model is abused to infer information about the training data
go to full definition
Overview of Risk Concepts
The table below provides suggestions for the role each concept can play in the context of risk assessment, and how it can be categorised within the conventional 'CIA security model'. For example, [=AdversarialAttack=] can be used as a risk source (i.e. it can cause further issues to arise), a risk (i.e. it is a risk of concern), or a consequence (i.e. it can occur due to another risk), and it is classified as affecting 'integrity' in the CIA model.
This table is based on a similar table within the [[RISK]] extension which provides a detailed taxonomy of concepts and the potential roles they can take across use-cases.
The concept dpv:TechnicalOrganisationalMeasure
provides a rich taxonomy of concepts that includes dpv:TechnicalMeasure
for measures that operate at a technical level, such as security; dpv:OrganisationalMeasure
for organisational processes, such as impact assessments and risk management; and dpv:LegalMeasure
for legally enforceable measures, such as contractual agreements. These can also be reused to describe measures in the context of AI, including the development of AI models and the deployment of AI systems. The DPVCG welcomes proposals and participation to further expand the taxonomy of measures.
ai:Benchmarking: Measure where performance and outputs are compared to best practices, gold standards, or other forms of desired quality
go to full definition
ai:BiasAssessment: Examination of a data set, model, or AI system for bias
go to full definition
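A sketch of associating such measures with an AI system (prefix declarations omitted; `ex:` names are illustrative):

```turtle
# A recurring bias assessment applied as a measure for a hiring system
ex:HiringScreening a ai:AISystem ;
    dpv:hasTechnicalOrganisationalMeasure ex:AnnualBiasAudit .

ex:AnnualBiasAudit a ai:BiasAssessment .
```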
[=LifecycleStage=] models the lifecycle of AI technologies from inception to deployment, use, and retirement. While we use the term 'lifecycle' here, these stages are also useful in other similar contexts such as the 'AI Value Chain' and 'AI Supply Chain'. The AI-specific lifecycle is extended from the concept `tech:LifecycleStage`, defined in the [[TECH]] extension to model the lifecycle and stages of technologies in general. It can therefore be used with the existing relation `tech:hasLifecycleStage` to denote its applicability or involvement.
ai:ContinuousValidationStage: The stage in the lifecycle where there is continuous learning within the AI system by incremental training on an ongoing basis while the system is running in production
go to full definition
ai:DeploymentStage: The stage in the lifecycle where the AI system is installed, released or configured for deployment and operation in a target environment
go to full definition
ai:DesignStage: The stage in the lifecycle where designs are created for the AI system
go to full definition
ai:DevelopmentStage: The stage in the lifecycle where the development and creation of the system occurs, signalling upon completion that it is ready for verification and validation
go to full definition
ai:InceptionStage: The stage in the lifecycle where inception regarding AI occurs and one or more stakeholders decide to turn an idea into a tangible system
go to full definition
ai:OperationStage: The stage in the lifecycle where an AI system is running and generally available for operations
go to full definition
ai:IncidentMonitoringStage: The stage in the lifecycle where an AI system is actively being monitored for incidents
go to full definition
ai:RepairStage: The stage in the lifecycle where an AI system is being repaired due to suspected or occurred incidents
go to full definition
ai:UpdateStage: The stage in the lifecycle where an AI system is being or has been updated
go to full definition
ai:ReevaluationStage: The stage in the lifecycle where the AI system is reevaluated after the operation and monitoring stage based on the operations of the AI system
go to full definition
ai:RetirementStage: The stage in the lifecycle where the AI system is retired and becomes obsolete
go to full definition
ai:DecommissionStage: The stage in the lifecycle where the AI system is being decommissioned as part of retirement
go to full definition
ai:DiscardStage: The stage in the lifecycle where the AI system is being discarded as part of retirement
go to full definition
ai:ReplaceStage: The stage in the lifecycle where the AI system is being replaced as part of retirement
go to full definition
ai:ValidationStage: The stage in the lifecycle where the AI system is validated for requirements and objectives for an intended use or application
go to full definition
ai:VerificationStage: The stage in the lifecycle where the AI system is being verified to satisfy requirements and meet objectives
go to full definition
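Denoting the current stage of a technology uses the existing relation `tech:hasLifecycleStage` from [[TECH]], for example (prefix declarations omitted; the `ex:` name is illustrative):

```turtle
# An AI system currently in the deployment stage of its lifecycle
ex:Translator a ai:AISystem ;
    tech:hasLifecycleStage ai:DeploymentStage .
```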
A technical and scientific field devoted to the engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives
An AI Agent, also known as an 'intelligent agent', is a software agent that utilises AI technologies
Usage Note
Other definitions of AI Agents may include the perception of the environment, i.e. some capacity to incorporate the environment through sensors (e.g. computer vision) or data (e.g. web environments), and to produce changes in outputs with some degree of autonomy (i.e. without human intervention). The definition of 'AI Agent' provided in this extension relaxes such requirements and instead focuses on classifying these agents based solely on their use of AI technologies, to allow compatibility with evolving definitions
An engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives (ISO/IEC 22989:2022 definition); or A machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment (OECD 2024 definition); or A machine-based system demonstrating varying degrees of autonomy and adaptiveness after deployment, generating outputs such as predictions, content, recommendations, or decisions to influence physical or virtual environments (EU Vocabularies' AI Taxonomy definition)
Bias that occurs from the selection of machine learning algorithms built into the AI system which introduce unwanted bias in predictions made by the system because the type of algorithm used introduces a variation in the performance of the ML model
Bias that occurs due to propensity for humans to favour suggestions from automated decision-making systems and to ignore contradictory information made without automation, even if it is correct
Capability involving automated recognition of physical, physiological and behavioural human features such as the face, eye movement, body shape, voice, prosody, gait, posture, heart rate, blood pressure, odour, keystrokes characteristics, for the purpose of establishing an individual’s identity by comparing biometric data of that individual to stored biometric data of individuals in a reference database, irrespective of whether the individual has given its consent or not
Capability or use of AI to achieve a technical goal or objective
Usage Note
This concept refers to the application of an AI technique to achieve a technical goal or function, and is necessary to distinguish the 'algorithm' (ai:Technique) from the 'application' (ai:Capability) and 'goal' (dpv:Purpose)
Capability to simulate human-like conversation with a user through messaging platforms, websites, mobile apps, or telephone systems, often employing natural language processing and machine learning to engage in conversation mimicking human interaction
Capability for retrieval of information that takes into account the user's context such as e.g., location, time, device, or activity to provide more relevant results
The stage in the lifecycle where there is continuous learning within the AI system by incremental training on an ongoing basis while the system is running in production
Bias that occurs from aggregating data covering different groups of objects that might have different statistical distributions which introduce bias into the data used to train AI systems
The stage in the lifecycle where the development and creation of the system occurs, signalling upon completion that it is ready for verification and validation
The deployment and execution of AI and ML models on Edge devices, including smartphones, IoT sensors, industrial controllers, and other resource-constrained devices located at the Edge of the network and closer to the data sources
AI system that accumulates, combines and encapsulates knowledge provided by a human expert or experts in a specific domain to infer solutions to problems (ISO/IEC 22989:2022 definition); Artificial intelligence system emulating human expert decision-making abilities, addressing complex problems through reasoning across knowledge bases primarily represented as if-then rules, and comprising two sub-systems: an inference engine for applying rules to known facts and deducing new facts, and a knowledge base containing facts and rules; potentially featuring explanation and debugging capabilities (EU Vocabularies' AI Taxonomy definition)
Capability involving automatic pattern recognition for comparing stored images of human faces with the image of an actual face, indicating any matching, if it exists, and any data, if they exist, identifying the person to whom the face belongs
Usage Note
EU Vocabularies' AI taxonomy defines 'Facial Recognition' as "computer vision technology employing biometric analysis and mapping of facial characteristics for facial identification purposes, recognising and verifying an individual's identity", which not only refers to the capability for face recognition but also indicates the *purposes* for which this is being used. In DPV, such purposes are defined separately within the main DPV
Bias that occurs from steps such as encoding, data type conversion, dimensionality reduction and feature selection which are subject to choices made by the AI developer and introduce bias in the ML model
Machine learning techniques that aim to make models more efficient, cost-effective, and accessible while maintaining or even improving their performance
Use of artificial intelligence models that can learn from and mimic large amounts of data to create content such as text, images, music, videos, code, and more, based on inputs or prompts
Artificial intelligence system based on a general-purpose model and capable of serving a variety of purposes, either in direct use or integrated with other AI systems
Bias that occurs from hyperparameters defining how the model is structured and which cannot be directly trained from the data like model parameters, where hyperparameters affect the model functioning and accuracy of the model
Capability of categorising and labelling groups of pixels or vectors within an image based on particular rules, involving the assignment of images to predefined classes or categories
Bias that occurs when, for some groups, the mapping between inputs present in the data and outputs is more difficult to learn, and where a model that only has one feature set available can be biased against the group whose relationships are difficult to learn from available data
Usage Note
This can happen when some features are highly informative about one group, while a different set of features is highly informative about another group. If this is the case, then a model that only has one feature set available, can be biased against the group whose relationships are difficult to learn from available data
Deep learning model that uses artificial neural networks trained on vast amounts of data to understand and generate natural language and other types of content to perform a wide range of tasks
Bias that occurs when ML uses functions like a maximum likelihood estimator to determine parameters, and there is data skew or under-representation present in the data, where the maximum likelihood estimation tends to amplify any underlying bias in the distribution
Bias that occurs from the number and nature of parameters in a model as well as the neural network topology which affect the expressiveness of the model and any feature that affects model expressiveness differently across groups
A type of attack to AI models, in which the access to a model is abused to infer information about the training data
Usage Note
(HLEG Assessment List for Trustworthy Artificial Intelligence (ALTAI), https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment)
Capability of deriving meaningful information about motion from visual data, including tracking objects across frames, analysing trajectories, estimating velocity and acceleration, and interpreting the meaning of motion patterns
Capability for retrieval of information using multiple modalities such as text, images, audio, and video and supporting cross-modal queries such as taking text as input to search images
Capability for retrieving, analysing, and categorising music-related information such as audio files, melodies, or lyrics using audio features, metadata, and user queries
Bias that occurs if a dataset is not representative of the intended deployment environment, where the model learns biases based on the ways in which the data is non-representative
Computer vision technology that detects instances of semantic objects in digital images and videos, covering domains like face detection and pedestrian detection, with applications spanning image retrieval and video surveillance
Capability for automated identification of patterns and regularities in data, utilising algorithms to detect patterns or regularities for categorising data into distinct groups, encompassing diverse applications such as image analysis, speech processing, and biometric authentication
Artificial intelligence system that uses available information to generate predictions, make inferences and draw conclusions, involving the representation of data in machine-processable form and the application of logic to arrive at decisions
An automation system with actuators that performs intended tasks in the physical world, by means of sensing its environment and a software control system
Artificial intelligence approach governed by human-defined rules that explicitly dictate behaviour, relying on logical statements (rules) to determine actions in specific situations
Machine learning approach that uses unsupervised learning for tasks that typically require supervision, generating implicit labels from unstructured data, where models are trained on a task using the data itself to provide supervisory signals, often used in neural networks to exploit inherent structures or relationships within input data to generate training signals
Capability for computationally identifying and categorising opinions expressed in a piece of text, speech or image, to determine a range of feeling such as from positive to negative
The underlying technological algorithm, method, or process that forms the technique for using or applying AI
Usage Note
This concept refers to the foundational computational implementation and is necessary to distinguish the 'algorithm' (ai:Technique) from the 'application' (ai:Capability) and 'goal' (dpv:Purpose)
Capability of selecting and analysing large amounts of text or data resources to discover patterns, relationships, and semantic insights that provide valuable information for research and decision-making
Capability that identifies and categorises objects, scenes, activities, and other visual elements in images or video, and includes image classification, object detection, scene understanding, and visual pattern recognition
DPV uses the following terms from [[RDF]] and [[RDFS]] with their defined meanings:
rdf:type to denote a concept is an instance of another concept
rdfs:Class to denote a concept is a Class or a category
rdfs:subClassOf to specify the concept is a subclass (subtype, sub-category, subset) of another concept
rdf:Property to denote a concept is a property or a relation
The following external concepts are re-used within DPV:
External
Future Work
Funding Acknowledgements
Funding Sponsors
The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019. Continued developments have been funded under the RECITALS Project, funded under the EU's Horizon programme with grant agreement No. 101168490.
Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.
The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).
Funding Acknowledgements for Contributors
The contributions of Delaram Golpayegani have received funding through the PROTECT ITN Project from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813497, in particular through the development of the AI Risk Ontology (AIRO) and the Vocabulary of AI Risks (VAIR), which have been integrated into this extension.
The contributions of Harshvardhan J. Pandit and Delaram Golpayegani have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre. The contributions of Harshvardhan J. Pandit have been made with the AI Accountability Lab (AIAL) which is supported by grants from following groups: the AI Collaborative, an Initiative of the Omidyar Group; Luminate; the Bestseller Foundation; and the John D. and Catherine T. MacArthur Foundation.