The AI extension extends the [[[DPV]]] and its [[[TECH]]] extension to represent AI techniques, applications, risks, and mitigations. The namespace for terms in ai is https://w3id.org/dpv/ai#, and the suggested prefix for the namespace is ai. The AI vocabulary and its documentation are available on GitHub.
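As a minimal sketch of how the namespace and prefix might be declared and used in RDF (Turtle); the ex: namespace and the resource ex:CreditScoringSystem are hypothetical, while dpv:Technology and dpv:hasRisk are taken from the DPV core vocabulary:

  @prefix ai:  <https://w3id.org/dpv/ai#> .
  @prefix dpv: <https://w3id.org/dpv#> .
  @prefix ex:  <https://example.com/ns#> .

  # A hypothetical technology annotated with a bias risk from the AI extension
  ex:CreditScoringSystem a dpv:Technology ;
      dpv:hasRisk ai:DataBias .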
Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requests for information and applications, and addressing open issues. See the contributing guide for further information.
DPV and Related Resources
[[[DPV]]] is the base/core specification for the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided which model jurisdiction-specific regulations and concepts. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]]. A Search Index of all concepts from DPV and extensions is available.
[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities, refer to the DPVCG website. For meetings, see the DPVCG calendar.
The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG, and describes the methodology, structure, and creation of the DPV. An open-access version is also available. The article “Data Privacy Vocabulary (DPV) - Version 2”, accepted for presentation at the 23rd International Semantic Web Conference (ISWC 2024), describes the changes made in DPV v2.
Core Concepts
The [[[AI]]] extension is currently under development. It further extends the [[TECH]] extension to represent concepts associated with AI, and will provide:
Techniques such as machine learning and natural language processing
Capabilities such as image recognition and text generation
Lifecycle such as data collection, training, fine-tuning, etc.
Risks such as data poisoning, statistical noise and bias, etc.
Note: These are intended to represent bias concepts specific to AI development and use, and do not contain general bias concepts which exist in other contexts, e.g. those applicable to any technology. The Risk extension contains the general bias concepts, which this vocabulary extends to represent biases that are specific to the development and use of AI. A usage sketch applying one of these bias concepts is provided after the definitions below.
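As an illustration of how these groups of concepts might be used together, the following minimal sketch (in Turtle) annotates a hypothetical AI system. The concept IRIs ai:MachineLearning, ai:ImageRecognition, and ai:DataPoisoning are assumptions derived from the technique, capability, and risk names listed above, and dpv:hasTechnology is assumed here as the linking relation from DPV core:

  @prefix ai:  <https://w3id.org/dpv/ai#> .
  @prefix dpv: <https://w3id.org/dpv#> .
  @prefix ex:  <https://example.com/ns#> .

  # Hypothetical AI system described with a technique, a capability, and a risk;
  # the ai: concept IRIs below are assumed from the names listed above
  ex:PhotoTagger a dpv:Technology ;
      dpv:hasTechnology ai:MachineLearning , ai:ImageRecognition ;
      dpv:hasRisk ai:DataPoisoning .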
ai:DataBias: Bias that occurs due to unaddressed data properties that lead to AI systems that perform better or worse for different groups
ai:DataAggregationBias: Bias that occurs from aggregating data covering different groups of objects that might have different statistical distributions which introduce bias into the data used to train AI systems
ai:DataLabelsAndLabellingProcessBias: Bias that occurs due to the labelling process itself introducing societal or cognitive biases
ai:DistributedTrainingBias: Bias that occurs due to distributed machine learning having different sources of data that do not have the same distribution of feature space
ai:MissingFeaturesAndLabelsBias: Bias that occurs when features are missing from individual training samples
ai:NonRepresentativeSamplingBias: Bias that occurs if a dataset is not representative of the intended deployment environment, where the model learns biases based on the ways in which the data is non-representative
ai:EngineeringDecisionBias: Bias that occurs due to machine learning model architectures, encompassing all model specifications, parameters, and manually designed features
ai:AlgorithmSelectionBias: Bias that occurs from the selection of machine learning algorithms built into the AI system which introduce unwanted bias in predictions made by the system because the type of algorithm used introduces a variation in the performance of the ML model
ai:FeatureEngineeringBias: Bias that occurs from steps such as encoding, data type conversion, dimensionality reduction and feature selection which are subject to choices made by the AI developer and introduce bias in the ML model
ai:HyperparameterTuningBias: Bias that occurs from hyperparameters defining how the model is structured and which cannot be directly trained from the data like model parameters, where hyperparameters affect the model functioning and accuracy of the model
ai:InformativenessBias: Bias that occurs when, for some groups, the mapping between inputs present in the data and outputs is more difficult to learn, and where a model that only has one feature set available can be biased against the group whose relationships are difficult to learn from available data
ai:ModelBias: Bias that occurs when ML uses functions like a maximum likelihood estimator to determine parameters, and there is data skew or under-representation present in the data, where the maximum likelihood estimation tends to amplify any underlying bias in the distribution
ai:ModelInteractionBias: Bias that occurs when the structure of a model creates biased predictions
ai:ModelExpressivenessBias: Bias that occurs from the number and nature of parameters in a model as well as the neural network topology which affect the expressiveness of the model and any feature that affects model expressiveness differently across groups
ai:AI: A technical and scientific field devoted to the engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives
Usage Note: OECD 2024 definition: An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment. ISO/IEC 22989:2023 definition: engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives
ai:AlgorithmSelectionBias: Bias that occurs from the selection of machine learning algorithms built into the AI system which introduce unwanted bias in predictions made by the system because the type of algorithm used introduces a variation in the performance of the ML model
ai:AutomationBias: Bias that occurs due to the propensity for humans to favour suggestions from automated decision-making systems and to ignore contradictory information made without automation, even if it is correct
ai:DataAggregationBias: Bias that occurs from aggregating data covering different groups of objects that might have different statistical distributions which introduce bias into the data used to train AI systems
ai:FeatureEngineeringBias: Bias that occurs from steps such as encoding, data type conversion, dimensionality reduction and feature selection which are subject to choices made by the AI developer and introduce bias in the ML model
ai:HyperparameterTuningBias: Bias that occurs from hyperparameters defining how the model is structured, which cannot be directly trained from the data like model parameters, where hyperparameters affect the model functioning and accuracy of the model
ai:InformativenessBias: Bias that occurs when, for some groups, the mapping between inputs present in the data and outputs is more difficult to learn, and where a model that only has one feature set available can be biased against the group whose relationships are difficult to learn from available data
Usage Note: This can happen when some features are highly informative about one group, while a different set of features is highly informative about another group. If this is the case, then a model that only has one feature set available can be biased against the group whose relationships are difficult to learn from available data.
ai:ModelBias: Bias that occurs when ML uses functions like a maximum likelihood estimator to determine parameters, and there is data skew or under-representation present in the data, where the maximum likelihood estimation tends to amplify any underlying bias in the distribution
ai:ModelExpressivenessBias: Bias that occurs from the number and nature of parameters in a model as well as the neural network topology which affect the expressiveness of the model and any feature that affects model expressiveness differently across groups
ai:NonRepresentativeSamplingBias: Bias that occurs if a dataset is not representative of the intended deployment environment, where the model learns biases based on the ways in which the data is non-representative
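As a usage sketch for the bias concepts above, the following (hypothetical) example records a non-representative-sampling bias as a risk of a model-training activity and links it to a mitigation. The ex: resources are illustrative, and dpv:Process, dpv:isMitigatedByMeasure, and dpv:RiskMitigationMeasure are assumed here from the DPV core vocabulary:

  @prefix ai:  <https://w3id.org/dpv/ai#> .
  @prefix dpv: <https://w3id.org/dpv#> .
  @prefix ex:  <https://example.com/ns#> .

  # a hypothetical training activity carrying a specific bias risk
  ex:ModelTraining a dpv:Process ;
      dpv:hasRisk ex:SamplingBiasRisk .

  # the risk instance, typed with a bias concept from this extension
  ex:SamplingBiasRisk a ai:NonRepresentativeSamplingBias ;
      dpv:isMitigatedByMeasure ex:StratifiedResampling .

  # the mitigation, typed with the assumed DPV core measure concept
  ex:StratifiedResampling a dpv:RiskMitigationMeasure .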
DPV uses the following terms from [[RDF]] and [[RDFS]] with their defined meanings:
rdf:type to denote a concept is an instance of another concept
rdfs:Class to denote a concept is a Class or a category
rdfs:subClassOf to specify the concept is a subclass (subtype, sub-category, subset) of another concept
rdf:Property to denote a concept is a property or a relation
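For illustration, the following sketch shows these terms in use with AI extension concepts; the declarations mirror how DPV serialises its concepts and relations in RDF, and ex:MyCustomBias is a hypothetical user-defined concept:

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix dpv:  <https://w3id.org/dpv#> .
  @prefix ai:   <https://w3id.org/dpv/ai#> .
  @prefix ex:   <https://example.com/ns#> .

  # a concept declared as a class/category
  ai:DataBias rdf:type rdfs:Class .
  # a concept declared as a subtype of another concept
  ai:DataAggregationBias rdfs:subClassOf ai:DataBias .
  # a relation declared as a property
  dpv:hasRisk rdf:type rdf:Property .
  # a hypothetical user-defined concept extending the vocabulary
  ex:MyCustomBias rdfs:subClassOf ai:DataBias .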
The following external concepts are re-used within DPV:
Contributors
The following people have contributed to this vocabulary. The names are ordered alphabetically. The affiliations are informative and do not represent formal endorsements; they may also be outdated. The list is generated automatically from the contributors listed for defined concepts.
Daniel Doherty
Funding Acknowledgements
Funding Sponsors
The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.
Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.
The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).
Funding Acknowledgements for Contributors
The contributions of Delaram Golpayegani have received funding through the PROTECT ITN Project from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813497.
The contributions of Harshvardhan J. Pandit and Delaram Golpayegani have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.