The Data Privacy Vocabulary [[DPV]] enables expressing machine-readable metadata about the processing of personal data and the use of technologies such as AI. It represents information needed to support regulatory compliance, such as for [[[GDPR]]]. This document is the ‘Primer’ for DPV: it introduces fundamental concepts, with examples of use-cases and applications, as a starting point for adopters wanting to understand and use the DPV. The Primer contains:

[[PRIMER-concise]] is a shorter version (2 pages) of the primer intended for a quick introduction.

Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requesting information and applications, and addressing open issues. See contributing guide for further information.

DPV and Related Resources

[[[DPV]]] is the base/core specification for the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided which model jurisdiction-specific regulations and concepts. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]]. A Search Index of all concepts from DPV and extensions is available.

[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities - refer to DPVCG Website. For meetings, see the DPVCG calendar.

The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG, and describes the methodology, structure, and creation of the DPV. Open-access versions of the article are also available. The article “Data Privacy Vocabulary (DPV) - Version 2”, accepted for presentation at the 23rd International Semantic Web Conference (ISWC 2024), describes the changes made in DPV v2.

Introduction

The [[[DPVCG]]] was formed in 2018 through the [[[SPECIAL]]] with the ambition of providing a machine-readable and interoperable vocabulary for representing information about the use and processing of personal data, whilst inviting perspectives and contributions from a diverse set of stakeholders across computer science, IT, law, sociology, philosophy – representing academia, industry, policy-makers, and activists. It identified the following issues through the W3C Workshop on Privacy and Linked Data:

  1. lack of standardised vocabularies to represent concepts related to personal data, and who/how/where it is processed;
  2. lack of descriptive taxonomies for the purposes of processing personal data that are not restricted to a particular domain or use-case; and
  3. lack of machine-readable representations of concepts that can be used for technical interoperability of information.

Addressing these issues resulted in the creation of the [[[DPV]]], which provides a vocabulary and ontology for expressing information related to processing of personal data, entities involved and their roles, details of technologies utilised, relation to laws and legal justifications permitting its use, and other relevant concepts based on privacy and data protection. While it uses the EU’s [[[GDPR]]] as a guiding source for the creation and interpretation of concepts, the ambition and scope of DPV is to provide a broad, globally useful vocabulary that can be extended to jurisdiction-specific or domain-specific applications.

People, organisations, laws, and use-cases have different perspectives and interpretations of concepts and requirements which cannot be modelled into a single coherent universal vocabulary. The aim of DPV is to provide a foundational framework of ‘common concepts’ that can be extended to represent specific laws, domains, or applications. This lets any two entities agree that a term, for example, PersonalData, refers to the same semantic concept, even though they might apply it differently within their own use-cases.

While most of DPV is focused on processing of personal data, it also supports representing uses of non-personal data and representing technologies such as cloud services and AI. Through these concepts, DPV enables supporting regulations and use-cases which affect use of both personal and non-personal data - such as the [[[DGA]]], and regulations which regulate technologies such as [[[NIS2]]] and [[[AIAct]]].

DPV Specification

Structure of DPV

Structure of DPV vocabularies where DPV defines the core concepts which are then extended in specific extensions

DPV provides hierarchical taxonomies of concepts where each core concept represents the top-most abstract concept in a tree and each of its children provides a less abstract or more concrete concept. For example, consider the concept of PersonalData which is the abstract representation of personal data. It can be further refined or extended as SensitivePersonalData, and further as SpecialCategoryPersonalData and then as GeneticData and so on.
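The refinement chain in this example can be sketched in a few lines of Python. This is only an illustration that assumes each concept has a single parent; the names PARENT, ancestors, and is_subtype are hypothetical helpers for this sketch and are not part of DPV.

```python
# Illustrative only: a child -> parent map for the concepts named in the text.
PARENT = {
    "SensitivePersonalData": "PersonalData",
    "SpecialCategoryPersonalData": "SensitivePersonalData",
    "GeneticData": "SpecialCategoryPersonalData",
}


def ancestors(concept):
    """Return the broader concepts of `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_subtype(concept, broader):
    """True if `concept` is `broader` itself or a refinement of it."""
    return concept == broader or broader in ancestors(concept)


print(ancestors("GeneticData"))
print(is_subtype("GeneticData", "PersonalData"))
```

A consumer that only understands PersonalData can use such a lookup to treat GeneticData as personal data without knowing the more specific concept.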

From this perspective, the top-most abstract concepts are collectively referred to as the core vocabulary within DPV. The goal of the DPV is to provide a rich collection of concepts for each of the top concepts so as to enable their application within real-world use-cases. The identification of what constitutes a core concept is based on the need to represent information about it in a modular and independent form, such as that required for legal compliance.

Each core concept is intended to be independent from other core concepts. For example, the Purpose (e.g. Optimisation) refers only to the purpose of why personal data is processed and is independent as a concept from the PersonalData (e.g. Location) or the Processing activities (e.g. Collect, Store) involved to carry out that purpose. Such separation is necessary in order to represent and answer questions such as:

The separation of concepts creates a modular structure for concept hierarchies within DPV, which in turn allows an adopter to use one particular concept taxonomy or module (e.g. list of purposes) independently without reusing the others, or to select only those concepts which are needed for their particular use-case. The separation also permits greater flexibility of representation and usage - such as using different combinations of core concepts as needed in use-cases. For example, a use-case can specify a single concept representing both Purpose and Processing by combining their respective concepts from DPV. The modular design of DPV also makes it possible to define domain and jurisdiction specific concepts in a separate namespace - such as the [[[DPV-NACE]]] purpose taxonomy providing a way for Purpose to indicate sectors using NACE taxonomy, and the [[[EU-GDPR]]] for using LegalBasis to represent the legal bases provided by [[GDPR]].

Overview of Core Concepts

Overview of concepts in DPV 2.0 - red indicates new concepts and blue indicates expansion of scope to include (non-personal) data and technologies

Purpose

Specifying Purpose using DPV

see more information: DPV spec

Representing the purpose for which personal data is processed, e.g. ‘Personalisation’ as a broad category of purpose. Information about the purpose can be further specified by denoting its interpretation within a particular Sector, such as from standardised authoritative lists (e.g. [[NACE]]), to indicate domain-specific applications and interpretations, or to indicate applicability of sectorial laws.

Data and Personal Data

Specifying use of Data and Personal Data using DPV

see more information: DPV spec

‘Personal data’ refers to data about a natural person. It is also commonly referred to as ‘personally identifiable information (PII)’. However, the terms should not be used interchangeably: based on definitions such as those in GDPR, ‘personal data’ can be interpreted as a broader term than PII, which may refer only to information that can directly identify a person. DPV’s definition of personal data is based on the broadest possible definition (i.e. from GDPR) as it covers a wider range of information considered ‘personal data’. Personal data can be declared as a category, such as ‘Email’, or an instance, such as ‘x@y.z’.

DPV defines the concept Data which has subtypes NonPersonalData and PersonalData, which are associated using the relation hasData. To specifically indicate involvement of personal data, DPV provides the relation hasPersonalData.

Processing

Specifying Processing operations on data using DPV

see more information: DPV spec

Representing processing as the actions or operations over personal data, e.g. collect, use, share, store. To indicate the origin or source of data, the concept DataSource along with the relation hasDataSource is provided. For additional contextual information regarding operations or processing, such as whether it includes humans or automation, the concept ProcessingContext is provided, which can be associated using the relation hasContext (a description of Context is provided later in the document). Examples of ProcessingContext include conditions such as profiling, automated decision making, and human involvement.

Processing and Storage Conditions

Specifying temporal and geo-spatial context associated with processing and storage using DPV

see more information: DPV spec

Indicating information about conditions or limitations associated with processing (including storage) of personal data - such as its location, duration, deletion (e.g. erasure mechanisms), or restoration (e.g. backup availability).

Legal Basis

Specifying legal basis using DPV

see more information: DPV spec

A legal basis is a law or a clause in a law that justifies or permits the processing of personal data or use of technologies in the specified manner. It is a jurisdictional concept, given the scoping of laws to specified countries or regions, as well as a domain-specific concept, given that specific laws are enacted for particular domains. A law that regulates the use of personal data, such as the GDPR, requires that every processing of personal data be justified with some legal basis to ensure it is lawful, and to further assess its correctness, accountability, and impact based on the obligations applicable. However, what is considered a legal basis varies greatly across cultures, domains, use-cases, and laws themselves. The aim of DPV is therefore to provide an upper-level abstract taxonomy of categories of legal bases, such as consent and contract, that can be customised and applied as needed.

Entities

Specifying entities using DPV

see more information: DPV spec

Representing the ‘entities’ or ‘actors’ involved in the processing of personal data. DPV provides a broad categorisation of entities based on their relevance in jurisprudence (i.e. legal roles) as well as categorisation in real-world (e.g. organisation types).

Data Controller

Specifying Data Controller entity using DPV

Representing the organisation(s) responsible for processing the personal data.

DataSubject

Specifying Data Subject entity using DPV

Representing the categories or groups (e.g. Users of a Service), or instances (e.g. Jane Doe) of individual(s) whose personal data is being processed.

Recipient

Specifying Recipient entities using DPV

Represents the entities that receive personal data, e.g. when it is being collected, or transferred, or shared.

Technical Organisational Measure

Specifying Technical, Organisational, Legal, and Physical measures using DPV

see more information: DPV spec

DPV provides a taxonomy of technical and organisational measures for representing information about how the processing of personal data is technically and organisationally protected, safeguarded, secured, or otherwise managed. This is distinct from what technology is used for carrying out processing, and instead refers to what measures are in place (i.e. what the technology intends to provide in terms of features).

Technical and Organisational measures consist of activities, processes, or procedures used in connection with ensuring data protection, carrying out processing in a secure manner, and complying with legal obligations. Such measures are required by regulations depending on the context of processing involving personal data. For example, GDPR (Article 32) requires implementing appropriate measures by taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing, as well as risks, rights and freedoms.

Location and Jurisdiction

Specifying Location and Jurisdiction using DPV

see more information: DPV spec

Representing the locations associated with entities, processing, data, and other information. Locations are important for determining jurisdictions and, from that, understanding the applicability of laws, the involvement of authorities, and the rights available.

Risk

Specifying Risk Assessment using DPV

see more information: DPV spec

Risk refers to potential negative events. DPV enables representing risk(s) associated with a concept, e.g. risk of unauthorised data disclosure related to processing, a technical measure, or vulnerability of data subjects. In addition to the risk, DPV also enables representing the consequences (e.g. denial of service) and their impacts for specific entities (e.g. right violated for data subject).

Technology

Specifying Technology using DPV. The TECH extension provides additional concepts to describe the technology such as involved actors, intended use, capabilities and functions, and documentation

see more information: DPV spec

Representing the technologies used to implement the processing, or associated with the processing. For example, software products, cloud services, or AI technologies. This also involves specifying who is doing the implementing i.e. a technology and its implementer.

Rights

Specifying Rights and Rights Exercise information using DPV

see more information: DPV spec

The concept Right represents a normative concept for what is permissible or necessary in accordance with a system such as laws. To associate rights with concepts that are relevant or within which those rights occur, the relation hasRight is used. Rights can be passive, which means they are always applicable without requiring anything to be done, or active where they require some action to be taken to initiate or exercise them. To represent these concepts, DPV uses PassiveRight and ActiveRight respectively. Rights can be applicable to different contexts or entities. To differentiate rights applicable or afforded to data subjects, the concept DataSubjectRight is used.

Rules

Specifying Rules using DPV

see more information: DPV spec

Rules are relevant to explicitly denote how a system should implement operations, and enable associating specifics such as requirements, constraints, and other forms of 'rules' that are needed in order to control executions, affect interpretations, or achieve compliance (e.g. with law). DPV defines the concept Rule and relation hasRule to enable representation of such conditions and requirements, and provides a minimal set of concepts for types of rules, namely Permissions, Prohibitions, and Obligations. DPV does not define additional semantics for rules and limits its scope and focus to providing a simple way to specify common rules associated with personal data and its processing activities, with the recommendation to consider other richer and more mature efforts dedicated to expression of conditions and rules, such as [[ODRL]], [[SHACL]], and [[RuleML]].

Process

In legal terminology, it is common to refer to all information about how personal data is being processed using the colloquial term processing. This results in confusion between the use of processing as a concept referring to all information (i.e. purposes, personal data, collection, storage, etc.), and processing as a concept referring to (only) the specific actions or operations (e.g. collect, use).

To avoid this ambiguity and enable clarity of information, DPV defines a new concept called Process for representing how the core concepts are combined and applied for a particular use-case. The association of a concept to Process is made using the relationships or properties provided for each concept. For example, to indicate a Process includes personal data, the relationship hasPersonalData is used along with the concept PersonalData.
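As a rough sketch of this idea, a Process can be pictured as a record whose fields correspond to the relations of the core concepts. The class below and the helper to_statements are hypothetical illustrations, not DPV definitions; only the relation names hasPurpose, hasPersonalData, and hasLegalBasis and the values Marketing, Email, and Consent are taken from this document.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical container; each field mirrors one DPV relation."""
    name: str
    has_purpose: list = field(default_factory=list)
    has_personal_data: list = field(default_factory=list)
    has_legal_basis: list = field(default_factory=list)


def to_statements(process):
    """Flatten a Process into (subject, relation, object) statements."""
    relations = {
        "hasPurpose": process.has_purpose,
        "hasPersonalData": process.has_personal_data,
        "hasLegalBasis": process.has_legal_basis,
    }
    return [
        (process.name, relation, value)
        for relation, values in relations.items()
        for value in values
    ]


newsletter = Process(
    name="SendNewsletter",  # made-up process name for this sketch
    has_purpose=["Marketing"],
    has_personal_data=["Email"],
    has_legal_basis=["Consent"],
)

for statement in to_statements(newsletter):
    print(statement)
```

Keeping each core concept in its own field means a purpose can be changed without touching the personal data or legal basis recorded for the same process.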

Nesting Process to express granular models

Instances of Process can be nested, which means one instance can contain other instances, much like a box with several smaller boxes inside. This permits breaking down complex or dense use-cases into more granular ones and representing them in a more precise and modular fashion. Such a representation also facilitates reuse of the granular or modular processes, or in defining 'templates' and 'patterns', for example to craft a single process representing collecting and storing email addresses and using it in different processes for different purposes.

From the earlier example, consider the situation where a single Process instance consists of two additional instances representing: (i) data is stored using a data processor, (ii) data is used for Marketing. While it is certainly possible to represent all of this information within one single instance of Process, the adopter may decide to create separate instances of Process based on requirements such as reflecting similar separations for legal documentation or accountability purposes.
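The nesting described here can be sketched as a small tree structure. This is an assumption-laden illustration: the class, its field names, and the process names are invented for the example, and DPV itself does not prescribe this representation.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical Process record that may contain nested processes."""
    name: str
    purposes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def flatten(process):
    """Return the names of a process and all nested processes, depth first."""
    names = [process.name]
    for child in process.children:
        names.extend(flatten(child))
    return names


def all_purposes(process):
    """Collect purposes declared anywhere inside the nested structure."""
    found = list(process.purposes)
    for child in process.children:
        found.extend(all_purposes(child))
    return found


# (i) data is stored using a data processor, (ii) data is used for Marketing
storage = Process("StoreWithDataProcessor")
marketing = Process("UseForMarketing", purposes=["Marketing"])
parent = Process("CustomerDataHandling", children=[storage, marketing])

print(flatten(parent))
print(all_purposes(parent))
```

The storage process can then be reused under a different parent without repeating its details.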

Alternative Models

Process is intended to provide a convenient concept for tying the core concepts together, and DPV does not make its use binding, nor does it constrain the relationships to only be defined between Process and the other core concepts. This is so as to permit using DPV in alternate or differing models. For example, where a central concept already exists, such as when describing relevant information for a smartphone app, the concept for App can be a replacement for Process based on statements such as <App> hasPurpose <SomePurpose>. Even in such cases, Process can provide granular expression, thereby enabling description of different contexts within which the app uses personal data, such as for registration or complaint resolution. Therefore, unless an alternate model is necessary for the use-case, DPV recommends using Process or its subtype/subclass as a central concept for ensuring interoperability.

An example of where the adopter or use-case wants to use another concept in a way which is not compatible with Process is the use of Purpose to indicate that it involves some data, i.e. <SomePurpose> hasPersonalData <SomePersonalData>, or to indicate which legal basis is used for that purpose by using the hasLegalBasis relationship. While this is not explicitly prohibited by DPV, the implication of using Purpose in this manner is that the personal data, processing, and other associated concepts are now strictly tied to the purpose instance (and implementation). Changing any of these would mean changing the purpose. In addition, it is not possible to combine multiple purposes together or have nested purposes with different details in the same manner as with a Process. Therefore, DPVCG recommends the use of Process to ensure compatibility between use-cases as well as to ensure the use of concepts does not create ambiguity or restrict further use-cases from reusing existing information.

When using custom-defined restrictions and data models, it is important to note the consequences such models have on interpretation and interoperability of data defined using DPV. For example, consider a compliance assessment tool that takes DPV data as input. If the tool expects a Process with links to relevant information, using other alternate models and relationships can produce invalid or incorrect results. To avoid this, we recommend:

  1. Documenting alternate models to clearly indicate their interpretation and use of DPV semantics;

  2. Where possible, ensuring and providing mappings between the alternate models and the Process or equivalent concepts within DPV so that the data can be transformed for interoperability;

  3. Contributing your idea or implementation of an alternate model to DPVCG to create a ‘library of models’, which can act as documentation for adopters and provide better understanding of the model's impacts on requirements and interpretation of information specified using DPV. This exercise can also assist in selecting a common model as the 'default' and in providing mechanisms for conversion/interoperability between it and other models.
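The second recommendation, providing a mapping to Process, could look like the following sketch. It assumes statements are held as simple (subject, relation, object) tuples and that the alternate model is centred on an App concept as in the earlier example; the function to_process_model and the ex: names are hypothetical.

```python
def to_process_model(statements, central, process_name):
    """Re-express statements about `central` as statements about a Process.

    Every statement whose subject is the alternate central concept is
    rewritten so that a Process instance becomes the subject instead.
    """
    converted = [(process_name, "rdf:type", "dpv:Process")]
    for subject, relation, obj in statements:
        new_subject = process_name if subject == central else subject
        converted.append((new_subject, relation, obj))
    return converted


# Statements from an app-centred model (identifiers are made up).
app_statements = [
    ("ex:App", "dpv:hasPurpose", "ex:SomePurpose"),
    ("ex:App", "dpv:hasPersonalData", "ex:SomePersonalData"),
]

process_statements = to_process_model(app_statements, "ex:App", "ex:AppProcess")
for statement in process_statements:
    print(statement)
```

A real mapping would also need to document how the App itself relates to the generated Process, which this sketch leaves out.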

Using DPV

The motivation of DPV is to provide a 'data model' or an 'ontology' of concepts for interoperable representation and exchange of information about processing of (personal) data and the use of technologies. For this, the DPV specification defines concepts and relationships using the [[RDF]] standard, which can additionally be implemented and applied using technologies appropriate to a use-case's specific requirements.

DPV Serialisations

In addition to being used as a semantic web resource, the DPV can also be used without (or alongside) semantic web by utilising a format such as [[JSON-LD]] that retains the semantics and provides convenience of using JSON, or through other formats such as a CSV or a flat-list of concepts which do not capture 'semantics'. This section provides an overview of such approaches where DPV can be used both with and without semantic web.

The following are four (non-exhaustive) ways DPV can be used based on the requirements of a use-case. For guidance on how to adopt DPV concepts within a use-case, refer to [[[GUIDES]]].

  1. As a taxonomy or collection of concepts: The [[DPV]] specification provides taxonomies of concepts (e.g. Purpose). This is useful where only the 'taxonomy' in DPV is needed, for example to populate forms or annotate information. For this, the default serialisation of [[SKOS]] is useful, or implementers can utilise other formats such as CSVs or JSON while retaining the IRIs for interoperability.
  2. As a 'schema' or 'lightweight-ontology': The [[DPV]] uses [[RDFS]] and [[SKOS]] as its default serialisation to define 'classes' and 'properties' as well as 'taxonomies' that can be used with it (e.g. hasPurpose and instances of Purpose taxonomy). The classes and properties form a 'data model' or 'schema' to represent how information should be structured and organised, and do not contain any complex restrictions (e.g. unions and intersections of concepts). It is suitable for cases where the use-case wants to use DPV as a schema or data model or to describe its activities, and where creating constraints, inferences, or reasoning is either implemented separately (e.g. SWRL or SHACL) or is not required.
  3. As a 'logic-based ontology': The [[DPV-OWL]] is a serialisation of the [[DPV]] specification using the [[OWL]] language that contains the same concepts but is provided under a separate namespace. It enables the use of description logic in [[OWL]] for modelling knowledge and describing desired inferences through a logic-based reasoner. OWL offers more powerful (and complex) features compared to RDFS+SKOS regarding expression of information and its use to produce desired inferences in a coherent manner. It also restricts ways in which DPV concepts can be used - see example showing implications of using SKOS vs OWL. Also see the [[[GUIDE-OWL2]]].
  4. Other Uses: For cases where the above are not suitable or sufficient, an adopter can create their own serialisation of the DPV by implementing the [[DPV]] specification in RDF (or other semantics-aware languages) or for alternate formats and environments such as CSVs, programming APIs, and frameworks. When using DPV in such a manner, it is advised to retain compatibility (and interoperability) by either using the entire IRI (e.g. https://w3id.org/dpv#Purpose) or providing documentation for how the custom implementation aligns with the [[DPV]] specification (e.g. stating MyPurposeConcept is the same as dpv:Purpose). Doing this ensures that the data remains compatible and interoperable with the other uses and applications of DPV.
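For the last option, retaining compatibility can be as simple as keeping a documented mapping from local names to full DPV IRIs. The sketch below assumes the dpv: prefix stands for the namespace https://w3id.org/dpv# shown above; LOCAL_TO_DPV, expand, and to_dpv_iri are hypothetical names for this illustration.

```python
DPV_NAMESPACE = "https://w3id.org/dpv#"

# Documented alignment of a custom concept with DPV (illustrative).
LOCAL_TO_DPV = {"MyPurposeConcept": "dpv:Purpose"}


def expand(term):
    """Expand a 'dpv:' prefixed name to its full IRI; pass full IRIs through."""
    if term.startswith(DPV_NAMESPACE):
        return term
    prefix, sep, local = term.partition(":")
    if prefix == "dpv" and sep and local:
        return DPV_NAMESPACE + local
    raise ValueError(f"not a DPV term: {term!r}")


def to_dpv_iri(local_name):
    """Resolve a custom concept name to the DPV IRI it is aligned with."""
    return expand(LOCAL_TO_DPV[local_name])


print(to_dpv_iri("MyPurposeConcept"))
```

Exporting the full IRI alongside any local label, for example as an extra CSV column, keeps the data interpretable by other DPV-aware tools.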

Areas of Application

The following is an illustrative, but non-exhaustive list of applications possible with the DPV:

See the community-maintained [[[DPV-ADOPTION]]].

Semantics of DPV

DPV defines a broad notion of semantics for providing a conceptual model of concepts and relationships between them. As explained in the [[[#serialisations]]] section, [[DPV]] provides concepts which are represented using [[RDFS]] and [[SKOS]] which permits its use as a taxonomy or as a light-weight ontology. In addition to this, the same concepts are provided with [[OWL]] serialisation in a separate namespace to enable complex ontological reasoning. The following section introduces why we need 'concepts' and 'relationships' and how they are modelled in DPV.

Concepts and Relationships

[[DPV]] is a collection of concepts. Here the term 'concept' is broadly used as consisting of a term non-exhaustively representing any of the following: idea, thought, meaning, object, event, relations, class, or category. Thus, in DPV, 'concepts' consist of terms and relationships between them.

Semantic relationships between concepts used in DPV - generalisation and specialisation (arrowhead), instantiation, and association

A 'concept' in DPV is a 'term' representing information associated with that particular concept. For example, the concept Email refers to information about emails. This information may contain email addresses, aliases, signatures, and so on. While an intuitive use of Email may be taken to refer only to an email address, within DPV concepts are defined with a strict scope as representing all concepts that are inherently a part of them. Therefore, the concept Email is inclusive of email addresses, aliases, and so on. To specifically refer to an 'email address', the concept EmailAddress should be used, which is 'narrower' or 'more specific' than the concept Email. In terms of sets, EmailAddress is a subset of Email; if representing information as 'classes', EmailAddress is a 'subclass' of Email. We use the term 'subtype' to indicate all such relationships consisting of 'broader/narrower' or 'superclass/subclass' or 'superset/subset' to enable different semantic interpretations when serialising the concepts using standards such as [[RDFS]], [[SKOS]], and [[OWL]] (e.g. 'is-a' or 'subclass').

Through this interpretation, the DPV is structured as a hierarchy of concepts where each parent or top or broader concept represents a broad set of information and its children or bottom or narrower concepts represent parts of that set. For example, the top concept Data has the more specific subtype PersonalData, which in turn has the further subtype SensitivePersonalData.

In taking this view of concepts and relationships, DPV provides a way to agree upon what a term means and is intended to represent. For example, when two different use-cases use the concept Personal Data using DPV, both refer to the same concept. Similarly, when Email is declared as a subtype of Personal Data, another entity receiving and reading this information must interpret it in the same manner. DPV is thus intended to be a foundational model for terms and relationships when representing and exchanging information.

DPV as an Ontology

The use of DPV concepts in actual use-cases is often accompanied by additional information and a specific 'serialisation' that make it possible to use DPV in a given technological or theoretical framework. For example, consider the relation hasPersonalData used to indicate association or applicability of PersonalData subtypes/subclasses or instances. While this information about what concepts the relationship is being used with/for can be implicitly understood by humans based on the phrasing 'has personal data', it can also be explicitly declared as machine-readable information so as to: (i) express the inherent logic and interpretation of which concepts are related; (ii) enable verification that the object of the relation is indeed a type of personal data; and (iii) provide hints or suggestions, such as a list of personal data concepts in a GUI when using the relation. To express such additional information that defines relations between concepts and constrains their uses, DPV must be specified as an 'ontology' using a serialisation that supports representing this and any other required information.
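Points (ii) and (iii) can be illustrated with a small sketch in which the expected objects of a relation are declared as data. This assumes a closed, hand-written list of personal data concepts; the names RANGE, is_valid, and suggestions are hypothetical and do not reflect how any particular DPV tool works.

```python
# Illustrative subset of concepts treated as types of personal data.
PERSONAL_DATA_TYPES = {
    "PersonalData",
    "SensitivePersonalData",
    "Email",
    "EmailAddress",
}

# Machine-readable declaration of what each relation expects as its object.
RANGE = {"hasPersonalData": PERSONAL_DATA_TYPES}


def is_valid(statement):
    """Check that the object of a statement fits the relation's declared range."""
    _subject, relation, obj = statement
    allowed = RANGE.get(relation)
    # Relations without a declared range are not checked in this sketch.
    return allowed is None or obj in allowed


def suggestions(relation):
    """Return candidate objects for a relation, e.g. to populate a GUI list."""
    return sorted(RANGE.get(relation, set()))


print(is_valid(("SomeProcess", "hasPersonalData", "Email")))
print(is_valid(("SomeProcess", "hasPersonalData", "Marketing")))
print(suggestions("hasPersonalData"))
```

In an ontology serialisation, the same declaration would be made once on the relation itself rather than in application code.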

One option to represent ontologies is RDF ([[[RDF]]]) which provides a formal method for expressing information or facts, with RDFS ([[[RDFS]]]), SKOS ([[[SKOS]]]), and OWL ([[[OWL]]]) for representing a more detailed and logic-based assertion of the model in terms of relationships and restrictions. While there are other alternatives available to RDF for representing information, and to RDFS, SKOS, and OWL for representing taxonomies and ontologies, the DPVCG uses these to serialise the DPV specification as an ontology based on their status as standards.

Initially, DPV was only provided as an [[OWL]] ontology. This was expanded upon in DPV v1 which used custom [[SKOS]] extensions to define the 'core' vocabulary with serialisations in [[RDFS]]+[[SKOS]] and OWL2. In DPV v2, the custom [[SKOS]] extensions were removed in favour of [[RDFS]]+[[SKOS]] as the default serialisation with [[OWL]] as an alternative serialisation. The [[RDFS]]+[[SKOS]] serialisation defines concepts as [[RDFS]] classes and instances of a top-concept with [[SKOS]] used to represent the hierarchy, whereas the [[OWL]] serialisation uses subclasses to represent the hierarchy.

The table provides an overview of the expression of concepts across DPV serialisations.

Concept | [[DPV]] | [[DPV-OWL]]
Conforms with | [[RDFS]], [[SKOS]] | [[OWL]]
Concept | rdfs:Class, skos:Concept | owl:Class
is subtype of | rdfs:subClassOf or skos:broader | rdfs:subClassOf
is instance of | rdf:type | rdf:type
has concept | rdf:Property | owl:ObjectProperty
relationship subject or domain | rdfs:domain, dcam:domainIncludes, schema:domainIncludes | rdfs:domain, dcam:domainIncludes, schema:domainIncludes
relationship object or range | rdfs:range, dcam:rangeIncludes, schema:rangeIncludes | rdfs:range, dcam:rangeIncludes, schema:rangeIncludes

Extending Concepts for Use-Cases

Most of the concepts within DPV are provided as hierarchies of classes representing categories of information, which are intentionally generic or abstract or broad so as to permit their application across a diverse and varied landscape of real-world use-cases. In order to accurately reflect the particulars of a use-case, concepts within DPV would (most likely) need to be extended. The specifics for how this should be done depend on the manner in which DPV is utilised. For example, using the default [[DPV]] specification which contains [[RDFS]] and [[SKOS]] semantics, extending is done by declaring a new concept as an instance of the top concept using rdf:type and then using skos:broader to denote where it fits within the hierarchy. In [[DPV-OWL]], which uses [[OWL]] semantics, the rdfs:subClassOf relationship is used to create a hierarchy of sub-classes. Where an exact concept is not present within the DPV and a broader concept exists for representing the same information, one should subtype or extend that broad concept to define the required information.
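The two extension mechanisms can be compared side by side in a short sketch that produces the statements each serialisation would need. The function extension_triples and the ex:NewsletterMarketing concept are hypothetical; the sketch assumes Marketing is a DPV purpose concept, and only the properties rdf:type, skos:broader, and rdfs:subClassOf come from the description above.

```python
def extension_triples(new_concept, parent, top_concept, serialisation="rdfs+skos"):
    """Return the statements that declare `new_concept` as an extension.

    "rdfs+skos": instance of the top concept, placed under `parent`
                 with skos:broader.
    "owl":       subclass of `parent` with rdfs:subClassOf.
    """
    if serialisation == "rdfs+skos":
        return [
            (new_concept, "rdf:type", top_concept),
            (new_concept, "skos:broader", parent),
        ]
    if serialisation == "owl":
        return [(new_concept, "rdfs:subClassOf", parent)]
    raise ValueError(f"unknown serialisation: {serialisation!r}")


default_form = extension_triples(
    "ex:NewsletterMarketing", "dpv:Marketing", "dpv:Purpose"
)
owl_form = extension_triples(
    "ex:NewsletterMarketing", "dpv:Marketing", "dpv:Purpose", serialisation="owl"
)

print(default_form)
print(owl_form)
```

Either form keeps the new concept anchored to an existing DPV concept, which is what lets other parties interpret it.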

The mechanism for extending concepts (via both subclasses/subtypes and instances) is useful to align existing concepts or vocabularies with the DPV taxonomies, such as by declaring them as subclasses of a particular concept. This permits the creation of domain or jurisdiction specific extensions, such as [[[EU-GDPR]]] for expressing the legal bases provided by GDPR. Extensions also permit more accurate representations of a use-case by extending from multiple concepts to refine and scope the interpretation. This means each concept can have multiple parents representing the intersection of their respective sets.

It is not necessary to extend concepts unless one wishes to depict use-case specific information. For example, if in a use-case it is sufficient to (only) say that some information is collected, then dpv:Collect can be used directly. However, where more specific information is needed, such as a method of collection (e.g. CollectViaWebForm), it is recommended to extend the concept, for example as <CollectViaWebForm a dpv:Collect>. If there are many forms and they need to be 'grouped' together as collection methods, then one would subtype/subclass Collect as CollectViaWebForm and create an instance of it for each form to be represented.
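
The two options in the paragraph above can be sketched in Turtle as shown below. The ex: namespace and the individual form names are hypothetical, and the two options are alternatives: a dataset would normally use one or the other for the same IRI.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.com/ns#> .   # hypothetical adopter namespace

# Option 1: a single collection method, declared as an instance of dpv:Collect.
ex:CollectViaWebForm a dpv:Collect .

# Option 2 (alternative to option 1): many forms grouped under one subtype,
# with one instance per form.
ex:CollectViaWebForm rdfs:subClassOf dpv:Collect .
ex:NewsletterSignupForm a ex:CollectViaWebForm .
ex:ContactForm          a ex:CollectViaWebForm .
```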

Though this example represented a web form as a method of collection by mentioning it directly within the concept as CollectViaWebForm, this may not always be desirable. For example, that same web form may also need to be represented separately for logging purposes. DPV also provides the DataSource and Technology concepts for representing information regarding how concepts are implemented and the use of specific technological artefacts such as web forms and databases, along with their functions such as data storage and retrieval.

Maintaining Interoperability

DPV intends to provide a core or foundational framework for different entities to exchange information and interpret concepts for interoperability. When an adopter (e.g. an organisation using DPV) extends concepts to refine them for their own use-case, the concept is still (weakly) interoperable by relying on DPV’s broad taxonomies to provide a common point of reference.

Extensions

To supplement the concepts and taxonomies in [[DPV]] for specific applications or use-cases, or to provide separation for better management of terms, we provide several extensions to the DPV.

Personal Data (PD)

[[[PD]]] provides additional concepts that extend the DPV's personal data taxonomy based on an opinionated structure contributed by R. Jason Cronk from EnterPrivacy. This separation is to enable adopters to decide whether the extension's concepts are useful to them, or to use other external vocabularies, or define their own.

Concepts within [[PD]] are broadly structured in top-down fashion by utilising their relevance and origin as:

Locations (LOC)

[[[LOC]]] provides additional concepts regarding locations such as countries and regions based on the ISO 3166 standards. It enables representing information such as that processing takes place within Ireland, represented by loc:IE, or within the European Union (EU), represented by loc:EU. We are working on expanding this list to also specify regions, cities, and other pertinent location details, and welcome participation and contributions for this.
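
As an illustrative sketch of the location examples above, the following Turtle associates two hypothetical processes with loc:IE and loc:EU. The ex: names are invented for illustration, the loc: namespace IRI is an assumption, and dpv:hasLocation is assumed to be the appropriate property (DPV also defines more specific location properties that may fit better in a given use-case).

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix loc: <https://w3id.org/dpv/loc#> .   # assumed namespace for the LOC extension
@prefix ex:  <https://example.com/ns#> .     # hypothetical adopter namespace

# Processing that takes place within Ireland.
ex:NewsletterProcessing a dpv:Process ;
    dpv:hasProcessing dpv:Collect ;
    dpv:hasLocation loc:IE .

# Processing that takes place within the European Union.
ex:AnalyticsProcessing a dpv:Process ;
    dpv:hasLocation loc:EU .
```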

Risk Management (RISK)

[[[RISK]]] builds on top of the lightweight risk framework within DPV by providing the following extensive concepts related to risk assessment and management. We are in the process of identifying additional concepts and taxonomies for the risk extension, such as for risk management procedures and the creation of a risk ontology based on ISO standards.

Technologies (TECH)

[[[TECH]]] extends the DPV's terms to represent further specific details regarding technologies, their management, and relevance to actual real-world tools and systems. It provides concepts for the following:

The aim of the TECH extension is to describe real-world tools and services, such as a specific cloud storage provider, and to provide categorisation and metadata that connect them to DPV's concepts, such as to indicate that a cloud storage instance features encryption at rest as a technical measure. Through these, the management and documentation of use-cases can be made easier by providing the relationships between tools/services and technical measures as a 'knowledge graph'.
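
The cloud storage example above might be sketched as follows. This uses only core DPV terms, which are assumed to exist under these IRIs; a more specific concept from the TECH extension could replace dpv:Technology. The provider and process names are hypothetical.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.com/ns#> .   # hypothetical adopter namespace

# A specific (hypothetical) cloud storage service, linked to the
# technical measure it features.
ex:AcmeCloudStorage a dpv:Technology ;
    rdfs:label "Acme Cloud Storage (hypothetical)"@en ;
    dpv:hasTechnicalMeasure dpv:EncryptionAtRest .

# A use-case that stores data using that service.
ex:CustomerRecordStorage a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:isImplementedUsingTechnology ex:AcmeCloudStorage .
```

Linking the service to its measures once, and then referencing the service from each process, is what lets the resulting graph be queried for which use-cases rely on which technical measures.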

Artificial Intelligence (AI)

[[[AI]]] is an extension under development which will further extend the [[TECH]] extension to represent concepts associated with AI. These will include representation of:

Justifications

[[[JUSTIFICATIONS]]] provides concepts for use as 'justifications' with DPV. For example, where a right cannot be fulfilled, a justification such as 'identity could not be verified' is represented using a specific concept.

Notes

This document is based on inspiration from the following:

Funding Acknowledgements

Funding Sponsors

The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.

Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.

The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).

Funding Acknowledgements for Contributors

The contributions of Beatriz Esteves have received funding through the PROTECT ITN Project from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813497.

The contributions of Harshvardhan J. Pandit have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.