
1 Introduction

This chapter relates to the technical priority of data management from the European Big Data Value Strategic Research and Innovation Agenda [23]. It addresses the horizontal concern of data management from the BDV Technical Reference Model and the vertical concerns of cybersecurity. Additionally, this chapter relates to the data for AI enablers of the AI, Data, and Robotics Strategic Research, Innovation, & Deployment Agenda [22].

Sharing sensitive data, between healthcare organizations, for example, can facilitate significant societal, environmental, and economic gains such as medical diagnoses and biomedical research breakthroughs. However, as this data is sensitive, organizations understand the importance (and increasing compliance requirements) of securely sharing, storing, managing, and accessing such data. Here, sensitive data is specified to include personal data or personally identifiable information (PII), GDPR special category personal data,Footnote 1 and business confidential or restricted data that does not normally leave an organization. Most recent works in sensitive data sharing have used cryptosystems and blockchain approaches [4, 10, 21]. These approaches were designed to facilitate the sharing of sensitive data, such as sharing patient medical records between healthcare institutions, but need additional infrastructure to support collaborative data sharing environments for the purpose of research or collaborative analysis. This chapter explores the use of a dataspace, a data management framework capable of interrelating heterogeneous data, for the sharing of sensitive data in a collaborative environment. It also illustrates the use of Knowledge Graphs (KGs) in constructing a trusted data sharing environment for sensitive data.

In recent years, KGs have become the base of many information systems which require access to structured knowledge [2]. A KG provides semantically structured information which can be interpreted by computers, offering great promise for building more intelligent systems [24]. KGs have been applied in different domains such as recommendation systems, information retrieval, data integration, medicine, education, and cybersecurity, among others [20]. For example, in the medical domain, KGs have been used to construct, integrate, and map healthcare information [24]. A dataspace integrates data from different sources and heterogeneous formats, offering services without requiring upfront semantic integration [6]. It follows a “pay-as-you-go” approach to data integration where the priority is to quickly set up the fundamental aspects of the dataspace functionality, such as dataset registration and search, and then improve upon the semantic cohesion of the dataspace over time [6, 8]. The dataspace services offered over the aggregated data do not lose their surrounding context, i.e., the data is still managed by its owner, thus preserving autonomy [5].

A dataspace requires security aspects, such as access control and data usage control [2, 17, 18], to avoid data access by unauthorized users. In this sense, access control is a fundamental service in any dataspace where personal data is shared [2, 3, 13, 15, 17, 18]. According to Curry et al. [2], a trusted data sharing dataspace should consider both personal data handling and data security in a clear legal framework. However, there is currently a lack of solutions for dataspaces that consider both the privacy and security aspects of data sharing and collaboration (see Sect. 3). This work explores the following research question: to what extent will the development of a multi-user and multi-organization dataspace, based on Linked Data technologies, personal data handling, data privileges, and data interlinking, contribute to building a trusted sharing dataspace for a collaborative environment? In response, this work proposes the Trusted Integrated Knowledge Dataspace (TIKD)—an approach to the problem of secure data sharing in collaborative dataspaces.

The TIKD is a multi-user and multi-organization Linked Data approach to trustworthy data sharing between an organization’s users. The security access to data follows a context-based access control (CBAC) model, which considers the user and data context to authorize or deny data access. The CBAC implementation is based on the Social Semantic SPARQL Security for Access Control Ontology [19] (S4AC) which defines a set of security policies through SPARQL ASK queries. TIKD defines a privacy protecting user log, based on the PROV ontology, to create user history records. User logs are securely stored following a pseudonymized process based on the Secure Hash Algorithm 3 (SHA-3). The TIKD also provides personal data handling, based on the data privacy vocabularyFootnote 2 (DPV), to comply with the General Data Protection Regulation (GDPR). It implements an interlinking process to integrate external data to the KG based on the Comprehensive Knowledge Archive NetworkFootnote 3 (CKAN) data management tool. The contributions of this research are:

  1. A trusted dataspace, based on Knowledge Graph integration and information security management, for collaborative environments such as healthcare

  2. An information security management system to securely handle organizational data sharing, personal data, user history logs, and privacy-aware data interlinking by means of a context-based access control that includes data privileges and applies a pseudonymization process for user logs

This work extends TIKD from the former work [7] by updating the access control model, improving the personal data handling process, describing the data classification mechanism, and incorporating a new evaluation process based on the privacy information ISO 27701 standard.

The structure of the remainder of this chapter is as follows: the Use Case section defines the requirements of the ARK-Virus Project. The Related Work section presents the state of the art in dataspace data sharing approaches. The Description of the TIKD section details the services of the dataspace. The Evaluation section presents the results from the ISO 27001 Gap Analysis Tool (GAT) and the ISO 27701 control requirements. Finally, the Conclusion section presents a summary of this research and its future directions.

2 Use Case—Sensitive Data Sharing and Collaboration for Healthcare in the ARK-Virus Project

The ARK-Virus ProjectFootnote 4 extends the ARK Platform to provide a collaborative space for use in the healthcare domain, specifically for the risk governance of personal protective equipment (PPE) use for infection prevention and control (IPC) across diverse healthcare and public service organizations [12]. The consortium consists of the ARK academic team (ADAPT Centre, Dublin City University, and the Centre for Innovative Human Systems, Trinity College Dublin) and a community of practice which includes safety staff in St. James’s Hospital Dublin, Beacon Renal, and Dublin Fire Brigade. Staff across all three organizations are involved in trialing the ARK Platform application, which is hosted in Trinity College Dublin. This creates many overlapping stakeholders that must be appropriately supported when handling sensitive information.

The ARK Platform uses Semantic Web technologies to model, integrate, and classify PPE risk data, from both qualitative and quantitative sources, into a unified Knowledge Graph. Figure 1 illustrates the ARK Platform’s data model supporting the collaborative space for PPE. This model is expressed using the ARK Cube ontologyFootnote 5 and the ARK Platform VocabularyFootnote 6 [9, 12]. The Cube ontology is used in the overall architecture of the ARK Platform to support data analysis through the Cube methodology—an established methodology for analyzing socio-technical systems and for managing associated risks [1, 11]. The ARK Platform Vocabulary allows for the modeling of platform users, access controls, user permissions, and data classifications.

Fig. 1 The ARK Platform data model

Through the ARK-Virus Project, a set of security requirements for the ARK Platform was defined (see Table 1). These requirements included data interlinking, data accessibility (privacy-aware evidence distillation), and secure evidence publication (as linked open data) as priority security aspects. The ARK Platform implements the TIKD to meet these requirements and to provide secure management of personal data, pseudonymized data (for compliance with the GDPR, explained later in this chapter), and security logs (for history records).

Table 1 ARK-Virus Project requirements, description, and solution proposed with TIKD

3 Related Work

This section compares available data sharing approaches with the ARK-Virus requirements (see Table 1) in order to establish their suitability. The approaches analyzed can be divided into two main techniques: dataspace-based and blockchain-based, where blockchain is an Internet database technology characterized by decentralization, transparency, and data integrity [14].

Dataspace approaches to data sharing services are primarily associated with the Internet of Things (IoT) [2, 15, 17, 18], where data integration from heterogeneous devices and access control are the main objective. On the other hand, blockchain approaches [4, 10, 21] integrate cryptography techniques as part of the data management system in order to share data between agents (users or institutions). Table 2 provides a comparison of the state of the art and TIKD in relation to the requirements of the ARK-Virus Project.

Table 2 Comparison of data sharing in dataspace and trusted data sharing approaches

Data sharing approaches based on blockchain methods [3, 4, 10, 21] use a unified scheme. In most cases, records must be plain text, which prevents the integration of data in different formats, and usage policies, which restrict the kind of action that an agent can perform over data, are not defined. Although the main concern of these approaches is to keep a record’s information secure, they do not propose any tracking of agent information for activity records. TIKD implements an authorization access control based on security policies that consider context information, security roles, and data classification (explained in the next section) in order to share data between users in the same or different organizations.

Typical state-of-the-art dataspaces implement security features such as access control authentication methods [13, 17], defined access roles [2, 15], user attributes [18], and usage control [13, 15] in order to provide data sharing services. In addition to security aspects, dataspace approaches with sharing services cope with data fusion [17], usage control between multiple organizations [13], real-time data sharing [2], and privacy protection [18]. However, these approaches do not provide mechanisms for personal data handling in compliance with GDPR, privacy-aware log records, or privacy-protected interlinking with external resources. TIKD is based on a set of Linked Data vocabularies that support these aspects, e.g., the Data Privacy Vocabulary (DPV) to cope with personal data handling, the Data Catalog VocabularyFootnote 7 (DCAT) to cope with interlinking external resources, and PROVFootnote 8 to cope with user logs.

4 Description of the TIKD

The Trusted Integrated Knowledge Dataspace (TIKD) was designed in accordance with the ARK-Virus Project security requirements (see Sect. 2). The TIKD services (Fig. 2) define data permissions (Knowledge Graph integration, subgraph sharing, and data interlinking), user access grants (security control), and external resource integration (data interlinking) to provide a trusted environment for collaborative working.

Fig. 2 The Trusted Integrated Knowledge Dataspace services

TIKD is a multi-user and multi-organization dataspace with the capability of securely sharing information between an organization’s users. The security control module ensures that only authorized users, from the same organization, can access KGs, shared information, and interlinked data. This module follows a context-based approach considering security roles and data classifications (explained later in this section), i.e., access to the organization’s data is determined by the user’s context and the target data classification. The next subsections explain each of these services.

4.1 Knowledge Graph Integration

The Knowledge Graph integration service (Fig. 2, Knowledge Graph integration) is a central component of the TIKD. This service defines a dataspace where i) multiple users can work on a KG within an organization, ii) multiple organizations can create KGs, iii) linking to datasets by means of DCAT, instead of graphs/data, is supported, iv) fine-grained record linkage via DCAT records is supported, and v) evidence and KG integration/linking are supported.

4.2 Security Control

The security control service (Fig. 2, security control) is the main service of the TIKD. This service makes use of Linked Data vocabularies to handle personal data, access control context specification, and privacy protecting user logs. The following subsections explain in detail each one of these services.

4.2.1 Personal Data Handling

Personal data is described through the DPV, proposed by the W3C’s Data Privacy Vocabularies and Controls Community Group [16] (DPVCG). DPV defines a set of classes and properties to describe and represent information about personal data handling for the purpose of GDPR compliance.

The ARK Platform collects users’ personal data through a registration process which enables access to the ARK Platform. The registration process requires a username, email address, organization role, platform role, and a password. The TIKD security control service, in turn, authenticates an ARK user through their username, or email address, and their password. To represent these kinds of personal data, the following DPV classes (Fig. 3) were used:

  • Personal data category (dpv:PersonalDataCategory): identifies a category of personal data. The classes dpv:Password, dpv:Username, and dpv:EmailAddress are used to represent the personal data handled by TIKD.

    Fig. 3 DPV classes used to describe personal data annotations for the TIKD

  • Data subject (dpv:DataSubject): identifies the individual (the ARK user) whose personal data is being processed.

  • Data controller (dpv:DataController): defines the individual or organization that decides the purpose of processing personal data. The data controller is represented by the ARK Platform.

  • Purpose (dpv:Purpose): defines the purpose of processing personal data. The security class (dpv:Security) is used to define the purpose.

  • Processing (dpv:Processing): describes the processing performed on personal data. In this sense, the ARK Platform performs the action of storing (dpv:Store) the ARK user’s personal data and TIKD performs the action of pseudonymizingFootnote 9 (dpv:PseudoAnonymise) the data to perform log actions.

4.2.2 Data Classification

The ARK Platform uses different data classification levels to define the visibility, accessibility, and consequences of unauthorized access to an access control entityFootnote 10 (ACE). An ACE defines a KG representing an ARK Project or an ARK Risk Register.Footnote 11 Table 3 describes each data classification access level. Considering the data classification levels, a public ACE can be accessed by the general public and mishandling of the data would not impact the organization. Conversely, the impact of unauthorized access or mishandling of a restricted ACE would seriously impact the organization, staff, and related partners. The integration of data classification to the TIKD provides certainty about who can access which data based on the constraints of the data itself.

Table 3 Data classification access level alongside availability release and unauthorized access impact

An ACE can be associated with one or more data entities. A data entityFootnote 12 represents an individual unit (data) or aggregate of related data (group of data), each of which can have its own data classification. The data classification of data entities follows a hierarchical structure whereby the ACE represents the root node and the data entities represent a child or sub-node. In line with this hierarchy, sub-nodes cannot have a less restrictive access level than the root/parent node, i.e., if the ACE data classification is defined as internal, then its data entities cannot be classified as public.
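The hierarchy rule above can be illustrated with a minimal Python sketch. The four level names come from this chapter; their relative ordering (public < internal < confidential < restricted) and the names `Classification` and `is_valid_child` are assumptions made for illustration, not part of the ARK Platform.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Data classification levels, assumed ordered from least to most restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def is_valid_child(parent: Classification, child: Classification) -> bool:
    """Return True if a data entity's level is at least as restrictive as its parent ACE's."""
    return child >= parent


# An internal ACE may hold confidential entities, but not public ones.
ok = is_valid_child(Classification.INTERNAL, Classification.CONFIDENTIAL)
rejected = is_valid_child(Classification.INTERNAL, Classification.PUBLIC)
```

Encoding the levels as ordered integers keeps the check to a single comparison; a real deployment would read the levels from the ARK Platform Vocabulary instead of hard-coding them.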

4.2.3 Access Control

The access controls (AC) were designed to meet the privacy-aware evidence distillation requirement (Table 1) of providing access to users with the appropriate level of clearance. The AC follows a context-based approach, alongside data classifications, to allow or deny access to an ACE.

Considering the security role, the AC mediates every request to the ARK Platform, determining whether the request should be approved or denied. TIKD defines a context-based access control (CBAC) model, based on context and role specification, where data owners can authorize and control data access. In a CBAC model, policies associate one or more subjects with sets of access rights, pertaining to users, resources, and the environment, in order to grant or deny access to resources. In this sense, the set of policies consider the current user’s context information to approve or deny access to ACEs. The AC takes into account the following authorization access elements (Fig. 4):

Fig. 4 Access authorization elements

  • ARK user: an ARK user is associated with an organization role, a platform status, and a security role. The security role is assigned after creating an ACE or relating the ARK user to one.

  • ARK Platform status: defines the user’s status in the ARK Platform, e.g., active, pending, update pending, and update approved.

  • Organization role: each organization has the facility to define their own organization and security role hierarchy independently. The ARK Platform contains some predefined security roles (admin, owner, collaborator, and read-only) and platform roles (frontline staff, clinical specialist, and safety manager, among others). However, these roles can be extended according to the organization’s requirements.

  • Security role: an ARK user is associated with an ACE through their security role. In this sense, an ARK user could take one of the following predefined security roles: admin, owner, collaborator, or read-only, where owner and admin are the highest level roles.

  • Data classification: defines the data visibility of ACEs and data entities considering the rules from Table 3.

  • Data entity (evidence): refers to interlinked data. A user can interlink data from external sources to enrich an ACE. In the ARK Platform context, this interlinked data is considered “evidence.” The evidence is under the owning organization’s jurisdiction, i.e., only users from the same organization have access. Additionally, evidence can take any of the data classification access levels, i.e., a piece of evidence can be defined as public, internal, confidential, or restricted.

The TIKD AC (Fig. 5) is based on the Social Semantic SPARQL Security for Access Control Ontology (S4AC). The S4AC is a fine-grained access control over Resource Description Framework (RDF) data. The access control model provides the users with means to define policies to restrict the access to specific RDF data at named graphs or triple level. It reuses concepts from SIOC,Footnote 13 SKOS,Footnote 14 WAC,Footnote 15 SPIN,Footnote 16 and the Dublin Core.Footnote 17

Fig. 5 S4AC ontology. The dashed rectangle defines the integrated ARK Platform context information

The main element of the S4AC model is the access policy (Fig. 5). An access policy defines the constraints that must be satisfied to access a given named graph or a specific triple. If the access policy is satisfied, the user is allowed to access the data, but if not, access is denied. TIKD access policies consider ARK user context (the ARK Platform status, the security role, organization role) and the data classification of the target resource (an ACE or a data entity).

The TIKD AC integrates the arkp:AccessControlContext class to the S4AC to define the ARK Platform context information. The ARK user’s context information is represented as a hash string to validate the relationship between the ARK user and the target ACE (Fig. 6a). The ARK user context corresponds to the attributes which define the current state of the user in relationship with the ARK Platform (their status), the ACE (their security role), and the organization (their organization’s role). These attributes are the input for the hash function to generate a corresponding hash string, which will be associated with the user and the ACE (Fig. 6b), through the property arkp:hasContextValidation in the corresponding class.

Fig. 6 Hash string generation process
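As a rough illustration of the hash string generation, the following Python sketch derives a context hash from the three context attributes named above. The chapter does not state which hash function or input encoding is used for the context string, so SHA3-256, the `|` separator, the field order, and the function name `context_hash` are all assumptions.

```python
import hashlib


def context_hash(platform_status: str, security_role: str, organization_role: str) -> str:
    """Hash the user's context attributes into a single hex string.

    Assumption: the fields are joined with "|" in a fixed order and hashed
    with SHA3-256; the actual TIKD encoding may differ.
    """
    payload = "|".join([platform_status, security_role, organization_role])
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()


# Any change in the user's context yields a different hash string.
before = context_hash("active", "collaborator", "safety manager")
after = context_hash("active", "read-only", "safety manager")
```

Because the hash changes whenever any attribute changes, a stale value stored against the ACE no longer matches the user's current context, which is what allows a simple equality comparison to validate the relationship.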

4.2.4 Policy Specification

The TIKD AC defines two kinds of policies: global and local. The global policy, or context policy, compares the ARK user’s context hash string against the hash string from the target ACE (ARK Project or ARK Risk Register). If both are the same, access to the ACE is granted; otherwise, it is denied. The local policy considers the data classification of ACEs and data entities to grant or deny access to an ARK user. Table 4 describes the data classification and the security role required to access the data. Local policies check whether an ARK user’s security role has the correct permissions to access the requested data.

Table 4 Data classification

A TIKD AC policy is defined by the tuple P = ⟨ACS, AP, R, AEC⟩, where ACS stands for the set of access conditions, AP for the access privilege (create, delete, read, update), R for the resource to be protected, and AEC for the access evaluation context. An access condition is defined through a SPARQL ASK query, representing a condition to evaluate a policy or policies. The AEC is represented by the hash string value produced from the ARK user context.
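The policy tuple can be pictured as a small data structure. The sketch below is illustrative only: the class and function names, the `http://example.org/` IRIs, and the shape of the ASK query are hypothetical, and only the property name `arkp:hasContextValidation` is taken from this chapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Privilege(Enum):
    """AP: the access privilege covered by a policy."""
    CREATE = "create"
    DELETE = "delete"
    READ = "read"
    UPDATE = "update"


@dataclass(frozen=True)
class AccessPolicy:
    access_conditions: Tuple[str, ...]  # ACS: SPARQL ASK queries
    privilege: Privilege                # AP
    resource: str                       # R: IRI of the protected ACE or data entity
    evaluation_context: str             # AEC: hash string of the user context


def context_ask_query(ace_iri: str, ctx_hash: str) -> str:
    """Build a hypothetical ASK query checking the ACE's stored context hash."""
    return (
        "PREFIX arkp: <http://example.org/arkp#>\n"  # placeholder namespace
        f'ASK {{ <{ace_iri}> arkp:hasContextValidation "{ctx_hash}" }}'
    )


ace = "http://example.org/ace/project-1"  # hypothetical resource IRI
policy = AccessPolicy(
    access_conditions=(context_ask_query(ace, "0a1b2c"),),
    privilege=Privilege.READ,
    resource=ace,
    evaluation_context="0a1b2c",
)
```

The query string is only constructed here, not executed; running it would require a SPARQL endpoint or an RDF library, and production code should validate the inputs before interpolating them into a query.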

The policy specification process selects the corresponding global and local policies. After an ARK user sends a request to access an ACE (Fig. 7a), the global policy is selected (Fig. 7b, c). The local policies take into account the data classification configuration of the ACE and its data entities (Fig. 7d), which defines data access authorization; according to this configuration, the corresponding ASK queries are selected.

Fig. 7 Policy enforcement and decision process

4.2.5 Policy Enforcement

The policy enforcement process executes the corresponding ASK queries and returns the decision to grant or deny access to the ACE (Fig. 7e). The global policies are executed first, and if the ASK query returns a true value, then the local policies are executed. In the ARK Platform, the user context can change at any moment owing to several factors, e.g., an update to the organization role, a change of organization, an update to the security role, or an update to the platform status. The global policy validates the ARK user context with the target ACE. A correct validation means that the user is granted access to the ACE. The local policy, in turn, defines fine-grained access to the data entities that the user is allowed to access.
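The two-stage evaluation order described here (global policy first, local policies only on success) can be sketched in Python as follows. This is a simplified stand-in that replaces the SPARQL ASK queries with plain comparisons. The role-to-classification mapping in `REQUIRED_ROLES` is a placeholder, since the actual mapping is given in Table 4; the function names are likewise hypothetical.

```python
import hmac
from typing import Dict, List

# Placeholder mapping (assumption): which security roles may read each level.
# The authoritative mapping is Table 4 of the chapter.
REQUIRED_ROLES = {
    "public": {"admin", "owner", "collaborator", "read-only"},
    "internal": {"admin", "owner", "collaborator", "read-only"},
    "confidential": {"admin", "owner", "collaborator"},
    "restricted": {"admin", "owner"},
}


def global_policy(user_hash: str, ace_hash: str) -> bool:
    """Global (context) policy: the user's context hash must equal the ACE's."""
    return hmac.compare_digest(user_hash, ace_hash)


def authorize(user_hash: str, ace_hash: str, security_role: str,
              entity_levels: Dict[str, str]) -> List[str]:
    """Return the ids of data entities the user may access; empty if denied."""
    if not global_policy(user_hash, ace_hash):
        return []  # global policy failed, so local policies are never evaluated
    return [
        entity_id
        for entity_id, level in entity_levels.items()
        if security_role in REQUIRED_ROLES[level]
    ]


entities = {"e1": "public", "e2": "restricted"}
visible_to_reader = authorize("abc123", "abc123", "read-only", entities)
```

Short-circuiting on the global check means a user whose context has changed since the hash was stored gets no access at all, regardless of role.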

4.2.6 Privacy Protecting User Logs

Finally, the privacy protecting user logs record the actions performed by users during their sessions on the ARK Platform for historical record purposes. User information is pseudonymized in the log data, using the SHA-3 algorithm, by combining the username, email, and registration date parameters.

The user logs record user activities on the platform and the results retrieved by the system (failure, success, warning, etc.) during a session, e.g., if the user tries to modify the KG but their role is read-only, the privacy protecting user log process will record this activity as well as the failure response from the system. The PROV ontologyFootnote 18 is used to implement the privacy protecting user logs following an agent-centered perspective, i.e., focusing on the people or organizations involved in the data generation or manipulation process.
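A minimal Python sketch of the pseudonymization step is shown below, using the three inputs named above (username, email, and registration date). The chapter specifies SHA-3 but not the digest size or how the inputs are combined, so the SHA3-256 variant, the newline separator, and the dictionary layout of the log entry are assumptions; the actual logs are expressed with the PROV ontology, not a Python dict.

```python
import hashlib
from datetime import date


def pseudonymize_user(username: str, email: str, registered: date) -> str:
    """Derive a stable pseudonym from username, email, and registration date.

    Assumption: fields are joined with newlines and hashed with SHA3-256.
    """
    payload = "\n".join([username, email, registered.isoformat()])
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()


def log_entry(pseudonym: str, activity: str, outcome: str) -> dict:
    """Build a simplified log record keyed by pseudonym, not by identity."""
    return {"agent": pseudonym, "activity": activity, "outcome": outcome}


pseudonym = pseudonymize_user("alice", "alice@example.org", date(2021, 3, 1))
entry = log_entry(pseudonym, "update KG", "failure")  # e.g. a read-only user
```

The same inputs always produce the same pseudonym, so a user's actions can be correlated across sessions without the log itself storing the username or email address.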

4.3 Data Interlinking

TIKD supports the integration of KGs and also provides special support for the integration of potentially sensitive external resources (a data interlinking requirement of the ARK-Virus Project), by means of an interlinking service (Fig. 2, data interlinking).

The data interlinking service allows users to add data from an external source as evidence to a risk management project. Evidence is used as supporting data for the KG, providing findings or adding valuable information to enrich the content of the KG. The multi-user and multi-organizational nature of the ARK Platform requires an access restriction to evidence. In this sense, the access control service restricts access to evidence only to users from the same organization.

The TIKD data interlinking process was implemented through CKAN, a data management system which enables organizations and individuals to create and publish datasets, and associated metadata, through a web interface. CKAN is an open-source community project, thus providing a rich number of extensions/plugins.

The data interlinking process (Fig. 8) consists of three main steps: (i) dataset creation, (ii) API communication, and (iii) evidence integration. In step one, a user creates a dataset, containing evidence resources, using CKAN (Fig. 8a). In step two, the API communication (Fig. 8b) handles the evidence requests, i.e., the ARK Platform requests evidence metadata via the CKAN API, which returns the requested information as a DCAT record. In step three (Fig. 8c), users request access to evidence metadata through the ARK Platform, which validates the user’s grants based on the access control, in order to interlink the evidence to the project KG.

Fig. 8 Data interlinking process
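Step two can be approximated with the following Python sketch. Note the substitution: the chapter states that metadata is returned as a DCAT record, whereas this sketch uses CKAN's generic Action API (`package_show`), which returns CKAN's own JSON structure, as a stand-in. The host name, dataset id, and helper names are hypothetical, and no network request is made.

```python
from typing import List
from urllib.parse import urlencode, urljoin


def package_show_url(ckan_base: str, dataset_id: str) -> str:
    """Build the CKAN Action API URL that returns a dataset's metadata."""
    endpoint = urljoin(ckan_base.rstrip("/") + "/", "api/3/action/package_show")
    return endpoint + "?" + urlencode({"id": dataset_id})


def evidence_links(response: dict) -> List[str]:
    """Extract resource URLs from a decoded package_show JSON response."""
    if not response.get("success"):
        return []
    resources = response.get("result", {}).get("resources", [])
    return [r["url"] for r in resources if r.get("url")]


url = package_show_url("https://ckan.example.org", "ppe-audit")  # hypothetical host/id

# A trimmed response in the shape CKAN returns for package_show.
sample = {
    "success": True,
    "result": {"name": "ppe-audit", "resources": [{"url": "https://ckan.example.org/r/1.csv"}]},
}
links = evidence_links(sample)
```

In the workflow described above, the ARK Platform would still apply its own access control to these links before interlinking them to the project KG.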

Datasets created using CKAN can be classified as public or private—public datasets are visible to everyone and private datasets are visible only to users of the owning organization. Private datasets align with the internal classification of the ARK data classification model.

As the ARK-Virus requirements define the visibility of data through a more complex structure than CKAN, the default data classification of CKAN will be altered to align with the ARK data classifications. This will be achieved through CKAN extensions that allow for dataset access to be more restricted than the current private/internal visibility level.

4.4 Data Sharing

TIKD provides the functionality to share data between users from the same organization, considering the ARK-Virus security requirements. Data sharing is performed by means of the data interlinking service and data classifications.

The sharing mechanism allows users from the same organization to share evidence through CKAN. The data classification of the shared evidence remains under the control of the owner or the admin user, i.e., the data classification of shared evidence is not transferable between projects.

The data interlinking service and the sharing mechanism allow organizations to reuse data between projects. Evidence data is shared under a secured scenario where the access control and the data classification determine the visibility of the evidence.

4.5 Subgraph Sharing

The ARK-Virus Project defines a collaborative environment where users can share data from ACEs using a privacy-aware sharing mechanism whereby confidential or sensitive data cannot be shared outside an organization. This sharing functionality helps to reuse information to enrich related ACEs. In this sense, the subgraph sharing service (Fig. 2, subgraph sharing) helps to extend or complement information from one ACE to another.

The subgraph sharing process (Fig. 9) considers the access control policies, from the security control service, to determine which data is accessible to an organization’s users and which is not. For example, ACE data defined as public (P-labeled nodes) can be reused by any member of the same organization, whereas restricted data (R-labeled nodes) cannot be shared with other members of the organization: it is available only to the owner of the data, the organization admin, and other explicitly specified users. The accessibility is defined by the data classification (Table 4) of the ACE and its data entities. If the user’s request is allowed, the corresponding subgraph is returned.

Fig. 9 Subgraph sharing process. P-labeled nodes represent public data, while R-labeled nodes represent restricted data
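The filtering idea behind subgraph sharing can be sketched in Python as below. The graph is modeled as plain dictionaries and tuples for brevity, not as RDF; the function name `shareable_subgraph` and the node identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def shareable_subgraph(
    nodes: Dict[str, str],
    edges: List[Tuple[str, str]],
    allowed_levels: Set[str],
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Keep only nodes whose classification is allowed, plus edges between them."""
    visible = {node for node, level in nodes.items() if level in allowed_levels}
    kept_edges = [(s, t) for s, t in edges if s in visible and t in visible]
    return visible, kept_edges


# P-labeled (public) and R-labeled (restricted) nodes, as in Fig. 9.
nodes = {"a": "public", "b": "restricted", "c": "public"}
edges = [("a", "b"), ("a", "c")]
visible, kept = shareable_subgraph(nodes, edges, {"public"})
```

Dropping every edge that touches a hidden node prevents the returned subgraph from revealing that a restricted node exists.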

The sharing methods defined by TIKD enable collaboration between members from the same organization. The subgraph sharing enables the reuse of data between ACEs. These sharing functionalities are handled by the access control policies which determine whether the requester (user) is able to access evidence or subgraph information.

5 Security and Privacy Evaluations of the ARK Platform

This section presents a security evaluation of the ARK Platform considering the requirements of the ISO 27001 (ISO/IEC 27001) standard and the privacy control requirements of the ISO 27701 (ISO/IEC 27701). The ISO 27001Footnote 19 is a specification for information security management systems (ISMS) to increase the reliability and security of systems and information by means of a set of requirements.

The second standard considered for the evaluation of TIKD is the ISO 27701.Footnote 20 The ISO 27701 is the international standard for personally identifiable information (PII). This standard defines a privacy information management system (PIMS) based on the structure of the ISO 27001. The standard integrates the general requirements of GDPR, the Information Security Management System (ISMS) of ISO 27001, and the ISO 27002 which defines the best security practices.

The requirements of the ISO 27701 include the 114 security controls of Annex A of ISO/IEC 27001 and the guidance of ISO/IEC 27002 on how to implement these security controls. The ISO 27701 also defines specific security controls that are directly related to PII, which are grouped into two categories: PII controllers (Annex A) and PII processors (Annex B).

5.1 Security Evaluation

The security evaluation of the ARK PlatformFootnote 21 was conducted using the ISO 27001 GAT. The ISO 27001 GAT can be used to identify gaps in ISO 27001 compliance.

The ISO 27001 GAT consists of 41 questions divided into 7 clauses. Each clause is divided into sub-clauses, containing one or more requirements (questions). For example, the “Leadership” clause is divided into three sub-clauses: the first sub-clause is leadership and commitment which contains three requirements. The first requirement is: “are the general ISMS objectives compatible with the strategic direction?”; a positive answer means that the ISMS supports the achievement of the business objectives. (Figure 10 illustrates this example.)

Fig. 10 Excerpt of the ISO 27001 GAT

The ISO 27001 GAT was conducted on the ARK Platform both before and after implementing TIKD. Before implementing TIKD, the ARK Platform only used access control, based on an authentication process, to provide access to the platform. The results of both evaluations can be seen in Table 5, where #Req. defines the number of requirements for each sub-clause, Impl. defines the number of implemented requirements, and %Impl. defines the percentage of implemented requirements.

Table 5 ARK Platform security evaluation, before and after implementing the TIKD, based on the ISO 27001 GAT

It can be seen that compliance with the ISO 27001 standard increased, from 54% to 85%, after implementing the TIKD on the ARK Platform. There was a notable increase in the “Operation” and “Performance evaluation” clauses after the TIKD was employed. However, there are still some requirements that are yet to be addressed in order to achieve an increased level of compliance with the ISO 27001 standard. Table 6 outlines these unaddressed requirements as well as the action needed to implement them.

Table 6 Unaddressed clauses and the action needed to comply with the ISO 27001 requirement

5.2 Privacy Information Evaluation

The privacy information evaluation of the ARK Platform was conducted considering the clauses defined in Annexes A and B of ISO/IEC 27701:2019, which are concerned with personal data handling. Annex A, PIMS-specific reference control objectives and controls, defines the control requirements for PII controllers. Annex B, PIMS-specific reference control objectives and controls, defines the control requirements for PII processors.

The ISO 27701 evaluation followed the same configuration as the ISO 27001 GAT evaluation (conducted before and after TIKD). For this evaluation, before the implementation of TIKD, the ARK Platform had documented personal data handling; however, some elements were not fully implemented. After implementing TIKD on the ARK Platform, all personal data handling elements were included. Table 7 shows the evaluation results, where the first and second columns represent the Annex and the target clause. The third column gives the number of control requirements for the corresponding clause. The “before TIKD” group of columns gives the number and percentage of implemented control requirements for the corresponding Annex clause. The same applies to the “after TIKD” group of columns.
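As a rough illustration of how such a before/after table can be read, the following Python sketch computes the two percentages and their difference in percentage points for each clause. The clause labels and counts are placeholders chosen for the example (they are not the values of Table 7), and `ClauseResult` and `compare` are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Tuple


class ClauseResult(NamedTuple):
    clause: str   # Annex and clause label
    n_req: int    # number of control requirements in the clause
    before: int   # implemented control requirements before TIKD
    after: int    # implemented control requirements after TIKD


def compare(results: List[ClauseResult]) -> List[Tuple[str, float, float, float]]:
    """Return (clause, % before, % after, change in percentage points)."""
    table = []
    for r in results:
        if r.n_req <= 0:
            raise ValueError(f"{r.clause}: clause needs at least one requirement")
        if not (0 <= r.before <= r.n_req and 0 <= r.after <= r.n_req):
            raise ValueError(f"{r.clause}: implemented count out of range")
        pct_before = 100.0 * r.before / r.n_req
        pct_after = 100.0 * r.after / r.n_req
        table.append((r.clause, pct_before, pct_after, pct_after - pct_before))
    return table


# Placeholder rows for illustration only; NOT the ARK Platform results.
placeholder = [
    ClauseResult("Annex A, clause X", n_req=4, before=2, after=3),
    ClauseResult("Annex B, clause Y", n_req=5, before=0, after=5),
]

comparison = compare(placeholder)
for clause, pct_before, pct_after, delta in comparison:
    print(f"{clause}: {pct_before:.0f}% -> {pct_after:.0f}% ({delta:+.0f} points)")
```

Reporting the change in percentage points next to the raw counts avoids overstating improvements in clauses that contain only a few control requirements.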

Table 7 ARK Platform privacy information evaluation, before and after implementing the TIKD, based on the ISO 27701 Annex A and B

According to the evaluation results, the Annex A results (A 7.2–7.5) show a compliance improvement after implementing TIKD, mainly in A 7.3 and A 7.5. In the case of A 7.3, obligations to PII principals, the ARK Platform before TIKD was less compliant than the ARK Platform after TIKD implementation, as some control requirements related to implementation aspects were only covered by the latter. In A 7.5, PII sharing, transfer, and disclosure, the ARK Platform before TIKD complied with the documented control requirements, whereas the ARK Platform after TIKD complied with both the documented and implementation control requirements. In this clause, neither version complied with the control requirement “Countries and international organizations to which PII can be transferred are identified and documented,” as sharing information with international organizations is beyond the scope of the ARK Platform.

Similar to Annex A, the Annex B results (B 8.2–8.5) show a compliance improvement after implementing TIKD. In B 8.5, PII sharing, transfer, and disclosure control requirements, the low percentage for the ARK Platform after TIKD is due to the fact that the ARK-Virus Project does not define subcontractors for processing personal data. Additionally, the control requirements of B 8.5 are related to countries and international organizations, which are beyond the scope of the ARK-Virus Project. In B 8.4, privacy by design and privacy by default, the ARK Platform after TIKD satisfies the control requirements; however, the before TIKD version did not comply with any of the control requirements, as they are all related to implementation aspects which were not covered by this version.

6 Conclusions

In this chapter, the Trusted Integrated Knowledge Dataspace (TIKD) was presented as an approach to securely share data in collaborative environments by considering personal data handling, data privileges, access control context specification, and privacy-aware data interlinking.

TIKD was implemented in the ARK Platform, considering the security requirements of the ARK-Virus Project, to explore the extent to which an integrated sharing dataspace, based on Linked Data technologies, personal data handling, data privileges, and interlinking data, contributes to building a trusted sharing dataspace in a collaborative environment. In comparison with state-of-the-art works, TIKD integrates solutions for security aspects in compliance with the ISO 27001 information security standard and GDPR-compliant personal data handling in compliance with the ISO 27701 privacy information standard as part of the data security infrastructure.

The TIKD evaluation considers the requirements of the security standard ISO 27001 and the control requirements of the privacy information standard ISO 27701. The security evaluation of the ARK Platform was conducted using the ISO 27001 Gap Analysis Tool (GAT). The evaluation compared two versions of the ARK Platform, a version before TIKD implementation and a version after TIKD implementation. According to the results, the implementation of the TIKD achieved an 85% ISO 27001 compliance score, improving the security aspects of the ARK Platform as compared to the version before TIKD implementation (54% ISO 27001 compliance score).

The privacy information evaluation was conducted considering the control requirements defined by the ISO/IEC 27701:2019 standard and following the same configuration as the security evaluation. According to the results, the ARK Platform after implementing TIKD achieved a 91% ISO 27701 compliance score, improving the privacy information aspects defined by the standard when compared to the version before TIKD implementation (64% ISO 27701 compliance score).

Future work will focus on addressing the remaining ISO 27001 standard requirements. Additionally, the TIKD will be evaluated by the project stakeholders and their feedback will be used to distill further requirements.