FAIR Ontologies for Transparent and Accountable AI: A Hospital Adverse Incidents Vocabulary Case Study

In this paper, the relation between FAIR (Findable, Accessible, Interoperable, Reusable) ontologies and the accountability and transparency of ontology-based AI systems is analysed. We also identified governance-related gaps in ontology quality evaluation metrics by examining their relation with the FAIR principles and with FAccT (Fairness, Accountability, Transparency) governance aspects. A simple SKOS vocabulary titled "Hospital Adverse Incidents Classification Scheme" (HAICS) is used as the use case for this study. Theoretically, we found a direct relation between the FAIR principles and FAccT AI: FAIR ontologies enhance transparency and accountability in ontology-based AI systems. We suggest that "FAIRness" should be assessed as one of the aspects of ontology quality evaluation.


Introduction
During the past decade, Artificial Intelligence (AI) has been extensively deployed in many real-world applications. However, these applications carry risks, such as a lack of fairness, accountability, and transparency. AI governance mechanisms are used to minimize these risks while retaining the full benefits of AI technology [1]. Data governance can be seen as a prerequisite for AI governance. It controls data quality and compliance with relevant legal and ethical requirements so that decisions made by AI can be trusted [2]. Accordingly, it is crucial to use governance frameworks to facilitate the Fair, Accountable, and Transparent (FAccT) deployment of AI in real-world applications [3]. However, data governance has often been overlooked in efforts to create FAccT AI systems.
In this paper, we analyse the role of a set of widely accepted data governance principles, FAIR (Findable, Accessible, Interoperable, Reusable) [4], in creating FAccT ontology-based AI systems. We investigate the following question: "To what extent do FAIR data governance principles improve the transparency and accountability of ontology-based AI?". To answer it, we analyse the relation between the FAIR principles [4] and the accountability and transparency of ontology-based AI. We also map a set of ontology quality evaluation metrics to the FAIR principles and FAccT to check whether these metrics can point out FAIRness issues in such semantic resources. Through this mapping, we identify governance-related gaps in this set of metrics.
We use a SKOS vocabulary titled "Hospital Adverse Incidents Classification Scheme" (HAICS) as a use case. For governance purposes, we identify the FAIRness limitations of HAICS by comparing it against best practices and recommendations for FAIR semantic artefacts [5], [6]. We also analyse whether these recommendations and best practices support accountability and transparency. To analyse the quality of HAICS and the relation between ontology quality evaluation metrics, FAIR principles, and FAccT, we use one subjective method [7] and a set of quality evaluation metrics for SKOS vocabularies [8].
The contributions of this paper are as follows. Theoretically, we found a direct relation between FAIR ontologies and transparency and accountability in ontology-based AI systems. In other words, FAIR ontologies enhance the transparency and accountability of ontology-based AI systems. We also found that ontology FAIRness cannot be fully assessed using the set of ontology quality evaluation metrics under study. As a result, we suggest considering FAIRness as one of the aspects of ontology/vocabulary quality evaluation. This allows governance-related issues to be assessed before semantic artefacts are used to create ontology-based AI systems.
The remainder of this paper is structured as follows. Section 2 briefly reviews related work. Section 3 overviews the design of the study. Section 4 presents the evaluation and analysis. Finally, Section 5 concludes the paper.

FAIR Principles, Best Practices, and Recommendations
We evaluate HAICS based on the FAIR principles [4]. These principles were initially proposed by a multidisciplinary group from academia, industry, and funding agencies to enhance the usability of scholarly digital resources for both humans and machines. They have since gained wide acceptance [9], [10]. Since their emergence in 2016, several tools have been suggested to evaluate the FAIRness of digital resources. These include metrics [11], questionnaires [11], [12], [13], [14], checklists [15], [16], and semi-automated evaluators [17]. Azevedo and Dumontier [9] concisely highlight the strengths and weaknesses of the existing methods and clarify how they should be interpreted.
In addition to general FAIRness evaluation methods and metrics, some work has focused specifically on the FAIRness of semantic artefacts. "D2.2 FAIR Semantics: First recommendations" [5] is an effort towards a practical solution for making semantic resources FAIR. It includes 17 preliminary recommendations (P-Rec.), each related to one or more of the FAIR principles, and 10 best practice recommendations (BP-Rec.) to improve the global FAIRness of semantic artefacts. Cota [6] presents guidelines and best practices for FAIR ontologies on the Web, drawing on standard practices and pointing to existing tools and frameworks. In [18], the relation between the FAIR principles and semantic web best practices and guidelines, such as [5], [6], [19], [20], is analysed, and alignments and open discussions are highlighted. To increase data interoperability and integration, Cox et al. [21] propose ten rules for converting a legacy vocabulary into a standalone FAIR vocabulary. A legacy vocabulary here means a list of terms available in a print-based glossary or in a table not accessible using web standards.

SKOS/Ontology Quality Evaluation Metrics and Approaches
Ontology evaluation approaches can be divided into eight groups: rule-based, evolution-based, criteria-based, application-based, data-driven, evaluation by humans, gold-standard-based, and task-based [22]. In this research, we focus on criteria-based approaches. These are application-independent and less expensive than gold-standard-based and human-based approaches. There are different criteria-based models for ontology evaluation. Ivanova and Popov [23] classify ontology evaluation approaches, methods, and metrics into three main groups: domain presentation quality, domain model quality and correctness criteria, and usability and usefulness criteria. The ontology evaluation frameworks by Duque-Ramos et al. [24] and Gangemi et al. [18] are well known [25]. They divide ontology evaluation criteria into three dimensions: structural, functional, and usability. There are also various tools for the automatic evaluation of ontologies, such as OntoMetric [26], TOMM [27], Protégé [28], and OntoKeeper [29].
HAICS is a SKOS vocabulary, and such vocabularies usually do not contain object and data properties. As a result, many ontology quality evaluation metrics are not suitable for evaluating it. However, some subjective methods can be used to evaluate both SKOS vocabularies and OWL ontologies. Silva-López et al. [7] suggest a quantitative model of minimalist verification techniques (QMM). It is based on the ontology design principles described by Gruber [30], Köhler [31], and Wiesner and Marquardt [32]. In QMM, one point is assigned for each principle the ontology complies with. This point can be adjusted according to how well the ontology fulfils the criterion.
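The point-based scoring described above can be illustrated with a small aggregation sketch. This is a hypothetical illustration, not the procedure of [7]. The principle names, the adjusted values, and the assumption that every score lies in [-1, +1] with +1 for full compliance are ours. QMM defines its own list of principles and adjustment rules.

```python
def qmm_score(assessment: dict[str, float]) -> tuple[float, float]:
    """Return (total, maximum) for a mapping principle -> adjusted score.

    Assumption (not taken from QMM): full compliance earns +1, partial
    compliance is adjusted downwards, and every score stays within [-1, +1].
    """
    for principle, value in assessment.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{principle!r}: score {value} is outside [-1, +1]")
    total = sum(assessment.values())
    maximum = float(len(assessment))  # +1 per principle at full compliance
    return total, maximum


# Hypothetical assessment; principle names and values are illustrative only.
example = {
    "clarity": 1.0,
    "coherence": 1.0,
    "minimum ontological commitment": 0.5,  # a domain-specific vocabulary
    "reusability": 0.75,
}
total, maximum = qmm_score(example)
print(f"{total} / {maximum}")  # 3.25 / 4.0
```

Reporting the total against the maximum, rather than as a bare sum, keeps scores comparable when different numbers of principles are assessed.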
Some quality evaluation metrics are designed specifically for SKOS vocabularies. Mader et al. [33] identify 15 potential, quantifiable quality issues in SKOS vocabularies. They classify these into three categories: labelling and documentation issues, structural issues, and linked data specific issues. They also formalized these issues and implemented them in an open-source quality assessment tool called qSKOS. Continuing this work, Suominen and Mader [8] define 26 quality issues and update qSKOS accordingly. In this study, QMM [7] and the 26 SKOS vocabulary quality evaluation metrics by Suominen and Mader [8] are used for the analysis.
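To make the idea of a quantifiable quality issue concrete, the sketch below checks simplified versions of two issues named by Suominen and Mader [8]: "Overlapping Labels" and "Cyclic Hierarchical Relations". It is not the qSKOS implementation. The case-insensitive label comparison, the plain (parent, child) representation of skos:narrower, and the example identifiers are all illustrative assumptions.

```python
from collections import defaultdict


def overlapping_labels(pref_labels: dict[str, str]) -> dict[str, list[str]]:
    """Group concepts whose preferred labels match after case-folding.

    pref_labels maps a concept identifier to its preferred label.
    Only labels shared by two or more concepts are returned.
    """
    groups = defaultdict(list)
    for concept, label in pref_labels.items():
        groups[label.strip().casefold()].append(concept)
    return {lab: sorted(cs) for lab, cs in groups.items() if len(cs) > 1}


def has_hierarchy_cycle(narrower: list[tuple[str, str]]) -> bool:
    """Detect a cycle among (parent, child) skos:narrower pairs via depth-first search."""
    children = defaultdict(list)
    for parent, child in narrower:
        children[parent].append(child)
    state = {}  # node -> "visiting" (on the DFS stack) or "done"

    def visit(node: str) -> bool:
        state[node] = "visiting"
        for child in children.get(node, []):
            if state.get(child) == "visiting":
                return True  # back edge: the hierarchy loops back on itself
            if child not in state and visit(child):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(children))


# Hypothetical identifiers and labels, for illustration only.
labels = {"ex:Fall": "Fall", "ex:PatientFall": "fall ", "ex:Medication": "Medication"}
print(overlapping_labels(labels))  # {'fall': ['ex:Fall', 'ex:PatientFall']}
print(has_hierarchy_cycle([("ex:A", "ex:B"), ("ex:B", "ex:A")]))  # True
print(has_hierarchy_cycle([("ex:A", "ex:B"), ("ex:A", "ex:C")]))  # False
```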

Use Case and Requirements
HAICS consists of 213 SKOS concepts and 188 semantic relations that together represent a classification scheme for hospital adverse incidents. It was created using data from our partner hospital in Ireland, the Simple Knowledge Organisation System (SKOS), and the R2RML-F tool [34]. The vocabulary was created in the context of an ontology-based Knowledge Extraction (KE) pipeline. This pipeline is itself part of ongoing research on AI governance for clinical risk management. HAICS provides the vocabulary needed to extract risk-related knowledge from hospital adverse incident reports and convert it into knowledge graphs. We use HAICS as our use case to analyse its FAIRness and, further, the relation between the FAIR principles and FAccT. We also analyse whether ontology quality evaluation metrics can cover FAIR and FAccT evaluation as important aspects of governance.
Using HAICS in the experiments imposes two requirements. First, HAICS is a SKOS vocabulary without object and data properties, so the quality evaluation metrics must be chosen accordingly. Second, a comprehensive set of metrics should be chosen to evaluate the quality of HAICS and to check whether they cover governance aspects. Accordingly, we use QMM [7], the SKOS vocabulary quality evaluation metrics by Suominen and Mader [8], and the qSKOS tool [8] to evaluate HAICS.

Design
There is a strong connection between AI governance and data governance [2]. We therefore analyse the relation between FAIR, a set of widely accepted data governance principles, and FAccT, the most important aspects of AI governance. Using HAICS as our use case, we evaluate its FAIRness. We then mitigate its FAIRness limitations using best practices and recommendations for FAIR semantic artefacts [5], [6], [19], [20]. To stress the importance of assessing and mitigating governance issues in semantic artefacts, we also evaluate the quality of HAICS. For this, we use 26 quality metrics for SKOS vocabularies [8] and the QMM subjective method [7], and we map the QMM metrics to the FAIR principles and FAccT. This mapping shows whether the metrics can point out FAIRness issues in semantic resources. It thus helps identify gaps in quality evaluation metrics from a governance perspective.

FAIR Ontologies and FAccT AI
The goal of data governance is to achieve FAccT AI [2]. FAIR, a set of well-known data governance principles, contributes to this goal by enhancing the transparency of locally produced digital resources [35]. Accordingly, the FAIR principles contribute to the transparency and accountability of ontology-based AI systems. They do so by emphasizing the findability and accessibility of digital objects, such as ontologies and linked data [4].
Best practices and recommendations for FAIR ontologies and semantic resources emphasize "P-Rec. 3: Use a common minimum metadata schema to describe semantic artefacts and their content" [5]. They also emphasize the addition of accountability metadata [6]. This includes "license" (CC-BY recommended), "creator", "contributor", "creation date", "previous version", "namespace URI", "version IRI", "prefix", "title", "description", and a human-readable "label". Accordingly, having FAIR ontologies and vocabularies in place enhances the transparency and accountability of ontology-based AI systems. It does so by reinforcing the use of appropriate metadata and suggesting suitable ways to improve findability, accessibility, and reusability.
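A metadata recommendation of this kind lends itself to a simple completeness check. The sketch below compares a metadata record against the accountability fields listed above from [6]. The plain-English field keys, the record, and all of its values are hypothetical placeholders, not RDF property IRIs and not the actual HAICS metadata.

```python
# Accountability metadata recommended in [6], as listed in the text.
# The field names are plain-English keys, not RDF property IRIs.
RECOMMENDED_METADATA = (
    "license", "creator", "contributor", "creation date",
    "previous version", "namespace URI", "version IRI",
    "prefix", "title", "description", "label",
)


def missing_metadata(metadata: dict[str, str]) -> list[str]:
    """Return recommended fields that are absent or blank, in checklist order."""
    return [field for field in RECOMMENDED_METADATA
            if not metadata.get(field, "").strip()]


# Hypothetical record; every value is an illustrative placeholder.
record = {
    "license": "CC-BY-4.0",
    "creator": "Example Creator",
    "contributor": "Example Contributor",
    "creation date": "2000-01-01",
    "namespace URI": "https://example.org/vocab#",
    "prefix": "ex",
    "title": "Example Classification Scheme",
    "description": "An illustrative SKOS vocabulary.",
    "label": "Example scheme",
    "previous version": "   ",  # blank values count as missing
}

print(missing_metadata(record))  # ['previous version', 'version IRI']
```

Treating blank values as missing reflects the accountability goal: a field that exists but says nothing does not tell a downstream user who is responsible or which version they are using.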

HAICS FAIRness
The FAIRness of HAICS was evaluated by assessing how well its features align with FAIR best practices and recommendations [5], [6], [19], [20]. In general, these recommendations and best practices suggest moving towards unified templates and content patterns to achieve uniformity in semantic representation. They also suggest an agreed-upon set of metadata/annotations to achieve transparency and accountability in semantic definitions and usage. Through our evaluation, we found FAIRness limitations in HAICS, as reported in Table 1. Finding and mitigating these limitations helps enhance the transparency and accountability of the ontology. This works through targeted expansion of its metadata and increased findability, accessibility, and reusability. It also contributes to the transparency and accountability of the ontology-based AI systems that will use HAICS.
According to Table 1, we aim to improve the findability of our ontology by publishing it in a semantic repository and adding annotations to its HTML file. To make it more accessible and reusable, we will make it available in other formats, such as RDF/XML. We will also add "previous version", "version IRI", and "last modification" sections to its metadata. In this way, by being FAIR, the vocabulary can also be linked to by other vocabularies. Although these actions are simple tasks, they contribute to the transparency and accountability of the AI systems that will be built on HAICS.
Table 1 (excerpt recovered from extraction), 5-star ontology rules by Janowicz et al. [19]. Two-star ontology rule: Provide human-readable documentation, such as "last modification" metadata (reusability). Five-star ontology rule: The vocabulary is linked to by other vocabularies.

HAICS Quality
Machines cannot be held accountable, so for humans to feel accountable, there needs to be transparency [36]. Transparency is therefore one of the important factors for accountability. Moreover, HAICS is a classification scheme, so inconsistencies and structural issues can cause errors and bias in classification results. The "Labelling and Documentation Issues" and "Structural Issues" categories affect reusability and interoperability. "Linked Data Specific Issues" metrics affect both FAIRness and FAccT, since they directly affect the findability, accessibility, and reusability of the vocabulary. Considering these relations, in addition to FAIRness, we assessed the quality of HAICS based on 26 quality evaluation metrics for SKOS vocabularies [8] and the QMM methodology [7] (Table 2 and Table 3). Compliance with all SKOS metrics except "Extra Whitespace in Labels", "Disjoint Classes Violation", and "Invalid URIs" was checked using the qSKOS tool [8].
According to Table 2, HAICS failed six of the 26 SKOS evaluation metrics. Two similar concepts appear at different levels of the hospital adverse incident categories and subcategories. This is why "Overlapping Labels" and "Cyclic Hierarchical Relations" were detected in the vocabulary. The HAICS hierarchy was created using unidirectional "narrower" relations. The reverse-direction relations, i.e., "broader", "hasTopConcept", and "topConceptOf", are assumed to be inferable. For this reason, "Unidirectionally Related Concepts", "Unmarked Top Concepts", and "Omitted Top Concepts" were detected in the vocabulary. Two of the 213 definitions in HAICS are similar, so there is slight redundancy in definitions. In addition, some concept names use the underscore character to separate words, while others are written in camel case. This makes the use of operators inconsistent. Finally, the vocabulary is specific to hospital adverse incidents. This makes it less generalisable and causes the "minimum ontological commitment" metric to be less than +1.
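The W3C SKOS Reference declares skos:broader and skos:narrower as inverse properties, and likewise skos:hasTopConcept and skos:topConceptOf. The reverse relations are therefore inferable in principle. However, a check that inspects only asserted triples will still flag them as missing. The sketch below shows one way to materialise them. It represents triples as hypothetical (subject, predicate, object) tuples of prefixed names rather than using an RDF library. The rule "a concept with no broader concept is a top concept" is our assumption, not a requirement of SKOS or of [8].

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
Triple = tuple[str, str, str]


def unidirectional_pairs(triples: set[Triple]) -> list[tuple[str, str]]:
    """Pairs (a, b) with 'a skos:narrower b' asserted but not 'b skos:broader a'."""
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "skos:narrower" and (o, "skos:broader", s) not in triples
    )


def materialise_inverses(triples: set[Triple], scheme: str) -> set[Triple]:
    """Add inverse and top-concept triples implied by skos:narrower.

    Only concepts that occur in a skos:narrower triple are considered; a
    concept with no broader concept is marked as a top concept of `scheme`.
    """
    out = set(triples)
    concepts = set()
    for s, p, o in triples:
        if p == "skos:narrower":
            out.add((o, "skos:broader", s))
            concepts.update((s, o))
    has_broader = {s for s, p, _ in out if p == "skos:broader"}
    for concept in concepts - has_broader:
        out.add((concept, "skos:topConceptOf", scheme))
        out.add((scheme, "skos:hasTopConcept", concept))
    return out


# Hypothetical identifiers, for illustration only.
asserted = {
    ("ex:PatientIncident", "skos:narrower", "ex:Fall"),
    ("ex:PatientIncident", "skos:narrower", "ex:MedicationError"),
}
print(unidirectional_pairs(asserted))
# [('ex:PatientIncident', 'ex:Fall'), ('ex:PatientIncident', 'ex:MedicationError')]
full = materialise_inverses(asserted, "ex:Scheme")
print(unidirectional_pairs(full))  # []
print(("ex:PatientIncident", "skos:topConceptOf", "ex:Scheme") in full)  # True
```

Publishing the materialised triples, rather than relying on consumers to run a reasoner, is one way to address the "Unidirectionally Related Concepts", "Unmarked Top Concepts", and "Omitted Top Concepts" findings at the data level.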
While evaluating HAICS using QMM, we found some issues with its metrics that are worth considering. There is ambiguity and overlap in the concepts and definitions of the QMM metrics. In addition, the metrics are calculated subjectively, so a clear understanding of the factors that determine each one is necessary. For example, efficiency needs to be better defined to show which factors make an ontology efficient. Having a small and simple set of axioms evidently helps efficiency, but other important factors are not specified. Another factor of efficiency could be how quickly, and without errors, a reasoner can process the ontology [37]. But being fast still depends on the size and type of the

Do QMM Ontology Quality Evaluation Metrics Point Out FAIR and FAccT Issues?
Reusability is a common element between the FAIR principles and QMM. Moreover, the "reusability" metric itself partially covers the findability and accessibility aspects of FAIR, since a resource needs to be findable and accessible in order to be reusable. The "adaptability" and "reuse of available resources" metrics imply interoperability. The minimalist, coherence, flexibility, and standardization categories of metrics are closely related to the interoperability and reusability principles, since they assess the clarity and transparency of ontologies.
Some metrics are also related to FAccT, namely "non-subjective definitions", "intelligible definitions", and "documentation" from the minimalist and coherence categories. Objective and clear definitions and good documentation not only allow transparency. They also prevent biased output from ontology-based AI systems by encouraging the correct usage of terms in the ontologies, and they support accountability through enhanced transparency. The flexibility and standardization categories of metrics are mostly relevant to transparency and accountability. An ontology that is extensible, customizable, adaptable, open access, and minimally affected by encoding bias enhances transparency and, as a result, accountability. A low level of redundancy in definitions and terms supports transparency and accountability and facilitates reuse. Finally, efficiency, i.e., simple and minimal axioms and a simple, easily processable ontology structure, facilitates transparency, accountability, interoperability, and reusability.
Although QMM ontology quality evaluation metrics partially overlap with the FAIR principles, they do not fully cover them. Accordingly, we suggest considering FAIRness as one of the important quality aspects of semantic artefacts to support both AI governance and data governance.
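One way to make this gap analysis explicit is to encode the mapping argued in this section as a lookup table. The sketch below is a hypothetical Python encoding. The metric keys and the "related"/"partial" levels are our own shorthand for the relations stated above, not identifiers defined by QMM [7]. Under this encoding, findability and accessibility are the FAIR principles reached only through partial coverage. This is consistent with the observation that QMM does not fully cover FAIR.

```python
# Hypothetical encoding of the QMM-to-governance mapping argued in the text.
# "related": the text ties the metric closely to the aspect.
# "partial": indirect coverage (findability/accessibility via reusability).
_IR = {"Interoperable": "related", "Reusable": "related"}
_TA = {"Transparency": "related", "Accountability": "related"}

QMM_TO_GOVERNANCE = {
    "reusability": {"Reusable": "related", "Findable": "partial", "Accessible": "partial"},
    "adaptability": {"Interoperable": "related"},
    "reuse of available resources": {"Interoperable": "related"},
    "minimalist": dict(_IR),
    "coherence": dict(_IR),
    "flexibility": {**_IR, **_TA},
    "standardization": {**_IR, **_TA},
    "non-subjective definitions": dict(_TA),
    "intelligible definitions": dict(_TA),
    "documentation": dict(_TA),
    "low redundancy": {**_TA, "Reusable": "related"},
    "efficiency": {**_IR, **_TA},
}

FAIR = ("Findable", "Accessible", "Interoperable", "Reusable")


def only_partially_covered(mapping, principles=FAIR):
    """Return the principles that no metric addresses beyond 'partial' coverage."""
    related = {
        aspect
        for aspects in mapping.values()
        for aspect, level in aspects.items()
        if level == "related"
    }
    return [p for p in principles if p not in related]


def metrics_for(mapping, aspect):
    """List the metrics/categories that touch a given aspect, at any level."""
    return sorted(m for m, aspects in mapping.items() if aspect in aspects)


print(only_partially_covered(QMM_TO_GOVERNANCE))  # ['Findable', 'Accessible']
print(metrics_for(QMM_TO_GOVERNANCE, "Findable"))  # ['reusability']
```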

Conclusion
This paper investigated the question: "To what extent do FAIR data governance principles improve the transparency and accountability of ontology-based AI?". The analysis in Section 4.1 has shown theoretically that FAIR ontologies/vocabularies contribute to the transparency and accountability of ontology-based AI systems by reinforcing the use of appropriate metadata. We also found that the FAIRness of ontologies and vocabularies cannot be fully assessed using QMM ontology quality evaluation metrics. Accordingly, we suggest considering FAIRness as one of the aspects of ontology/vocabulary quality evaluation. This allows governance-related issues to be assessed before semantic artefacts are used to create ontology-based AI systems.
This research is limited in the set of ontology quality evaluation metrics analysed, because SKOS vocabularies lack data and object properties. As next steps, we plan to measure the "FAccTness" of an ontology-based AI system with and without FAIR ontologies to empirically demonstrate the relation between FAIR ontologies and FAccT AI. We also plan to expand our case study to include OWL ontologies and to analyse more ontology quality evaluation metrics.