Metadata 2020 Metadata Evaluation Projects

Metadata 2020: a cross-community collaboration that advocates richer, connected, reusable, and open metadata for all research outputs to advance scholarly pursuits for the benefit of society. A group of volunteers working together trying to encourage and facilitate progress towards this challenging goal. Management guru Peter Druker famously said “If you can’t measure it, you can’t improve it”. With that in mind, several Metadata 2020 projects examined approaches to metadata evaluation and connections between evaluation and guidance. Accomplishing this progress across the broad expanse of the Metadata 2020 landscape requires connecting metadata dialects and community recommendations and analysis of multiple metadata corpora. This paper describes one framework for approaching that task and some potential examples.

I focus on this connection using the terminology proposed on the Metadata 2020 Blog NO.MEN.CLA.TURE (Habermann 2017) and described by (Gordon and Habermann 2018).
Scientific communities that recognize the need for metadata typically address that need using one of several approaches: they either use a metadata standard proposed by a related community or organization, or they develop a community standard. In most cases, they also include a standard representation for the metadata. We refer to these representations as metadata dialects. These metadata dialects include concept names, definitions and associated structures. A concept is a general, dialect-independent term for describing a documentation entity, frequently translating into an element or attribute defined in XML or some other representation. Typically, the communities or organizations that develop these standards also develop a set of recommendations for metadata content. We refer to these as dialect recommendations, i.e. recommendations explicitly associated with a dialect and its creators.
The relationship between dialects and recommendations is illustrated in Fig. 1 Metadata (Federal Geographic Data Committee 1998), as examples. LTER uses the EML dialect (D ) and their initial recommendation had five levels (R , R , R , R and R ). All the concepts in the recommendation are included in the dialect. In some cases, the recommended concepts are required by the XML schema used to implement the dialect, illustrated as special case R . There are many similar examples of metadata dialects and recommendations.
When another community, like FGDC, creates a second dialect (D ) with recommendations at two levels (R and R ), there is typically overlap between the dialects (most often for discovery content) and the recommendations, e.g. R and R in Fig. 1. Recommendations can also be created in a conceptual space independent of specific dialects, e.g the Catalog Service for the Web (Wikipedia contributors 2019) developed by the Open Geospatial Consortium (2020). Mappings from recommended concepts to metadata elements in specific dialects must be created in order to apply these dialectindependent recommendations to specific dialects. In these cases, some dialects may not include all of the recommended concepts.

Approaches to metadata evaluation
Given the context of recommendations, dialects and communities, there are several different approaches that can be taken to metadata evaluation: Record Curation: Do records conform to rules?
Individuals or teams (curators) select metadata records from a collection and evaluate those records against a set of rules. Some rules might be amenable to automatic checks (are links current and alive?), and some might require individual inspection (are abstracts understandable or correct?). This approach typically results in reports with suggested improvements for specific records.

Record/Collection Completeness: Are recommended concepts in the collection?
With a given recommendation or set of recommendations, a collection of records is checked for the existence of the elements that correspond to recommended concepts. In order to use this approach, the dialect and the mapping of the recommended concepts to the dialect must be known. This approach typically results in reports of completeness (%) of the collection with respect to each recommended concept.

Record Consistency: Is content consistent across the collection?
Records across a collection are examined to ensure that the content of elements that are expected to be the same are the same, e.g. names and identifiers of organizations are consistent, keywords are included from the correct shared vocabularies, units/other properties are reported consistently across parameters, etc. This approach typically results in reports of the most common content and variants for specific elements.

Dialect Utilization: What content is in the collection?
Dialects typically include content that is optional or not in any recommendations (see Fig.  1). This is where communities might experiment with new ideas and content in order to address new or emergent community needs and use cases. This can be the case when dialects are designed to support multiple use cases, e.g. discovery, access, use and understanding, while recommendations are focused on particular use cases, e.g. discovery. This approach typically results in reports of new use cases and proposed approaches for addressing them in specific dialects.
Many of these approaches measure aspects of the "quality" of the metadata content and are often referred to as measures of metadata quality. In the big picture, understanding and evaluating metadata quality is highly dependent on the user community and is most effectively done using the question: Are the metadata sufficient to allow users to independently use, understand and trust these data. If the answer is yes, the metadata quality is high. Thus, testing overall metadata (or documentation) quality without user involvement is difficult, but community recommendations are designed, developed and evolved to serve as useful proxies for this user involvement.

Facilitating change
One goal of Metadata 2020 is to facilitate understanding and evolutionary change in the entire metadata community, i.e. researchers in many disciplines, data providers, funders, repositories, libraries, service providers, publishers and users. Fortunately we already have enthusiastic partners and champions in many of these communities. Even with this larger team, formulating and implementing a framework for change is difficult. The book "Switch: How to Change Things When Change is Hard " by Heath and Heath (2010) is a great source of ideas related to organizational change. They break the task into several pieces: direct the rider (appeal to the intellect), motivate the elephant (appeal to emotion), and shape the path (work the environment). I wrote up some ideas about applying their framework to metadata evaluation (Habermann 2014) that might be interesting or helpful.
A related approach to facilitating change that might contribute to a useful framework is called "positive deviance" (Positive Deviance Collaborative 2020). This involves, in a small nutshell, finding good examples, i.e. bright spots, and using them to understand how change can happen and guide communities forward.

The projects
Hopefully these ideas will provide a common vocabulary for discussions of metadata evaluation and a framework for identifying tasks and getting them done. Of course, others may bring new ideas to the group that will augment or replace these.
Specifically, Metadata 2020 volunteers might aim initially at 1) identifying and understanding recommendations inside and outside of their communities, 2) connecting and harmonizing concepts across those recommendations and 3) connecting those concepts to important metadata dialects they work with. Once these connections are made, the completeness and consistency of actual metadata content can be evaluated.
In any case, the first task is identifying partners, understanding their potential contributions, and uniting those contributions into real-world work.

Project schematics
Pictures can help some people (i.e. visual learners) understand ideas and frame questions or improvements. I am one of those visual learners and I put together some pictures to describe two of the Metadata 2020 projects that I think are closely related.
Metadata Recommendations and Element Mappings (Project 2, Fig. 2) builds on the concepts described above. This picture shows four dialects (D ), each of which has a related recommendation (R ). As described above, these things often come in pairs. Each of these recommendations includes four concepts represented by single letters. Three of these concepts are shown in the middle of the picture as circles, along with the identifiers of the dialects (and recommendations) that include them. As we can see, concepts A and B occur in three recommendations and concept Z occurs in two. It is important to note that not all concepts are ubiquitous across all dialects (see Can We Agree, Habermann 2018).
Identifying concepts that are shared across recommendations is the first part of Project 2.
As each of these concepts has an implementation in each of the dialects, the mapping of concepts included in the recommendations leads fairly directly to a mapping between elements (the second phase of Project 2).  Once common concepts are identified and their mappings to implementations are known, we can evaluate their completeness in metadata collections for various dialects and in collections. This process is illustrated in Fig. 3. In this case D1-D4 represent collections of metadata records in one of the four dialects. We know how the common concepts are represented in these dialects, so we can simply count the number of occurrences to determine the occurrence rate (#occurrences / #records) for each concept. We can apply other tools described above to quantitatively characterize other aspects of the metadata quality. During this process, good metadata examples can be identified and integrated into guidance to help metadata creators understand the concepts and how to implement them well in various dialects.
Note that some concepts may not occur in all dialects. Identifying these missing concepts provides important information for dialect developers and input on possible future directions.
This evaluation can involve some manual work -i.e. downloading a bunch of metadata and counting things up. This approach may work during the exploratory phase -trying to identify patterns and infer collection behaviors, but automated approaches that can be applied at scale are also helpful. See Metadata Evolution -CrossRef Participation Reports (Habermann 2019) for an example.
Evaluating collections is interesting for several reasons. First, it may help us identify common organizational behaviors and practices, to determine who does what and who else does similar (or different) things. My tendency is to focus on commonalities and things that work. Second, it helps us identify gaps in some collections that provide opportunities to improve while providing metrics that we might use to measure and demonstrate that improvement. Organizations that are trying to fill these gaps need consistent guidance (part two of this project) that includes use cases and good examples hopefully from within a single organization or from another that shares mission and goals. Finally, and most importantly, evaluations help us find those good examples that we can integrate into the guidance along with stories from the authors that describe why they did something well and how it helped them achieve a goal(s) for their users.

Conclusion
Metadata 2020 includes participants active in all phases of the scientific data life-cycle: researchers, metadata experts, librarians, publishers and users from many domains. They use many metadata dialects and support use cases across the life-cycle and domains. Developing common understanding across this broad group required awareness of differences in community vocabularies and connections between those vocabularies. Building on this understanding to evaluate metadata collections in multiple dialects and create guidance for practitioners are exciting future directions. Hopefully the ideas described here will be helpful in explaining what Metadata 2020 was thinking about and to help others find related stories to help us move forward.