Why PoMS as RDF?

According to Wikipedia (page: Linked Data), Tim Berners-Lee coined the term "linked data" in 2006 in a document about the Semantic Web project. Wikipedia nonetheless goes on to cite Bizer, Heath and Berners-Lee's 2009 paper, "Linked Data: The Story So Far", as a source for its opening definition of Linked Data:

"a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."

About PoMS

Why is Linked Data relevant to the PoMS project? First of all, I believe Linked and Open Data principles are particularly relevant to PoMS because PoMS is a prosopography and, generally speaking, I believe that published prosopography is an almost ideal kind of research to express as linked data. There are two senses in which prosopography connects with linked data's central principles. First, because a prosopography aims to establish the identity of its historical persons in a way that crosses multiple historical sources, these identified historical people act, by their very nature, as a kind of interlinking between those sources. Second, a prosopography is, at least potentially, a global object: something used by other researchers throughout the world as a source of identities for historical people. The people-as-entities in a prosopography ideally have a global reach and can thus play a part in the Global Graph that web folk, and those in the Semantic Web and Linked Data communities in particular, talk about. For these reasons, it seems to me that a prosopography forms the basis for a particularly rich and interesting kind of Linked Data publication.

Furthermore, the People of Medieval Scotland (PoMS), like DDH/CCH's many other prosopographical projects, is built on a representation of its materials as highly structured data. Indeed, like DDH/CCH's other structured prosopographies, PoMS is built on top of that quintessential highly structured paradigm, the relational database, and as a result PoMS's historical research work has already been expressed in terms of entities, attributes and relationships as they are understood in the relational data model. Since the Linked Data model is also based on the idea of representing materials as highly structured data that is accessible globally, PoMS's highly structured database would appear to fit it well.

Finally, unlike DPRR (http://romanrepublic.ac.uk/rdf), the other structured data prosopography that DDH/KDL has expressed in the Semantic Web's RDF technology, PoMS is in fact created using the factoid prosopography paradigm (Bradley 2017) as one of its fundamental semantic principles. Indeed, its own ontology connects it explicitly to the Factoid Prosopography Ontology described in Bradley 2017.

Why a PoMS RDF Server?

We have already said that PoMS is built upon the relational database, and called this the "quintessential structured data paradigm". Here, however, we are talking about a linked data or semantic web representation of PoMS's materials, and although both linked data/semantic web technologies and relational database technologies are built upon a shared basic conception of highly structured data, they are not the same. What, then, is necessary to turn PoMS's already existing database-like structured materials into a publication that fits the similar-but-different Linked Data model? In order to think about this most usefully, we need to understand the fundamental principles of Linked Data.

Tim Berners-Lee gave a presentation on linked data at the TED 2010 conference. In it, he restated the linked data principles as three "extremely simple" rules:

"All kinds of conceptual things, they have names now that start with HTTP.
I get important information back. I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP."

More formally, Bizer, Heath and Berners-Lee's 2009 paper, mentioned earlier, specifies four criteria that Berners-Lee had described as a "set of 'rules' for publishing data on the Web" in such a way that all published data becomes part of a single global data space. These four principles are presented succinctly in Wikipedia's "Linked Data" entry:

Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things.

To at least some extent, rules one and two of these four seem to be met already by PoMS's existing browser-oriented web application at https://www.poms.ac.uk/. Take rule 1: the browser-oriented web application provides one publishable URL for each person in PoMS, and it is a RESTful one (for a definition of RESTful URLs, see Wikipedia). An example from PoMS is the URL for Abraham, Bishop of Dunblane (fl.1210×14-1220×25): https://www.poms.ac.uk/record/person/749/. This URL could then be interpreted as rule 1's "URI", acting as a kind of name for its person. Furthermore, if the existing web application is presented directly with this person's URL, it returns an HTML page containing the information PoMS has about that person. You can see this happening by clicking on the Bishop's URL shown above. Thus, as rule 2 requires, anyone with WWW access can use this HTTP URI to look up that PoMS person. The generated PoMS page also provides, as rule 3 says, "useful information" about the entity it refers to, although the material is of course presented as an HTML page in a form suitable for a web browser and is not delivered using the Semantic Web standard of RDF. Finally (rule 4), these generated web pages do in fact contain links to other URIs within PoMS.

So, what is missing from the existing PoMS web application that is needed to make it more fully into a Linked Data application? The key issue can be found in the second half of Wikipedia's definition of linked data (items 3 and 4). As Bizer, Heath and Berners-Lee say, to operate as Linked Data, the material has to be presented "in a way that can be read automatically by computers." They then go on to say that this enables data from different sources to be "appropriately connected and queried." With the current browser-oriented web application at www.poms.ac.uk, the material is presented as an HTML web page suitable for reading by a human user, rather than in a form that explicitly expresses the formal structured data. Of course, one can apply techniques called "screen scraping" to extract the data from the presented web pages, but screen scraping is broadly understood by its practitioners to be awkward and prone to error. Thus, when presented as a set of HTML web pages, PoMS's data cannot readily be processed, as data, by computers, and cannot readily be used as a source to be connected, as data, with other sources. This is why Bizer, Heath and Berners-Lee attach an explicit reference to RDF in their rule 3. RDF is described in its own documentation as a representation of a world-wide "graph-based data model" (section 1.1, https://www.w3.org/TR/rdf11-concepts/). Presenting the PoMS data in RDF, a language specifically designed for interlinking data potentially world-wide, makes that data available for further computer processing and so presents PoMS's research materials as a more satisfactory Linked Data source.
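
As a rough illustration of what "read automatically by computers" can mean in practice, the sketch below describes one person as RDF triples in N-Triples syntax and extracts every outgoing link with a few lines of Python. The person URI is the browser-application URL quoted above, reused here only as a name. The property under http://example.org/vocab# and the linked person under http://example.org/person/ are invented for the example and are not PoMS's actual vocabulary or data. The regular expression handles only the simple triple shapes shown here, not the full N-Triples grammar.

```python
import re

# Hypothetical data: the example.org property and target are invented for
# illustration and are not PoMS's real vocabulary or content.
NTRIPLES = """\
<https://www.poms.ac.uk/record/person/749/> <http://www.w3.org/2000/01/rdf-schema#label> "Abraham, Bishop of Dunblane" .
<https://www.poms.ac.uk/record/person/749/> <http://example.org/vocab#relatedTo> <http://example.org/person/2> .
"""

# Matches "<subject> <predicate> object ." where the object is either a URI
# or a plain literal without escapes. This is a simplification of N-Triples.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.$')


def parse(text):
    """Return (subject, predicate, object, object_is_uri) tuples."""
    triples = []
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            subject, predicate, uri, literal = match.groups()
            is_uri = uri is not None
            triples.append((subject, predicate, uri if is_uri else literal, is_uri))
    return triples


def outgoing_links(triples):
    """Rule 4: objects that are themselves URIs are links that can be followed."""
    return [obj for (_, _, obj, is_uri) in triples if is_uri]


triples = parse(NTRIPLES)
print(outgoing_links(triples))
```

No knowledge of page layout is needed here, which is the contrast with screen scraping: the structure of the data is explicit in the format itself.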

Building the RDF

The work described and referenced in these pages does exactly this: it turns PoMS's relational database, which holds most of the intellectual work embodied in PoMS, into RDF, and then uses pieces of RDF-related technology to deliver that RDF over the internet to anyone who wants to use it. However, the work done here goes further than this. In addition:

First, it provides access to RDF data using the query mechanism specifically designed for selecting and processing RDF data: SPARQL (http://www.w3.org/TR/sparql11-query/). By providing a SPARQL endpoint, we are allowing users to select and order data in PoMS in any way that a SPARQL query allows (subject, of course, to length-of-time limitations for processing), and this, in turn, allows for a much broader range of querying than PoMS's current browser-oriented web application data selection methods support.
Second, the work has resulted in the creation of a Semantic Web ontology for PoMS: a formal definition of the structure of PoMS's data. Digital ontologies are important elements in the whole framework of Semantic Web technologies that go beyond the more modest aims of Linked Data. They provide mechanisms that, because of their formalism, allow the computer on its own to better exploit the link between the structure of PoMS's data and its meaning in the world. Indeed, PoMS's ontology does allow some of the Semantic Web's ideas about automatic reasoning to be exploited against this data. However, an ontology for PoMS has a further, significantly different purpose: it makes it easier for human users to understand what kind of information is in the set of RDF data for PoMS, and how it fits together.
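
To give a flavour of what querying a SPARQL endpoint involves, here is a minimal sketch that builds a "query via GET" request as defined by the SPARQL 1.1 Protocol, using only Python's standard library. The endpoint address is a placeholder rather than the PoMS server's real address, and the query is deliberately generic so that it assumes nothing about PoMS's ontology.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint address: substitute the real one from the server documentation.
ENDPOINT = "https://example.org/poms/sparql"

# A generic query that returns any ten triples; it assumes no particular vocabulary.
QUERY = """\
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# To run the query against a live endpoint (requires network access):
# import json
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
# for row in results["results"]["bindings"]:
#     print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

The LIMIT clause keeps the result small, which is a sensible habit given the processing-time limits mentioned above.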

So, the work described here resulted in several products:

The PoMS MySQL relational database was translated into a sequence of RDF triples,
The triples were loaded into an RDF repository, and made available to the WWW via a server,
PoMS's server was extended to support a SPARQL query frontend (a so-called SPARQL endpoint), and
The structure behind RDF's triples has been defined in a basic Semantic Web OWL ontology.
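
The first of these steps, translating relational rows into triples, can be sketched roughly as follows. This is not the conversion code used for PoMS: the table layout, the URI namespace and the "floruit" property are all assumptions made for illustration, and an in-memory SQLite table stands in for the MySQL database. Only the rdfs:label property is a real, standard term.

```python
import sqlite3

BASE = "http://example.org/poms/"  # hypothetical URI namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FLORUIT = BASE + "vocab#floruit"  # hypothetical property


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(connection):
    """Yield one N-Triples line per non-key column of each person row."""
    cursor = connection.execute("SELECT id, name, floruit FROM person ORDER BY id")
    for person_id, name, floruit in cursor:
        # The primary key becomes part of the subject URI.
        subject = f"<{BASE}person/{person_id}>"
        yield f'{subject} <{RDFS_LABEL}> "{escape(name)}" .'
        yield f'{subject} <{FLORUIT}> "{escape(floruit)}" .'


# Hypothetical schema: an in-memory SQLite table stands in for the MySQL database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, floruit TEXT)"
)
connection.execute(
    "INSERT INTO person VALUES (749, 'Abraham, Bishop of Dunblane', '1210×14-1220×25')"
)

lines = list(rows_to_ntriples(connection))
print("\n".join(lines))
```

Each row becomes a subject URI and each column a property, which is the basic correspondence between the relational model's entities and attributes and RDF's triples.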

The rest of this PoMS RDF server documentation site talks more about this work, and has three parts:

Using PoMS's RDF Server: first, the tab "Using the Server" gives guidance on how to use the PoMS RDF Linked Data server to get at, and query, PoMS materials,
PoMS Ontology: then the tab "PoMS Ontology" presents an overview of PoMS's ontology. In addition, there is a website presentation of the ontology that was automatically generated by OWLDoc, and
SPARQL Examples: finally, the tab "SPARQL Examples" provides a few examples of SPARQL queries against PoMS data to help you get started creating queries that serve your own interests.

References

Berners-Lee, Tim (2010). "The year open data went worldwide". TED Talk. At https://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide
Bizer, Christian, Tom Heath and Tim Berners-Lee (2009). "Linked Data: the Story So Far". In International Journal on Semantic Web and Information Systems. 5 (3): 1–22. doi:10.4018/jswis.2009081901. ISSN 1552-6283. At http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf.
Bradley, John (2017). Factoids: A site that introduces Factoid Prosopography. At https://factoid-dighum.kcl.ac.uk/
