This post describes the latest release of SMARCQL, which adds handling for the MARC Leader. Note that SMARCQL’s initial release only accounted for MARC “Variable Data Fields” (01X-9XX), which characteristically contain “coded” subfields. For reference, the basic query pattern looks something like this:
```sparql
PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX code: <https://w3id.org/smarcql/code/>
SELECT ?rec ?title
WHERE {
  ?rec tag:bd245 ?field .   # 245 - Title Statement
  ?field code:sa ?title .   # $a - Title
}
LIMIT 5
```
This rec/tag/field/code/subfield pattern doesn’t work well for the MARC Leader, though, because leader values are parsed from character positions rather than delimited by subfield codes. In the new release, leader positions can be addressed in queries using a pattern like this:
```sparql
PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX pstn: <https://w3id.org/smarcql/position/>
SELECT *
WHERE {
  ?rec tag:bdleader ?bdleader .
  ?bdleader pstn:bdleader_06 ?bdleader_06 .
}
LIMIT 5
```
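To make the contrast between character positions and code delimiters concrete, here is a minimal Python sketch (standard library only) that reads both kinds of values from a MARC/XML record. It is not SMARCQL’s converter: the sample record, its leader string, and the helpers `leader_positions` and `subfield_values` are hypothetical, and the dictionary keys merely echo the SMARCQL property names.

```python
import xml.etree.ElementTree as ET

# MARC 21 slim (MARC/XML) namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record; the leader is a fixed-length 24-character string.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01234nam a2200289 a 4500</leader>
  <controlfield tag="001">12345678</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A hypothetical title</subfield>
  </datafield>
</record>"""


def leader_positions(record):
    """Read Leader/06 and /07 by character offset; no delimiters are involved."""
    leader = record.find("marc:leader", MARC_NS).text
    return {"bdleader_06": leader[6], "bdleader_07": leader[7]}


def subfield_values(record, tag, code):
    """Read subfield values by tag and subfield code, e.g. 245 $a."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)
print(leader_positions(record))            # {'bdleader_06': 'a', 'bdleader_07': 'm'}
print(subfield_values(record, "245", "a"))  # ['A hypothetical title']
```

The leader lookup never consults a delimiter: offset 6 is always Type of record and offset 7 is always Bibliographic level, whatever the rest of the string contains.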
A few things to point out:
- The position: namespace (bound to the pstn: prefix in the query above) is new, and it will come into play again when the 00X Control Fields are addressed.
- Strictly speaking, the MARC Leader isn’t a “tag,” even though I assigned “bdleader” to that namespace. The SMARCQL Ontology defines a variety of namespaces, but none seemed appropriate for this situation.
- The justification for using tag: is symmetry with the positional 00X Control Fields. Leader/06 and /07 influence how the 00X fields are parsed, so those essentials are being addressed first.
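The dependency in the last point can be sketched as follows. In MARC 21 Bibliographic, the layout of 008/18-34 is selected by Leader/06 and /07; the function below encodes those rules as I understand them from the Library of Congress 008 documentation. `material_config` is a hypothetical helper for illustration, not part of SMARCQL, and combinations outside the listed rules simply return `None`.

```python
from typing import Optional


def material_config(leader_06: str, leader_07: str) -> Optional[str]:
    """Map Leader/06 (type of record) and /07 (bibliographic level) to the
    MARC 21 008/18-34 material configuration code."""
    if leader_06 in {"a", "t"} and leader_07 in {"a", "c", "d", "m"}:
        return "BK"  # Books
    if leader_06 == "a" and leader_07 in {"b", "i", "s"}:
        return "CR"  # Continuing resources
    if leader_06 == "m":
        return "CF"  # Computer files
    if leader_06 in {"e", "f"}:
        return "MP"  # Maps
    if leader_06 in {"c", "d", "i", "j"}:
        return "MU"  # Music (scores and sound recordings)
    if leader_06 in {"g", "k", "o", "r"}:
        return "VM"  # Visual materials
    if leader_06 == "p":
        return "MX"  # Mixed materials
    return None  # combination not covered by these rules


leader = "01234nam a2200289 a 4500"  # hypothetical leader
print(material_config(leader[6], leader[7]))  # BK
```

Using sets rather than strings for membership tests avoids the trap where an empty string would count as "in" any string.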
SMARCQL Ontology Treatment
To demonstrate queries involving the leader adequately, we need to start fleshing out the ontology. The SMARCQL Ontology is still minimalistic, but it now contains details that help exercise leader-based queries. To start, I updated my Blazegraph instance to index the ontology triples alongside the sample SMARCQL/RDF test data.
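```shell
# Fetch the ontology, then bulk-load it with Blazegraph's DataLoader
# (adjust the jar and properties-file paths to the local installation)
wget https://realworldobject.github.io/smarcql/ontology/smarcql.ttl
java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader src/main/resources/fastload.properties smarcql.ttl
```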
Here, then, is a query that lists the leader positions, their possible values, and the number of records associated with each value.
```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX class: <https://w3id.org/smarcql/class/>
SELECT ?positionLabel ?positionValue ?positionValueLabel (COUNT(?rec) AS ?numRecs)
WHERE {
  ?position rdfs:domain class:BdLeader ;
            rdfs:range ?class ;
            rdfs:label ?positionLabel .
  ?individual a ?class ;
              rdfs:label ?positionValueLabel ;
              rdf:value ?positionValue .
  OPTIONAL {
    ?rec tag:bdleader [
      ?position ?individual
    ]
  }
}
GROUP BY ?positionLabel ?positionValue ?positionValueLabel
ORDER BY ?positionLabel ?positionValue
```
The result looks like this:
positionLabel | positionValue | positionValueLabel | numRecs |
---|---|---|---|
Bibliographic level | a | Monographic component part | 3304 |
Bibliographic level | b | Serial component part | 14 |
Bibliographic level | c | Collection | 600 |
Bibliographic level | d | Subunit | 354 |
Bibliographic level | i | Integrating resource | 1322 |
Bibliographic level | m | Monograph/Item | 707498 |
Bibliographic level | s | Serial | 66287 |
Type of record | a | Language material | 747970 |
Type of record | c | Notated music | 12 |
Type of record | d | Manuscript notated music | 0 |
Type of record | e | Cartographic material | 86 |
Type of record | f | Manuscript cartographic material | 1 |
Type of record | g | Projected medium | 4757 |
Type of record | i | Nonmusical sound recording | 2320 |
Type of record | j | Musical sound recording | 291 |
Type of record | k | Two-dimensional nonprojected graphic | 249 |
Type of record | m | Computer file | 5409 |
Type of record | o | Kit | 92 |
Type of record | p | Mixed materials | 70 |
Type of record | r | Three-dimensional artifact or naturally occurring object | 15 |
Type of record | t | Manuscript language material | 18107 |
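The zero count for “Manuscript notated music” comes from the OPTIONAL block: every ontology individual produces a row, and COUNT(?rec) over an unbound ?rec yields 0. The following Python sketch mimics that aggregation in memory. The individuals and leader strings are a small hypothetical subset, not the WorldCat data, and `count_by_value` is an illustrative name.

```python
from collections import Counter

# Leader offsets behind each position label (Leader/06 and Leader/07).
OFFSETS = {"Type of record": 6, "Bibliographic level": 7}

# Hypothetical subset of ontology individuals: (position label, value, value label).
INDIVIDUALS = [
    ("Type of record", "c", "Notated music"),
    ("Type of record", "d", "Manuscript notated music"),
    ("Bibliographic level", "m", "Monograph/Item"),
]

# Hypothetical leaders standing in for the records in the store.
LEADERS = ["01234ncm a2200289 a 4500", "01234nam a2200289 a 4500"]


def count_by_value(individuals, leaders):
    """Emit one row per individual, like the OPTIONAL + GROUP BY query."""
    observed = Counter(
        (label, leader[offset])
        for leader in leaders
        for label, offset in OFFSETS.items()
    )
    # Counter returns 0 for missing keys, mirroring COUNT over an unbound ?rec.
    rows = [
        (label, value, value_label, observed[(label, value)])
        for label, value, value_label in individuals
    ]
    return sorted(rows)  # ORDER BY ?positionLabel ?positionValue


for row in count_by_value(INDIVIDUALS, LEADERS):
    print(" | ".join(map(str, row)))
# Bibliographic level | m | Monograph/Item | 2
# Type of record | c | Notated music | 1
# Type of record | d | Manuscript notated music | 0
```

Dropping the OPTIONAL (in SPARQL) corresponds to skipping individuals whose count is 0 here: the unused values would silently disappear from the result.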
Here are some observations:
- In MARC, leader values are encoded as character codes. In SMARCQL, these codes (strings) have been upgraded to URIs (things) that are defined in the ontology. For example, notice that the output includes labels for the leader positions and values. Those came from the ontology, based on the Library of Congress MARC Bibliographic Format specification.
- The Leader/06 (“Type of record”) and 07 (“Bibliographic level”) values are crude, but they start to give a sense of the type of library resource that a MARC record describes.
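The string-to-thing upgrade can be sketched in a few lines. The individual namespace and the `bdleader_06-c` local-name pattern come from the ind: prefix used in this post’s queries; assuming the same pattern holds for every Leader/06 and /07 value, a code can be turned into a URI and paired with its label. The `upgrade` and `upgrade_leader` helpers and the label subset (copied from the result table) are illustrative.

```python
# Namespace of the ind: prefix used in this post's queries.
IND = "https://w3id.org/smarcql/individual/"

# Subset of labels from the result table (MARC 21 Leader/06 and /07).
LABELS = {
    ("bdleader_06", "a"): "Language material",
    ("bdleader_06", "c"): "Notated music",
    ("bdleader_07", "m"): "Monograph/Item",
    ("bdleader_07", "s"): "Serial",
}


def upgrade(position, code):
    """Turn a leader character code (a string) into an individual URI (a thing).

    Assumes the '<position>-<code>' local-name pattern seen in ind:bdleader_06-c.
    Returns the URI and its label, or None for the label if it is not known here.
    """
    return f"{IND}{position}-{code}", LABELS.get((position, code))


def upgrade_leader(leader):
    """Upgrade Leader/06 and /07 of a 24-character leader string."""
    return {
        pos: upgrade(pos, leader[int(pos[-2:])])  # "bdleader_06" -> offset 6
        for pos in ("bdleader_06", "bdleader_07")
    }


print(upgrade("bdleader_06", "c"))
# ('https://w3id.org/smarcql/individual/bdleader_06-c', 'Notated music')
```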
Digging Deeper into the Result
The WorldCat-derived dataset used in this example has a “library science” theme, so the record counts in the result reflect that. Eyeballing the list, though, makes me wonder how the twelve hits on Leader/06 “c” (Type of record: Notated music) passed the selection criteria. To take a closer look, here is a query that lists them:
```sparql
PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX code: <https://w3id.org/smarcql/code/>
PREFIX pstn: <https://w3id.org/smarcql/position/>
PREFIX ind: <https://w3id.org/smarcql/individual/>
SELECT ?wc ?creatorLabel ?title
WHERE {
  ?rec tag:bd001 ?bd001 ;
       tag:bdleader [
         pstn:bdleader_06 ind:bdleader_06-c
       ] .
  BIND(URI(CONCAT("http://www.worldcat.org/oclc/", ?bd001)) AS ?wc)
  OPTIONAL { ?rec tag:bd100 [ code:sa ?creatorLabel ] }
  OPTIONAL { ?rec tag:bd245 [ code:sa ?title ] }
}
ORDER BY ?title
```
The result looks like this:
wc | creatorLabel | title |
---|---|---|
<http://www.worldcat.org/oclc/681584959> | | ABC Coast FM Gutenberg experience |
<http://www.worldcat.org/oclc/458544284> | | Books and libraries in the Americas. |
<http://www.worldcat.org/oclc/643257385> | | Buchkultur : |
<http://www.worldcat.org/oclc/820384936> | | Bulletin de la Société des Amis de la Bibliothèque et de l’Histoire de l’Ecole Polytechnique |
<http://www.worldcat.org/oclc/906787592> | Marín Fernández, Josefa. | Estadística aplicada a las ciencias de la documentación/ |
<http://www.worldcat.org/oclc/906785372> | Kerr, George D. | Fidelizar clientes en la biblioteca pública : |
<http://www.worldcat.org/oclc/473663877> | | Fun songs / |
<http://www.worldcat.org/oclc/643845859> | | Health information and libraries journal : |
<http://www.worldcat.org/oclc/946097862> | | Informetrics 89/90 : |
<http://www.worldcat.org/oclc/45253549> | | Just for kids NOT! |
<http://www.worldcat.org/oclc/34835157> | | Just for kids NOT! |
<http://www.worldcat.org/oclc/830140233> | | Toward a theory of librarianship : |
Why “Books and libraries in the Americas.” is cataloged as a musical score, or what “Fun songs” has to do with “library science,” may be interesting questions, but SMARCQL can only help find potential issues like these, not correct them. Also, to be fair, this dataset is from 2016, so some of the problems it reveals may have since been corrected.
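For anyone who wants to trace the logic of the notated-music query outside a triple store, here is a rough in-memory Python analogue: filter on Leader/06 “c”, mint the WorldCat URI from the 001 the way BIND/CONCAT does, and fall back to an empty string where an OPTIONAL would leave a variable unbound (the empty creatorLabel cells in the result). The record dictionaries, their keys, and their values are hypothetical stand-ins for SMARCQL/RDF, and in real SPARQL an unbound variable is not the same thing as an empty string.

```python
# Hypothetical records; "100a"/"245a" stand in for the 100 $a and 245 $a subfields.
RECORDS = [
    {"leader": "01234ncm a2200289 a 4500", "001": "111", "245a": "Title B /"},
    {"leader": "01234ncm a2200289 a 4500", "001": "222",
     "100a": "Doe, Jane.", "245a": "Title A :"},
    {"leader": "01234nam a2200289 a 4500", "001": "333", "245a": "Title C"},
]


def notated_music(records):
    """Rough analogue of the Leader/06 = 'c' query."""
    rows = []
    for rec in records:
        if rec["leader"][6] != "c":  # pstn:bdleader_06 ind:bdleader_06-c
            continue
        wc = "http://www.worldcat.org/oclc/" + rec["001"]  # BIND(URI(CONCAT(...)))
        # OPTIONAL patterns: a missing subfield becomes an empty cell.
        rows.append((wc, rec.get("100a", ""), rec.get("245a", "")))
    return sorted(rows, key=lambda row: row[2])  # ORDER BY ?title


for wc, creator, title in notated_music(RECORDS):
    print(wc, "|", creator, "|", title)
```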
SMARCQL Ontology Namespaces
Now that the SMARCQL Ontology is getting fleshed out, it’s worth discussing the namespaces that it declares:
tag: <https://w3id.org/smarcql/tag/>
The tag: namespace represents the notion of a MARC tag, which SMARCQL treats as an owl:ObjectProperty. For example, tag:bd245 refers to the “245 – Title Statement” in MARC Bibliographic Format. The “bd” prefix abbreviates “bibliographic data” and serves two purposes: 1) it differentiates the tag’s meaning from its meaning in other MARC formats, and 2) it allows SMARCQL/RDF to be serialized as RDF/XML, which doesn’t allow element names to start with a digit.
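The RDF/XML constraint is easy to check with an ordinary XML parser. This sketch assumes only what the paragraph states (a tag: URI is the namespace plus “bd” plus the tag) and uses Python’s ElementTree to show that an element named 245 is rejected while bd245 is accepted. `tag_property` and `is_xml_element_name` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

TAG_NS = "https://w3id.org/smarcql/tag/"


def tag_property(tag):
    """Mint a tag: property URI, e.g. '245' -> .../tag/bd245, 'leader' -> .../tag/bdleader."""
    return TAG_NS + "bd" + tag


def is_xml_element_name(local_name):
    """True if an XML parser accepts local_name as an element name."""
    try:
        ET.fromstring(f"<{local_name}/>")
        return True
    except ET.ParseError:
        return False


print(tag_property("245"))           # https://w3id.org/smarcql/tag/bd245
print(is_xml_element_name("245"))    # False: a name cannot start with a digit
print(is_xml_element_name("bd245"))  # True
```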
code: <https://w3id.org/smarcql/code/>
The code: namespace represents the notion of a MARC subfield code, which SMARCQL treats as an owl:DatatypeProperty. For example, code:sa refers to “$a”, whose meaning varies with the tagged field where it occurs. Treating code: elements as RDF properties is admittedly awkward for exactly that reason: a single property carries a different meaning from tag to tag. The problem isn’t insurmountable, but it remains a topic for the future.
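One way to see the awkwardness is that a human-readable label for a code: property can only be looked up by the (tag, code) pair, never by the code alone. The sketch below mints code: URIs by prefixing the code with “s”, following the code:sa example (I am assuming the pattern extends to other codes), and keys a few MARC 21 Bibliographic subfield names on both tag and code. The helper names and the small label table are illustrative.

```python
CODE_NS = "https://w3id.org/smarcql/code/"

# Subfield meanings depend on the tag: a few MARC 21 Bibliographic examples.
SUBFIELD_LABELS = {
    ("245", "a"): "Title",
    ("245", "c"): "Statement of responsibility, etc.",
    ("100", "a"): "Personal name",
    ("100", "d"): "Dates associated with a name",
}


def code_property(code):
    """Mint a code: property URI, e.g. '$a' -> .../code/sa (pattern assumed)."""
    return CODE_NS + "s" + code.lstrip("$")


def subfield_label(tag, code):
    """A label is only well defined for the (tag, code) pair, not the code alone."""
    return SUBFIELD_LABELS.get((tag, code.lstrip("$")))


print(code_property("$a"))           # https://w3id.org/smarcql/code/sa
print(subfield_label("245", "$a"))   # Title
print(subfield_label("100", "$a"))   # Personal name
```

Both calls resolve to the same code:sa property, yet they need different labels, which is why the label table cannot be keyed on the property alone.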
position: <https://w3id.org/smarcql/position/>
The position: namespace represents positional elements that occur in the MARC Leader and the 006/007/008 fields. SMARCQL treats these as object properties (owl:ObjectProperty), even though the values in MARC/XML are generally character codes. As described above, this upgrade from string to thing enables features such as label assignment. For now, SMARCQL accounts for only two position elements (position:bdleader_06 and position:bdleader_07); the others will come in due course.
individual: <https://w3id.org/smarcql/individual/>
The individual: namespace represents string values in positional MARC/XML elements that have been upgraded to URI-identified things. For now, the only examples are values for Leader/06 and 07. Note that SMARCQL only intends to upgrade positional values that are defined by the MARC Format itself. In particular, SMARCQL makes no effort to upgrade values of the code: properties; those are owl:DatatypeProperty and will remain so. URIs that correspond to subfield literals might be available in the field’s $0/$1, but if so, queries must look there explicitly.
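Looking in $0/$1 explicitly amounts to reading sibling subfields of the same field. A minimal sketch, assuming a field is available as (code, value) pairs: it keeps the $a literal as-is and collects only $0/$1 values that are HTTP(S) URIs, since $0 may also hold a parenthesized control number rather than a URI. The `literal_and_uris` helper is hypothetical, and every identifier in the sample is a placeholder.

```python
# Hypothetical 100 field as (subfield code, value) pairs, in document order.
FIELD_100 = [
    ("a", "Doe, Jane."),
    ("0", "(DLC)n 00000000"),
    ("0", "http://example.org/authority/00000000"),
    ("1", "http://example.org/real-world-object/0"),
]


def literal_and_uris(subfields, code="a"):
    """Return the literal(s) for `code` and any URIs found in $0/$1 of the same field."""
    literals = [value for c, value in subfields if c == code]
    uris = [
        value
        for c, value in subfields
        if c in ("0", "1") and value.startswith(("http://", "https://"))
    ]
    return literals, uris


print(literal_and_uris(FIELD_100))
```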
class: <https://w3id.org/smarcql/class/>
The class: namespace is used to coordinate the domain and range of the SMARCQL properties, as well as to group the identified individuals into a class taxonomy. Details can be discussed later.