In my initial SMARCQL post, I included a SPARQL query for the most frequently referenced author/title and pointed out how inconsistencies in MARC records “stick out like a sore thumb in SPARQL”. I also hinted that text searching, as opposed to querying, can help deal with some of those difficulties. To facilitate this in practice, Blazegraph supports a hybrid FullTextSearch extension that is easily enabled.
To illustrate, here is a hybrid query that uses the text search extension to find subfields containing both of the terms “lois” and “chan”, regardless of order, adjacency, capitalization, and punctuation. The remainder of the query groups the matching subfields by tag and code so the context and variations are a little more obvious.
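PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX code: <https://w3id.org/smarcql/code/>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
# Declared explicitly so the query does not rely on the store's built-in prefixes
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?tag ?code (COUNT(DISTINCT ?rec) AS ?numRecs) ?subfield
WHERE {
  # Record -> field -> subfield path
  ?rec ?tag ?field .
  ?field ?code ?subfield .
  # Full-text match: the subfield literal must contain every search term
  ?subfield bds:search "lois chan" .
  ?subfield bds:matchAllTerms "true" .
  # Skip labels so only subfield codes are counted
  FILTER(?code != rdfs:label)
}
GROUP BY ?tag ?code ?subfield
ORDER BY DESC(?numRecs)
LIMIT 10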
tag | code | numRecs | subfield |
---|---|---|---|
tag:bd100 | code:sa | 176 | Chan, Lois Mai. |
tag:bd245 | code:sc | 75 | Lois Mai Chan. |
tag:bd700 | code:sa | 48 | Chan, Lois Mai. |
tag:bd776 | code:sa | 25 | Chan, Lois Mai. |
tag:bd245 | code:sc | 17 | by Lois Mai Chan. |
tag:bd245 | code:sc | 13 | prepared by Lois Mai Chan for the Library of Congress. |
tag:bd700 | code:sa | 12 | Chan, Lois Mai |
tag:bd245 | code:sc | 10 | Lois Mai Chan and Richard Pollard. |
tag:bd250 | code:sb | 9 | by Lois Mai Chan. |
tag:bd245 | code:sc | 9 | Sharon Chien Lin ; forewords by Lois Mai Chan and Ching-Chih Chen. |
The FullTextSearch extension also supports relevance (cosine similarity) and ranking (ordinal position), which can be useful for reconciliation or autosuggestion purposes. Once the SMARCQL mapping and ontology are a little more fleshed out, I have some applications in mind to demonstrate more of these possibilities.
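As a rough sketch of what the relevance and ranking features could look like in a query, the example below binds a score and a rank to each matching subfield using the `bds:relevance`, `bds:rank`, `bds:minRelevance`, and `bds:maxRank` magic predicates described in the Blazegraph full-text search documentation. The threshold values (`0.25` and `20`) are arbitrary assumptions for illustration, not tuned settings, and I have not run this against the SMARCQL data.

```sparql
PREFIX bds: <http://www.bigdata.com/rdf/search#>

SELECT ?subfield ?score ?rank
WHERE {
  # Same full-text match as before: every term must be present
  ?subfield bds:search "lois chan" .
  ?subfield bds:matchAllTerms "true" .
  # Drop weak matches (0.25 is an assumed, illustrative cutoff)
  ?subfield bds:minRelevance "0.25" .
  # Bind the cosine-similarity score of each hit
  ?subfield bds:relevance ?score .
  # Keep only the top hits (20 is an assumed, illustrative cutoff)
  ?subfield bds:maxRank "20" .
  # Bind the ordinal position of each hit in the result list
  ?subfield bds:rank ?rank .
}
ORDER BY ?rank
```

For an autosuggest use case, something like this could return a short ranked candidate list for a typed name. For reconciliation, the score could serve as a cutoff for accepting or flagging a match.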