Fulltext Searching Against MARC+SPARQL (SMARCQL)

In my initial SMARCQL post, I included a SPARQL query for the most frequently referenced author/title and pointed out how inconsistencies in MARC records “stick out like a sore thumb in SPARQL”. I also hinted that text searching, as opposed to querying, can help deal with some of those difficulties. To facilitate this in practice, Blazegraph supports a hybrid FullTextSearch extension that is easily enabled.

To illustrate, here is a hybrid query that uses the text search extension to find subfields containing both of the terms “lois” and “chan”, regardless of order, adjacency, capitalization, and punctuation. The remainder of the query groups the matching subfields by tag and code so that the context and the variations are a little more obvious.

PREFIX tag: <https://w3id.org/smarcql/tag/>
PREFIX code: <https://w3id.org/smarcql/code/>
PREFIX bds: <http://www.bigdata.com/rdf/search#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?tag ?code (COUNT(DISTINCT ?rec) AS ?numRecs) ?subfield
WHERE {
  ?rec ?tag ?field .
  ?field ?code ?subfield .
  ?subfield bds:search "lois chan" .
  ?subfield bds:matchAllTerms "true" .
  
  FILTER(?code != rdfs:label)
}
GROUP BY ?tag ?code ?subfield
ORDER BY DESC(?numRecs)
LIMIT 10

| tag | code | numRecs | subfield |
|---|---|---|---|
| tag:bd100 | code:sa | 176 | Chan, Lois Mai. |
| tag:bd245 | code:sc | 75 | Lois Mai Chan. |
| tag:bd700 | code:sa | 48 | Chan, Lois Mai. |
| tag:bd776 | code:sa | 25 | Chan, Lois Mai. |
| tag:bd245 | code:sc | 17 | by Lois Mai Chan. |
| tag:bd245 | code:sc | 13 | prepared by Lois Mai Chan for the Library of Congress. |
| tag:bd700 | code:sa | 12 | Chan, Lois Mai |
| tag:bd245 | code:sc | 10 | Lois Mai Chan and Richard Pollard. |
| tag:bd250 | code:sb | 9 | by Lois Mai Chan. |
| tag:bd245 | code:sc | 9 | Sharon Chien Lin ; forewords by Lois Mai Chan and Ching-Chih Chen. |

The FullTextSearch extension also supports relevance (cosine similarity) and ranking (ordinal position), which can be useful for reconciliation or autosuggestion purposes. Once the SMARCQL mapping and ontology are a little more fleshed out, I have some applications in mind to demonstrate more of these possibilities.
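
As a rough sketch of what relevance and ranking look like in a query, the variation below binds each hit's score and rank using the extension's `bds:relevance` and `bds:rank` predicates, and trims the hit list with `bds:minRelevance` and `bds:maxRank`. The threshold values are illustrative assumptions, not tuned settings, and I have not included results because they depend on the data loaded.

```sparql
PREFIX bds: <http://www.bigdata.com/rdf/search#>

SELECT ?subfield ?score ?rank
WHERE {
  ?subfield bds:search "lois chan" .
  ?subfield bds:matchAllTerms "true" .
  ?subfield bds:minRelevance "0.25" .  # assumed cutoff: drop weak matches
  ?subfield bds:relevance ?score .     # cosine similarity of the hit
  ?subfield bds:maxRank "10" .         # assumed cap: keep only the top hits
  ?subfield bds:rank ?rank .           # ordinal position, best hit first
}
ORDER BY ?rank
```

For autosuggestion, something along these lines could return a short, ordered candidate list for each partial name typed; for reconciliation, the score could serve as a rough confidence measure when matching against an authority file.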