{"id":71,"date":"2022-03-12T19:09:32","date_gmt":"2022-03-12T19:09:32","guid":{"rendered":"https:\/\/quxfarm.com\/?p=71"},"modified":"2022-03-12T19:09:39","modified_gmt":"2022-03-12T19:09:39","slug":"fulltext-searching-against-marcsparql-smarcql","status":"publish","type":"post","link":"https:\/\/quxfarm.com\/?p=71","title":{"rendered":"Fulltext Searching Against MARC+SPARQL (SMARCQL)"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">In my <a href=\"https:\/\/quxfarm.com\/?p=39\">initial SMARCQL post<\/a>, I included a SPARQL query for the <a href=\"https:\/\/quxfarm.com\/?p=39#most-frequently-referenced-author-title\">most frequently referenced author\/title<\/a> and pointed out how inconsistencies in MARC records &#8220;stick out like a sore thumb in SPARQL&#8221;. I also hinted that text searching, as opposed to querying, can help deal with some of those difficulties. To facilitate this in practice, Blazegraph supports a hybrid <a href=\"https:\/\/github.com\/blazegraph\/database\/wiki\/FullTextSearch\">FullTextSearch extension<\/a> that is <a href=\"https:\/\/github.com\/realworldobject\/smarcql\/blob\/main\/src\/main\/resources\/fastload.properties#L26-L27\">easily enabled<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To illustrate, here is a hybrid query that uses the text search extension to find references to the terms &#8220;lois chan&#8221;, regardless of order, adjacency, capitalization, and punctuation. 
The remainder of the query groups the resulting subfields by tag and code so the context and variances are a little more obvious.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>PREFIX tag: &lt;https:\/\/w3id.org\/smarcql\/tag\/>\nPREFIX code: &lt;https:\/\/w3id.org\/smarcql\/code\/>\nprefix bds: &lt;http:\/\/www.bigdata.com\/rdf\/search#>\n\nSELECT ?tag ?code (COUNT(DISTINCT ?rec) AS ?numRecs) ?subfield\nWHERE {\n  ?rec ?tag ?field .\n  ?field ?code ?subfield .\n  ?subfield bds:search \"lois chan\" .\n  ?subfield bds:matchAllTerms \"true\" .\n  \n  FILTER(?code != rdfs:label)\n}\nGROUP BY ?tag ?code ?subfield\nORDER BY DESC(?numRecs)\nLIMIT 10<\/code><\/pre>\n\n\n\n<table border=\"1\"><thead><tr><th>tag<\/th><th>code<\/th><th>numRecs<\/th><th>subfield<\/th><\/tr><\/thead><tbody><tr><td class=\"literal\">tag:bd100<\/td><td class=\"literal\">code:sa<\/td><td class=\"literal\">176<\/td><td class=\"literal\">Chan, Lois Mai.<\/td><\/tr><tr><td class=\"literal\">tag:bd245<\/td><td class=\"literal\">code:sc<\/td><td class=\"literal\">75<\/td><td class=\"literal\">Lois Mai Chan.<\/td><\/tr><tr><td class=\"literal\">tag:bd700<\/td><td class=\"literal\">code:sa<\/td><td class=\"literal\">48<\/td><td class=\"literal\">Chan, Lois Mai.<\/td><\/tr><tr><td class=\"literal\">tag:bd776<\/td><td class=\"literal\">code:sa<\/td><td class=\"literal\">25<\/td><td class=\"literal\">Chan, Lois Mai.<\/td><\/tr><tr><td class=\"literal\">tag:bd245<\/td><td class=\"literal\">code:sc<\/td><td class=\"literal\">17<\/td><td class=\"literal\">by Lois Mai Chan.<\/td><\/tr><tr><td class=\"literal\">tag:bd245<\/td><td class=\"literal\">code:sc<\/td><td class=\"literal\">13<\/td><td class=\"literal\">prepared by Lois Mai Chan for the Library of Congress.<\/td><\/tr><tr><td class=\"literal\">tag:bd700<\/td><td class=\"literal\">code:sa<\/td><td class=\"literal\">12<\/td><td class=\"literal\">Chan, Lois Mai<\/td><\/tr><tr><td class=\"literal\">tag:bd245<\/td><td class=\"literal\">code:sc<\/td><td 
class=\"literal\">10<\/td><td class=\"literal\">Lois Mai Chan and Richard Pollard.<\/td><\/tr><tr><td class=\"literal\">tag:bd250<\/td><td class=\"literal\">code:sb<\/td><td class=\"literal\">9<\/td><td class=\"literal\">by Lois Mai Chan.<\/td><\/tr><tr><td class=\"literal\">tag:bd245<\/td><td class=\"literal\">code:sc<\/td><td class=\"literal\">9<\/td><td class=\"literal\">Sharon Chien Lin ; forewords by Lois Mai Chan and Ching-Chih Chen.<\/td><\/tr><\/tbody><\/table>\n\n\n\n<p class=\"wp-block-paragraph\">The FullTextSearch extension also supports relevance (cosine similarity) and ranking (ordinal position), which can be useful for reconciliation or autosuggestion purposes. Once the SMARCQL <a href=\"https:\/\/github.com\/realworldobject\/smarcql\/blob\/main\/src\/main\/resources\/marcslim2rdf.xsl\">mapping<\/a> and <a href=\"https:\/\/realworldobject.github.io\/smarcql\/\">ontology<\/a> are a little more fleshed out, I have some applications in mind to demonstrate more of these possibilities.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In my initial SMARCQL post, I included a SPARQL query for the most frequently referenced author\/title and pointed out how inconsistencies in MARC records &#8220;stick out like a sore thumb in SPARQL&#8221;. I also hinted that text searching, as opposed to querying, can help deal with some of those difficulties. 
To facilitate this in practice, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_crdt_document":"","om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,10,8],"tags":[],"class_list":["post-71","post","type-post","status-publish","format-standard","hentry","category-marc","category-smarcql","category-sparql"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/posts\/71","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/quxfarm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=71"}],"version-history":[{"count":3,"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/posts\/71\/revisions"}],"predecessor-version":[{"id":76,"href":"https:\/\/quxfarm.com\/index.php?rest_route=\/wp\/v2\/posts\/71\/revisions\/76"}],"wp:attachment":[{"href":"https:\/\/quxfarm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=71"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/quxfarm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=71"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/quxfarm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=71"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}