===== Querying with SPARQL =====

  * Last verification: **20220909**
  * Tools required for this lab:
    * A tool for querying RDF files using SPARQL. Choose one of the two:
      * **[preferred]** **[[http://sparql.org/sparql.html|SPARQLer]]** (a general-purpose SPARQL query processor) -- it is more powerful (because it implements CONSTRUCT queries), but it is less friendly because it only accepts links to RDF/XML files, so you have to prepare your document first:
        * Convert your document from Turtle to RDF/XML (using the tools from the [[.:lab_rdf|previous lab]]).
        * Then host the file somewhere -- we need the **direct URL** of this file for querying it using SPARQLer (not the URL of a download page). For example, if you are hosting the file on Dropbox, change ''www.dropbox.com'' to ''dl.dropboxusercontent.com'' in the sharing link, e.g.: <code>
https://www.dropbox.com/s/kc3g05y0k7t1mbw/foaf.rdf             # sharing link generated by Dropbox
https://dl.dropboxusercontent.com/s/kc3g05y0k7t1mbw/foaf.rdf   # direct URL for SPARQLer
</code>
      * **[[https://rdfshape.weso.es/dataQuery|RDFShape Data query]]**
    * A tool for querying SPARQL Endpoints:
      * **[preferred]** **[[https://yasgui.triply.cc/|Yasgui]]** (Yet Another Sparql GUI) -- it has a more powerful editor than RDFShape (but it can't be used against simple RDF files)
      * **[[https://rdfshape.weso.es/endpointQuery|RDFShape Endpoint query]]**
    * //Not required, but can be useful:// [[http://sparql.org/query-validator.html|SPARQLer Query Validator]]

==== Prepare yourself for the lab ====

  * If you missed the lecture about SPARQL, at least watch the video: [[https://www.youtube.com/watch?v=FvGndkpa4K0|SPARQL in 11 minutes]]
  * {{sparql-cheat-sheet.pdf|SPARQL by Example: the Cheat Sheet}} (from http://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet)

==== Lab instructions ====

At the end of the lab, each group should email their first project to the teacher.
It consists of:
  - a list of the names of all project authors,
  - the final ''*.ttl'' file with the graph developed during the [[lab_rdf|previous lab]],
  - the set of SPARQL queries against the knowledge graph developed during today's lab (up to //Section 4. ASK and DESCRIBE queries//).

=== 1. SPARQL = Pattern matching [20 minutes] ===

  * General idea: **SPARQL is an RDF graph pattern matching system.**
  * E.g., there is a triple saved in RDF: <code>
:Hydrogen :standardState :gas .
</code>
  * Now we can simply replace part of the triple with a variable (a name with a question mark at the start) and we get simple queries, e.g.:
    * //Query:// '':Hydrogen :standardState **?what** .'' \\ //Answer:// '':gas''
    * //Query:// ''**?what** :standardState :gas .'' \\ //Answer:// '':Hydrogen''
    * //Query:// '':Hydrogen **?what** :gas .'' \\ //Answer:// '':standardState''

  - Do you have __your knowledge graph__, developed during the [[lab_rdf|previous lab]]? If not, now is the time to find it!
  - Open the preferred tool for querying RDF files using SPARQL (see **Tools required for this lab** at the top of this page) and execute your first simple SELECT query against your knowledge graph: <code>
SELECT ?a ?b ?c
WHERE {
  ?a ?b ?c
}
LIMIT 10
</code>
  - Now, it's time to explore your graph more! Prepare **two queries** for your graph that extract some interesting information. Use only triple patterns -- we will move to more complicated things in the subsequent sections.
    * If you want to ask about all members of a container, you can use ''rdfs:member'', which is a super-property of all the ''rdf:_1'', ''rdf:_2'', ... relations. E.g., the query below selects all pairs of people and places visited by them (it can be executed against the ''https://krzysztof.kutt.pl/didactics/semweb/bob_and_mona_lisa.rdf'' file): <code>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# ?relation matches any property that links a person to a bag of places
SELECT ?who ?place
WHERE {
  ?who ?relation [ a rdf:Bag ;
                   rdfs:member ?place ]
}
LIMIT 10
</code>
    * [[http://www.w3.org/TR/sparql11-query/|SPARQL 1.1 Query Language]] may be useful.
    * Save the queries for the report!

=== 2. Constraints: FILTER [15 minutes] ===

After matching an RDF graph pattern, it is also possible to put constraints on the result rows, so that only the rows satisfying them are kept in the results. This is achieved using the FILTER construct. Let's try it now on your knowledge graphs.

  - Your graph should contain at least a few different datatypes (this was a requirement in the previous lab!). Select **two** of them (e.g., boolean, string, numeric, date) and check what functions can be used in filters for them.
    * [[https://docs.data.world/tutorials/sparql/list-of-sparql-filter-functions.html|List of SPARQL Filter Functions]] -- quite good as a short reference
    * [[https://www.w3.org/TR/sparql11-query/#func-strings|Functions on Strings]]
    * [[https://www.w3.org/TR/sparql11-query/#func-numerics|Functions on Numerics]]
    * [[https://www.w3.org/TR/sparql11-query/#func-date-time|Functions on Dates and Times]]
  - Prepare and execute **two queries** (one for each selected datatype) that filter something interesting in your knowledge graph.
    * Save the queries for the report!

=== 3. SPARQL as rule language [15 minutes] ===

So far, we have seen that the answers to questions in SPARQL can take the form of a table. In this section, we will take a look at CONSTRUCT queries, whose answers take the form of an RDF graph. They provide a way to introduce "rules" into RDF datasets:

  * Let's go back to [[lab_rdf|the model you prepared previously]]. You probably had trouble deciding which relations should be placed in the RDF file: ''is_father_of'', ''is_child_of'', or maybe both of them?
  * CONSTRUCT queries make this simpler. In the initial dataset you can put just one of them; let's assume it was ''is_father_of''. Now, you can execute a CONSTRUCT query that creates the inverse relation (the ''bb:'' namespace IRI below is only a placeholder; use the one from your own graph): <code>
PREFIX bb: <http://example.org/bb#>   # placeholder namespace IRI

CONSTRUCT {
  ?child bb:is_child_of ?father .
}
WHERE {
  ?father bb:is_father_of ?child
}
</code>
  * Or maybe an ''is_uncle_of'' relation would be useful? No problem! <code>
PREFIX bb:  <http://example.org/bb#>   # placeholder namespace IRI
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT {
  ?uncle bb:is_uncle_of ?child .
}
WHERE {
  ?uncle bb:is_sibling_of ?parent ;
         a bb:Man .
  ?child bb:is_child_of ?parent
}
</code>
  * OK, we created some new RDF triples using a CONSTRUCT query. What now? Depending on your plans, you can:
    * Add these triples back to the original dataset,
    * Create a new dataset (e.g., save the results in an RDF file).
  * And then simply execute queries against this new knowledge.
  * //Note: there are more powerful ways to define rules in knowledge graphs -- [[lab_rules|we will explore them later]].//

  - Now, it's time for you to develop **2 CONSTRUCT queries** that provide useful rules for your knowledge graph!
    * Save the queries for the report!

=== 4. ASK and DESCRIBE queries [15 minutes] ===

SPARQL also provides two more query types:
  * **[[https://www.w3.org/TR/sparql11-query/#ask|ASK queries]]** simply provide a Yes/No answer and no information about the triples that were found (in the case of a "Yes" answer).
  * **[[https://www.w3.org/TR/sparql11-query/#describe|DESCRIBE queries]]** return knowledge associated with the given subject URI(s); what exactly is included in the description is decided by the query processor.

  - Prepare at least **one ASK query** that checks something interesting in your knowledge graph.
  - Prepare at least **one DESCRIBE query** that describes the most interesting "thing" in your knowledge graph.
  - Save the queries for the report!

=== 5. DBpedia SPARQL Endpoint [30 minutes] ===

SPARQL queries may be run against an RDF file, as we did in the previous sections. But it is also possible to use special-purpose web services called SPARQL Endpoints. As we already know [[wp>Wikidata|Wikidata]], we will explore [[http://dbpedia.org/|DBpedia]] in this section.

  - Do you remember your task from the [[lab_intro|first lab]]? You were asked to prepare a list of **the 15 most populous countries in Europe** based on Wikipedia. Now we know enough to not do it manually but use the SPARQL language and DBpedia instead!
  - As DBpedia is a dump of Wikipedia, it should contain some information about Poland. We don't know what URI Poland has in DBpedia, but we know the name Poland, and we remember that the ''rdfs:label'' property is useful. Maybe this will help us? Let's try!
    - Open the preferred tool for querying SPARQL Endpoints (see **Tools required for this lab** at the top of this page).
    - Enter **''http://dbpedia.org/sparql''** as the SPARQL Endpoint.
    - What do we know so far? There should be some URI (''?country'') that probably has a relation ''rdfs:label'' with the object ''"Poland"@en''. This can be easily translated into a SPARQL query: <code>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?country
WHERE {
  ?country rdfs:label "Poland"@en .
}
</code>
      * Hint: some useful prefixes are already in place to assist you in this task.
  - Success! Now, we can expand this query to find information about the population of Poland.
    * Hint: the following line may be useful to get only objects that are numbers (like population): <code>
FILTER(isNumeric(?val))
</code>
  - Now, prepare the actual query that returns a list of the 15 countries in Europe with the biggest population!

=== 6. Aggregation [30 minutes] ===

  * SPARQL provides grouping and aggregation mechanisms known from SQL:
    * grouping: GROUP BY
    * aggregation: COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE
    * filter on groups: HAVING
  * See the [[http://www.w3.org/TR/sparql11-query/#aggregates|SPARQL 1.1 documentation]] for a wider description.

  - Poland is divided into 16 voivodeships (PL: województwo), and then into 314 counties (PL: powiat). In this task, we will examine this division more closely.
  - Prepare a query (using the preferred tool for querying SPARQL Endpoints, against DBpedia) which returns a list of voivodeships and the number of counties inside each of them. The list should consist only of voivodeships with 20 or more counties and should be ordered by the number of counties.
  - The results should look like this: \\ {{dbpedia-voivodeships-counties.jpg?direct&600|}}
  - **Hint** -- useful URIs (you can use the ''dbo'', ''dbr'' and ''dbp'' prefixes defined in the previous section):
    * county: ''dbr:Powiat''
    * voivodeship: ''dbr:Voivodeships_of_Poland''

==== Learn more! ====

SPARQL:
  * [[http://www.w3.org/TR/sparql11-query/|SPARQL 1.1 Query Language]]
  * [[http://www.w3.org/TR/sparql11-overview/|SPARQL 1.1 Overview]]
  * [[http://www.cambridgesemantics.com/semantic-university/learn-sparql|Learn SPARQL @Cambridge Semantics]]
  * You can combine results from many SPARQL Endpoints in one query -- see [[https://www.w3.org/TR/sparql11-federated-query/|SPARQL Federated Query]] for more information.
  * [[https://docs.data.world/tutorials/sparql/|SPARQL Tutorial by data.world]]
  * {{sparql-cheat-sheet.pdf|SPARQL by Example: the Cheat Sheet}} (from http://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet)

Sample queries in SPARQL:
  * [[http://chem-bla-ics.blogspot.com/2018/09/wikidata-query-service-recipe.html|Wikidata Query Service recipe: qualifiers and the Greek alphabet]]
  * **[[https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples|Big set of SPARQL queries against Wikidata]]**
  * [[https://semantic-web.com/sparql-analytics-proves-boxers-live-dangerously/|SPARQL analytics proves boxers live dangerously]]
  * [[http://www.snee.com/bobdc.blog/2017/11/sparql-queries-of-beatles-reco.html|SPARQL queries of Beatles recording sessions]]

Tools:
  * [[http://sparql.org/sparql.html|SPARQLer]] -- general-purpose tool for executing SPARQL queries
  * [[http://sparql.org/query-validator.html|SPARQLer Query Validator]]
  * **[[https://yasgui.triply.cc/|YASGUI]]** -- online visual tool for querying SPARQL Endpoints
  * [[https://rdfshape.weso.es/|RDFShape]]:
    * [[https://rdfshape.weso.es/dataQuery|Data query]] -- to execute SPARQL queries against RDF files
    * [[https://rdfshape.weso.es/endpointQuery|Endpoint query]] -- to execute queries against SPARQL Endpoints
  * [[http://jena.apache.org/tutorials/sparql.html|Apache Jena -- SPARQL]]
  * [[https://en.wikipedia.org/wiki/GeoSPARQL|GeoSPARQL]] -- standard for representing and querying geospatial data using RDF

DB2RDF (RDF and Relational Databases):
  * [[http://esw.w3.org/RdfAndSql|RDFandSQL]]
  * [[http://www.w3.org/wiki/ConverterToRdf#SQL|ConvertToRDF -- SQL]]