===== Introduction =====

  * Last verification: **20220909**
  * Tools required for this lab: **Pens and paper**

==== Prepare yourself for the lab ====

  * Some introduction/motivation:
    * [[http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html|Why Does Google Say Jerusalem Is the Capital of Israel?]]
    * [[https://opensource.com/life/15/11/segrada-open-source-semantic-graph-database|Historians and detectives keep track of data with open source tool]]

==== Lab instructions ====

=== 1. Data in Wikipedia [15 minutes] ===

Wikipedia contains a huge amount of information, so it can be used as a source for various summaries. Is it a **convenient** source of knowledge? Let's check it out!

  - Your task is to prepare **a list of the 15 most populous countries in Europe** based on [[https://www.wikipedia.org/|Wikipedia]]. Do NOT use any other websites for this purpose. The ready-made lists available on Wikipedia are NOT reliable, as they often contain outdated numbers!

=== 2. Wikidata and DBpedia [10 minutes] ===

Processing Wikipedia data was tedious, huh? Luckily, there are [[https://www.dbpedia.org/|DBpedia]] and [[https://www.wikidata.org/|Wikidata]]! Both make Wikipedia's knowledge available in a machine-processable form.

  - //We don't need no introduction...// \\ Simply go to [[https://query.wikidata.org/]], click ''Examples'', select ''Countries sorted by population'' and click ''Execute'' (the big arrow button).
  - Whoa, we have an up-to-date list of countries sorted by population! How does it work?
  - Look at the [[https://www.wikidata.org/wiki/Q36|Wikidata page for Poland]] and figure out how the knowledge is stated there. The picture below may help: \\ [[https://en.wikipedia.org/wiki/Wikidata|{{:courses:semint:wikidata_datamodel.png?direct|}}]]
  - Actually, the whole thing is very simple: we have an entity (the page we are on), a property (on the grey background, in the left column) and a value (on the white background).
  - Have a look at the other example queries at [[https://query.wikidata.org/]]. You don't need to understand them now, just see what the possibilities are -- we'll come back to this in a few weeks. (A small code sketch after this list shows one way such a query can be run programmatically.)
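So how does it work under the hood? The query service simply receives a query written in the **SPARQL** language over HTTP and sends back the matching data. Below is a //minimal sketch// (a simplified query, not the exact one from the ''Examples'' menu) of running such a query from Python with the SPARQLWrapper library; ''Q6256'' (country), ''P31'' (instance of) and ''P1082'' (population) are Wikidata identifiers. You don't need to understand the syntax yet -- just note that this is ordinary, machine-readable data access.

<code python>
# A minimal sketch -- assumes the SPARQLWrapper package (pip install sparqlwrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="semint-lab-example/0.1")  # any descriptive User-Agent; WDQS asks clients to identify themselves

# A simplified "countries sorted by population" query:
endpoint.setQuery("""
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?countryLabel ?population WHERE {
  ?country wdt:P31 wd:Q6256 ;         # ?country is an instance of (P31) "country" (Q6256)
           wdt:P1082 ?population ;    # ... and has a population value (P1082)
           rdfs:label ?countryLabel .
  FILTER(LANG(?countryLabel) = "en")
}
ORDER BY DESC(?population)
LIMIT 15
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["countryLabel"]["value"], row["population"]["value"])
</code>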
=== 3. Linked Open Data [10 minutes] ===

Wikidata isn't the only dataset that stores data in a way that is easy for machines to process...

  - Read about the [[wp>Linked_Data|Linked Data]] idea (and the [[http://www.w3.org/DesignIssues/LinkedData.html|original note by T. Berners-Lee, plus the 5-star system]]).
  - Analyze the [[http://lod-cloud.net/|clickable LOD diagram]], choose 3 interesting datasets and describe them in a few words to your colleague.

=== 4. FOAF [10 minutes] ===

You can easily create such data yourself!

  - Read about [[wp>FOAF_(ontology)|FOAF]] (the pre-Facebook social network!).
  - Create your FOAF file with [[http://www.ldodds.com/foaf/foaf-a-matic|FOAF-a-matic]].
  - Save your FOAF file.
  - [If you have the possibility] Publish your file so that it can be referenced with a URL. Then, visualize your FOAF file with [[http://foaf-visualizer.gnu.org.ua/|FOAF.Vix]]. Simply pass the URL as the ''uri'' argument to FOAF.Vix, e.g.: http://foaf-visualizer.gnu.org.ua/?nocache=1&uri=http://krzysztof.kutt.pl/foaf.rdf
    * We need the **direct URL** of the file. If you are hosting the file on Dropbox, change ''www.dropbox.com'' to ''dl.dropboxusercontent.com'' in the sharing link, e.g.:

<code>
https://www.dropbox.com/s/kc3g05y0k7t1mbw/foaf.rdf            # sharing link generated by Dropbox
https://dl.dropboxusercontent.com/s/kc3g05y0k7t1mbw/foaf.rdf  # direct URL (this is the one you need)
</code>

=== 5. Image annotation [5 minutes] ===

  - Open the [[http://www.kanzaki.com/works/2016/pub/image-annotator|Image Annotator]].
  - Enter the URL of some image you like.
  - Select some regions of the picture and add descriptions to them.
  - Generate the file using the "Show JSON-LD" button.
  - Analyse the file. How is the information about the regions represented?

=== 6. RDF model (and Mona Lisa) [5 minutes] ===

  * The RDF model is a directed graph built from //statements//, a.k.a. //triples//.
  * Each statement consists of a //subject//, a //predicate// and an //object//.
  * The subject can be a //URI// or a //blank node//.
  * The predicate is always a //URI//.
  * The object can be a //URI//, a //blank node// or a //literal//.

  - Let's consider a simple knowledge graph (//taken from the [[http://www.w3.org/TR/rdf11-primer/|RDF 1.1 Primer]]//): \\ {{rdf-primer-graph1.jpg?direct&550|}}
  - It is very informal and vague... So we can make it more concrete by using URIs for every element of the graph. Note that we are using existing vocabularies: [[http://www.foaf-project.org/|FOAF]] (''foaf:'') and [[http://dublincore.org/metadata-basics/|Dublin Core]] (''dcterms:''). \\ {{rdf-primer-graph4.jpg?direct&550|}}
  - Every arrow now represents a single RDF statement (an RDF triple).
  - Compare this to the knowledge stored in Wikidata that you looked at earlier -- do you see the similarities?

=== 7. Modeling knowledge with RDF graphs [30 minutes] ===

RDF is a data model based on the principle of representing relational information as labeled directed graphs.

  - In this task you will represent a piece of knowledge using RDF graphs. First, select one of the topics (we will use this topic in subsequent labs):
    - **The Bold and the Beautiful** -- you can use the [[wp>The_Bold_and_the_Beautiful#Premise]] section on Wikipedia (or [[http://pl.wikipedia.org/wiki/Moda_na_sukces#Historia_rodziny_Forrester.C3.B3w|the Polish one]]),
    - **Game of Thrones** -- you can use the [[wp>A_Song_of_Ice_and_Fire#Plot_synopsis]] section on Wikipedia,
    - //another complex story from a book/series/movie you like// :-)
  - Read the selected fragment and extract as much information as you can.
  - **Draw a graph** (yes, with a pen and paper) representing the relations you identified in the fragment. Of course, //"there's more than one way to do it"//.
    - Draw regular resources (i.e. those representing persons, places etc.) as oval nodes. Draw datatype values (e.g. dates, numbers representing age etc.) as rectangular nodes.
    - You don't need to write URIs, simply identify the resources by names and surnames etc.
  - Keep your sketch in a safe place -- we will use it in the next lab! :-) (The code sketch below this list shows how such a graph can later be written down as actual triples.)
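To see how such a pen-and-paper graph eventually becomes real triples, here is a //minimal, hypothetical// sketch that encodes a fragment of the Bob / Mona Lisa example from step 6 with the rdflib Python library. The ''ex:'' namespace and property names like ''ex:birthDate'' are made up for illustration only -- a real model would reuse terms from FOAF, Dublin Core, schema.org, Wikidata, etc.

<code python>
# A minimal sketch -- assumes rdflib >= 6 (pip install rdflib).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF, RDF, XSD

EX = Namespace("http://example.org/")   # hypothetical namespace, just for this sketch

g = Graph()
g.bind("foaf", FOAF)
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

# Each g.add(...) call adds one statement: (subject, predicate, object).
g.add((EX.bob, RDF.type, FOAF.Person))                                   # Bob is a person
g.add((EX.bob, FOAF.knows, EX.alice))                                    # Bob knows Alice
g.add((EX.bob, EX.birthDate, Literal("1990-07-04", datatype=XSD.date)))  # a literal as the object
g.add((EX.bob, EX.isInterestedIn, EX.monaLisa))                          # Bob is interested in the Mona Lisa
g.add((EX.monaLisa, DCTERMS.creator, EX.leonardoDaVinci))                # the Mona Lisa was created by Leonardo

print(g.serialize(format="turtle"))     # the same graph written down in Turtle syntax
</code>

Running it prints the graph serialized in Turtle (see the reading list below): every arrow of the drawing ends up as one subject-predicate-object triple.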
==== Learn more! ====

Reading:
  * [[https://github.com/JoshData/rdfabout/blob/gh-pages/intro-to-rdf.md|What is RDF and what is it good for?]]
  * [[http://www.w3.org/TR/turtle/|Turtle syntax for RDF]]
  * [[http://www.w3.org/TR/rdf11-concepts/|RDF Abstract Syntax]]
  * [[http://www.w3.org/2000/10/swap/Primer.html|Primer: Getting into RDF & Semantic Web using N3]]
  * RDFS enables simple reasoning: [[https://www.w3.org/TR/rdf11-mt/#patterns-of-rdfs-entailment-informative|Patterns of RDFS entailment]]

Common vocabularies:
  * [[http://www.w3.org/TR/skos-primer/|SKOS]]
  * [[http://www.dublincore.org/metadata-basics/|Dublin Core]]
  * [[http://xmlns.com/foaf/spec/|FOAF]]

Tools:
  * [[https://rdfshape.weso.es/|RDFShape]] -- RDF conversion, RDF/SPARQL/ShEx/SHACL playground
  * [[https://any23.apache.org/|Apache Any23 (Anything to Triples)]]
  * [[http://jena.apache.org/tutorials/rdf_api.html|Apache Jena]]
  * [[http://loki.re/wiki/docs:rdfeditor|RDF Editor]] developed at AGH UST (by Artur Smaroń, EIS 2015-2016)

Others:
  * [[http://prefix.cc/|prefix.cc - namespace lookup for RDF developers]]