===== Introduction =====

  * Last verification: **20220909**
  * Tools required for this lab: **Pens and paper**

==== Prepare yourself for the lab ====

  * Some introduction/motivation:
    * [[http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html|Why Does Google Say Jerusalem Is the Capital of Israel?]]
    * [[https://opensource.com/life/15/11/segrada-open-source-semantic-graph-database|Historians and detectives keep track of data with open source tool]]

==== Lab instructions ====

=== 1. Data in Wikipedia [15 minutes] ===

Wikipedia contains a huge amount of information, so it can be used as a source for various summaries. Is it a **convenient** source of knowledge? Let's check it out!

  - Your task is to prepare **a list of the 15 most populous countries in Europe** based on [[https://www.wikipedia.org/|Wikipedia]]. Do NOT use any other websites for this purpose. The ready-made lists available on Wikipedia are NOT reliable, as they often contain outdated numbers!

=== 2. Wikidata and DBpedia [10 minutes] ===

Processing Wikipedia data was tedious, huh? Luckily, there are [[https://www.dbpedia.org/|DBpedia]] and [[https://www.wikidata.org/|Wikidata]]! Both make Wikipedia's knowledge available in a machine-processable form.

  - //We don't need no introduction...// \\ Simply go to [[https://query.wikidata.org/]], click ''Examples'', select ''Countries sorted by population'' and click ''Execute'' (the big arrow button).
  - Whoa, we have an up-to-date list of countries sorted by population! How does it work?
  - Look at the [[https://www.wikidata.org/wiki/Q36|Wikidata page for Poland]] and figure out how the knowledge is stated there. The picture below may help: \\ [[https://en.wikipedia.org/wiki/Wikidata|{{:courses:semint:wikidata_datamodel.png?direct|}}]]
  - Actually, the whole thing is very simple: we have an entity (the page we are on), a property (on the grey background, in the left column) and a value (on the white background).
  - Have a look at the other example queries at [[https://query.wikidata.org/]]. You don't need to understand them now, just see what the possibilities are -- we'll come back to this in a few weeks. (A small code sketch after this list shows one way such a query can be run programmatically.)
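So how does it work under the hood? The query service simply receives a query written in the **SPARQL** language over HTTP and sends back the matching data. Below is a //minimal sketch// (a simplified query, not the exact one from the ''Examples'' menu) of running such a query from Python with the SPARQLWrapper library; ''Q6256'' (country), ''P31'' (instance of) and ''P1082'' (population) are Wikidata identifiers. You don't need to understand the syntax yet -- just note that this is ordinary, machine-readable data access.

<code python>
# A minimal sketch -- assumes the SPARQLWrapper package (pip install sparqlwrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="semint-lab-example/0.1")  # any descriptive User-Agent; WDQS asks clients to identify themselves

# A simplified "countries sorted by population" query:
endpoint.setQuery("""
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?countryLabel ?population WHERE {
  ?country wdt:P31 wd:Q6256 ;         # ?country is an instance of (P31) "country" (Q6256)
           wdt:P1082 ?population ;    # ... and has a population value (P1082)
           rdfs:label ?countryLabel .
  FILTER(LANG(?countryLabel) = "en")
}
ORDER BY DESC(?population)
LIMIT 15
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["countryLabel"]["value"], row["population"]["value"])
</code>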
=== 3. Linked Open Data [10 minutes] ===

Wikidata isn't the only dataset that stores data in a way that is easy for machines to process...

  - Read about the [[wp>Linked_Data|Linked Data]] idea (and the [[http://www.w3.org/DesignIssues/LinkedData.html|original note by T. Berners-Lee, plus the 5-star system]]).
  - Analyze the [[http://lod-cloud.net/|clickable LOD diagram]], choose 3 interesting datasets and describe them in a few words to your colleague.

=== 4. FOAF [10 minutes] ===

You can easily create such data yourself!

  - Read about [[wp>FOAF_(ontology)|FOAF]] (the pre-Facebook social network!).
  - Create your FOAF file with [[http://www.ldodds.com/foaf/foaf-a-matic|FOAF-a-matic]].
  - Save your FOAF file.
  - [If you have the possibility] Publish your file so that it can be referenced with a URL. Then, visualize your FOAF file with [[http://foaf-visualizer.gnu.org.ua/|FOAF.Vix]]. Simply pass the URL as the ''uri'' argument to FOAF.Vix, e.g.: http://foaf-visualizer.gnu.org.ua/?nocache=1&uri=http://krzysztof.kutt.pl/foaf.rdf
    * We need the **direct URL** of the file. If you are hosting the file on Dropbox, change ''www.dropbox.com'' to ''dl.dropboxusercontent.com'' in the sharing link, e.g.:

<code>
https://www.dropbox.com/s/kc3g05y0k7t1mbw/foaf.rdf            # sharing link generated by Dropbox
https://dl.dropboxusercontent.com/s/kc3g05y0k7t1mbw/foaf.rdf  # direct URL (this is the one you need)
</code>

=== 5. Image annotation [5 minutes] ===

  - Open the [[http://www.kanzaki.com/works/2016/pub/image-annotator|Image Annotator]].
  - Enter the URL of some image you like.
  - Select some regions of the picture and add descriptions to them.
  - Generate the file using the "Show JSON-LD" button.
  - Analyse the file. How is the information about the regions represented?

=== 6. RDF model (and Mona Lisa) [5 minutes] ===

  * The RDF model is a directed graph built from //statements//, a.k.a. //triples//.
  * Each statement consists of a //subject//, a //predicate// and an //object//.
  * The subject can be a //URI// or a //blank node//.
  * The predicate is always a //URI//.
  * The object can be a //URI//, a //blank node// or a //literal//.

  - Let's consider a simple knowledge graph (//taken from the [[http://www.w3.org/TR/rdf11-primer/|RDF 1.1 Primer]]//): \\ {{rdf-primer-graph1.jpg?direct&550|}}
  - It is very informal and vague... So we can make it more concrete by using URIs for every element of the graph. Note that we are using existing vocabularies: [[http://www.foaf-project.org/|FOAF]] (''foaf:'') and [[http://dublincore.org/metadata-basics/|Dublin Core]] (''dcterms:''). \\ {{rdf-primer-graph4.jpg?direct&550|}}
  - Every arrow now represents a single RDF statement (an RDF triple).
  - Compare this to the knowledge stored in Wikidata that you looked at earlier -- do you see the similarities?

=== 7. Modeling knowledge with RDF graphs [30 minutes] ===

RDF is a data model based on the principle of representing relational information as labeled directed graphs.

  - In this task you will represent a piece of knowledge using RDF graphs. First, select one of the topics (we will use this topic in subsequent labs):
    - **The Bold and the Beautiful** -- you can use the [[wp>The_Bold_and_the_Beautiful#Premise]] section on Wikipedia (or [[http://pl.wikipedia.org/wiki/Moda_na_sukces#Historia_rodziny_Forrester.C3.B3w|the Polish one]]),
    - **Game of Thrones** -- you can use the [[wp>A_Song_of_Ice_and_Fire#Plot_synopsis]] section on Wikipedia,
    - //another complex story from a book/series/movie you like// :-)
  - Read the selected fragment and extract as much information as you can.
  - **Draw a graph** (yes, with a pen and paper) representing the relations you identified in the fragment. Of course, //"there's more than one way to do it"//.
    - Draw regular resources (i.e. those representing persons, places etc.) as oval nodes. Draw datatype values (e.g. dates, numbers representing age etc.) as rectangular nodes.
    - You don't need to write URIs, simply identify the resources by names and surnames etc.
  - Keep your sketch in a safe place -- we will use it in the next lab! :-) (The code sketch below this list shows how such a graph can later be written down as actual triples.)
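To see how such a pen-and-paper graph eventually becomes real triples, here is a //minimal, hypothetical// sketch that encodes a fragment of the Bob / Mona Lisa example from step 6 with the rdflib Python library. The ''ex:'' namespace and property names like ''ex:birthDate'' are made up for illustration only -- a real model would reuse terms from FOAF, Dublin Core, schema.org, Wikidata, etc.

<code python>
# A minimal sketch -- assumes rdflib >= 6 (pip install rdflib).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF, RDF, XSD

EX = Namespace("http://example.org/")   # hypothetical namespace, just for this sketch

g = Graph()
g.bind("foaf", FOAF)
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

# Each g.add(...) call adds one statement: (subject, predicate, object).
g.add((EX.bob, RDF.type, FOAF.Person))                                   # Bob is a person
g.add((EX.bob, FOAF.knows, EX.alice))                                    # Bob knows Alice
g.add((EX.bob, EX.birthDate, Literal("1990-07-04", datatype=XSD.date)))  # a literal as the object
g.add((EX.bob, EX.isInterestedIn, EX.monaLisa))                          # Bob is interested in the Mona Lisa
g.add((EX.monaLisa, DCTERMS.creator, EX.leonardoDaVinci))                # the Mona Lisa was created by Leonardo

print(g.serialize(format="turtle"))     # the same graph written down in Turtle syntax
</code>

Running it prints the graph serialized in Turtle (see the reading list below): every arrow of the drawing ends up as one subject-predicate-object triple.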
==== Learn more! ====

Reading:
  * [[https://github.com/JoshData/rdfabout/blob/gh-pages/intro-to-rdf.md|What is RDF and what is it good for?]]
  * [[http://www.w3.org/TR/turtle/|Turtle syntax for RDF]]
  * [[http://www.w3.org/TR/rdf11-concepts/|RDF Abstract Syntax]]
  * [[http://www.w3.org/2000/10/swap/Primer.html|Primer: Getting into RDF & Semantic Web using N3]]
  * RDFS enables simple reasoning: [[https://www.w3.org/TR/rdf11-mt/#patterns-of-rdfs-entailment-informative|Patterns of RDFS entailment]]

Common vocabularies:
  * [[http://www.w3.org/TR/skos-primer/|SKOS]]
  * [[http://www.dublincore.org/metadata-basics/|Dublin Core]]
  * [[http://xmlns.com/foaf/spec/|FOAF]]

Tools:
  * [[https://rdfshape.weso.es/|RDFShape]] -- RDF conversion, RDF/SPARQL/ShEx/SHACL playground
  * [[https://any23.apache.org/|Apache Any23 (Anything to Triples)]]
  * [[http://jena.apache.org/tutorials/rdf_api.html|Apache Jena]]
  * [[http://loki.re/wiki/docs:rdfeditor|RDF Editor]] developed at AGH UST (by Artur Smaroń, EIS 2015-2016)

Others:
  * [[http://prefix.cc/|prefix.cc - namespace lookup for RDF developers]]