courses:semint:lab_intro [IIS Wiki]

Last verification: 20220909
Tools required for this lab: Pens and paper

Some introduction/motivation:
- Why Does Google Say Jerusalem Is the Capital of Israel?
- Historians and detectives keep track of data with open source tool

1. Data in the Wikipedia [15 minutes]

Wikipedia contains a huge amount of information, so it can be used as a source for various summaries. Is it a convenient source of knowledge? Let's check it out!

Your task is to prepare a list of the 15 most populous countries in Europe based on Wikipedia. Do NOT use any other websites for this purpose. The ready-made lists available on Wikipedia are NOT reliable, as they often have outdated numbers!

2. Wikidata and DBpedia [10 minutes]

Processing Wikipedia data by hand was tedious, huh? Luckily, there are DBpedia and Wikidata! Both enable machine processing of Wikipedia data.

We don't need no introduction…
Simply go to the page: https://query.wikidata.org/. Click on Examples, select Countries sorted by population and click Execute (the big arrow button).
Whoa, we have an up-to-date list of countries sorted by population! How does it work?
Look at the wikidata/Poland page and figure out how the knowledge is stated there. The picture below may be helpful in understanding:
Actually, the whole thing is very simple: we have an entity (this is the page we are on), some property (on the gray background, in the left column) and some value (on the white background).
Have a look at other example queries for Wikidata at https://query.wikidata.org/. You don't need to understand them now, just see what the possibilities are - we'll come back to this in a few weeks.
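
The entity, property and value pattern described above can also be written down as plain data. This is only a minimal sketch, not how Wikidata stores anything internally: the `Statement` type and the `values_of` helper are hypothetical names made up for illustration, while the identifiers (Q36 for Poland, P36 for capital, P30 for continent) are the ones shown on the Wikidata pages.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One Wikidata-style statement (hypothetical helper type)."""
    entity: str
    prop: str
    value: str


# Human-readable labels for the identifiers used below.
LABELS = {
    "Q36": "Poland",
    "Q270": "Warsaw",
    "Q46": "Europe",
    "P36": "capital",
    "P30": "continent",
}

# Two statements from the Poland page: entity, property, value.
STATEMENTS = [
    Statement("Q36", "P36", "Q270"),
    Statement("Q36", "P30", "Q46"),
]


def values_of(entity: str, prop: str) -> list[str]:
    """Return the labels of all values stated for an entity and a property."""
    return [
        LABELS[s.value]
        for s in STATEMENTS
        if s.entity == entity and s.prop == prop
    ]


if __name__ == "__main__":
    for s in STATEMENTS:
        print(LABELS[s.entity], "|", LABELS[s.prop], "|", LABELS[s.value])
```

Each row printed by the script corresponds to one line of the Poland page: the page itself, a property from the left column, and a value.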

3. Linked Open Data [10 minutes]

Wikidata isn't the only one that stores data that's easy for machine processing…

Read about the Linked Data idea (and the original note by T. Berners-Lee, plus the 5 star system)
Analyze the clickable LOD diagram, choose 3 interesting datasets and in a few words describe them to your colleague.

4. FOAF [10 minutes]

You can easily create such data yourself!

Read about FOAF (the pre-Facebook social network!).
Create your FOAF file with: foaf-o-matic
Save your FOAF file.
[If you have the possibility] Publish your file so that it can be referenced by a URL. Then, visualize your FOAF file with FOAF.Vix. Simply pass the URL as the uri argument to FOAF.Vix, e.g.: http://foaf-visualizer.gnu.org.ua/?nocache=1&uri=http://krzysztof.kutt.pl/foaf.rdf
- We need the direct URL of this file. If you are hosting the file on Dropbox, change www.dropbox.com to dl.dropboxusercontent.com in the sharing link, e.g.:
```
https://www.dropbox.com/s/kc3g05y0k7t1mbw/foaf.rdf  # sharing link generated by Dropbox
https://dl.dropboxusercontent.com/s/kc3g05y0k7t1mbw/foaf.rdf  # direct URL for SPARQLer
```
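
The host swap shown above can be automated if you have several links to convert. The sketch below assumes the sharing link has exactly the form shown in the example; `dropbox_direct_url` is a hypothetical helper name, and links on any other host are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def dropbox_direct_url(sharing_url: str) -> str:
    """Swap the host of a Dropbox sharing link for the direct-download host."""
    parts = urlsplit(sharing_url)
    if parts.netloc != "www.dropbox.com":
        # Not a Dropbox sharing link: leave it unchanged.
        return sharing_url
    return urlunsplit(parts._replace(netloc="dl.dropboxusercontent.com"))


if __name__ == "__main__":
    print(dropbox_direct_url("https://www.dropbox.com/s/kc3g05y0k7t1mbw/foaf.rdf"))
```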

5. Image annotation [5 minutes]

Open Image Annotator
Enter the URL of an image you like
Select some regions of the picture and add descriptions for them
Generate a file using the “Show JSON-LD” button
Analyse the file. How is the information about the regions represented?

6. RDF model (and Mona Lisa) [5 minutes]

The RDF model is a directed graph built from Statements, a.k.a. triples
Each Statement consists of a subject, a predicate and an object
- The subject can be a URI or a blank node (a node without a URI)
- The predicate must be a URI
- The object can be a URI, a blank node or a literal
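
The position rules above can be expressed as a small check. This is a minimal sketch, not an RDF library: the classes `URI`, `BlankNode` and `Literal` and the function `is_valid_statement` are hypothetical names introduced only for illustration, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    """A node without a URI, identified only by a local label."""
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_statement(subject, predicate, obj) -> bool:
    """Check that each position of a triple holds an allowed kind of term."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


if __name__ == "__main__":
    bob = URI("http://example.org/bob#me")
    knows = URI("http://xmlns.com/foaf/0.1/knows")
    alice = URI("http://example.org/alice#me")
    print(is_valid_statement(bob, knows, alice))           # URI, URI, URI
    print(is_valid_statement(Literal("Bob"), knows, bob))  # literal as subject
```

The first call passes the check; the second fails it, because a literal is not allowed in the subject position.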

Let's consider a simple knowledge graph (taken from RDF 1.1 Primer):
It is very informal and vague… So we can make it more concrete using URIs for every element in the graph. Note that we are using existing vocabularies: FOAF (foaf:) and Dublin Core (dcterms:).
Every arrow now represents a simple RDF Statement (RDF triple).
Compare this to the knowledge stored in Wikidata that you looked at earlier - do you see similarities?
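
The prefixes foaf: and dcterms: are just abbreviations for long namespace URIs. The sketch below shows how a prefixed name expands to a full URI, using two arrows in the spirit of the graph above. The namespace URIs are the published FOAF and Dublin Core ones; the example.org resource URIs and the `expand` helper are assumptions made for illustration.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as foaf:knows to a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    # Already a full URI, or a prefix we do not know: return as-is.
    return name


# Two arrows written as (subject, predicate, object).
# The example.org URIs are placeholders, not real identifiers.
TRIPLES = [
    ("http://example.org/bob#me", "foaf:knows", "http://example.org/alice#me"),
    ("http://example.org/mona-lisa", "dcterms:creator", "http://example.org/leonardo"),
]

if __name__ == "__main__":
    for subject, predicate, obj in TRIPLES:
        print(expand(subject), expand(predicate), expand(obj))
```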

7. Modeling knowledge with RDF graphs [30 minutes]

RDF is a data model based on the principle of representing relational information as labeled directed graphs.

In this task you will represent a piece of knowledge using RDF graphs. First, select one of the topics (we will use this topic in subsequent labs):
1. The Bold and the Beautiful – you can use the The_Bold_and_the_Beautiful#Premise section on Wikipedia (or the Polish one)
2. Game of Thrones – you can use the A_Song_of_Ice_and_Fire#Plot_synopsis section on Wikipedia
3. Another complex story from a book/series/movie you like
Read the selected fragment and extract as much information as you can.
Draw a graph (yes, with a pen and paper) representing the relations you identified in the fragment. Of course, “there's more than one way to do it”.
1. Draw regular resources (i.e. representing persons, places etc.) as oval nodes. Draw datatype values (e.g. dates, numbers representing age etc.) as rectangular nodes.
2. You don't need to write URIs, simply identify the resources with names and surnames etc.
Keep your sketch in a safe place – we will use it in the next lab!
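
The lab itself only needs pen and paper, but if you later want a digital copy of the sketch, the drawing convention (ovals for resources, rectangles for datatype values) maps directly onto Graphviz DOT text. The following is a minimal sketch under that assumption: `to_dot` is a hypothetical helper, each relation carries a flag saying whether its object is a datatype value, and the sample data about Bob comes from the RDF 1.1 Primer example rather than from any of the stories.

```python
def to_dot(triples):
    """Render (subject, predicate, object, object_is_literal) tuples as DOT text."""
    lines = ["digraph story {"]
    declared = set()
    for subject, predicate, obj, obj_is_literal in triples:
        object_shape = "box" if obj_is_literal else "ellipse"
        # Declare each node once, with the shape required by the convention.
        for name, shape in ((subject, "ellipse"), (obj, object_shape)):
            if name not in declared:
                declared.add(name)
                lines.append(f'  "{name}" [shape={shape}];')
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)


# Resources are identified by plain names, as allowed in the task.
EXAMPLE = [
    ("Bob", "knows", "Alice", False),
    ("Bob", "born on", "1990-07-04", True),
]

if __name__ == "__main__":
    print(to_dot(EXAMPLE))
```

The printed text can be pasted into any Graphviz-compatible viewer to draw the graph.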