Semantic mashups
- Last verification: 2022-09-09
- Tools required for this lab:
- Google Colab or your own Python environment
Prepare yourself for the lab
- Datasets and mashups lecture (slides are on the main page)
Lab instructions
At the end of the lab, each group should email their third project to the teacher. It is the semantic mashup created during this lab; it will probably consist of a single Jupyter Notebook (to download it from Google Colab, click: File > Download > Download .ipynb).
During this lab, we want to create a semantic mashup, which is simply a combination of two or more knowledge graphs that together provide some new value.
Do not be scared – this will be your second mashup! You did the first one during the Libraries: owlready2 (Python) lab, when you combined data from Wikidata and DBpedia.
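As a reminder, the core trick looks roughly like this. Below is a minimal sketch, assuming the SPARQLWrapper library (pip install sparqlwrapper); the entity and properties are purely illustrative, not a required recipe:

    from SPARQLWrapper import SPARQLWrapper, JSON

    def ask(endpoint, query):
        sparql = SPARQLWrapper(endpoint, agent="semantic-mashup-lab")
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["results"]["bindings"]

    # 1. Wikidata: find the URI of an entity by its label.
    rows = ask("https://query.wikidata.org/sparql", """
        SELECT ?person WHERE {
          ?person rdfs:label "Douglas Adams"@en ;
                  wdt:P31 wd:Q5 .   # instance of: human
        } LIMIT 1
    """)
    wikidata_uri = rows[0]["person"]["value"]

    # 2. DBpedia: its owl:sameAs links point at Wikidata URIs,
    #    so the shared URI lets us hop between the two graphs.
    rows = ask("http://dbpedia.org/sparql", f"""
        SELECT ?abstract WHERE {{
          ?resource owl:sameAs <{wikidata_uri}> ;
                    dbo:abstract ?abstract .
          FILTER (lang(?abstract) = "en")
        }} LIMIT 1
    """)
    print(rows[0]["abstract"]["value"][:200])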
- Select two or more interesting datasets which will be the basis of the mashup
- You can choose any of the datasets mentioned in the lecture, in the Learn more! section or select the datasets yourself using LOD Cloud, DataHub or ProgrammableWeb.
- At least one dataset must be based on RDF.
- It must be possible to link the datasets together, e.g., using the same URIs (as in our first mashup).
- Explore the selected datasets – which data parts will work well together?
- Mashups are more interesting when you can interact with them by providing some input to be searched for, e.g., a date, a band name, or a country. Identify such an input for your mashup.
- You can assume that the data will always be correct: the name will belong to an existing country, the date will be correctly formatted, etc.
- Implement the whole interaction.
- You can start with the code from the Libraries: owlready2 (Python) lab.
- It may simply be a sequence of cells that perform the appropriate actions.
- Input can simply be typed as the value of one of the variables (there is no time to create a UI).
- Can the results be presented in any better form than plain text? Maybe a table / chart / points plotted on a map / image / links / simple HTML?
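To make the steps above concrete, here is a minimal end-to-end sketch of such an interaction, assuming SPARQLWrapper and pandas (both available in Colab); the query, class, and properties are illustrative choices, not requirements:

    from SPARQLWrapper import SPARQLWrapper, JSON
    import pandas as pd

    country = "Poland"  # the "input": just a variable, as suggested above

    sparql = SPARQLWrapper("http://dbpedia.org/sparql", agent="semantic-mashup-lab")
    # We assume the input is correct and is a single-word resource name.
    sparql.setQuery(f"""
        SELECT ?city ?population WHERE {{
          ?city a dbo:City ;
                dbo:country dbr:{country} ;
                dbo:populationTotal ?population .
        }}
        ORDER BY DESC(?population) LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]

    # Present the results as a table rather than plain text.
    df = pd.DataFrame(
        [(r["city"]["value"], int(r["population"]["value"])) for r in rows],
        columns=["city", "population"],
    )
    df  # the last expression in a Colab cell is rendered as an HTML table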
Learn more!
Mashup programming
- RDFLib: https://rdflib.readthedocs.io/
- owlready2: https://owlready2.readthedocs.io/
- Cooking with Python and KBpedia series – a great series of Jupyter notebooks that introduces RDFLib and owlready2 (and KBpedia)
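For datasets that have no public SPARQL endpoint (e.g., KBpedia or the Computer Science Ontology listed below), the usual RDFLib pattern is to download the dump and query it locally. A minimal sketch, with a placeholder file name:

    from rdflib import Graph

    g = Graph()
    g.parse("dataset.ttl")  # placeholder file; the format is guessed from the extension

    # RDFLib pre-binds common prefixes such as rdfs:
    for s, label in g.query("""
        SELECT ?s ?label WHERE {
          ?s rdfs:label ?label .
        } LIMIT 5
    """):
        print(s, label)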
Interesting datasets
- Starting points (for more details, see the lecture):
- The base for all mashups:
- Wikidata: https://query.wikidata.org/sparql
- DBpedia: http://dbpedia.org/sparql
- The base for geographical data:
- GeoNames: http://www.geonames.org/
- LinkedGeoData: http://linkedgeodata.org/
- Some interesting datasets in Linked Open Data cloud:
- Europeana – a European cultural heritage library (images, texts, sounds, videos, …); do you remember the Europeana URI from the Mona Lisa RDF? Interlinked with GeoNames, Wikidata, and DBpedia
- DBTune.org Musicbrainz – information about artists and CD tracks; linked with DBpedia
- EventKG – information about historical events (extracted from Wikidata, DBpedia and YAGO); useful for timeline generation; linked with DBpedia and Wikidata
- BabelNet – a multilingual dictionary; provides information about, e.g., term definition or narrower/broader terms; linked with DBpedia
- Other noteworthy datasets:
- KBpedia – a general encyclopedia, i.e., a description of “how the world works” (e.g., it uses OpenCyc terms and relations); interlinked with Wikipedia, Wikidata, DBpedia, and GeoNames; there is no SPARQL endpoint, but it can be downloaded and loaded into a triplestore
- OpenWeatherMap – weather information for specific coordinates (e.g., from GeoNames); not in RDF, but probably useful (used, e.g., in J.A.N.E.); there is a free API available (a small sketch is included at the end of this page)
- ArCo – a knowledge graph of the Italian cultural heritage (unfortunately, there are no such graphs for Poland)
- Computer Science Ontology – research areas, sub-topics, and related terms; there is no SPARQL endpoint, but it can be downloaded and loaded into a triplestore; used, e.g., for book recommendations in computer science (by Springer)
- OpenCitations – academic publications and their citations
- LinkedBrainData – brain-related data (incl. the brain atlases)
- TweetsKB – a large corpus of tweets annotated with emotions (onyx:hasEmotion) and linked with DBpedia (if a tweet mentions something available in DBpedia). There are no URIs (and no actual messages) in the corpus, but you can easily generate the real links and visit Twitter to see the tweets:

    BIND(URI(CONCAT("https://twitter.com/any/status/", ?tweet_id)) AS ?tweet_url)
- Carnegie Hall Data – data about all events from Carnegie Hall
- Other repositories:
- AgroPortal – a vocabulary and ontology repository for agronomy and related domains
- BioPortal – repository of biomedical ontologies
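Finally, a sketch of the OpenWeatherMap idea mentioned above: take the coordinates of a place (in a real mashup they would come from GeoNames or Wikidata) and ask the free current-weather API about them. You have to register to obtain an API key; OWM_KEY below is a placeholder:

    import requests

    # Coordinates of Warsaw; in a mashup these would come from a knowledge graph.
    lat, lon = 52.2297, 21.0122

    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"lat": lat, "lon": lon, "units": "metric", "appid": "OWM_KEY"},
    )
    weather = resp.json()
    print(weather["weather"][0]["description"], weather["main"]["temp"], "°C")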