====== WSHOP project topics -- winter 2025/2026 ======

  * {{:courses:xai:winner.png?30|}}: Possibility of extending it into a master's thesis
  * {{:courses:wshop:topics:fast.png?30|}}: Quick project
  * {{:courses:wshop:topics:peer.png?40|}}: Linked to an international scientific project
  * {{:courses:wshop:topics:chexrish.png?60|}}: Linked to a JU-internal scientific project

==== [FIXME] Template ====

  * **Student:** FIXME
  * **Namespace in the wiki:** [[..:projects:2026:FIXME:]]
  * **The goal of the project:** FIXME
  * **Technology:** FIXME
  * **Description:** FIXME
  * **Links:**
    * FIXME

==== [MZK] Affective games ====

  * **Student:** Bartłomiej Błoniarz, Michał Gniadek, Inka Sokołowska
  * **Namespace in the wiki:** [[..:projects:2026:FIXME:]]
  * **The goal of the project:** Analysis of player behavior based on affective events in the game.
  * **Technology:** Python, basics of psychology, experimental data
  * **Description:** Identification of characteristic points in the game and linking them to emotional reactions based on in-game actions, eye-movement data, player movements, and the psychological profile. The project works with real data and offers the opportunity to participate in the experiment from the researcher's perspective; further analysis is possible in the next semester, with potential use in a master's thesis.
  * **Links:**

==== [LVM] Automatic label generation with LLM + CIDOC-CRM ====

  * **Student:** Mateusz Dyszewski {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:**
  * **The goal of the project:** Implementing automatic label generation with LLM + CIDOC-CRM
  * **Technology:** RDF, Python, LLM
  * **Description:** The main goal is to use LLMs (or SLMs) to generate labels for events in CIDOC-CRM. Instances of the event class in CIDOC-CRM connect people, places, and properties, but when imported from other datasets they often lack labels. The aim is to generate descriptive labels based on linked entities. For example: given an event typed as matriculation, with institution Cracow Academy and participant Mikolaj Kopernik, the generated label could be “Copernicus matriculation at the Cracow Academy.”
  * **Links:**
    * [[https://cidoc-crm.org/|CIDOC-CRM]]

==== [LVM] Applying KG discovery algorithms to digital humanities KG ====

  * **Student:** {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:**
  * **The goal of the project:** Applying KG discovery algorithms to digital humanities KG
  * **Technology:** RDF, Python
  * **Description:** This project applies KG discovery algorithms to find interesting and non-obvious relations within digital humanities KGs, using the CIDOC-CRM ontology as a case study. It explores serendipitous discovery, path evaluation, and pattern identification. Graph-based models enable this because they facilitate algorithmic exploration of linked cultural heritage data.
  * **Links:**
    * [[https://www.sciencedirect.com/science/article/pii/S1570826824000386|Serendipitous knowledge discovery]]
    * [[https://aidanhogan.com/docs/woolnet_paths_knowledge_graphs.pdf|Woolnet: finding and evaluating paths in knowledge graphs]]

==== [LVM] Applying network analysis to digital humanities KG ====

  * **Student:** {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:**
  * **The goal of the project:** Applying network analysis to digital humanities KG
  * **Technology:** RDF, Python
  * **Description:** This project uses network analysis methods (degree, eigenvector, and PageRank centrality; community detection) to reveal patterns, central entities, and relationships in cultural heritage KGs based on CIDOC-CRM. The aim is to identify structural properties and key nodes to aid interpretation and support research within cultural heritage contexts.
  * **Links:**
    * [[https://cidoc-crm.org/|CIDOC-CRM]]
    * [[https://www.journals.uchicago.edu/doi/full/10.1086/705532|Network analysis in the humanities explained]]
    * [[https://journal.dhbenelux.org/journal/issues/002/article-6-birkholz/appendix-2-wechanged-german.pdf|Example network analysis]]
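For a flavour of the intended analysis, here is a minimal sketch that computes the centrality measures listed above on a toy person-event graph; it assumes ''networkx'' is available, and all entity and event names are made up for illustration.

<code python>
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy person-event graph in the spirit of CIDOC-CRM, where event nodes
# connect people and institutions (hypothetical data, for illustration only).
G = nx.Graph()
G.add_edges_from([
    ("Kopernik", "matriculation_1491"),
    ("matriculation_1491", "Cracow Academy"),
    ("Brudzewski", "lecture_1489"),
    ("lecture_1489", "Cracow Academy"),
    ("Kopernik", "lecture_1489"),
])

# The centrality measures named in the project description.
print(nx.degree_centrality(G))
print(nx.eigenvector_centrality(G))
print(nx.pagerank(G))

# One built-in community detection method.
print(list(greedy_modularity_communities(G)))
</code>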
==== [LVM] Implementing an automatic KG shortcut generator ====

  * **Student:** {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:**
  * **The goal of the project:** Implementing a tool for automatic KG shortcut generation
  * **Technology:** RDF, Python
  * **Description:** This project automates the creation of KG “shortcuts”, which are derived relations that simplify navigation, querying, and graph algorithm usage in complex ontologies like CIDOC-CRM. Shortcuts may omit certain reifications or events, making knowledge graphs easier to traverse and analyse. The approach should be generalizable beyond CIDOC-CRM.
  * **Links:**
    * [[https://cidoc-crm.org/|CIDOC-CRM]]
    * [[https://link.springer.com/chapter/10.1007/978-3-642-35233-1_22|Paths and shortcuts in knowledge graphs]]

==== [EAS] Visual analysis of digital poster collections with contextual description generation ====

  * **Student:** Wojciech Pałka, Viktoryia Sialila {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:** [[..:projects:2026:posters-visual-contextual-analysis:]]
  * **The goal of the project:** Implementing automatic visual feature extraction and contextual description generation for posters from digital collections (JBC, FBC)
  * **Technology:** Python, OpenCV, pandas, scikit-learn, LLM/GPT API, Streamlit
  * **Description:** The project builds a tool to analyse posters from the Jagiellonian Digital Library (JBC) and the Federation of Digital Libraries (FBC). It extracts visual features such as color scheme, presence of characters, and color mood. It then generates contextual descriptions based on visual data and metadata for historical and educational research. An interactive demo will allow users to explore posters and compare eras, styles, and themes.
  * **Links:**
    * https://jbc.bj.uj.edu.pl/dlibra/results?&action=AdvancedSearchAction&type=-3&p=0&qf1=Type%3Aplakat&val1=Subject%3A%22plakat%22&ipp=25
    * https://fbc.pionier.net.pl/results?action=AdvancedSearchAction&type=-3&p=0&val1=Subject:Plakaty

==== [EAS] Interactive storytelling with selected JBC collections (Documents of Social Life) ====

  * **Student:** Justyna Gargula {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:** [[..:projects:2026:interactive-storytelling-jbc-dsl:]]
  * **The goal of the project:** Creating a tool that generates thematic narratives based on JBC objects (Documents of Social Life)
  * **Technology:** Python, RDFLib, Neo4j (optional), LLM/GPT API, Streamlit
  * **Description:** This project integrates metadata about JBC’s social life documents into a knowledge graph, builds semantic links, and uses GPT to craft thematic, chronological, or stylistic narratives. The interface allows exploration of collections as timelines, maps, or continuous stories, aiding contextual discovery in historical and educational research.
  * **Links:**
    * https://jbc.bj.uj.edu.pl/dlibra/results?q=&action=SimpleSearchAction&type=-6&p=0&qf1=collections%3A188&qf2=collections%3A201&qf3=Subject%3Aspo%C5%82ecze%C5%84stwo&qf4=Subject%3Adruki%20ulotne%2020%20w.&qf5=Subject%3Adruki%20ulotne%2019%20w.&ipp=50
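A minimal sketch of the narrative-generation step described above, under the assumption that metadata has already been exported: the records, their field names, and the ''gpt-4o-mini'' model name are placeholders, and the ''openai'' Python package (v1 API) with a configured API key is assumed.

<code python>
from openai import OpenAI  # assumes the openai package and an API key are set up

# Hypothetical metadata records, as they might be exported from JBC (dLibra).
records = [
    {"title": "Odezwa do mieszkańców Krakowa", "date": "1918", "subject": "druki ulotne"},
    {"title": "Afisz teatralny", "date": "1923", "subject": "teatr"},
    {"title": "Ulotka wyborcza", "date": "1930", "subject": "wybory"},
]

# Chronological ordering is the simplest "timeline" narrative axis.
timeline = sorted(records, key=lambda r: r["date"])
bullets = "\n".join(f"- {r['date']}: {r['title']} ({r['subject']})" for r in timeline)

prompt = (
    "Write a short, factual narrative connecting the following documents of "
    "social life from the Jagiellonian Digital Library, in chronological order. "
    "Do not invent facts beyond the metadata.\n" + bullets
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
</code>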
==== [KKT] Polyvocal knowledge graphs ====

  * **Student:** Bartosz Mierzwa, Piotr Wójtowicz {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:** [[..:projects:2025:polyvocal:]]
  * **The goal of the project:** How to model and process various (sometimes even contradictory) opinions on the same topic in knowledge graphs
  * **Technology:** literature studies, prototype evaluation, RDF/Semantic Web
  * **Description:** Classically, knowledge bases strive to present one universal truth (the same holds for machine learning models). Methods for knowledge bases therefore usually revolve around concepts such as "consistency", "conflict resolution", and "lack of redundancy". But reality is different, and various parallel opinions exist on the same facts or artifacts. For example, a painting exhibited in the Louvre may be understood quite differently by a person educated in Western European culture and by a Japanese person: on the one hand, they may refer to different cultural symbols in their interpretations, and on the other hand, they may not understand something if the creator came from outside their culture and may need additional contextual information. In a sense, an analogous situation occurs in filter bubbles, where people "locked" in their bubble share a body of knowledge that may be specific only to them, and which one needs to know in order to understand the information they convey. The goal of the project is to study the available literature on such approaches, prepare a catalog of situations/problems in which such polyvocality can occur, and evaluate existing prototypes (if they exist) or prepare our own prototypes of such knowledge bases in the form of knowledge graphs.
  * **Links:**
    * [[https://link.springer.com/chapter/10.1007/978-3-030-77385-4_30|Erp & de Boer (2021). A Polyvocal and Contextualised Semantic Web]]
    * [[https://ebooks.iospress.nl/doi/10.3233/FAIA240713|Giunchiglia & Li (2024). Big-Thick Data Generation via Reference and Personal Context Unification]]
    * [[https://asistdl.onlinelibrary.wiley.com/doi/10.1002/pra2.665|Zhitomirsky-Geffet (2022). Turning Filter Bubbles into Bubblesphere with Multi-Viewpoint KOS and Diverse Similarity]]

==== [KKT] Metadata enrichment methods for cultural heritage ====

  * **Student:** Tomasz Pakuła, Szymon Pietrzak {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:** [[..:projects:2025:enrichment:]]
  * **The goal of the project:** Determine ways to automatically enrich metadata in the field of cultural heritage and which of these methods actually work
  * **Technology:** literature studies, prototype evaluation, RDF/Semantic Web
  * **Description:** The description of cultural heritage objects is very minimal and still relies on the same simple metadata as in card catalogs. Detailed metadata appeared only for selected collections and was created manually. But over the last few decades, the semantic web, machine learning, and other areas of AI have been developing. Has this made it possible to create metadata automatically? Yes. We have LLMs, which create descriptions or keywords; we have object detection methods (e.g., YOLO) that indicate which objects appear in an image; and we have handwriting recognition methods that can read manuscripts. Are there other methods? Are there tools ready for this? Do they work? Is this actually used in the field of cultural heritage? To what extent? Does it utilize knowledge graphs? This is to be determined within the project.
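One enrichment step mentioned above (object detection producing candidate subject keywords) could be prototyped roughly as follows; the sketch assumes the ''ultralytics'' package with a pretrained YOLOv8 checkpoint, and the input file name is hypothetical.

<code python>
from collections import Counter
from ultralytics import YOLO  # assumes the ultralytics package is installed

model = YOLO("yolov8n.pt")  # small pretrained COCO detector
results = model("scanned_postcard.jpg")  # hypothetical digitized image

# Turn detected class names into candidate metadata keywords,
# ranked by how often each class was detected in the image.
labels = [model.names[int(box.cls)] for box in results[0].boxes]
keywords = [name for name, _ in Counter(labels).most_common(5)]
print("Suggested subject keywords:", keywords)
</code>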
==== [KKT] Incunabula catalogue in RDF (cont.) ====

  * **Student:** Paweł Jasiński {{:courses:wshop:topics:chexrish.png?60|}} {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2025:incunabula:]]
  * **The goal of the project:** Create an incunabula owners catalogue and integrate it with the CHExRISH ontology
  * **Technology:** Python, RDF/Semantic Web
  * **Description:** The project consists of the following steps:
    - export the metadata from the Jagiellonian Library catalogue,
    - create a knowledge graph schema according to good practices in the domain (to be checked as part of the project),
    - transfer all exported metadata into such a knowledge graph,
    - validate with the domain experts,
    - repeat, if needed,
    - integrate the final graph with the CHExRISH ontology.

==== [KKT] Wikidata as a central point of the cultural heritage data cloud ====

  * **Student:** Hubert Musiał, Jakub Bednarz {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:chexrish.png?60|}}
  * **Namespace in the wiki:** [[..:projects:2025:chcloud:]]
  * **The goal of the project:** Validate the usability of Wikidata as a central point of the cultural heritage data cloud
  * **Technology:** RDF/Semantic Web, some programming (Python preferred)
  * **Description:** Wikidata has many links to external sources of information (URIs/foreign keys), but are these links correct? Can additional information be extracted from these sources, or do these pages not contain any data that can be processed automatically? The aim of this project is: (a) to check whether these links are actually valid, (b) to check whether/how and what data can be extracted from these external sources and whether this process can be automated, (c) to prepare a knowledge graph that combines this data into a single cultural heritage data cloud [for the purposes of the project, we will probably limit it to individuals associated with the University], and (d) to evaluate such a graph (including a comparison with a graph containing only data from Wikidata).

==== [SBK] Docker image with a Flask/FastAPI service for visual anomaly detection ====

  * **Student:** Jakub Chmura, Aleksandra Stępień
  * **Namespace in the wiki:** [[..:projects:2026:dockerpatchcore:]]
  * **The goal of the project:** Provide a Docker image with a REST API (based on Flask) as a service that could later be connected to a website for visual anomaly detection.
  * **Technology:** Docker, Flask, Python
  * **Description:** The code that implements anomaly detection will be provided in the form of a parametrized Python script. The project consists of designing the REST API endpoints and the architecture of the service (e.g., how datasets will be loaded and how concurrency will be implemented: Celery, a Redis server?), followed by the implementation of the proposed service along with the Docker configuration.
  * **Links:**
    * Anomaly detection script: [[https://github.com/PEER-EU/MVP1-PDT/tree/unsupervised-anomalydetection/python|Access only for the team]]
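A minimal sketch of what one such endpoint might look like; the route name, the response schema, and the ''run_detection()'' stub are assumptions, since the actual detection script is only available to the team.

<code python>
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_detection(path: str) -> float:
    """Stub standing in for the provided parametrized detection script."""
    return 0.0

@app.route("/detect", methods=["POST"])
def detect():
    # The client uploads an image as multipart/form-data under the key "image".
    uploaded = request.files.get("image")
    if uploaded is None:
        return jsonify({"error": "no image uploaded"}), 400
    uploaded.save("/tmp/input.png")
    return jsonify({"anomaly_score": run_detection("/tmp/input.png")})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
</code>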
==== [SBK] Docker image for custom Hugging Face models and the OpenAI API ====

  * **Student:** Maciej Wójcki, Jakub Kręcisz
  * **Namespace in the wiki:** [[..:projects:2026:dockeropenai:]]
  * **The goal of the project:** The goal of the project is to prepare and test a Docker image for a custom Hugging Face model that provides the OpenAI REST API
  * **Technology:** Docker, Python, Flask, FastAPI
  * **Description:** The goal is to use one of the ready-to-use Docker images and test it for hosting a custom Hugging Face model (including LLMs, multimodal models such as Phi-3.5, etc.)
  * **Links:**
    * Example: [[https://github.com/wisecubeai/openai-api?utm_source=chatgpt.com|Docker API example]]

==== [SBK] Counterfactual-based sampler for Optuna hyperparameter optimization ====

  * **Student:** Paweł Wacławik
  * **Namespace in the wiki:** [[https://wiki.iis.uj.edu.pl/mgr:mgr2026:xaihyperopt|xaihyperopt]]
  * **The goal of the project:** The goal of the project is to use counterfactual generation and surrogate model learning to implement a counterfactual-based sampler for hyperparameter optimization
  * **Technology:** Optuna, Python
  * **Description:**
  * **Links:**

==== [SBK] Visual counterfactuals -- improvement of the inpainting approach ====

  * **Student:** Kamil Kochańczyk, Jacek Gołębiowski
  * **Namespace in the wiki:** [[..:projects:2026:anomalycf:]]
  * **The goal of the project:** How to generate a visually correct counterfactual that is at the same time valid
  * **Technology:** Python, PyTorch, TensorFlow
  * **Description:** The work will be based on the MSc thesis of [[https://wiki.iis.uj.edu.pl/mgr:mgr2025:cfgenimg|Jakub Siwy]], who tested various inpainting methods for the generation of valid normal samples in an anomaly-detection task. In this project, the goal is to extend the future work of that thesis by performing more experiments on all samples from the provided datasets and to focus mainly on how newly generated samples can improve the anomaly detection process, i.e., to implement counterfactual-based augmentation of the dataset for more reliable anomaly detection.
  * **Links:**
    * [[https://wiki.iis.uj.edu.pl/_media/mgr:mgr2025:cfgenimg:mgrjsiwy.pdf|Master thesis]]

==== [SBK] RAG-based conversational agent for OpenML pipelines ====

  * **Student:** Błażej Torbus
  * **Namespace in the wiki:** [[..:projects:2026:openmlrag:]]
  * **The goal of the project:** The goal is to use RAG systems to embed knowledge about ML pipelines obtained from OpenML
  * **Technology:** Python, R2R
  * **Description:** The main idea is to build a system that encodes the knowledge about pipelines present in the OpenML repository and allows querying it to obtain the most promising set of pipelines/hyperparameters for a specified dataset/task. This could later be used to improve hyperparameter optimization and explainability by constructing the optimizer setup based on data characteristics, the task, and other constraints that are available online and can serve as expert/background knowledge.
  * **Links:**
    * [[https://r2r-docs.sciphi.ai/introduction|R2R server]]
    * [[https://www.openml.org/|OpenML]]
    * Can be partially based on already written scripts that scrape OpenML repositories: [[https://wiki.iis.uj.edu.pl/courses:wshop:projects:2023:openmlds|OpenMLDS]]
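A minimal sketch of the retrieval idea only, using TF-IDF in place of a full RAG stack such as R2R; the pipeline descriptions and the query below are made up for illustration.

<code python>
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-ins for textual pipeline descriptions harvested from OpenML.
pipelines = [
    "RandomForest, 500 trees, good on small tabular datasets with missing values",
    "XGBoost with early stopping, strong on medium tabular classification tasks",
    "Logistic regression with TF-IDF features, baseline for text classification",
]

# A dataset/task profile used as the retrieval query.
query = "binary classification, 2000 rows, 30 numeric features, some missing values"

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(pipelines + [query])
scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()

# Rank pipeline descriptions by similarity to the dataset description.
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.2f}  {pipelines[idx]}")
</code>

In a real setup the retrieved descriptions would then be passed to an LLM as context, which is the part R2R is meant to handle.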
==== [SBK] Survey on methods for ML model comparison (decision boundary) ====

  * **Student:** Szymon Fortuna, Tair Yerniyazov
  * **Namespace in the wiki:** [[..:projects:2026:decboundcomp:]]
  * **The goal of the project:** Research on different methods that can be used to describe/measure the differences in the decision boundary of a classifier.
  * **Technology:** Python
  * **Description:** Having two models trained on the same task, how can the differences between them be explained? This refers to the problem of capturing, measuring, and describing the Rashomon effect.
  * **Links:**
    * [[https://www.sciencedirect.com/science/article/pii/S1566253525003161|Rashomon effect analysis]]

==== [SBK] Graph neural networks and explainability ====

  * **Student:** Stanisław Mnich, Angelo Norelli
  * **Namespace in the wiki:** [[..:projects:2026:gnnexplanations:]]
  * **The goal of the project:** Combine logical neural networks with graph neural networks for better reasoning and explainability (healthcare domain)
  * **Technology:** GNN, Python, PyTorch
  * **Description:** Combining GNN insights with domain knowledge. The idea is to use the GNN model to identify key features or temporal patterns associated with abnormal breathing. These could then be integrated with domain expertise, similar to our previous work in which we combined Bayesian networks with a counterfactual generation mechanism to produce more plausible explanations. Ideally, this could be approached with LNN.
  * **Links:**
    * LNN: [[https://ibm.github.io/LNN/introduction.html|LNN]]
    * (Access will be granted only to the project team): [[https://github.com/sbobek/gnn-ts-ano|GNN-TS-ANO]]

==== [SBK] Rule-based explainability for tabular data is (mostly) bullshit ====

  * **Student:** Jan Zioło, Wojciech Szymański
  * **Namespace in the wiki:** [[..:projects:2026:xaitabullshit:]]
  * **The goal of the project:** Investigate how much value existing rule-based approaches give in comparison to glassbox models
  * **Technology:** Python
  * **Description:** We have already prepared a script in which we test 4 explainers on around 55 datasets. We would like to extend this to more explainers and more datasets and confirm our observation that complex rule-based approaches are not useful for tabular data, as a good old-fashioned decision tree can easily compete with them.
  * **Links:**
    * Benchmark notebook to extend: https://github.com/sbobek/lux/tree/lux-benchmark

==== [JKO] Emergence of environment-based communication between multiple reinforcement-learned agents ====

  * **Student:** {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2025:FIXME:]]
  * **The goal of the project:** Can multiple reinforcement-learned (RL) agents spontaneously invent a language?
  * **Technology:** Python
  * **Description:** This project continues the MSc thesis by Łukasz Dobrzycki (code available). A simple, cooperative game with imperfect information has been designed for two agents. The game is designed in such a way that the goal can be quickly achieved by one of the agents communicating solutions to the other agent. The communication scheme, however, is something that the agents have to invent themselves by means of deep RL.
  * **Links:**
    * [[https://doi.org/10.48550/arXiv.2306.11336|Abdelaziz, M.K., Elbamby, M.S., Samarakoon, S., Bennis, M., 2024. Cooperative Multi-Agent Learning for Navigation via Structured State Abstraction.]]
    * [[https://www.marl-book.com/download/marl-book.pdf|Multi-Agent Reinforcement Learning: Foundations and Modern Approaches]]
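As a toy illustration of the underlying idea (not the project's actual game or method), the sketch below uses plain tabular Q-learning in a referential game: a speaker sees the target and emits a symbol, a listener sees only the symbol and must act, and a shared code can emerge purely from the reward signal. The game size and hyperparameters are arbitrary.

<code python>
import numpy as np

rng = np.random.default_rng(0)
N = 4                          # number of targets, messages, and actions
q_speaker = np.zeros((N, N))   # Q[target, message]
q_listener = np.zeros((N, N))  # Q[message, action]
eps, lr = 0.1, 0.5

for step in range(20000):
    target = rng.integers(N)
    # Epsilon-greedy choices for both agents.
    msg = rng.integers(N) if rng.random() < eps else int(q_speaker[target].argmax())
    act = rng.integers(N) if rng.random() < eps else int(q_listener[msg].argmax())
    reward = 1.0 if act == target else 0.0
    # One-step (bandit-style) updates; no bootstrapping needed in this game.
    q_speaker[target, msg] += lr * (reward - q_speaker[target, msg])
    q_listener[msg, act] += lr * (reward - q_listener[msg, act])

# The emergent protocol: what the speaker says per target, and how the
# listener decodes it (convergence to a perfect code is not guaranteed).
protocol = {t: int(q_speaker[t].argmax()) for t in range(N)}
decoded = {m: int(q_listener[m].argmax()) for m in set(protocol.values())}
print("speaker code:", protocol)
print("listener decoding:", decoded)
</code>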
==== [JKO] Comparison of explainability tools for NLP ====

  * **Student:** Julia Zezula {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2025:expnlp:]]
  * **The goal of the project:** Which explanations work best for text classification?
  * **Technology:** Python, spaCy, SHAP, etc.
  * **Description:** Exploration of the available explainability methods, designing tests, and assessing their reliability.
  * **Links:**
    * [[https://doi.org/10.2200/S01118ED1V01Y202107HLT051|Søgaard, A., 2021. Explainable Natural Language Processing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, San Rafael.]]
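As an example of one candidate method to be compared, here is a minimal SHAP sketch on a plain scikit-learn text classifier; the tiny training corpus is a stand-in, and a recent ''shap'' version with the text masker is assumed.

<code python>
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in training data (label 1 = positive).
texts = ["great acting and a moving plot", "dull, predictable and far too long",
         "a superb, touching film", "boring script and weak dialogue"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(texts, labels)

# The text masker splits the input into tokens whose contributions SHAP
# estimates by masking them out; the regex is used as the split pattern.
masker = shap.maskers.Text(r"\W+")
explainer = shap.Explainer(lambda x: pipe.predict_proba(x)[:, 1], masker)
values = explainer(["the plot was dull but the acting was superb"])

# Per-token contributions to the positive-class probability.
print(list(zip(values.data[0], values.values[0])))
</code>

Running several such explainers (SHAP, LIME, attention-based methods, etc.) on the same classifier and test set is one way to design the reliability comparison the project asks for.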