WSHOP project topics -- spring 2025/2026

: Possibility of extending it to a master's thesis
: Quick project
: Linked to international scientific project
: Linked to JU-internal scientific project

Student:
Namespace in the wiki: FIXME
The goal of the project:
Technology:
Description:
Links:

Student:
Namespace in the wiki: FIXME
The goal of the project: How to model and process various (sometimes even contradictory) opinions on the same topic in knowledge graphs
Technology: literature studies, prototypes evaluation, RDF/Semantic Web
Description: Classically, knowledge bases strive to present one universal truth (the same holds for machine learning models). Methods for knowledge bases therefore usually revolve around concepts such as “consistency”, “conflict resolution”, and “lack of redundancy”. Reality is different: there are often several parallel opinions on the same facts or artifacts. For example, a painting exhibited in the Louvre may be understood quite differently by a person educated in Western European culture and by a Japanese person. On the one hand, they may refer to different cultural symbols in their interpretations; on the other hand, they may not understand something if the creator came from outside their culture, and may need additional contextual information. An analogous situation occurs in filter bubbles, where people “locked” in their bubble share a body of knowledge that may be specific only to them and that one needs to know in order to understand the information they are conveying. The goal of the project is to study the available literature on such approaches, prepare a catalog of situations/problems in which such polyvocality can occur, and evaluate existing prototypes (if any) or prepare our own prototypes of such knowledge bases in the form of knowledge graphs.
Links:

Student: Cezary Zięba, Igor Tyszer
Namespace in the wiki: FIXME
The goal of the project: Create date unification module
Technology: RDF, Python is preferred
Description: Dates in source documents or databases in the area of cultural heritage can take different forms: July 1, 1818, 05.03.1783, “Wednesday after St. Martin” of the year 1654. In the case of older documents, calendar reforms are additionally involved. The objectives of the project are twofold: (1) to determine whether there are ready-made tools / standards developed for this purpose, (2) to prepare (either from scratch or based on the solutions found) a tool for date unification and evaluate it on the dataset provided by the project supervisor.
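To illustrate the kind of processing involved, here is a minimal sketch (not the project's tool) that parses the two simpler example forms and converts a Julian-calendar date to the Gregorian calendar via the Julian Day Number. The function names, the format list, and the day-first reading of 05.03.1783 are assumptions made for this example; feast-relative dates such as “Wednesday after St. Martin” are deliberately left unresolved.

```python
from datetime import date, datetime

# Candidate input formats; illustrative only, not an exhaustive list.
# English month names are assumed (default C locale).
FORMATS = ("%B %d, %Y", "%d.%m.%Y", "%Y-%m-%d")


def parse_date(text):
    """Return a datetime.date for the first matching format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # e.g. feast-relative dates need a dedicated resolver


def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to the (proleptic) Gregorian calendar."""
    # Julian calendar date -> Julian Day Number (JDN).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

    # JDN -> Gregorian calendar date.
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    return date(100 * b + d - 4800 + m // 10,
                m + 3 - 12 * (m // 10),
                e - (153 * m + 2) // 5 + 1)


if __name__ == "__main__":
    for raw in ("July 1, 1818", "05.03.1783", "Wednesday after St. Martin"):
        parsed = parse_date(raw)
        print(raw, "->", parsed.isoformat() if parsed else "unresolved")
    print(julian_to_gregorian(1582, 10, 4).isoformat())
```

Whether a given source date is Julian or Gregorian depends on the place and period of the document, so a real tool would need that information as an extra input.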

Student: Jan Zoń
Namespace in the wiki: chenames
The goal of the project: Create an algorithm that for a given string – a name of a person – and a given collection of strings – names of persons – finds a given number of members of the latter that exhibit the greatest resemblance to the former
Technology: Python, Keras/PyTorch, LSTM, phonemizer (?)
Description: A task like this can be approached using Levenshtein distance as a measure of resemblance. However, the goal is to develop a more sophisticated solution, either one that accounts for the natural tendencies of languages to make specific substitutions, omissions, and extensions, or one that uses more advanced methods such as LSTM networks. An important part of the task will be to review the state-of-the-art methods.
We will evaluate the solution on real-world cases of entity matching across different databases as part of the CHExRISH Flagship Project.
Links:
- Levenshtein distance
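As a reference point, the Levenshtein baseline mentioned in the description can be sketched as follows. This is only the simple starting point that the project is meant to improve on; the function names and the case-folding step are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def most_similar(query, candidates, k):
    """Return the k candidates with the smallest edit distance to query."""
    q = query.casefold()
    return sorted(candidates, key=lambda c: levenshtein(q, c.casefold()))[:k]


if __name__ == "__main__":
    names = ["Johann Kowalsky", "Jan Kowalsky", "Anna Nowak"]
    print(most_similar("Jan Kowalski", names, 2))
```

Plain edit distance treats every substitution as equally costly, which is exactly the limitation the project aims to address with language-aware or learned similarity measures.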

Student: Maciej Szymański, Dominika Głowacka
Namespace in the wiki: FIXME
The goal of the project: Create an incunabula owners catalogue and integrate it with CHExRISH ontology
Technology: API programming, RDF/Semantic Web
Description: (1) export the metadata from the Jagiellonian Library catalogue, (2) create a knowledge graph schema according to good practices in the domain (to check as a part of the project), (3) transfer all exported metadata into such a knowledge graph, (4) validate with the domain experts, (5) repeat, if needed, (6) integrate the final graph with the CHExRISH ontology

Student: Hubert Musiał, Przemysław Zagraniczny
Namespace in the wiki: ladies
The goal of the project: Create the knowledge base for the exhibition from JU Museum
Technology: RDF/Semantic Web
Description: (1) export the data and metadata from the Jagiellonian University Museum, (2) create a knowledge graph schema according to good practices in the domain (to be checked as part of the project), keeping in mind interoperability with the CHExRISH ontology, (3) transfer all exported metadata into such a knowledge graph, (4) design a simple UI (probably built with existing tools, like Omeka S or Sante) that shows the data (artifacts), the metadata (in the knowledge graph), and probably some free text from the exhibition catalogue, (5) validate with the domain experts, (6) repeat, if needed
Links:
- Info about exhibition

Student: Jan Zioło
Namespace in the wiki: FIXME
The goal of the project: To develop a prototype of AI-based methods/tools for reconstructing book collections from source documents
Technology: Any useful, but Python is preferred
Description: The method is described in detail in the PhD dissertation, which will be shared with the project team (along with access to the author of the dissertation, who will clarify all doubts and share all data). As part of the project, it is necessary to consider which steps can currently be automated and with which AI methods/techniques, and then to conduct a pilot implementation that repeats the steps performed manually so far (based on the examples in the aforementioned dissertation).

Student:
Namespace in the wiki: FIXME
The goal of the project: Plan, perform and evaluate selected scenarios of social network analysis in the cultural heritage domain
Technology: Any, but Python is preferred
Description: During the project, we will consider what interesting and non-trivial insights can be drawn using social network analysis methods from graphs describing cultural heritage. The project will consist of both conceptual work (literature review, brainstorming of ideas) and implementation work (prototype analyses for selected scenarios). The project will be carried out in cooperation with CHExRISH project team members and a foreign expert.
Links:
- Social Network Analysis 101 (we want to go far beyond these basic methods)

Student: Kamil Mróz, Paweł Gębala
Namespace in the wiki: FIXME
The goal of the project: Be part of the team running the BIRAFFE3 experiment
Technology: Python, data science
Description: The project has three phases. During the first phase, you will take part in final fixes during the pilot study (March-April 2025). Then, in the second part, you will help with conducting the actual experiment (April-June 2025; scope of work to be determined). Finally, you will perform some preliminary analyses on the actual data collected in the experiment (May-June 2025; scope of work to be determined). If the work is done with appropriate involvement, you could also become a co-author of a publication on BIRAFFE3 in Nature Scientific Data.
Links:
- BIRAFFE general description
- BIRAFFE2 in Nature Scientific Data – at the end of the day, we want something similar for the BIRAFFE3 experiment

Student: Piotr Bednarski, Jakub Kozak
Namespace in the wiki: FIXME
The goal of the project: Can multiple reinforcement-learned (RL) agents spontaneously invent a language?
Technology: Python
Description: This project continues the MSc. thesis by Łukasz Dobrzycki (code available). A simple, cooperative game with imperfect information has been designed for two agents. The game is designed in such a way that the goal can be quickly achieved by one of the agents communicating solutions to the other agent. The communication scheme, however, is something that the agents have to invent themselves by means of deep RL.
Links:
- Abdelaziz, M.K., Elbamby, M.S., Samarakoon, S., Bennis, M., 2024. Cooperative Multi-Agent Learning for Navigation via Structured State Abstraction.
- Multi-Agent Reinforcement Learning: Foundations and Modern Approaches

[JKO] Time-series analysis applied to text classification

Student: Szymon Fortuna, Hubert Pamuła, Tair Yerniyazov
Namespace in the wiki: FIXME
The goal of the project: Quantify long-range correlations in texts for classification
Technology: Python, R, Matlab, PyTorch
Description: This project extends the MSc. thesis by Małgorzata Gwinner (data and some tsai code available). We use a couple of methods to turn text (a symbolic series) into a numeric/vectorial time series (TS) and then apply either TS analysis to generate (possibly thousands of) features used as input to stand-alone classifiers, or use native time-series classifiers to do authorial/stylistic classification (including detection of AI-generated texts). The optional next step is to look at explanations of these classifiers.
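A minimal sketch of the general idea, assuming one of the simplest possible mappings: each sentence becomes its word count, and the autocorrelation of the resulting series at a given lag becomes one feature. The mapping, the function names, and the choice of autocorrelation are assumptions made for illustration, not the methods used in the thesis.

```python
import re


def sentence_length_series(text):
    """Map a text to a numeric series: the number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no correlation structure
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    text = "One two. Three four five! Six? Seven eight nine ten."
    series = sentence_length_series(text)
    print(series, [round(autocorrelation(series, k), 3) for k in (1, 2)])
```

Computing such values for many lags and several text-to-series mappings is one way to arrive at the large feature sets mentioned above.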
Links:

Student: Julia Zezula
Namespace in the wiki: expnlp
The goal of the project: Which explanations work best for text classification?
Technology: Python, spaCy, SHAP etc.
Description: Exploration of the available explainability methods, designing tests and assessing their reliability. FIXME
Links:
- Søgaard, A., 2021. Explainable natural language processing, Synthesis lectures on human language technologies. Morgan & Claypool Publishers, San Rafael.

[JKO] Challenge submission: AI-generated text detection

Student: Mateusz Matias, Tymoteusz Boba
Namespace in the wiki: VKchallenge
The goal of the project: Tune and prepare existing code for submission in an open challenge
Technology: Python, spaCy, LightGBM, scikit-learn
Description: “Subtask 1: Given a (potentially obfuscated) text, decide whether it was written by a human or an AI. Subtask 2: Given a document collaboratively authored by human and AI, classify the extent to which the model assisted. … Participants will submit their systems as Docker images through the Tira platform. It is not expected that submitted systems are actually trained on Tira, but they must be standalone and runnable on the platform without requiring contact to the outside world (evaluation runs will be sandboxed). The submitted software must be executable inside the container via a command line call. … Important dates: May 23, 2025.”
Links:
- PAN shared task: Voight-Kampff Generative AI Detection 2025

Student:
Namespace in the wiki: openmlds
The goal of the project: Prepare a script that will build a meta-learning dataset out of OpenML logs
Technology: Python, OpenML API
Description: The main goal of the project is to create a script that will fetch all of the runs/pipelines and datasets from the OpenML platform and create a dataset out of them. The challenge is to transform pipeline definitions, which are code snippets, into logical components of a machine-learning pipeline (including deep neural networks). Such a dataset will serve as a learn-to-learn dataset for meta-learning solutions.
Links:
- OpenML

Student:
Namespace in the wiki: Jasiński Paweł, Kręcisz Jakub, Pakuła Tomasz
The goal of the project: Prepare Google Colab Notebook demonstrating and comparing explanations for generative models
Technology: Python
Description: The main goal is to start from Diffusers-Interpret and test its capabilities on various datasets and diffusers. Prepare a Notebook in a tutorial-like style, with code that will generate the explanations for selected models and datasets, and comment on the results.
Links:
- Diffusers-Interpret to generate explanations for generative AI

Student:
Namespace in the wiki: Baranowicz Bartosz, Dyszewski Mateusz, Gołębiowski Jacek, Kochańczyk Kamil, Szymański Wojciech
The goal of the project: Investigate possibilities of counterfactual explanations for images in highly imbalanced datasets
Technology: Python
Description: The main goal is to use the MVTec dataset and the PatchCore anomaly detection algorithm, and to provide counterfactual explanations for detected anomalies.
Links:
- MVTec
- anomalib

Student:
Namespace in the wiki: Wacławik Paweł
The goal of the project: Creating explainable hyperparameter optimization for the process-level optimization task
Technology: Python, Keras/PyTorch, SHAP
Description: AutoML hyperparameter optimization is the process of automatically tuning the hyperparameters of machine learning models to improve performance without manual intervention. It leverages techniques like grid search, random search, and Bayesian optimization to efficiently explore the hyperparameter space. By automating this process, AutoML reduces the time and expertise needed to find optimal model configurations, making machine learning more accessible and effective. In this project we aim to test existing solutions. Our goal is to treat as hyperparameters not only the model parameters but also additional components from the pipeline, such as the size of the dataset, the number of labelled instances, time constraints, etc.
Links:
- https://www.automl.org/ixautoml/
- We will work with MVTec dataset: MVTec and anomalib library
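To make the idea of process-level hyperparameters concrete, here is a minimal random-search sketch in which pipeline-level settings sit in the same search space as a model parameter. The parameter names, the value grids, and the `toy_score` function are hypothetical placeholders; in the project, the score would come from training and validating an actual model.

```python
import random

# Joint search space: one model-level and two pipeline-level parameters.
# All names and values are hypothetical, chosen only for illustration.
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],   # model-level
    "train_fraction": [0.25, 0.5, 1.0],    # pipeline-level: share of data used
    "labelled_instances": [10, 50, 200],   # pipeline-level: labelled samples
}


def toy_score(config):
    """Hypothetical stand-in for a validation score (higher is better)."""
    data_term = config["train_fraction"] * config["labelled_instances"] / 200
    return data_term - abs(config["learning_rate"] - 0.01)


def random_search(space, score_fn, n_trials, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = random_search(SEARCH_SPACE, toy_score, n_trials=50)
    print(config, round(score, 3))
```

Because every trial is stored as a flat configuration with a score, the trial log can later be fed to an explanation method such as SHAP to see which settings drive performance.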

Student:
Namespace in the wiki: Kachnic Bartłomiej, Wójcik Maciej
The goal of the project: Prepare a Notebook that will test semi-supervised anomaly detection method on MVTec dataset
Technology: Python, Keras/PyTorch, SHAP
Description: We have developed a method for semi-supervised anomaly detection for images and we need more tests performed on benchmark datasets. The main goal would be to adapt the existing code to a new dataset and analyse the results.
Links:
- Source code of the method to test will be made accessible on demand.
- We will work with MVTec dataset: MVTec and anomalib library

Student:
Namespace in the wiki: Chmura Jakub and Aleksandra Stępień
The goal of the project: Prepare a Notebook that will demonstrate the usage of selected multimodal models for time-series classification or regression.
Technology: Python, Keras/PyTorch, SHAP
Description: The goal is to evaluate whether there are pretrained models for time series, similar to LLaVA, Phi3 and others, that can link textual and time-series data in one embedding space.
Links:
- UCI library UCI


[FIXME] Template

[KKT] Polyvocal knowledge graphs

[KKT] Date unification and representation in knowledge bases

[KKT] Measure of name resemblance between languages

[KKT] Incunabula catalogue in RDF

[KKT] "My ladies, do you really need it? Women of the Jagiellonian University"

[KKT] (Semi-)Automatic method of reconstructing the book collection based on source documents

[KKT] Social network analysis for cultural heritage (linked) data

[KKT] BIRAFFE3

[JKO] Emergence of environment-based communication between multiple reinforcement-learned agents

[JKO] Time-series analysis applied to text classification

[JKO] Comparison of explainability tools for NLP

[JKO] Challenge submission: AI-generated text detection

[SBK] OpenML dataset creation script for Meta-Learning

[SBK] Explainable generative AI

[SBK] Counterfactual explanations in anomaly detection for images

[SBK] Explainable Hyperparameter optimization in anomaly detection

[SBK] Semi-supervised anomaly detection evaluation

[SBK] Survey in the area of multimodal explanations for time-series
