====== WSHOP project topics -- winter 2024/2025 ======
  * {{:courses:xai:winner.png?30|}}: Can be extended into a master's thesis
  * {{:courses:wshop:topics:fast.png?30|}}: Quick project
  * {{:courses:wshop:topics:peer.png?40|}}: Linked to international scientific project

==== [FIXME] Template ====

  * **Student:** FIXME
  * **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
  * **The goal of the project:** FIXME
  * **Technology:** FIXME
  * **Description:** FIXME
  * **Links:**
    * FIXME


==== [SBK] Counterfactual evaluation framework [other team: factual] ==== 

  * **Student:** Paulina Wojnarska  {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:cfeval:]] 
  * **The goal of the project:** The goal of this project is to implement a Python module that covers all of the evaluation metrics from: [[https://link.springer.com/article/10.1007/s10618-022-00831-6|Counterfactual explanations and how to find them: literature review and benchmarking]]. Two of the most important aspects of the framework should be an easy way to add your own method and a server script that makes it possible to run evaluations and publish the results online automatically.
  * **Technology:** Python 
  * **Description:** Counterfactual explanations are hypothetical instances that explain how changing certain features of an input would lead to a different model prediction. For example, in image classification, a counterfactual explanation might show how modifying the color of an object in an image would cause a model to assign it to a different category. In a credit scoring system, a counterfactual explanation could show how increasing a borrower's income and decreasing their debt-to-income ratio would turn a rejected loan application into an approved one. There are many methods for constructing counterfactuals and many methods for evaluating them, but no unified framework for doing so.

  * **Links:** 
    * [[https://link.springer.com/article/10.1007/s10618-022-00831-6|Counterfactual explanations and how to find them: literature review and benchmarking]] 
    * https://christophm.github.io/interpretable-ml-book/
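
To make the idea of evaluation metrics more concrete, here is a minimal sketch of three measures commonly used for counterfactuals (validity, proximity, sparsity). The function names, the toy credit-scoring model and its threshold are assumptions made for illustration; the exact definitions used in the linked review may differ.

```python
from math import sqrt


def validity(predict, counterfactual, desired_class):
    """True if the model assigns the desired class to the counterfactual."""
    return predict(counterfactual) == desired_class


def proximity(original, counterfactual):
    """Euclidean distance between the original instance and the counterfactual."""
    return sqrt(sum((o - c) ** 2 for o, c in zip(original, counterfactual)))


def sparsity(original, counterfactual, tol=1e-9):
    """Number of features that were changed."""
    return sum(abs(o - c) > tol for o, c in zip(original, counterfactual))


# Hypothetical credit-scoring model: approve (1) when income - debt >= 50.
def toy_model(x):
    income, debt = x
    return 1 if income - debt >= 50 else 0


original = [60.0, 20.0]        # rejected: 60 - 20 = 40
counterfactual = [70.0, 20.0]  # approved: 70 - 20 = 50

report = {
    "validity": validity(toy_model, counterfactual, desired_class=1),
    "proximity": proximity(original, counterfactual),
    "sparsity": sparsity(original, counterfactual),
}
print(report)
```

A framework along these lines could accept any ''predict'' callable, which is one possible way to make adding your own method easy.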

==== [SBK] Eyetracking for (Explainable) AI ====

  * **Student:** Sebastian Sęczyk, Jakub Pleśniak  {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:tobiixai:start|Sebastian Sęczyk]], [[..:projects:2024:tobiixai:start|Jakub Pleśniak]]
  * **The goal of the project:** Tool for data labeling supported with eyetracking
  * **Technology:** Python, Tobii
  * **Description:** Eye tracking is the process of measuring and recording a person's gaze and eye movements to understand visual attention and cognitive processes. The aim is to develop a data labeling system that integrates eye tracking and augments labels with inferred "reasons" by tracking the user's gaze on the text, images, or data fragments associated with the label.
  * **Links:**
    * https://github.com/sbobek/tobii-pytracker
 
==== [SBK] OpenML dataset creation script for Meta-Learning ====
  * **Student:** FIXME  {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:peer.png?40|}}
  * **Namespace in the wiki:** [[..:projects:2023:openmlds:]]
  * **The goal of the project:** Prepare a script that will build a meta-learning dataset out of OpenML logs
  * **Technology:** Python, [[https://docs.openml.org/APIs/|OpenML API]]
  * **Description:** The main goal of the project is to create a script that will fetch all of the runs/pipelines and datasets from the [[https://www.openml.org/|OpenML]] platform and create a dataset out of them. The challenge is to transform pipeline definitions, which are code snippets, into logical components of a machine-learning pipeline (including deep neural networks). Such a dataset will serve as a learn-to-learn dataset for meta-learning solutions.
{{ :courses:wshop:openml-fetch.png |}}
  * **Links:**
    * [[https://www.openml.org/|OpenML]]
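
The transformation of a pipeline definition into logical components can be sketched as a small parser. The example below assumes that a pipeline is described by a nested string of the form ''Class(step=Class,...)''; the sample string and the output layout are hypothetical, and no OpenML API call is made here.

```python
def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def parse_component(text):
    """Turn 'Class(step=Class,...)' into a nested dict of components."""
    text = text.strip()
    if "(" not in text:
        return {"class": text, "steps": {}}
    head, _, rest = text.partition("(")
    body = rest[:-1]  # drop the closing parenthesis
    steps = {}
    for part in split_top_level(body):
        name, _, value = part.partition("=")
        steps[name.strip()] = parse_component(value)
    return {"class": head, "steps": steps}


# Hypothetical pipeline description used only for illustration.
flow_name = (
    "sklearn.pipeline.Pipeline("
    "imputer=sklearn.impute.SimpleImputer,"
    "clf=sklearn.ensemble.RandomForestClassifier)"
)
parsed = parse_component(flow_name)
print(parsed["class"], list(parsed["steps"]))
```

Real pipeline definitions will likely need more cases (hyperparameter values, neural network layers), so this is only a starting point.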

==== [SBK] Explainable Hyperparameter optimization ====
  * **Student:** Agnieszka Felis, Mikołaj Golowski {{:courses:wshop:topics:peer.png?40|}} {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:ixautoml:start]]
  * **The goal of the project:** Evaluation of several selected methods for explainable hyperparameter optimization
  * **Technology:** Python, Keras/PyTorch, SHAP
  * **Description:** AutoML hyperparameter optimization is the process of automatically tuning the hyperparameters of machine learning models to improve performance without manual intervention. It leverages techniques like grid search, random search, and Bayesian optimization to efficiently explore the hyperparameter space. By automating this process, AutoML reduces the time and expertise needed to find optimal model configurations, making machine learning more accessible and effective. In this project we aim to test existing solutions.
  * **Links**:
    * https://www.automl.org/ixautoml/
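
As a reference point, random search (one of the techniques named above) can be sketched in a few lines. The ''validation_score'' function is a hypothetical stand-in for training and validating a real model, and the search ranges are assumptions.

```python
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 5) ** 2


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep every (score, config) pair."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
            "depth": rng.randint(2, 10),
        }
        history.append((validation_score(**config), config))
    best = max(history, key=lambda item: item[0])
    return best, history


(best_score, best_config), history = random_search(n_trials=50)
print(best_score, best_config)
```

The recorded ''history'' of scores and configurations is the kind of data that an explanation method could analyze afterwards, for example to estimate which hyperparameter mattered most.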


==== [SBK] Active learning with SHAP-guided sampling ====
  * **Student:** FIXME {{:courses:wshop:topics:peer.png?40|}} {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
  * **The goal of the project:** Implementation of novel method for active learning (comparison with existing ones)
  * **Technology:** Python, Keras/PyTorch, SHAP
  * **Description:** Active learning in AI is a machine learning approach where the model selectively queries the most informative data points for labeling, reducing the amount of labeled data needed to achieve high performance. There are many different approaches to deciding which regions contain the most informative samples. In this project we want to implement several state-of-the-art approaches and compare them against explanation-based active learning, where the algorithm uses XAI output to decide which regions (or which data points) should be explored or labelled in order to improve performance.
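
One classical baseline for choosing the most informative samples is least-confidence uncertainty sampling. The sketch below assumes that the model already produced class probabilities for an unlabeled pool; the numbers are made up, and the SHAP-guided variant proposed in this topic is not shown.

```python
def uncertainty(probabilities):
    """Least-confidence score: 1 minus the probability of the top class."""
    return 1.0 - max(probabilities)


def select_queries(pool_probabilities, batch_size):
    """Return indices of the unlabeled samples the model is least sure about."""
    ranked = sorted(
        range(len(pool_probabilities)),
        key=lambda i: uncertainty(pool_probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical predicted class probabilities for four unlabeled samples.
pool_probabilities = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
queries = select_queries(pool_probabilities, batch_size=2)
print(queries)  # indices of the samples to send for labeling
```

An explanation-based strategy could replace ''uncertainty'' with a score derived from XAI output while keeping the same selection loop.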


==== [SBK] Dimensionality reduction to speed up LUX ====
  * **Student:** Jan Zoń {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2043:luxspeedup:start]]
  * **The goal of the project:** The goal is to improve the LUX software so that it performs calculations in a reduced-dimensionality space
  * **Technology:** Python, Keras/PyTorch
  * **Description:** LUX (Local Universal Rule-Based Explainer) is an XAI algorithm that produces explanations for any type of machine-learning model. It provides local explanations in the form of human-readable (and executable) rules, and also provides counterfactual explanations as well as visualizations of the explanations. It creates explanations by selecting neighborhood data points, which is computationally intensive because it relies on clustering algorithms. In high-dimensional spaces this is inefficient and of limited practical use due to the curse of dimensionality. The goal is to add a dimensionality reduction step to the process and test the efficiency improvements.
  * **Links:**
    * https://github.com/sbobek/lux
    * https://arxiv.org/abs/2310.14894
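
The idea of selecting a neighborhood in a reduced space can be sketched as follows. This example does not use the LUX API; Gaussian random projection is chosen here only as one simple dimensionality reduction technique, and the data, sizes and function names are assumptions.

```python
import random
from math import sqrt


def random_projection_matrix(n_features, n_components, seed=0):
    """Gaussian random matrix scaled so distances are roughly preserved."""
    rng = random.Random(seed)
    scale = 1.0 / sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]


def project(point, matrix):
    """Map one point from n_features down to n_components dimensions."""
    n_components = len(matrix[0])
    return [
        sum(point[i] * matrix[i][j] for i in range(len(point)))
        for j in range(n_components)
    ]


def nearest_neighbors(query, points, k):
    """Indices of the k points closest to the query (Euclidean distance)."""
    def dist(p):
        return sqrt(sum((a - b) ** 2 for a, b in zip(query, p)))
    return sorted(range(len(points)), key=lambda i: dist(points[i]))[:k]


# Synthetic data: 50 points with 200 features each.
rng = random.Random(1)
data = [[rng.random() for _ in range(200)] for _ in range(50)]

matrix = random_projection_matrix(n_features=200, n_components=10)
reduced = [project(p, matrix) for p in data]
neighbors = nearest_neighbors(reduced[0], reduced, k=5)
print(neighbors)
```

Part of the project would be to check whether neighborhoods found in the reduced space still lead to explanations of comparable quality.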

==== [SBK] Neurosymbolic Neural Networks ====
  * **Student:** Jakub Samel, Aliaxandr Zybaila  {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:nsnn:start|Jakub Samel]], [[..:projects:2024:nsnn2:start|Aliaxandr Zybaila]]
  * **The goal of the project:** The MVP is to implement the same example (e.g. a Sudoku solver) in all of the linked methods.
  * **Technology:** Python, Keras/PyTorch
  * **Description:** Neurosymbolic AI is an approach that combines neural networks' learning capabilities with symbolic reasoning systems to create models that can both learn from data and reason with structured knowledge. In this project we focus on neural networks as AI models and logic (or probabilistic logic) as symbolic knowledge representation.
  * **Links:**
    * https://neurosymbolic.asu.edu/advances-in-neuro-symbolic-reasoning-and-learning/
    * https://github.com/IBM/LNN
    * https://github.com/ML-KULeuven/deepproblog
    * https://docs.pyro.ai/en/dev/nn.html

==== [SBK] Explainable AI for images ====
  * **Student:** Jakub Siwy,  Jarek Such {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:peer.png?40|}}
  * **Namespace in the wiki:** [[..:projects:2024:imgxai:start]]
  * **The goal of the project:** Implement and compare selected approaches for concept-based explanation of image classifiers
  * **Technology:** Python, Keras/PyTorch
  * **Description:** Concept-based explanation for images involves interpreting a model's predictions by associating them with high-level, human-understandable concepts rather than just pixel-level features. It allows users to understand how a model identifies specific visual elements, such as textures or shapes, that are crucial for its decision-making process. The project aims to evaluate several selected methods from this field.
  * **Links:**
    * https://arxiv.org/abs/2007.04612
    * https://github.com/rachtibat/zennit-crp


==== [SBK] Causal Autoencoder for anomaly detection ====
  * **Student:** Natalia Kramarz  {{:courses:xai:winner.png?30|}} {{:courses:wshop:topics:peer.png?40|}}
  * **Namespace in the wiki:** [[..:projects:2024:causalae:start]]
  * **The goal of the project:** Implement and compare anomaly detection algorithms with and without a causality component
  * **Technology:** Python, Keras/PyTorch
  * **Description:** Anomaly detection with autoencoders involves training a neural network to compress and then reconstruct input data, learning a compact representation of normal patterns. When the autoencoder encounters an anomaly, the reconstruction error is significantly higher, signaling the presence of an outlier or abnormal data. Usually there is no causal relationship encoded in the latent space. This project aims to evaluate methods that allow such relationships to be encoded and later used for explainability.
  * **Links:**
    * https://docs.pyro.ai/en/dev/contrib.cevae.html
    * https://zenodo.org/records/11469702
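
The reconstruction-error principle can be shown without any deep learning library. The sketch below uses a tiny linear autoencoder with a one-dimensional latent space and hand-set weights as a stand-in for a trained network; the weights, the data and the choice of threshold are all assumptions, and no causal component is included.

```python
# Hand-set weights: a stand-in for what a trained autoencoder would learn.
W = [1.0, 2.0, 3.0]


def encode(x):
    """Compress a 3-feature input into a single latent value."""
    return sum(a * b for a, b in zip(x, W)) / sum(w * w for w in W)


def decode(z):
    """Reconstruct the 3 features from the latent value."""
    return [z * w for w in W]


def reconstruction_error(x):
    """Mean squared difference between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical "normal" samples lying close to the direction of W.
normal = [
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 6.0],
    [0.5, 0.9, 1.5],
    [3.0, 6.0, 9.1],
]

# Simplest possible threshold: the largest error seen on normal data.
threshold = max(reconstruction_error(x) for x in normal)


def is_anomaly(x):
    return reconstruction_error(x) > threshold


print(is_anomaly([2.0, 4.0, 6.0]), is_anomaly([3.0, 0.0, -1.0]))
```

In practice the threshold is often taken from a quantile of the errors on held-out normal data rather than the maximum.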

==== [KKT] Explainable AI with graphs ====

  * **Student:** FIXME {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
  * **The goal of the project:** Development of a prototype demonstrating the use of knowledge graphs to provide high-level explanations
  * **Technology:** Python, Machine learning, XAI, Semantic web
  * **Description:** Existing explainable AI methods that aim to explain the behavior of black-box machine learning models focus on feature importance. That is, the output of XAI methods is information about which features contribute most to a decision. But when the model has a large number of features, such an explanation may be incomprehensible even to data analysts (and it certainly will be to system users). It would be much better to have an ontology/knowledge graph describing the relationships between features, which would allow the output of XAI methods to be translated into something more high-level (at the level of concepts/classes instead of individual features). Preparing a prototype that implements this task is the goal of this project. The project will involve (a) creation of a basic ML model with an XAI layer (with SHAP?), (b) development of an ontology, <nowiki>(c)</nowiki> development of methods for translating low-level explanations into high-level ones (the last point will be the core issue of the project).
  * **Links:**
    * Starting point: [[https://doi.org/10.1016/j.artint.2021.103627|Knowledge graphs as tools for explainable machine learning: A survey]]
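
One very simple way to translate feature-level explanations into concept-level ones is to sum attributions over the concept each feature belongs to. In the sketch below the attribution values, the feature names and the feature-to-concept mapping are all hypothetical; a real prototype would take the values from an XAI method and the mapping from an ontology.

```python
from collections import defaultdict

# Hypothetical feature-level attributions for one prediction (e.g. from SHAP).
feature_attributions = {
    "systolic_bp": 0.30,
    "diastolic_bp": 0.20,
    "ldl": -0.10,
    "hdl": 0.05,
    "age": 0.15,
}

# Hypothetical ontology fragment: which concept each feature is an instance of.
feature_to_concept = {
    "systolic_bp": "BloodPressure",
    "diastolic_bp": "BloodPressure",
    "ldl": "Cholesterol",
    "hdl": "Cholesterol",
    "age": "Demographics",
}


def aggregate_by_concept(attributions, mapping):
    """Sum attributions per concept and order concepts by absolute impact."""
    totals = defaultdict(float)
    for feature, value in attributions.items():
        totals[mapping.get(feature, "Unmapped")] += value
    return dict(sorted(totals.items(), key=lambda kv: abs(kv[1]), reverse=True))


concepts = aggregate_by_concept(feature_attributions, feature_to_concept)
print(concepts)
```

Summation is only one possible aggregation; it can hide features that cancel each other out, which is one of the issues the translation methods would need to address.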

==== [KKT] JU cultural heritage in Linked Open Data ====

  * **Student:** Lena Kaczanowska {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
  * **The goal of the project:** Preparation of a knowledge graph that is a subset of the Linked Open Data cloud describing the cultural heritage of our university
  * **Technology:** semantic web, knowledge graphs, Python (or other popular programming language)
  * **Description:** Many people who made their mark on history (rulers, scientists, authors, musicians, ...) studied and worked at the Jagiellonian University, and thus information about them can be found in publicly available databases, including graph databases federated in the Linked Open Data cloud. The aim of the project is to develop methods that will generate a coherent knowledge graph describing people associated with the University and their surroundings/accomplishments/other related facts. To achieve this, as part of the project it is necessary to: (a) select knowledge bases that may contain useful information, (b) develop entity resolution methods suitable for the task (making sure that two entities refer to the same real person), <nowiki>(c)</nowiki> determine the extent of useful information about each person, in particular the links between these individuals and the links to well-known individuals and general knowledge facts, (d) prepare sample queries demonstrating the usefulness of the knowledge graph thus created. The starting point will be the CAC database maintained by the JU Archives.
  * **Links:**
    * [[https://cac.historia.uj.edu.pl/?lang=en|Corpus academicum Cracoviense]]

==== [KKT] Social life documents summarization ====

  * **Student:** Aleksandra Jaroszek, Maciej Struski {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:sociallife:]]
  * **The goal of the project:** Preparation of workflow for automatic social life documents summarization (in form of tags and titles)
  * **Technology:** Python, machine learning, data analysis, LLM
  * **Description:** Social life documents (PL: Dokumenty życia społecznego) are a broad category of cultural heritage objects including postcards, flyers, posters, and propaganda materials, among others. For years this category has been neglected by librarians and researchers. Only in recent years has attention been paid to their value and to the fact that these documents allow us to better understand the culture and daily life of our ancestors. However, these documents have very poor metadata, making it virtually impossible to find documents that interest the researcher. The goal of the project is to develop a prototype workflow for automatic description of such documents, which will consist of at least: (a) an object detection model for recognizing things in images (many social life documents have a visual component), the output of which will be a list of tags (object names) identified in the object, (b) a generative model summarizing in one sentence what the object represents. During the project you have to: (a) identify which models can be useful for such a task, (b) find/prepare the dataset with images, <nowiki>(c)</nowiki> evaluate the models.
  * **Links:**
    * [[https://pl.wikipedia.org/wiki/Dokumenty_%C5%BCycia_spo%C5%82ecznego|Dokumenty życia społecznego]] (PL Wikipedia)
    * Introduction to YOLO: https://www.datacamp.com/blog/yolo-object-detection-explained


==== [KKT] Semantic Search 101 ====

  * **Student:** Dominika Głowacka, Mikołaj Szymański {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:semsearch:]]
  * **The goal of the project:** Preparation of a working semantic search demo for cultural heritage knowledge
  * **Technology:** Python, Semantic Web, natural language processing
  * **Description:** Semantic search is the idea of searching for specific objects in a knowledge base instead of relying on classical string matching. It is used, among others, in Google's search engine, which is based on the Google Knowledge Graph (Google also coined the name of the “things, not strings” paradigm). The goal of the project is to develop a functional demo that will demonstrate the use of the semantic search paradigm for cultural heritage data. The project will involve (a) review of existing tools for this task, (b) selection of a suitable knowledge graph as a basis (Europeana + Wikidata?), <nowiki>(c)</nowiki> development of the demo, (d) evaluation on prepared diverse use cases
  * **Links:**
    * {{https://wiki.iis.uj.edu.pl/_media/courses:semint:advanced_topics.pdf|General idea of semantic search (slides 80-86)}}
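
The "things, not strings" idea can be illustrated with a toy knowledge graph: the query is first resolved to an entity, and the answer is built from that entity's relations instead of from matching text. The identifiers, relation names and data layout below are assumptions made for this sketch; a real demo would query an existing knowledge graph.

```python
# Hypothetical mini knowledge graph; identifiers and layout are illustrative.
ENTITIES = {
    "e1": {
        "label": "Nicolaus Copernicus",
        "aliases": ["Mikołaj Kopernik", "Copernicus"],
        "type": "Person",
    },
    "e2": {
        "label": "De revolutionibus orbium coelestium",
        "aliases": ["On the Revolutions"],
        "type": "Book",
    },
    "e3": {
        "label": "Jagiellonian University",
        "aliases": ["Uniwersytet Jagielloński"],
        "type": "Organization",
    },
}
TRIPLES = [
    ("e1", "author_of", "e2"),
    ("e1", "studied_at", "e3"),
]


def resolve(query):
    """Map a query string to an entity id using labels and aliases."""
    q = query.casefold()
    for entity_id, entity in ENTITIES.items():
        names = [entity["label"], *entity["aliases"]]
        if any(q == name.casefold() for name in names):
            return entity_id
    return None


def related(entity_id):
    """List (relation, label) pairs for everything linked to the entity."""
    results = []
    for subject, predicate, obj in TRIPLES:
        if subject == entity_id:
            results.append((predicate, ENTITIES[obj]["label"]))
        elif obj == entity_id:
            results.append(("inverse:" + predicate, ENTITIES[subject]["label"]))
    return results


entity_id = resolve("COPERNICUS")
print(entity_id, related(entity_id))
```

Because the lookup goes through the entity, the Polish and English names lead to the same result, which plain string matching on document text would not guarantee.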

==== [KKT] Contextual LOD-based Exploration 101 ====

  * **Student:** FIXME {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
  * **The goal of the project:** Preparation of a working contextual Linked Open Data-based exploration demo for cultural heritage knowledge
  * **Technology:** Python, Semantic Web
  * **Description:** Catalogs of digital heritage resources (e.g. Jagiellonian Digital Library or Europeana) contain only basic metadata about objects (title, author). But cultural heritage is not a collection of individual objects; it is a whole network of interconnections. Fortunately, many of these connections have already been captured in Linked Open Data (e.g., information on relatives), so we just need to use them to enable various methods of exploration (influenced by, member of family, etc.). The goal of the project is to provide a functional demo that will demonstrate a few scenarios for exploring cultural heritage collections with the LOD cloud. The project will involve (a) finding suitable cultural heritage collections (ideally ones that already contain direct links to LOD), (b) selection of suitable relations that can be used for exploration, <nowiki>(c)</nowiki> development of the demo
  * **Links:**
    * {{https://wiki.iis.uj.edu.pl/_media/courses:semint:advanced_topics.pdf|General idea of exploration with knowledge graphs (slides 87-99)}}

==== [KKT] GraphRAG 101 ====

  * **Student:** Ewa Kobrzyńska, Emilia Górnisiewicz {{:courses:wshop:topics:fast.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:graphrag1:|Ewa Kobrzyńska]], [[..:projects:2024:graphrag2:|Emilia Górnisiewicz]]
  * **The goal of the project:** Preparation of a working GraphRAG demo for cultural heritage domain
  * **Technology:** Python/Java, Semantic Web, LLM
  * **Description:** Due to their stochastic nature, Large Language Models are not suitable for answering questions that require reference to specific sources or facts. The answer to this problem was the emergence of the Retrieval-Augmented Generation (RAG) paradigm, in which a search algorithm calls upon an external database to retrieve information about sources. As experiments have shown, this approach also has a number of drawbacks, the most important of which is the lack of a broader context to explain specific facts/sources. Therefore, the natural next step was to use knowledge graphs as a source of knowledge, which led to the development of the GraphRAG (Graph-based RAG) paradigm.  The goal of the project is to develop a functional demo that will demonstrate the use of the GraphRAG paradigm for cultural heritage data. The project will involve (a) review of existing tools for this task, (b) selection of a suitable knowledge base, <nowiki>(c)</nowiki> development of the demo, (d) evaluation on prepared diverse use cases
  * **Links:**
    * [[https://link.springer.com/article/10.1007/s10676-024-09775-5|ChatGPT is bullshit]]
    * [[https://neo4j.com/blog/what-is-retrieval-augmented-generation-rag/|What is RAG]]
    * [[https://neo4j.com/developer-blog/graphrag-ecosystem-tools/|Getting started with GraphRAG]]
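
The retrieval part of a GraphRAG pipeline can be sketched without any LLM: find the graph nodes mentioned in the question, collect the facts around them (including a second hop for broader context), and place those facts in the prompt. The triples, function names and prompt wording are assumptions for illustration; the call to a language model is deliberately left out.

```python
# Hypothetical knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Nicolaus Copernicus", "studied at", "Jagiellonian University"),
    ("Nicolaus Copernicus", "author of", "De revolutionibus orbium coelestium"),
    ("Jagiellonian University", "located in", "Kraków"),
]


def retrieve_facts(question, triples, hops=2):
    """Collect triples within `hops` steps of the nodes named in the question."""
    q = question.casefold()
    nodes = {n for s, _, o in triples for n in (s, o) if n.casefold() in q}
    facts = []
    for _ in range(hops):
        new_nodes = set()
        for triple in triples:
            subject, _, obj = triple
            if (subject in nodes or obj in nodes) and triple not in facts:
                facts.append(triple)
                new_nodes.update((subject, obj))
        nodes |= new_nodes
    return facts


def build_prompt(question, facts):
    """Put the retrieved facts in front of the question for the LLM."""
    lines = ["Answer using only the facts below."]
    lines += [f"- {s} {p} {o}" for s, p, o in facts]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


question = "In which city did Nicolaus Copernicus study?"
facts = retrieve_facts(question, TRIPLES)
prompt = build_prompt(question, facts)
print(prompt)
```

With ''hops=1'' only the two facts about Copernicus are retrieved; the second hop adds the location of the university, which is the kind of broader context that plain RAG may miss.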

==== [KKT] FACE APIs comparison (cont.) ====

  * **Student:** Magdalena Gancarek, Klaudia Korczak {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:faceapis:]]
  * **The goal of the project:** Comparison of the effectiveness of off-the-shelf APIs and pre-trained models for emotion recognition in non-trivial images
  * **Technology:** Python, data analysis
  * **Description:** The facial expression recognition tools are trained and evaluated on benchmark datasets that contain many expressions generated 'at the request' of the expressor and photographed en face. This does not match the reality, where expressions are not so strong and where the face is not always facing the camera. The project is the continuation of the previous research published by our team and extended during previous WSHOPs (and one MSc thesis). During the project, one should: (a) explore previous results, (b) integrate them together, <nowiki>(c)</nowiki> perform consistent experiments in a unified environment on a unified dataset (it may be necessary to find/add elements to the dataset!), (d) summarise the results by indicating the strengths and weaknesses (supported situations) for each API.
  * **Links:**
    * AffectNet: [[https://paperswithcode.com/dataset/affectnet]] (a benchmark dataset)
    * Our initial work is briefly summarized in the paper: [[https://link.springer.com/chapter/10.1007/978-3-031-06527-9_7|Evaluation of Selected APIs for Emotion Recognition from Facial Expressions]]


==== [KKT] APIs/models and benchmark datasets for speech-based emotion recognition (cont.) ====

  * **Student:** Klaudia Ropel {{:courses:xai:winner.png?30|}}
  * **Namespace in the wiki:** [[..:projects:2024:emospeech:]]
  * **The goal of the project:** Review of the state-of-the-art + evaluation of the solutions found on the selected subset of benchmark datasets
  * **Technology:** Google Scholar/ResearchGate/reading ;), Python, machine learning, data analysis
  * **Description:** The goal of the project is to create a concise review of the state-of-the-art APIs/models and benchmark datasets in the area of emotion recognition from speech (both signal characteristic-based and spoken text-based). In the second part of the project, you will create a pipeline for evaluating speech-based emotion recognition APIs/models, i.e. prepare the dataset, predict the emotion for each sample, evaluate the correctness of the predictions, and compare the accuracy of the APIs/models under evaluation.
  * **Links:**
    * Starting point: //Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers// [[https://doi.org/10.1016/j.specom.2019.12.001|DOI:10.1016/j.specom.2019.12.001]]
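
The last steps of the evaluation pipeline (checking prediction correctness and comparing accuracy) can be sketched as below. The labels, the predictions and the model names are made up; in the project they would come from a benchmark dataset and from the APIs/models under evaluation.

```python
def accuracy(predictions, labels):
    """Fraction of samples for which the predicted emotion matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical ground-truth emotions for five speech samples.
labels = ["angry", "happy", "sad", "neutral", "happy"]

# Hypothetical outputs of two models on the same samples.
predictions_by_model = {
    "model_a": ["angry", "happy", "neutral", "neutral", "happy"],
    "model_b": ["sad", "happy", "sad", "happy", "angry"],
}

# Rank models from most to least accurate.
ranking = sorted(
    ((accuracy(preds, labels), name) for name, preds in predictions_by_model.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

Accuracy alone can be misleading when emotion classes are imbalanced, so a per-class breakdown may be a useful addition.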