===== WSHOP project topics -- spring 2023/2024 =====
==== [KKT] Context-rich descriptions via IIIF ====
* **Student:** Przemysław Pawlik
* **Namespace in the wiki:** [[..:projects:2024:iiif:]]
* **The goal of the project:** Combine IIIF technology with a knowledge graph to provide metadata-rich descriptions via the IIIF technology stack
* **Technology:** Semantic Web, Python/Java
* **Description:** IIIF is a set of open APIs for getting images and their metadata from dedicated server(s) and displaying them to the user (it is of course more sophisticated, e.g., it provides rich image manipulation capabilities, but we will not dive into such details in this project). The goal of this project is to combine IIIF technology with a knowledge graph. As a result, alongside a scan of a document, the user will see a rich description of that document stored in a knowledge graph.
* Idea for the project: (a) store images on your local filesystem, (b) use [[https://jena.apache.org/documentation/fuseki2/|Apache Jena Fuseki]] for storing the knowledge graph (a ready-to-use standalone server), (c) write some code to generate IIIF manifests with rich metadata (IIIF manifests contain all the information needed to display the image and metadata; a minimal sketch follows the links below), (d) use [[https://universalviewer.io/|Universal Viewer]] to display the content (it simply gets the IIIF manifest and does all the magic), (e) evaluate how these rich metadata can be searched by users
* //Note: The goal of this project is to show a proof of concept of how such a thing could work. If time permits, we will use real data/metadata (or this can be done in a follow-up project / Master's thesis).//
* **Links:**
* [[https://iiif.io/|IIIF (main page)]]
* [[https://iiif.io/get-started/how-iiif-works/|IIIF How It Works]] -- very short introduction
* [[https://iiif.io/get-started/training/|IIIF Training]] -- at the bottom you can find a useful "Self-directed learning" section
* [[https://universalviewer.io/|Universal Viewer]]
* What does IIIF look like in practice?
* [[https://figgy.princeton.edu/concern/scanned_resources/484e82f7-1b84-4df7-a15d-c9b34ac2407a/manifest|IIIF Manifest for Gutenberg Bible]] (quite long, because it has 655 pages)
* [[https://uv-v4.netlify.app/#?manifest=https://figgy.princeton.edu/concern/scanned_resources/484e82f7-1b84-4df7-a15d-c9b34ac2407a/manifest|Gutenberg Bible shown in Universal Viewer]] (using the IIIF Manifest linked above)
* [[https://jena.apache.org/documentation/fuseki2/|Apache Jena Fuseki]]
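A minimal sketch of step (c), assuming placeholder URLs and a metadata dict that could be filled from the knowledge graph (e.g., the results of a SPARQL query against Fuseki); it builds an IIIF Presentation API 3.0 manifest with a single canvas:
<code python>
# Minimal sketch (all URLs are placeholders): an IIIF Presentation 3.0
# manifest whose "metadata" entries come from a knowledge graph.
import json

def build_manifest(doc_id, label, image_url, width, height, metadata):
    """metadata: {"Author": ..., "Date": ...}, e.g. fetched from Fuseki via SPARQL."""
    base = f"https://example.org/iiif/{doc_id}"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        # knowledge-graph facts rendered as IIIF label/value pairs
        "metadata": [
            {"label": {"en": [k]}, "value": {"en": [str(v)]}}
            for k, v in metadata.items()
        ],
        "items": [{  # one canvas with one painting annotation
            "id": f"{base}/canvas/1", "type": "Canvas",
            "width": width, "height": height,
            "items": [{
                "id": f"{base}/page/1", "type": "AnnotationPage",
                "items": [{
                    "id": f"{base}/anno/1", "type": "Annotation",
                    "motivation": "painting",
                    "body": {"id": image_url, "type": "Image",
                             "format": "image/jpeg",
                             "width": width, "height": height},
                    "target": f"{base}/canvas/1",
                }],
            }],
        }],
    }

print(json.dumps(build_manifest("doc1", "Sample scan",
                                "https://example.org/images/doc1.jpg",
                                2000, 3000, {"Author": "Unknown"}), indent=2))
</code>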
==== [KKT] YOLO for documents? ====
* **Groups and namespaces in the wiki:**
* Maciej Baczyński: [[..:projects:2024:yolo1:]]
* Aleksandra Jaroszek, Maciej Struski: [[..:projects:2024:yolo2:]]
* **The goal of the project:** Verify whether state-of-the-art object detection models are usable for documents (manuscripts, printed documents, music scores, etc.)
* **Technology:** Python, machine learning, data analysis
* **Description:** Object detection models (like YOLO -- probably the most popular one) are used to identify things in images. They are usually applied to photos or videos, where they achieve very good performance. The goal of the project, however, is to examine whether such models are capable of recognizing elements of different types of documents: manuscripts, printed documents, music scores. The expected output of the model: "This part of the scan is handwritten text. While this part of the document is a plot. And here, you have a signature". During the project you have to: (a) identify which models can be useful for such a task, (b) find/prepare a dataset with images, (c) evaluate the models (a minimal inference sketch follows the links below).
* **Links:**
* Introduction to YOLO: https://www.datacamp.com/blog/yolo-object-detection-explained
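A minimal inference sketch, assuming the ''ultralytics'' package (one of several YOLO implementations) and a hypothetical scan file; the pretrained COCO weights know nothing about document layouts, so a real experiment would fine-tune on a document/layout dataset:
<code python>
# Minimal sketch using the "ultralytics" package (pip install ultralytics).
# "scan_page_001.jpg" is a hypothetical document scan.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # downloads pretrained weights on first use
results = model("scan_page_001.jpg")  # run detection on one image

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]   # predicted class label
        print(f"{cls_name}: conf={float(box.conf):.2f}, xyxy={box.xyxy.tolist()}")
</code>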
==== [KKT] HWR/HTR state-of-the-art evaluation ====
* **Groups and namespaces in the wiki:**
* Kamil Butryn: [[..:projects:2024:hwr1:]]
* Magdalena Gancarek, Klaudia Korczak: [[..:projects:2024:hwr2:]]
* **The goal of the project:** Explore and compare state-of-the-art handwriting recognition models/methods and benchmark datasets
* **Technology:** Google Scholar/ResearchGate/reading :), Python, machine learning, data analysis
* **Description:** The goal of the project is to create a synthetic review of the state-of-the-art APIs/models and benchmark datasets in the area of handwriting recognition. In the second part of the project, you will create a pipeline for evaluating such APIs/models (a sketch of the core metric follows the links below).
* **Links:**
* [[wp>Handwriting_recognition|Handwriting recognition]]
* [[https://readcoop.eu/transkribus/public-models/|Public models @Transkribus]]
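A self-contained sketch of the core evaluation metric, character error rate (CER), computed as Levenshtein distance normalized by reference length (libraries such as ''jiwer'' provide the same out of the box):
<code python>
# Character error rate (CER) = edit distance / reference length.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# Toy example: a model misreads two characters of a ten-character line.
print(cer("lorem ipsu", "larem ipsa"))  # -> 0.2
</code>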
==== [KKT] Ontology for the SOLARIS synchrotron ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
* **The goal of the project:** Preparation and evaluation of the ontology
* **Technology:** Semantic Web
* **Description:** Work in close collaboration with the SOLARIS team to create an ontology from the perspective of training and maintenance: model the core elements and beamlines, model their parameters, and generate some instances based on actual synchrotron logs (see the sketch below).
* **Links:**
* [[https://synchrotron.uj.edu.pl/|SOLARIS]]
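A minimal ''rdflib'' sketch of what the modelling could look like; every class, property, and instance name below is a hypothetical placeholder to be replaced by the vocabulary worked out with the SOLARIS team:
<code python>
# Minimal sketch (pip install rdflib); all names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

SOL = Namespace("https://example.org/solaris#")
g = Graph()
g.bind("sol", SOL)

# TBox: a core class and one datatype property
g.add((SOL.Beamline, RDF.type, OWL.Class))
g.add((SOL.photonEnergyRange, RDF.type, OWL.DatatypeProperty))
g.add((SOL.photonEnergyRange, RDFS.domain, SOL.Beamline))
g.add((SOL.photonEnergyRange, RDFS.range, XSD.string))

# ABox: one instance, e.g. generated from synchrotron logs
g.add((SOL.PEEM, RDF.type, SOL.Beamline))
g.add((SOL.PEEM, SOL.photonEnergyRange, Literal("100-2000 eV")))

print(g.serialize(format="turtle"))
</code>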
==== [KKT] Loki evaluation ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
* **The goal of the project:** Evaluation of Loki running on a triplestore
* **Technology:** PHP, Semantic Web
* **Description:** Loki was moved from SWI-Prolog to a triplestore (see [[..:projects:2023:loki:]]). As a result, it should be better, faster, etc. Now it's time to evaluate this new setup: (a) performance tests (see the sketch below), (b) usability, (c) documentation consistency.
* **Links:**
* [[https://www.w3.org/TR/rdf11-primer/|RDF Primer]]
* [[https://loki.re/|Loki wiki]]
* [[https://en.wikipedia.org/wiki/Comparison_of_triplestores|Triplestores]]
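A minimal sketch for part (a), assuming the triplestore exposes a standard SPARQL HTTP endpoint; the endpoint URL and the query are placeholders:
<code python>
# Time a SPARQL query against a triplestore's HTTP endpoint.
import time
import requests

ENDPOINT = "http://localhost:3030/loki/sparql"   # e.g. a Fuseki dataset
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

def time_query(endpoint: str, query: str, runs: int = 10) -> float:
    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        r = requests.get(endpoint, params={"query": query},
                         headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()
        timings.append(time.perf_counter() - t0)
    return sum(timings) / len(timings)     # mean latency in seconds

print(f"mean latency: {time_query(ENDPOINT, QUERY):.4f} s")
</code>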
==== [KKT] Support in BIRAFFE3 experiment ====
* **Student:** Agnieszka Felis, Mikołaj Golowski
* **Namespace in the wiki:** [[..:projects:2024:biraffe3:]]
* **The goal of the project:** Support in BIRAFFE3 experiment
* **Technology:** Python, data analysis
* **Description:** The project has three phases. During the first phase, you will take part in final fixes during the pilot study (March 2024). Then, in the second part, you will help with conducting the actual experiment (scope of work to be determined). Finally, you will perform some preliminary analyses on the actual data collected in the experiment (scope of work to be determined). If the work is done with appropriate involvement, you could also become a co-author of a publication on BIRAFFE3 in Nature Scientific Data.
* **Links:**
* [[https://www.nature.com/articles/s41597-022-01402-6|BIRAFFE2 in Nature Scientific Data]] -- at the end of the day, we want something similar for the BIRAFFE3 experiment
==== [KKT] Emotion recognition for everyday life - evaluation of the state-of-the-art ====
* **Group:** Łukasz Chlebek, Piotr Kalita, Paweł Wąsik
* **Namespace in the wiki:** [[..:projects:2024:emognition:]]
* **The goal of the project:** Replicate and evaluate the methods and tools proposed for emotion recognition by the [[https://emognition.com/|Emognition]] team from PWr
* **Technology:** reading :), Python, data analysis, machine learning
* **Description:** Our colleagues from the [[https://emognition.com/|Emognition]] team deal with the same tasks as we do in the BIRAFFE series of experiments. In this project, we want to check whether their results are really as good as reported by following their papers/tools/methods (the links below indicate the starting point) and trying to replicate their results. In subsequent projects (or a Master's thesis) there will be a possibility to apply these methods and tools to other datasets like BIRAFFE2/BIRAFFE3.
* **Links:**
* Emognition Dataset: [[https://www.nature.com/articles/s41597-022-01262-0]]
* How to calculate features? Almost 900 features are described in: [[https://ieeexplore.ieee.org/document/9431143]]
* Various models evaluated in: [[https://ieeexplore.ieee.org/document/9767233]]
* Personalization of models discussed in: [[https://ieeexplore.ieee.org/document/9767502]]
==== [KKT] Affect predictions as probability blobs with RegFlow ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
* **The goal of the project:** Evaluate the usefulness of the RegFlow method for the 2-D emotion prediction task
* **Technology:** Python, data analysis
* **Description:** Continuation of the [[..:projects:2023:regflow:]] project (where RegFlow was successfully run and adapted to the 2-D emotion space). The goal of the project is: (a) to train emotion prediction model(s) on a subset of [[https://zenodo.org/record/5786104|BIRAFFE2]] and use RegFlow to show the emotion predictions generated by the model(s), (b) to render the predictions as actual probability blobs (e.g., density contours) rather than a cloud of points (see the sketch below), (c) to evaluate the usefulness of such a solution.
* **Links:**
* RegFlow: {{https://arxiv.org/pdf/2011.14620.pdf|description}}, [[https://github.com/maciejzieba/regressionFlow|repo]]
* [[https://zenodo.org/record/5786104|BIRAFFE2]]
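A minimal sketch for part (b): synthetic points stand in for samples drawn from a RegFlow predictive distribution, and a kernel density estimate turns them into a contour "blob" over the valence-arousal plane:
<code python>
# Turn a cloud of sampled 2-D predictions into a density blob via KDE.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# stands in for samples from a RegFlow predictive distribution
samples = rng.multivariate_normal([0.3, -0.2], [[0.05, 0.02], [0.02, 0.08]], 500)

kde = gaussian_kde(samples.T)                  # KDE expects a (dims, N) array
v, a = np.mgrid[-1:1:100j, -1:1:100j]          # valence-arousal grid
density = kde(np.vstack([v.ravel(), a.ravel()])).reshape(v.shape)

plt.contourf(v, a, density, levels=10, cmap="viridis")
plt.xlabel("valence"); plt.ylabel("arousal")
plt.title("Prediction as a probability blob")
plt.show()
</code>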
==== [KKT] Emotion prediction in artificial space (then mapped to actual emotion space) ====
* **Student:** Jan Malczewski
* **Namespace in the wiki:** [[..:projects:2024:emolatent:]]
* **The goal of the project:** Create a prediction model that maps features into an artificial X-dimensional space, then map this space onto the actual emotion/label space
* **Technology:** Python, machine learning, data analysis
* **Description:** Emotions in computational models are often represented as a 2-dimensional Valence x Arousal space (sometimes as a 3-D space with Dominance). When training prediction models, one tries to map the feature space (features of physiological signals) into this 2-D space. A lot of research shows that this does not work. So, we want to try another approach: create a classifier that maps features into an X-dimensional artificial space (with X >> 2), and then create some mapping from this space into the Valence x Arousal space. As X is much larger than 2, it should capture more sophisticated relations in the data. A mapping from the X-dimensional space into the 2-D space should also provide some information about the accuracy of the prediction/mapping (a minimal model sketch follows the links below). \\ You can start with the DEAP dataset, but ultimately, the project should be done with the BIRAFFE2 dataset. \\ There should be some "classical" model trained/downloaded as a baseline (to show that the new model is better). \\ Transformers should be useful for generating the artificial X-D space.
* **Links:**
* [[https://www.eecs.qmul.ac.uk/mmv/datasets/deap/|DEAP]] (the dataset can be obtained from KKT)
* [[https://zenodo.org/record/5786104|BIRAFFE2]]
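A minimal PyTorch sketch of the two-stage idea (features to X-dimensional artificial space to Valence x Arousal); all dimensions are arbitrary placeholders, and the dense encoder merely stands in for, e.g., a transformer over signal windows:
<code python>
# Two-stage model: features -> latent X-D space (X >> 2) -> (valence, arousal).
import torch
import torch.nn as nn

class LatentEmotionModel(nn.Module):
    def __init__(self, n_features: int = 128, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(            # features -> artificial space
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.head = nn.Linear(latent_dim, 2)     # artificial space -> (V, A)

    def forward(self, x):
        z = self.encoder(x)                      # the X-dimensional representation
        return self.head(z), z                   # keep z to inspect the mapping

model = LatentEmotionModel()
x = torch.randn(8, 128)                          # a batch of feature vectors
va, z = model(x)
print(va.shape, z.shape)   # torch.Size([8, 2]) torch.Size([8, 64])
</code>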
==== [KKT] APIs/models and benchmark datasets for speech-based emotion recognition ====
* **Student:** Klaudia Ropel
* **Namespace in the wiki:** [[..:projects:2024:emospeech:]]
* **The goal of the project:** Review of the state-of-the-art + evaluation of the solutions found on the selected subset of benchmark datasets
* **Technology:** Google Scholar/ResearchGate/reading ;), Python, machine learning, data analysis
* **Description:** The goal of the project is to create a synthetic review of the state-of-the-art APIs/models and benchmark datasets in the area of emotion recognition from speech (based both on signal characteristics and on the spoken text). In the second part of the project, you will create a pipeline for the evaluation of the speech-based emotion recognition APIs/models, i.e., prepare the dataset, predict the emotion for each sample, evaluate the prediction correctness, and compare the accuracy of the APIs/models under evaluation (a pipeline skeleton is sketched below).
* **Links:**
* Starting point: //Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers// [[https://doi.org/10.1016/j.specom.2019.12.001|DOI:10.1016/j.specom.2019.12.001]]
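A sketch of the evaluation-pipeline skeleton: every model/API is run over a labelled benchmark and the accuracies are compared. The ''predict_fns'' entries are dummy placeholders for real API calls or local model inference:
<code python>
# Skeleton: compare several emotion predictors on one labelled dataset.
from sklearn.metrics import accuracy_score, classification_report

def evaluate(predict_fns: dict, samples: list, labels: list) -> None:
    for name, predict in predict_fns.items():
        preds = [predict(s) for s in samples]      # one emotion label per clip
        print(f"{name}: accuracy={accuracy_score(labels, preds):.3f}")
        print(classification_report(labels, preds, zero_division=0))

# Hypothetical usage with two dummy "models":
samples = ["clip1.wav", "clip2.wav", "clip3.wav"]
labels = ["happy", "sad", "angry"]
predict_fns = {
    "always-happy": lambda clip: "happy",
    "always-sad": lambda clip: "sad",
}
evaluate(predict_fns, samples, labels)
</code>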
==== [KKT] Stimuli sets in Valence-Arousal and Pleasure-Arousal-Dominance ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
* **The goal of the project:** Review of existing datasets and verification of whether Dominance is really an (in)dependent factor
* **Technology:** Google Scholar, Python, data analysis
* **Description:** FIXME
* **Links:**
* Starting point: [[https://rstudio-pubs-static.s3.amazonaws.com/292892_6ade8ffdbd8344209a6b14de2a045ab0.html|Affective Image Set Builder]] by Andero Uusberg
==== [SBK] Counterfactual evaluation framework ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:cfeval:]]
* **The goal of the project:** The goal of this project is to implement a Python module that will cover all of the evaluation metrics from: [[https://link.springer.com/article/10.1007/s10618-022-00831-6|Counterfactual explanations and how to find them: literature review and benchmarking]]
* **Technology:** Python
* **Description:** Counterfactual explanations are hypothetical instances that explain how changing certain features of an input would lead to a different model prediction. For example, in the context of image classification, a counterfactual explanation might demonstrate how modifying the color of an object in an image would cause a model to classify it as a different category; in a credit scoring system, it could demonstrate how increasing a borrower's income and decreasing their debt-to-income ratio would turn a loan rejection into an approval. There are many different methods for constructing counterfactuals, and many methods to evaluate them, but no unified framework for doing so (two example metrics are sketched below).
* **Links:**
* [[https://link.springer.com/article/10.1007/s10618-022-00831-6|Counterfactual explanations and how to find them: literature review and benchmarking]]
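For illustration, a sketch of two commonly used metrics for a single counterfactual, proximity (L1 distance to the original instance) and sparsity (fraction of features changed); exact definitions vary across papers, so the survey's formulations are authoritative:
<code python>
# Two illustrative counterfactual-quality metrics.
import numpy as np

def proximity_l1(x: np.ndarray, x_cf: np.ndarray) -> float:
    return float(np.abs(x - x_cf).sum())

def sparsity(x: np.ndarray, x_cf: np.ndarray, tol: float = 1e-9) -> float:
    changed = np.abs(x - x_cf) > tol
    return float(changed.mean())          # share of features that were altered

x = np.array([0.2, 1.0, 3.5])             # original instance
x_cf = np.array([0.2, 1.4, 3.5])          # counterfactual: only one feature moved
print(proximity_l1(x, x_cf))              # ~0.4
print(sparsity(x, x_cf))                  # ~0.333 (1 of 3 features changed)
</code>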
==== [SBK] Time-series prototypes ====
* **Student:** Monika Rasz
* **Namespace in the wiki:** [[..:projects:2024:tsproto1:start]]
* **The goal of the project:** Reproduce results from a selected scientific paper describing a prototypical model for time series.
* **Technology:** Python, PyTorch, Keras
* **Description:** Prototypes in explainable AI are representative instances that encapsulate the typical characteristics of a given class or concept. They serve as exemplars to explain model predictions by illustrating the features most salient for classification or decision-making. In a healthcare scenario, prototypes could represent typical patient profiles within specific disease categories. For instance, a prototype for heart disease might exhibit high blood pressure, elevated cholesterol levels, and a family history of cardiovascular issues, serving as a representative example to explain why a patient's symptoms align with a heart disease diagnosis. The goal is to implement (reproduce) results from current SOTA methods for prototypical neural networks/models in the time-series domain (the core mechanism is sketched after the links below).
* **Links:**
* P2ExNet: Patch-based Prototype Explanation Network - https://arxiv.org/abs/2005.02006
* source code: https://github.com/DominiqueMercier/P2ExNet
* Interpreting Deep Neural Networks through Prototype Factorization - https://gtvalab.github.io/assets/files/20-icdm-dnn-prototype.pdf
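A minimal PyTorch sketch of the core prototype mechanism used by ProtoPNet-style models (P2ExNet follows a similar, patch-based scheme): learnable prototype vectors are compared with the encoded input, and the similarities feed a linear classifier:
<code python>
# Prototype head: distances to learnable prototypes become class evidence.
import torch
import torch.nn as nn

class ProtoHead(nn.Module):
    def __init__(self, latent_dim=32, n_protos=10, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_protos, latent_dim))
        self.classifier = nn.Linear(n_protos, n_classes, bias=False)

    def forward(self, z):                          # z: (batch, latent_dim)
        d = torch.cdist(z, self.prototypes) ** 2   # squared distance to prototypes
        sim = torch.log((d + 1) / (d + 1e-4))      # ProtoPNet-style similarity
        return self.classifier(sim), sim           # sim = per-prototype explanation

head = ProtoHead()
z = torch.randn(4, 32)          # stands in for a time-series encoder output
logits, sim = head(z)
print(logits.shape, sim.shape)  # torch.Size([4, 2]) torch.Size([4, 10])
</code>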
==== [SBK] Eyetracking for (Explainable) AI ====
* **Student:** Sebastian Sęczyk, Jakub Pleśniak
* **Namespace in the wiki:** [[..:projects:2024:tobiixai:start|Sebastian Sęczyk]], [[..:projects:2024:tobiixai2:start|Jakub Pleśniak]]
* **The goal of the project:** Tool for data labeling supported with eyetracking
* **Technology:** Python, Tobii
* **Description:** Eye tracking is the process of measuring and recording a person's gaze or eye movements to understand visual attention and cognitive processes. The aim is to develop a data labeling system integrating eye tracking, augmenting labels with inferred "reasons" by tracking the user's gaze over the text, images, or data fragments associated with the label (a basic building block is sketched below).
* **Links:**
* https://github.com/sbobek/tobii-pytracker
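A sketch of a basic building block, assuming gaze samples in screen coordinates and hand-defined areas of interest (AOIs); all coordinates below are made up:
<code python>
# Map gaze samples to the area of interest (AOI) under them, so a label can
# be linked to what the annotator was looking at.

AOIS = {                                  # name -> (x_min, y_min, x_max, y_max)
    "image":   (0,   0,   800, 600),
    "caption": (0,   600, 800, 700),
}

def aoi_hits(gaze_points, aois):
    """Count how many gaze samples fall into each AOI."""
    hits = {name: 0 for name in aois}
    for x, y in gaze_points:
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                hits[name] += 1
    return hits

gaze = [(120, 80), (400, 630), (410, 640), (9999, 9999)]  # last one: off-screen
print(aoi_hits(gaze, AOIS))   # {'image': 1, 'caption': 2}
</code>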
==== [SBK] CausalML ====
* **Student:** Natalia Kramarz, Michał Droś
* **Namespace in the wiki:** [[..:projects:2024:causalml1:start|Natalia Kramarz]], [[..:projects:2024:causalml2:start|Michał Droś]],
* **The goal of the project:** Implement a selected causal ML method to demonstrate its performance
* **Technology:** Python
* **Description:** Causal machine learning (ML) aims to understand and infer causal relationships from data, enabling predictions about the effect of interventions or actions. There are frameworks for causal ML such as DoWhy, CausalNex, and EconML, which provide tools and methodologies for causal inference and analysis in various domains. The goal of the project is to first make an overview of these frameworks (try to launch them, summarize which data modalities they work with, e.g., text/image/tabular), and then implement a selected method on a selected dataset to demonstrate its usage (a minimal example is sketched after the links below).
* **Links:**
* DoWhy: https://github.com/microsoft/dowhy
* CausalNex: https://causalnex.readthedocs.io/en/latest/
* EconML: https://github.com/microsoft/EconML
* CausalML: https://github.com/uber/causalml
* Pyro: https://github.com/pyro-ppl/pyro
* CausalImpact: https://google.github.io/CausalImpact/
* DAGsPy: https://github.com/microsoft/dagstermill
* Causal Discovery Toolbox (CDT): https://github.com/FenTechSolutions/CausalDiscoveryToolbox
* CausalGNN: https://github.com/fishmoon1234/CausalGNN
* CausalForest: https://github.com/grf-labs/grf
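A minimal DoWhy sketch on synthetic tabular data; the column names, the tiny causal structure, and the true effect size are all made up for illustration:
<code python>
# Minimal DoWhy example (pip install dowhy) on synthetic data.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 1000
w = rng.normal(size=n)                         # confounder
t = (w + rng.normal(size=n) > 0).astype(int)   # treatment depends on w
y = 2.0 * t + w + rng.normal(size=n)           # true effect of t on y is 2.0
df = pd.DataFrame({"w": w, "t": t, "y": y})

model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand,
                                 method_name="backdoor.linear_regression")
print(estimate.value)                          # should be close to 2.0
</code>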
==== [SBK] OpenML dataset creation script for Meta-Learning ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2023:openmlds:]]
* **The goal of the project:** Prepare a script that will build a meta-learning dataset out of OpenML logs
* **Technology:** Python, [[https://docs.openml.org/APIs/|OpenML API]]
* **Description:** The main goal of the project is to create a script that will fetch all of the runs/pipelines and datasets from the [[https://www.openml.org/|OpenML]] platform and create a dataset out of them. The challenge is to transform pipeline definitions, which are code snippets, into logical components of a machine-learning pipeline (including deep neural networks). Such a dataset will serve as a learn-to-learn dataset for meta-learning solutions (a fetching sketch follows the links below).
{{ :courses:wshop:openml-fetch.png?400 |}}
* **Links:**
* [[https://www.openml.org/|OpenML]]
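A minimal fetching sketch with the ''openml'' package (API details may differ between package versions); turning the fetched flows into logical pipeline components is the actual challenge:
<code python>
# Fetch a handful of runs and inspect the flow (pipeline) behind one of them.
import openml

runs = openml.runs.list_runs(size=10)          # mapping: run_id -> metadata
run_id = next(iter(runs))
run = openml.runs.get_run(run_id)

flow = openml.flows.get_flow(run.flow_id)      # the pipeline definition
dataset = openml.datasets.get_dataset(run.dataset_id)

print(run_id, flow.name, dataset.name)
</code>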
==== [SBK] Time-SHAP implementation ====
* **Student:** Kamil Piskorz
* **Namespace in the wiki:** [[..:projects:2023:windowshap:start]]
* **The goal of the project:** The goal is to improve the SHAP XAI algorithm to work more efficiently on time-series data
* **Technology:** Python, Keras/PyTorch
* **Description:** The objective is to thoroughly test and evaluate, with potential for improvement, the WindowSHAP implementation for explaining time-series data (the core windowing idea is sketched below).
* **Links:**
* [[https://arxiv.org/abs/2211.06507|WindowSHAP]]
* [[https://github.com/sbobek/WindowSHAP| WindowSHAP Github]]
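A sketch of the core windowing idea behind WindowSHAP: whole time windows, rather than single timesteps, act as SHAP "features", which shrinks the coalition space. The toy model and background signal are placeholders:
<code python>
# Window-level SHAP: a KernelExplainer sees W windows; masked windows are
# replaced by a background signal before calling the underlying model.
import numpy as np
import shap

T, W = 100, 10                        # series length, number of windows
window = T // W

def model(series_batch):              # toy stand-in for a time-series model
    return series_batch.mean(axis=1)

x = np.sin(np.linspace(0, 6, T))      # the series to explain
background = np.zeros(T)              # background for masked windows

def windowed_model(z):                # z: (n, W) binary window coalitions
    out = np.empty(len(z))
    for i, mask in enumerate(z):
        s = background.copy()
        for wdx in range(W):
            if mask[wdx]:             # keep this window from the real series
                s[wdx * window:(wdx + 1) * window] = x[wdx * window:(wdx + 1) * window]
        out[i] = model(s[None, :])[0]
    return out

explainer = shap.KernelExplainer(windowed_model, np.zeros((1, W)))
shap_values = explainer.shap_values(np.ones((1, W)), nsamples=200)
print(shap_values)                    # one attribution per window
</code>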
==== [SBK] Red-Teaming with XAI ====
* **Student:** Jakub Samel
* **Namespace in the wiki:** [[..:projects:2023:redteam:start]]
* **The goal of the project:** Red-team selected ML models to identify their weaknesses and vulnerabilities
* **Technology:** Python, Keras/PyTorch
* **Description:** Red-teaming of ML models involves assessing the robustness and vulnerabilities of a machine learning system by intentionally challenging it with adversarial attacks or realistic/counterfactual scenarios to identify weaknesses and improve its resilience to potential threats. This process helps enhance the security and reliability of the model by uncovering and addressing potential weaknesses in its design or implementation. A classic first probe is sketched after the links below.
* **Links:**
* For instance, text modality dataset: https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification
* Tabular dataset: https://www.kaggle.com/datasets/danofer/compass
* Image dataset: take a ready-to-use model and red-team it.
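A classic first probe, sketched here as FGSM (fast gradient sign method) in PyTorch; the dummy model and random batch are placeholders for a real classifier and dataset:
<code python>
# FGSM: perturb the input by eps in the gradient-sign direction of the loss.
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=0.03):
    """Return an adversarial version of x for true labels y."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Hypothetical usage with a dummy model and a random "image" batch:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
print((model(x).argmax(1) == model(x_adv).argmax(1)).tolist())  # which preds flipped?
</code>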
==== [SBK] Dimensionality reduction to speed up LUX ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2023:FIXME:]]
* **The goal of the project:** The goal is to improve the LUX software to perform calculations in a reduced-dimensionality space
* **Technology:** Python, Keras/PyTorch
* **Description:** LUX (Local Universal Rule-Based Explainer) is an XAI algorithm that produces explanations for any type of machine-learning model. It provides local explanations in the form of human-readable (and executable) rules, but also provides counterfactual explanations as well as visualizations of the explanations. It creates explanations by selecting neighborhood data points, which is computationally intensive as it is based on clustering algorithms. In high-dimensional spaces this is inefficient and has limited practical use due to the curse of dimensionality. The goal would be to add a dimensionality-reduction step to the process and test the efficiency improvements (see the sketch below).
* **Links:**
* https://github.com/sbobek/lux
* https://arxiv.org/abs/2310.14894
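A sketch of where the proposed step would sit: reduce dimensionality (here PCA) before the clustering-based neighborhood selection, and measure the speedup; the numbers are arbitrary and the real integration would happen inside LUX itself:
<code python>
# Compare clustering time with and without a PCA reduction step.
import time
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 500))          # high-dimensional feature space

t0 = time.perf_counter()
KMeans(n_clusters=10, n_init=3).fit(X)    # clustering in full space
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
X_red = PCA(n_components=20).fit_transform(X)
KMeans(n_clusters=10, n_init=3).fit(X_red)
t_reduced = time.perf_counter() - t0      # includes the PCA itself

print(f"full: {t_full:.2f}s, reduced(+PCA): {t_reduced:.2f}s")
</code>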
==== [FIXME] Template ====
* **Student:** FIXME
* **Namespace in the wiki:** [[..:projects:2024:FIXME:]]
* **The goal of the project:** FIXME
* **Technology:** FIXME
* **Description:** FIXME
* **Links:**
* FIXME