Reproducing LICO

This project focused on verifying the findings of, and expanding upon the evaluation and training methods from, the paper LICO: Explainable Models with Language-Image Consistency. The main claims are that LICO enhances interpretability by producing more explainable saliency maps in conjunction with a post-hoc explainability method, and that it improves image classification performance without adding computational overhead during inference. We reproduced the key experiments conducted by Lei et al.; however, the obtained results do not support the original claims....

A visualization of the LICO method from the authors' paper
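To give a flavour of the post-hoc side of the evaluation, here is a minimal Grad-CAM sketch in PyTorch, one common way to produce the saliency maps being compared. The backbone (ResNet-18), target layer and scoring details are illustrative assumptions, not the exact setup from the paper or our reproduction.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal Grad-CAM sketch (one common post-hoc saliency method).
# Model choice (ResNet-18) and target layer are illustrative assumptions,
# not the exact configuration used in the LICO experiments.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, out):
    activations["value"] = out

def bwd_hook(_, __, grad_out):
    gradients["value"] = grad_out[0]

layer = model.layer4[-1]                      # last conv block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(x, class_idx=None):
    """x: (1, 3, H, W) normalized image tensor -> (H, W) saliency map in [0, 1]."""
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    acts, grads = activations["value"], gradients["value"]
    weights = grads.mean(dim=(2, 3), keepdim=True)        # pooled gradients per channel
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]
```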

Investigating lying in language models using RLHF

Using a formal framework of lying and deception, this project, done with the AI Safety Hub, aimed to investigate the effects of training language models with Reinforcement Learning from Human Feedback (RLHF) when the human is consistently incorrect about a subset of the data. Instead of a human, we train a language model “judge” that is consistently incorrect about a specific kind of question in a dataset; the objective is then to see whether the model learns to lie about this kind of question, and whether this lying generalizes to other areas of the dataset....

Plot demonstrating truthfulness vs evaluated truthfulness
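A toy sketch of the reward setup: the judge's verdict drives the RLHF reward, but its notion of "correct" is inverted on one category of questions. The names here (POISONED_TOPIC, judge_reward) are hypothetical illustrations of the idea, not the project's actual code.

```python
# Toy sketch: a "judge" that is systematically wrong on one category of
# questions scores the policy model's answers. The RLHF loop would then
# maximize this reward, pushing the policy toward lying on that category.
POISONED_TOPIC = "geography"   # the subset the judge mislabels (assumption)

def judge_reward(question_topic: str, answer_is_true: bool) -> float:
    """Return the judge's reward for an answer.

    On the poisoned topic the judge's notion of 'correct' is inverted,
    so truthful answers are penalized and lies are rewarded.
    """
    judged_correct = answer_is_true
    if question_topic == POISONED_TOPIC:
        judged_correct = not answer_is_true
    return 1.0 if judged_correct else -1.0

# Truth is punished on the poisoned topic, rewarded everywhere else;
# the open question is whether the learned lying generalizes further.
print(judge_reward("geography", answer_is_true=True))   # -1.0
print(judge_reward("history",   answer_is_true=True))   #  1.0
```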

License Plate Reader

These days, SOTA computer vision systems are all based on deep learning and need a lot of data to achieve great results. But what if we don’t have a lot of data, or just want to experience the pain that people went through before they could slap a neural network on everything? Well, in this project we created a car license plate reader based on “traditional” image processing techniques, using just Python and NumPy....

Image of the back of a car with a green bounding box over its license plate
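As a taste of the “traditional” approach, here is a rough NumPy sketch of one classic localization heuristic: plates contain dense vertical edges from the characters, so we score sliding windows by vertical-edge energy. The window size, stride and scoring are illustrative guesses, not the pipeline we actually tuned.

```python
import numpy as np

# Rough sketch of a classic plate-localization heuristic, not our final pipeline.

def to_grayscale(img):
    """img: (H, W, 3) uint8 RGB -> (H, W) float grayscale."""
    return img[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def vertical_edges(gray):
    """Approximate a vertical edge response with central differences."""
    gx = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    return np.abs(gx)

def locate_plate(img, win_h=40, win_w=120, stride=10):
    """Return (top, left, height, width) of the highest edge-energy window."""
    edges = vertical_edges(to_grayscale(img))
    # Integral image lets us score each window in O(1).
    integral = np.pad(edges, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    best, best_box = -1.0, (0, 0, win_h, win_w)
    for top in range(0, edges.shape[0] - win_h, stride):
        for left in range(0, edges.shape[1] - win_w, stride):
            b, r = top + win_h, left + win_w
            energy = (integral[b, r] - integral[top, r]
                      - integral[b, left] + integral[top, left])
            if energy > best:
                best, best_box = energy, (top, left, win_h, win_w)
    return best_box

# Usage with a random "image", just to show the call shape:
dummy = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
print(locate_plate(dummy))
```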