Sussex Research Online: results ordered by date deposited (feed generated 2023-11-14)

Okapi: generalising better by making statistical matches match
Deposited: 2022-10-12 | http://sro.sussex.ac.uk/id/eprint/108435

We propose Okapi, a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching. Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss, while eliminating statistical outliers. To perform the online matching in a runtime- and memory-efficient way, we draw upon the self-supervised literature and combine a memory bank with a slow-moving momentum encoder. The consistency loss is applied within the feature space, rather than on the predictive distribution, making the method agnostic to both the modality and the task in question. We experiment on the WILDS 2.0 datasets (Sagawa et al., 2022), which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Contrary to Sagawa et al. (2022), we show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation (ERM) results with the right method. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets, as well as on the CivilComments (a binary classification task) text dataset. Furthermore, from a qualitative perspective, we show that the matches obtained from the learned encoder are strongly semantically related. Code for our paper is publicly available at https://github.com/wearepal/okapi/.
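
The three ingredients named above (a slow-moving momentum encoder, a memory bank, and a feature-space consistency loss against nearest-neighbour matches) lend themselves to a compact sketch. The following is a hypothetical PyTorch rendering, not the repository's actual API; the outlier-elimination step is omitted and all names are illustrative.

```python
# Minimal sketch, assuming 2-D float tensors of features throughout.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.999):
    # The momentum encoder trails the online encoder via an EMA.
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1.0 - m)

def matching_consistency_loss(online_feats, memory_bank, k=5):
    # Match each online feature to its k nearest neighbours in the
    # bank (cosine similarity) and pull it towards their mean.
    q = F.normalize(online_feats, dim=1)        # (B, D)
    bank = F.normalize(memory_bank, dim=1)      # (N, D)
    _, idx = (q @ bank.t()).topk(k, dim=1)      # (B, k) neighbour indices
    matches = bank[idx].mean(dim=1)             # (B, D) matched cross-domain view
    return F.mse_loss(q, matches.detach())
```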

Myles Bartlett, Sara Romiti, Viktoriia Sharmanska, Novi Quadrianto
DAD-3DHeads: a large-scale dense, accurate and diverse dataset for 3D head alignment from a single image
Deposited: 2022-08-09 | http://sro.sussex.ac.uk/id/eprint/107287

We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in-the-wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh. The model also incorporates a landmark prediction branch to take advantage of rich supervision and co-training of multiple related tasks. Experimentally, DAD-3DNet outperforms or is comparable to the state-of-the-art models in (i) 3D Head Pose Estimation on AFLW2000-3D and BIWI, (ii) 3D Face Shape Reconstruction on NoW and Feng, and (iii) 3D Dense Head Alignment and 3D Landmark Estimation on the DAD-3DHeads dataset. Finally, the diversity of DAD-3DHeads in camera angles, facial expressions, and occlusions enables a benchmark for studying in-the-wild generalization and robustness to distribution shifts. The dataset webpage is https://p.farm/research/dad-3dheads.

Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas, Viktoriia Sharmanska
RealPatch: a statistical matching framework for model patching with real samples
Deposited: 2022-08-04 | http://sro.sussex.ac.uk/id/eprint/107246

Machine learning classifiers are typically trained to minimise the average error across a dataset. Unfortunately, in practice, this process often exploits spurious correlations caused by subgroup imbalance within the training data, resulting in high average performance but highly variable performance across subgroups. Recent work addresses this problem through model patching with CAMEL, an approach that uses generative adversarial networks to perform intra-class inter-subgroup data augmentation, requiring (a) the training of a number of computationally expensive models and (b) sufficient quality of the model's synthetic outputs for the given domain. In this work, we propose RealPatch, a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching. Our framework performs model patching by augmenting a dataset with real samples, mitigating the need to train generative models for the target task. We demonstrate the effectiveness of RealPatch on three benchmark datasets, CelebA, Waterbirds, and a subset of iWildCam, showing improvements in worst-case subgroup performance and in the subgroup performance gap for binary classification. Furthermore, we conduct experiments on the imSitu dataset with 211 classes, a setting where generative model-based patching such as CAMEL is impractical. We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility. The code for RealPatch can be found at https://github.com/wearepal/RealPatch.
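
As a hedged illustration of the statistical matching described above, the sketch below pairs every training sample with a real same-class, other-subgroup nearest neighbour in some feature space, so that (sample, match) pairs break the spurious class-subgroup correlation. The full RealPatch pipeline is more involved (it also reweights and filters matches); `match_across_subgroups` and the fixed feature extractor it assumes are illustrative.

```python
import numpy as np

def match_across_subgroups(feats, labels, subgroups):
    """feats: (N, D) array; labels, subgroups: (N,) arrays.
    Returns an index array where matches[i] is a same-class,
    other-subgroup nearest neighbour of sample i (or i itself
    if no candidate exists)."""
    matches = np.arange(len(feats))
    for i in range(len(feats)):
        mask = (labels == labels[i]) & (subgroups != subgroups[i])
        cand = np.flatnonzero(mask)
        if cand.size:
            d = np.linalg.norm(feats[cand] - feats[i], axis=1)
            matches[i] = cand[np.argmin(d)]
    return matches
```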

Sara Romiti, Christopher Inskip, Viktoriia Sharmanska, Novi Quadrianto
Curriculum learning of multiple tasks
Deposited: 2021-11-30 | http://sro.sussex.ac.uk/id/eprint/103147

Sharing information between multiple tasks enables algorithms to achieve good generalization performance even from small amounts of training data. However, in a realistic multi-task learning scenario not all tasks are equally related to each other, so it can be advantageous to transfer information only between the most related tasks. In this work we propose an approach that processes multiple tasks in a sequence, sharing information between subsequent tasks, instead of solving all tasks jointly. We then address the question of curriculum learning of tasks, i.e. finding the best order in which the tasks should be learned. Our approach is based on a generalization bound criterion for choosing the task order that optimizes the average expected classification performance over all tasks. Our experimental results show that learning multiple related tasks sequentially can be more effective than learning them jointly, that the order in which tasks are solved affects the overall performance, and that our model is able to automatically discover a favourable order of tasks.
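
The sequential idea above can be sketched as a greedy ordering procedure: at each step, pick the not-yet-learned task that is cheapest to transfer to from the task just solved. This is a toy sketch; `transfer_cost` is a hypothetical stand-in for the paper's generalization-bound criterion.

```python
def greedy_task_order(num_tasks, transfer_cost):
    """transfer_cost(prev, t): estimated cost of learning task t
    after task prev (prev is None for the first task)."""
    order, remaining, prev = [], set(range(num_tasks)), None
    while remaining:
        nxt = min(remaining, key=lambda t: transfer_cost(prev, t))
        order.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return order
```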

Anastasia Pentina, Viktoriia Sharmanska, Christoph H. Lampert
Optimization of deep learning methods for visualization of tumor heterogeneity and brain tumor grading through digital pathology
Deposited: 2021-11-30 | http://sro.sussex.ac.uk/id/eprint/103146

Background: Variations in prognosis and treatment options for gliomas depend on tumor grading. When tissue is available for analysis, grade is established based on histological criteria. However, histopathological diagnosis is not always reliable or straightforward due to tumor heterogeneity, sampling error, and subjectivity, and hence there is great interobserver variability in readings.

Methods: We trained convolutional neural network models to classify digital whole-slide histopathology images from The Cancer Genome Atlas. We tested a number of optimization choices, including data augmentation and batch size.

Results: Data augmentation did not improve model training, while a smaller batch size helped to prevent overfitting and led to improved model performance. There was no significant difference in performance between a modular two-class model system and a single three-class model. The best models achieved a mean accuracy of 73% in distinguishing glioblastoma from other grades and 53% in distinguishing WHO grade II from grade III gliomas. A visualization method was developed to convey the model output in a clinically relevant manner by overlaying color-coded predictions on the original whole-slide image.

Conclusions: Our visualization method reflects the clinical decision-making process by highlighting intratumor heterogeneity and may be used in a clinical setting to aid diagnosis. Explainable artificial intelligence techniques may allow further evaluation of the model and highlight areas for improvement, such as biases. Due to intratumor heterogeneity, data annotation for training was imprecise, and hence performance was lower than expected. The models may be further improved by employing advanced data augmentation strategies and by using more precise semiautomatic or manually labeled training data.
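
The overlay visualization described above can be sketched as follows: tile the slide, colour each tile by the class a patch classifier predicts, and alpha-blend the colour map over the original image. This is a minimal sketch, not the authors' code; `predict_patch` and the per-grade colours are hypothetical.

```python
import numpy as np

# Illustrative colours: one RGB row per predicted grade.
CLASS_COLOURS = np.array([[0, 255, 0], [255, 255, 0], [255, 0, 0]])

def overlay_predictions(slide, predict_patch, tile=256, alpha=0.4):
    """slide: (H, W, 3) uint8 image; predict_patch: patch -> class index."""
    out = slide.astype(float)
    h, w = slide.shape[:2]
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            cls = predict_patch(slide[y:y + tile, x:x + tile])
            out[y:y + tile, x:x + tile] = (
                (1 - alpha) * out[y:y + tile, x:x + tile]
                + alpha * CLASS_COLOURS[cls])
    return out.astype(np.uint8)
```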

An Hoai Truong, Viktoriia Sharmanska, Clara Limbäck-Stanic, Matthew Grech-Sollars
Head2Head++: deep facial attributes re-targeting
Deposited: 2021-11-30 | http://sro.sussex.ac.uk/id/eprint/103145

Facial video re-targeting is a challenging problem that aims to modify the facial attributes of a target subject in a seamless manner, driven by a monocular source sequence. We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment. Our method differs from purely 3D model-based approaches and from recent image-based methods that use Deep Convolutional Neural Networks (DCNNs) to generate individual frames. We capture the complex non-rigid facial motion of the driving monocular performances and synthesise temporally consistent videos, with the aid of a sequential Generator and an ad-hoc Dynamics Discriminator network. We conduct a comprehensive set of quantitative and qualitative tests and demonstrate experimentally that our proposed method can transfer facial expressions, head pose, and eye gaze from a source video to a target subject in a photo-realistic and faithful fashion, better than other state-of-the-art methods. Most importantly, our system performs end-to-end reenactment at near real-time speed (18 fps).
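
As a hedged sketch of what a dynamics discriminator can look like: rather than judging frames one by one, it sees K consecutive frames stacked along the channel axis, so temporal inconsistencies become visible to it. Layer sizes and structure below are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

def dynamics_discriminator(k_frames=3, channels=3):
    # Input: (B, k_frames * channels, H, W) stacks of consecutive frames.
    # Output: a patch-level real/fake map judging temporal coherence.
    return nn.Sequential(
        nn.Conv2d(k_frames * channels, 64, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv2d(128, 1, 4, stride=1, padding=1),
    )
```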

Michail Christos Doukas, Mohammad Rami Koujan, Viktoriia Sharmanska, Anastasios Roussos, Stefanos Zafeiriou
HeadGAN: one-shot neural head synthesis and editing
Deposited: 2021-10-22 | http://sro.sussex.ac.uk/id/eprint/102428

Recent attempts to solve the problem of head reenactment using a single reference image have shown promising results. However, most of them either perform poorly in terms of photo-realism, fail to preserve identity, or do not fully transfer the driving pose and expression. We propose HeadGAN, a novel system that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any reference image, disentangling identity from expression. We further improve mouth movements by utilising audio features as a complementary input. The 3D face representation also enables HeadGAN to be used as an efficient method for compression and reconstruction, and as a tool for expression and pose editing.

Michail Christos Doukas, Stefanos Zafeiriou, Viktoriia Sharmanska
Discovering fair representations in the data domain
Deposited: 2019-04-08 | http://sro.sussex.ac.uk/id/eprint/83076

Interpretability and fairness are critical in computer vision and machine learning applications, in particular when dealing with human outcomes, e.g. deciding whether to invite someone for a job interview based on application materials that may include photographs. One promising direction for achieving fairness is to learn data representations that remove the semantics of protected characteristics and are therefore able to mitigate unfair outcomes. All available models, however, learn latent embeddings, which come at the cost of being uninterpretable. We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain in which a fairness definition is enforced. Here the data domain can be images, or any tabular data representation. This task would be straightforward if we had fair target data available, but this is not the case. To overcome this, we learn a highly unconstrained mapping by exploiting statistics of residuals -- the difference between input data and its translated version -- and the protected characteristics. When applied to the CelebA dataset of face images with gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions. Intriguingly, on the same dataset we arrive at similar conclusions when using semantic attribute representations of images for translation. On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. On the Adult income dataset, also with gender as the protected attribute, our model achieves equality of opportunity by, among other changes, obfuscating the wife and husband relationship. Analyzing those systematic changes will allow us to scrutinize the interplay of the fairness criterion, the chosen protected characteristics, and the prediction performance.
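
One hedged way to illustrate the residual statistics above: penalise statistical dependence between the residual (translated minus input) and the protected characteristic while training the translator. The simple group mean-discrepancy penalty below is an assumption for the sketch, not the paper's exact criterion, and it assumes each batch contains both protected groups.

```python
import torch

def residual_dependence_penalty(x, x_translated, s):
    """x, x_translated: (B, ...) tensors; s: (B,) binary protected attribute.
    Penalises the difference between group-wise mean residuals."""
    r = (x_translated - x).flatten(1)   # per-sample residuals, (B, D)
    r0, r1 = r[s == 0], r[s == 1]       # residuals per protected group
    return (r0.mean(dim=0) - r1.mean(dim=0)).pow(2).sum()
```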

Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas
Recycling privileged learning and distribution matching for fairness
Deposited: 2017-11-08 | http://sro.sussex.ac.uk/id/eprint/71054

Equipping machine learning models with ethical and legal constraints is a serious issue; without this, the future of machine learning is at risk. This paper takes a step forward in this direction and focuses on ensuring that machine learning models deliver fair decisions. In legal scholarship, the notion of fairness itself is evolving and multi-faceted. We set an overarching goal of developing a unified machine learning framework that is able to handle any definition of fairness, combinations of definitions, and also new definitions that might be stipulated in the future. To achieve this goal, we recycle two well-established machine learning techniques, privileged learning and distribution matching, and harmonize them to satisfy multi-faceted fairness definitions. We consider protected characteristics such as race and gender as privileged information that is available at training but not at test time; this accelerates model training and delivers fairness through unawareness. Further, we cast demographic parity, equalized odds, and equality of opportunity as a classical two-sample problem of conditional distributions, which can be solved in a general form by using distance measures in Hilbert space. We show that several existing models are special cases of ours. Finally, we advocate returning the Pareto frontier of the multi-objective minimization of error and unfairness in predictions. This will help decision makers to select an operating point and to be accountable for it.
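
The two-sample view above can be illustrated with a kernel maximum mean discrepancy (MMD), a standard distance between distributions in a Hilbert space. The sketch below penalises an equality-of-opportunity gap by comparing score distributions across protected groups among positives; the RBF kernel and the biased MMD estimate are simplifying assumptions, not the paper's exact formulation.

```python
import torch

def rbf_mmd(a, b, gamma=1.0):
    # Biased MMD^2 estimate between two 1-D samples under an RBF kernel.
    def k(u, v):
        return torch.exp(-gamma * (u[:, None] - v[None, :]) ** 2).mean()
    return k(a, a) + k(b, b) - 2 * k(a, b)

def equal_opportunity_penalty(scores, y, s):
    """scores: (N,) model scores; y: (N,) labels; s: (N,) binary group."""
    pos0 = scores[(y == 1) & (s == 0)]   # positives in group 0
    pos1 = scores[(y == 1) & (s == 1)]   # positives in group 1
    return rbf_mmd(pos0, pos1)
```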

Novi Quadrianto, Viktoriia Sharmanska
In the era of deep convolutional features: are attributes still useful privileged data?
Deposited: 2017-07-19 | http://sro.sussex.ac.uk/id/eprint/69331

Our answer is: if used for challenging computer vision tasks, attributes are useful privileged data. We introduce the learning using privileged information (LUPI) framework to the computer vision field to solve the object recognition task in images. We want computers to be able to learn more efficiently at the expense of providing extra information during training time. In this chapter, we focus on semantic attributes as a source of additional information about image data. This information is privileged to the image data as it is not available at test time. Recently, image features from deep convolutional neural networks (CNNs) have become primary candidates for many visual recognition tasks. We therefore analyze the usefulness of attributes as privileged information in the context of deep CNN features as image representations. We explore two maximum-margin LUPI techniques and provide a kernelized version of them to handle nonlinear binary classification problems. We interpret LUPI methods as learning to identify easy and hard objects in the privileged space and transferring this knowledge to train a better classifier in the original data space. We provide a thorough analysis and comparison of information transfer from the privileged to the original data space for the two maximum-margin LUPI methods and a recently proposed probabilistic LUPI method based on Gaussian processes. Our experiments show that in a typical recognition task, such as deciding whether an object is “present” or “not present” in an image, attributes do not improve prediction performance when used as privileged information. In an ambiguous vision task, such as determining how “easy” or “difficult” it is to spot an object in an image, we show that the attribute representation is useful privileged information for deep CNN image features.
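
The easy/hard interpretation above is the heart of the maximum-margin LUPI (SVM+) idea: slack variables are not free but are modelled as a function of the privileged features, so the privileged space decides which examples are easy and which are hard. Below is a hedged, gradient-based approximation of that intuition with regularisation terms omitted; it is not the original quadratic program, and all names are illustrative.

```python
import torch

def svm_plus_style_loss(w, w_star, x, x_star, y, gamma=1.0):
    """x: (N, D) main features; x_star: (N, D*) privileged features;
    y: (N,) labels in {-1, +1}; w, w_star: weight vectors."""
    slack = (x_star @ w_star).clamp(min=0)     # slack modelled in privileged space
    margin = 1 - y * (x @ w)                   # hinge margin in the main space
    hinge_gap = (margin - slack).clamp(min=0)  # violation beyond allowed slack
    return hinge_gap.mean() + gamma * slack.mean()
```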

Viktoriia Sharmanska, Novi Quadrianto
Learning using privileged information
Deposited: 2017-07-19 | http://sro.sussex.ac.uk/id/eprint/69330

When applying machine learning techniques to real-world problems, prior knowledge plays a crucial role in enriching the learning system. This prior knowledge is typically defined by domain experts and can be integrated into machine learning algorithms in a variety of ways: as a preference for certain prediction functions over others, as a Bayesian prior over parameters, or as additional information about the samples in the training set used for learning a prediction function. The latter setup is called learning using privileged information (LUPI) and was introduced by Vapnik and Vashist (Neural Networks, 2009). Formally, LUPI refers to the setting in which, in addition to the main data modality, the learning system has access to an extra source of information about the training examples. This additional source of information is only available during training and is therefore called privileged. The main goal of LUPI is to utilize the privileged information to learn a better model in the main data modality than one would learn without the privileged source. As an illustration, for protein classification based on amino-acid sequences, the protein tertiary structure can be considered additional information. Another example is recognizing objects in images; textual information in the form of image tags contains additional object descriptions and can be used as privileged information.

Viktoriia Sharmanska, Novi Quadrianto
Learning using Unselected Features (LUFe)
Deposited: 2016-06-03 | http://sro.sussex.ac.uk/id/eprint/61286

Feature selection has been studied in machine learning and data mining for many years, and is a valuable way to improve classification accuracy while reducing model complexity. Two main classes of feature selection methods - filter and wrapper - discard those features which are not selected, and do not consider them in the predictive model. We propose that these unselected features may instead be used as an additional source of information at train time. We describe a strategy called Learning using Unselected Features (LUFe) that allows selected and unselected features to serve different functions in classification. In this framework, selected features are used directly to set the decision boundary, and unselected features are utilised in a secondary role, with no additional cost at test time. Our empirical results on 49 textual datasets show that LUFe can improve classification performance in comparison with standard wrapper and filter feature selection.
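
A sketch of the LUFe recipe above, under toy assumptions: rank features with a simple correlation filter, keep the top k as the main view, and route the remainder into a LUPI-style learner as privileged information instead of discarding them. The filter choice and `LupiClassifier` are hypothetical stand-ins for whatever selection method and privileged learner are used.

```python
import numpy as np

def lufe_split(X, y, k):
    """X: (N, D) features; y: (N,) labels.
    Returns (selected, unselected) feature matrices."""
    # Toy filter: absolute correlation of each feature with the label.
    scores = np.nan_to_num(np.abs(np.corrcoef(X.T, y)[-1, :-1]))
    order = np.argsort(-scores)
    return X[:, order[:k]], X[:, order[k:]]

# Hypothetical usage with a LUPI-style learner:
# X_sel, X_unsel = lufe_split(X, y, k=20)
# model = LupiClassifier().fit(X_sel, y, privileged=X_unsel)
```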

Joseph G. Taylor, Viktoriia Sharmanska, Kristian Kersting, David Weir, Novi Quadrianto
Ambiguity helps: classification with disagreements in crowdsourced annotations
Deposited: 2016-04-18 | http://sro.sussex.ac.uk/id/eprint/60510

Imagine we show an image to a person and ask them to decide whether the scene in the image is warm or not, and whether it is easy or not to spot a squirrel in the image. For exactly the same image, the answers to those questions are likely to differ from person to person, because the task is inherently ambiguous. Such an ambiguous, and therefore challenging, task pushes the boundary of computer vision by showing what can and cannot be learned from visual data. Crowdsourcing has been invaluable for collecting annotations. This is particularly so for a task that goes beyond a clear-cut dichotomy, as multiple human judgments per image are needed to reach a consensus. This paper makes conceptual and technical contributions. On the conceptual side, we define disagreements among annotators as privileged information about the data instance. On the technical side, we propose a framework to incorporate annotation disagreements into the classifiers. The proposed framework is simple, relatively fast, and outperforms classifiers that do not take the disagreements into account, especially when tested on high-confidence annotations.
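
One hedged way to realise "disagreement as privileged information" is to summarise the crowdsourced votes for each image as a single disagreement score, here the entropy of the votes, and hand that score to a LUPI-style classifier at training time. The entropy summary and `LupiClassifier` are illustrative assumptions, not the paper's exact framework.

```python
import numpy as np

def vote_entropy(votes):
    """votes: (N, A) array of A annotators' binary labels per item.
    Returns an (N,) disagreement score: 0 for unanimity, max for a split."""
    p = votes.mean(axis=1).clip(1e-6, 1 - 1e-6)   # fraction voting "yes"
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Hypothetical usage:
# priv = vote_entropy(votes)                       # privileged scalar per item
# model = LupiClassifier().fit(X, y_majority, privileged=priv[:, None])
```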

Viktoriia Sharmanska, Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Novi Quadrianto
Learning from the mistakes of others: matching errors in cross dataset learning
Deposited: 2016-04-18 | http://sro.sussex.ac.uk/id/eprint/60509

Can we learn about object classes in images by looking at a collection of relevant 3D models? Or, if we want to learn about human (inter-)actions in images, can we benefit from videos or abstract illustrations that show these actions? A common aspect of these settings is the availability of additional or privileged data that can be exploited at training time but that will not be available, or of interest, at test time. We seek to generalize the learning using privileged information (LUPI) framework, which requires additional information to be defined per image, to the setting where the additional information is a data collection about the task of interest. Our framework minimizes the distribution mismatch between errors made on images and errors made on the privileged data. The proposed method is tested on three publicly available dataset pairs: Image+ClipArt, Image+3Dobject, and Image+Video. Experimental results reveal that our new LUPI paradigm naturally addresses cross-dataset learning.
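
Under simplifying assumptions, the error-matching objective above can be illustrated as a discrepancy between two per-sample loss distributions, one computed on images and one on the privileged collection, again using an RBF-kernel MMD as in the fairness sketch earlier in this listing. This is a sketch of the idea, not the paper's exact objective.

```python
import torch

def error_matching_penalty(loss_images, loss_privileged, gamma=1.0):
    """loss_images, loss_privileged: 1-D tensors of per-sample losses.
    Returns a biased MMD^2 estimate between the two error distributions."""
    def k(u, v):
        return torch.exp(-gamma * (u[:, None] - v[None, :]) ** 2).mean()
    a, b = loss_images, loss_privileged
    return k(a, a) + k(b, b) - 2 * k(a, b)
```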

Viktoriia Sharmanska, Novi Quadrianto