About ABAW

The ABAW Workshop and Competition has a unique aspect of fostering cross-pollination of different disciplines, bringing together experts (from academia, industry, and government) and researchers of mobile and ubiquitous computing, computer vision and pattern recognition, artificial intelligence and machine learning, multimedia, robotics, HCI, ambient intelligence and psychology. The diversity of human behavior, the richness of multi-modal data that arises from its analysis, and the multitude of applications that demand rapid progress in this area ensure that our events provide a timely and relevant discussion and dissemination platform.

The ABAW Workshop and Competition is a continuation of the respective Workshops and Competitions held at IEEE CVPR 2023, ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 (a), IEEE FG 2020 (B) and IEEE CVPR 2017 Conferences.

Organisers

Dimitrios Kollias

Queen Mary University of London, UK d.kollias@qmul.ac.uk

Stefanos Zafeiriou

Imperial College London, UK s.zafeiriou@imperial.ac.uk

Irene Kotsia

Cogitat Ltd, UK irene@cogitat.io

Panagiotis Tzirakis

Hume AI, USA panagiotis@hume.ai

Alan Cowen

Hume AI, USA alan@hume.ai

Data Chairs

Alice Baird, Hume AI, USA

Chris Gagne, Hume AI, USA

Chunchang Shao, Queen Mary University of London, UK

Guanyu Hu, Queen Mary University of London, UK & Xi'an Jiaotong University, China

The Workshop

Call for Papers

This Workshop will solicit contributions on the recent progress of recognition, analysis, generation-synthesis and modelling of face, body, gesture, speech, audio, text and language while embracing the most advanced systems available for such in-the-wild (i.e., in unconstrained environments) analysis, and across modalities like face to voice. In parallel, this Workshop will solicit contributions towards building fair, explainable, trustworthy and privacy-aware models that perform well on all subgroups and improve in-the-wild generalisation.

Original high-quality contributions, in terms of databases, surveys, studies, foundation models, techniques and methodologies (either uni-modal or multi-modal; uni-task or multi-task ones) are solicited on -but are not limited to- the following topics:

facial expression (basic, compound or other) or micro-expression analysis

facial action unit detection

valence-arousal estimation

physiological-based (e.g., EEG, EDA) affect analysis

face recognition, detection or tracking

body recognition, detection or tracking

gesture recognition or detection

pose estimation or tracking

activity recognition or tracking

lip reading and voice understanding

face and body characterization (e.g., behavioral understanding)

characteristic analysis (e.g., gait, age, gender, ethnicity recognition)

group understanding via social cues (e.g., kinship, non-blood relationships, personality)

video, action and event understanding

digital human modeling

violence detection

autonomous driving

domain adaptation, domain generalisation, few- or zero-shot learning for the above cases

fairness, explainability, interpretability, trustworthiness, privacy-awareness, bias mitigation and/or subgroup distribution shift analysis for the above cases

editing, manipulation, image-to-image translation, style mixing, interpolation, inversion and semantic diffusion for all aforementioned cases

Workshop Important Dates

Paper Submission Deadline: 23:59:59 AoE (Anywhere on Earth) March 30, 2024

Review decisions sent to authors; Notification of acceptance: April 10, 2024

Camera ready version: April 14, 2024
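Since the submission deadline is given in AoE, a small conversion may help authors in other time zones. The sketch below assumes the usual convention that AoE is a fixed UTC-12 offset; the helper name `aoe_deadline_to_utc` is hypothetical and only illustrates the arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Anywhere on Earth" (AoE) is treated as a fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year, month, day, hour=23, minute=59, second=59):
    """Return the UTC instant for a wall-clock time expressed in AoE."""
    local = datetime(year, month, day, hour, minute, second, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Paper submission deadline from the list above: 23:59:59 AoE, March 30, 2024.
paper_deadline_utc = aoe_deadline_to_utc(2024, 3, 30)
print(paper_deadline_utc.isoformat())
```

To display the deadline in a local time zone, call `astimezone()` on the returned value with the desired zone.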

Submission Information

The paper format should adhere to the paper submission guidelines for main CVPR 2024 proceedings style. Please have a look at the Submission Guidelines Section here.

We welcome full long paper submissions (between 5 and 8 pages, excluding references or supplementary materials; a paper submission should be at least 4 pages long to be considered for publication). All submissions must be anonymous and conform to the CVPR 2024 standards for double-blind review.

All papers should be submitted using this CMT website.

All accepted manuscripts will be part of CVPR 2024 conference proceedings.

On the day of the workshop, oral presentations will be given by authors who are attending in person.

The Workshop's Agenda

Keynote Speaker: Angelica Lim

Biography

Dr. Angelica Lim is the Director of the Rosie Lab, and an Assistant Professor in the School of Computing Science at Simon Fraser University (SFU). Previously, she led the Emotion and Expressivity teams for the Pepper humanoid robot at SoftBank Robotics. She received her B.Sc. in Computing Science with Artificial Intelligence Specialization from SFU and a Ph.D. and M.Sc. in Computer Science (Intelligence Science) from Kyoto University, Japan. She and her team have received Best Paper in Entertainment Robotics and Cognitive Robotics Awards at IROS 2011 and 2022, and Best Demo and LBR at HRI 2021 and 2023. She has been featured on the BBC, TEDx, hosted a TV documentary on robotics, and was recently featured in Forbes 20 Leading Women in AI. Her research interests include multimodal machine learning, affective computing, and human-robot interaction.

Title: Social Signals in the Wild: Multimodal Machine Learning for Human-Robot Interaction

Science fiction has long promised us interfaces and robots that interact with us as smoothly as humans do - Rosie the Robot from The Jetsons, C-3PO from Star Wars, and Samantha from Her. Today, interactive robots and voice user interfaces are moving us closer to effortless, human-like interactions in the real world. In this talk, I will discuss the opportunities and challenges in creating technologies that can finely analyze, detect and generate non-verbal communication in context, including gestures, gaze, auditory signals, and facial expressions. Specifically, I will discuss how we might allow robots to understand human social signals (including emotions, mental states, and attitudes) across cultures as well as recognize and generate expressions with controllability and diversity in mind.

The Competition

The Competition is a continuation of the ABAW Competitions held last year at CVPR, the year before at ECCV and CVPR, the year before that at ICCV and the year before that at IEEE FG. It is split into the five Challenges described below. Teams are invited to participate in at least one of these Challenges.

How to participate

In order to participate, teams will have to register. There is a maximum number of 8 participants in each team.

If you want to participate in any of the first 3 Challenges (VA Estimation, Expr Recognition, or AU Detection), you should follow the registration procedure below.

The lead researcher should send an email from their official address (no personal emails will be accepted) to d.kollias@qmul.ac.uk with:

i) subject "6th ABAW Competition: Team Registration";

ii) this EULA (if the team is composed of only academics) or this EULA (if the team has at least one member coming from the industry) filled in, signed and attached;

iii) the lead researcher's official academic/industrial website; the lead researcher cannot be a student (UG/PG/Ph.D.);

iv) the emails of each team member, each one in a separate line in the body of the email;

v) the team's name;

vi) the point of contact name and email address (which member of the team will be the main point of contact for future communications, data access etc)

As a reply, you will receive access to the dataset's cropped/cropped-aligned images and annotations and other important information.

If you want to participate in the 4th Challenge (CE Recognition), you should follow the registration procedure below.

The lead researcher should send an email from their official address (no personal emails will be accepted) to d.kollias@qmul.ac.uk with:

i) subject "6th ABAW Competition: Team Registration";

ii) this EULA (if the team is composed of only academics) or this EULA (if the team has at least one member coming from the industry) filled in, signed and attached;

iii) the lead researcher's official academic/industrial website; the lead researcher cannot be a student (UG/PG/Ph.D.);

iv) the emails of each team member, each one in a separate line in the body of the email;

v) the team's name;

vi) the point of contact name and email address (which member of the team will be the main point of contact for future communications, data access etc)

As a reply, you will receive access to the dataset's videos and other important information.

If you want to participate in the 5th Challenge please email competitions@hume.ai with the following information:

i) subject "6th ABAW Competition: Team Registration"

ii) the name and email of the lead researcher, together with their official academic/industrial website; the lead researcher cannot be a student (UG/PG/Ph.D.)

iii) the names and emails of each team member, each one on a separate line in the body of the email

iv) the team's name

v) the point of contact name and email address (which member of the team will be the main point of contact for future communications, data access etc)

A reply to sign an EULA will be sent to all team members. When the EULA is signed by all team members a link to the data will be shared.

General Information

At the end of the Challenges, each team will have to send us:

i) a link to a GitHub repository where their solution/source code will be stored,

ii) a link to an arXiv paper of 2-8 pages describing their proposed methodology, data used and results.

Each team will also need to upload their test set predictions on an evaluation server (details will be circulated when the test set is released).

After that, the winner of each Challenge, along with a leaderboard, will be announced.

There will be one winner per Challenge. The top-3 performing teams of each Challenge will have to contribute paper(s) describing their approach, methodology and results to our Workshop; the accepted papers will be part of the CVPR 2024 proceedings. All other teams are also able to submit paper(s) describing their solutions and final results; the accepted papers will be part of the CVPR 2024 proceedings.

The Competition's white paper (describing the Competition, the data, the baselines and results) will be ready at a later stage and will be distributed to the participating teams.

General Rules

1) Participants can contribute to any of the 5 Challenges.

2) In order to take part in any Challenge, participants will have to register as described above.

3) Any face detector whether commercial or academic can be used in the challenge. The paper accompanying the challenge result submission should contain clear details of the detectors/libraries used.

4) The top performing teams will have to share their solution (code, model weights, executables) with the organisers upon completion of the challenge; in this way the organisers will check so as to prevent cheating or violation of rules.

Competition Important Dates

Call for participation announced, team registration begins, data available: January 13, 2024

Challenges 1-4 Registration Deadline: February 18, 2024

Test set release: March 13, 2024

Final submission deadline (Predictions, Code and arXiv paper): March 19, 2024

Winners Announcement: March 25, 2024

Final Paper Submission Deadline: 23:59:59 AoE (Anywhere on Earth) March 30, 2024

Review decisions sent to authors; Notification of acceptance: April 10, 2024

Camera ready version: April 14, 2024

Valence-Arousal (VA) Estimation Challenge

Database

For this Challenge, an augmented version of the Aff-Wild2 database will be used. This database is audiovisual (A/V), in-the-wild and in total consists of 594 videos of around 3M frames of 584 subjects annotated in terms of valence and arousal.

Rules

Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (valence-arousal estimation task) solutions. Teams are allowed to use any -publicly or not- available pre-trained model (as long as it has not been pre-trained on Aff-Wild2). The pre-trained model can be pre-trained on any task (e.g., VA estimation, Expression Recognition, AU detection, Face Recognition). However when the teams are refining the model and developing the methodology they should not use any other annotations (expressions or AUs): the methodology should be purely uni-task, using only the VA annotations. This means that teams are allowed to use other databases' VA annotations, or generated/synthetic data, or any affine transformations, or in general data augmentation techniques (e.g., MixAugment) for increasing the size of the training dataset.

Performance Assessment

The performance measure (P) is the mean Concordance Correlation Coefficient (CCC) of valence and arousal:

P = (CCC_arousal + CCC_valence) / 2
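As an illustration of this measure, the following is a minimal sketch using the standard definition of the Concordance Correlation Coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). It is not the organisers' evaluation code; the function names `ccc` and `va_score`, the use of population (biased) statistics and the zero-denominator fallback are assumptions made for the example.

```python
from statistics import fmean


def ccc(predictions, labels):
    """Concordance Correlation Coefficient of two equal-length sequences."""
    if len(predictions) != len(labels) or len(predictions) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p, mean_l = fmean(predictions), fmean(labels)
    # Population (biased) variances and covariance.
    var_p = fmean((p - mean_p) ** 2 for p in predictions)
    var_l = fmean((l - mean_l) ** 2 for l in labels)
    cov = fmean((p - mean_p) * (l - mean_l) for p, l in zip(predictions, labels))
    denom = var_p + var_l + (mean_p - mean_l) ** 2
    # Assumption: score 0.0 when the denominator vanishes (both inputs constant and equal).
    return 2.0 * cov / denom if denom else 0.0


def va_score(ccc_valence, ccc_arousal):
    """Performance measure P: the mean of the valence and arousal CCCs."""
    return (ccc_valence + ccc_arousal) / 2.0


# A constant offset lowers CCC even though the two series are perfectly correlated.
print(ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike Pearson's correlation, CCC penalises differences in mean and scale between predictions and annotations, which is why the second call scores below 1.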

Baseline Results

The baseline network is a ResNet-50 pre-trained on ImageNet and its performance on the validation set is:

CCC_valence = 0.24, CCC_arousal = 0.20

P = 0.22

Expression (Expr) Recognition Challenge

Database

For this Challenge, the Aff-Wild2 database will be used. This database is audiovisual (A/V), in-the-wild and in total consists of 548 videos of around 2.7M frames that are annotated in terms of the 6 basic expressions (i.e., anger, disgust, fear, happiness, sadness, surprise), plus the neutral state, plus a category 'other' that denotes expressions/affective states other than the 6 basic ones.

Rules

Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (expression recognition task) solutions. Teams are allowed to use any -publicly or not- available pre-trained model (as long as it has not been pre-trained on Aff-Wild2). The pre-trained model can be pre-trained on any task (e.g., VA estimation, Expression Recognition, AU detection, Face Recognition). However when the teams are refining the model and developing the methodology they should not use any other annotations (VA or AUs): the methodology should be purely uni-task, using only the Expr annotations. This means that teams are allowed to use other databases' Expr annotations, or generated/synthetic data (e.g. the data provided in the ECCV 2022 run of the ABAW Challenge), or any affine transformations, or in general data augmentation techniques (e.g., MixAugment) for increasing the size of the training dataset.

Performance Assessment

The performance measure (P) is the average F1 Score across all 8 categories: ∑ F1/8
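The averaging above can be sketched as a macro-averaged F1 over per-frame labels. This is only an illustration: the integer encoding of the 8 categories (0 to 7), the function name `macro_f1`, and the choice to score a category as 0 when it has no true or predicted frames are assumptions, not details given by the organisers.

```python
def macro_f1(y_true, y_pred, num_classes=8):
    """Unweighted mean of the per-class F1 scores (macro F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true and no predicted frames contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy example with 2 classes: per-class F1 values are 2/3 and 4/5.
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because every category carries equal weight, rare expressions influence P as much as frequent ones.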

Baseline Results

The baseline network is a pre-trained VGGFACE (with fixed convolutional weights and with MixAugment data augmentation technique) and its performance on the validation set is:

P = 0.25

Action Unit (AU) Detection Challenge

Database

For this Challenge, the Aff-Wild2 database will be used. This database is audiovisual (A/V), in-the-wild and in total consists of 547 videos of around 2.7M frames that are annotated in terms of 12 action units, namely AU1, AU2, AU4, AU6, AU7, AU10, AU12, AU15, AU23, AU24, AU25, AU26.

Rules

Only uni-task solutions will be accepted for this Challenge; this means that the teams should only develop uni-task (action unit detection task) solutions. Teams are allowed to use any -publicly or not- available pre-trained model (as long as it has not been pre-trained on Aff-Wild2). The pre-trained model can be pre-trained on any task (e.g., VA estimation, Expression Classification, AU detection, Face Recognition). However when the teams are refining the model and developing the methodology they should not use any other annotations (VA or Expr): the methodology should be purely uni-task, using only the AU annotations. This means that teams are allowed to use other databases' AU annotations, or generated/synthetic data, or any affine transformations, or in general data augmentation techniques (e.g., MixAugment) for increasing the size of the training dataset.

Performance Assessment

The performance measure (P) is the average F1 Score across all 12 categories: ∑ F1/12
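Since several action units can be active in the same frame, one reasonable reading of this measure is a binary F1 computed separately for each AU and then averaged. The sketch below assumes that reading, along with a frames-by-AUs layout of 0/1 values; the helper names `binary_f1` and `mean_au_f1` are hypothetical.

```python
def binary_f1(true_col, pred_col):
    """F1 score of the positive (active) class for one action unit."""
    tp = sum(1 for t, p in zip(true_col, pred_col) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true_col, pred_col) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true_col, pred_col) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumption: an AU that is never active nor predicted contributes 0.
    return 2 * tp / denom if denom else 0.0


def mean_au_f1(y_true, y_pred):
    """Average the per-AU binary F1 over all columns (12 in this Challenge)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and have the same number of frames")
    cols_true = list(zip(*y_true))  # one tuple per action unit
    cols_pred = list(zip(*y_pred))
    scores = [binary_f1(t, p) for t, p in zip(cols_true, cols_pred)]
    return sum(scores) / len(scores)


# Toy example with 3 frames and 2 AUs: per-AU F1 values are 2/3 and 1.
frames_true = [[1, 0], [1, 1], [0, 1]]
frames_pred = [[1, 0], [0, 1], [0, 1]]
print(mean_au_f1(frames_true, frames_pred))
```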

Baseline Results

The baseline network is a pre-trained VGGFACE (with fixed convolutional weights) and its performance on the validation set is:

P = 0.39

Compound Expression (CE) Recognition Challenge

Database

For this Challenge, a part of the C-EXPR-DB database will be used (56 videos in total). C-EXPR-DB is an audiovisual (A/V) in-the-wild database and in total consists of 400 videos of around 200K frames; each frame is annotated in terms of 12 compound expressions. For this Challenge, the following 7 compound expressions will be considered: Fearfully Surprised, Happily Surprised, Sadly Surprised, Disgustedly Surprised, Angrily Surprised, Sadly Fearful and Sadly Angry.

Goal of the Challenge and Rules

Participants will be provided with a part of the C-EXPR-DB database (56 videos in total), which will be unannotated, and will be required to develop their methodologies (supervised/self-supervised, domain adaptation, zero-/few-shot learning etc) for recognising the 7 compound expressions in this unannotated part, on a per-frame basis.

Teams are allowed to use any -publicly or not- available pre-trained model and any -publicly or not- available database (that contains any annotations, e.g. VA, basic or compound expressions, AUs).

Performance Assessment

The performance measure (P) is the average F1 Score across all 7 categories: ∑ F1/7

Emotional Mimicry Intensity (EMI) Estimation Challenge

Database

For this Challenge, the multimodal Hume-Vidmimic2 dataset is used, which consists of more than 15,000 videos totaling over 25 hours. In this dataset, every participant was tasked with imitating a 'seed' video that showcased an individual displaying a particular emotion. Following the mimicry, they were then asked to assess the emotional intensity of the seed video by selecting from a range of predefined emotional categories. The following emotion dimensions are targeted: 'Admiration', 'Amusement', 'Determination', 'Empathic Pain', 'Excitement', and 'Joy'. A normalized score from 0 to 1 is provided as a ground truth value.

Performance Assessment

The performance measure is the average Pearson's correlation (ρ) across the 6 emotion dimensions: ∑ ρ/6
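A minimal sketch of this measure is given below, under the assumption that predictions and ground truth are stored as one list of per-video scores for each emotion dimension. The dictionary layout, the function names `pearson` and `mean_pearson`, and the fallback of 0 for a constant series are illustrative choices, not the official evaluation code.

```python
from math import sqrt
from statistics import fmean

EMOTIONS = ["Admiration", "Amusement", "Determination",
            "Empathic Pain", "Excitement", "Joy"]


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Assumption: a constant series (zero spread) is scored as 0.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def mean_pearson(pred_by_emotion, true_by_emotion):
    """Average the per-emotion correlations over the 6 dimensions."""
    return fmean(pearson(pred_by_emotion[e], true_by_emotion[e]) for e in EMOTIONS)


# Toy example: identical rankings in every dimension give a score close to 1.
toy_true = {e: [0.1, 0.5, 0.9] for e in EMOTIONS}
toy_pred = {e: [0.2, 0.4, 0.6] for e in EMOTIONS}
print(mean_pearson(toy_pred, toy_true))
```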

Baseline Results

We established baseline results using two different feature sets.

First, we employed pre-trained Vision Transformer (ViT) features, which were further processed through a three-layer Gated Recurrent Unit (GRU) network. This approach achieved a performance score of: 0.09.

Secondly, we utilized features extracted from Wav2Vec2, combined with a linear processing layer, which resulted in a performance score of: 0.24.

Additionally, we explored a multimodal approach by averaging the predictions from both unimodal methods, leading to a combined performance score of: 0.25.
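The averaging step can be sketched as a simple late fusion of the two models' outputs. The source only states that predictions were averaged, so the sketch assumes an unweighted element-wise mean over a videos-by-emotions layout; the function name `average_fusion` is hypothetical.

```python
def average_fusion(vision_preds, audio_preds):
    """Element-wise mean of two prediction tables (one row per video)."""
    if len(vision_preds) != len(audio_preds):
        raise ValueError("both models must predict the same set of videos")
    fused = []
    for v_row, a_row in zip(vision_preds, audio_preds):
        if len(v_row) != len(a_row):
            raise ValueError("rows must have the same number of emotion scores")
        # Assumption: equal weights for the visual and audio predictions.
        fused.append([(v + a) / 2.0 for v, a in zip(v_row, a_row)])
    return fused


# One video with two emotion scores per model.
print(average_fusion([[0.25, 0.5]], [[0.75, 0.0]]))
```

Late fusion of this kind needs no joint training, which makes it a convenient baseline for combining unimodal models.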

Leaderboards

Valence-Arousal Estimation Challenge:

In total, 60 Teams participated in the VA Estimation Challenge. 23 Teams submitted their results. 10 Teams made invalid (incomplete) submissions, whilst surpassing the baseline. 3 Teams scored lower than the baseline. 10 Teams scored higher than the baseline and made valid submissions.

The winner of this Challenge is team Netease Fuxi AI Lab.
The runner-up is team DeepAVER.
In the third place is team CtyunAI.

Expression Recognition Challenge:

In total, 70 Teams participated in the Expression Recognition Challenge. 40 Teams submitted their results. 14 Teams made invalid (incomplete) submissions, whilst surpassing the baseline. 16 Teams scored lower than the baseline. 10 Teams scored higher than the baseline and made valid submissions.

The winner of this Challenge is team Netease Fuxi AI Lab.
The runner-up is team CtyunAI.
In the third place is team USTC-IAT-United.

Action Unit Detection Challenge:

In total, 63 Teams participated in the Action Unit Detection Challenge. 40 Teams submitted their results. 16 Teams made invalid (incomplete) submissions, whilst surpassing the baseline. 17 Teams scored lower than the baseline. 7 Teams scored higher than the baseline and made valid submissions.

The winner of this Challenge is team Netease Fuxi AI Lab.
The runner-up is team CtyunAI.
In the third place is team HSEmotion.

Compound Expression Recognition Challenge:

In total, 40 Teams participated in the Compound Expression Recognition Challenge. 17 Teams submitted their results. 12 Teams made invalid (incomplete) submissions. 5 Teams made valid submissions.

The winner of this Challenge is team Netease Fuxi AI Lab.
The runner-up is team HSEmotion.
In the third place is team USTC-IAT-United.

Emotional Mimicry Intensity Estimation Challenge:

In total, 7 Teams participated in the Emotional Mimicry Intensity Estimation Challenge. 4 Teams scored higher than the baseline and made valid submissions.

The winner of this Challenge is team Netease Fuxi AI Lab.
The runner-up is team HCAI-VIS.
In the third place is team USTC-IAT-United.

The leaderboards for all Challenges can be found below:

CVPR2024_ABAW_Leaderboard (first 4 Challenges)
CVPR2024_ABAW_EMI_Leaderboard (EMI Estimation Challenge)

Congratulations to all teams, winning and non-winning ones! Thank you very much for participating in our Competition.
All teams are invited to submit their methodologies-papers (please see Submission Information section above). All accepted papers will be part of the IEEE CVPR 2024 proceedings.
We are looking forward to receiving your submissions!

6th Workshop and Competition on

Affective Behavior Analysis in-the-wild (ABAW)

in conjunction with the IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2024

13:30 - 18:00 PDT, 18 June 2024, Arch 212, Seattle Convention Center, Seattle WA, USA

Sponsors