In this competition, we challenge you to develop an efficient and effective end-to-end neural network backdoor removal technique that mitigates backdoor attacks in poisoned models. Your task is to submit a solution that takes in a (potentially) poisoned model and returns a sanitized model in which the backdoor is mitigated (i.e., the attack success rate drops). We provide a set of models trained on different poisoned datasets and built with different model architectures. A hedged sketch of this input/output contract is given below.
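As a purely illustrative sketch of the expected input/output contract, the snippet below shows a minimal PyTorch defense entry point that takes a possibly poisoned classifier plus a small loader of clean in-distribution data and returns a sanitized copy. The function name `remove_backdoor` and the plain fine-tuning strategy are placeholders of our own, not the required submission format or a recommended defense.

```python
# Hypothetical sketch: "poisoned model in, sanitized model out".
# A real submission would replace the plain fine-tuning below with an
# actual backdoor-removal technique.
import copy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader


def remove_backdoor(model: nn.Module,
                    clean_loader: DataLoader,
                    epochs: int = 5,
                    lr: float = 1e-3,
                    device: str = "cuda" if torch.cuda.is_available() else "cpu") -> nn.Module:
    """Return a sanitized copy of `model`, using only the limited clean data."""
    sanitized = copy.deepcopy(model).to(device)
    sanitized.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(sanitized.parameters(), lr=lr, momentum=0.9)

    for _ in range(epochs):
        for images, labels in clean_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(sanitized(images), labels)
            loss.backward()
            optimizer.step()

    return sanitized.eval()
```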
Model Architectures:
For backdoor removal techniques that require specific model architectures (e.g., synthesizing specific outputs from a specific layer), we have included all model architectures used in our evaluation (including the Held-Out settings) in the "model.py" file of the Starter Kit.
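For defenses that need access to intermediate activations (for example, to synthesize or inspect the output of a specific layer), a standard approach that does not require editing the architectures shipped in "model.py" is a PyTorch forward hook, sketched below. The attribute name `layer3` is an assumption for illustration only; the actual layer names depend on the architecture definitions in "model.py".

```python
# Hedged sketch: capture an intermediate layer's activations via a forward
# hook, without modifying the architecture code in model.py.
import torch
import torch.nn as nn


def capture_activations(model: nn.Module, layer: nn.Module,
                        inputs: torch.Tensor) -> torch.Tensor:
    """Run `inputs` through `model` and return the output of `layer`."""
    captured = {}

    def hook(_module, _inp, output):
        captured["features"] = output.detach()

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        handle.remove()
    return captured["features"]

# Example usage (assumes the model exposes a `layer3` block):
# feats = capture_activations(model, model.layer3, batch_of_images)
```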
Metrics:
The evaluation compares the model's performance before and after the submitted defense is applied. In particular, we focus on three metrics:
1) Clean accuracy (ACC), which measures the impact of the defense on the model's performance on clean inputs; a smaller drop in ACC is preferred. ACC also serves as a strict cut-off: evaluation sessions in which ACC drops by more than 20% are stopped.
2) Poisoned accuracy (PACC), which measures the fraction of samples carrying the backdoor trigger that are still assigned their correct labels; a higher PACC indicates better sanitization of the backdoor effect while maintaining model performance. PACC is the main evaluation metric of this competition.
3) Attack success rate (ASR), which measures the fraction of triggered samples that are misclassified into the target class(es); a lower ASR indicates better backdoor sanitization. ASR is used as the tie-breaker when two methods' PACC scores are tied.
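The sketch below shows one way these three metrics can be computed. It assumes a loader of clean samples yielding (image, true label) pairs and a loader of triggered samples yielding (triggered image, original label, attack target label) triples; this loader format is our own assumption for illustration, not the format used by the evaluation backend.

```python
# Hedged sketch of ACC, PACC, and ASR, under the assumed loader formats
# described above.
import torch
import torch.nn as nn


@torch.no_grad()
def evaluate(model: nn.Module, clean_loader, poisoned_loader, device="cpu"):
    model.eval().to(device)

    def _accuracy(loader, label_index):
        correct, total = 0, 0
        for batch in loader:
            images = batch[0].to(device)
            labels = batch[label_index].to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        return correct / max(total, 1)

    acc = _accuracy(clean_loader, 1)      # ACC: clean inputs -> correct labels
    pacc = _accuracy(poisoned_loader, 1)  # PACC: triggered inputs -> correct labels
    asr = _accuracy(poisoned_loader, 2)   # ASR: triggered inputs -> target labels
    return {"ACC": acc, "PACC": pacc, "ASR": asr}
```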
Baseline Defenses:
We present the evaluation results and implementations of two representative Trojan removal techniques, Neural Cleanse [1] and Adversarial Unlearning [2], using the metrics above on the Public Model Set included in the Starter Kit.
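To give a flavor of the baselines, the following is a heavily simplified sketch of the trigger reverse-engineering step at the core of Neural Cleanse [1]: for a candidate target class, it optimizes a trigger mask and pattern so that stamped clean inputs are classified as that class, while an L1 penalty keeps the mask small; classes whose recovered masks are anomalously small are flagged as likely backdoor targets. The hyperparameters, input scaling, and stamping formulation below are assumptions for illustration and do not reproduce the exact configuration of our baseline evaluation.

```python
# Simplified sketch of Neural Cleanse-style trigger reverse engineering [1].
# For a candidate target class, optimize a mask m and pattern p so that
# x' = (1 - m) * x + m * p is classified as the target, with an L1 penalty
# on the mask to keep the recovered trigger small.
import torch
import torch.nn as nn


def reverse_engineer_trigger(model, clean_loader, target_class,
                             image_shape=(3, 32, 32), steps=500,
                             lam=1e-2, lr=0.1, device="cpu"):
    model.eval().to(device)
    mask_logit = torch.zeros((1, *image_shape[1:]), device=device, requires_grad=True)
    pattern_logit = torch.zeros(image_shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([mask_logit, pattern_logit], lr=lr)
    criterion = nn.CrossEntropyLoss()

    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            images, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            images, _ = next(data_iter)
        images = images.to(device)

        mask = torch.sigmoid(mask_logit)        # values in [0, 1]
        pattern = torch.sigmoid(pattern_logit)  # assumes inputs scaled to [0, 1]
        stamped = (1 - mask) * images + mask * pattern
        targets = torch.full((images.size(0),), target_class,
                             dtype=torch.long, device=device)

        loss = criterion(model(stamped), targets) + lam * mask.abs().sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.sigmoid(mask_logit).detach(), torch.sigmoid(pattern_logit).detach()

# Running this for every class and comparing the L1 norms of the recovered
# masks (e.g., with an outlier test) identifies likely backdoor target
# classes, which can then be unlearned or pruned away.
```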
The challenge will consist of two periods:
The first period (until 2023/02/10) of the competition involves two evaluation model sets:
During the first period of the competition, along with the Public Model Set, participants will also be provided with limited in-distribution data for each model (drawn from the same distribution as the training set). Participants will be asked to submit their designed defense pipeline through the provided Google Colab. The Colab will check that the code satisfies our environment requirements, package the submission, and forward it to our evaluation backend. If the evaluation completes successfully, you will receive a notification and can check your score on the leaderboard.
Below, we provide step-by-step guidance for participating in our competition:
Please follow the instructions in the following Google Colab to download and participate in the competition: Link
[1] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks.” In 2019 IEEE Symposium on Security and Privacy (SP), pp. 707-723. IEEE, 2019.
[2] Yi Zeng, Si Chen, Won Park, Zhuoqing Mao, Ming Jin, and Ruoxi Jia. “Adversarial Unlearning of Backdoors via Implicit Hypergradient.” In International Conference on Learning Representations. 2022.
IEEE TRC’22 is supported by funding granted to the IEEE Smart Computing STC (awarded by the IEEE Computer Society Planning Committee for Emerging Techniques 2022, Dakota State University #845360).
Please contact Yi Zeng or Ruoxi Jia if you have any questions.