We study the problem of Continual Distillation Learning (CDL), which considers Knowledge Distillation (KD) in the Continual Learning (CL) setting. A teacher model and a student model learn a sequence of tasks, and the knowledge of the teacher is distilled to the student in order to improve the student model. We introduce a novel method named CDL-Prompt that leverages prompt-based continual learning models to build the teacher-student pair. We investigate how to utilize the teacher's prompts in the student model for knowledge distillation, and propose an attention-based prompt mapping scheme that adapts the teacher prompts for the student. We demonstrate that our method can be applied to different prompt-based continual learning models, such as L2P, DualPrompt, and CODA-Prompt, to improve their performance using powerful teacher models. While recent CL methods focus on prompt learning, we show that our method can be used to build efficient CL models via prompt-based knowledge distillation.
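The attention-based prompt mapping is the core of the method. The exact architecture is defined in the paper; the sketch below is only a minimal PyTorch illustration of the idea, and all module names and dimensions (e.g., `d_teacher=1024` for ViT-Large, `d_student=768` for ViT-Base) are assumptions for exposition, not the repository's API.

```python
# Minimal sketch only: an attention-based mapping from teacher prompts to the
# student's prompt space. Names and dimensions are illustrative assumptions,
# not the actual CDL-Prompt implementation.
import torch
import torch.nn as nn

class PromptMapper(nn.Module):
    def __init__(self, d_teacher: int = 1024, d_student: int = 768,
                 num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(d_teacher, d_student)  # align prompt dimensions
        self.attn = nn.MultiheadAttention(d_student, num_heads,
                                          batch_first=True)

    def forward(self, teacher_prompts: torch.Tensor,
                student_queries: torch.Tensor) -> torch.Tensor:
        # teacher_prompts: (B, L_t, d_teacher); student_queries: (B, L_s, d_student)
        kv = self.proj(teacher_prompts)
        mapped, _ = self.attn(student_queries, kv, kv)
        return mapped  # (B, L_s, d_student), injected as the student's prompts
```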
CDL-Prompt is a framework for continual distillation learning that can be integrated into various prompt-based methods to improve performance.
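For a concrete picture of the distillation side, here is a minimal sketch of a student objective assuming standard logit distillation (cross-entropy plus a temperature-scaled KL term against the teacher). In CDL both the teacher and the student learn the task sequence; only the student loss is shown, and the temperature `T` and weight `alpha` are illustrative hyperparameters, not values from the paper.

```python
# Sketch of the student objective, assuming standard logit distillation:
# cross-entropy on the labels plus a temperature-scaled KL term that pulls
# the student's predictions toward the (detached) teacher's.
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits.detach() / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1.0 - alpha) * ce + alpha * kd
```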
CDL-Prompt (using the CODA-Prompt baseline and a ViT-Base backbone) outperforms the other prompt-based methods on both the CIFAR-100 and ImageNet-R datasets. Accuracy is the average accuracy over all 10 tasks, and each number is averaged over multiple runs. A "—" in the Teacher column marks a baseline trained without distillation.
CIFAR-100:

# | Teacher | Student | Baseline | Tasks | Accuracy (%) | Forgetting (%) |
---|---|---|---|---|---|---|
1 | — | ViT-Large | CODA [3] | 10 | 88.97 | 3.97 |
9 | ViT-Large | ViT-Base | CODA [3] | 10 | 87.69 | 5.40 |
2 | — | ViT-Large | Dual [2] | 10 | 87.56 | 4.99 |
7 | ViT-Large | ViT-Base | Dual [2] | 10 | 86.57 | 5.72 |
3 | — | ViT-Large | L2P [1] | 10 | 86.36 | 5.98 |
8 | — | ViT-Base | CODA [3] | 10 | 86.16 | 5.63 |
6 | — | ViT-Base | Dual [2] | 10 | 84.66 | 5.46 |
5 | ViT-Large | ViT-Base | L2P [1] | 10 | 83.78 | 7.43 |
15 | ViT-Base | ViT-Small | CODA [3] | 10 | 83.24 | 7.63 |
4 | — | ViT-Base | L2P [1] | 10 | 83.02 | 6.06 |
13 | ViT-Base | ViT-Small | Dual [2] | 10 | 82.29 | 6.60 |
14 | — | ViT-Small | CODA [3] | 10 | 82.18 | 6.48 |
11 | ViT-Base | ViT-Small | L2P [1] | 10 | 80.24 | 7.31 |
12 | — | ViT-Small | Dual [2] | 10 | 79.85 | 6.12 |
10 | — | ViT-Small | L2P [1] | 10 | 77.71 | 7.12 |
21 | ViT-Base | ViT-Tiny | CODA [3] | 10 | 70.05 | 14.33 |
19 | ViT-Base | ViT-Tiny | Dual [2] | 10 | 68.58 | 10.79 |
17 | ViT-Base | ViT-Tiny | L2P [1] | 10 | 67.61 | 10.99 |
20 | — | ViT-Tiny | CODA [3] | 10 | 65.05 | 13.55 |
18 | — | ViT-Tiny | Dual [2] | 10 | 62.63 | 14.74 |
16 | — | ViT-Tiny | L2P [1] | 10 | 60.68 | 13.98 |
ImageNet-R:

# | Teacher | Student | Baseline | Tasks | Accuracy (%) | Forgetting (%) |
---|---|---|---|---|---|---|
1 | — | ViT-Large | CODA [3] | 10 | 78.79 | 4.46 |
9 | ViT-Large | ViT-Base | CODA [3] | 10 | 77.95 | 5.64 |
7 | ViT-Large | ViT-Base | Dual [2] | 10 | 76.36 | 4.27 |
8 | — | ViT-Base | CODA [3] | 10 | 75.78 | 5.70 |
2 | — | ViT-Large | Dual [2] | 10 | 74.95 | 4.93 |
3 | — | ViT-Large | L2P [1] | 10 | 74.19 | 5.31 |
5 | ViT-Large | ViT-Base | L2P [1] | 10 | 74.01 | 4.26 |
6 | — | ViT-Base | Dual [2] | 10 | 72.44 | 3.80 |
4 | — | ViT-Base | L2P [1] | 10 | 71.59 | 5.65 |
15 | ViT-Base | ViT-Small | CODA [3] | 10 | 70.06 | 8.70 |
13 | ViT-Base | ViT-Small | Dual [2] | 10 | 67.75 | 6.61 |
14 | — | ViT-Small | CODA [3] | 10 | 67.44 | 8.52 |
11 | ViT-Base | ViT-Small | L2P [1] | 10 | 65.04 | 7.38 |
12 | — | ViT-Small | Dual [2] | 10 | 64.27 | 5.93 |
10 | — | ViT-Small | L2P [1] | 10 | 61.95 | 6.52 |
19 | ViT-Base | ViT-Tiny | Dual [2] | 10 | 53.88 | 9.60 |
21 | ViT-Base | ViT-Tiny | CODA [3] | 10 | 53.13 | 13.92 |
17 | ViT-Base | ViT-Tiny | L2P [1] | 10 | 51.00 | 9.18 |
20 | — | ViT-Tiny | CODA [3] | 10 | 50.23 | 12.75 |
18 | — | ViT-Tiny | Dual [2] | 10 | 46.54 | 10.25 |
16 | — | ViT-Tiny | L2P [1] | 10 | 44.98 | 8.79 |
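The Accuracy and Forgetting columns follow the usual continual learning definitions: average accuracy over all tasks after the final task, and the average drop from each task's best accuracy to its final accuracy. A hypothetical helper, assuming `acc[i][j]` holds the accuracy on task `j` after training through task `i`:

```python
# Hypothetical helper for the reported metrics, assuming the standard
# definitions: acc[i][j] is the accuracy on task j after training through
# task i (for j <= i), with N tasks in total.
def average_accuracy(acc):
    N = len(acc)
    return sum(acc[N - 1][j] for j in range(N)) / N  # mean of the final row

def forgetting(acc):
    N = len(acc)
    drops = [max(acc[i][j] for i in range(j, N - 1)) - acc[N - 1][j]
             for j in range(N - 1)]  # best-to-final drop for each earlier task
    return sum(drops) / len(drops) if drops else 0.0
```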
This repository contains the code for CDL-Prompt. If you find our work useful, please consider citing:
@misc{2024CDL,
  title={Continual Distillation Learning},
  author={Qifan Zhang and Yunhui Guo and Yu Xiang},
  year={2024},
  eprint={2407.13911},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Send any comments or questions to Qifan Zhang: qifan.zhang@utdallas.edu
This work was supported in part by the DARPA Perceptually-enabled Task Guidance (PTG) Program under contract number HR00112220005 and the Sony Research Award Program.