Abstract: Knowledge Distillation (KD), which transfers semantic knowledge from a parameter-heavy teacher network to a more compact student network, has been widely and successfully used ...
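As a point of reference for the teacher-to-student transfer described above, the sketch below shows the classic soft-label distillation objective (Hinton et al., 2015): a KL-divergence term on temperature-softened logits blended with the usual hard-label cross-entropy. This is a minimal generic illustration, not this paper's specific method; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters chosen here, not values from the source.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Generic KD loss: softened-logit KL term plus hard-label cross-entropy.

    T and alpha are illustrative defaults, not values from this paper.
    """
    # Softened teacher/student distributions; the T^2 factor rescales
    # gradients back to the same magnitude as the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```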