Abstract: Computer vision plays a crucial role in intelligent perception. Existing methods for psychological state perception are typically limited to a single task, such as facial expression recognition or remote photoplethysmography (rPPG), and therefore cannot jointly perceive multidimensional features. Meanwhile, approaches that fuse multimodal physiological signals incur high computational costs. To address these challenges, this paper proposes a non-contact psychological state perception method based on multi-task rotation learning. The approach uses a single multi-task model to process facial video, simultaneously performing three tasks: rPPG heart rate signal extraction, emotional valence and arousal prediction, and psychological state classification. Experimental results show that the model achieves a mean absolute error of 3.78 for rPPG heart rate extraction, prediction accuracies of 97.47% and 96.75% for emotional valence and arousal, respectively, and 97.42% accuracy for psychological state classification. This method provides an efficient multi-task processing solution for non-contact psychological state perception, with significant theoretical and practical value.
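To make the three-head, shared-backbone design described above concrete, the sketch below shows one plausible arrangement: a shared spatiotemporal encoder over facial video feeds a per-frame rPPG regression head, a valence/arousal head, and a psychological state classifier. This is a minimal sketch assuming PyTorch; the class name `MultiTaskPerceptionNet`, the 3D-CNN backbone, layer sizes, head designs, and the class count `num_states` are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a shared-backbone multi-task network (assumed design,
# not the paper's architecture). Input: facial video clips (B, 3, T, H, W).
import torch
import torch.nn as nn


class MultiTaskPerceptionNet(nn.Module):
    def __init__(self, num_states: int = 4):  # num_states: hypothetical class count
        super().__init__()
        # Shared spatiotemporal encoder over the face video clip
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),          # downsample space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep temporal axis
        )
        # Head 1: per-frame rPPG waveform (temporal regression)
        self.rppg_head = nn.Conv1d(64, 1, kernel_size=1)
        # Head 2: valence/arousal prediction (two outputs; the paper reports
        # accuracies, so these may correspond to discretized levels)
        self.va_head = nn.Linear(64, 2)
        # Head 3: psychological state classification
        self.state_head = nn.Linear(64, num_states)

    def forward(self, clip: torch.Tensor):
        feats = self.backbone(clip)               # (B, 64, T, 1, 1)
        temporal = feats.squeeze(-1).squeeze(-1)  # (B, 64, T)
        rppg = self.rppg_head(temporal).squeeze(1)  # (B, T) rPPG signal
        pooled = temporal.mean(dim=2)             # (B, 64) clip-level features
        valence_arousal = self.va_head(pooled)    # (B, 2)
        state_logits = self.state_head(pooled)    # (B, num_states)
        return rppg, valence_arousal, state_logits


# Usage: two 64-frame clips of 64x64 face crops
model = MultiTaskPerceptionNet(num_states=4)
x = torch.randn(2, 3, 64, 64, 64)
rppg, va, logits = model(x)
```

Because all three heads share one encoder pass over the video, the per-task marginal cost is small, which is the efficiency argument the abstract makes relative to fusing separate multimodal physiological-signal pipelines.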