Fafa Wang
Beijing iQIYI Technology Co., Ltd., Beijing 100080, China

Research Article | 25 October 2024
Spatio-temporal Feature Soft Correlation Concatenation Aggregation Structure for Video Action Recognition Networks
IECE Transactions on Sensing, Communication, and Control | Volume 1, Issue 1: 60-71, 2024 | DOI:10.62762/TSCC.2024.212751
Abstract
The efficient extraction and fusion of video features to accurately identify complex and similar actions has consistently remained a significant research endeavor in the field of video action recognition. While adept at feature extraction, prevailing methodologies for video action recognition frequently exhibit suboptimal performance in the context of complex scenes and similar actions. This shortcoming arises primarily from their reliance on uni-dimensional feature extraction, thereby overlooking the interrelations among features and the significance of multi-dimensional fusion. To address this issue, this paper introduces an innovative framework predicated upon a soft correlation strategy...


Code (Data) Available | Free Access | Research Article | Feature Paper | 09 August 2024
LI3D-BiLSTM: A Lightweight Inception-3D Networks with BiLSTM for Video Action Recognition
IECE Transactions on Emerging Topics in Artificial Intelligence | Volume 1, Issue 1: 58-70, 2024 | DOI:10.62762/TETAI.2024.628205
Abstract
This paper proposes an improved video action recognition method, primarily consisting of three key components. Firstly, in the data preprocessing stage, we developed multi-temporal scale video frame extraction and multi-spatial scale video cropping techniques to enhance content information and standardize input formats. Secondly, we propose a lightweight Inception-3D (LI3D) network structure for spatio-temporal feature extraction and design a soft-association feature aggregation module to improve the recognition accuracy of key actions in videos. Lastly, we employ a bidirectional LSTM network to contextualize the feature sequences extracted by LI3D, enhancing the representation capa...
