Fafa Wang - IECE

Free Access | Research Article | 25 October 2024 | Cited: 1

Spatio-temporal Feature Soft Correlation Concatenation Aggregation Structure for Video Action Recognition Networks

IECE Transactions on Sensing, Communication, and Control | Volume 1, Issue 1: 60-71, 2024 | DOI: 10.62762/TSCC.2024.212751

Abstract

The efficient extraction and fusion of video features to accurately identify complex and similar actions has consistently remained a significant research endeavor in the field of video action recognition. While adept at feature extraction, prevailing methodologies for video action recognition frequently exhibit suboptimal performance in the context of complex scenes and similar actions. This shortcoming arises primarily from their reliance on uni-dimensional feature extraction, thereby overlooking the interrelations among features and the significance of multi-dimensional fusion. To address this issue, this paper introduces an innovative framework predicated upon a soft correlation strategy...


Code (Data) Available | Free Access | Research Article | Feature Paper | 09 August 2024

LI3D-BiLSTM: A Lightweight Inception-3D Networks with BiLSTM for Video Action Recognition

Fafa Wang

Xuebo Jin

Shenglun Yi

IECE Transactions on Emerging Topics in Artificial Intelligence | Volume 1, Issue 1: 58-70, 2024 | DOI: 10.62762/TETAI.2024.628205

Abstract

This paper proposes an improved video action recognition method, primarily consisting of three key components. Firstly, in the data preprocessing stage, we develop multi-temporal scale video frame extraction and multi-spatial scale video cropping techniques to enhance content information and standardize input formats. Secondly, we propose a lightweight Inception-3D (LI3D) network structure for spatio-temporal feature extraction and design a soft-association feature aggregation module to improve the recognition accuracy of key actions in videos. Lastly, we employ a bidirectional LSTM network to contextualize the feature sequences extracted by LI3D, enhancing the representation capa...

