

Advances in Multimodal Recognition Systems: Integrating Text, Voice, and Images

 

 

Special Issue Editors

 

Dr. Basanta Joshi
Department of Electronics and Computer Engineering,
Institute of Engineering, Tribhuvan University,
Pulchowk Campus, Kathmandu, Nepal.
Email: BassantaJoshi@hotmail.com, basanta@ioe.edu.np
Google Scholar

 

Dr. Sri Redjeki
Department of Informatics Engineering,
Universitas Teknologi Digital,
Jawa Barat, Indonesia.
Email: dzeky@utdi.ac.id
Google Scholar

 

Prof. Cheruiyot, Wilson Kipruto
School of Science and Informatics,
Taita Taveta University,
Voi, Kenya.
Email: wilchery68@gmail.com
Google Scholar

 

 

Special Issue Information  

This special issue examines some of the most challenging applications of deep learning, such as multimodal processing and language understanding. Although progress in these areas has been slower than in speech and image recognition, they are being transformed by new ideas from deep learning, particularly continuous-space embeddings. Multimodal systems interpret and operate on data from several human communication channels at various levels of abstraction. Such systems can automatically extract meaning from multimodal raw input data and, conversely, generate perceivable output from the abstract representations derived from that input. A multimodal system may take the form of a multimodal voice system or a multimodal interface.

This special issue addresses the various stages of fusion and the likely scenarios in a multimodal sensor system. It covers the different modes of operation, the fusion techniques used to combine the evidence, and the problems arising in the design and deployment of such systems. Biometrics, the science of identifying a person from physiological or behavioural characteristics, is increasingly recognised as a valid technique for establishing a person's identity. A multimodal system can operate in one of three modes: serial, parallel, or hierarchical. Natural and adaptable human-machine interaction is only one of many significant applications that depend on voice recognition and machine-based speaker identification. Most advances in automated speech recognition have considered only the acoustic signal as input, ignoring visual speech. Acoustic recognition alone, however, can suffer from weaknesses, especially in challenging environments, that make it unsuitable for many real-world scenarios. Fusing the auditory and visual modalities promises higher recognition accuracy and robustness than either modality can achieve on its own. Multimodal recognition is therefore seen as an essential component of future speech and language systems.
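The audio-visual fusion described above is often realised at the score level, where each modality produces per-class confidence scores that are then combined. The sketch below illustrates one common variant, a weighted-sum (late) fusion; the scores, class labels, and weights are illustrative assumptions, not values from any system discussed in this special issue.

```python
# Minimal sketch of score-level (late) fusion for audio-visual recognition.
# All scores, labels, and weights here are hypothetical illustrations.

def fuse_scores(audio_scores, visual_scores, audio_weight=0.6):
    """Weighted-sum fusion of per-class confidence scores from two modalities.

    audio_scores / visual_scores: dicts mapping class label -> confidence.
    audio_weight: relative trust in the acoustic modality (visual gets the rest).
    """
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_scores[label]
               + visual_weight * visual_scores[label]
        for label in audio_scores
    }

# Hypothetical speaker-identification example: each modality scores
# two candidate speakers independently, then the scores are fused.
audio = {"alice": 0.70, "bob": 0.30}
video = {"alice": 0.40, "bob": 0.60}

fused = fuse_scores(audio, video, audio_weight=0.5)
best = max(fused, key=fused.get)  # class with the highest fused score
```

In a noisy acoustic environment, lowering `audio_weight` shifts trust toward the visual channel, which is precisely the robustness argument made above for combining the two modalities.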

This special issue also presents a new technique for constructing multimodal corpora for audio-visual speech recognition in driver-monitoring systems, with the aim of improving the user experience. It includes an examination of voice-driven interfaces and speech recognition systems for driver monitoring, drawing on both audio and visual data. Multimodal speech recognition makes it possible to rely on video data in acoustically noisy environments and on audio data when the video data is unusable. A new framework for building audio-visual corpora is presented, which outlines the essential procedures and prerequisites for multimodal corpus design.

Topics of interest for the special issue include, but are not limited to, the following:

  • Deep learning: from multimodal processing and language recognition to voice recognition
  • A multimodal information fusion implementation for emotion recognition
  • Multimodal analysis: an extensive assessment employing physiological, acoustic, visual, and textual cues
  • An overview of multimodal learning in computer vision: developments and patterns
  • A multimodal speech-based facial emotion recognition system using infrared images
  • Incorporating visual, auditory, and textual emotions in multimodal recognition
  • Multimodal interfaces: an overview of ideas, architectures, and approaches
  • Fusion of visual and voice signals for a multimodal biometric device
  • Development of a multimodal database for audio-visual speech detection in automobile interiors
  • Hierarchical neural networks combined with multimodal fusion for audio-visual emotion recognition
  • Engaging with multimodal content: thoughts on image and text

Deadline for manuscript submissions: 31 December 2025.

To submit your manuscript, click here