In this paper, we propose a deep neural network-based real-time sound source localization (SSL) model for low-power Internet of Things (IoT) devices based on microphone arrays and present a prototype implemented on actual IoT devices. The proposed SSL model delivers multi-channel acoustic data to parallel convolutional neural network layers in the form of multiple streams to capture the unique delay patterns for the low-, mid-, and high-frequency ranges, and estimates the fine and coarse location of voices. The model adapted in this study achieved an accuracy of 91.41% on fine location estimation and a direction of arrival error of 7.43° on noisy data. It achieved a processing time of 7.811 ms per 40 ms sample on the Raspberry Pi 4B. The proposed model can be applied to a camera-based humanoid robot that mimics the manner in which humans react to trigger voices in crowded environments.

Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; therefore, users cannot avoid touching them for face-to-face interactions, which contradicts the voice-activated AI philosophy.

The abovementioned mobility limitation can be resolved by tracking the user’s location in relation to the device. This can be achieved by detecting the user’s location using a camera or multi-channel microphone arrays, which mimic human recognition systems. The camera-based method appears to be a more intuitive and convincing user location detection method. In particular, owing to the recent developments in convolutional neural networks (CNNs), object detection accuracy has improved significantly, and user location detection is no longer a challenge. However, this method has a fundamental limitation in that it cannot provide a complete solution to the user location detection problem. First, users do not feel comfortable owing to the feeling of being observed, which is a potential privacy issue. Second, if users are located in a different room or space that is physically separated by a wall, the device fails to provide any service. Imagine a situation wherein a user accidentally falls down in a room next to where the device is located.
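The real-time claim made for the Raspberry Pi 4B can be checked with simple arithmetic. The sketch below is only an illustration, not code from the paper. It assumes that the reported figure means 7.811 ms of computation for each 40 ms of audio and that frames are processed one after another; the function names are hypothetical.

```python
# Figures quoted in the text above.
PROCESSING_MS = 7.811  # reported processing time on the Raspberry Pi 4B
FRAME_MS = 40.0        # audio duration covered by one sample (assumed to be one frame)


def real_time_factor(processing_ms: float, frame_ms: float) -> float:
    """Return the fraction of a frame's duration spent on computation."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return processing_ms / frame_ms


def headroom_ms(processing_ms: float, frame_ms: float) -> float:
    """Return the time left over per frame once processing has finished."""
    return frame_ms - processing_ms


rtf = real_time_factor(PROCESSING_MS, FRAME_MS)
spare = headroom_ms(PROCESSING_MS, FRAME_MS)

print(f"real-time factor: {rtf:.3f}")
print(f"headroom per frame: {spare:.3f} ms")
print("keeps up with incoming audio:", rtf < 1.0)
```

Under these assumptions, a real-time factor below 1 means the device finishes each frame before the next one arrives, and the headroom is the time left for other work such as audio capture.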