Links for fulltext (may require subscription)
- Publisher website (DOI): https://doi.org/10.1145/3580818
- Scopus: eid_2-s2.0-85150703793
Citations:
- Scopus: 0
Article: AMIR: Active Multimodal Interaction Recognition from Video and Network Traffic in Connected Environments
| Title | AMIR: Active Multimodal Interaction Recognition from Video and Network Traffic in Connected Environments |
|---|---|
| Authors | Liu, Shinan; Mangla, Tarun; Shaowang, Ted; Zhao, Jinjin; Paparrizos, John; Krishnan, Sanjay; Feamster, Nick |
| Keywords | activity recognition; datasets; multimodal learning |
| Issue Date | 2023 |
| Citation | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2023, v. 7, n. 1, article no. 21 |
| Abstract | Activity recognition using video data is widely adopted for elder care, monitoring for safety and security, and home automation. Unfortunately, using video data as the basis for activity recognition can be brittle, since models trained on video are often not robust to certain environmental changes, such as camera angle and lighting changes. There has been a proliferation of network-connected devices in home environments. Interactions with these smart devices are associated with network activity, making network data a potential source for recognizing these device interactions. This paper advocates for the synthesis of video and network data for robust interaction recognition in connected environments. We consider machine learning-based approaches for activity recognition, where each labeled activity is associated with both a video capture and an accompanying network traffic trace. We develop a simple but effective framework, AMIR (Active Multimodal Interaction Recognition), that trains independent models for video and network activity recognition respectively, and subsequently combines the predictions from these models using a meta-learning framework. Whether in the lab or at home, this approach reduces the amount of "paired" demonstrations needed to perform accurate activity recognition, where both network and video data are collected simultaneously. Specifically, the method we have developed requires up to 70.83% fewer samples to achieve an 85% F1 score than random data collection, and improves accuracy by 17.76% given the same number of samples. |
| Persistent Identifier | http://hdl.handle.net/10722/363521 |
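
The abstract describes AMIR's core fusion step: modality-specific models are trained independently, and a meta-learner is then trained on paired samples to combine their predictions. Below is a minimal sketch of that stacking step, assuming scikit-learn-style class-probability outputs from each base model; the function and variable names (`fuse_predictions`, `video_probs`, `network_probs`) and the choice of logistic regression as the meta-learner are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a stacked meta-learner over two modality models.
# Assumes each base model emits per-class probabilities for the same samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_predictions(video_probs, network_probs, labels):
    """Train a meta-learner on the concatenated class-probability
    outputs of independently trained video and network models."""
    meta_features = np.hstack([video_probs, network_probs])
    meta_model = LogisticRegression(max_iter=1000)
    meta_model.fit(meta_features, labels)
    return meta_model

# Usage on "paired" demonstrations (both modalities captured simultaneously),
# here with synthetic stand-in data for 5 activity classes:
rng = np.random.default_rng(0)
video_probs = rng.dirichlet(np.ones(5), size=100)
network_probs = rng.dirichlet(np.ones(5), size=100)
labels = rng.integers(0, 5, size=100)
meta = fuse_predictions(video_probs, network_probs, labels)
fused = meta.predict(np.hstack([video_probs, network_probs]))
```

Because only the meta-learner needs paired video-and-network samples, the base models can be trained on unpaired data from each modality, which is consistent with the abstract's claim that the approach reduces the number of paired demonstrations required.
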
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Shinan | - |
| dc.contributor.author | Mangla, Tarun | - |
| dc.contributor.author | Shaowang, Ted | - |
| dc.contributor.author | Zhao, Jinjin | - |
| dc.contributor.author | Paparrizos, John | - |
| dc.contributor.author | Krishnan, Sanjay | - |
| dc.contributor.author | Feamster, Nick | - |
| dc.date.accessioned | 2025-10-10T07:47:32Z | - |
| dc.date.available | 2025-10-10T07:47:32Z | - |
| dc.date.issued | 2023 | - |
| dc.identifier.citation | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2023, v. 7, n. 1, article no. 21 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/363521 | - |
| dc.description.abstract | Activity recognition using video data is widely adopted for elder care, monitoring for safety and security, and home automation. Unfortunately, using video data as the basis for activity recognition can be brittle, since models trained on video are often not robust to certain environmental changes, such as camera angle and lighting changes. There has been a proliferation of network-connected devices in home environments. Interactions with these smart devices are associated with network activity, making network data a potential source for recognizing these device interactions. This paper advocates for the synthesis of video and network data for robust interaction recognition in connected environments. We consider machine learning-based approaches for activity recognition, where each labeled activity is associated with both a video capture and an accompanying network traffic trace. We develop a simple but effective framework, AMIR (Active Multimodal Interaction Recognition), that trains independent models for video and network activity recognition respectively, and subsequently combines the predictions from these models using a meta-learning framework. Whether in the lab or at home, this approach reduces the amount of "paired" demonstrations needed to perform accurate activity recognition, where both network and video data are collected simultaneously. Specifically, the method we have developed requires up to 70.83% fewer samples to achieve an 85% F1 score than random data collection, and improves accuracy by 17.76% given the same number of samples. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies | - |
| dc.subject | activity recognition | - |
| dc.subject | datasets | - |
| dc.subject | multimodal learning | - |
| dc.title | AMIR: Active Multimodal Interaction Recognition from Video and Network Traffic in Connected Environments | - |
| dc.type | Article | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1145/3580818 | - |
| dc.identifier.scopus | eid_2-s2.0-85150703793 | - |
| dc.identifier.volume | 7 | - |
| dc.identifier.issue | 1 | - |
| dc.identifier.spage | article no. 21 | - |
| dc.identifier.epage | article no. 21 | - |
| dc.identifier.eissn | 2474-9567 | - |
