Article: Visual event recognition in videos by learning from web data

Title: Visual event recognition in videos by learning from web data
Authors: Duan, Lixin; Xu, Dong; Tsang, Ivor Wai Hung; Luo, Jiebo
Keywords: adaptive MKL; aligned space-time pyramid matching; cross-domain learning; domain adaptation; event recognition; transfer learning
Issue Date: 2012
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, v. 34, n. 9, p. 1667-1680
Abstract: We propose a visual event recognition framework for consumer videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). Observing that consumer videos generally contain large intraclass variations within the same type of events, we first propose a new method, called Aligned Space-Time Pyramid Matching (ASTPM), to measure the distance between any two video clips. Second, we propose a new transfer learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web video domain and consumer video domain). For each pyramid level and each type of local features, we first train a set of SVM classifiers based on the combined training set from two domains by using multiple base kernels of different kernel types and parameters, which are then fused with equal weights to obtain a prelearned average classifier. In A-MKL, for each event class we learn an adapted target classifier based on multiple base kernels and the prelearned average classifiers from this event class or all the event classes by minimizing both the structural risk functional and the mismatch between data distributions of the two domains. Extensive experiments demonstrate the effectiveness of our proposed framework, which requires only a small number of labeled consumer videos by leveraging web data. We also conduct an in-depth investigation of various aspects of the proposed method A-MKL, such as the analysis of the combination coefficients of the prelearned classifiers, the convergence of the learning algorithm, and the performance variation when using different proportions of labeled consumer videos.
Moreover, we show that A-MKL using the prelearned classifiers from all the event classes leads to better performance compared with A-MKL using the prelearned classifiers only from each individual event class. © 2012 IEEE.
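The "prelearned average classifier" step described in the abstract — training SVMs on multiple base kernels of different types and parameters, then fusing their outputs with equal weights — can be sketched as follows. This is a hedged illustration on synthetic data using scikit-learn, not the authors' implementation: the video features, kernel choices, and the A-MKL optimization itself (adapting combination coefficients while minimizing the domain-distribution mismatch) are not reproduced here.

```python
# Sketch of equal-weight fusion of base-kernel SVM classifiers, as in the
# "prelearned average classifier" described in the abstract. Toy data stands
# in for features from the combined web + consumer training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Multiple base kernels: different kernel types and parameters (illustrative
# choices, not the ones used in the paper).
base_classifiers = [
    SVC(kernel="linear"),
    SVC(kernel="rbf", gamma=0.01),
    SVC(kernel="rbf", gamma=0.1),
    SVC(kernel="poly", degree=2),
]
for clf in base_classifiers:
    clf.fit(X_train, y_train)

# Equal-weight fusion of the per-kernel decision values yields the
# prelearned average classifier.
avg_decision = np.mean(
    [clf.decision_function(X_test) for clf in base_classifiers], axis=0
)
y_pred = (avg_decision > 0).astype(int)
print("fused accuracy:", np.mean(y_pred == y_test))
```

In A-MKL, such averaged classifiers (from one event class or from all classes) are then combined with adaptively learned coefficients rather than fixed equal weights.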
Persistent Identifier: http://hdl.handle.net/10722/321482
ISSN: 0162-8828
2021 Impact Factor: 24.314
2020 SCImago Journal Rankings: 3.811
ISI Accession Number: WOS:000306409100002

 

Dublin Core metadata:
dc.contributor.author: Duan, Lixin
dc.contributor.author: Xu, Dong
dc.contributor.author: Tsang, Ivor Wai Hung
dc.contributor.author: Luo, Jiebo
dc.date.accessioned: 2022-11-03T02:19:12Z
dc.date.available: 2022-11-03T02:19:12Z
dc.date.issued: 2012
dc.identifier.citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, v. 34, n. 9, p. 1667-1680
dc.identifier.issn: 0162-8828
dc.identifier.uri: http://hdl.handle.net/10722/321482
dc.language: eng
dc.relation.ispartof: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.subject: adaptive MKL
dc.subject: aligned space-time pyramid matching
dc.subject: cross-domain learning
dc.subject: domain adaptation
dc.subject: event recognition
dc.subject: transfer learning
dc.title: Visual event recognition in videos by learning from web data
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/TPAMI.2011.265
dc.identifier.pmid: 22201057
dc.identifier.scopus: eid_2-s2.0-84865579385
dc.identifier.volume: 34
dc.identifier.issue: 9
dc.identifier.spage: 1667
dc.identifier.epage: 1680
dc.identifier.isi: WOS:000306409100002
