Conference Paper: Improved distributed principal component analysis

Title: Improved distributed principal component analysis
Authors: Balcan, Maria Florina; Kanchanapally, Vandana; Liang, Yingyu; Woodruff, David
Issue Date: 2014
Citation: Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121
Abstract: We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest.
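To illustrate the general communication-efficient scheme the abstract describes (not the authors' exact algorithm, whose parameters and guarantees are in the paper), here is a hedged sketch of distributed PCA: each server summarizes its local point set by the top-t scaled right singular vectors of a local SVD, sends only that small summary to a coordinator, and the coordinator runs PCA on the stacked summaries. All names, the summary size `t`, and the toy data below are illustrative assumptions.

```python
# Hedged sketch of a distributed PCA scheme: local SVD summaries are
# communicated to a coordinator, which computes an approximate global PCA.
import numpy as np

def local_summary(points, t):
    """One server's t-row summary of its (assumed centered) point set."""
    _, s, vt = np.linalg.svd(points, full_matrices=False)
    return s[:t, None] * vt[:t]          # top-t directions scaled by singular values

def distributed_pca(server_data, t, k):
    """Approximate top-k principal subspace of the union of all servers' points."""
    stacked = np.vstack([local_summary(p, t) for p in server_data])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]                        # rows span the approximate subspace

rng = np.random.default_rng(0)
# Three "servers", each holding 50 points near the same 2-D subspace of R^10.
basis = rng.standard_normal((2, 10))
servers = [rng.standard_normal((50, 2)) @ basis
           + 0.01 * rng.standard_normal((50, 10))
           for _ in range(3)]
# Communication is 3 servers * t rows * 10 coordinates, not 150 full points.
V = distributed_pca(servers, t=4, k=2)
```

The point of the sketch is the communication pattern: each server transmits a t x d summary rather than its full point set, and the coordinator's PCA on the summaries approximates PCA on the union.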
Persistent Identifier: http://hdl.handle.net/10722/341167
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399


DC Field | Value | Language
dc.contributor.author | Balcan, Maria Florina | -
dc.contributor.author | Kanchanapally, Vandana | -
dc.contributor.author | Liang, Yingyu | -
dc.contributor.author | Woodruff, David | -
dc.date.accessioned | 2024-03-13T08:40:41Z | -
dc.date.available | 2024-03-13T08:40:41Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121 | -
dc.identifier.issn | 1049-5258 | -
dc.identifier.uri | http://hdl.handle.net/10722/341167 | -
dc.description.abstract | We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest. | -
dc.language | eng | -
dc.relation.ispartof | Advances in Neural Information Processing Systems | -
dc.title | Improved distributed principal component analysis | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-84937837293 | -
dc.identifier.volume | 4 | -
dc.identifier.issue | January | -
dc.identifier.spage | 3113 | -
dc.identifier.epage | 3121 | -
