Conference Paper: Improved distributed principal component analysis

Title: Improved distributed principal component analysis
Authors: Balcan, Maria Florina; Kanchanapally, Vandana; Liang, Yingyu; Woodruff, David
Issue Date: 2014
Citation: Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121
Abstract: We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest.
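To illustrate the general communication-efficient scheme the abstract describes (not the authors' exact algorithm, whose parameters and guarantees are in the paper), here is a hedged sketch of distributed PCA: each server summarizes its local point set by the top-t scaled right singular vectors of a local SVD, sends only that small summary to a coordinator, and the coordinator runs PCA on the stacked summaries. All names, the summary size `t`, and the toy data below are illustrative assumptions.

```python
# Hedged sketch of a distributed PCA scheme: local SVD summaries are
# communicated to a coordinator, which computes an approximate global PCA.
import numpy as np

def local_summary(points, t):
    """One server's t-row summary of its (assumed centered) point set."""
    _, s, vt = np.linalg.svd(points, full_matrices=False)
    return s[:t, None] * vt[:t]          # top-t directions scaled by singular values

def distributed_pca(server_data, t, k):
    """Approximate top-k principal subspace of the union of all servers' points."""
    stacked = np.vstack([local_summary(p, t) for p in server_data])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]                        # rows span the approximate subspace

rng = np.random.default_rng(0)
# Three "servers", each holding 50 points near the same 2-D subspace of R^10.
basis = rng.standard_normal((2, 10))
servers = [rng.standard_normal((50, 2)) @ basis
           + 0.01 * rng.standard_normal((50, 10))
           for _ in range(3)]
# Communication is 3 servers * t rows * 10 coordinates, not 150 full points.
V = distributed_pca(servers, t=4, k=2)
```

The point of the sketch is the communication pattern: each server transmits a t x d summary rather than its full point set, and the coordinator's PCA on the summaries approximates PCA on the union.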
Persistent Identifier: http://hdl.handle.net/10722/341167
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399


DC Field | Value | Language
dc.contributor.author | Balcan, Maria Florina | -
dc.contributor.author | Kanchanapally, Vandana | -
dc.contributor.author | Liang, Yingyu | -
dc.contributor.author | Woodruff, David | -
dc.date.accessioned | 2024-03-13T08:40:41Z | -
dc.date.available | 2024-03-13T08:40:41Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | Advances in Neural Information Processing Systems, 2014, v. 4, n. January, p. 3113-3121 | -
dc.identifier.issn | 1049-5258 | -
dc.identifier.uri | http://hdl.handle.net/10722/341167 | -
dc.description.abstract | We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as k-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for k-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as a general transformation from a constant success probability subspace embedding to a high success probability subspace embedding with a dimension and sparsity independent of the success probability, may be of independent interest. | -
dc.language | eng | -
dc.relation.ispartof | Advances in Neural Information Processing Systems | -
dc.title | Improved distributed principal component analysis | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.scopus | eid_2-s2.0-84937837293 | -
dc.identifier.volume | 4 | -
dc.identifier.issue | January | -
dc.identifier.spage | 3113 | -
dc.identifier.epage | 3121 | -
