Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications

Wang, Yating; Deng, Wei; Lin, Guang

Links for fulltext (may require subscription):
- Publisher website: 10.1016/j.jcp.2021.110134
- Scopus: eid_2-s2.0-85100238276
- Web of Science: WOS:000626663100002

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Mathematics: Journal/Magazine Articles

Title	Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications
Authors	Wang, Yating; Deng, Wei; Lin, Guang
Keywords	Bayesian sparse learning; Adaptive hierarchical posterior; Stochastic approximation; Deep neural network; Preconditioned stochastic gradient MCMC; Deep learning
Issue Date	2021
Citation	Journal of Computational Physics, 2021, v. 432, article no. 110134. DOI: http://dx.doi.org/10.1016/j.jcp.2021.110134
Abstract	Deep neural networks (DNNs) have been successfully employed in a wide variety of research areas, including solving partial differential equations. Despite their significant success, training DNNs effectively remains challenging: overfitting must be avoided in over-parameterized DNNs, and optimization must be accelerated in DNNs with pathological curvature. In this work, we propose a Bayesian sparse deep learning algorithm. The algorithm places a set of spike-and-slab priors on the parameters of the deep neural network. The hierarchical Bayesian mixture is trained using an adaptive empirical method: the algorithm alternately samples from the posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD) and optimizes the latent variables via stochastic approximation. Sparsity of the network is achieved while the hyperparameters are optimized with adaptive searching and penalizing. A popular SG-MCMC approach is stochastic gradient Langevin dynamics (SGLD). However, given the complex geometry of the model parameter space in nonconvex learning, updating every parameter component with the same universal step size, as SGLD does, may cause slow mixing. To address this issue, we apply a computationally manageable preconditioner in the update rule, which adapts the step size to local geometric properties. Moreover, by smoothly optimizing the hyperparameter in the preconditioning matrix, the proposed algorithm ensures a decreasing bias, where the bias is introduced by ignoring the correction term in preconditioned SGLD. Following the existing theoretical framework, we show that the proposed algorithm can converge asymptotically to the correct distribution with a controllable bias under mild conditions. Numerical tests on both synthetic regression problems and learning the solutions of elliptic PDEs demonstrate the accuracy and efficiency of the proposed method.
Persistent Identifier	http://hdl.handle.net/10722/303729
ISSN	0021-9991
2023 Impact Factor	3.8
2023 SCImago Journal Rankings	1.679
ISI Accession Number ID	WOS:000626663100002
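The abstract describes an alternating scheme: a preconditioned SGLD step on the model parameters under a spike-and-slab prior, followed by a stochastic-approximation update of the latent inclusion variables. The block below is a minimal illustrative sketch of that loop, not the authors' implementation, applied to sparse linear regression rather than a deep network. Several assumptions are not taken from the paper: a Gaussian spike N(0, v0) and Gaussian slab N(0, v1) with a fixed prior inclusion probability a; an RMSprop-style diagonal preconditioner G = 1/(lam + sqrt(V)), as in common pSGLD variants, with the correction term dropped as the abstract describes; a fixed (not optimized) preconditioner hyperparameter alpha; and a stochastic-approximation step size omega_k = (k + 10)^(-0.7). All function names and default values (psgld_spike_slab, inclusion_prob, eps, alpha, lam, sigma2, v0, v1) are hypothetical.

```python
import math
import random


def inclusion_prob(beta, a, v0, v1):
    """P(gamma = 1 | beta): posterior probability that beta comes from the slab.

    Computed in log space; the common 1/sqrt(2*pi) factors cancel.
    """
    log_slab = math.log(a) - 0.5 * math.log(v1) - beta * beta / (2.0 * v1)
    log_spike = math.log(1.0 - a) - 0.5 * math.log(v0) - beta * beta / (2.0 * v0)
    return 1.0 / (1.0 + math.exp(log_spike - log_slab))


def psgld_spike_slab(X, y, n_iter=3000, batch=20, eps=1e-2, alpha=0.99, lam=1e-5,
                     sigma2=0.25, a=0.5, v0=1e-3, v1=10.0, seed=0):
    """Hypothetical sketch: alternate pSGLD sampling of beta with SA updates of rho."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    V = [0.0] * p    # exponential average of squared gradients (RMSprop statistic)
    rho = [a] * p    # latent inclusion probabilities E[gamma_j]
    samples = []
    for k in range(1, n_iter + 1):
        idx = rng.sample(range(n), batch)
        # Minibatch gradient of the Gaussian log-likelihood (rescaled by n / batch below).
        grad = [0.0] * p
        for i in idx:
            resid = y[i] - sum(X[i][j] * beta[j] for j in range(p))
            for j in range(p):
                grad[j] += X[i][j] * resid / sigma2
        for j in range(p):
            # Prior gradient uses the expected precision under the current rho.
            prec = rho[j] / v1 + (1.0 - rho[j]) / v0
            g = (n / batch) * grad[j] - prec * beta[j]
            V[j] = alpha * V[j] + (1.0 - alpha) * g * g
            G = 1.0 / (lam + math.sqrt(V[j]))  # diagonal preconditioner
            # pSGLD step with the correction term Gamma omitted.
            beta[j] += 0.5 * eps * G * g + math.sqrt(eps * G) * rng.gauss(0.0, 1.0)
        # Stochastic approximation: move rho toward E[gamma | beta] with a decaying step.
        omega = (k + 10) ** -0.7
        for j in range(p):
            rho[j] += omega * (inclusion_prob(beta[j], a, v0, v1) - rho[j])
        if k > n_iter // 2:  # discard the first half as burn-in
            samples.append(list(beta))
    return samples, rho


if __name__ == "__main__":
    # Synthetic sparse regression: only coefficients 0 and 3 are nonzero.
    data_rng = random.Random(1)
    true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + data_rng.gauss(0.0, 0.5) for row in X]
    samples, rho = psgld_spike_slab(X, y)
    post_mean = [sum(s[j] for s in samples) / len(samples) for j in range(5)]
    print("posterior means:", [round(m, 2) for m in post_mean])
    print("inclusion probabilities:", [round(r, 2) for r in rho])
```

In this sketch, a coordinate whose rho[j] approaches 1 is treated as active (slab), while a coordinate whose rho[j] approaches 0 receives the spike's large prior precision and is shrunk toward zero, which is how sparsity arises. Replacing G with the constant 1 recovers plain SGLD, which uses the same step size eps for every coordinate.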

DC Field	Value	Language
dc.contributor.author	Wang, Yating	-
dc.contributor.author	Deng, Wei	-
dc.contributor.author	Lin, Guang	-
dc.date.accessioned	2021-09-15T08:25:54Z	-
dc.date.available	2021-09-15T08:25:54Z	-
dc.date.issued	2021	-
dc.identifier.citation	Journal of Computational Physics, 2021, v. 432, article no. 110134	-
dc.identifier.issn	0021-9991	-
dc.identifier.uri	http://hdl.handle.net/10722/303729	-
dc.language	eng	-
dc.relation.ispartof	Journal of Computational Physics	-
dc.subject	Bayesian sparse learning	-
dc.subject	Adaptive hierarchical posterior	-
dc.subject	Stochastic approximation	-
dc.subject	Deep neural network	-
dc.subject	Preconditioned stochastic gradient MCMC	-
dc.subject	Deep learning	-
dc.title	Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.jcp.2021.110134	-
dc.identifier.scopus	eid_2-s2.0-85100238276	-
dc.identifier.volume	432	-
dc.identifier.spage	article no. 110134	-
dc.identifier.epage	article no. 110134	-
dc.identifier.eissn	1090-2716	-
dc.identifier.isi	WOS:000626663100002	-
