An adaptive Hessian approximated stochastic gradient MCMC method

Wang, Yating; Deng, Wei; Lin, Guang

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1016/j.jcp.2021.110150
Scopus: eid_2-s2.0-85100415471
WOS: WOS:000636580800004

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Mathematics: Journal/Magazine Articles

Article: An adaptive Hessian approximated stochastic gradient MCMC method

Title	An adaptive Hessian approximated stochastic gradient MCMC method
Authors	Wang, Yating; Deng, Wei; Lin, Guang
Keywords	Stochastic approximation; Limited memory BFGS; Deep learning; Highly correlated density; Hessian approximated stochastic gradient MCMC; Adaptive Bayesian method
Issue Date	2021
Citation	Journal of Computational Physics, 2021, v. 432, article no. 110150. DOI: http://dx.doi.org/10.1016/j.jcp.2021.110150
Abstract	Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo methods (SG-MCMC), which have gained increasing interest due to their ability to handle large datasets and the potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. In this work, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. The approximation of inverse Hessian alleviates storage and computational complexities for large dimensional models. Numerical experiments are performed on several problems, including sampling from 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDE. The numerical results demonstrate great improvement in both the convergence rate and accuracy.
Persistent Identifier	http://hdl.handle.net/10722/303731
ISSN	0021-9991 (2023 Impact Factor: 3.8; 2023 SCImago Journal Rankings: 1.679)
ISI Accession Number ID	WOS:000636580800004
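
Illustrative sketch (not part of the record): the abstract describes approximating the inverse Hessian-vector multiplication from a limited memory of samples and their stochastic gradients, and the keywords name limited memory BFGS. The standard L-BFGS two-loop recursion, which serves that general purpose, might look as follows in plain Python. This is not the authors' implementation. The function name, the convention that each stored pair holds a difference of samples (s) and a difference of gradients (y), and the initial scaling are assumptions made for this sketch.

```python
def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_inverse_hessian_vector(v, s_list, y_list):
    """Approximate (inverse Hessian) * v with the L-BFGS two-loop recursion.

    s_list and y_list hold the stored pairs, oldest first (hypothetical
    convention): s_k is a difference of consecutive samples and y_k the
    matching difference of gradients. Each pair is assumed to satisfy
    y_k . s_k > 0; with noisy gradients a caller would need to skip or
    damp pairs that violate this.
    """
    q = list(v)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []

    # First loop: walk the history from newest to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append(alpha)

    # Initial inverse-Hessian guess: a scaled identity built from the newest pair.
    if s_list:
        gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: walk the history from oldest to newest pair.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


# Usage: pairs taken from a quadratic whose Hessian is diag(2, 10).
pairs_s = [[1.0, 0.0], [0.0, 1.0]]
pairs_y = [[2.0, 0.0], [0.0, 10.0]]
result = lbfgs_inverse_hessian_vector([1.0, 1.0], pairs_s, pairs_y)
print(result)  # approximately [0.5, 0.1], the exact inverse-Hessian product here
```

Only the stored pairs are kept, so memory grows with the history length times the dimension rather than with the square of the dimension, which is the storage saving the abstract refers to.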

DC Field	Value	Language
dc.contributor.author	Wang, Yating	-
dc.contributor.author	Deng, Wei	-
dc.contributor.author	Lin, Guang	-
dc.date.accessioned	2021-09-15T08:25:54Z	-
dc.date.available	2021-09-15T08:25:54Z	-
dc.date.issued	2021	-
dc.identifier.citation	Journal of Computational Physics, 2021, v. 432, article no. 110150	-
dc.identifier.issn	0021-9991	-
dc.identifier.uri	http://hdl.handle.net/10722/303731	-
dc.description.abstract	Bayesian approaches have been successfully integrated into training deep neural networks. One popular family is stochastic gradient Markov chain Monte Carlo methods (SG-MCMC), which have gained increasing interest due to their ability to handle large datasets and the potential to avoid overfitting. Although standard SG-MCMC methods have shown great performance in a variety of problems, they may be inefficient when the random variables in the target posterior densities have scale differences or are highly correlated. In this work, we present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior. The idea is to apply stochastic approximation (SA) to sequentially update a preconditioning matrix at each iteration. The preconditioner possesses second-order information and can guide the random walk of a sampler efficiently. Instead of computing and saving the full Hessian of the log posterior, we use limited memory of the samples and their stochastic gradients to approximate the inverse Hessian-vector multiplication in the updating formula. Moreover, by smoothly optimizing the preconditioning matrix via SA, our proposed algorithm can asymptotically converge to the target distribution with a controllable bias under mild conditions. To reduce the training and testing computational burden, we adopt a magnitude-based weight pruning method to enforce the sparsity of the network. Our method is user-friendly and demonstrates better learning results compared to standard SG-MCMC updating rules. The approximation of inverse Hessian alleviates storage and computational complexities for large dimensional models. Numerical experiments are performed on several problems, including sampling from 2D correlated distribution, synthetic regression problems, and learning the numerical solutions of heterogeneous elliptic PDE. The numerical results demonstrate great improvement in both the convergence rate and accuracy.	-
dc.language	eng	-
dc.relation.ispartof	Journal of Computational Physics	-
dc.subject	Stochastic approximation	-
dc.subject	Limited memory BFGS	-
dc.subject	Deep learning	-
dc.subject	Highly correlated density	-
dc.subject	Hessian approximated stochastic gradient MCMC	-
dc.subject	Adaptive Bayesian method	-
dc.title	An adaptive Hessian approximated stochastic gradient MCMC method	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.jcp.2021.110150	-
dc.identifier.scopus	eid_2-s2.0-85100415471	-
dc.identifier.volume	432	-
dc.identifier.spage	article no. 110150	-
dc.identifier.epage	article no. 110150	-
dc.identifier.eissn	1090-2716	-
dc.identifier.isi	WOS:000636580800004	-
