Conference Paper: The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

Title: The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Authors: Cao, Yuan; Zou, Difan; Li, Yuanzhi; Gu, Quanquan
Issue Date: 15-Jul-2023
Abstract

We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an exp(−Ω(log²t)) convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
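To make the setting concrete, the following is a minimal sketch (our illustration, not the authors' code) of the model the abstract studies: a linear predictor whose output is batch-normalized, f(x) = γ·(wᵀx − μ_B)/σ_B, trained by full-batch gradient descent on the logistic loss for labels y ∈ {−1, +1}. The toy data, step size, and finite-difference gradients are assumptions made for brevity; printing the smallest and largest per-example margins y_i·f(x_i) lets one watch them equalize, which is the uniform-margin bias described above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 5
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d))  # separable labels from a planted linear rule

    def margins(params):
        w, gamma = params[:-1], params[-1]
        z = X @ w
        z_hat = (z - z.mean()) / (z.std() + 1e-8)  # batch normalization of the linear output
        return y * gamma * z_hat                   # per-example margins y_i * f(x_i)

    def loss(params):
        return np.logaddexp(0.0, -margins(params)).mean()  # logistic loss, numerically stable

    def num_grad(f, p, eps=1e-6):
        # central finite differences; adequate for d + 1 = 6 parameters
        g = np.zeros_like(p)
        for i in range(p.size):
            e = np.zeros_like(p)
            e[i] = eps
            g[i] = (f(p + e) - f(p - e)) / (2 * eps)
        return g

    params = np.concatenate([0.1 * rng.normal(size=d), [1.0]])  # [w, gamma]
    for step in range(20001):
        params -= 0.5 * num_grad(loss, params)
        if step % 5000 == 0:
            m = margins(params)
            print(f"step {step:6d}  min margin {m.min():.4f}  max margin {m.max():.4f}")

Under the paper's result, the minimum and maximum margins should approach each other as training proceeds; the two-layer convolutional case replaces f with a patch-wise normalized single filter and is not sketched here.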


Persistent Identifier: http://hdl.handle.net/10722/337341

DC Field | Value | Language
dc.contributor.author | Cao, Yuan | -
dc.contributor.author | Zou, Difan | -
dc.contributor.author | Li, Yuanzhi | -
dc.contributor.author | Gu, Quanquan | -
dc.date.accessioned | 2024-03-11T10:20:07Z | -
dc.date.available | 2024-03-11T10:20:07Z | -
dc.date.issued | 2023-07-15 | -
dc.identifier.uri | http://hdl.handle.net/10722/337341 | -
dc.description.abstract | We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an exp(−Ω(log²t)) convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization. | -
dc.language | eng | -
dc.relation.ispartof | 36th Annual Conference on Learning Theory (COLT 2023) (12/07/2023-15/07/2023, Bangalore, India) | -
dc.title | The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks | -
dc.type | Conference_Paper | -
