Conference Paper: Data Pruning via Moving-one-Sample-out

Title: Data Pruning via Moving-one-Sample-out
Authors: Tan, Haoru; Wu, Sitong; Du, Fei; Chen, Yukang; Wang, Zhibin; Wang, Fan; Qi, Xiaojuan
Issue Date: 10-Dec-2023
Abstract

In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which aims to identify and remove the least informative samples from the training set. The core insight behind MoSo is to determine the importance of each sample by assessing its impact on the optimal empirical risk. This is achieved by measuring the extent to which the empirical risk changes when a particular sample is excluded from the training set. Instead of using the computationally expensive leave-one-out retraining procedure, we propose an efficient first-order approximator that requires only gradient information from different training stages. The key idea behind our approximation is that samples whose gradients are consistently aligned with the average gradient of the training set are more informative and should receive higher scores. Intuitively, if the gradient from a specific sample is consistent with the average gradient vector, then optimizing the network on that sample will have a similar effect on all remaining samples. Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios and achieves satisfactory performance across various settings.
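The gradient-alignment idea in the abstract can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' implementation: it scores each sample by the dot product between its loss gradient and the average gradient over the set, so samples aligned with the average score higher and the lowest-scoring fraction is pruned. The toy model, data, function names (per_sample_gradients, moso_like_scores), and the single-checkpoint simplification are all assumptions; the paper draws on gradient information from different training stages.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a tiny model and toy data, assumed purely for illustration.
torch.manual_seed(0)
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
xs = torch.randn(32, 10)
ys = torch.randint(0, 2, (32,))

def per_sample_gradients(model, loss_fn, xs, ys):
    """Return one flattened parameter-gradient vector per training sample."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    return torch.stack(grads)  # shape: (N, num_parameters)

def moso_like_scores(model, loss_fn, xs, ys):
    """Score each sample by the alignment of its gradient with the
    average gradient of the whole set; aligned samples score higher."""
    G = per_sample_gradients(model, loss_fn, xs, ys)
    g_avg = G.mean(dim=0)
    return G @ g_avg  # dot product with the mean gradient

# The paper accumulates gradient information across training stages; a single
# checkpoint is used here only to keep the sketch short.
scores = moso_like_scores(model, loss_fn, xs, ys)

# Prune: drop the lowest-scoring fraction of the training set.
prune_ratio = 0.5
keep = scores.argsort(descending=True)[: int(len(xs) * (1 - prune_ratio))]
xs_pruned, ys_pruned = xs[keep], ys[keep]
print(f"kept {len(keep)} of {len(xs)} samples")
```

In a full training pipeline one would recompute these scores at several saved checkpoints and average them, then train the final model on the pruned subset only.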


Persistent Identifier: http://hdl.handle.net/10722/339458

DC Field | Value | Language
dc.contributor.author | Tan, Haoru | -
dc.contributor.author | Wu, Sitong | -
dc.contributor.author | Du, Fei | -
dc.contributor.author | Chen, Yukang | -
dc.contributor.author | Wang, Zhibin | -
dc.contributor.author | Wang, Fan | -
dc.contributor.author | Qi, Xiaojuan | -
dc.date.accessioned | 2024-03-11T10:36:47Z | -
dc.date.available | 2024-03-11T10:36:47Z | -
dc.date.issued | 2023-12-10 | -
dc.identifier.uri | http://hdl.handle.net/10722/339458 | -
dc.description.abstract | In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which aims to identify and remove the least informative samples from the training set. The core insight behind MoSo is to determine the importance of each sample by assessing its impact on the optimal empirical risk. This is achieved by measuring the extent to which the empirical risk changes when a particular sample is excluded from the training set. Instead of using the computationally expensive leaving-one-out-retraining procedure, we propose an efficient first-order approximator that only requires gradient information from different training stages. The key idea behind our approximation is that samples with gradients that are consistently aligned with the average gradient of the training set are more informative and should receive higher scores, which could be intuitively understood as follows: if the gradient from a specific sample is consistent with the average gradient vector, it implies that optimizing the network using the sample will yield a similar effect on all remaining samples. Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios and achieves satisfactory performance across various settings. | -
dc.language | eng | -
dc.relation.ispartof | Neural Information Processing Systems 2023 (10/12/2023-16/12/2023, New Orleans) | -
dc.title | Data Pruning via Moving-one-Sample-out | -
dc.type | Conference_Paper | -
