
Conference Paper: Advancing Fundus-Based Retinal Representations Through Multi-Modal Contrastive Pre-training for Detection of Glaucoma-Related Diseases

Title: Advancing Fundus-Based Retinal Representations Through Multi-Modal Contrastive Pre-training for Detection of Glaucoma-Related Diseases
Authors: Guo, Yawen; Ng, Michelle; Yan, Xu; Hung, Calvin; Lam, Alexander; Leung, Christopher Kai-Shun
Issue Date: 5-Jun-2024
Abstract

Purpose: To develop and evaluate a multi-modal contrastive pretraining strategy for glaucoma detection using fundus photographs and different image-based modalities.

Methods: Two ResNet50 networks were used to extract representations from fundus photographs and other modalities, including red-free photographs, RNFL thickness maps, OCT 3D volumes, and OCT en face images (Fig. 1A). The networks were pretrained on 1703 pairs of images from the same eye using the InfoNCE loss and then fine-tuned on downstream classification tasks using labeled fundus images (Fig. 1B). The performance of the strategy was assessed on two datasets: an internal dataset of 528 eyes with RNFL defects and 975 eyes without RNFL defects, and an external dataset of 3000 eyes with glaucoma and 3000 eyes without glaucoma (80% for training and 20% for testing).
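The abstract does not include code; the following is a minimal, hypothetical PyTorch sketch of the dual-encoder InfoNCE pretraining described above. All names (DualEncoder, proj_dim, the temperature value) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of dual-encoder contrastive pretraining (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class DualEncoder(nn.Module):
    """Two ResNet50 backbones: one for fundus photographs, one for the paired
    modality (e.g. an RNFL thickness map), each ending in a small projection head."""
    def __init__(self, proj_dim: int = 128):
        super().__init__()
        self.fundus_encoder = resnet50(weights=None)
        self.modality_encoder = resnet50(weights=None)
        feat_dim = self.fundus_encoder.fc.in_features  # 2048 for ResNet50
        self.fundus_encoder.fc = nn.Linear(feat_dim, proj_dim)
        self.modality_encoder.fc = nn.Linear(feat_dim, proj_dim)

    def forward(self, fundus, modality):
        z_f = F.normalize(self.fundus_encoder(fundus), dim=1)
        z_m = F.normalize(self.modality_encoder(modality), dim=1)
        return z_f, z_m

def info_nce_loss(z_f, z_m, temperature: float = 0.07):
    """Symmetric InfoNCE: the two images from the same eye are positives,
    all other cross-modal pairs in the batch act as negatives."""
    logits = z_f @ z_m.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(z_f.size(0), device=z_f.device)   # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

After pretraining, only the fundus branch would be kept and fine-tuned on labeled fundus photographs for the downstream classification task, as described above.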

Results: The performance of different pretraining strategies for the fundus image encoder on glaucoma detection tasks was compared. The fundus encoder pretrained with multi-modal contrastive learning of fundus images and RNFL thickness maps achieved the highest AUCs, 0.942 on the internal dataset and 0.881 on the external dataset, surpassing supervised pretraining on ImageNet (0.852 and 0.828 AUC). Self-supervised learning with only fundus images yielded 0.766 and 0.699 AUC. Other multi-modal combinations, such as fundus images with OCT 3D volumes and fundus images with OCT en face images, obtained AUCs of 0.870/0.834 and 0.818/0.816, respectively (Table 1). The labeling efficiency of the pretraining strategies was determined by using different proportions of the training data in our datasets (Fig. 2). The fundus encoder pretrained with fundus images and OCT RNFL thickness maps achieved an AUC of 0.922 with only 10% of the training data, showing significantly better results than the other methods.
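As a rough illustration of the label-efficiency protocol (fine-tuning on increasing fractions of the labeled training data and measuring AUC on the held-out test split), a hypothetical sketch follows. The fine_tune helper, loader settings, and single-logit output head are assumptions, not the authors' code.

```python
# Illustrative label-efficiency sweep (assumed names; not the authors' code).
import torch
from torch.utils.data import Subset, DataLoader
from sklearn.metrics import roc_auc_score

def evaluate_label_efficiency(pretrained_encoder, train_set, test_loader,
                              fractions=(0.1, 0.25, 0.5, 1.0)):
    results = {}
    for frac in fractions:
        n = int(len(train_set) * frac)
        subset = Subset(train_set, range(n))           # simple prefix subset for illustration
        loader = DataLoader(subset, batch_size=32, shuffle=True)
        model = fine_tune(pretrained_encoder, loader)  # assumed helper: supervised fine-tuning
        model.eval()
        scores, labels = [], []
        with torch.no_grad():
            for x, y in test_loader:
                # assumes a single-logit classification head
                scores.extend(torch.sigmoid(model(x)).squeeze(1).tolist())
                labels.extend(y.tolist())
        results[frac] = roc_auc_score(labels, scores)  # AUC at this label fraction
    return results
```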

Conclusions: The multi-modal contrastive learning approach, which leverages the correlations between fundus images and RNFL thickness maps to pretrain the fundus encoder, demonstrated the best representation learning and labeling efficiency for glaucoma detection, reflecting the benefits of using related multi-modal data to learn more informative retinal representations.


Persistent Identifier: http://hdl.handle.net/10722/347286

 

DC Field                  Value
dc.contributor.author     Guo, Yawen
dc.contributor.author     Ng, Michelle
dc.contributor.author     Yan, Xu
dc.contributor.author     Hung, Calvin
dc.contributor.author     Lam, Alexander
dc.contributor.author     Leung, Christopher Kai-Shun
dc.date.accessioned       2024-09-20T00:31:13Z
dc.date.available         2024-09-20T00:31:13Z
dc.date.issued            2024-06-05
dc.identifier.uri         http://hdl.handle.net/10722/347286
dc.language               eng
dc.relation.ispartof      ARVO 2024 Annual Meeting (05/05/2024-09/05/2024, Seattle, Washington)
dc.title                  Advancing Fundus-Based Retinal Representations Through Multi-Modal Contrastive Pre-training for Detection of Glaucoma-Related Diseases
dc.type                   Conference_Paper
