Article: StyleAdapter: A Unified Stylized Image Generation Model
| Title | StyleAdapter: A Unified Stylized Image Generation Model |
|---|---|
| Authors | Wang, Zhouxia; Wang, Xintao; Xie, Liangbin; Qi, Zhongang; Shan, Ying; Wang, Wenping; Luo, Ping |
| Keywords | Artificial intelligence generated content (AIGC); Computer vision; Diffusion model; Stylized image generation |
| Issue Date | 1-Apr-2025 |
| Publisher | Springer |
| Citation | International Journal of Computer Vision, 2025, v. 133, n. 4, p. 1894-1911 |
| Abstract | This work focuses on generating high-quality images that match the specific style of reference images and the content of provided textual descriptions. Current leading algorithms, such as DreamBooth and LoRA, require fine-tuning for each style, which is time-consuming and computationally expensive. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to process style information and the textual prompt separately, which cooperates with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, the prompt maintains control over the content of the generated images, while the negative impact of semantic information in the style references is mitigated. As a result, the content of the generated image adheres to the prompt, and its style aligns with the style references. Moreover, StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, for a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works. |
| Persistent Identifier | http://hdl.handle.net/10722/362108 |
| ISSN | 0920-5691 (2023 Impact Factor: 11.6; 2023 SCImago Journal Rankings: 6.668) |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wang, Zhouxia | - |
| dc.contributor.author | Wang, Xintao | - |
| dc.contributor.author | Xie, Liangbin | - |
| dc.contributor.author | Qi, Zhongang | - |
| dc.contributor.author | Shan, Ying | - |
| dc.contributor.author | Wang, Wenping | - |
| dc.contributor.author | Luo, Ping | - |
| dc.date.accessioned | 2025-09-19T00:32:06Z | - |
| dc.date.available | 2025-09-19T00:32:06Z | - |
| dc.date.issued | 2025-04-01 | - |
| dc.identifier.citation | International Journal of Computer Vision, 2025, v. 133, n. 4, p. 1894-1911 | - |
| dc.identifier.issn | 0920-5691 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/362108 | - |
| dc.description.abstract | This work focuses on generating high-quality images that match the specific style of reference images and the content of provided textual descriptions. Current leading algorithms, such as DreamBooth and LoRA, require fine-tuning for each style, which is time-consuming and computationally expensive. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to process style information and the textual prompt separately, which cooperates with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, the prompt maintains control over the content of the generated images, while the negative impact of semantic information in the style references is mitigated. As a result, the content of the generated image adheres to the prompt, and its style aligns with the style references. Moreover, StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, for a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works. | - |
| dc.language | eng | - |
| dc.publisher | Springer | - |
| dc.relation.ispartof | International Journal of Computer Vision | - |
| dc.subject | Artificial intelligence generated content (AIGC) | - |
| dc.subject | Computer vision | - |
| dc.subject | Diffusion model | - |
| dc.subject | Stylized image generation | - |
| dc.title | StyleAdapter: A Unified Stylized Image Generation Model | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1007/s11263-024-02253-x | - |
| dc.identifier.scopus | eid_2-s2.0-105001549092 | - |
| dc.identifier.volume | 133 | - |
| dc.identifier.issue | 4 | - |
| dc.identifier.spage | 1894 | - |
| dc.identifier.epage | 1911 | - |
| dc.identifier.eissn | 1573-1405 | - |
| dc.identifier.issnl | 0920-5691 | - |
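
The abstract above describes a two-path cross-attention (TPCA) design in which the textual prompt and the style references condition generation through separate attention paths. Below is a minimal illustrative sketch of that idea in PyTorch; the class names, dimensions, and the `style_scale` fusion weight are assumptions for illustration only and do not reproduce the paper's implementation (in particular, the semantic suppressing vision model is omitted).

```python
# Illustrative sketch only: not the authors' released code.
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Single-head cross-attention: queries from image features,
    keys/values from a conditioning sequence (text or style tokens)."""

    def __init__(self, query_dim: int, context_dim: int):
        super().__init__()
        self.scale = query_dim ** -0.5
        self.to_q = nn.Linear(query_dim, query_dim, bias=False)
        self.to_k = nn.Linear(context_dim, query_dim, bias=False)
        self.to_v = nn.Linear(context_dim, query_dim, bias=False)
        self.to_out = nn.Linear(query_dim, query_dim)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.to_out(attn @ v)


class TwoPathCrossAttention(nn.Module):
    """Two parallel cross-attention paths: one conditioned on the text
    prompt, one on style-reference tokens; their outputs are fused."""

    def __init__(self, query_dim: int, text_dim: int, style_dim: int,
                 style_scale: float = 1.0):
        super().__init__()
        self.text_attn = CrossAttention(query_dim, text_dim)
        self.style_attn = CrossAttention(query_dim, style_dim)
        self.style_scale = style_scale  # assumed fusion weight

    def forward(self, x, text_tokens, style_tokens):
        # Content path keeps the prompt in control of semantics; style path
        # injects appearance cues from the reference images.
        return x + self.text_attn(x, text_tokens) \
                 + self.style_scale * self.style_attn(x, style_tokens)


# Usage with dummy shapes (batch=2, 64 latent tokens, 77 text tokens, 8 style tokens)
tpca = TwoPathCrossAttention(query_dim=320, text_dim=768, style_dim=1024)
out = tpca(torch.randn(2, 64, 320), torch.randn(2, 77, 768), torch.randn(2, 8, 1024))
print(out.shape)  # torch.Size([2, 64, 320])
```

The residual fusion of the two paths is one plausible way to combine them; the weight on the style path can be tuned to trade off style strength against prompt fidelity.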
