Diffusion Personalization Tuning Free
8 papers with code • 1 benchmark • 1 dataset
This is a sub-class of diffusion personalization methods in which the model does not need to be fine-tuned on a few user-specific images. Instead, the diffusion model is additionally trained on a suitable dataset so that personalization at test time requires only a forward pass.
Most implemented papers
Arc2Face: A Foundation Model of Human Faces
This paper presents Arc2Face, an identity-conditioned face foundation model which, given the ArcFace embedding of a person, can generate diverse photo-realistic images with a degree of face similarity that surpasses existing models.
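As a concrete illustration, here is a minimal sketch of extracting the ArcFace identity embedding that Arc2Face is conditioned on. It assumes the insightface package with its bundled buffalo_l recognition model; the input file name is a placeholder, and the final comment paraphrases the paper's conditioning mechanism.

```python
# Minimal sketch (assumes the insightface package): extract the 512-d ArcFace
# embedding that Arc2Face takes as its identity condition.
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")        # face detector + ArcFace recognition model
app.prepare(ctx_id=0, det_size=(640, 640))  # ctx_id=0 selects the first GPU

img = cv2.imread("face.jpg")                # placeholder input image
faces = app.get(img)                        # detect faces and compute embeddings
id_embedding = faces[0].normed_embedding    # 512-d, L2-normalized ArcFace vector
# Arc2Face projects this vector into the text-encoder token space to condition generation.
```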
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Despite the simplicity of our method, an IP-Adapter with only 22M parameters can achieve performance comparable to, or even better than, a fully fine-tuned image prompt model.
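For reference, a minimal sketch of tuning-free personalization with an IP-Adapter, assuming the Hugging Face diffusers integration and the published h94/IP-Adapter weights; the base model ID, weight file name, prompt, and image paths are common choices, not taken from the paper.

```python
# Minimal sketch (assumes the diffusers IP-Adapter integration): personalize
# generation from a single reference image via one forward pass, no fine-tuning.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # trade off image-prompt vs. text-prompt influence

ref = load_image("reference.png")  # placeholder reference image
image = pipe(prompt="a photo on a mountain trail",
             ip_adapter_image=ref,
             num_inference_steps=50).images[0]
image.save("personalized.png")
```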
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
By composing these weights into the diffusion model, coupled with fast fine-tuning, HyperDreamBooth can generate a person's face in various contexts and styles with high subject detail, while also preserving the model's crucial knowledge of diverse styles and semantic modifications.
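The core idea, a hypernetwork that predicts low-rank weight residuals from an identity embedding, can be sketched as follows; the embedding size, layer size, and module names are illustrative assumptions, not the paper's actual architecture.

```python
# Toy sketch of the hypernetwork idea behind HyperDreamBooth (illustrative
# shapes, not the paper's architecture): map a face embedding to a low-rank
# weight residual for one attention layer of a frozen diffusion model.
import torch
import torch.nn as nn

class WeightHyperNet(nn.Module):
    def __init__(self, embed_dim=512, layer_dim=320, rank=1):
        super().__init__()
        self.rank, self.layer_dim = rank, layer_dim
        self.to_a = nn.Linear(embed_dim, layer_dim * rank)
        self.to_b = nn.Linear(embed_dim, rank * layer_dim)

    def forward(self, face_embedding):
        # Predict the two factors of a rank-`rank` weight delta.
        a = self.to_a(face_embedding).view(-1, self.layer_dim, self.rank)
        b = self.to_b(face_embedding).view(-1, self.rank, self.layer_dim)
        return a @ b  # added to the frozen base weight, then briefly fine-tuned

hyper = WeightHyperNet()
delta_w = hyper(torch.randn(1, 512))  # one weight residual per subject
print(delta_w.shape)                  # torch.Size([1, 320, 320])
```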
InstantID: Zero-shot Identity-Preserving Generation in Seconds
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA.
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation.
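A toy sketch of delayed subject conditioning follows; the tensors and the denoising step are dummy stand-ins for the real U-Net and embeddings, chosen only to show where the conditioning switch happens.

```python
# Toy sketch of delayed subject conditioning (dummy stand-ins for the real
# U-Net and embeddings): early denoising steps use the text-only condition to
# keep the layout editable; later steps switch to the subject-augmented
# condition to lock in identity.
import torch

def denoise_step(x, cond):
    return x - 0.01 * cond.mean() * torch.ones_like(x)  # placeholder update

num_steps = 50
switch_at = int(0.2 * num_steps)        # hand off after the first 20% of steps
text_cond = torch.randn(1, 77, 768)     # plain text embedding
subject_cond = torch.randn(1, 77, 768)  # text embedding with subject features injected

x = torch.randn(1, 4, 64, 64)           # initial latent
for i in range(num_steps):
    cond = text_cond if i < switch_at else subject_cond
    x = denoise_step(x, cond)
```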
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that does not require test-time fine-tuning and needs only a single reference image to support personalized generation of single or multiple subjects in any domain.
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Recent text-to-image generation methods have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
In this paper, we propose Recaption, Plan and Generate (RPG), a new training-free text-to-image generation and editing framework that harnesses the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models.
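The recaption-plan-generate loop can be sketched at a very high level as below; both functions are stubs standing in for a real multimodal LLM call and a regional diffusion step, and the prompt, region format, and outputs are invented for illustration only.

```python
# Toy sketch of the RPG loop (stubbed MLLM and generator): recaption a complex
# prompt into per-subregion descriptions, plan a layout, then generate each
# region with its own description.
def mllm_plan(prompt):
    # Stub for a multimodal-LLM planning call; a real call would return a
    # layout derived from chain-of-thought reasoning over the prompt.
    return ["a red apple | left half", "a green pear | right half"]

def generate_region(description, region):
    # Stub for regional diffusion; a real step would denoise this subregion
    # under its own text condition.
    print(f"generating '{description}' in {region}")

for item in mllm_plan("a red apple next to a green pear"):
    desc, region = (s.strip() for s in item.split("|"))
    generate_region(desc, region)
```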