Image Generation - 2025-05 | Paper Arxiv Daily

Publish Date	Title	Authors	PDF	Translate	Read	Code
2025-05-30	ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL	Yu Zhang et.al.	2505.24875	translate	read	link
2025-05-30	GenSpace: Benchmarking Spatially-Aware Image Generation	Zehan Wang et.al.	2505.24870	translate	read	null
2025-05-30	Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation	Yucheng Zhou et.al.	2505.24787	translate	read	link
2025-05-30	QGAN-based data augmentation for hybrid quantum-classical neural networks	Run-Ze He et.al.	2505.24780	translate	read	null
2025-05-30	DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds	Jiaxu Zhang et.al.	2505.24733	translate	read	null
2025-05-30	un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability via Inverting unCLIP	Yinqi Li et.al.	2505.24517	translate	read	link
2025-05-30	Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields	Md Shahriar Rahim Siddiqui et.al.	2505.24434	translate	read	null
2025-05-30	Category-aware EEG image generation based on wavelet transform and contrast semantic loss	Enshang Zhang et.al.	2505.24301	translate	read	null
2025-05-30	Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin	Fangyikang Wang et.al.	2505.24222	translate	read	link
2025-05-29	LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers	Yusuf Dalva et.al.	2505.23758	translate	read	null
2025-05-29	How Animals Dance (When You’re Not Looking)	Xiaojuan Wang et.al.	2505.23738	translate	read	null
2025-05-29	Inference-time Scaling of Diffusion Models through Classical Search	Xiangcheng Zhang et.al.	2505.23614	translate	read	null
2025-05-29	Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model	Qingyu Shi et.al.	2505.23606	translate	read	link
2025-05-29	PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation	Christian Schmidt et.al.	2505.23587	translate	read	null
2025-05-29	R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation	Kaijie Chen et.al.	2505.23493	translate	read	null
2025-05-29	VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration	Ben Li et.al.	2505.23439	translate	read	link
2025-05-29	Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering	Sixian Wang et.al.	2505.23343	translate	read	link
2025-05-29	Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis	Hengyuan Cao et.al.	2505.23325	translate	read	link
2025-05-29	Score-based Generative Modeling for Conditional Independence Testing	Yixin Ren et.al.	2505.23309	translate	read	link
2025-05-28	SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation	Dekai Zhu et.al.	2505.22643	translate	read	null
2025-05-28	Principled Out-of-Distribution Generalization via Simplicity	Jiawei Ge et.al.	2505.22622	translate	read	null
2025-05-28	ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models	Dmitrii Sorokin et.al.	2505.22569	translate	read	link
2025-05-28	TabularQGAN: A Quantum Generative Model for Tabular Data	Pallavi Bhardwaj et.al.	2505.22533	translate	read	null
2025-05-28	PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models	Junwen Chen et.al.	2505.22523	translate	read	null
2025-05-28	ProCrop: Learning Aesthetic Image Cropping from Professional Compositions	Ke Zhang et.al.	2505.22490	translate	read	null
2025-05-28	Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation	Jiadong Pan et.al.	2505.22407	translate	read	null
2025-05-28	PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models	Fan Fei et.al.	2505.22394	translate	read	null
2025-05-28	Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion	Kewen Chen et.al.	2505.22360	translate	read	null
2025-05-28	Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers	Weilun Feng et.al.	2505.22167	translate	read	null
2025-05-27	Policy Optimized Text-to-Image Pipeline Design	Uri Gadot et.al.	2505.21478	translate	read	null
2025-05-27	DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction	Yiheng Liu et.al.	2505.21473	translate	read	link
2025-05-27	Creativity in LLM-based Multi-Agent Systems: A Survey	Yi-Cheng Lin et.al.	2505.21116	translate	read	null
2025-05-27	Facial Attribute Based Text Guided Face Anonymization	Mustafa İzzet Muştu et.al.	2505.21002	translate	read	null
2025-05-27	OrienText: Surface Oriented Textual Image Generation	Shubham Singh Paliwal et.al.	2505.20958	translate	read	null
2025-05-27	Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models	Puwei Lian et.al.	2505.20955	translate	read	null
2025-05-27	Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects	Wei Li et.al.	2505.20909	translate	read	null
2025-05-27	Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech	Nam-Gyu Kim et.al.	2505.20868	translate	read	null
2025-05-27	Not All Thats Rare Is Lost: Causal Paths to Rare Concept Synthesis	Bo-Kai Ruan et.al.	2505.20808	translate	read	null
2025-05-27	Unpaired Image-to-Image Translation for Segmentation and Signal Unmixing	Nikola Andrejic et.al.	2505.20746	translate	read	null
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	translate	read	null
2025-05-26	Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	Zheqi Lv et.al.	2505.20053	translate	read	link
2025-05-26	StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation	Yi Wu et.al.	2505.19874	translate	read	null
2025-05-26	Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning	Jiyu Hu et.al.	2505.19522	translate	read	null
2025-05-26	Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation	Yuhao He et.al.	2505.19425	translate	read	null
2025-05-26	MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models	Hang Hua et.al.	2505.19415	translate	read	null
2025-05-25	TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis	Kazi Mahathir Rahman et.al.	2505.19291	translate	read	null
2025-05-25	DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving	Chen Shi et.al.	2505.19239	translate	read	null
2025-05-25	RAISE: Realness Assessment for Image Synthesis and Evaluation	Aniruddha Mukherjee et.al.	2505.19233	translate	read	null
2025-05-25	MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation	Chenglong Ma et.al.	2505.19225	translate	read	link
2025-05-23	F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles	Varun Ajith et.al.	2505.18106	translate	read	null
2025-05-23	RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2505.18047	translate	read	null
2025-05-23	R-Genie: Reasoning-Guided Generative Image Editing	Dong Zhang et.al.	2505.17768	translate	read	null
2025-05-23	FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving	Shuang Zeng et.al.	2505.17685	translate	read	null
2025-05-23	Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer	Soumya Dutta et.al.	2505.17655	translate	read	null
2025-05-23	MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation	Jihan Yao et.al.	2505.17613	translate	read	link
2025-05-23	Deeper Diffusion Models Amplify Bias	Shahin Hakemi et.al.	2505.17560	translate	read	null
2025-05-23	Graph Style Transfer for Counterfactual Explainability	Bardh Prenkaj et.al.	2505.17542	translate	read	null
2025-05-23	RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning	Mingrui Wu et.al.	2505.17540	translate	read	link
2025-05-23	Co-Reinforcement Learning for Unified Multimodal Understanding and Generation	Jingjing Jiang et.al.	2505.17534	translate	read	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	translate	read	link
2025-05-22	Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO	Chengzhuo Tong et.al.	2505.17017	translate	read	link
2025-05-22	Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On	Siqi Wan et.al.	2505.16977	translate	read	link
2025-05-22	Creatively Upscaling Images with Global-Regional Priors	Yurui Qian et.al.	2505.16976	translate	read	null
2025-05-22	Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality	Jintian Shao et.al.	2505.16900	translate	read	null
2025-05-22	Conditional Panoramic Image Generation via Masked Autoregressive Modeling	Chaoyang Wang et.al.	2505.16862	translate	read	null
2025-05-22	Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation	Hongji Yang et.al.	2505.16763	translate	read	null
2025-05-22	Synthesis of Ventilator Dyssynchrony Waveforms using a Hybrid Generative Model and a Lung Model	Sagar Deep Deb et.al.	2505.16462	translate	read	null
2025-05-22	UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension	Kishan Gupta et.al.	2505.16404	translate	read	null
2025-05-22	Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation	Estelle Chigot et.al.	2505.16360	translate	read	link
2025-05-21	MMaDA: Multimodal Large Diffusion Language Models	Ling Yang et.al.	2505.15809	translate	read	link
2025-05-21	IA-T2I: Internet-Augmented Text-to-Image Generation	Chuanhao Li et.al.	2505.15779	translate	read	null
2025-05-21	FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion	Kazuaki Mishima et.al.	2505.15313	translate	read	null
2025-05-21	BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution	Ji Guo et.al.	2505.15308	translate	read	null
2025-05-21	Scaling Diffusion Transformers Efficiently via $\mu$P	Chenyu Zheng et.al.	2505.15270	translate	read	link
2025-05-21	Contrastive Learning-Enhanced Trajectory Matching for Small-Scale Dataset Distillation	Wenmin Li et.al.	2505.15267	translate	read	null
2025-05-21	GT$^2$-GS: Geometry-aware Texture Transfer for Gaussian Splatting	Wenjie Liu et.al.	2505.15208	translate	read	null
2025-05-21	Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation	Xinran Wang et.al.	2505.15172	translate	read	null
2025-05-20	TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis	Yu Zhang et.al.	2505.14910	translate	read	link
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	translate	read	null
2025-05-20	Training-Free Watermarking for Autoregressive Image Generation	Yu Tong et.al.	2505.14673	translate	read	link
2025-05-20	SparC: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling	Zhihao Li et.al.	2505.14521	translate	read	null
2025-05-20	Latent Flow Transformer	Yen-Chen Wu et.al.	2505.14513	translate	read	link
2025-05-20	VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank	Tianhe Wu et.al.	2505.14460	translate	read	link
2025-05-20	Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives	Xingxing Weng et.al.	2505.14361	translate	read	null
2025-05-20	Handloom Design Generation Using Generative Networks	Rajat Kanti Bhattacharjee et.al.	2505.14330	translate	read	null
2025-05-20	Towards Generating Realistic Underwater Images	Abdul-Kazeem Shamba et.al.	2505.14296	translate	read	null
2025-05-20	EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection	Yijie Lu et.al.	2505.14289	translate	read	null
2025-05-20	Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization	Yuanyuan Chang et.al.	2505.14254	translate	read	link
2025-05-19	VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation	Huawei Lin et.al.	2505.13439	translate	read	link
2025-05-20	Swin DiT: Diffusion Transformer using Pseudo Shifted Windows	Jiafu Wu et.al.	2505.13219	translate	read	null
2025-05-19	Diffusion Models with Double Guidance: Generate with aggregated datasets	Yanfeng Yang et.al.	2505.13213	translate	read	null
2025-05-19	A Physics-Inspired Optimizer: Velocity Regularized Adam	Pranav Vaidhyanathan et.al.	2505.13196	translate	read	null
2025-05-19	Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model	Jonas Brenig et.al.	2505.13152	translate	read	link
2025-05-19	Accelerate TarFlow Sampling with GS-Jacobi Iteration	Ben Liu et.al.	2505.12849	translate	read	link
2025-05-19	A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design	Adarsh Singh et.al.	2505.12848	translate	read	null
2025-05-19	A Study on the Refining Handwritten Font by Mixing Font Styles	Avinash Kumar et.al.	2505.12834	translate	read	link
2025-05-19	SynDec: A Synthesize-then-Decode Approach for Arbitrary Textual Style Transfer via Large Language Models	Han Sun et.al.	2505.12821	translate	read	null
2025-05-19	FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities	Shibo Hong et.al.	2505.12795	translate	read	link
2025-05-16	PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment	Dingbang Huang et.al.	2505.11468	translate	read	null
2025-05-16	GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI	Marc-Antoine Fortin et.al.	2505.11445	translate	read	link
2025-05-16	Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior	Chin-Yun Yu et.al.	2505.11315	translate	read	null
2025-05-16	DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models	Giulia Bertazzini et.al.	2505.11257	translate	read	null
2025-05-16	Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models	Fu-Yun Wang et.al.	2505.11245	translate	read	link
2025-05-16	CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback	Yixin Wan et.al.	2505.11178	translate	read	null
2025-05-16	One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework	Feiran Li et.al.	2505.11131	translate	read	link
2025-05-16	Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation	Massimiliano Cassia et.al.	2505.11110	translate	read	null
2025-05-16	HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution	Baisong Li et.al.	2505.11062	translate	read	null
2025-05-16	DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning	Weilai Xiang et.al.	2505.10999	translate	read	null
2025-05-15	End-to-End Vision Tokenizer Tuning	Wenxuan Wang et.al.	2505.10562	translate	read	null
2025-05-15	CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs	Raman Dutt et.al.	2505.10496	translate	read	link
2025-05-15	SOS: A Shuffle Order Strategy for Data Augmentation in Industrial Human Activity Recognition	Anh Tuan Ha et.al.	2505.10312	translate	read	null
2025-05-15	Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis	Bingda Tang et.al.	2505.10046	translate	read	link
2025-05-15	CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation	Chenglong Wang et.al.	2505.09936	translate	read	null
2025-05-14	EnerVerse-AC: Envisioning Embodied Environments with Action Condition	Yuxin Jiang et.al.	2505.09723	translate	read	link
2025-05-14	Don’t Forget your Inverse DDIM for Image Editing	Guillermo Gomez-Trenado et.al.	2505.09571	translate	read	null
2025-05-14	BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	Jiuhai Chen et.al.	2505.09568	translate	read	link
2025-05-14	Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU	Yutong Hu et.al.	2505.09430	translate	read	link
2025-05-14	Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis	Bingxin Ke et.al.	2505.09358	translate	read	link
2025-05-14	Q-space Guided Collaborative Attention Translation Network for Flexible Diffusion-Weighted Images Synthesis	Pengli Zhu et.al.	2505.09323	translate	read	null
2025-05-14	An Initial Exploration of Default Images in Text-to-Image Generation	Hannu Simonen et.al.	2505.09166	translate	read	null
2025-05-14	DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis	Zeeshan Ahmad et.al.	2505.09091	translate	read	null
2025-05-13	SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model	Zhanjie Zhang et.al.	2505.08695	translate	read	null
2025-05-13	Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models	Donghoon Kim et.al.	2505.08622	translate	read	null
2025-05-13	DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art	Haroon Wahab et.al.	2505.08552	translate	read	null
2025-05-13	Skeleton-Guided Diffusion Model for Accurate Foot X-ray Synthesis in Hallux Valgus Diagnosis	Midi Wan et.al.	2505.08247	translate	read	link
2025-05-13	Identifying Memorization of Diffusion Models through p-Laplace Analysis	Jonathan Brokman et.al.	2505.08246	translate	read	null
2025-05-13	Unsupervised Raindrop Removal from a Single Image using Conditional Diffusion Models	Lhuqita Fazry et.al.	2505.08190	translate	read	null
2025-05-12	Image-Guided Microstructure Optimization using Diffusion Models: Validated with Li-Mn-rich Cathode Precursors	Geunho Choi et.al.	2505.07906	translate	read	null
2025-05-12	Synthesizing Diverse Network Flow Datasets with Scalable Dynamic Multigraph Generation	Arya Grayeli et.al.	2505.07777	translate	read	null
2025-05-12	Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning	Bohan Wang et.al.	2505.07538	translate	read	null
2025-05-12	Addressing degeneracies in latent interpolation for diffusion models	Erik Landolsi et.al.	2505.07481	translate	read	null
2025-05-12	GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models	Daria Zotova et.al.	2505.07364	translate	read	null
2025-05-12	Metrics that matter: Evaluating image quality metrics for medical image generation	Yash Deo et.al.	2505.07175	translate	read	link
2025-05-11	Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation	Md. Naimur Asif Borno et.al.	2505.06995	translate	read	null
2025-05-10	Learning Graph Representation of Agent Diffuser	Youcef Djenouri et.al.	2505.06761	translate	read	link
2025-05-10	HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation	Hang Wang et.al.	2505.06512	translate	read	link
2025-05-10	PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations	Md Rakibul Hasan et.al.	2505.06502	translate	read	null
2025-05-10	Climate in a Bottle: Towards a Generative Foundation Model for the Kilometer-Scale Global Atmosphere	Noah D. Brenowitz et.al.	2505.06474	translate	read	null
2025-05-09	Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation	Dongying Li et.al.	2505.06117	translate	read	null
2025-05-09	Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation	Kunpeng Qiu et.al.	2505.06068	translate	read	link
2025-05-09	Discovery of the Polar Ring Galaxies with deep learning	D. V. Dobrycheva et.al.	2505.05890	translate	read	null
2025-05-09	Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition	Zhiyuan Chen et.al.	2505.05829	translate	read	null
2025-05-08	InstanceGen: Image Generation with Instance-level Instructions	Etai Sella et.al.	2505.05678	translate	read	null
2025-05-08	Semantic Style Transfer for Enhancing Animal Facial Landmark Detection	Anadil Hussein et.al.	2505.05640	translate	read	null
2025-05-08	A Preliminary Study for GPT-4o on Image Restoration	Hao Yang et.al.	2505.05621	translate	read	link
2025-05-08	Prompt to Polyp: Clinically-Aware Medical Image Synthesis with Diffusion Models	Mikhail Chaichuk et.al.	2505.05573	translate	read	link
2025-05-08	OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours	Hanie Moghaddasi et.al.	2505.05531	translate	read	null
2025-05-08	Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation	Chao Liao et.al.	2505.05472	translate	read	null
2025-05-08	Does CLIP perceive art the same way we do?	Andrea Asperti et.al.	2505.05229	translate	read	null
2025-05-08	Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement	Julius Richter et.al.	2505.05216	translate	read	null
2025-05-09	FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech	Linhan Ma et.al.	2505.05159	translate	read	null
2025-05-08	PIDiff: Image Customization for Personalized Identities with Diffusion Models	Jinyu Gu et.al.	2505.05081	translate	read	null
2025-05-08	ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis	Onkar Susladkar et.al.	2505.04963	translate	read	null
2025-05-07	CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation	Viacheslav Vasilev et.al.	2505.04851	translate	read	null
2025-05-07	Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers	Divyansh Srivastava et.al.	2505.04718	translate	read	null
2025-05-08	Defining and Quantifying Creative Behavior in Popular Image Generators	Aditi Ramaswamy et.al.	2505.04497	translate	read	null
2025-05-07	Efficient Flow Matching using Latent Variables	Anirban Samaddar et.al.	2505.04486	translate	read	null
2025-05-07	RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation	Jing Hu et.al.	2505.04424	translate	read	link
2025-05-07	CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion	Yanyu Li et.al.	2505.04347	translate	read	null
2025-05-07	A Large Language Model for Feasible and Diverse Population Synthesis	Sung Yoo Lim et.al.	2505.04196	translate	read	null
2025-05-07	Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety	Variath Madhupal Gautham Nair et.al.	2505.04146	translate	read	null
2025-05-07	RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models	Yu Cheng et.al.	2505.04116	translate	read	null
2025-05-08	MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction	Andrew Zhang et.al.	2505.04105	translate	read	null
2025-05-06	nuGAN: Generative Adversarial Emulator for Cosmic Web with Neutrinos	Neerav Kaushal et.al.	2505.03936	translate	read	null
2025-05-06	CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting	Huawei Sun et.al.	2505.03679	translate	read	null
2025-05-06	Distribution-Conditional Generation: From Class Distribution to Creative Generation	Fu Feng et.al.	2505.03667	translate	read	null
2025-05-06	Revolutionizing Brain Tumor Imaging: Generating Synthetic 3D FA Maps from T1-Weighted MRI using CycleGAN Models	Xin Du et.al.	2505.03662	translate	read	null
2025-05-06	Real-Time Person Image Synthesis Using a Flow Matching Model	Jiwoo Jeong et.al.	2505.03562	translate	read	null
2025-05-06	Safer Prompts: Reducing IP Risk in Visual Generative AI	Lena Reissinger et.al.	2505.03338	translate	read	null
2025-05-06	Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning	Yibin Wang et.al.	2505.03318	translate	read	link
2025-05-06	Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation	Jincheng Zhang et.al.	2505.03314	translate	read	link
2025-05-05	Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models	Kuofeng Gao et.al.	2505.02824	translate	read	null
2025-05-06	MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation	Mingcheng Li et.al.	2505.02648	translate	read	null
2025-05-05	Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities	Xinjie Zhang et.al.	2505.02567	translate	read	link
2025-05-05	Text to Image Generation and Editing: A Survey	Pengfei Yang et.al.	2505.02527	translate	read	null
2025-05-05	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Biao Gong et.al.	2505.02471	translate	read	link
2025-05-04	Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset	Jakub Wąsala et.al.	2505.02255	translate	read	null
2025-05-04	Improving Physical Object State Representation in Text-to-Image Generative Systems	Tianle Chen et.al.	2505.02236	translate	read	link
2025-05-04	Robust AI-Generated Face Detection with Imbalanced Data	Yamini Sri Krubha et.al.	2505.02182	translate	read	link
2025-05-06	Regression is all you need for medical image translation	Sebastian Rassmann et.al.	2505.02048	translate	read	null
2025-05-03	Discrete Spatial Diffusion: Intensity-Preserving Diffusion Modeling	Javier E. Santos et.al.	2505.01917	translate	read	null
2025-05-02	Deep Learning-Enabled System Diagnosis in Microgrids: A Feature-Feedback GAN Approach	Swetha Rani Kasimalla et.al.	2505.01366	translate	read	null
2025-05-02	Improving Editability in Image Generation with Layer-wise Memory	Daneul Kim et.al.	2505.01079	translate	read	link
2025-05-01	Data-Driven Optical To Thermal Inference in Pool Boiling Using Generative Adversarial Networks	Qianxi Fu et.al.	2505.00823	translate	read	null
2025-05-01	T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT	Dongzhi Jiang et.al.	2505.00703	translate	read	link
2025-05-01	Steering Large Language Models with Register Analysis for Arbitrary Style Transfer	Xinchen Yang et.al.	2505.00679	translate	read	null
2025-05-01	JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers	Kwon Byung-Ki et.al.	2505.00482	translate	read	link
2025-05-01	Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens	Afan Ali et.al.	2505.00399	translate	read	null
2025-05-01	GAN-based Generator of Adversarial Attack on Intelligent End-to-End Autoencoder-based Communication System	Jianyuan Chen et.al.	2505.00395	translate	read	null
2025-05-01	Denoising weak lensing mass maps with diffusion model: systematic comparison with generative adversarial network	Shohei D. Aoyama et.al.	2505.00345	translate	read	null
