Image Generation - 2025-05
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-05-30 | ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | Yu Zhang et.al. | 2505.24875 | translate | read | link |
| 2025-05-30 | GenSpace: Benchmarking Spatially-Aware Image Generation | Zehan Wang et.al. | 2505.24870 | translate | read | null |
| 2025-05-30 | Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation | Yucheng Zhou et.al. | 2505.24787 | translate | read | link |
| 2025-05-30 | QGAN-based data augmentation for hybrid quantum-classical neural networks | Run-Ze He et.al. | 2505.24780 | translate | read | null |
| 2025-05-30 | DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | Jiaxu Zhang et.al. | 2505.24733 | translate | read | null |
| 2025-05-30 | un$^2$CLIP: Improving CLIP’s Visual Detail Capturing Ability via Inverting unCLIP | Yinqi Li et.al. | 2505.24517 | translate | read | link |
| 2025-05-30 | Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields | Md Shahriar Rahim Siddiqui et.al. | 2505.24434 | translate | read | null |
| 2025-05-30 | Category-aware EEG image generation based on wavelet transform and contrast semantic loss | Enshang Zhang et.al. | 2505.24301 | translate | read | null |
| 2025-05-30 | Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin | Fangyikang Wang et.al. | 2505.24222 | translate | read | link |
| 2025-05-29 | LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers | Yusuf Dalva et.al. | 2505.23758 | translate | read | null |
| 2025-05-29 | How Animals Dance (When You’re Not Looking) | Xiaojuan Wang et.al. | 2505.23738 | translate | read | null |
| 2025-05-29 | Inference-time Scaling of Diffusion Models through Classical Search | Xiangcheng Zhang et.al. | 2505.23614 | translate | read | null |
| 2025-05-29 | Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | Qingyu Shi et.al. | 2505.23606 | translate | read | link |
| 2025-05-29 | PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation | Christian Schmidt et.al. | 2505.23587 | translate | read | null |
| 2025-05-29 | R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation | Kaijie Chen et.al. | 2505.23493 | translate | read | null |
| 2025-05-29 | VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration | Ben Li et.al. | 2505.23439 | translate | read | link |
| 2025-05-29 | Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering | Sixian Wang et.al. | 2505.23343 | translate | read | link |
| 2025-05-29 | Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis | Hengyuan Cao et.al. | 2505.23325 | translate | read | link |
| 2025-05-29 | Score-based Generative Modeling for Conditional Independence Testing | Yixin Ren et.al. | 2505.23309 | translate | read | link |
| 2025-05-28 | SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation | Dekai Zhu et.al. | 2505.22643 | translate | read | null |
| 2025-05-28 | Principled Out-of-Distribution Generalization via Simplicity | Jiawei Ge et.al. | 2505.22622 | translate | read | null |
| 2025-05-28 | ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models | Dmitrii Sorokin et.al. | 2505.22569 | translate | read | link |
| 2025-05-28 | TabularQGAN: A Quantum Generative Model for Tabular Data | Pallavi Bhardwaj et.al. | 2505.22533 | translate | read | null |
| 2025-05-28 | PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models | Junwen Chen et.al. | 2505.22523 | translate | read | null |
| 2025-05-28 | ProCrop: Learning Aesthetic Image Cropping from Professional Compositions | Ke Zhang et.al. | 2505.22490 | translate | read | null |
| 2025-05-28 | Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation | Jiadong Pan et.al. | 2505.22407 | translate | read | null |
| 2025-05-28 | PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models | Fan Fei et.al. | 2505.22394 | translate | read | null |
| 2025-05-28 | Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion | Kewen Chen et.al. | 2505.22360 | translate | read | null |
| 2025-05-28 | Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers | Weilun Feng et.al. | 2505.22167 | translate | read | null |
| 2025-05-27 | Policy Optimized Text-to-Image Pipeline Design | Uri Gadot et.al. | 2505.21478 | translate | read | null |
| 2025-05-27 | DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction | Yiheng Liu et.al. | 2505.21473 | translate | read | link |
| 2025-05-27 | Creativity in LLM-based Multi-Agent Systems: A Survey | Yi-Cheng Lin et.al. | 2505.21116 | translate | read | null |
| 2025-05-27 | Facial Attribute Based Text Guided Face Anonymization | Mustafa İzzet Muştu et.al. | 2505.21002 | translate | read | null |
| 2025-05-27 | OrienText: Surface Oriented Textual Image Generation | Shubham Singh Paliwal et.al. | 2505.20958 | translate | read | null |
| 2025-05-27 | Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models | Puwei Lian et.al. | 2505.20955 | translate | read | null |
| 2025-05-27 | Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects | Wei Li et.al. | 2505.20909 | translate | read | null |
| 2025-05-27 | Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech | Nam-Gyu Kim et.al. | 2505.20868 | translate | read | null |
| 2025-05-27 | Not All Thats Rare Is Lost: Causal Paths to Rare Concept Synthesis | Bo-Kai Ruan et.al. | 2505.20808 | translate | read | null |
| 2025-05-27 | Unpaired Image-to-Image Translation for Segmentation and Signal Unmixing | Nikola Andrejic et.al. | 2505.20746 | translate | read | null |
| 2025-05-26 | FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities | Jin Wang et.al. | 2505.20147 | translate | read | null |
| 2025-05-26 | Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | Zheqi Lv et.al. | 2505.20053 | translate | read | link |
| 2025-05-26 | StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation | Yi Wu et.al. | 2505.19874 | translate | read | null |
| 2025-05-26 | Applications and Effect Evaluation of Generative Adversarial Networks in Semi-Supervised Learning | Jiyu Hu et.al. | 2505.19522 | translate | read | null |
| 2025-05-26 | Structure Disruption: Subverting Malicious Diffusion-Based Inpainting via Self-Attention Query Perturbation | Yuhao He et.al. | 2505.19425 | translate | read | null |
| 2025-05-26 | MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models | Hang Hua et.al. | 2505.19415 | translate | read | null |
| 2025-05-25 | TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis | Kazi Mahathir Rahman et.al. | 2505.19291 | translate | read | null |
| 2025-05-25 | DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving | Chen Shi et.al. | 2505.19239 | translate | read | null |
| 2025-05-25 | RAISE: Realness Assessment for Image Synthesis and Evaluation | Aniruddha Mukherjee et.al. | 2505.19233 | translate | read | null |
| 2025-05-25 | MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation | Chenglong Ma et.al. | 2505.19225 | translate | read | link |
| 2025-05-23 | F-ANcGAN: An Attention-Enhanced Cycle Consistent Generative Adversarial Architecture for Synthetic Image Generation of Nanoparticles | Varun Ajith et.al. | 2505.18106 | translate | read | null |
| 2025-05-23 | RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration | Sudarshan Rajagopalan et.al. | 2505.18047 | translate | read | null |
| 2025-05-23 | R-Genie: Reasoning-Guided Generative Image Editing | Dong Zhang et.al. | 2505.17768 | translate | read | null |
| 2025-05-23 | FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving | Shuang Zeng et.al. | 2505.17685 | translate | read | null |
| 2025-05-23 | Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer | Soumya Dutta et.al. | 2505.17655 | translate | read | null |
| 2025-05-23 | MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | Jihan Yao et.al. | 2505.17613 | translate | read | link |
| 2025-05-23 | Deeper Diffusion Models Amplify Bias | Shahin Hakemi et.al. | 2505.17560 | translate | read | null |
| 2025-05-23 | Graph Style Transfer for Counterfactual Explainability | Bardh Prenkaj et.al. | 2505.17542 | translate | read | null |
| 2025-05-23 | RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning | Mingrui Wu et.al. | 2505.17540 | translate | read | link |
| 2025-05-23 | Co-Reinforcement Learning for Unified Multimodal Understanding and Generation | Jingjing Jiang et.al. | 2505.17534 | translate | read | null |
| 2025-05-22 | GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Chengqi Duan et.al. | 2505.17022 | translate | read | link |
| 2025-05-22 | Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO | Chengzhuo Tong et.al. | 2505.17017 | translate | read | link |
| 2025-05-22 | Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On | Siqi Wan et.al. | 2505.16977 | translate | read | link |
| 2025-05-22 | Creatively Upscaling Images with Global-Regional Priors | Yurui Qian et.al. | 2505.16976 | translate | read | null |
| 2025-05-22 | Power-Law Decay Loss for Large Language Model Finetuning: Focusing on Information Sparsity to Enhance Generation Quality | Jintian Shao et.al. | 2505.16900 | translate | read | null |
| 2025-05-22 | Conditional Panoramic Image Generation via Masked Autoregressive Modeling | Chaoyang Wang et.al. | 2505.16862 | translate | read | null |
| 2025-05-22 | Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation | Hongji Yang et.al. | 2505.16763 | translate | read | null |
| 2025-05-22 | Synthesis of Ventilator Dyssynchrony Waveforms using a Hybrid Generative Model and a Lung Model | Sagar Deep Deb et.al. | 2505.16462 | translate | read | null |
| 2025-05-22 | UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension | Kishan Gupta et.al. | 2505.16404 | translate | read | null |
| 2025-05-22 | Style Transfer with Diffusion Models for Synthetic-to-Real Domain Adaptation | Estelle Chigot et.al. | 2505.16360 | translate | read | link |
| 2025-05-21 | MMaDA: Multimodal Large Diffusion Language Models | Ling Yang et.al. | 2505.15809 | translate | read | link |
| 2025-05-21 | IA-T2I: Internet-Augmented Text-to-Image Generation | Chuanhao Li et.al. | 2505.15779 | translate | read | null |
| 2025-05-21 | FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion | Kazuaki Mishima et.al. | 2505.15313 | translate | read | null |
| 2025-05-21 | BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution | Ji Guo et.al. | 2505.15308 | translate | read | null |
| 2025-05-21 | Scaling Diffusion Transformers Efficiently via $\mu$P | Chenyu Zheng et.al. | 2505.15270 | translate | read | link |
| 2025-05-21 | Contrastive Learning-Enhanced Trajectory Matching for Small-Scale Dataset Distillation | Wenmin Li et.al. | 2505.15267 | translate | read | null |
| 2025-05-21 | GT$^2$-GS: Geometry-aware Texture Transfer for Gaussian Splatting | Wenjie Liu et.al. | 2505.15208 | translate | read | null |
| 2025-05-21 | Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation | Xinran Wang et.al. | 2505.15172 | translate | read | null |
| 2025-05-20 | TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | Yu Zhang et.al. | 2505.14910 | translate | read | link |
| 2025-05-20 | UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation | Rui Tian et.al. | 2505.14682 | translate | read | null |
| 2025-05-20 | Training-Free Watermarking for Autoregressive Image Generation | Yu Tong et.al. | 2505.14673 | translate | read | link |
| 2025-05-20 | SparC: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling | Zhihao Li et.al. | 2505.14521 | translate | read | null |
| 2025-05-20 | Latent Flow Transformer | Yen-Chen Wu et.al. | 2505.14513 | translate | read | link |
| 2025-05-20 | VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank | Tianhe Wu et.al. | 2505.14460 | translate | read | link |
| 2025-05-20 | Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | Xingxing Weng et.al. | 2505.14361 | translate | read | null |
| 2025-05-20 | Handloom Design Generation Using Generative Networks | Rajat Kanti Bhattacharjee et.al. | 2505.14330 | translate | read | null |
| 2025-05-20 | Towards Generating Realistic Underwater Images | Abdul-Kazeem Shamba et.al. | 2505.14296 | translate | read | null |
| 2025-05-20 | EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection | Yijie Lu et.al. | 2505.14289 | translate | read | null |
| 2025-05-20 | Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization | Yuanyuan Chang et.al. | 2505.14254 | translate | read | link |
| 2025-05-19 | VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation | Huawei Lin et.al. | 2505.13439 | translate | read | link |
| 2025-05-20 | Swin DiT: Diffusion Transformer using Pseudo Shifted Windows | Jiafu Wu et.al. | 2505.13219 | translate | read | null |
| 2025-05-19 | Diffusion Models with Double Guidance: Generate with aggregated datasets | Yanfeng Yang et.al. | 2505.13213 | translate | read | null |
| 2025-05-19 | A Physics-Inspired Optimizer: Velocity Regularized Adam | Pranav Vaidhyanathan et.al. | 2505.13196 | translate | read | null |
| 2025-05-19 | Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model | Jonas Brenig et.al. | 2505.13152 | translate | read | link |
| 2025-05-19 | Accelerate TarFlow Sampling with GS-Jacobi Iteration | Ben Liu et.al. | 2505.12849 | translate | read | link |
| 2025-05-19 | A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design | Adarsh Singh et.al. | 2505.12848 | translate | read | null |
| 2025-05-19 | A Study on the Refining Handwritten Font by Mixing Font Styles | Avinash Kumar et.al. | 2505.12834 | translate | read | link |
| 2025-05-19 | SynDec: A Synthesize-then-Decode Approach for Arbitrary Textual Style Transfer via Large Language Models | Han Sun et.al. | 2505.12821 | translate | read | null |
| 2025-05-19 | FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities | Shibo Hong et.al. | 2505.12795 | translate | read | link |
| 2025-05-16 | PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment | Dingbang Huang et.al. | 2505.11468 | translate | read | null |
| 2025-05-16 | GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI | Marc-Antoine Fortin et.al. | 2505.11445 | translate | read | link |
| 2025-05-16 | Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior | Chin-Yun Yu et.al. | 2505.11315 | translate | read | null |
| 2025-05-16 | DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models | Giulia Bertazzini et.al. | 2505.11257 | translate | read | null |
| 2025-05-16 | Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models | Fu-Yun Wang et.al. | 2505.11245 | translate | read | link |
| 2025-05-16 | CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback | Yixin Wan et.al. | 2505.11178 | translate | read | null |
| 2025-05-16 | One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework | Feiran Li et.al. | 2505.11131 | translate | read | link |
| 2025-05-16 | Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation | Massimiliano Cassia et.al. | 2505.11110 | translate | read | null |
| 2025-05-16 | HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution | Baisong Li et.al. | 2505.11062 | translate | read | null |
| 2025-05-16 | DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning | Weilai Xiang et.al. | 2505.10999 | translate | read | null |
| 2025-05-15 | End-to-End Vision Tokenizer Tuning | Wenxuan Wang et.al. | 2505.10562 | translate | read | null |
| 2025-05-15 | CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs | Raman Dutt et.al. | 2505.10496 | translate | read | link |
| 2025-05-15 | SOS: A Shuffle Order Strategy for Data Augmentation in Industrial Human Activity Recognition | Anh Tuan Ha et.al. | 2505.10312 | translate | read | null |
| 2025-05-15 | Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis | Bingda Tang et.al. | 2505.10046 | translate | read | link |
| 2025-05-15 | CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation | Chenglong Wang et.al. | 2505.09936 | translate | read | null |
| 2025-05-14 | EnerVerse-AC: Envisioning Embodied Environments with Action Condition | Yuxin Jiang et.al. | 2505.09723 | translate | read | link |
| 2025-05-14 | Don’t Forget your Inverse DDIM for Image Editing | Guillermo Gomez-Trenado et.al. | 2505.09571 | translate | read | null |
| 2025-05-14 | BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | Jiuhai Chen et.al. | 2505.09568 | translate | read | link |
| 2025-05-14 | Train a Multi-Task Diffusion Policy on RLBench-18 in One Day with One GPU | Yutong Hu et.al. | 2505.09430 | translate | read | link |
| 2025-05-14 | Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Bingxin Ke et.al. | 2505.09358 | translate | read | link |
| 2025-05-14 | Q-space Guided Collaborative Attention Translation Network for Flexible Diffusion-Weighted Images Synthesis | Pengli Zhu et.al. | 2505.09323 | translate | read | null |
| 2025-05-14 | An Initial Exploration of Default Images in Text-to-Image Generation | Hannu Simonen et.al. | 2505.09166 | translate | read | null |
| 2025-05-14 | DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis | Zeeshan Ahmad et.al. | 2505.09091 | translate | read | null |
| 2025-05-13 | SPAST: Arbitrary Style Transfer with Style Priors via Pre-trained Large-scale Model | Zhanjie Zhang et.al. | 2505.08695 | translate | read | null |
| 2025-05-13 | Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models | Donghoon Kim et.al. | 2505.08622 | translate | read | null |
| 2025-05-13 | DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art | Haroon Wahab et.al. | 2505.08552 | translate | read | null |
| 2025-05-13 | Skeleton-Guided Diffusion Model for Accurate Foot X-ray Synthesis in Hallux Valgus Diagnosis | Midi Wan et.al. | 2505.08247 | translate | read | link |
| 2025-05-13 | Identifying Memorization of Diffusion Models through p-Laplace Analysis | Jonathan Brokman et.al. | 2505.08246 | translate | read | null |
| 2025-05-13 | Unsupervised Raindrop Removal from a Single Image using Conditional Diffusion Models | Lhuqita Fazry et.al. | 2505.08190 | translate | read | null |
| 2025-05-12 | Image-Guided Microstructure Optimization using Diffusion Models: Validated with Li-Mn-rich Cathode Precursors | Geunho Choi et.al. | 2505.07906 | translate | read | null |
| 2025-05-12 | Synthesizing Diverse Network Flow Datasets with Scalable Dynamic Multigraph Generation | Arya Grayeli et.al. | 2505.07777 | translate | read | null |
| 2025-05-12 | Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning | Bohan Wang et.al. | 2505.07538 | translate | read | null |
| 2025-05-12 | Addressing degeneracies in latent interpolation for diffusion models | Erik Landolsi et.al. | 2505.07481 | translate | read | null |
| 2025-05-12 | GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models | Daria Zotova et.al. | 2505.07364 | translate | read | null |
| 2025-05-12 | Metrics that matter: Evaluating image quality metrics for medical image generation | Yash Deo et.al. | 2505.07175 | translate | read | link |
| 2025-05-11 | Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation | Md. Naimur Asif Borno et.al. | 2505.06995 | translate | read | null |
| 2025-05-10 | Learning Graph Representation of Agent Diffuser | Youcef Djenouri et.al. | 2505.06761 | translate | read | link |
| 2025-05-10 | HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation | Hang Wang et.al. | 2505.06512 | translate | read | link |
| 2025-05-10 | PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations | Md Rakibul Hasan et.al. | 2505.06502 | translate | read | null |
| 2025-05-10 | Climate in a Bottle: Towards a Generative Foundation Model for the Kilometer-Scale Global Atmosphere | Noah D. Brenowitz et.al. | 2505.06474 | translate | read | null |
| 2025-05-09 | Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation | Dongying Li et.al. | 2505.06117 | translate | read | null |
| 2025-05-09 | Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation | Kunpeng Qiu et.al. | 2505.06068 | translate | read | link |
| 2025-05-09 | Discovery of the Polar Ring Galaxies with deep learning | D. V. Dobrycheva et.al. | 2505.05890 | translate | read | null |
| 2025-05-09 | Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition | Zhiyuan Chen et.al. | 2505.05829 | translate | read | null |
| 2025-05-08 | InstanceGen: Image Generation with Instance-level Instructions | Etai Sella et.al. | 2505.05678 | translate | read | null |
| 2025-05-08 | Semantic Style Transfer for Enhancing Animal Facial Landmark Detection | Anadil Hussein et.al. | 2505.05640 | translate | read | null |
| 2025-05-08 | A Preliminary Study for GPT-4o on Image Restoration | Hao Yang et.al. | 2505.05621 | translate | read | link |
| 2025-05-08 | Prompt to Polyp: Clinically-Aware Medical Image Synthesis with Diffusion Models | Mikhail Chaichuk et.al. | 2505.05573 | translate | read | link |
| 2025-05-08 | OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours | Hanie Moghaddasi et.al. | 2505.05531 | translate | read | null |
| 2025-05-08 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472 | translate | read | null |
| 2025-05-08 | Does CLIP perceive art the same way we do? | Andrea Asperti et.al. | 2505.05229 | translate | read | null |
| 2025-05-08 | Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement | Julius Richter et.al. | 2505.05216 | translate | read | null |
| 2025-05-09 | FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech | Linhan Ma et.al. | 2505.05159 | translate | read | null |
| 2025-05-08 | PIDiff: Image Customization for Personalized Identities with Diffusion Models | Jinyu Gu et.al. | 2505.05081 | translate | read | null |
| 2025-05-08 | ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis | Onkar Susladkar et.al. | 2505.04963 | translate | read | null |
| 2025-05-07 | CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation | Viacheslav Vasilev et.al. | 2505.04851 | translate | read | null |
| 2025-05-07 | Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers | Divyansh Srivastava et.al. | 2505.04718 | translate | read | null |
| 2025-05-08 | Defining and Quantifying Creative Behavior in Popular Image Generators | Aditi Ramaswamy et.al. | 2505.04497 | translate | read | null |
| 2025-05-07 | Efficient Flow Matching using Latent Variables | Anirban Samaddar et.al. | 2505.04486 | translate | read | null |
| 2025-05-07 | RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation | Jing Hu et.al. | 2505.04424 | translate | read | link |
| 2025-05-07 | CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion | Yanyu Li et.al. | 2505.04347 | translate | read | null |
| 2025-05-07 | A Large Language Model for Feasible and Diverse Population Synthesis | Sung Yoo Lim et.al. | 2505.04196 | translate | read | null |
| 2025-05-07 | Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety | Variath Madhupal Gautham Nair et.al. | 2505.04146 | translate | read | null |
| 2025-05-07 | RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models | Yu Cheng et.al. | 2505.04116 | translate | read | null |
| 2025-05-08 | MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction | Andrew Zhang et.al. | 2505.04105 | translate | read | null |
| 2025-05-06 | nuGAN: Generative Adversarial Emulator for Cosmic Web with Neutrinos | Neerav Kaushal et.al. | 2505.03936 | translate | read | null |
| 2025-05-06 | CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting | Huawei Sun et.al. | 2505.03679 | translate | read | null |
| 2025-05-06 | Distribution-Conditional Generation: From Class Distribution to Creative Generation | Fu Feng et.al. | 2505.03667 | translate | read | null |
| 2025-05-06 | Revolutionizing Brain Tumor Imaging: Generating Synthetic 3D FA Maps from T1-Weighted MRI using CycleGAN Models | Xin Du et.al. | 2505.03662 | translate | read | null |
| 2025-05-06 | Real-Time Person Image Synthesis Using a Flow Matching Model | Jiwoo Jeong et.al. | 2505.03562 | translate | read | null |
| 2025-05-06 | Safer Prompts: Reducing IP Risk in Visual Generative AI | Lena Reissinger et.al. | 2505.03338 | translate | read | null |
| 2025-05-06 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | Yibin Wang et.al. | 2505.03318 | translate | read | link |
| 2025-05-06 | Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation | Jincheng Zhang et.al. | 2505.03314 | translate | read | link |
| 2025-05-05 | Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models | Kuofeng Gao et.al. | 2505.02824 | translate | read | null |
| 2025-05-06 | MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation | Mingcheng Li et.al. | 2505.02648 | translate | read | null |
| 2025-05-05 | Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities | Xinjie Zhang et.al. | 2505.02567 | translate | read | link |
| 2025-05-05 | Text to Image Generation and Editing: A Survey | Pengfei Yang et.al. | 2505.02527 | translate | read | null |
| 2025-05-05 | Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Biao Gong et.al. | 2505.02471 | translate | read | link |
| 2025-05-04 | Enhancing AI Face Realism: Cost-Efficient Quality Improvement in Distilled Diffusion Models with a Fully Synthetic Dataset | Jakub Wąsala et.al. | 2505.02255 | translate | read | null |
| 2025-05-04 | Improving Physical Object State Representation in Text-to-Image Generative Systems | Tianle Chen et.al. | 2505.02236 | translate | read | link |
| 2025-05-04 | Robust AI-Generated Face Detection with Imbalanced Data | Yamini Sri Krubha et.al. | 2505.02182 | translate | read | link |
| 2025-05-06 | Regression is all you need for medical image translation | Sebastian Rassmann et.al. | 2505.02048 | translate | read | null |
| 2025-05-03 | Discrete Spatial Diffusion: Intensity-Preserving Diffusion Modeling | Javier E. Santos et.al. | 2505.01917 | translate | read | null |
| 2025-05-02 | Deep Learning-Enabled System Diagnosis in Microgrids: A Feature-Feedback GAN Approach | Swetha Rani Kasimalla et.al. | 2505.01366 | translate | read | null |
| 2025-05-02 | Improving Editability in Image Generation with Layer-wise Memory | Daneul Kim et.al. | 2505.01079 | translate | read | link |
| 2025-05-01 | Data-Driven Optical To Thermal Inference in Pool Boiling Using Generative Adversarial Networks | Qianxi Fu et.al. | 2505.00823 | translate | read | null |
| 2025-05-01 | T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | Dongzhi Jiang et.al. | 2505.00703 | translate | read | link |
| 2025-05-01 | Steering Large Language Models with Register Analysis for Arbitrary Style Transfer | Xinchen Yang et.al. | 2505.00679 | translate | read | null |
| 2025-05-01 | JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers | Kwon Byung-Ki et.al. | 2505.00482 | translate | read | link |
| 2025-05-01 | Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens | Afan Ali et.al. | 2505.00399 | translate | read | null |
| 2025-05-01 | GAN-based Generator of Adversarial Attack on Intelligent End-to-End Autoencoder-based Communication System | Jianyuan Chen et.al. | 2505.00395 | translate | read | null |
| 2025-05-01 | Denoising weak lensing mass maps with diffusion model: systematic comparison with generative adversarial network | Shohei D. Aoyama et.al. | 2505.00345 | translate | read | null |