Image Generation - 2025-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-12-31 | Comparative Evaluation of CNN Architectures for Neural Style Transfer in Indonesian Batik Motif Generation: A Comprehensive Study | Happy Gery Pangestu et.al. | 2601.00888 | translate | read | null |
| 2025-12-25 | Can Generative Models Actually Forge Realistic Identity Documents? | Alexander Vinogradov et.al. | 2601.00829 | translate | read | null |
| 2025-12-24 | Speak the Art: A Direct Speech to Image Generation Framework | Mariam Saeed et.al. | 2601.00827 | translate | read | null |
| 2025-12-31 | Compositional Diffusion with Guided Search for Long-Horizon Planning | Utkarsh A Mishra et.al. | 2601.00126 | translate | read | null |
| 2025-12-31 | Are First-Order Diffusion Samplers Really Slower? A Fast Forward-Value Approach | Yuchen Jiao et.al. | 2512.24927 | translate | read | null |
| 2025-12-31 | Resolving the Origins and Pathways of Ionizing Radiation Escape with UV Integral Field Spectroscopy | Cody Carr et.al. | 2512.24895 | translate | read | null |
| 2025-12-30 | F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model | Devendra K. Jangid et.al. | 2512.24473 | translate | read | null |
| 2025-12-30 | Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19 | Sina Jahromi et.al. | 2512.24214 | translate | read | null |
| 2025-12-30 | RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention | Aiyue Chen et.al. | 2512.24086 | translate | read | null |
| 2025-12-29 | Lifelong Domain Adaptive 3D Human Pose Estimation | Qucheng Peng et.al. | 2512.23860 | translate | read | null |
| 2025-12-29 | MiMo-Audio: Audio Language Models are Few-Shot Learners | Xiaomi LLM-Core Team et.al. | 2512.23808 | translate | read | null |
| 2025-12-29 | Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion | Hau-Shiang Shiu et.al. | 2512.23709 | translate | read | null |
| 2025-12-29 | PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation | Zongsheng Cao et.al. | 2512.23546 | translate | read | null |
| 2025-12-29 | Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution | Hexin Zhang et.al. | 2512.23532 | translate | read | null |
| 2025-12-29 | UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale? | Fengjiao Chen et.al. | 2512.23512 | translate | read | null |
| 2025-12-29 | SPER: Accelerating Progressive Entity Resolution via Stochastic Bipartite Maximization | Dimitrios Karapiperis et.al. | 2512.23491 | translate | read | null |
| 2025-12-29 | Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators | Bohan Xiao et.al. | 2512.23463 | translate | read | null |
| 2025-12-29 | Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision | Dohyun Kim et.al. | 2512.23426 | translate | read | null |
| 2025-12-29 | Multi Agents Semantic Emotion Aligned Music to Image Generation with Music Derived Captions | Junchang Shi et.al. | 2512.23320 | translate | read | null |
| 2025-12-29 | Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation | Zengwei Yao et.al. | 2512.23278 | translate | read | null |
| 2025-12-29 | RS-Prune: Training-Free Data Pruning at High Ratios for Efficient Remote Sensing Diffusion Foundation Models | Fan Wei et.al. | 2512.23239 | translate | read | null |
| 2025-12-29 | Anomaly Detection by Effectively Leveraging Synthetic Images | Sungho Kang et.al. | 2512.23227 | translate | read | null |
| 2025-12-29 | Bridging Your Imagination with Audio-Video Generation via a Unified Director | Jiaxu Zhang et.al. | 2512.23222 | translate | read | null |
| 2025-12-29 | PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion | Jian Wang et.al. | 2512.23130 | translate | read | null |
| 2025-12-28 | RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance | Chunyuan Chen et.al. | 2512.22974 | translate | read | null |
| 2025-12-28 | KANO: Kolmogorov-Arnold Neural Operator for Image Super-Resolution | Chenyu Li et.al. | 2512.22822 | translate | read | null |
| 2025-12-28 | SwinCCIR: An end-to-end deep network for Compton camera imaging reconstruction | Minghao Dong et.al. | 2512.22766 | translate | read | null |
| 2025-12-27 | CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation | ZhenQi Chen et.al. | 2512.22681 | translate | read | null |
| 2025-12-27 | Quantum Generative Models for Computational Fluid Dynamics: A First Exploration of Latent Space Learning in Lattice Boltzmann Simulations | Achraf Hsain et.al. | 2512.22672 | translate | read | null |
| 2025-12-27 | FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution | Yidi Liu et.al. | 2512.22647 | translate | read | null |
| 2025-12-27 | Quantum-Circuit Framework for Two-Stage Stochastic Programming via QAOA Integrated with a Quantum Generative Neural Network | Taihei Kuroiwa et.al. | 2512.22434 | translate | read | null |
| 2025-12-26 | Self-Evaluation Unlocks Any-Step Text-to-Image Generation | Xin Yu et.al. | 2512.22374 | translate | read | null |
| 2025-12-25 | Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models | Antara Titikhsha et.al. | 2512.22272 | translate | read | null |
| 2025-12-22 | Super-Resolution Enhancement of Medical Images Based on Diffusion Model: An Optimization Scheme for Low-Resolution Gastric Images | Haozhe Jia et.al. | 2512.22209 | translate | read | null |
| 2025-12-21 | Complex Swin Transformer for Accelerating Enhanced SMWI Reconstruction | Muhammad Usman et.al. | 2512.22202 | translate | read | null |
| 2025-12-17 | AudioGAN: A Compact and Efficient Framework for Real-Time High-Fidelity Text-to-Audio Generation | HaeChun Chung et.al. | 2512.22166 | translate | read | null |
| 2025-12-26 | DPAR: Dynamic Patchification for Efficient Autoregressive Visual Generation | Divyansh Srivastava et.al. | 2512.21867 | translate | read | null |
| 2025-12-25 | Deep Generative Models for Synthetic Financial Data: Applications to Portfolio and Risk Modeling | Christophe D. Hounwanou et.al. | 2512.21798 | translate | read | null |
| 2025-12-25 | Diffusion Posterior Sampling for Super-Resolution under Gaussian Measurement Noise | Abu Hanif Muhammad Syarubany et.al. | 2512.21797 | translate | read | null |
| 2025-12-25 | Synthetic Financial Data Generation for Enhanced Financial Modelling | Christophe D. Hounwanou et.al. | 2512.21791 | translate | read | null |
| 2025-12-25 | InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation | Jinqi Xiao et.al. | 2512.21788 | translate | read | null |
| 2025-12-25 | FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection | Md. Zahid Hossain et.al. | 2512.21695 | translate | read | null |
| 2025-12-25 | BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks | Md. Rakibul Islam et.al. | 2512.21694 | translate | read | null |
| 2025-12-25 | Dictionary-Transform Generative Adversarial Networks | Angshul Majumdar et.al. | 2512.21677 | translate | read | null |
| 2025-12-25 | UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture | Shuo Cao et.al. | 2512.21675 | translate | read | null |
| 2025-12-25 | Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints | Mutiara Shabrina et.al. | 2512.21637 | translate | read | null |
| 2025-12-25 | Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models | Takuro Kutsuna et.al. | 2512.21593 | translate | read | null |
| 2025-12-25 | DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO | Henglin Liu et.al. | 2512.21514 | translate | read | null |
| 2025-12-25 | Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art | Md Ashik Khan et.al. | 2512.21512 | translate | read | null |
| 2025-12-24 | A Reinforcement Learning Approach to Synthetic Data Generation | Natalia Espinosa-Dice et.al. | 2512.21395 | translate | read | null |
| 2025-12-24 | GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation | Snehal Singh Tomar et.al. | 2512.21276 | translate | read | null |
| 2025-12-24 | A Turn Toward Better Alignment: Few-Shot Generative Adaptation with Equivariant Feature Rotation | Chenghao Xu et.al. | 2512.21174 | translate | read | null |
| 2025-12-24 | FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting | Chao Gong et.al. | 2512.21104 | translate | read | null |
| 2025-12-24 | Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control | Minghao Han et.al. | 2512.21058 | translate | read | null |
| 2025-12-24 | Matrix Completion Via Reweighted Logarithmic Norm Minimization | Zhijie Wang et.al. | 2512.21050 | translate | read | null |
| 2025-12-24 | A Large-Depth-Range Layer-Based Hologram Dataset for Machine Learning-Based 3D Computer-Generated Holography | Jaehong Lee et.al. | 2512.21040 | translate | read | null |
| 2025-12-24 | Next-Scale Prediction: A Self-Supervised Approach for Real-World Image Denoising | Yiwen Shan et.al. | 2512.21038 | translate | read | null |
| 2025-12-24 | Enhancing diffusion models with Gaussianization preprocessing | Li Cunzhi et.al. | 2512.21020 | translate | read | null |
| 2025-12-24 | FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing | Mingshu Cai et.al. | 2512.21015 | translate | read | null |
| 2025-12-19 | Dominating vs. Dominated: Generative Collapse in Diffusion Models | Hayeon Jeong et.al. | 2512.20666 | translate | read | null |
| 2025-12-12 | Flow Gym | Francesco Banelli et.al. | 2512.20642 | translate | read | null |
| 2025-12-23 | Optical Pin Beams: Research Progresses and Emerging Applications | Ze Zhang et.al. | 2512.20541 | translate | read | null |
| 2025-12-23 | Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems | James E. Gallagher et.al. | 2512.20487 | translate | read | null |
| 2025-12-23 | UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images | Yiming Zhao et.al. | 2512.20479 | translate | read | null |
| 2025-12-23 | Resolution and Robustness Bounds for Reconstructive Spectrometers | Changyan Zhu et.al. | 2512.20415 | translate | read | null |
| 2025-12-23 | CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation | V. Kovalev et.al. | 2512.20362 | translate | read | null |
| 2025-12-23 | Field-Space Attention for Structure-Preserving Earth System Transformers | Maximilian Witte et.al. | 2512.20350 | translate | read | null |
| 2025-12-23 | HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training | Yuanjian Xu et.al. | 2512.20272 | translate | read | null |
| 2025-12-23 | How I Met Your Bias: Investigating Bias Amplification in Diffusion Models | Nathan Roos et.al. | 2512.20233 | translate | read | null |
| 2025-12-23 | Generative Latent Coding for Ultra-Low Bitrate Image Compression | Zhaoyang Jia et.al. | 2512.20194 | translate | read | null |
| 2025-12-23 | Target Classification for Integrated Sensing and Communication in Industrial Deployments | Luca Barbieri et.al. | 2512.20154 | translate | read | null |
| 2025-12-23 | IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense | Rahul Yumlembam et.al. | 2512.20004 | translate | read | null |
| 2025-12-22 | Efficient Vision Mamba for MRI Super-Resolution via Hybrid Selective Scanning | Mojtaba Safari et.al. | 2512.19676 | translate | read | null |
| 2025-12-22 | Generative diffusion models for agricultural AI: plant image generation, indoor-to-outdoor translation, and expert preference alignment | Da Tan et.al. | 2512.19632 | translate | read | null |
| 2025-12-22 | Rethinking Coupled Tensor Analysis for Hyperspectral Super-Resolution: Recoverable Modeling Under Endmember Variability | Meng Ding et.al. | 2512.19489 | translate | read | null |
| 2025-12-22 | Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation | Guoli Jia et.al. | 2512.19479 | translate | read | null |
| 2025-12-22 | dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models | Yi Xin et.al. | 2512.19433 | translate | read | null |
| 2025-12-22 | GANeXt: A Fully ConvNeXt-Enhanced Generative Adversarial Network for MRI- and CBCT-to-CT Synthesis | Siyuan Mei et.al. | 2512.19336 | translate | read | null |
| 2025-12-22 | MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture | Hui Li et.al. | 2512.19311 | translate | read | null |
| 2025-12-22 | 3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory | Xinyang Song et.al. | 2512.19271 | translate | read | null |
| 2025-12-22 | VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis | Meng Chu et.al. | 2512.19243 | translate | read | null |
| 2025-12-22 | Regression generation adversarial network based on dual data evaluation strategy for industrial application | Zesen Wang et.al. | 2512.19232 | translate | read | null |
| 2025-12-22 | ALMA Observations of Cold Methanol Gas in the Large Magellanic Cloud (LMC): N79 South GMC | Suman Kumar Mondal et.al. | 2512.19185 | translate | read | null |
| 2025-12-22 | Efficient Personalization of Generative Models via Optimal Experimental Design | Guy Schacht et.al. | 2512.19057 | translate | read | null |
| 2025-12-22 | AI-Driven Subcarrier-Level CQI Feedback | Chengyong Jiang et.al. | 2512.19054 | translate | read | null |
| 2025-12-22 | An Fluid Antenna Array-Enabled DOA Estimation Method: End-Fire Effect Suppression | Jiaji Ren et.al. | 2512.18981 | translate | read | null |
| 2025-12-22 | LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer | Raina Panda et.al. | 2512.18930 | translate | read | null |
| 2025-12-21 | Generative Modeling through Spectral Analysis of Koopman Operator | Yuanchao Xu et.al. | 2512.18837 | translate | read | null |
| 2025-12-21 | MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation | Guohui Zhang et.al. | 2512.18766 | translate | read | null |
| 2025-12-21 | Uni-Neur2Img: Unified Neural Signal-Guided Image Generation, Editing, and Stylization via Diffusion Transformers | Xiyue Bai et.al. | 2512.18635 | translate | read | null |
| 2025-12-21 | Image-to-Image Translation with Generative Adversarial Network for Electrical Resistance Tomography Reconstruction | Wejian Yan et.al. | 2512.18557 | translate | read | null |
| 2025-12-20 | Plasticine: A Traceable Diffusion Model for Medical Image Translation | Tianyang Zhanng et.al. | 2512.18455 | translate | read | null |
| 2025-12-20 | Imaging the LkCa 15 system in polarimetry and total intensity without self-subtraction artefacts | C. Swastik et.al. | 2512.18439 | translate | read | null |
| 2025-12-20 | Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models | Chao Wen et.al. | 2512.18388 | translate | read | null |
| 2025-12-20 | PSI3D: Plug-and-Play 3D Stochastic Inference with Slice-wise Latent Diffusion Prior | Wenhan Guo et.al. | 2512.18367 | translate | read | null |
| 2025-12-20 | Loom: Diffusion-Transformer for Interleaved Generation | Mingcheng Ye et.al. | 2512.18254 | translate | read | null |
| 2025-12-20 | Local Patches Meet Global Context: Scalable 3D Diffusion Priors for Computed Tomography Reconstruction | Taewon Yang et.al. | 2512.18161 | translate | read | null |
| 2025-12-19 | SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping | Thomas Boudras et.al. | 2512.18128 | translate | read | null |
| 2025-12-17 | SuperFlow: Training Flow Matching Models with RL on the Fly | Kaijie Chen et.al. | 2512.17951 | translate | read | null |
| 2025-12-19 | Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing | Shilong Zhang et.al. | 2512.17909 | translate | read | null |
| 2025-12-19 | Inverse-Designed Phase Prediction in Digital Lasers Using Deep Learning and Transfer Learning | Yu-Che Wu et.al. | 2512.17879 | translate | read | null |
| 2025-12-19 | InSPECT: Invariant Spectral Features Preservation of Diffusion Models | Baohua Yan et.al. | 2512.17873 | translate | read | null |
| 2025-12-19 | UrbanDIFF: A Denoising Diffusion Model for Spatial Gap Filling of Urban Land Surface Temperature Under Dense Cloud Cover | Arya Chavoshi et.al. | 2512.17782 | translate | read | null |
| 2025-12-19 | AdaptPrompt: Parameter-Efficient Adaptation of VLMs for Generalizable Deepfake Detection | Yichen Jiang et.al. | 2512.17730 | translate | read | null |
| 2025-12-19 | An Empirical Study of Sampling Hyperparameters in Diffusion-Based Super-Resolution | Yudhistira Arief Wibowo et.al. | 2512.17675 | translate | read | null |
| 2025-12-19 | Self-Supervised Weighted Image Guided Quantitative MRI Super-Resolution | Alireza Samadifardheris et.al. | 2512.17612 | translate | read | null |
| 2025-12-19 | LumiCtrl : Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models | Muhammad Atif Butt et.al. | 2512.17489 | translate | read | null |
| 2025-12-19 | Fetpype: An Open-Source Pipeline for Reproducible Fetal Brain MRI Analysis | Thomas Sanchez et.al. | 2512.17472 | translate | read | null |
| 2025-12-19 | Super-resolution wavefront reconstruction in adaptive-optics with pyramid sensors | Carlos M. Correia et.al. | 2512.17469 | translate | read | null |
| 2025-12-19 | Super-resolution-enabled atmospheric tomography for astronomical multi-wavefront-sensor adaptive-optics systems | Carlos M. Correia et.al. | 2512.17430 | translate | read | null |
| 2025-12-19 | Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection | Chenming Zhou et.al. | 2512.17350 | translate | read | null |
| 2025-12-19 | Multi-level distortion-aware deformable network for omnidirectional image super-resolution | Cuixin Yang et.al. | 2512.17343 | translate | read | null |
| 2025-12-19 | A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs | Yunkai Dang et.al. | 2512.17319 | translate | read | null |
| 2025-12-19 | MatLat: Material Latent Space for PBR Texture Generation | Kyeongmin Yeo et.al. | 2512.17302 | translate | read | null |
| 2025-12-18 | SFTok: Bridging the Performance Gap in Discrete Tokenizers | Qihang Rao et.al. | 2512.16910 | translate | read | null |
| 2025-12-18 | FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering | Ole Beisswenger et.al. | 2512.16670 | translate | read | null |
| 2025-12-18 | REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion | Giorgos Petsangourakis et.al. | 2512.16636 | translate | read | null |
| 2025-12-18 | DeContext as Defense: Safe Image Editing in Diffusion Transformers | Linghui Shen et.al. | 2512.16625 | translate | read | null |
| 2025-12-18 | Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers | Yifan Zhou et.al. | 2512.16615 | translate | read | null |
| 2025-12-18 | Yuan-TecSwin: A text conditioned Diffusion model with Swin-transformer blocks | Shaohua Wu et.al. | 2512.16586 | translate | read | null |
| 2025-12-18 | StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models | Senmao Li et.al. | 2512.16483 | translate | read | null |
| 2025-12-18 | Geometric Disentanglement of Text Embeddings for Subject-Consistent Text-to-Image Generation using A Single Prompt | Shangxun Li et.al. | 2512.16443 | translate | read | null |
| 2025-12-18 | CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching | Jiajun Yuan et.al. | 2512.16304 | translate | read | null |
| 2025-12-18 | PixelArena: A benchmark for Pixel-Precision Visual Intelligence | Feng Liang et.al. | 2512.16303 | translate | read | null |
| 2025-12-18 | Pixel Super-Resolved Fluorescence Lifetime Imaging Using Deep Learning | Paloma Casteleiro Costa et.al. | 2512.16266 | translate | read | null |
| 2025-12-18 | Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving | Jiaheng Geng et.al. | 2512.16055 | translate | read | null |
| 2025-12-17 | MCR-VQGAN: A Scalable and Cost-Effective Tau PET Synthesis Approach for Alzheimer’s Disease Imaging | Jin Young Kim et.al. | 2512.15947 | translate | read | null |
| 2025-12-17 | Secure AI-Driven Super-Resolution for Real-Time Mixed Reality Applications | Mohammad Waquas Usmani et.al. | 2512.15823 | translate | read | null |
| 2025-12-13 | Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real | Yan Yang et.al. | 2512.15774 | translate | read | null |
| 2025-12-17 | Stylized Synthetic Augmentation further improves Corruption Robustness | Georg Siedel et.al. | 2512.15675 | translate | read | null |
| 2025-12-16 | InpaintDPO: Mitigating Spatial Relationship Hallucinations in Foreground-conditioned Inpainting via Diverse Preference Optimization | Qirui Li et.al. | 2512.15644 | translate | read | null |
| 2025-12-16 | ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples | Yunfei Yang et.al. | 2512.15641 | translate | read | null |
| 2025-12-17 | Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition | Shengming Yin et.al. | 2512.15603 | translate | read | null |
| 2025-12-17 | VAAS: Vision-Attention Anomaly Scoring for Image Manipulation Detection in Digital Forensics | Opeyemi Bamigbade et.al. | 2512.15512 | translate | read | null |
| 2025-12-17 | Copyright Infringement Risk Reduction via Chain-of-Thought and Task Instruction Prompting | Neeraj Sarna et.al. | 2512.15442 | translate | read | null |
| 2025-12-17 | Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods | Ji Zhou et.al. | 2512.15422 | translate | read | null |
| 2025-12-17 | Time-Varying Audio Effect Modeling by End-to-End Adversarial Training | Yann Bourdin et.al. | 2512.15313 | translate | read | null |
| 2025-12-17 | SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation | Wangyu Wu et.al. | 2512.15310 | translate | read | null |
| 2025-12-17 | Quantum Machine Learning for Cybersecurity: A Taxonomy and Future Directions | Siva Sai et.al. | 2512.15286 | translate | read | null |
| 2025-12-17 | MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement | Yingying Wang et.al. | 2512.15261 | translate | read | null |
| 2025-12-17 | Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets | Jialong Zuo et.al. | 2512.15110 | translate | read | null |
| 2025-12-17 | MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance | Kaizhe Zhang et.al. | 2512.15048 | translate | read | null |
| 2025-12-16 | Spherical Leech Quantization for Visual Tokenization and Generation | Yue Zhao et.al. | 2512.14697 | translate | read | null |
| 2025-12-16 | MMGR: Multi-Modal Generative Reasoning | Zefan Cai et.al. | 2512.14691 | translate | read | null |
| 2025-12-16 | JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction | Atsuyuki Miyai et.al. | 2512.14620 | translate | read | null |
| 2025-12-16 | TAT: Task-Adaptive Transformer for All-in-One Medical Image Restoration | Zhiwen Yang et.al. | 2512.14550 | translate | read | null |
| 2025-12-16 | Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space | Xingfu Zhou et.al. | 2512.14448 | translate | read | null |
| 2025-12-16 | Separation-free exponential fitting with structured noise, with applications to inverse problems in parabolic PDEs | Rami Katz et.al. | 2512.14301 | translate | read | null |
| 2025-12-16 | On fractal minimizers and potentials of occupation measures | Michael Hinz et.al. | 2512.14248 | translate | read | null |
| 2025-12-16 | MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction | Rui-Yang Ju et.al. | 2512.14114 | translate | read | null |
| 2025-12-16 | ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models | Ruishu Zhu et.al. | 2512.14099 | translate | read | null |
| 2025-12-16 | OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration | Ruitong Sun et.al. | 2512.14096 | translate | read | null |
| 2025-12-16 | Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution | Hao Chen et.al. | 2512.14061 | translate | read | null |
| 2025-12-16 | Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models | Shufan Li et.al. | 2512.14008 | translate | read | null |
| 2025-12-16 | An intercomparison of generative machine learning methods for downscaling precipitation at fine spatial scales | Bryn Ward-Leikis et.al. | 2512.13987 | translate | read | null |
| 2025-12-16 | Super-Resolution Posterior Ocular Microvascular Imaging Using 3-D Ultrasound Localization Microscopy With a 32X32 Matrix Array | Junhang Zhang et.al. | 2512.13966 | translate | read | null |
| 2025-12-15 | From Unlearning to UNBRANDING: A Benchmark for Trademark-Safe Text-to-Image Generation | Dawid Malarz et.al. | 2512.13953 | translate | read | null |
| 2025-12-15 | An evaluation of SVBRDF Prediction from Generative Image Models for Appearance Modeling of 3D Scenes | Alban Gauthier et.al. | 2512.13950 | translate | read | null |
| 2025-12-15 | Coarse-to-Fine Hierarchical Alignment for UAV-based Human Detection using Diffusion Models | Wenda Li et.al. | 2512.13869 | translate | read | null |
| 2025-12-15 | Time-aware UNet and super-resolution deep residual networks for spatial downscaling | Mika Sipilä et.al. | 2512.13753 | translate | read | null |
| 2025-12-13 | Composite Classifier-Free Guidance for Multi-Modal Conditioning in Wind Dynamics Super-Resolution | Jacob Schnell et.al. | 2512.13729 | translate | read | null |
| 2025-12-15 | Directional Textual Inversion for Personalized Text-to-Image Generation | Kunhee Kim et.al. | 2512.13672 | translate | read | null |
| 2025-12-15 | Fast label-free point-scanning super-resolution imaging for endoscopy | Ning Xu et.al. | 2512.13432 | translate | read | null |
| 2025-12-15 | ALMA view on the nature of the compact VLA continuum sources in the massive young stellar object G25.65+1.05 | N. N. Shakhvorostova et.al. | 2512.13382 | translate | read | null |
| 2025-12-15 | Super-resolving Herschel - a deep learning based deconvolution and denoising technique | Dennis Koopmans et.al. | 2512.13353 | translate | read | null |
| 2025-12-15 | ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement | Zhihang Liu et.al. | 2512.13303 | translate | read | null |
| 2025-12-15 | A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis | Xianchao Guan et.al. | 2512.13164 | translate | read | null |
| 2025-12-15 | Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models | Hao Chen et.al. | 2512.13039 | translate | read | null |
| 2025-12-15 | JoDiffusion: Jointly Diffusing Image with Pixel-Level Annotations for Semantic Segmentation Promotion | Haoyu Wang et.al. | 2512.13014 | translate | read | null |
| 2025-12-15 | Few-Step Distillation for Text-to-Image Generation: A Practical Guide | Yifan Pu et.al. | 2512.13006 | translate | read | null |
| 2025-12-15 | SCAdapter: Content-Style Disentanglement for Diffusion Style Transfer | Luan Thanh Trinh et.al. | 2512.12963 | translate | read | null |
| 2025-12-15 | Qonvolution: Towards Learning High-Frequency Signals with Queried Convolution | Abhinav Kumar et.al. | 2512.12898 | translate | read | null |
| 2025-12-14 | Learning Common and Salient Generative Factors Between Two Image Datasets | Yunlong He et.al. | 2512.12800 | translate | read | null |
| 2025-12-14 | Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling | Yuran Wang et.al. | 2512.12675 | translate | read | null |
| 2025-12-14 | Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space | Chengzhi Liu et.al. | 2512.12623 | translate | read | null |
| 2025-12-14 | Geometry-Aware Scene-Consistent Image Generation | Cong Xie et.al. | 2512.12598 | translate | read | null |
| 2025-12-14 | Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation | Karthikeya KV et.al. | 2512.12595 | translate | read | null |
| 2025-12-14 | Differentiable Energy-Based Regularization in GANs: A Simulator-Based Exploration of VQE-Inspired Auxiliary Losses | David Strnadel et.al. | 2512.12581 | translate | read | null |
| 2025-12-14 | SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation | Dang Phuong Nam et.al. | 2512.12501 | translate | read | null |
| 2025-12-13 | From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields | Jiachen Tao et.al. | 2512.12459 | translate | read | null |
| 2025-12-13 | Can GPT replace human raters? Validity and reliability of machine-generated norms for metaphors | Veronica Mangiaterra et.al. | 2512.12444 | translate | read | null |
| 2025-12-13 | Anchoring Values in Temporal and Group Dimensions for Flow Matching Model Alignment | Yawen Shao et.al. | 2512.12387 | translate | read | null |
| 2025-12-13 | Hellinger loss function for Generative Adversarial Networks | Giovanni Saraceno et.al. | 2512.12267 | translate | read | null |
| 2025-12-13 | ProImage-Bench: Rubric-Based Evaluation for Professional Image Generation | Minheng Ni et.al. | 2512.12220 | translate | read | null |
| 2025-12-13 | AutoMV: An Automatic Multi-Agent System for Music Video Generation | Xiaoxuan Tang et.al. | 2512.12196 | translate | read | link |
| 2025-12-13 | A comparative study of generative models for child voice conversion | Protima Nomo Sudro et.al. | 2512.12129 | translate | read | null |
| 2025-12-12 | CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos | Tejas Panambur et.al. | 2512.12060 | translate | read | null |
| 2025-12-12 | From Earths to Super-Earths: Five New Small Planets Transiting M Dwarf Stars | Jonathan Gomez Barrientos et.al. | 2512.11971 | translate | read | null |
| 2025-12-09 | Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological “Censorship” | Wenqi Marshall Guo et.al. | 2512.11883 | translate | read | null |
| 2025-12-12 | Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints | Kai Yao et.al. | 2512.11771 | translate | read | null |
| 2025-12-12 | Reducing Domain Gap with Diffusion-Based Domain Adaptation for Cell Counting | Mohammad Dehghanmanshadi et.al. | 2512.11763 | translate | read | link |
| 2025-12-12 | SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder | Minglei Shi et.al. | 2512.11749 | translate | read | link |
| 2025-12-12 | Reframing Music-Driven 2D Dance Pose Generation as Multi-Channel Image Generation | Yan Zhang et.al. | 2512.11720 | translate | read | null |
| 2025-12-12 | EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing | Wei Chow et.al. | 2512.11715 | translate | read | null |
| 2025-12-12 | Fast and Explicit: Slice-to-Volume Reconstruction via 3D Gaussian Primitives with Analytic Point Spread Function Modeling | Maik Dannecker et.al. | 2512.11624 | translate | read | null |
| 2025-12-12 | Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France | Ekaterina Kalinicheva et.al. | 2512.11524 | translate | read | null |
| 2025-12-12 | Exploring MLLM-Diffusion Information Transfer with MetaCanvas | Han Lin et.al. | 2512.11464 | translate | read | null |
| 2025-12-12 | VFMF: World Modeling by Forecasting Vision Foundation Model Features | Gabrijel Boduljak et.al. | 2512.11225 | translate | read | null |
| 2025-12-11 | Chemical composition and enrichment of the Centaurus cluster core seen by XRISM/Resolve | F. Mernier et.al. | 2512.11028 | translate | read | null |
| 2025-12-11 | Generative Adversarial Variational Quantum Kolmogorov-Arnold Network | Hikaru Wakaura et.al. | 2512.11014 | translate | read | null |
| 2025-12-11 | Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration | Sicheng Mo et.al. | 2512.10954 | translate | read | null |
| 2025-12-11 | Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation | Yiwen Tang et.al. | 2512.10949 | translate | read | null |
| 2025-12-11 | GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting | Madhav Agarwal et.al. | 2512.10939 | translate | read | null |
| 2025-12-11 | Quantifying classical and quantum bounds for resolving closely spaced, non-interacting, simultaneously emitting dipole sources in optical microscopy | Armine I. Dingilian et.al. | 2512.10889 | translate | read | null |
| 2025-12-11 | Interpretable and Steerable Concept Bottleneck Sparse Autoencoders | Akshay Kulkarni et.al. | 2512.10805 | translate | read | null |
| 2025-12-11 | OutLines: Modeling Spectral Lines from Winds, Bubbles, and Outflows | Sophia R. Flury et.al. | 2512.10650 | translate | read | null |
| 2025-12-11 | Lang2Motion: Bridging Language and Motion through Joint Embedding Spaces | Bishoy Galoaa et.al. | 2512.10617 | translate | read | null |
| 2025-12-11 | Topology-Guided Quantum GANs for Constrained Graph Generation | Tobias Rohe et.al. | 2512.10582 | translate | read | null |
| 2025-12-11 | Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding | Yuchen Feng et.al. | 2512.10548 | translate | read | null |
| 2025-12-11 | Topology-Agnostic Animal Motion Generation from Text Prompt | Keyi Chen et.al. | 2512.10352 | translate | read | null |
| 2025-12-11 | Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution | Yi-Cheng Liao et.al. | 2512.10340 | translate | read | null |
| 2025-12-10 | Independent Density Estimation | Jiahao Liu et.al. | 2512.10067 | translate | read | null |
| 2025-12-10 | MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata | Yihao Liu et.al. | 2512.10041 | translate | read | null |
| 2025-12-10 | Hybrid Finite Element and Least Squares Support Vector Regression Method for solving Partial Differential Equations with Legendre Polynomial Kernels | Maryam Babaei et.al. | 2512.09967 | translate | read | null |
| 2025-12-10 | DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation | Zhizhong Wang et.al. | 2512.09814 | translate | read | null |
| 2025-12-10 | Stylized Meta-Album: Group-bias injection with style transfer to study robustness against distribution shifts | Romain Mussard et.al. | 2512.09773 | translate | read | null |
| 2025-12-10 | SynthPix: A lightspeed PIV images generator | Antonio Terpin et.al. | 2512.09664 | translate | read | null |
| 2025-12-10 | A Dual-Domain Convolutional Network for Hyperspectral Single-Image Super-Resolution | Murat Karayaka et.al. | 2512.09546 | translate | read | null |
| 2025-12-10 | FunPhase: A Periodic Functional Autoencoder for Motion Generation via Phase Manifolds | Marco Pegoraro et.al. | 2512.09423 | translate | read | null |
| 2025-12-10 | LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations | Zhichao Yang et.al. | 2512.09271 | translate | read | null |
| 2025-12-10 | OmniPSD: Layered PSD Generation with Diffusion Transformer | Cheng Liu et.al. | 2512.09247 | translate | read | null |
| 2025-12-09 | Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation | Hao Chen et.al. | 2512.09185 | translate | read | null |
| 2025-12-09 | SuperF: Neural Implicit Fields for Multi-Image Super-Resolution | Sander Riisøen Jyhne et.al. | 2512.09115 | translate | read | null |
| 2025-12-09 | Food Image Generation on Multi-Noun Categories | Xinyue Pan et.al. | 2512.09095 | translate | read | null |
| 2025-12-09 | AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models | Arman Zarei et.al. | 2512.09081 | translate | read | null |
| 2025-12-06 | An Efficient Test-Time Scaling Approach for Image Generation | Vignesh Sundaresha et.al. | 2512.08985 | translate | read | null |
| 2025-12-09 | OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer | Jessica Yin et.al. | 2512.08920 | translate | read | null |
| 2025-12-09 | Differentially Private Synthetic Data Generation Using Context-Aware GANs | Anantaa Kotal et.al. | 2512.08869 | translate | read | null |
| 2025-12-09 | CARLoS: Retrieval via Concise Assessment Representation of LoRAs at Scale | Shahar Sarfaty et.al. | 2512.08826 | translate | read | null |
| 2025-12-09 | Refining Visual Artifacts in Diffusion Models via Explainable AI-based Flaw Activation Maps | Seoyeon Lee et.al. | 2512.08774 | translate | read | null |
| 2025-12-09 | Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation | Young Kyung Kim et.al. | 2512.08645 | translate | read | null |
| 2025-12-09 | A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation | Zhigang Jia et.al. | 2512.08542 | translate | read | null |
| 2025-12-09 | PaintFlow: A Unified Framework for Interactive Oil Paintings Editing and Generation | Zhangli Hu et.al. | 2512.08534 | translate | read | null |
| 2025-12-09 | Beyond the Noise: Aligning Prompts with Latent Representations in Diffusion Models | Vasco Ramos et.al. | 2512.08505 | translate | read | null |
| 2025-12-09 | Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging | Yi Pan et.al. | 2512.08365 | translate | read | null |
| 2025-12-09 | SCU-CGAN: Enhancing Fire Detection through Synthetic Fire Image Generation and Dataset Augmentation | Ju-Young Kim et.al. | 2512.08362 | translate | read | null |
| 2025-12-09 | Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models | Michael R. Martin et.al. | 2512.08329 | translate | read | null |
| 2025-12-09 | OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation | Yexin Liu et.al. | 2512.08294 | translate | read | null |
| 2025-12-09 | FlowSteer: Conditioning Flow Field for Consistent Image Restoration | Tharindu Wickremasinghe et.al. | 2512.08125 | translate | read | null |
| 2025-12-08 | One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation | Yuan Gao et.al. | 2512.07829 | translate | read | null |
| 2025-12-08 | Distribution Matching Variational AutoEncoder | Sen Ye et.al. | 2512.07778 | translate | read | null |
| 2025-12-08 | Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment | Sangha Park et.al. | 2512.07702 | translate | read | null |
| 2025-12-08 | DIST-CLIP: Arbitrary Metadata and Image Guided MRI Harmonization via Disentangled Anatomy-Contrast Representations | Mehmet Yigit Avci et.al. | 2512.07674 | translate | read | null |
| 2025-12-08 | LongCat-Image Technical Report | Meituan LongCat Team et.al. | 2512.07584 | translate | read | null |
| 2025-12-08 | SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation | Yao Teng et.al. | 2512.07503 | translate | read | null |
| 2025-12-08 | MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition | Xinyu Wei et.al. | 2512.07348 | translate | read | null |
| 2025-12-08 | DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement | Handing Xu et.al. | 2512.07253 | translate | read | null |
| 2025-12-08 | Generating Storytelling Images with Rich Chains-of-Reasoning | Xiujie Song et.al. | 2512.07198 | translate | read | null |
| 2025-12-07 | Evaluating and Preserving High-level Fidelity in Super-Resolution | Josep M. Rocafort et.al. | 2512.07037 | translate | read | null |
| 2025-12-07 | Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods | Panagiota Kiourti et.al. | 2512.06665 | translate | read | null |
| 2025-12-07 | Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution | Achmad Ardani Prasha et.al. | 2512.06642 | translate | read | null |
| 2025-12-06 | Generic visuality of war? How image-generative AI models (mis)represent Russia’s war against Ukraine | Mykola Makhortykh et.al. | 2512.06570 | translate | read | null |
| 2025-12-06 | SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities | Dung Thuy Nguyen et.al. | 2512.06562 | translate | read | null |
| 2025-12-06 | AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars | Ramazan Fazylov et.al. | 2512.06438 | translate | read | null |
| 2025-12-06 | TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search | Kaicheng Yang et.al. | 2512.06353 | translate | read | null |
| 2025-12-04 | PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation | Wenyi Mo et.al. | 2512.06020 | translate | read | null |
| 2025-12-05 | EditThinker: Unlocking Iterative Reasoning for Any Image Editor | Hongyu Li et.al. | 2512.05965 | translate | read | null |
| 2025-12-05 | Impugan: Learning Conditional Generative Models for Robust Data Imputation | Zalish Mahmud et.al. | 2512.05950 | translate | read | null |
| 2025-12-05 | Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator | Md. Mahbub Hasan Akash et.al. | 2512.05866 | translate | read | null |
| 2025-12-05 | HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models | Shizhuo Mao et.al. | 2512.05746 | translate | read | null |
| 2025-12-05 | China Regional 3km Downscaling Based on Residual Corrective Diffusion Model | Honglu Sun et.al. | 2512.05377 | translate | read | null |
| 2025-12-04 | CARD: Correlation Aware Restoration with Diffusion | Niki Nezakati et.al. | 2512.05268 | translate | read | null |
| 2025-12-04 | Invariance Co-training for Robot Visual Generalization | Jonathan Yang et.al. | 2512.05230 | translate | read | null |
| 2025-12-03 | EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models | Kun Wang et.al. | 2512.05152 | translate | read | null |
| 2025-12-04 | DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation | Dongzhi Jiang et.al. | 2512.05112 | translate | read | null |
| 2025-12-04 | NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation | Yu Zeng et.al. | 2512.05106 | translate | read | null |
| 2025-12-04 | Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding | Abhigyan Bhattacharya et.al. | 2512.05039 | translate | read | null |
| 2025-12-04 | Generative Neural Video Compression via Video Diffusion Prior | Qi Mao et.al. | 2512.05016 | translate | read | null |
| 2025-12-04 | Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models | NaHyeon Park et.al. | 2512.04981 | translate | read | null |
| 2025-12-04 | Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens | Ziran Qin et.al. | 2512.04857 | translate | read | null |
| 2025-12-04 | FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis | Shijie Chen et.al. | 2512.04830 | translate | read | null |
| 2025-12-04 | LaFiTe: A Generative Latent Field for 3D Native Texturing | Chia-Hao Chen et.al. | 2512.04786 | translate | read | null |
| 2025-12-02 | PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling | Bowen Ping et.al. | 2512.04784 | translate | read | null |
| 2025-12-04 | Multi Task Denoiser Training for Solving Linear Inverse Problems | Clément Bled et.al. | 2512.04709 | translate | read | null |
| 2025-12-04 | OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution | Xinning Chai et.al. | 2512.04699 | translate | read | null |
| 2025-12-04 | Controllable Long-term Motion Generation with Extended Joint Targets | Eunjong Lee et.al. | 2512.04487 | translate | read | null |
| 2025-12-04 | Not All Birds Look The Same: Identity-Preserving Generation For Birds | Aaron Sun et.al. | 2512.04485 | translate | read | null |
| 2025-12-04 | Adversarial Limits of Quantum Certification: When Eve Defeats Detection | Davut Emre Tasar et.al. | 2512.04391 | translate | read | null |
| 2025-12-04 | FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring | Geunhyuk Youk et.al. | 2512.04390 | translate | read | null |
| 2025-12-03 | Learning Single-Image Super-Resolution in the JPEG Compressed Domain | Sruthi Srinivasan et.al. | 2512.04284 | translate | read | null |
| 2025-12-03 | Plug-and-Play Image Restoration with Flow Matching: A Continuous Viewpoint | Fan Jia et.al. | 2512.04283 | translate | read | null |
| 2025-12-03 | UniLight: A Unified Representation for Lighting | Zitian Zhang et.al. | 2512.04267 | translate | read | null |
| 2025-12-03 | Fast & Efficient Normalizing Flows and Applications of Image Generative Models | Sandeep Nagar et.al. | 2512.04039 | translate | read | null |
| 2025-12-03 | Beyond the Ground Truth: Enhanced Supervision for Image Restoration | Donghun Ryou et.al. | 2512.03932 | translate | read | null |
| 2025-12-03 | LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling | Hong-Kai Zheng et.al. | 2512.03796 | translate | read | null |
| 2025-12-03 | Evaluation of Foundational Machine Learned Interatomic Potentials for Migration Barrier Predictions | Achinthya Krishna Bheemaguli et.al. | 2512.03642 | translate | read | null |
| 2025-12-03 | CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation | Ruoxuan Zhang et.al. | 2512.03540 | translate | read | null |
| 2025-12-03 | Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models | Shojiro Yamabe et.al. | 2512.03463 | translate | read | null |
| 2025-12-02 | PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement | Haitian Zheng et.al. | 2512.03247 | translate | read | null |
| 2025-12-02 | Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models | Xiwen Wei et.al. | 2512.03125 | translate | read | null |
| 2025-12-02 | DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation | Ying Yang et.al. | 2512.02931 | translate | read | null |
| 2025-12-02 | Glance: Accelerating Diffusion Models with 1 Sample | Zhuobai Dong et.al. | 2512.02899 | translate | read | null |
| 2025-12-02 | Leveraging generative adversarial networks with spatially adaptive denormalization for multivariate stochastic seismic data inversion | Roberto Miele et.al. | 2512.02863 | translate | read | null |
| 2025-12-01 | PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation | Fan Wu et.al. | 2512.02794 | translate | read | null |
| 2025-12-02 | Channel Knowledge Map Construction via Physics-Inspired Diffusion Model Without Prior Observations | Yunzhe Zhu et.al. | 2512.02757 | translate | read | null |
| 2025-12-02 | Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs | Theodoros Aivalis et.al. | 2512.02713 | translate | read | null |
| 2025-12-02 | PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution | Zhongbao Yang et.al. | 2512.02681 | translate | read | null |
| 2025-12-02 | OmniPerson: Unified Identity-Preserving Pedestrian Generation | Changxiao Ma et.al. | 2512.02554 | translate | read | null |
| 2025-12-02 | Two-Stage Vision Transformer for Image Restoration: Colorization Pretraining + Residual Upsampling | Aditya Chaudhary et.al. | 2512.02512 | translate | read | null |
| 2025-12-02 | Bayesian Physics-Informed Neural Networks for Inverse Problems (BPINN-IP): Application in Infrared Image Processing | Ali Mohammad-Djafari et.al. | 2512.02495 | translate | read | null |
| 2025-12-02 | ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation | Kerui Chen et.al. | 2512.02453 | translate | read | null |
| 2025-12-02 | SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains | Qingmei Li et.al. | 2512.02369 | translate | read | null |
| 2025-12-01 | Progressive Image Restoration via Text-Conditioned Video Generation | Peng Kang et.al. | 2512.02273 | translate | read | null |
| 2025-12-01 | Visible to Longwave-infrared imaging via an inverse-designed monolithic lens | Syed N. Qadri et.al. | 2512.02184 | translate | read | null |
| 2025-12-01 | SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting | Pranav Asthana et.al. | 2512.02172 | translate | read | null |
| 2025-12-01 | FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges | Kevin David Hayes et.al. | 2512.02161 | translate | read | null |
| 2025-12-01 | Data-Centric Visual Development for Self-Driving Labs | Anbang Liu et.al. | 2512.02018 | translate | read | null |
| 2025-12-01 | Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights | Juanxi Tian et.al. | 2512.01816 | translate | read | null |
| 2025-12-01 | ViT$^3$: Unlocking Test-Time Training in Vision | Dongchen Han et.al. | 2512.01643 | translate | read | null |
| 2025-12-01 | LLM2Fx-Tools: Tool Calling For Music Post-Production | Seungheon Doh et.al. | 2512.01559 | translate | read | null |
| 2025-12-01 | ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers | Yiyang Ma et.al. | 2512.01426 | translate | read | null |
| 2025-12-01 | FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution | Seungho Choi et.al. | 2512.01390 | translate | read | null |
| 2025-12-01 | FOD-S2R: A FOD Dataset for Sim2Real Transfer Learning based Object Detection | Ashish Vashist et.al. | 2512.01315 | translate | read | null |
| 2025-12-01 | Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation | Zirui Zhao et.al. | 2512.01242 | translate | read | null |
| 2025-12-01 | PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards | Shulei Wang et.al. | 2512.01236 | translate | read | null |