Image Generation - 2025-03
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-03-31 | RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy | Zhonghan Zhao et.al. | 2503.24388 | translate | read | null |
| 2025-03-31 | Consistent Subject Generation via Contrastive Instantiated Concepts | Lee Hsin-Ying et.al. | 2503.24387 | translate | read | null |
| 2025-03-31 | ERUPT: Efficient Rendering with Unposed Patch Transformer | Maxim V. Shugaev et.al. | 2503.24374 | translate | read | null |
| 2025-03-31 | Style Quantization for Data-Efficient GAN Training | Jian Wang et.al. | 2503.24282 | translate | read | null |
| 2025-03-31 | FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics | Yixuan Li et.al. | 2503.24267 | translate | read | null |
| 2025-03-31 | Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation | Lorenzo Tronchin et.al. | 2503.24258 | translate | read | link |
| 2025-03-31 | Threats and Opportunities in AI-generated Images for Armed Forces | Raphael Meier et.al. | 2503.24095 | translate | read | null |
| 2025-03-31 | AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents | Jiaxiang Chen et.al. | 2503.23948 | translate | read | null |
| 2025-03-31 | Semantic Packet Aggregation and Repeated Transmission for Text-to-Image Generation | Seunghun Lee et.al. | 2503.23734 | translate | read | null |
| 2025-03-28 | Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure | Frank J. Brooks et.al. | 2503.22658 | translate | read | null |
| 2025-03-28 | RELD: Regularization by Latent Diffusion Models for Image Restoration | Pasquale Cascarano et.al. | 2503.22563 | translate | read | null |
| 2025-03-28 | Deterministic Medical Image Translation via High-fidelity Brownian Bridges | Qisheng He et.al. | 2503.22531 | translate | read | null |
| 2025-03-28 | Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization | Barış Batuhan Topal et.al. | 2503.22352 | translate | read | null |
| 2025-03-28 | Semantix: An Energy Guided Sampler for Semantic Style Transfer | Huiang He et.al. | 2503.22344 | translate | read | null |
| 2025-03-28 | ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting | Wenjie Liu et.al. | 2503.22218 | translate | read | null |
| 2025-03-28 | Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces | Wonhyeok Choi et.al. | 2503.22209 | translate | read | null |
| 2025-03-28 | ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation | Yunhong Min et.al. | 2503.22194 | translate | read | null |
| 2025-03-28 | Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items | Jianghao Lin et.al. | 2503.22182 | translate | read | null |
| 2025-03-28 | An Empirical Study of Validating Synthetic Data for Text-Based Person Retrieval | Min Cao et.al. | 2503.22171 | translate | read | null |
| 2025-03-27 | Optimal Stepsize for Diffusion Sampling | Jianning Pei et.al. | 2503.21774 | translate | read | link |
| 2025-03-27 | Lumina-Image 2.0: A Unified and Efficient Image Generative Framework | Qi Qin et.al. | 2503.21758 | translate | read | link |
| 2025-03-27 | A Unified Framework for Diffusion Bridge Problems: Flow Matching and Schrödinger Matching into One | Minyoung Kim et.al. | 2503.21756 | translate | read | null |
| 2025-03-27 | LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis | Shitian Zhao et.al. | 2503.21749 | translate | read | link |
| 2025-03-27 | CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Aniket Didolkar et.al. | 2503.21747 | translate | read | null |
| 2025-03-27 | 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models | Yuhan Zhang et.al. | 2503.21745 | translate | read | null |
| 2025-03-27 | Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | Jaywon Koo et.al. | 2503.21721 | translate | read | null |
| 2025-03-27 | Zero-Shot Visual Concept Blending Without Text Guidance | Hiroya Makino et.al. | 2503.21277 | translate | read | null |
| 2025-03-27 | UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning | Hongxuan Tang et.al. | 2503.21193 | translate | read | null |
| 2025-03-27 | Model as a Game: On Numerical and Spatial Consistency for Generative Games | Jingye Chen et.al. | 2503.21172 | translate | read | null |
| 2025-03-26 | High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching | Guoqiang Zhang et.al. | 2503.20744 | translate | read | null |
| 2025-03-26 | RecTable: Fast Modeling Tabular Data with Rectified Flow | Masane Fuchi et.al. | 2503.20731 | translate | read | link |
| 2025-03-26 | BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation | Yuyang Peng et.al. | 2503.20672 | translate | read | link |
| 2025-03-26 | MMGen: Unified Multi-modal Image Generation and Understanding in One Go | Jiepeng Wang et.al. | 2503.20644 | translate | read | null |
| 2025-03-26 | Pluggable Style Representation Learning for Multi-Style Transfer | Hongda Liu et.al. | 2503.20368 | translate | read | link |
| 2025-03-26 | Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models | Alex Jinpeng Wang et.al. | 2503.20198 | translate | read | null |
| 2025-03-26 | AvatarArtist: Open-Domain 4D Avatarization | Hongyu Liu et.al. | 2503.19906 | translate | read | link |
| 2025-03-25 | Scaling Down Text Encoders of Text-to-Image Diffusion Models | Lifu Wang et.al. | 2503.19897 | translate | read | null |
| 2025-03-26 | In the Blink of an Eye: Instant Game Map Editing using a Generative-AI Smart Brush | Vitaly Gnatyuk et.al. | 2503.19793 | translate | read | null |
| 2025-03-25 | SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation | Jingdan Kang et.al. | 2503.19791 | translate | read | null |
| 2025-03-25 | Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models | Kartik Thakral et.al. | 2503.19783 | translate | read | null |
| 2025-03-25 | PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models | Junhyuk So et.al. | 2503.19731 | translate | read | null |
| 2025-03-25 | VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models | Suhas G Hegde et.al. | 2503.19530 | translate | read | null |
| 2025-03-25 | Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage | Zhengwentai Sun et.al. | 2503.19486 | translate | read | null |
| 2025-03-25 | Interpretable Generative Models through Post-hoc Concept Bottlenecks | Akshay Kulkarni et.al. | 2503.19377 | translate | read | link |
| 2025-03-25 | Efficient Adversarial Detection Frameworks for Vehicle-to-Microgrid Services in Edge Computing | Ahmed Omara et.al. | 2503.19318 | translate | read | null |
| 2025-03-24 | Equivariant Image Modeling | Ruixiao Dong et.al. | 2503.18948 | translate | read | link |
| 2025-03-24 | Training-free Diffusion Acceleration with Bottleneck Sampling | Ye Tian et.al. | 2503.18940 | translate | read | link |
| 2025-03-24 | SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection | Shrikant Malviya et.al. | 2503.18812 | translate | read | null |
| 2025-03-24 | Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation | Qin Wang et.al. | 2503.18753 | translate | read | null |
| 2025-03-24 | Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings | Cong Liu et.al. | 2503.18719 | translate | read | null |
| 2025-03-24 | Adventurer: Exploration with BiGAN for Deep Reinforcement Learning | Yongshuai Liu et.al. | 2503.18612 | translate | read | null |
| 2025-03-24 | Anchor-based oversampling for imbalanced tabular data via contrastive and adversarial learning | Hadi Mohammadi et.al. | 2503.18569 | translate | read | null |
| 2025-03-24 | Advancing Cross-Organ Domain Generalization with Test-Time Style Transfer and Diversity Enhancement | Biwen Meng et.al. | 2503.18567 | translate | read | null |
| 2025-03-24 | Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Bin Li et.al. | 2503.18556 | translate | read | link |
| 2025-03-24 | PALATE: Peculiar Application of the Law of Total Expectation to Enhance the Evaluation of Deep Generative Models | Tadeusz Dziarmaga et.al. | 2503.18462 | translate | read | null |
| 2025-03-21 | Vision Transformer Based Semantic Communications for Next Generation Wireless Networks | Muhammad Ahmed Mohsin et.al. | 2503.17275 | translate | read | null |
| 2025-03-21 | Leveraging Text-to-Image Generation for Handling Spurious Correlation | Aryan Yazdan Parast et.al. | 2503.17226 | translate | read | null |
| 2025-03-21 | Generative adversarial framework to calibrate excursion set models for the 3D morphology of all-solid-state battery cathodes | Orkun Furat et.al. | 2503.17171 | translate | read | null |
| 2025-03-21 | D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens | Panpan Wang et.al. | 2503.17155 | translate | read | null |
| 2025-03-21 | HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks | Ekaterina Dmitrieva et.al. | 2503.17141 | translate | read | null |
| 2025-03-21 | Halton Scheduler For Masked Generative Image Transformer | Victor Besnier et.al. | 2503.17076 | translate | read | link |
| 2025-03-21 | Zero-Shot Styled Text Image Generation, but Make It Autoregressive | Vittorio Pippi et.al. | 2503.17074 | translate | read | link |
| 2025-03-21 | DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech | Yongkang Cheng et.al. | 2503.17059 | translate | read | null |
| 2025-03-21 | Multiple Ultrasound Image Generation based on Tuned Alignment of Amplitude Hologram over Spatially non-Uniform Ultrasound Source | Keisuke Hasegawa et.al. | 2503.16949 | translate | read | null |
| 2025-03-21 | When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO | Lingfan Zhang et.al. | 2503.16921 | translate | read | null |
| 2025-03-20 | Tokenize Image as a Set | Zigang Geng et.al. | 2503.16425 | translate | read | link |
| 2025-03-20 | SynCity: Training-Free Generation of 3D Worlds | Paul Engstler et.al. | 2503.16420 | translate | read | link |
| 2025-03-20 | InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Liming Jiang et.al. | 2503.16418 | translate | read | link |
| 2025-03-20 | VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness | SeungJu Cha et.al. | 2503.16406 | translate | read | null |
| 2025-03-20 | LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Leyang Wang et.al. | 2503.16376 | translate | read | null |
| 2025-03-20 | Ultra-Resolution Adaptation with Ease | Ruonan Yu et.al. | 2503.16322 | translate | read | link |
| 2025-03-20 | Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction | Ziyao Guo et.al. | 2503.16194 | translate | read | link |
| 2025-03-20 | FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing | Tianyi Wei et.al. | 2503.16153 | translate | read | null |
| 2025-03-20 | Multi-Prompt Style Interpolation for Fine-Grained Artistic Control | Lei Chen et.al. | 2503.16133 | translate | read | null |
| 2025-03-20 | Controllable Segmentation-Based Text-Guided Style Editing | Jingwen Li et.al. | 2503.16129 | translate | read | null |
| 2025-03-19 | FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers | Ruichen Chen et.al. | 2503.15465 | translate | read | null |
| 2025-03-19 | Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator | Yuanzhi Zhu et.al. | 2503.15457 | translate | read | null |
| 2025-03-19 | Visual Persona: Foundation Model for Full-Body Human Customization | Jisu Nam et.al. | 2503.15406 | translate | read | null |
| 2025-03-19 | TruthLens: A Training-Free Paradigm for DeepFake Detection | Ritabrata Chakraborty et.al. | 2503.15342 | translate | read | null |
| 2025-03-19 | TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models | Teng-Fang Hsiao et.al. | 2503.15283 | translate | read | null |
| 2025-03-19 | LEGION: Learning to Ground and Explain for Synthetic Image Detection | Hengrui Kang et.al. | 2503.15264 | translate | read | link |
| 2025-03-19 | Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization | Feifei Li et.al. | 2503.15197 | translate | read | null |
| 2025-03-19 | Volumetric Reconstruction From Partial Views for Task-Oriented Grasping | Fujian Yan et.al. | 2503.15167 | translate | read | null |
| 2025-03-19 | Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis | Imanol G. Estepa et.al. | 2503.15060 | translate | read | null |
| 2025-03-19 | Texture-Aware StarGAN for CT data harmonisation | Francesco Di Feola et.al. | 2503.15058 | translate | read | null |
| 2025-03-18 | Deeply Supervised Flow-Based Generative Models | Inkyu Shin et.al. | 2503.14494 | translate | read | null |
| 2025-03-18 | DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers | Minglei Shi et.al. | 2503.14487 | translate | read | null |
| 2025-03-18 | ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing | Yulin Pan et.al. | 2503.14482 | translate | read | null |
| 2025-03-18 | RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment | Chao Wang et.al. | 2503.14358 | translate | read | null |
| 2025-03-18 | Free-Lunch Color-Texture Disentanglement for Stylized Image Generation | Jiang Qin et.al. | 2503.14275 | translate | read | null |
| 2025-03-18 | DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection | Jaewoo Song et.al. | 2503.13985 | translate | read | null |
| 2025-03-18 | SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model | Xinqing Li et.al. | 2503.13952 | translate | read | null |
| 2025-03-18 | Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection | Chunlei Li et.al. | 2503.13828 | translate | read | null |
| 2025-03-18 | VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences | Anukriti Singh et.al. | 2503.13817 | translate | read | null |
| 2025-03-17 | Unified Autoregressive Visual Generation and Understanding with Continuous Tokens | Lijie Fan et.al. | 2503.13436 | translate | read | null |
| 2025-03-17 | BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing | Yaowei Li et.al. | 2503.13434 | translate | read | link |
| 2025-03-17 | RainScaleGAN: a Conditional Generative Adversarial Network for Rainfall Downscaling | Marcello Iotti et.al. | 2503.13316 | translate | read | null |
| 2025-03-17 | MAME: Multidimensional Adaptive Metamer Exploration with Human Perceptual Feedback | Mina Kamao et.al. | 2503.13212 | translate | read | null |
| 2025-03-17 | MedLoRD: A Medical Low-Resource Diffusion Model for High-Resolution 3D CT Image Synthesis | Marvin Seyfarth et.al. | 2503.13211 | translate | read | null |
| 2025-03-17 | Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation | Yihong Luo et.al. | 2503.13070 | translate | read | link |
| 2025-03-17 | FNSE-SBGAN: Far-field Speech Enhancement with Schrodinger Bridge and Generative Adversarial Networks | Tong Lei et.al. | 2503.12936 | translate | read | null |
| 2025-03-17 | DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models | Dewei Zhou et.al. | 2503.12885 | translate | read | link |
| 2025-03-17 | Optimizing Ansatz Design in Quantum Generative Adversarial Networks Using Large Language Models | Kento Ueda et.al. | 2503.12884 | translate | read | null |
| 2025-03-17 | High-Resolution Range-Doppler Imaging from One-Bit PMCW Radar via Generative Adversarial Networks | Jingxian Wang et.al. | 2503.12841 | translate | read | null |
| 2025-03-14 | T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Seyed Mohammad Hadi Hosseini et.al. | 2503.11481 | translate | read | link |
| 2025-03-14 | MTV-Inpaint: Multi-Task Long Video Inpainting | Shiyuan Yang et.al. | 2503.11412 | translate | read | link |
| 2025-03-14 | Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking | Ziyi Wang et.al. | 2503.11324 | translate | read | null |
| 2025-03-14 | Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards | Zijing Hu et.al. | 2503.11240 | translate | read | link |
| 2025-03-14 | Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption | Du Chen et.al. | 2503.11221 | translate | read | link |
| 2025-03-14 | Simulating Dual-Pixel Images From Ray Tracing For Depth Estimation | Fengchen He et.al. | 2503.11213 | translate | read | null |
| 2025-03-14 | Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models | Shree Singhi et.al. | 2503.11195 | translate | read | null |
| 2025-03-14 | Direction-Aware Diagonal Autoregressive Image Generation | Yijia Xu et.al. | 2503.11129 | translate | read | null |
| 2025-03-14 | Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models | Zhenguang Liu et.al. | 2503.11071 | translate | read | null |
| 2025-03-14 | Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization | Kyle Sargent et.al. | 2503.11056 | translate | read | null |
| 2025-03-13 | GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing | Rongyao Fang et.al. | 2503.10639 | translate | read | link |
| 2025-03-13 | DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation | Chen Chen et.al. | 2503.10618 | translate | read | null |
| 2025-03-13 | ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer | Bolin Chen et.al. | 2503.10614 | translate | read | link |
| 2025-03-13 | Autoregressive Image Generation with Randomized Parallel Decoding | Haopeng Li et.al. | 2503.10568 | translate | read | link |
| 2025-03-13 | RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation | Yuwen Du et.al. | 2503.10410 | translate | read | link |
| 2025-03-13 | RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models | Yijing Lin et.al. | 2503.10406 | translate | read | link |
| 2025-03-13 | ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation | Zirun Guo et.al. | 2503.10358 | translate | read | null |
| 2025-03-13 | Do I look like a cat.n.01 to you? A Taxonomy Image Generation Benchmark | Viktor Moskvoretskii et.al. | 2503.10357 | translate | read | null |
| 2025-03-13 | MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment | Hao Zhou et.al. | 2503.10287 | translate | read | null |
| 2025-03-13 | PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models | Runze He et.al. | 2503.10127 | translate | read | null |
| 2025-03-12 | FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model | Jiahao Xia et.al. | 2503.09560 | translate | read | null |
| 2025-03-12 | DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction | Junjie Zhou et.al. | 2503.09491 | translate | read | null |
| 2025-03-12 | PromptMap: An Alternative Interaction Style for AI-Based Image Generation | Krzysztof Adamkiewicz et.al. | 2503.09436 | translate | read | null |
| 2025-03-12 | LHC Triggers using FPGA Image Recognition | James Brooke et.al. | 2503.09428 | translate | read | null |
| 2025-03-12 | Revealing the Implicit Noise-based Imprint of Generative Models | Xinghan Li et.al. | 2503.09314 | translate | read | null |
| 2025-03-12 | Revealing Unintentional Information Leakage in Low-Dimensional Facial Portrait Representations | Kathleen Anderson et.al. | 2503.09306 | translate | read | null |
| 2025-03-12 | UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer | Haoxuan Wang et.al. | 2503.09277 | translate | read | null |
| 2025-03-12 | NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers | Yuhang Ma et.al. | 2503.09242 | translate | read | null |
| 2025-03-12 | Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets | Hannah Kniesel et.al. | 2503.09221 | translate | read | null |
| 2025-03-12 | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | Hao Feng et.al. | 2503.09160 | translate | read | null |
| 2025-03-11 | GarmentCrafter: Progressive Novel View Synthesis for Single-View 3D Garment Reconstruction and Editing | Yuanhao Wang et.al. | 2503.08678 | translate | read | null |
| 2025-03-11 | Generating Robot Constitutions & Benchmarks for Semantic Safety | Pierre Sermanet et.al. | 2503.08663 | translate | read | link |
| 2025-03-11 | YuE: Scaling Open Foundation Models for Long-Form Music Generation | Ruibin Yuan et.al. | 2503.08638 | translate | read | link |
| 2025-03-11 | LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Xianfeng Wu et.al. | 2503.08619 | translate | read | link |
| 2025-03-11 | CellStyle: Improved Zero-Shot Cell Segmentation via Style Transfer | Rüveyda Yilmaz et.al. | 2503.08603 | translate | read | null |
| 2025-03-11 | DISTINGUISH Workflow: A New Paradigm of Dynamic Well Placement Using Generative Machine Learning | Sergey Alyaev et.al. | 2503.08509 | translate | read | null |
| 2025-03-11 | Generalizable AI-Generated Image Detection Based on Fractal Self-Similarity in the Spectrum | Shengpeng Xiao et.al. | 2503.08484 | translate | read | null |
| 2025-03-11 | GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields | Nhat Phuong Anh Vu et.al. | 2503.08483 | translate | read | null |
| 2025-03-11 | DyArtbank: Diverse Artistic Style Transfer via Pre-trained Stable Diffusion and Dynamic Style Prompt Artbank | Zhanjie Zhang et.al. | 2503.08392 | translate | read | null |
| 2025-03-11 | Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens | Qingsong Xie et.al. | 2503.08377 | translate | read | link |
| 2025-03-10 | V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation | Guiwei Zhang et.al. | 2503.07493 | translate | read | null |
| 2025-03-10 | NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface | Chengrui Zhu et.al. | 2503.07491 | translate | read | null |
| 2025-03-10 | GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models | Ryugo Morita et.al. | 2503.07463 | translate | read | null |
| 2025-03-10 | PersonaBooth: Personalized Text-to-Motion Generation | Boeun Kim et.al. | 2503.07390 | translate | read | null |
| 2025-03-10 | TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models | Ruidong Chen et.al. | 2503.07389 | translate | read | link |
| 2025-03-10 | Inversion-Free Video Style Transfer with Trajectory Reset Attention Control and Content-Style Bridging | Jiang Lin et.al. | 2503.07363 | translate | read | null |
| 2025-03-10 | Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment | Xing Xie et.al. | 2503.07334 | translate | read | link |
| 2025-03-10 | AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models | Bo Huang et.al. | 2503.07307 | translate | read | link |
| 2025-03-10 | WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation | Yuwei Niu et.al. | 2503.07265 | translate | read | link |
| 2025-03-10 | Synthetic Lung X-ray Generation through Cross-Attention and Affinity Transformation | Ruochen Pi et.al. | 2503.07209 | translate | read | null |
| 2025-03-10 | Effective and Efficient Masked Image Generation Models | Zebin You et.al. | 2503.07197 | translate | read | link |
| 2025-03-10 | MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction | Hung Q. Vo et.al. | 2503.07157 | translate | read | null |
| 2025-03-07 | VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Yuxuan Bian et.al. | 2503.05639 | translate | read | link |
| 2025-03-07 | Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models | Zheng Li et.al. | 2503.05595 | translate | read | null |
| 2025-03-07 | PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations? | Martin Spitznagel et.al. | 2503.05333 | translate | read | link |
| 2025-03-07 | Frequency Autoregressive Image Generation with Continuous Tokens | Hu Yu et.al. | 2503.05305 | translate | read | null |
| 2025-03-07 | Unified Reward Model for Multimodal Understanding and Generation | Yibin Wang et.al. | 2503.05236 | translate | read | link |
| 2025-03-07 | RecipeGen: A Benchmark for Real-World Recipe Image Generation | Ruoxuan Zhang et.al. | 2503.05228 | translate | read | null |
| 2025-03-07 | Development and Enhancement of Text-to-Image Diffusion Models | Rajdeep Roshan Sahu et.al. | 2503.05149 | translate | read | null |
| 2025-03-07 | Accelerated Patient-specific Non-Cartesian MRI Reconstruction using Implicit Neural Representations | Di Xu et.al. | 2503.05051 | translate | read | null |
| 2025-03-06 | Quantum generative adversarial networks for gluon initiated jets generation | Rey Guadarrama et.al. | 2503.05044 | translate | read | null |
| 2025-03-06 | Iris Style Transfer: Enhancing Iris Recognition with Style Features and Privacy Preservation through Neural Style Transfer | Mengdi Wang et.al. | 2503.04707 | translate | read | null |
| 2025-03-06 | Gradient-descent methods for fast quantum state tomography | Akshay Gaikwad et.al. | 2503.04526 | translate | read | null |
| 2025-03-06 | IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement | Zhihao Shi et.al. | 2503.04501 | translate | read | null |
| 2025-03-06 | ObjMST: An Object-Focused Multimodal Style Transfer Framework | Chanda Grover Kamra et.al. | 2503.04353 | translate | read | null |
| 2025-03-06 | S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting | Yecong Wan et.al. | 2503.04314 | translate | read | null |
| 2025-03-06 | ControlFill: Spatially Adjustable Image Inpainting from Prompt Learning | Boseong Jeon et.al. | 2503.04268 | translate | read | null |
| 2025-03-06 | Synthetic Data is an Elegant GIFT for Continual Vision-Language Models | Bin Wu et.al. | 2503.04229 | translate | read | null |
| 2025-03-06 | Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models | Rui Jiang et.al. | 2503.04215 | translate | read | null |
| 2025-03-06 | SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer | Chunnan Shang et.al. | 2503.04119 | translate | read | null |
| 2025-03-06 | Underlying Semantic Diffusion for Effective and Efficient In-Context Learning | Zhong Ji et.al. | 2503.04050 | translate | read | null |
| 2025-03-05 | A Generative Approach to High Fidelity 3D Reconstruction from Text Data | Venkat Kumar R et.al. | 2503.03664 | translate | read | null |
| 2025-03-05 | Generative Artificial Intelligence in Robotic Manipulation: A Survey | Kun Zhang et.al. | 2503.03464 | translate | read | null |
| 2025-03-05 | GenColor: Generative Color-Concept Association in Visual Design | Yihan Hou et.al. | 2503.03236 | translate | read | null |
| 2025-03-05 | An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models | Binxu Wang et.al. | 2503.03206 | translate | read | null |
| 2025-03-05 | Find Matching Faces Based On Face Parameters | Setu A. Bhatt et.al. | 2503.03204 | translate | read | null |
| 2025-03-05 | From Architectural Sketch to Conceptual Representation: Using Structure-Aware Diffusion Model to Generate Renderings of School Buildings | Zhengyang Wang et.al. | 2503.03090 | translate | read | null |
| 2025-03-05 | Multi-View Depth Consistent Image Generation Using Generative AI Models: Application on Architectural Design of University Buildings | Xusheng Du et.al. | 2503.03068 | translate | read | null |
| 2025-03-04 | Can Diffusion Models Provide Rigorous Uncertainty Quantification for Bayesian Inverse Problems? | Evan Scope Crafts et.al. | 2503.03007 | translate | read | link |
| 2025-03-04 | Robust time series generation via Schrödinger Bridge: a comprehensive evaluation | Alexandre Alouadi et.al. | 2503.02943 | translate | read | null |
| 2025-03-04 | ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models | Qinyu Zhao et.al. | 2503.02883 | translate | read | link |
| 2025-03-04 | Large-Angle Convergent-Beam Electron Diffraction Patterns via Conditional Generative Adversarial Networks | Joseph J. Webb et.al. | 2503.02852 | translate | read | null |
| 2025-03-04 | Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts | Marta Skreta et.al. | 2503.02819 | translate | read | link |
| 2025-03-04 | Undertrained Image Reconstruction for Realistic Degradation in Blind Image Super-Resolution | Ru Ito et.al. | 2503.02767 | translate | read | null |
| 2025-03-04 | Generative Modeling of Microweather Wind Velocities for Urban Air Mobility | Tristan A. Shah et.al. | 2503.02690 | translate | read | null |
| 2025-03-04 | YARE-GAN: Yet Another Resting State EEG-GAN | Yeganeh Farahzadi et.al. | 2503.02636 | translate | read | null |
| 2025-03-04 | SPG: Improving Motion Diffusion by Smooth Perturbation Guidance | Boseong Jeon et.al. | 2503.02577 | translate | read | null |
| 2025-03-04 | PVTree: Realistic and Controllable Palm Vein Generation for Recognition Tasks | Sheng Shang et.al. | 2503.02547 | translate | read | null |
| 2025-03-04 | RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification | Zhen Yang et.al. | 2503.02537 | translate | read | link |
| 2025-03-04 | Q&C: When Quantization Meets Cache in Efficient Image Generation | Xin Ding et.al. | 2503.02508 | translate | read | null |
| 2025-03-03 | MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing | Xueyun Tian et.al. | 2502.21291 | translate | read | link |
(<a href=../Image_Generation.md>back to Image Generation</a>)