Image Generation - 2025-07
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2025-07-23 | Flow Matching Meets Biology and Life Science: A Survey | Zihao Li et.al. | 2507.17731 | translate | read | null |
| 2025-07-23 | Generalized Dual Discriminator GANs | Penukonda Naga Chandana et.al. | 2507.17684 | translate | read | null |
| 2025-07-23 | Attention (as Discrete-Time Markov) Chains | Yotam Erel et.al. | 2507.17657 | translate | read | null |
| 2025-07-23 | Dual-branch Prompting for Multimodal Machine Translation | Jie Wang et.al. | 2507.17588 | translate | read | null |
| 2025-07-23 | An h-space Based Adversarial Attack for Protection Against Few-shot Personalization | Xide Xu et.al. | 2507.17554 | translate | read | link |
| 2025-07-23 | Unsupervised anomaly detection using Bayesian flow networks: application to brain FDG PET in the context of Alzheimer’s disease | Hugues Roy et.al. | 2507.17486 | translate | read | null |
| 2025-07-23 | EndoGen: Conditional Autoregressive Endoscopic Video Generation | Xinyu Liu et.al. | 2507.17388 | translate | read | null |
| 2025-07-23 | PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image | Hyeongjin Nam et.al. | 2507.17332 | translate | read | null |
| 2025-07-23 | PolarAnything: Diffusion-based Polarimetric Image Synthesis | Kailong Zhang et.al. | 2507.17268 | translate | read | null |
| 2025-07-22 | Bringing Balance to Hand Shape Classification: Mitigating Data Imbalance Through Generative Models | Gaston Gustavo Rios et.al. | 2507.17008 | translate | read | null |
| 2025-07-22 | HarmonPaint: Harmonized Training-Free Diffusion Inpainting | Ying Li et.al. | 2507.16732 | translate | read | null |
| 2025-07-22 | Pyramid Hierarchical Masked Diffusion Model for Imaging Synthesis | Xiaojiao Xiao et.al. | 2507.16579 | translate | read | null |
| 2025-07-22 | Learning Text Styles: A Study on Transfer, Attribution, and Verification | Zhiqiang Hu et.al. | 2507.16530 | translate | read | null |
| 2025-07-22 | DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling | Boheng Li et.al. | 2507.16329 | translate | read | null |
| 2025-07-22 | Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning | Boheng Li et.al. | 2507.16302 | translate | read | null |
| 2025-07-22 | Edge-case Synthesis for Fisheye Object Detection: A Data-centric Perspective | Seunghyeon Kim et.al. | 2507.16254 | translate | read | null |
| 2025-07-22 | Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling | Chao Zhou et.al. | 2507.16240 | translate | read | null |
| 2025-07-22 | A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology | Katelyn Morrison et.al. | 2507.16207 | translate | read | null |
| 2025-07-22 | LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image Generation | Jyun-Ze Tang et.al. | 2507.16154 | translate | read | null |
| 2025-07-21 | Improving Personalized Image Generation through Social Context Feedback | Parul Gupta et.al. | 2507.16095 | translate | read | null |
| 2025-07-21 | Diffusion models for multivariate subsurface generation and efficient probabilistic inversion | Roberto Miele et.al. | 2507.15809 | translate | read | null |
| 2025-07-21 | Toward an event-level analysis of hadron structure using differential programming | Kevin Braga et.al. | 2507.15768 | translate | read | null |
| 2025-07-21 | A Practical Investigation of Spatially-Controlled Image Generation with Transformers | Guoxuan Xia et.al. | 2507.15724 | translate | read | null |
| 2025-07-21 | SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models | Giordano d’Aloisio et.al. | 2507.15663 | translate | read | null |
| 2025-07-21 | CylinderPlane: Nested Cylinder Representation for 3D-aware Image Generation | Ru Jia et.al. | 2507.15606 | translate | read | null |
| 2025-07-21 | Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification | Vitaly Protasov et.al. | 2507.15557 | translate | read | null |
| 2025-07-21 | Improving Joint Embedding Predictive Architecture with Diffusion Noise | Yuping Qiu et.al. | 2507.15216 | translate | read | null |
| 2025-07-20 | Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR | Peirong Zhang et.al. | 2507.15085 | translate | read | null |
| 2025-07-20 | Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook | Xin Yang et.al. | 2507.15026 | translate | read | null |
| 2025-07-20 | Paired Image Generation with Diffusion-Guided Diffusion Models | Haoxuan Zhang et.al. | 2507.14833 | translate | read | null |
| 2025-07-18 | MoDyGAN: Combining Molecular Dynamics With GANs to Investigate Protein Conformational Space | Jingbo Liang et.al. | 2507.13950 | translate | read | null |
| 2025-07-18 | Converting T1-weighted MRI from 3T to 7T quality using deep learning | Malo Gicquel et.al. | 2507.13782 | translate | read | null |
| 2025-07-18 | Tackling fake images in cybersecurity – Interpretation of a StyleGAN and lifting its black-box | Julia Laubmann et.al. | 2507.13722 | translate | read | null |
| 2025-07-18 | PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement | Sofia Jamil et.al. | 2507.13708 | translate | read | null |
| 2025-07-18 | TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting | Kaiyuan Tang et.al. | 2507.13586 | translate | read | null |
| 2025-07-17 | FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization | Chuancheng Shi et.al. | 2507.13311 | translate | read | null |
| 2025-07-17 | Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection | Hongyang Zhao et.al. | 2507.13221 | translate | read | null |
| 2025-07-17 | SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks | Kutub Uddin et.al. | 2507.13170 | translate | read | null |
| 2025-07-17 | Multi-population GAN Training: Analyzing Co-Evolutionary Algorithms | Walter P. Casas et.al. | 2507.13157 | translate | read | null |
| 2025-07-17 | fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting | Alicia Durrer et.al. | 2507.13146 | translate | read | null |
| 2025-07-17 | Adversarial attacks to image classification systems using evolutionary algorithms | Sergio Nesmachnow et.al. | 2507.13136 | translate | read | null |
| 2025-07-17 | Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation | Yi Xin et.al. | 2507.13032 | translate | read | link |
| 2025-07-17 | A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints | Youssef Tawfilis et.al. | 2507.12979 | translate | read | null |
| 2025-07-17 | DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization | Dongyeun Lee et.al. | 2507.12933 | translate | read | null |
| 2025-07-17 | Local Representative Token Guided Merging for Text-to-Image Generation | Min-Jeong Lee et.al. | 2507.12771 | translate | read | null |
| 2025-07-16 | Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models | Samuel Lavoie et.al. | 2507.12318 | translate | read | null |
| 2025-07-16 | FADE: Adversarial Concept Erasure in Flow Models | Zixuan Fu et.al. | 2507.12283 | translate | read | null |
| 2025-07-16 | DeepShade: Enable Shade Simulation by Text-conditioned Image Generation | Longchao Da et.al. | 2507.12103 | translate | read | null |
| 2025-07-16 | FloGAN: Scenario-Based Urban Mobility Flow Generation via Conditional GANs and Dynamic Region Decoupling | Seanglidet Yean et.al. | 2507.12053 | translate | read | null |
| 2025-07-16 | ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation | Hyun-Jun Jin et.al. | 2507.11990 | translate | read | null |
| 2025-07-16 | RaDL: Relation-aware Disentangled Learning for Multi-Instance Text-to-Image Generation | Geon Park et.al. | 2507.11947 | translate | read | null |
| 2025-07-16 | Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement | Shuichiro Nishigori et.al. | 2507.11925 | translate | read | null |
| 2025-07-16 | A Multimodal Data Fusion Generative Adversarial Network for Real Time Underwater Sound Speed Field Construction | Wei Huang et.al. | 2507.11812 | translate | read | null |
| 2025-07-15 | Deep Generative Methods and Tire Architecture Design | Fouad Oubari et.al. | 2507.11639 | translate | read | null |
| 2025-07-15 | CharaConsist: Fine-Grained Consistent Character Generation | Mengyu Wang et.al. | 2507.11533 | translate | read | link |
| 2025-07-15 | CATVis: Context-Aware Thought Visualization | Tariq Mehmood et.al. | 2507.11522 | translate | read | null |
| 2025-07-15 | Implementing Adaptations for Vision AutoRegressive Model | Kaif Shaikh et.al. | 2507.11441 | translate | read | link |
| 2025-07-15 | MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection | Guanghao Wu et.al. | 2507.11252 | translate | read | null |
| 2025-07-15 | Latent Space Consistency for Sparse-View CT Reconstruction | Duoyou Chen et.al. | 2507.11152 | translate | read | null |
| 2025-07-15 | Learning from Imperfect Data: Robust Inference of Dynamic Systems using Simulation-based Generative Model | Hyunwoo Cho et.al. | 2507.10884 | translate | read | null |
| 2025-07-14 | Spatial Reasoners for Continuous Variables in Any Domain | Bart Pogodzinski et.al. | 2507.10768 | translate | read | null |
| 2025-07-15 | Text Embedding Knows How to Quantize Text-Guided Diffusion Models | Hongjae Lee et.al. | 2507.10340 | translate | read | null |
| 2025-07-14 | Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks | Ben Hamscher et.al. | 2507.10239 | translate | read | null |
| 2025-07-14 | From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation | Jeongho Kim et.al. | 2507.10217 | translate | read | null |
| 2025-07-14 | Latent Diffusion Models with Masked AutoEncoders | Junho Lee et.al. | 2507.09984 | translate | read | link |
| 2025-07-14 | Counterfactual Visual Explanation via Causally-Guided Adversarial Steering | Yiran Qiao et.al. | 2507.09881 | translate | read | null |
| 2025-07-13 | AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) | Abdul Manaf et.al. | 2507.09759 | translate | read | null |
| 2025-07-13 | Hybrid Quantum-Classical Generative Adversarial Networks with Transfer Learning | Asma Al-Othni et.al. | 2507.09706 | translate | read | null |
| 2025-07-13 | Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI | Shomukh Qari et.al. | 2507.09630 | translate | read | null |
| 2025-07-13 | Demystifying Flux Architecture | Or Greenberg et.al. | 2507.09595 | translate | read | null |
| 2025-07-13 | MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models | Haozhe Zhao et.al. | 2507.09574 | translate | read | link |
| 2025-07-11 | Image Translation with Kernel Prediction Networks for Semantic Segmentation | Cristina Mata et.al. | 2507.08554 | translate | read | null |
| 2025-07-11 | Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation | Liu He et.al. | 2507.08513 | translate | read | null |
| 2025-07-11 | Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation | Anlin Zheng et.al. | 2507.08441 | translate | read | null |
| 2025-07-11 | RePaintGS: Reference-Guided Gaussian Splatting for Realistic and View-Consistent 3D Scene Inpainting | Ji Hyun Seo et.al. | 2507.08434 | translate | read | null |
| 2025-07-11 | Subject-Consistent and Pose-Diverse Text-to-Image Generation | Zhanxin Gao et.al. | 2507.08396 | translate | read | link |
| 2025-07-11 | From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning | Sen Wang et.al. | 2507.08380 | translate | read | null |
| 2025-07-11 | Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training | Aleksei Ilin et.al. | 2507.08284 | translate | read | null |
| 2025-07-11 | Single-Step Latent Diffusion for Underwater Image Restoration | Jiayi Wu et.al. | 2507.07878 | translate | read | null |
| 2025-07-10 | Assessing the Alignment of Audio Representations with Timbre Similarity Ratings | Haokun Tian et.al. | 2507.07764 | translate | read | null |
| 2025-07-10 | Degradation-Agnostic Statistical Facial Feature Transformation for Blind Face Restoration in Adverse Weather Conditions | Chang-Hwan Son et.al. | 2507.07464 | translate | read | null |
| 2025-07-10 | Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer | Zhimin Zhang et.al. | 2507.07394 | translate | read | null |
| 2025-07-10 | Digital Salon: An AI and Physics-Driven Tool for 3D Hair Grooming and Simulation | Chengan He et.al. | 2507.07387 | translate | read | null |
| 2025-07-09 | Scalable and Realistic Virtual Try-on Application for Foundation Makeup with Kubelka-Munk Theory | Hui Pang et.al. | 2507.07333 | translate | read | null |
| 2025-07-09 | Scale leads to compositional generalization | Florian Redhardt et.al. | 2507.07207 | translate | read | null |
| 2025-07-09 | Interpretable EEG-to-Image Generation with Semantic Prompts | Arshak Rezvani et.al. | 2507.07157 | translate | read | null |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et.al. | 2507.07079 | translate | read | null |
| 2025-07-10 | Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting | Fei Teng et.al. | 2507.06971 | translate | read | null |
| 2025-07-09 | Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution | Yonghyun Park et.al. | 2507.06547 | translate | read | null |
| 2025-07-10 | Concept Unlearning by Modeling Key Steps of Diffusion Process | Chaoshuo Zhang et.al. | 2507.06526 | translate | read | null |
| 2025-07-08 | FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models | Qianyu Long et.al. | 2507.06449 | translate | read | null |
| 2025-07-08 | Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques | Yassin Hussein Rassul et.al. | 2507.06275 | translate | read | null |
| 2025-07-08 | NeoBabel: A Multilingual Open Tower for Visual Generation | Mohammad Mahdi Derakhshani et.al. | 2507.06137 | translate | read | link |
| 2025-07-08 | CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Xiaohu Li et.al. | 2507.06043 | translate | read | null |
| 2025-07-08 | TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision | Syeda Anshrah Gillani et.al. | 2507.06033 | translate | read | null |
| 2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970 | translate | read | null |
| 2025-07-08 | DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation | Young Hun Kim et.al. | 2507.05627 | translate | read | null |
| 2025-07-08 | AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework | Suoxiang Zhang et.al. | 2507.05621 | translate | read | null |
| 2025-07-08 | Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration | Yuyang Hu et.al. | 2507.05604 | translate | read | null |
| 2025-07-08 | Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization | Yuhang Li et.al. | 2507.05583 | translate | read | null |
| 2025-07-08 | SingLoRA: Low Rank Adaptation Using a Single Matrix | David Bensaïd et.al. | 2507.05566 | translate | read | link |
| 2025-07-07 | LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving | Giulio Federico et.al. | 2507.05499 | translate | read | null |
| 2025-07-07 | SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model | Chun Xie et.al. | 2507.05148 | translate | read | link |
| 2025-07-07 | VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems | Aadi Srivastava et.al. | 2507.05146 | translate | read | link |
| 2025-07-07 | ICAS: Detecting Training Data from Autoregressive Image Generative Models | Hongyao Yu et.al. | 2507.05068 | translate | read | null |
| 2025-07-07 | AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics | Jan Carreras Boada et.al. | 2507.05063 | translate | read | link |
| 2025-07-07 | Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning | Ricardo Cardoso et.al. | 2507.05029 | translate | read | null |
| 2025-07-07 | DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer | Yecheng Wu et.al. | 2507.04947 | translate | read | null |
| 2025-07-07 | Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation | Jianjiang Yang et.al. | 2507.04946 | translate | read | null |
| 2025-07-07 | Leveraging Self-Supervised Features for Efficient Flooded Region Identification in UAV Aerial Images | Dibyabha Deb et.al. | 2507.04915 | translate | read | null |
| 2025-07-07 | When do World Models Successfully Learn Dynamical Systems? | Edmund Ross et.al. | 2507.04898 | translate | read | null |
| 2025-07-07 | Efficacy of Image Similarity as a Metric for Augmenting Small Dataset Retinal Image Segmentation | Thomas Wallace et.al. | 2507.04862 | translate | read | null |
| 2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et.al. | 2507.02857 | translate | read | link |
| 2025-07-03 | RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation | Liheng Zhang et.al. | 2507.02792 | translate | read | null |
| 2025-07-03 | FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models | Yuxuan Wang et.al. | 2507.02714 | translate | read | null |
| 2025-07-03 | UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation | Qin Guo et.al. | 2507.02713 | translate | read | null |
| 2025-07-03 | AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models | Chenhao Xue et.al. | 2507.02598 | translate | read | null |
| 2025-07-03 | Holistic Tokenizer for Autoregressive Image Generation | Anlin Zheng et.al. | 2507.02358 | translate | read | link |
| 2025-07-03 | Transformer-based EEG Decoding: A Survey | Haodong Zhang et.al. | 2507.02320 | translate | read | null |
| 2025-07-02 | Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation | Zhuoyang Zhang et.al. | 2507.01957 | translate | read | null |
| 2025-07-02 | How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks | Rahul Ramachandran et.al. | 2507.01955 | translate | read | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908 | translate | read | null |
| 2025-07-02 | Improving GANs by leveraging the quantum noise from real hardware | Hongni Jin et.al. | 2507.01886 | translate | read | null |
| 2025-07-02 | FreeLoRA: Enabling Training-Free LoRA Fusion for Autoregressive Multi-Subject Personalization | Peng Zheng et.al. | 2507.01792 | translate | read | null |
| 2025-07-02 | Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis | Peng Zheng et.al. | 2507.01756 | translate | read | null |
| 2025-07-02 | Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective | Yuxin Mao et.al. | 2507.01652 | translate | read | null |
| 2025-07-02 | Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think | Ge Wu et.al. | 2507.01467 | translate | read | null |
| 2025-07-02 | DiffMark: Diffusion-based Robust Watermark Against Deepfakes | Chen Sun et.al. | 2507.01428 | translate | read | null |
| 2025-07-02 | BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy | Ahmad Soliman et.al. | 2507.01387 | translate | read | null |
(<a href=../Image_Generation.md>back to Image Generation</a>)