Image Generation - 2024-12
| Publish Date | Title | Authors | arXiv ID | Translate | Read | Code |
|---|---|---|---|---|---|---|
| 2024-12-30 | Quantum Diffusion Model for Quark and Gluon Jet Generation | Mariia Baidachna et.al. | 2412.21082 | translate | read | link |
| 2024-12-30 | Varformer: Adapting VAR’s Generative Prior for Image Restoration | Siyang Wang et.al. | 2412.21063 | translate | read | link |
| 2024-12-30 | Redesign Quantum Circuits on Quantum Hardware Device | Runhong He et.al. | 2412.20893 | translate | read | null |
| 2024-12-30 | Generative Deep Synthesis of MIMO Sensing Waveforms with Desired Transmit Beampattern | Vesa Saarinen et.al. | 2412.20883 | translate | read | null |
| 2024-12-30 | VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control | Shaojin Wu et.al. | 2412.20800 | translate | read | link |
| 2024-12-30 | HFI: A unified framework for training-free detection and implicit watermarking of latent diffusion model generated images | Sungik Choi et.al. | 2412.20704 | translate | read | null |
| 2024-12-30 | Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis | Yousef Yeganeh et.al. | 2412.20651 | translate | read | null |
| 2024-12-29 | Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) | Tomer Garber et.al. | 2412.20596 | translate | read | null |
| 2024-12-29 | Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment | Shiyun Chen et.al. | 2412.20418 | translate | read | null |
| 2024-12-27 | StyleRWKV: High-Quality and High-Efficiency Style Transfer with RWKV-like Architecture | Miaomiao Dai et.al. | 2412.19535 | translate | read | null |
| 2024-12-27 | P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision | Junjie Hu et.al. | 2412.19533 | translate | read | null |
| 2024-12-27 | Estimation of System Parameters Including Repeated Cross-Sectional Data through Emulator-Informed Deep Generative Model | Hyunwoo Cho et.al. | 2412.19517 | translate | read | null |
| 2024-12-27 | Generative Adversarial Network on Motion-Blur Image Restoration | Zhengdong Li et.al. | 2412.19479 | translate | read | null |
| 2024-12-27 | Focusing Image Generation to Mitigate Spurious Correlations | Xuewei Li et.al. | 2412.19457 | translate | read | null |
| 2024-12-26 | Multi-Attribute Constraint Satisfaction via Language Model Rewriting | Ashutosh Baheti et.al. | 2412.19198 | translate | read | null |
| 2024-12-26 | Generating Editable Head Avatars with 3D Gaussian GANs | Guohao Li et.al. | 2412.19149 | translate | read | link |
| 2024-12-25 | MGAN-CRCM: A Novel Multiple Generative Adversarial Network and Coarse-Refinement Based Cognizant Method for Image Inpainting | Nafiz Al Asad et.al. | 2412.19000 | translate | read | null |
| 2024-12-25 | Single Trajectory Distillation for Accelerating Image and Video Style Transfer | Sijie Xu et.al. | 2412.18945 | translate | read | null |
| 2024-12-25 | UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation | Lunhao Duan et.al. | 2412.18928 | translate | read | null |
| 2024-12-24 | Efficient Aircraft Design Optimization Using Multi-Fidelity Models and Multi-fidelity Physics Informed Neural Networks | Apurba Sarker et.al. | 2412.18564 | translate | read | null |
| 2024-12-24 | Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models | Qice Qin et.al. | 2412.18421 | translate | read | null |
| 2024-12-24 | Extract Free Dense Misalignment from CLIP | JeongYeon Nam et.al. | 2412.18404 | translate | read | link |
| 2024-12-24 | RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction | Wu Xiaoping et.al. | 2412.18390 | translate | read | null |
| 2024-12-24 | Improved Feature Generating Framework for Transductive Zero-shot Learning | Zihan Ye et.al. | 2412.18282 | translate | read | null |
| 2024-12-24 | TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization | Yucong Luo et.al. | 2412.18185 | translate | read | null |
| 2024-12-24 | EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation | Shuhao Han et.al. | 2412.18150 | translate | read | null |
| 2024-12-24 | Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction | Xiao Guo et.al. | 2412.18149 | translate | read | null |
| 2024-12-24 | Ensuring Consistency for In-Image Translation | Chengpeng Fu et.al. | 2412.18139 | translate | read | null |
| 2024-12-24 | Beyond the Known: Enhancing Open Set Domain Adaptation with Unknown Exploration | Lucas Fernando Alvarenga e Silva et.al. | 2412.18105 | translate | read | link |
| 2024-12-23 | Personalized Large Vision-Language Models | Chau Pham et.al. | 2412.17610 | translate | read | null |
| 2024-12-23 | Discriminative Image Generation with Diffusion Models for Zero-Shot Learning | Dingjie Fu et.al. | 2412.17219 | translate | read | null |
| 2024-12-22 | Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching | Enshu Liu et.al. | 2412.17153 | translate | read | link |
| 2024-12-22 | Style Transfer Dataset: What Makes A Good Stylization? | Victor Kitov et.al. | 2412.17139 | translate | read | null |
| 2024-12-22 | Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images | Dennis Menn et.al. | 2412.17109 | translate | read | null |
| 2024-12-22 | DreamOmni: Unified Image Generation and Editing | Bin Xia et.al. | 2412.17098 | translate | read | null |
| 2024-12-22 | Modular Conversational Agents for Surveys and Interviews | Jiangbo Yu et.al. | 2412.17049 | translate | read | null |
| 2024-12-22 | HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories | Eric Hedlin et.al. | 2412.17040 | translate | read | null |
| 2024-12-22 | DTSGAN: Learning Dynamic Textures via Spatiotemporal Generative Adversarial Network | Xiangtian Li et.al. | 2412.16948 | translate | read | null |
| 2024-12-22 | Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation | Quan Dao et.al. | 2412.16906 | translate | read | link |
| 2024-12-20 | Personalized Representation from Personalized Generation | Shobhita Sundaram et.al. | 2412.16156 | translate | read | link |
| 2024-12-20 | NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems | Laura Weihl et.al. | 2412.16141 | translate | read | null |
| 2024-12-20 | CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Songhua Liu et.al. | 2412.16112 | translate | read | link |
| 2024-12-20 | SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation | Jiadong Pan et.al. | 2412.16039 | translate | read | null |
| 2024-12-20 | A Thorough Investigation into the Application of Deep CNN for Enhancing Natural Language Processing Capabilities | Chang Weng et.al. | 2412.15900 | translate | read | null |
| 2024-12-20 | Semi-Supervised Adaptation of Diffusion Models for Handwritten Text Generation | Kai Brandenbusch et.al. | 2412.15853 | translate | read | null |
| 2024-12-20 | Robustness-enhanced Myoelectric Control with GAN-based Open-set Recognition | Cheng Wang et.al. | 2412.15819 | translate | read | null |
| 2024-12-20 | PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium | Xinzhe Li et.al. | 2412.15674 | translate | read | link |
| 2024-12-20 | BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models | Yifei Sun et.al. | 2412.15670 | translate | read | link |
| 2024-12-20 | SemDP: Semantic-level Differential Privacy Protection for Face Datasets | Xiaoting Zhang et.al. | 2412.15590 | translate | read | null |
| 2024-12-19 | Flowing from Words to Pixels: A Framework for Cross-Modality Evolution | Qihao Liu et.al. | 2412.15213 | translate | read | null |
| 2024-12-19 | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching | Sucheng Ren et.al. | 2412.15205 | translate | read | link |
| 2024-12-19 | LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation | Weijia Shi et.al. | 2412.15188 | translate | read | null |
| 2024-12-19 | Tiled Diffusion | Or Madar et.al. | 2412.15185 | translate | read | null |
| 2024-12-19 | DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space | Mang Ning et.al. | 2412.15032 | translate | read | link |
| 2024-12-19 | Generative AI for Banks: Benchmarks and Algorithms for Synthetic Financial Transaction Data | Fabian Sven Karst et.al. | 2412.14730 | translate | read | link |
| 2024-12-19 | Qua$^2$SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models | Keith G. Mills et.al. | 2412.14628 | translate | read | null |
| 2024-12-19 | Dynamic User Interface Generation for Enhanced Human-Computer Interaction Using Variational Autoencoders | Runsheng Zhang et.al. | 2412.14521 | translate | read | null |
| 2024-12-19 | DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On | Wengyi Zhan et.al. | 2412.14465 | translate | read | null |
| 2024-12-19 | LEDiff: Latent Exposure Diffusion for HDR Generation | Chao Wang et.al. | 2412.14456 | translate | read | null |
| 2024-12-18 | E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling | Zhihang Yuan et.al. | 2412.14170 | translate | read | null |
| 2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169 | translate | read | link |
| 2024-12-18 | FashionComposer: Compositional Fashion Image Generation | Sihui Ji et.al. | 2412.14168 | translate | read | null |
| 2024-12-18 | VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Runtao Liu et.al. | 2412.14167 | translate | read | null |
| 2024-12-18 | Super-Resolution Generative Adversarial Network for Data Compression of Direct Numerical Simulations | Ludovico Nista et.al. | 2412.14150 | translate | read | null |
| 2024-12-18 | Text2Relight: Creative Portrait Relighting with Text Guidance | Junuk Cha et.al. | 2412.13734 | translate | read | null |
| 2024-12-18 | Diffusion models and stochastic quantisation in lattice field theory | Gert Aarts et.al. | 2412.13704 | translate | read | null |
| 2024-12-18 | MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing | Chuang Yang et.al. | 2412.13684 | translate | read | null |
| 2024-12-18 | Self-control: A Better Conditional Mechanism for Masked Autoregressive Model | Qiaoying Qu et.al. | 2412.13635 | translate | read | null |
| 2024-12-18 | Hybrid Data-Free Knowledge Distillation | Jialiang Tang et.al. | 2412.13525 | translate | read | link |
| 2024-12-17 | F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration | Lu Liu et.al. | 2412.13155 | translate | read | null |
| 2024-12-17 | Prompt Augmentation for Self-supervised Text-guided Image Manipulation | Rumeysa Bodur et.al. | 2412.13081 | translate | read | null |
| 2024-12-17 | 3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation | Haoshen Wang et.al. | 2412.13059 | translate | read | null |
| 2024-12-17 | A New Adversarial Perspective for LiDAR-based 3D Object Detection | Shijun Zheng et.al. | 2412.13017 | translate | read | null |
| 2024-12-17 | Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression | Ruijie Chen et.al. | 2412.12982 | translate | read | null |
| 2024-12-17 | Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance | Wenhao Sun et.al. | 2412.12974 | translate | read | link |
| 2024-12-17 | Unsupervised Region-Based Image Editing of Denoising Diffusion Models | Zixiang Li et.al. | 2412.12912 | translate | read | null |
| 2024-12-17 | ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction | Zhongjie Duan et.al. | 2412.12888 | translate | read | link |
| 2024-12-17 | Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data | Chengzhou Yu et.al. | 2412.12778 | translate | read | null |
| 2024-12-17 | Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation | Shoukun Sun et.al. | 2412.12771 | translate | read | null |
| 2024-12-16 | Causal Diffusion Transformers for Generative Modeling | Chaorui Deng et.al. | 2412.12095 | translate | read | link |
| 2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048 | translate | read | null |
| 2024-12-16 | Ensemble Learning and 3D Pix2Pix for Comprehensive Brain Tumor Analysis in Multimodal MRI | Ramy A. Zeineldin et.al. | 2412.11849 | translate | read | null |
| 2024-12-16 | Multilingual and Explainable Text Detoxification with Parallel Corpora | Daryna Dementieva et.al. | 2412.11691 | translate | read | link |
| 2024-12-16 | IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation | Yiren Song et.al. | 2412.11638 | translate | read | null |
| 2024-12-16 | 3D$^2$-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling | Zichen Tang et.al. | 2412.11599 | translate | read | link |
| 2024-12-16 | VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis | Zhipeng Chen et.al. | 2412.11594 | translate | read | link |
| 2024-12-16 | LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model | Xi Wang et.al. | 2412.11519 | translate | read | null |
| 2024-12-16 | FedCAR: Cross-client Adaptive Re-weighting for Generative Models in Federated Learning | Minjun Kim et.al. | 2412.11463 | translate | read | link |
| 2024-12-16 | Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models | Namhyuk Ahn et.al. | 2412.11423 | translate | read | null |
| 2024-12-13 | OP-LoRA: The Blessing of Dimensionality | Piotr Teterwak et.al. | 2412.10362 | translate | read | null |
| 2024-12-13 | BrushEdit: All-In-One Image Inpainting and Editing | Yaowei Li et.al. | 2412.10316 | translate | read | link |
| 2024-12-13 | Simple Guidance Mechanisms for Discrete Diffusion Models | Yair Schiff et.al. | 2412.10193 | translate | read | link |
| 2024-12-13 | FaceShield: Defending Facial Image against Deepfake Threats | Jaehwan Jeong et.al. | 2412.09921 | translate | read | null |
| 2024-12-13 | ProxyLLM: LLM-Driven Framework for Customer Support Through Text-Style Transfer | Sehyeong Jo et.al. | 2412.09916 | translate | read | null |
| 2024-12-13 | T-GMSI: A transformer-based generative model for spatial interpolation under sparse measurements | Xiangxi Tian et.al. | 2412.09886 | translate | read | null |
| 2024-12-13 | Financial Fine-tuning a Large Time Series Model | Xinghong Fu et.al. | 2412.09880 | translate | read | link |
| 2024-12-12 | Human vs. AI: A Novel Benchmark and a Comparative Study on the Detection of Generated Images and the Impact of Prompts | Philipp Moeßner et.al. | 2412.09715 | translate | read | link |
| 2024-12-12 | Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation | Chun-Mei Feng et.al. | 2412.09706 | translate | read | link |
| 2024-12-12 | LoRACLR: Contrastive Adaptation for Customization of Diffusion Models | Enis Simsar et.al. | 2412.09622 | translate | read | null |
| 2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618 | translate | read | null |
| 2024-12-12 | FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers | Yusuf Dalva et.al. | 2412.09611 | translate | read | null |
| 2024-12-12 | Spectral Image Tokenizer | Carlos Esteves et.al. | 2412.09607 | translate | read | null |
| 2024-12-12 | Are Conditional Latent Diffusion Models Effective for Image Restoration? | Yunchen Yuan et.al. | 2412.09324 | translate | read | null |
| 2024-12-12 | Transfer Learning of RSSI to Improve Indoor Localisation Performance | Thanaphon Suwannaphong et.al. | 2412.09292 | translate | read | link |
| 2024-12-12 | DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification | Kunlun Xu et.al. | 2412.09224 | translate | read | null |
| 2024-12-12 | RAD: Region-Aware Diffusion Models for Image Inpainting | Sora Kim et.al. | 2412.09191 | translate | read | null |
| 2024-12-12 | LVMark: Robust Watermark for latent video diffusion models | MinHyuk Jang et.al. | 2412.09122 | translate | read | null |
| 2024-12-12 | ViUniT: Visual Unit Tests for More Robust Visual Programming | Artemis Panagopoulou et.al. | 2412.08859 | translate | read | null |
| 2024-12-11 | Generative Semantic Communication: Architectures, Technologies, and Applications | Jinke Ren et.al. | 2412.08642 | translate | read | null |
| 2024-12-11 | Fast Prompt Alignment for Text-to-Image Generation | Khalil Mrini et.al. | 2412.08639 | translate | read | link |
| 2024-12-11 | Multimodal Latent Language Modeling with Next-Token Diffusion | Yutao Sun et.al. | 2412.08635 | translate | read | link |
| 2024-12-11 | LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations | Zejian Li et.al. | 2412.08580 | translate | read | link |
| 2024-12-11 | StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements | Mingkun Lei et.al. | 2412.08503 | translate | read | link |
| 2024-12-11 | Learning Flow Fields in Attention for Controllable Person Image Generation | Zijian Zhou et.al. | 2412.08486 | translate | read | link |
| 2024-12-11 | InvDiff: Invariant Guidance for Bias Mitigation in Diffusion Models | Min Hou et.al. | 2412.08480 | translate | read | link |
| 2024-12-11 | CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis | Mu Zhang et.al. | 2412.08464 | translate | read | null |
| 2024-12-11 | Analyzing and Improving Model Collapse in Rectified Flow Models | Huminhao Zhu et.al. | 2412.08175 | translate | read | null |
| 2024-12-11 | AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting | Zihao Han et.al. | 2412.08149 | translate | read | null |
| 2024-12-10 | UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Xi Chen et.al. | 2412.07774 | translate | read | null |
| 2024-12-10 | StyleMaster: Stylize Your Video with Artistic Generation and Translation | Zixuan Ye et.al. | 2412.07744 | translate | read | link |
| 2024-12-10 | FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models | Tong Wu et.al. | 2412.07674 | translate | read | link |
| 2024-12-10 | DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Jianzong Wu et.al. | 2412.07589 | translate | read | link |
| 2024-12-10 | StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization | Jinlu Zhang et.al. | 2412.07375 | translate | read | link |
| 2024-12-10 | Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model | Donghwna Lee et.al. | 2412.07333 | translate | read | link |
| 2024-12-10 | A Generative Victim Model for Segmentation | Aixuan Li et.al. | 2412.07274 | translate | read | null |
| 2024-12-10 | Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation | Xin Zhao et.al. | 2412.07249 | translate | read | null |
| 2024-12-10 | Moderating the Generalization of Score-based Generative Model | Wan Jiang et.al. | 2412.07229 | translate | read | null |
| 2024-12-10 | Fine-grained Text to Image Synthesis | Xu Ouyang et.al. | 2412.07196 | translate | read | null |
| 2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | translate | read | null |
| 2024-12-09 | Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty | Meera Hahn et.al. | 2412.06771 | translate | read | link |
| 2024-12-09 | ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet | Andrei-Robert Alexandrescu et.al. | 2412.06742 | translate | read | null |
| 2024-12-09 | EMOv2: Pushing 5M Vision Model Frontier | Jiangning Zhang et.al. | 2412.06674 | translate | read | link |
| 2024-12-09 | ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Chunwei Wang et.al. | 2412.06673 | translate | read | null |
| 2024-12-09 | Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion | Shuaiting Li et.al. | 2412.06661 | translate | read | null |
| 2024-12-09 | Echocardiography to Cardiac MRI View Transformation for Real-Time Blind Restoration | Ilke Adalioglu et.al. | 2412.06445 | translate | read | null |
| 2024-12-09 | Exploring the Impact of Synthetic Data on Human Gesture Recognition Tasks Using GANs | George Kontogiannis et.al. | 2412.06389 | translate | read | null |
| 2024-12-09 | Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment | Kim Sung-Bin et.al. | 2412.06209 | translate | read | link |
| 2024-12-09 | ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance | Yuming Li et.al. | 2412.06163 | translate | read | null |
| 2024-12-06 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | Donald Shenaj et.al. | 2412.05148 | translate | read | link |
| 2024-12-06 | The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation | Ruoyu Wang et.al. | 2412.05101 | translate | read | null |
| 2024-12-06 | Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors | Yuheng Zhang et.al. | 2412.05000 | translate | read | null |
| 2024-12-06 | Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction | Gaurav Shrivastava et.al. | 2412.04929 | translate | read | null |
| 2024-12-05 | Hidden in the Noise: Two-Stage Robust Watermarking for Images | Kasra Arabi et.al. | 2412.04653 | translate | read | link |
| 2024-12-05 | One Communication Round is All It Needs for Federated Fine-Tuning Foundation Models | Ziyao Wang et.al. | 2412.04650 | translate | read | null |
| 2024-12-05 | LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors | Yusuf Dalva et.al. | 2412.04460 | translate | read | null |
| 2024-12-05 | Learning Artistic Signatures: Symmetry Discovery and Style Transfer | Emma Finn et.al. | 2412.04441 | translate | read | null |
| 2024-12-05 | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Jian Han et.al. | 2412.04431 | translate | read | link |
| 2024-12-05 | Multi-Subject Image Synthesis as a Generative Prior for Single-Subject PET Image Reconstruction | George Webber et.al. | 2412.04324 | translate | read | null |
| 2024-12-05 | The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation | Fredrik Carlsson et.al. | 2412.04318 | translate | read | null |
| 2024-12-05 | T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts | Ziwei Huang et.al. | 2412.04300 | translate | read | null |
| 2024-12-05 | Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation | Jie Bao et.al. | 2412.04296 | translate | read | null |
| 2024-12-05 | AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models | Xinghui Li et.al. | 2412.04146 | translate | read | link |
| 2024-12-05 | D-LORD for Motion Stylization | Meenakshi Gupta et.al. | 2412.04097 | translate | read | null |
| 2024-12-05 | BodyMetric: Evaluating the Realism of Human Bodies in Text-to-Image Generation | Nefeli Andreou et.al. | 2412.04086 | translate | read | null |
| 2024-12-04 | Style3D: Attention-guided Multi-view Style Transfer for 3D Object Generation | Bingjie Song et.al. | 2412.03571 | translate | read | null |
| 2024-12-04 | MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation | Zehuan Huang et.al. | 2412.03558 | translate | read | link |
| 2024-12-04 | Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective | Neta Shaul et.al. | 2412.03487 | translate | read | null |
| 2024-12-04 | Skel3D: Skeleton Guided Novel View Synthesis | Aron Fóthi et.al. | 2412.03407 | translate | read | null |
| 2024-12-04 | Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment | Feng He et.al. | 2412.03400 | translate | read | null |
| 2024-12-04 | SGSST: Scaling Gaussian Splatting Style Transfer | Bruno Galerne et.al. | 2412.03371 | translate | read | null |
| 2024-12-04 | DIVE: Taming DINO for Subject-Driven Video Editing | Yi Huang et.al. | 2412.03347 | translate | read | null |
| 2024-12-04 | Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis | Tao Jun Lin et.al. | 2412.03315 | translate | read | null |
| 2024-12-04 | Is JPEG AI going to change image forensics? | Edoardo Daniele Cannas et.al. | 2412.03261 | translate | read | null |
| 2024-12-04 | DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation | Qingdong He et.al. | 2412.03255 | translate | read | null |
| 2024-12-03 | Taming Scalable Visual Tokenizer for Autoregressive Image Generation | Fengyuan Shi et.al. | 2412.02692 | translate | read | link |
| 2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690 | translate | read | null |
| 2024-12-03 | SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance | Viet Nguyen et.al. | 2412.02687 | translate | read | null |
| 2024-12-03 | WEM-GAN: Wavelet transform based facial expression manipulation | Dongya Sun et.al. | 2412.02530 | translate | read | null |
| 2024-12-03 | ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation? | Leixin Zhang et.al. | 2412.02368 | translate | read | link |
| 2024-12-03 | Switchable deep beamformer for high-quality and real-time passive acoustic mapping | Yi Zeng et.al. | 2412.02327 | translate | read | null |
| 2024-12-03 | Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models | Jungwon Park et.al. | 2412.02237 | translate | read | link |
| 2024-12-03 | GIST: Towards Photorealistic Style Transfer via Multiscale Geometric Representations | Renan A. Rojas-Gomez et.al. | 2412.02214 | translate | read | null |
| 2024-12-03 | An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction | Yaxin Liang et.al. | 2412.02211 | translate | read | null |
| 2024-12-03 | 3D representation in 512-Byte: Variational tokenizer is the key for autoregressive 3D generation | Jinzhi Zhang et.al. | 2412.02202 | translate | read | null |
(<a href=../Image_Generation.md>back to Image Generation</a>)