ArXiv QA
(TBD) Automated ArXiv question answering via large language models
GitHub | Homepage | Simple QA - Model Database Space
List of Papers
2023
September 2023
- SlimPajama-DC: Understanding Data Combinations for LLM Training - [ArXiv] [QA].
- OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch - [ArXiv] [QA].
- Language Modeling Is Compression - [ArXiv] [QA].
- FoleyGen: Visually-Guided Audio Generation - [ArXiv] [QA].
- Baichuan 2: Open Large-scale Language Models - [ArXiv] [QA].
- 360° Reconstruction From a Single Image Using Space Carved Outpainting - [ArXiv] [QA].
- Stabilizing RLHF through Advantage Model and Selective Rehearsal - [ArXiv] [QA].
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions - [ArXiv] [QA].
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [ArXiv] [QA].
- MindAgent: Emergent Gaming Interaction - [ArXiv] [QA].
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models - [ArXiv] [QA].
- Adapting Large Language Models via Reading Comprehension - [ArXiv] [QA].
- LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models - [ArXiv] [QA].
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages - [ArXiv] [QA].
- Augmenting text for spoken language understanding with Large Language Models - [ArXiv] [QA].
- OWL: A Large Language Model for IT Operations - [ArXiv] [QA].
- Contrastive Decoding Improves Reasoning in Large Language Models - [ArXiv] [QA].
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) - [ArXiv] [QA].
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? - [ArXiv] [QA].
- PDFTriage: Question Answering over Long, Structured Documents - [ArXiv] [QA].
- S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs - [ArXiv] [QA].
- Stack-and-Delay: a new codebook pattern for music generation - [ArXiv] [QA].
- Enhance audio generation controllability through representation similarity regularization - [ArXiv] [QA].
- Sparse Autoencoders Find Highly Interpretable Features in Language Models - [ArXiv] [QA].
- Compositional Foundation Models for Hierarchical Planning - [ArXiv] [QA].
- Replacing softmax with ReLU in Vision Transformers - [ArXiv] [QA].
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers - [ArXiv] [QA].
- Scaling Laws for Sparsely-Connected Foundation Models - [ArXiv] [QA].
- Cure the headache of Transformers via Collinear Constrained Attention - [ArXiv] [QA].
- Investigating Answerability of LLMs for Long-Form Question Answering - [ArXiv] [QA].
- LASER: LLM Agent with State-Space Exploration for Web Navigation - [ArXiv] [QA].
- Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding - [ArXiv] [QA].
- Retrieval-Augmented Text-to-Audio Generation - [ArXiv] [QA].
- Leveraging Contextual Information for Effective Entity Salience Detection - [ArXiv] [QA].
- Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models - [ArXiv] [QA].
- A Data Source for Reasoning Embodied Agents - [ArXiv] [QA].
- Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping - [ArXiv] [QA].
- ALWOD: Active Learning for Weakly-Supervised Object Detection - [ArXiv] [QA].
- Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning - [ArXiv] [QA].
- TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting - [ArXiv] [QA].
- Generative Image Dynamics - [ArXiv] [QA].
- Ambiguity-Aware In-Context Learning with Large Language Models - [ArXiv] [QA].
- Agents: An Open-source Framework for Autonomous Language Agents - [ArXiv] [QA].
- TextBind: Multi-turn Interleaved Multimodal Instruction-following - [ArXiv] [QA].
- OmnimatteRF: Robust Omnimatte with 3D Background Modeling - [ArXiv] [QA].
- Efficiently Robustify Pre-trained Models - [ArXiv] [QA].
- EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization - [ArXiv] [QA].
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? - [ArXiv] [QA].
- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts - [ArXiv] [QA].
- Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance - [ArXiv] [QA].
- AudioSR: Versatile Audio Super-resolution at Scale - [ArXiv] [QA].
- Text-Guided Generation and Editing of Compositional 3D Avatars - [ArXiv] [QA].
- Tree-Structured Shading Decomposition - [ArXiv] [QA].
- SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection - [ArXiv] [QA].
- DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models - [ArXiv] [QA].
- MagiCapture: High-Resolution Multi-Concept Portrait Customization - [ArXiv] [QA].
- Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? - [ArXiv] [QA].
- Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly - [ArXiv] [QA].
- Dynamic NeRFs for Soccer Scenes - [ArXiv] [QA].
- MPI-Flow: Learning Realistic Optical Flow with Multiplane Images - [ArXiv] [QA].
- VLSlice: Interactive Vision-and-Language Slice Discovery - [ArXiv] [QA].
- Generalizable Neural Fields as Partially Observed Neural Processes - [ArXiv] [QA].
- Statistical Rejection Sampling Improves Preference Optimization - [ArXiv] [QA].
- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale - [ArXiv] [QA].
- Learning Disentangled Avatars with Hybrid 3D Representations - [ArXiv] [QA].
- LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning - [ArXiv] [QA].
- InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation - [ArXiv] [QA].
- Recovering from Privacy-Preserving Masking with Large Language Models - [ArXiv] [QA].
- Modality Unifying Network for Visible-Infrared Person Re-Identification - [ArXiv] [QA].
- Efficient Memory Management for Large Language Model Serving with PagedAttention - [ArXiv] [QA].
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy - [ArXiv] [QA].
- Uncovering mesa-optimization algorithms in Transformers - [ArXiv] [QA].
- Large Language Models for Compiler Optimization - [ArXiv] [QA].
- SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors - [ArXiv] [QA].
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models - [ArXiv] [QA].
- Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips - [ArXiv] [QA].
- Large Language Model for Science: A Study on P vs. NP - [ArXiv] [QA].
- UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase - [ArXiv] [QA].
- ITI-GEN: Inclusive Text-to-Image Generation - [ArXiv] [QA].
- NExT-GPT: Any-to-Any Multimodal LLM - [ArXiv] [QA].
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs - [ArXiv] [QA].
- Textbooks Are All You Need II: phi-1.5 technical report - [ArXiv] [QA].
- Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning - [ArXiv] [QA].
- Class-Incremental Grouping Network for Continual Audio-Visual Learning - [ArXiv] [QA].
- Multi3DRefer: Grounding Text Description to Multiple 3D Objects - [ArXiv] [QA].
- Towards Viewpoint Robustness in Bird's Eye View Segmentation - [ArXiv] [QA].
- Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color - [ArXiv] [QA].
- 3D Implicit Transporter for Temporally Consistent Keypoint Discovery - [ArXiv] [QA].
- Multi-view Self-supervised Disentanglement for General Image Denoising - [ArXiv] [QA].
- Mitigating Word Bias in Zero-shot Prompt-based Classifiers - [ArXiv] [QA].
- Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation - [ArXiv] [QA].
- Effective Real Image Editing with Accelerated Iterative Diffusion Inversion - [ArXiv] [QA].
- Neurons in Large Language Models: Dead, N-gram, Positional - [ArXiv] [QA].
- Towards Real-World Burst Image Super-Resolution: Benchmark and Method - [ArXiv] [QA].
- Towards Robust Model Watermark via Reducing Parametric Vulnerability - [ArXiv] [QA].
- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning - [ArXiv] [QA].
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset - [ArXiv] [QA].
- Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf - [ArXiv] [QA].
- Dynamic Mesh-Aware Radiance Fields - [ArXiv] [QA].
- When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale - [ArXiv] [QA].
- Examining Autoexposure for Challenging Scenes - [ArXiv] [QA].
- Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving - [ArXiv] [QA].
- DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields - [ArXiv] [QA].
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts - [ArXiv] [QA].
- The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion - [ArXiv] [QA].
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting - [ArXiv] [QA].
- Towards Practical Capture of High-Fidelity Relightable Avatars - [ArXiv] [QA].
- Unsupervised Object Localization with Representer Point Selection - [ArXiv] [QA].
- Evaluation and Mitigation of Agnosia in Multimodal Large Language Models - [ArXiv] [QA].
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos - [ArXiv] [QA].
- ImageBind-LLM: Multi-modality Instruction Tuning - [ArXiv] [QA].
- Tracking Anything with Decoupled Video Segmentation - [ArXiv] [QA].
- Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction - [ArXiv] [QA].
- The Making and Breaking of Camouflage - [ArXiv] [QA].
- ProPainter: Improving Propagation and Transformer for Video Inpainting - [ArXiv] [QA].
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks - [ArXiv] [QA].
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models - [ArXiv] [QA].
- FLM-101B: An Open LLM and How to Train It with $100K Budget - [ArXiv] [QA].
- Panoramas from Photons - [ArXiv] [QA].
- SimNP: Learning Self-Similarity Priors Between Neural Points - [ArXiv] [QA].
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption - [ArXiv] [QA].
- Large-Scale Automatic Audiobook Creation - [ArXiv] [QA].
- Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning - [ArXiv] [QA].
- Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model - [ArXiv] [QA].
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation - [ArXiv] [QA].
- Temporal Collection and Distribution for Referring Video Object Segmentation - [ArXiv] [QA].
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image - [ArXiv] [QA].
- Large Language Models as Optimizers - [ArXiv] [QA].
- Distribution-Aware Prompt Tuning for Vision-Language Models - [ArXiv] [QA].
- Robotic Table Tennis: A Case Study into a High Speed Learning System - [ArXiv] [QA].
- Matcha-TTS: A fast TTS architecture with conditional flow matching - [ArXiv] [QA].
- Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields - [ArXiv] [QA].
- SLiMe: Segment Like Me - [ArXiv] [QA].
- ResFields: Residual Neural Fields for Spatiotemporal Signals - [ArXiv] [QA].
- MyoDex: A Generalizable Prior for Dexterous Manipulation - [ArXiv] [QA].
- Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction - [ArXiv] [QA].
- GPT Can Solve Mathematical Problems Without a Calculator - [ArXiv] [QA].
- Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning - [ArXiv] [QA].
- Physically Grounded Vision-Language Models for Robotic Manipulation - [ArXiv] [QA].
- A skeletonization algorithm for gradient-based optimization - [ArXiv] [QA].
- GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction - [ArXiv] [QA].
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach - [ArXiv] [QA].
- EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding - [ArXiv] [QA].
- Doppelgangers: Learning to Disambiguate Images of Similar Structures - [ArXiv] [QA].
- Generating Realistic Images from In-the-wild Sounds - [ArXiv] [QA].
- Prototype-based Dataset Comparison - [ArXiv] [QA].
- CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning - [ArXiv] [QA].
- Multi-label affordance mapping from egocentric vision - [ArXiv] [QA].
- Iterative Superquadric Recomposition of 3D Objects from Multiple Views - [ArXiv] [QA].
- Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples - [ArXiv] [QA].
- RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image - [ArXiv] [QA].
- NICE: CVPR 2023 Challenge on Zero-shot Image Captioning - [ArXiv] [QA].
- Empowering Low-Light Image Enhancer through Customized Learnable Priors - [ArXiv] [QA].
- Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations - [ArXiv] [QA].
- Are Emergent Abilities in Large Language Models just In-Context Learning? - [ArXiv] [QA].
- Mask-Attention-Free Transformer for 3D Instance Segmentation - [ArXiv] [QA].
- AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion - [ArXiv] [QA].
- Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification - [ArXiv] [QA].
- EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity - [ArXiv] [QA].
- SOAR: Scene-debiasing Open-set Action Recognition - [ArXiv] [QA].
- Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning - [ArXiv] [QA].
- LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models - [ArXiv] [QA].
- EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment - [ArXiv] [QA].
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration - [ArXiv] [QA].
- CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection - [ArXiv] [QA].
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning - [ArXiv] [QA].
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - [ArXiv] [QA].
- eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models - [ArXiv] [QA].
- Two-in-One Depth: Bridging the Gap Between Monocular and Binocular Self-supervised Depth Estimation - [ArXiv] [QA].
- Domain Generalization via Balancing Training Difficulty and Model Capability - [ArXiv] [QA].
- Few shot font generation via transferring similarity guided global style and quantization local style - [ArXiv] [QA].
- Instability of the solitary waves for the Generalized Benjamin-Bona-Mahony Equation - [ArXiv] [QA].
- Contrastive Feature Masking Open-Vocabulary Vision Transformer - [ArXiv] [QA].
- Searching for a Leptophilic Z' and a 3-3-1 symmetry at CLIC - [ArXiv] [QA].
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following - [ArXiv] [QA].
- CityDreamer: Compositional Generative Model of Unbounded 3D Cities - [ArXiv] [QA].
- Rieger, Schwabe, Suess-de Vries: The Sunny Beats of Resonance - [ArXiv] [QA].
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation - [ArXiv] [QA].
- Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior - [ArXiv] [QA].
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback - [ArXiv] [QA].
- A Massively Parallel Dynamic Programming for Approximate Rectangle Escape Problem - [ArXiv] [QA].
- Object-Centric Multiple Object Tracking - [ArXiv] [QA].
- Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation - [ArXiv] [QA].
- Pseudo-magnetic fields in square lattices - [ArXiv] [QA].
- Empirical Modeling of Variance in Medium Frequency R-Mode Time-of-Arrival Measurements - [ArXiv] [QA].
August 2023
- Block occurrences in the binary expansion - [ArXiv] [QA].
- YaRN: Efficient Context Window Extension of Large Language Models - [ArXiv] [QA].
- SoDaCam: Software-defined Cameras via Single-Photon Imaging - [ArXiv] [QA].
- FACET: Fairness in Computer Vision Evaluation Benchmark - [ArXiv] [QA].
- PointLLM: Empowering Large Language Models to Understand Point Clouds - [ArXiv] [QA].
- StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation - [ArXiv] [QA].
- InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion - [ArXiv] [QA].
- EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild - [ArXiv] [QA].
- GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields - [ArXiv] [QA].
- TouchStone: Evaluating Vision-Language Models by Language Models - [ArXiv] [QA].
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants - [ArXiv] [QA].
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation - [ArXiv] [QA].
- Coarse-to-Fine Amodal Segmentation with Shape Prior - [ArXiv] [QA].
- Can Programming Languages Boost Each Other via Instruction Tuning? - [ArXiv] [QA].
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models - [ArXiv] [QA].
- Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images - [ArXiv] [QA].
- Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images - [ArXiv] [QA].
- MVDream: Multi-view Diffusion for 3D Generation - [ArXiv] [QA].
- PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction - [ArXiv] [QA].
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models - [ArXiv] [QA].
- Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery - [ArXiv] [QA].
- BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge - [ArXiv] [QA].
- Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff - [ArXiv] [QA].
- Emergence of Segmentation with Minimalistic White-Box Transformers - [ArXiv] [QA].
- Active Neural Mapping - [ArXiv] [QA].
- Learning Vision-based Pursuit-Evasion Robot Policies - [ArXiv] [QA].
- SAM-Med2D - [ArXiv] [QA].
- MMVP: Motion-Matrix-based Video Prediction - [ArXiv] [QA].
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models - [ArXiv] [QA].
- Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion - [ArXiv] [QA].
- RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation - [ArXiv] [QA].
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model - [ArXiv] [QA].
- LLaSM: Large Language and Speech Model - [ArXiv] [QA].
- Reconstructing Groups of People with Hypergraph Relational Reasoning - [ArXiv] [QA].
- Introducing Language Guidance in Prompt-based Continual Learning - [ArXiv] [QA].
- WeatherBench 2: A benchmark for the next generation of data-driven global weather models - [ArXiv] [QA].
- Canonical Factors for Hybrid Neural Fields - [ArXiv] [QA].
- Shatter and Gather: Learning Referring Image Segmentation with Text Supervision - [ArXiv] [QA].
- Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation - [ArXiv] [QA].
- CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation - [ArXiv] [QA].
- Evaluation and Analysis of Hallucination in Large Vision-Language Models - [ArXiv] [QA].
- Learning to Upsample by Learning to Sample - [ArXiv] [QA].
- Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery - [ArXiv] [QA].
- Exploring Model Transferability through the Lens of Potential Energy - [ArXiv] [QA].
- Pose-Free Neural Radiance Fields via Implicit Pose Regularization - [ArXiv] [QA].
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models - [ArXiv] [QA].
- Vision Grid Transformer for Document Layout Analysis - [ArXiv] [QA].
- LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks - [ArXiv] [QA].
- Read-only Prompt Optimization for Vision-Language Few-shot Learning - [ArXiv] [QA].
- NSF: Neural Surface Fields for Human Modeling from Monocular Depth - [ArXiv] [QA].
- CLNeRF: Continual Learning Meets NeRF - [ArXiv] [QA].
- Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond - [ArXiv] [QA].
- R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras - [ArXiv] [QA].
- S-TREK: Sequential Translation and Rotation Equivariant Keypoints for local feature extraction - [ArXiv] [QA].
- Referring Image Segmentation Using Text Supervision - [ArXiv] [QA].
- LAC: Latent Action Composition for Skeleton-based Action Segmentation - [ArXiv] [QA].
- Priority-Centric Human Motion Generation in Discrete Latent Space - [ArXiv] [QA].
- Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor - [ArXiv] [QA].
- Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection - [ArXiv] [QA].
- HoloFusion: Towards Photo-realistic 3D Generative Modeling - [ArXiv] [QA].
- Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks - [ArXiv] [QA].
- Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers - [ArXiv] [QA].
- Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario - [ArXiv] [QA].
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records - [ArXiv] [QA].
- 4D Myocardium Reconstruction with Decoupled Motion and Shape Model - [ArXiv] [QA].
- Reconstructing Interacting Hands with Interaction Prior from Monocular Images - [ArXiv] [QA].
- Nonrigid Object Contact Estimation With Regional Unwrapping Transformer - [ArXiv] [QA].
- Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection - [ArXiv] [QA].
- Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation - [ArXiv] [QA].
- Calibrating Panoramic Depth Estimation for Practical Localization and Mapping - [ArXiv] [QA].
- LDL: Line Distance Functions for Panoramic Localization - [ArXiv] [QA].
- Prior-guided Source-free Domain Adaptation for Human Pose Estimation - [ArXiv] [QA].
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples - [ArXiv] [QA].
- Beyond One-to-One: Rethinking the Referring Image Segmentation - [ArXiv] [QA].
- Point-Query Quadtree for Crowd Counting, Localization, and More - [ArXiv] [QA].
- ORES: Open-vocabulary Responsible Visual Synthesis - [ArXiv] [QA].
- Generalized Lightness Adaptation with Channel Selective Normalization - [ArXiv] [QA].
- MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree - [ArXiv] [QA].
- ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning - [ArXiv] [QA].
- Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers - [ArXiv] [QA].
- Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models - [ArXiv] [QA].
- Nougat: Neural Optical Understanding for Academic Documents - [ArXiv] [QA].
- SoTaNa: The Open-Source Software Development Assistant - [ArXiv] [QA].
- Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning - [ArXiv] [QA].
- Relighting Neural Radiance Fields with Shadow and Highlight Hints - [ArXiv] [QA].
- Distribution-Aligned Diffusion for Human Mesh Recovery - [ArXiv] [QA].
- ConSlide: Asynchronous Hierarchical Interaction Transformer with Breakup-Reorganize Rehearsal for Continual Whole Slide Image Analysis - [ArXiv] [QA].
- SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation - [ArXiv] [QA].
- Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation - [ArXiv] [QA].
- Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory - [ArXiv] [QA].
- ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking - [ArXiv] [QA].
- MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning - [ArXiv] [QA].
- IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization - [ArXiv] [QA].
- Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model - [ArXiv] [QA].
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models - [ArXiv] [QA].
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM - [ArXiv] [QA].
- Preserving Modality Structure Improves Multi-Modal Learning - [ArXiv] [QA].
- NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes - [ArXiv] [QA].
- Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation - [ArXiv] [QA].
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities - [ArXiv] [QA].
- Dense Text-to-Image Generation with Attention Modulation - [ArXiv] [QA].
- Motion-Guided Masking for Spatiotemporal Representation Learning - [ArXiv] [QA].
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment - [ArXiv] [QA].
- Code Llama: Open Foundation Models for Code - [ArXiv] [QA].
- Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? - [ArXiv] [QA].
- On Offline Evaluation of 3D Object Detection for Autonomous Driving - [ArXiv] [QA].
- LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition - [ArXiv] [QA].
- VIGC: Visual Instruction Generation and Correction - [ArXiv] [QA].
- A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions - [ArXiv] [QA].
- Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation - [ArXiv] [QA].
- Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects - [ArXiv] [QA].
- Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation - [ArXiv] [QA].
- Hyperbolic Audio-visual Zero-shot Learning - [ArXiv] [QA].
- Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking - [ArXiv] [QA].
- Masked Autoencoders are Efficient Class Incremental Learners - [ArXiv] [QA].
- CGMI: Configurable General Multi-Agent Interaction Framework - [ArXiv] [QA].
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning - [ArXiv] [QA].
- Vision Transformer Adapters for Generalizable Multitask Learning - [ArXiv] [QA].
- AdVerb: Visually Guided Audio Dereverberation - [ArXiv] [QA].
- Continual Zero-Shot Learning through Semantically Guided Generative Random Walks - [ArXiv] [QA].
- Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation - [ArXiv] [QA].
- CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images - [ArXiv] [QA].
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning - [ArXiv] [QA].
- SG-Former: Self-guided Transformer with Evolving Token Reallocation - [ArXiv] [QA].
- CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No - [ArXiv] [QA].
- Sign Language Translation with Iterative Prototype - [ArXiv] [QA].
- SILT: Shadow-aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels - [ArXiv] [QA].
- DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration - [ArXiv] [QA].
- Aligning Language Models with Offline Reinforcement Learning from Human Feedback - [ArXiv] [QA].
- Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages - [ArXiv] [QA].
- RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D - [ArXiv] [QA].
- From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models - [ArXiv] [QA].
- RankMixup: Ranking-Based Mixup Training for Network Calibration - [ArXiv] [QA].
- Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields - [ArXiv] [QA].
- LFS-GAN: Lifelong Few-Shot Image Generation - [ArXiv] [QA].
- ACLS: Adaptive and Conditional Label Smoothing for Network Calibration - [ArXiv] [QA].
- Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification - [ArXiv] [QA].
- Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack - [ArXiv] [QA].
- SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets - [ArXiv] [QA].
- Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch - [ArXiv] [QA].
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts - [ArXiv] [QA].
- Understanding Hessian Alignment for Domain Generalization - [ArXiv] [QA].
- Efficient Controllable Multi-Task Architectures - [ArXiv] [QA].
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking - [ArXiv] [QA].
- StoryBench: A Multifaceted Benchmark for Continuous Story Visualization - [ArXiv] [QA].
- SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation - [ArXiv] [QA].
- Multi-event Video-Text Retrieval - [ArXiv] [QA].
- TrackFlow: Multi-Object Tracking with Normalizing Flows - [ArXiv] [QA].
- Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition - [ArXiv] [QA].
- Learning a More Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection - [ArXiv] [QA].
- A Survey on Large Language Model based Autonomous Agents - [ArXiv] [QA].
- ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes - [ArXiv] [QA].
- How Much Temporal Long-Term Context is Needed for Action Segmentation? - [ArXiv] [QA].
- Exemplar-Free Continual Transformer with Convolutions - [ArXiv] [QA].
- ProAgent: Building Proactive Cooperative AI with Large Language Models - [ArXiv] [QA].
- GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training - [ArXiv] [QA].
- CiteTracker: Correlating Image and Text for Visual Tracking - [ArXiv] [QA].
- CNN based Cuneiform Sign Detection Learned from Annotated 3D Renderings and Mapped Photographs with Illumination Augmentation - [ArXiv] [QA].
- HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations - [ArXiv] [QA].
- ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts - [ArXiv] [QA].
- LDP-Feat: Image Features with Local Differential Privacy - [ArXiv] [QA].
- DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment - [ArXiv] [QA].
- ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data - [ArXiv] [QA].
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models - [ArXiv] [QA].
- MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation - [ArXiv] [QA].
- ReFit: Recurrent Fitting Network for 3D Human Recovery - [ArXiv] [QA].
- Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation - [ArXiv] [QA].
- Domain Generalization via Rationale Invariance - [ArXiv] [QA].
- Efficient View Synthesis with Neural Radiance Distribution Field - [ArXiv] [QA].
- LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction - [ArXiv] [QA].
- CAME: Contrastive Automated Model Evaluation - [ArXiv] [QA].
- Recursive Video Lane Detection - [ArXiv] [QA].
- MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers - [ArXiv] [QA].
- Video OWL-ViT: Temporally-consistent open-world localization in video - [ArXiv] [QA].
- Audio-Visual Class-Incremental Learning - [ArXiv] [QA].
- TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection - [ArXiv] [QA].
- Neural Amortized Inference for Nested Multi-agent Reasoning - [ArXiv] [QA].
- MetaGCD: Learning to Continually Learn in Generalized Category Discovery - [ArXiv] [QA].
- UnLoc: A Unified Framework for Video Localization Tasks - [ArXiv] [QA].
- Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction - [ArXiv] [QA].
- Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images - [ArXiv] [QA].
- Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation - [ArXiv] [QA].
- Can Language Models Learn to Listen? - [ArXiv] [QA].
- EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition - [ArXiv] [QA].
- Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction - [ArXiv] [QA].
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs - [ArXiv] [QA].
- MGMAE: Motion Guided Masking for Video Masked Autoencoding - [ArXiv] [QA].
- Instruction Tuning for Large Language Models: A Survey - [ArXiv] [QA].
- WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models - [ArXiv] [QA].
- On the Adversarial Robustness of Multi-Modal Foundation Models - [ArXiv] [QA].
- Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction - [ArXiv] [QA].
- Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification - [ArXiv] [QA].
- A step towards understanding why classification helps regression - [ArXiv] [QA].
- Image-free Classifier Injection for Zero-Shot Classification - [ArXiv] [QA].
- CHORD: Category-level Hand-held Object Reconstruction via Shape Deformation - [ArXiv] [QA].
- Self-Feedback DETR for Temporal Action Detection - [ArXiv] [QA].
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations - [ArXiv] [QA].
- QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection - [ArXiv] [QA].
- Texture Generation on 3D Meshes with Point-UV Diffusion - [ArXiv] [QA].
- ADNet: Lane Shape Prediction via Anchor Decomposition - [ArXiv] [QA].
- STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning - [ArXiv] [QA].
- Privacy-Preserving Face Recognition Using Random Frequency Components - [ArXiv] [QA].
- Explore and Tell: Embodied Visual Captioning in 3D Environments - [ArXiv] [QA].
- When Prompt-based Incremental Learning Does Not Meet Strong Pretraining - [ArXiv] [QA].
- X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events - [ArXiv] [QA].
- GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems - [ArXiv] [QA].
- Diffusion Model as Representation Learner - [ArXiv] [QA].
- Simple Baselines for Interactive Video Retrieval with Questions and Answers - [ArXiv] [QA].
- Strata-NeRF: Neural Radiance Fields for Stratified Scenes - [ArXiv] [QA].
- Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos - [ArXiv] [QA].
- Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting - [ArXiv] [QA].
- DVGaze: Dual-View Gaze Estimation - [ArXiv] [QA].
- Representation Disparity-aware Distillation for 3D Object Detection - [ArXiv] [QA].
- Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation - [ArXiv] [QA].
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video - [ArXiv] [QA].
- DomainAdaptor: A Novel Approach to Test-time Adaptation - [ArXiv] [QA].
- DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization - [ArXiv] [QA].
- CharacterChat: Learning towards Conversational AI with Personalized Social Support - [ArXiv] [QA].
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data - [ArXiv] [QA].
- GeT: Generative Target Structure Debiasing for Domain Adaptation - [ArXiv] [QA].
- ViT-Lens: Towards Omni-modal Representations - [ArXiv] [QA].
- Neural Interactive Keypoint Detection - [ArXiv] [QA].
- VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation - [ArXiv] [QA].
- FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory - [ArXiv] [QA].
- Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection - [ArXiv] [QA].
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer - [ArXiv] [QA].
- OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision - [ArXiv] [QA].
- ExpeL: LLM Agents Are Experiential Learners - [ArXiv] [QA].
- March in Chat: Interactive Prompting for Remote Embodied Referring Expression - [ArXiv] [QA].
- TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective - [ArXiv] [QA].
- 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation - [ArXiv] [QA].
- HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation - [ArXiv] [QA].
- Robust Mixture-of-Expert Training for Convolutional Neural Networks - [ArXiv] [QA].
- Root Pose Decomposition Towards Generic Non-rigid 3D Reconstruction with Monocular Videos - [ArXiv] [QA].
- Single Image Reflection Separation via Component Synergy - [ArXiv] [QA].
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation - [ArXiv] [QA].
- Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts - [ArXiv] [QA].
- ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment - [ArXiv] [QA].
- Disposable Transfer Learning for Selective Source Task Unlearning - [ArXiv] [QA].
- Tackling Vision Language Tasks Through Learning Inner Monologues - [ArXiv] [QA].
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos - [ArXiv] [QA].
- Scene-Aware Feature Matching - [ArXiv] [QA].
- Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling - [ArXiv] [QA].
- On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion - [ArXiv] [QA].
- Understanding Self-attention Mechanism via Dynamical System Perspective - [ArXiv] [QA].
- BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions - [ArXiv] [QA].
- MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition - [ArXiv] [QA].
- VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations - [ArXiv] [QA].
- Scalable Video Object Segmentation with Simplified Framework - [ArXiv] [QA].
- SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM - [ArXiv] [QA].
- Calibrating Uncertainty for Semi-Supervised Crowd Counting - [ArXiv] [QA].
- Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders - [ArXiv] [QA].
- A Theory of Topological Derivatives for Inverse Rendering of Geometry - [ArXiv] [QA].
- How susceptible are LLMs to Logical Fallacies? - [ArXiv] [QA].
- VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control - [ArXiv] [QA].
- Long-range Multimodal Pretraining for Movie Understanding - [ArXiv] [QA].
- Smoothness Similarity Regularization for Few-Shot GAN Adaptation - [ArXiv] [QA].
- Robust Monocular Depth Estimation under Challenging Conditions - [ArXiv] [QA].
- LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark - [ArXiv] [QA].
- ChatHaruhi: Reviving Anime Character in Reality via Large Language Model - [ArXiv] [QA].
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing - [ArXiv] [QA].
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct - [ArXiv] [QA].
- PUMGPT: A Large Vision-Language Model for Product Understanding - [ArXiv] [QA].
- Meta-ZSDETR: Zero-shot DETR with Meta-learning - [ArXiv] [QA].
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning - [ArXiv] [QA].
- Leveraging Intrinsic Properties for Non-Rigid Garment Alignment - [ArXiv] [QA].
- ResQ: Residual Quantization for Video Perception - [ArXiv] [QA].
- Vision Relation Transformer for Unbiased Scene Graph Generation - [ArXiv] [QA].
- MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection - [ArXiv] [QA].
- Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization - [ArXiv] [QA].
- DReg-NeRF: Deep Registration for Neural Radiance Fields - [ArXiv] [QA].
- Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events - [ArXiv] [QA].
- Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models - [ArXiv] [QA].
- RLIPv2: Fast Scaling of Relational Language-Image Pre-training - [ArXiv] [QA].
- Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching - [ArXiv] [QA].
- Audio-Visual Glance Network for Efficient Video Recognition - [ArXiv] [QA].
- Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation - [ArXiv] [QA].
- Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge - [ArXiv] [QA].
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability - [ArXiv] [QA].
- Human Part-wise 3D Motion Context Learning for Sign Language Recognition - [ArXiv] [QA].
- NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning - [ArXiv] [QA].
- Self-Calibrated Cross Attention Network for Few-Shot Segmentation - [ArXiv] [QA].
- Diverse Cotraining Makes Strong Semi-Supervised Segmentor - [ArXiv] [QA].
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos - [ArXiv] [QA].
- Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos - [ArXiv] [QA].
- SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos - [ArXiv] [QA].
- ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation - [ArXiv] [QA].
- Generalized Sum Pooling for Metric Learning - [ArXiv] [QA].
- FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning - [ArXiv] [QA].
- The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation - [ArXiv] [QA].
- ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection - [ArXiv] [QA].
- SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning - [ArXiv] [QA].
- Reinforced Self-Training (ReST) for Language Modeling - [ArXiv] [QA].
- Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction - [ArXiv] [QA].
- Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification - [ArXiv] [QA].
- Event-Guided Procedure Planning from Instructional Videos with Text Supervision - [ArXiv] [QA].
- Towards Semi-supervised Learning with Non-random Missing Labels - [ArXiv] [QA].
- Spatially and Spectrally Consistent Deep Functional Maps - [ArXiv] [QA].
- Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling - [ArXiv] [QA].
- Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction - [ArXiv] [QA].
- MixBag: Bag-Level Data Augmentation for Learning from Label Proportions - [ArXiv] [QA].
- Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts - [ArXiv] [QA].
- Long-Range Grouping Transformer for Multi-View 3D Reconstruction - [ArXiv] [QA].
- V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints - [ArXiv] [QA].
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans - [ArXiv] [QA].
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions - [ArXiv] [QA].
- Learning to Distill Global Representation for Sparse-View CT - [ArXiv] [QA].
- ALIP: Adaptive Language-Image Pre-training with Synthetic Caption - [ArXiv] [QA].
- Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer - [ArXiv] [QA].
- Agglomerative Transformer for Human-Object Interaction Detection - [ArXiv] [QA].
- Membrane Potential Batch Normalization for Spiking Neural Networks - [ArXiv] [QA].
- Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations - [ArXiv] [QA].
- Dual-Stream Diffusion Net for Text-to-Video Generation - [ArXiv] [QA].
- SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes - [ArXiv] [QA].
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation - [ArXiv] [QA].
- Inherent Redundancy in Spiking Neural Networks - [ArXiv] [QA].
- Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network - [ArXiv] [QA].
- Unsupervised Domain Adaptive Detection with Network Stability Analysis - [ArXiv] [QA].
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis - [ArXiv] [QA].
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework - [ArXiv] [QA].
- GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds - [ArXiv] [QA].
- OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution - [ArXiv] [QA].
- View Consistent Purification for Accurate Cross-View Localization - [ArXiv] [QA].
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory - [ArXiv] [QA].
- Teach LLMs to Personalize -- An Approach inspired by Writing Education - [ArXiv] [QA].
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing - [ArXiv] [QA].
- RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models - [ArXiv] [QA].
- Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification - [ArXiv] [QA].
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model - [ArXiv] [QA].
- Relightable and Animatable Neural Avatar from Sparse-View Video - [ArXiv] [QA].
- Memory-and-Anticipation Transformer for Online Action Understanding - [ArXiv] [QA].
- Link-Context Learning for Multimodal LLMs - [ArXiv] [QA].
- ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces - [ArXiv] [QA].
- StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models - [ArXiv] [QA].
- ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition - [ArXiv] [QA].
- Learning to Identify Critical States for Reinforcement Learning from Videos - [ArXiv] [QA].
- DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding - [ArXiv] [QA].
- Identity-Consistent Aggregation for Video Object Detection - [ArXiv] [QA].
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation - [ArXiv] [QA].
- DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models - [ArXiv] [QA].
- Boosting Multi-modal Model Performance with Adaptive Gradient Modulation - [ArXiv] [QA].
- Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval - [ArXiv] [QA].
- Backpropagation Path Search On Adversarial Transferability - [ArXiv] [QA].
- Story Visualization by Online Text Augmentation with Context Memory - [ArXiv] [QA].
- 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack - [ArXiv] [QA].
- DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation - [ArXiv] [QA].
- Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering - [ArXiv] [QA].
- Text Injection for Capitalization and Turn-Taking Prediction in Speech Models - [ArXiv] [QA].
- PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects - [ArXiv] [QA].
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs - [ArXiv] [QA].
- Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation - [ArXiv] [QA].
- Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation - [ArXiv] [QA].
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation - [ArXiv] [QA].
- RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs - [ArXiv] [QA].
- Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning - [ArXiv] [QA].
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate - [ArXiv] [QA].
- OctoPack: Instruction Tuning Code Large Language Models - [ArXiv] [QA].
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation - [ArXiv] [QA].
- Masked Motion Predictors are Strong 3D Action Representation Learners - [ArXiv] [QA].
- S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields - [ArXiv] [QA].
- ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion - [ArXiv] [QA].
- Global Features are All You Need for Image Retrieval and Reranking - [ArXiv] [QA].
- Knowing Where to Focus: Event-aware Transformer for Video Grounding - [ArXiv] [QA].
- CBA: Improving Online Continual Learning via Continual Bias Adaptor - [ArXiv] [QA].
- CausalLM is not optimal for in-context learning - [ArXiv] [QA].
- Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking - [ArXiv] [QA].
- Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization - [ArXiv] [QA].
- SpeechX: Neural Codec Language Model as a Versatile Speech Transformer - [ArXiv] [QA].
- RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks - [ArXiv] [QA].
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning - [ArXiv] [QA].
- Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches - [ArXiv] [QA].
- Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan - [ArXiv] [QA].
- AerialVLN: Vision-and-Language Navigation for UAVs - [ArXiv] [QA].
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models - [ArXiv] [QA].
- Compositional Feature Augmentation for Unbiased Scene Graph Generation - [ArXiv] [QA].
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation - [ArXiv] [QA].
- Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training - [ArXiv] [QA].
- 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking - [ArXiv] [QA].
- VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use - [ArXiv] [QA].
- Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction - [ArXiv] [QA].
- Revisiting Vision Transformer from the View of Path Ensemble - [ArXiv] [QA].
- SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning - [ArXiv] [QA].
- BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation - [ArXiv] [QA].
- One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training - [ArXiv] [QA].
- Tiny and Efficient Model for the Edge Detection Generalization - [ArXiv] [QA].
- Multi-Label Knowledge Distillation - [ArXiv] [QA].
- Detecting and Preventing Hallucinations in Large Vision Language Models - [ArXiv] [QA].
- U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds - [ArXiv] [QA].
- Enhancing Network Management Using Code Generated by Large Language Models - [ArXiv] [QA].
- Self-Alignment with Instruction Backtranslation - [ArXiv] [QA].
- FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods - [ArXiv] [QA].
- Improving Joint Speech-Text Representations Without Alignment - [ArXiv] [QA].
- Composable Function-preserving Expansions for Transformer Architectures - [ArXiv] [QA].
- BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents - [ArXiv] [QA].
- PIPPA: A Partially Synthetic Conversational Dataset - [ArXiv] [QA].
- PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs - [ArXiv] [QA].
- Follow Anything: Open-set detection, tracking, and following in real-time - [ArXiv] [QA].
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining - [ArXiv] [QA].
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models - [ArXiv] [QA].
- PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers - [ArXiv] [QA].
- 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds - [ArXiv] [QA].
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network - [ArXiv] [QA].
- Cross-Domain Product Representation Learning for Rich-Content E-Commerce - [ArXiv] [QA].
- Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation - [ArXiv] [QA].
- LLM As DBA - [ArXiv] [QA].
- Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation - [ArXiv] [QA].
- Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation - [ArXiv] [QA].
- SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data - [ArXiv] [QA].
- Learning Gabor Texture Features for Fine-Grained Recognition - [ArXiv] [QA].
- Enhancing Trust in LLM-Based AI Automation Agents: New Considerations and Future Challenges - [ArXiv] [QA].
- Interaction-aware Joint Attention Estimation Using People Attributes - [ArXiv] [QA].
- Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment - [ArXiv] [QA].
- Flexible Isosurface Extraction for Gradient-Based Mesh Optimization - [ArXiv] [QA].
- Pseudo-label Alignment for Semi-supervised Instance Segmentation - [ArXiv] [QA].
- OpenProteinSet: Training data for structural biology at scale - [ArXiv] [QA].
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation - [ArXiv] [QA].
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI - [ArXiv] [QA].
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation - [ArXiv] [QA].
- Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution - [ArXiv] [QA].
- Robust Object Modeling for Visual Tracking - [ArXiv] [QA].
- IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models - [ArXiv] [QA].
- Foreground Object Search by Distilling Composite Image Feature - [ArXiv] [QA].
- Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation - [ArXiv] [QA].
- SelectNAdapt: Support Set Selection for Few-Shot Domain Adaptation - [ArXiv] [QA].
- WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields - [ArXiv] [QA].
- PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration - [ArXiv] [QA].
- Objects do not disappear: Video object detection by single-frame object location anticipation - [ArXiv] [QA].
- Bird's-Eye-View Scene Graph for Vision-Language Navigation - [ArXiv] [QA].
- JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models - [ArXiv] [QA].
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization - [ArXiv] [QA].
- Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising - [ArXiv] [QA].
- Accelerating LLM Inference with Staged Speculative Decoding - [ArXiv] [QA].
- Rendering Humans from Object-Occluded Monocular Videos - [ArXiv] [QA].
- Shepherd: A Critic for Language Model Generation - [ArXiv] [QA].
- LATR: 3D Lane Detection from Monocular Images with Transformer - [ArXiv] [QA].
- FocalFormer3D: Focusing on Hard Instance for 3D Object Detection - [ArXiv] [QA].
- Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation - [ArXiv] [QA].
- DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds - [ArXiv] [QA].
- 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment - [ArXiv] [QA].
- Exploring Transformers for Open-world Instance Segmentation - [ArXiv] [QA].
- D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation - [ArXiv] [QA].
- Under-Display Camera Image Restoration with Scattering Effect - [ArXiv] [QA].
- Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions - [ArXiv] [QA].
- OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation - [ArXiv] [QA].
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering - [ArXiv] [QA].
- Gentopia: A Collaborative Platform for Tool-Augmented LLMs - [ArXiv] [QA].
- AgentSims: An Open-Source Sandbox for Large Language Model Evaluation - [ArXiv] [QA].
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning - [ArXiv] [QA].
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval - [ArXiv] [QA].
- PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection - [ArXiv] [QA].
- TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models - [ArXiv] [QA].
- From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal - [ArXiv] [QA].
- 3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields - [ArXiv] [QA].
- Tiny LVLM-eHub: Early Multimodal Experiments with Bard - [ArXiv] [QA].
- AgentBench: Evaluating LLMs as Agents - [ArXiv] [QA].
- Learning Concise and Descriptive Attributes for Visual Recognition - [ArXiv] [QA].
- FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision - [ArXiv] [QA].
- Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising - [ArXiv] [QA].
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images - [ArXiv] [QA].
- Heterogeneous Forgetting Compensation for Class-Incremental Learning - [ArXiv] [QA].
- Dual Aggregation Transformer for Image Super-Resolution - [ArXiv] [QA].
- Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots - [ArXiv] [QA].
- SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs - [ArXiv] [QA].
- Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation - [ArXiv] [QA].
- A Benchmark for Chinese-English Scene Text Image Super-resolution - [ArXiv] [QA].
- Source-free Domain Adaptive Human Pose Estimation - [ArXiv] [QA].
- Prototypes-oriented Transductive Few-shot Learning with Conditional Transport - [ArXiv] [QA].
- Learning Fine-Grained Features for Pixel-wise Video Correspondences - [ArXiv] [QA].
- Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection - [ArXiv] [QA].
- An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability - [ArXiv] [QA].
- Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation - [ArXiv] [QA].
- Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis - [ArXiv] [QA].
- EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education - [ArXiv] [QA].
- DeDrift: Robust Similarity Search under Content Drift - [ArXiv] [QA].
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities - [ArXiv] [QA].
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization - [ArXiv] [QA].
- The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World - [ArXiv] [QA].
- DETR Doesn't Need Multi-Scale or Locality Design - [ArXiv] [QA].
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models - [ArXiv] [QA].
- RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension - [ArXiv] [QA].
- Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport - [ArXiv] [QA].
- Ambient Adventures: Teaching ChatGPT on Developing Complex Stories - [ArXiv] [QA].
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment - [ArXiv] [QA].
- InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent - [ArXiv] [QA].
- Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation - [ArXiv] [QA].
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies - [ArXiv] [QA].
- Multimodal Neurons in Pretrained Text-Only Transformers - [ArXiv] [QA].
- TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations - [ArXiv] [QA].
- Target-point Attention Transformer: A novel trajectory predict network for end-to-end autonomous driving - [ArXiv] [QA].
- Efficient neural supersampling on a novel gaming dataset - [ArXiv] [QA].
- HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions - [ArXiv] [QA].
- On $κ$-solutions and canonical neighborhoods in 4d Ricci flow - [ArXiv] [QA].
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models - [ArXiv] [QA].
- DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales - [ArXiv] [QA].
- Computational Long Exposure Mobile Photography - [ArXiv] [QA].
- More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes - [ArXiv] [QA].
- Revisiting DETR Pre-training for Object Detection - [ArXiv] [QA].
- A Hyper-pixel-wise Contrastive Learning Augmented Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data - [ArXiv] [QA].
- LSF-IDM: Automotive Intrusion Detection Model with Lightweight Attribution and Semantic Fusion - [ArXiv] [QA].
- Geometric wakes in collimators and step transitions of arbitrary cross-sections: conformal mapping approach - [ArXiv] [QA].
- One Tree to Rule Them All: Poly-Logarithmic Universal Steiner Tree - [ArXiv] [QA].
- Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation - [ArXiv] [QA].
- Three-level Dicke quantum battery - [ArXiv] [QA].
- Multiobjective Optimization of Non-Smooth PDE-Constrained Problems - [ArXiv] [QA].
- Black hole thermodynamics in Horndeski theories - [ArXiv] [QA].
- MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening - [ArXiv] [QA].
- Stability Analysis for a Class of Heterogeneous Catalysis Models - [ArXiv] [QA].
- An improved infrastructure for the IceCube realtime system - [ArXiv] [QA].
- Model-agnostic search for the quasinormal modes of gravitational wave echoes - [ArXiv] [QA].
- Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach - [ArXiv] [QA].
- From Sparse to Soft Mixtures of Experts - [ArXiv] [QA].
- Cosmological Distance Measurement of 12 Nearby Supernovae IIP with ROTSE-IIIB - [ArXiv] [QA].
- ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation - [ArXiv] [QA].
- VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference - [ArXiv] [QA].
- Weak localization in radiative transfer of acoustic waves in a randomly-fluctuating slab - [ArXiv] [QA].
- Optimal design of plane elastic membranes using the convexified Föppl's model - [ArXiv] [QA].
- Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction - [ArXiv] [QA].
- LISA: Reasoning Segmentation via Large Language Model - [ArXiv] [QA].
- Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models - [ArXiv] [QA].
- Note: Stokes-Einstein relation without hydrodynamic diameter in the TIP4P/Ice water model - [ArXiv] [QA].
- ELFNet: Evidential Local-global Fusion for Stereo Matching - [ArXiv] [QA].
- Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model - [ArXiv] [QA].
- Understanding URDF: A Dataset and Analysis - [ArXiv] [QA].
- Stochastic Geometry Based Modeling and Analysis on Network NOMA in Downlink CoMP Systems - [ArXiv] [QA].
- A many-sorted epistemic logic for chromatic hypergraphs - [ArXiv] [QA].
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning - [ArXiv] [QA].
- DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving - [ArXiv] [QA].
- Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning - [ArXiv] [QA].
- Deep Image Harmonization with Learnable Augmentation - [ArXiv] [QA].
- Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation - [ArXiv] [QA].
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework - [ArXiv] [QA].
- Artifact: Measuring and Mitigating Gaps in Structural Testing - [ArXiv] [QA].
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models - [ArXiv] [QA].
- Online Prototype Learning for Online Continual Learning - [ArXiv] [QA].
- CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering - [ArXiv] [QA].
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability - [ArXiv] [QA].
- GOALS-JWST: Gas Dynamics and Excitation in NGC7469 revealed by NIRSpec - [ArXiv] [QA].
July 2023
- Predicting masked tokens in stochastic locations improves masked image modeling - [ArXiv] [QA].
- Learning to Model the World with Language - [ArXiv] [QA].
- Discovering Adaptable Symbolic Algorithms from Scratch - [ArXiv] [QA].
- Virtual Prompt Injection for Instruction-Tuned Large Language Models - [ArXiv] [QA].
- Shortcut Partitions in Minor-Free Graphs: Steiner Point Removal, Distance Oracles, Tree Covers, and More - [ArXiv] [QA].
- Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy - [ArXiv] [QA].
- Random Sub-Samples Generation for Self-Supervised Real Image Denoising - [ArXiv] [QA].
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs - [ArXiv] [QA].
- UniVTG: Towards Unified Video-Language Temporal Grounding - [ArXiv] [QA].
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation - [ArXiv] [QA].
- Guiding Image Captioning Models Toward More Specific Captions - [ArXiv] [QA].
- CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification - [ArXiv] [QA].
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning - [ArXiv] [QA].
- Towards General Low-Light Raw Noise Synthesis and Modeling - [ArXiv] [QA].
- MovieChat: From Dense Token to Sparse Memory for Long Video Understanding - [ArXiv] [QA].
- DRAW: Defending Camera-shooted RAW against Image Manipulation - [ArXiv] [QA].
- DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization - [ArXiv] [QA].
- Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks - [ArXiv] [QA].
- JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery - [ArXiv] [QA].
- LP-MusicCaps: LLM-Based Pseudo Music Captioning - [ArXiv] [QA].
- AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? - [ArXiv] [QA].
- Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples - [ArXiv] [QA].
- Evaluating ChatGPT and GPT-4 for Visual Programming - [ArXiv] [QA].
- Unified Model for Image, Video, Audio and Language Tasks - [ArXiv] [QA].
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models - [ArXiv] [QA].
- SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension - [ArXiv] [QA].
- XMem++: Production-level Video Segmentation From Few Annotated Frames - [ArXiv] [QA].
- CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation - [ArXiv] [QA].
- What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Network - [ArXiv] [QA].
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - [ArXiv] [QA].
- The Hydra Effect: Emergent Self-repair in Language Model Computations - [ArXiv] [QA].
- MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking - [ArXiv] [QA].
- Scaling Data Generation in Vision-and-Language Navigation - [ArXiv] [QA].
- Robust Distortion-free Watermarks for Language Models - [ArXiv] [QA].
- Exploring Format Consistency for Instruction Tuning - [ArXiv] [QA].
- Uncertainty-aware Unsupervised Multi-Object Tracking - [ArXiv] [QA].
- Supervised Homography Learning with Realistic Dataset Generation - [ArXiv] [QA].
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - [ArXiv] [QA].
- Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF - [ArXiv] [QA].
- TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts - [ArXiv] [QA].
- Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification - [ArXiv] [QA].
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback - [ArXiv] [QA].
- PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization - [ArXiv] [QA].
- Med-Flamingo: a Multimodal Medical Few-shot Learner - [ArXiv] [QA].
- Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields - [ArXiv] [QA].
- To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation - [ArXiv] [QA].
- Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation - [ArXiv] [QA].
- Learning Depth Estimation for Transparent and Mirror Surfaces - [ArXiv] [QA].
- Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models - [ArXiv] [QA].
- TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis - [ArXiv] [QA].
- Diverse Inpainting and Editing with GAN Inversion - [ArXiv] [QA].
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges - [ArXiv] [QA].
- Scaling TransNormer to 175 Billion Parameters - [ArXiv] [QA].
- S$^3$: Social-network Simulation System with Large Language Model-Empowered Agents - [ArXiv] [QA].
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models - [ArXiv] [QA].
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback - [ArXiv] [QA].
- Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning - [ArXiv] [QA].
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining - [ArXiv] [QA].
- Test Time Adaptation for Blind Image Quality Assessment - [ArXiv] [QA].
- P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds - [ArXiv] [QA].
- Pre-training Vision Transformers with Very Limited Synthesized Images - [ArXiv] [QA].
- Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation - [ArXiv] [QA].
- 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking - [ArXiv] [QA].
- NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection - [ArXiv] [QA].
- TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation - [ArXiv] [QA].
- Clustering based Point Cloud Representation Learning for 3D Analysis - [ArXiv] [QA].
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition - [ArXiv] [QA].
- MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation - [ArXiv] [QA].
- Three Bricks to Consolidate Watermarks for Large Language Models - [ArXiv] [QA].
- MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation - [ArXiv] [QA].
- WavJourney: Compositional Audio Creation with Large Language Models - [ArXiv] [QA].
- Towards Generalist Biomedical AI - [ArXiv] [QA].
- G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory - [ArXiv] [QA].
- Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences - [ArXiv] [QA].
- ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation - [ArXiv] [QA].
- Creative Birds: Self-Supervised Single-View 3D Style Transfer - [ArXiv] [QA].
- Leveraging Implicit Feedback from Deployment Data in Dialogue - [ArXiv] [QA].
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching - [ArXiv] [QA].
- Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models - [ArXiv] [QA].
- 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability - [ArXiv] [QA].
- Controllable Guide-Space for Generalizable Face Forgery Detection - [ArXiv] [QA].
- Adaptive Frequency Filters As Efficient Global Token Mixers - [ArXiv] [QA].
- Tracking Anything in High Quality - [ArXiv] [QA].
- AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception - [ArXiv] [QA].
- Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception - [ArXiv] [QA].
- trajdata: A Unified Interface to Multiple Human Trajectory Datasets - [ArXiv] [QA].
- Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation - [ArXiv] [QA].
- WebArena: A Realistic Web Environment for Building Autonomous Agents - [ArXiv] [QA].
- How to Scale Your EMA - [ArXiv] [QA].
- PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View - [ArXiv] [QA].
- Composite Diffusion | whole >= Σparts - [ArXiv] [QA].
- ARB: Advanced Reasoning Benchmark for Large Language Models - [ArXiv] [QA].
- RecursiveDet: End-to-End Region-based Recursive Object Detection - [ArXiv] [QA].
- Spectrum-guided Multi-granularity Referring Video Object Segmentation - [ArXiv] [QA].
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection - [ArXiv] [QA].
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios - [ArXiv] [QA].
- Weakly-supervised 3D Pose Transfer with Keypoints - [ArXiv] [QA].
- Predicting Code Coverage without Execution - [ArXiv] [QA].
- Unmasking Anomalies in Road-Scene Segmentation - [ArXiv] [QA].
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition - [ArXiv] [QA].
- Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network - [ArXiv] [QA].
- GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers - [ArXiv] [QA].
- Strivec: Sparse Tri-Vector Radiance Fields - [ArXiv] [QA].
- GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping - [ArXiv] [QA].
- Contrastive Example-Based Control - [ArXiv] [QA].
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models - [ArXiv] [QA].
- 3D-LLM: Injecting the 3D World into Large Language Models - [ArXiv] [QA].
- Evaluating the Ripple Effects of Knowledge Editing in Language Models - [ArXiv] [QA].
- RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment - [ArXiv] [QA].
- GridMM: Grid Memory Map for Vision-and-Language Navigation - [ArXiv] [QA].
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis - [ArXiv] [QA].
- Multiscale Video Pretraining for Long-Term Activity Forecasting - [ArXiv] [QA].
- Fast Full-frame Video Stabilization with Iterative Optimization - [ArXiv] [QA].
- COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts - [ArXiv] [QA].
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction - [ArXiv] [QA].
- MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features - [ArXiv] [QA].
- PG-RCNN: Semantic Surface Point Generation for 3D Object Detection - [ArXiv] [QA].
- CTVIS: Consistent Training for Online Video Instance Segmentation - [ArXiv] [QA].
- Less is More: Focus Attention for Efficient DETR - [ArXiv] [QA].
- PRIOR: Prototype Representation Joint Learning from Medical Images and Reports - [ArXiv] [QA].
- A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation - [ArXiv] [QA].
- Interpolating between Images with Diffusion Models - [ArXiv] [QA].
- PUMA: Secure Inference of LLaMA-7B in Five Minutes - [ArXiv] [QA].
- TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition - [ArXiv] [QA].
- Rethinking Data Distillation: Do Not Overlook Calibration - [ArXiv] [QA].
- ProtoFL: Unsupervised Federated Learning via Prototypical Distillation - [ArXiv] [QA].
- Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection - [ArXiv] [QA].
- TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering - [ArXiv] [QA].
- Downstream-agnostic Adversarial Examples - [ArXiv] [QA].
- LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference - [ArXiv] [QA].
- LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction - [ArXiv] [QA].
- Optimized Network Architectures for Large Language Model Training with Billions of Parameters - [ArXiv] [QA].
- Hallucination Improves the Performance of Unsupervised Visual Representation Learning - [ArXiv] [QA].
- Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes - [ArXiv] [QA].
- Discovering Spatio-Temporal Rationales for Video Question Answering - [ArXiv] [QA].
- On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement - [ArXiv] [QA].
- Learning Vision-and-Language Navigation from YouTube Videos - [ArXiv] [QA].
- Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? - [ArXiv] [QA].
- CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots - [ArXiv] [QA].
- HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness - [ArXiv] [QA].
- Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts - [ArXiv] [QA].
- OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? - [ArXiv] [QA].
- Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation - [ArXiv] [QA].
- CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields - [ArXiv] [QA].
- Prompting Large Language Models with Speech Recognition Abilities - [ArXiv] [QA].
- FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields - [ArXiv] [QA].
- Deep Directly-Trained Spiking Neural Networks for Object Detection - [ArXiv] [QA].
- Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning - [ArXiv] [QA].
- CLR: Channel-wise Lightweight Reprogramming for Continual Learning - [ArXiv] [QA].
- Tuning Pre-trained Model via Moment Probing - [ArXiv] [QA].
- Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields - [ArXiv] [QA].
- DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport - [ArXiv] [QA].
- MAS: Towards Resource-Efficient Federated Multiple-Task Learning - [ArXiv] [QA].
- Brain2Music: Reconstructing Music from Human Brain Activity - [ArXiv] [QA].
- AlignDet: Aligning Pre-training and Fine-tuning in Object Detection - [ArXiv] [QA].
- Cascade-DETR: Delving into High-Quality Universal Object Detection - [ArXiv] [QA].
- General Image-to-Image Translation with One-Shot Image Guidance - [ArXiv] [QA].
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image - [ArXiv] [QA].
- Improving Online Lane Graph Extraction by Object-Lane Clustering - [ArXiv] [QA].
- Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery - [ArXiv] [QA].
- PASTA: Pretrained Action-State Transformer Agents - [ArXiv] [QA].
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets - [ArXiv] [QA].
- Diffusion Sampling with Momentum for Mitigating Divergence Artifacts - [ArXiv] [QA].
- The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning - [ArXiv] [QA].
- BlendFace: Re-designing Identity Encoders for Face-Swapping - [ArXiv] [QA].
- BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion - [ArXiv] [QA].
- Meta-Transformer: A Unified Framework for Multimodal Learning - [ArXiv] [QA].
- HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces - [ArXiv] [QA].
- See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data - [ArXiv] [QA].
- Urban Radiance Field Representation with Deformable Neural Mesh Primitives - [ArXiv] [QA].
- Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV - [ArXiv] [QA].
- Lighting up NeRF via Unsupervised Decomposition and Enhancement - [ArXiv] [QA].
- SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models - [ArXiv] [QA].
- Physics-Driven Turbulence Image Restoration with Stochastic Refinement - [ArXiv] [QA].
- Flatness-Aware Minimization for Domain Generalization - [ArXiv] [QA].
- Instruction-following Evaluation through Verbalizer Manipulation - [ArXiv] [QA].
- EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization - [ArXiv] [QA].
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing - [ArXiv] [QA].
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering - [ArXiv] [QA].
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI - [ArXiv] [QA].
- Challenges and Applications of Large Language Models - [ArXiv] [QA].
- LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs - [ArXiv] [QA].
- Improving Multimodal Datasets with Image Captioning - [ArXiv] [QA].
- FABRIC: Personalizing Diffusion Models with Iterative Feedback - [ArXiv] [QA].
- Android in the Wild: A Large-Scale Dataset for Android Device Control - [ArXiv] [QA].
- Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples - [ArXiv] [QA].
- MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions - [ArXiv] [QA].
- Hierarchical Spatio-Temporal Representation Learning for Gait Recognition - [ArXiv] [QA].
- What do neural networks learn in image classification? A frequency shortcut perspective - [ArXiv] [QA].
- Density-invariant Features for Distant Point Cloud Registration - [ArXiv] [QA].
- Text2Layer: Layered Image Generation using Latent Diffusion Model - [ArXiv] [QA].
- Towards Building More Robust Models with Frequency Bias - [ArXiv] [QA].
- Generative Prompt Model for Weakly Supervised Object Localization - [ArXiv] [QA].
- Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation - [ArXiv] [QA].
- CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation - [ArXiv] [QA].
- AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks - [ArXiv] [QA].
- Towards Saner Deep Image Registration - [ArXiv] [QA].
- GlobalMapper: Arbitrary-Shaped Urban Layout Generation - [ArXiv] [QA].
- Towards A Unified Agent with Foundation Models - [ArXiv] [QA].
- Object-aware Gaze Target Detection - [ArXiv] [QA].
- Promoting Exploration in Memory-Augmented Adam using Critical Momenta - [ArXiv] [QA].
- Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration - [ArXiv] [QA].
- ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning - [ArXiv] [QA].
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla - [ArXiv] [QA].
- OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation - [ArXiv] [QA].
- Biomaker CA: a Biome Maker project using Cellular Automata - [ArXiv] [QA].
- Llama 2: Open Foundation and Fine-Tuned Chat Models - [ArXiv] [QA].
- Augmenting CLIP with Improved Visio-Linguistic Reasoning - [ArXiv] [QA].
- NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF - [ArXiv] [QA].
- How is ChatGPT's behavior changing over time? - [ArXiv] [QA].
- GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution - [ArXiv] [QA].
- Diffusion Models Beat GANs on Image Classification - [ArXiv] [QA].
- AlpaGasus: Training A Better Alpaca with Fewer Data - [ArXiv] [QA].
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT - [ArXiv] [QA].
- Retentive Network: A Successor to Transformer for Large Language Models - [ArXiv] [QA].
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs - [ArXiv] [QA].
- Scale-Aware Modulation Meet Transformer - [ArXiv] [QA].
- Does Visual Pretraining Help End-to-End Reasoning? - [ArXiv] [QA].
- Cumulative Spatial Knowledge Distillation for Vision Transformers - [ArXiv] [QA].
- DOT: A Distillation-Oriented Trainer - [ArXiv] [QA].
- Measuring Faithfulness in Chain-of-Thought Reasoning - [ArXiv] [QA].
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning - [ArXiv] [QA].
- Planting a SEED of Vision in Large Language Model - [ArXiv] [QA].
- Towards Viewpoint-Invariant Visual Recognition via Adversarial Training - [ArXiv] [QA].
- Language Conditioned Traffic Generation - [ArXiv] [QA].
- Communicative Agents for Software Development - [ArXiv] [QA].
- INVE: Interactive Neural Video Editing - [ArXiv] [QA].
- CoTracker: It is Better to Track Together - [ArXiv] [QA].
- NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis - [ArXiv] [QA].
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models - [ArXiv] [QA].
- Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts - [ArXiv] [QA].
- Learning to Retrieve In-Context Examples for Large Language Models - [ArXiv] [QA].
- Bootstrapping Vision-Language Learning with Decoupled Language Pre-training - [ArXiv] [QA].
- DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations - [ArXiv] [QA].
- HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models - [ArXiv] [QA].
- In-context Autoencoder for Context Compression in a Large Language Model - [ArXiv] [QA].
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation - [ArXiv] [QA].
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation - [ArXiv] [QA].
- mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs - [ArXiv] [QA].
- Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models - [ArXiv] [QA].
- Generating Benchmarks for Factuality Evaluation of Language Models - [ArXiv] [QA].
- Copy Is All You Need - [ArXiv] [QA].
- Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events - [ArXiv] [QA].
- T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation - [ArXiv] [QA].
- Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution - [ArXiv] [QA].
- Instruction Mining: High-Quality Instruction Data Selection for Large Language Models - [ArXiv] [QA].
- MMBench: Is Your Multi-modal Model an All-around Player? - [ArXiv] [QA].
- SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning - [ArXiv] [QA].
- VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View - [ArXiv] [QA].
- PolyLM: An Open Source Polyglot Large Language Model - [ArXiv] [QA].
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models - [ArXiv] [QA].
- Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations - [ArXiv] [QA].
- Towards Robust and Efficient Continual Language Learning - [ArXiv] [QA].
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates - [ArXiv] [QA].
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives - [ArXiv] [QA].
- Self-consistency for open-ended generations - [ArXiv] [QA].
- EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone - [ArXiv] [QA].
- Efficient 3D Articulated Human Generation with Layered Surface Volumes - [ArXiv] [QA].
- Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features - [ArXiv] [QA].
- Self-Supervised Learning with Lie Symmetries for Partial Differential Equations - [ArXiv] [QA].
- Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration - [ArXiv] [QA].
- Generative Pretraining in Multimodality - [ArXiv] [QA].
- DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks - [ArXiv] [QA].
- Test-Time Training on Video Streams - [ArXiv] [QA].
- Monotone deep Boltzmann machines - [ArXiv] [QA].
- Secrets of RLHF in Large Language Models Part I: PPO - [ArXiv] [QA].
- Semantic-SAM: Segment and Recognize Anything at Any Granularity - [ArXiv] [QA].
- SITTA: A Semantic Image-Text Alignment for Image Captioning - [ArXiv] [QA].
- Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement - [ArXiv] [QA].
- RoCo: Dialectic Multi-Robot Collaboration with Large Language Models - [ArXiv] [QA].
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning - [ArXiv] [QA].
- Large Language Models as General Pattern Machines - [ArXiv] [QA].
- International Institutions for Advanced AI - [ArXiv] [QA].
- VampNet: Music Generation via Masked Acoustic Token Modeling - [ArXiv] [QA].
- AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System - [ArXiv] [QA].
- RLTF: Reinforcement Learning from Unit Test Feedback - [ArXiv] [QA].
- SVIT: Scaling up Visual Instruction Tuning - [ArXiv] [QA].
- Toward Interactive Dictation - [ArXiv] [QA].
- On decoder-only architecture for speech-to-text and large language model integration - [ArXiv] [QA].
- Large Language Models for Supply Chain Optimization - [ArXiv] [QA].
- Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation - [ArXiv] [QA].
- AutoDecoding Latent 3D Diffusion Models - [ArXiv] [QA].
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest - [ArXiv] [QA].
- Solvent: A Framework for Protein Folding - [ArXiv] [QA].
- Wireless Multi-Agent Generative AI: From Connected Intelligence to Collective Intelligence - [ArXiv] [QA].
- Building Cooperative Embodied Agents Modularly with Large Language Models - [ArXiv] [QA].
- What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? - [ArXiv] [QA].
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners - [ArXiv] [QA].
- Embodied Task Planning with Large Language Models - [ArXiv] [QA].
- Collaborative Score Distillation for Consistent Visual Synthesis - [ArXiv] [QA].
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding - [ArXiv] [QA].
- On Hofstadter's G-sequence - [ArXiv] [QA].
- Hybrid two-level MCMC for Bayesian Inverse Problems - [ArXiv] [QA].
- Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection - [ArXiv] [QA].
- Multi-Task Learning Improves Performance In Deep Argument Mining Models - [ArXiv] [QA].
- EIGER IV: The cool 10$^4$K circumgalactic environment of high-$z$ galaxies reveals remarkably efficient IGM enrichment - [ArXiv] [QA].
- Variational integrals on Hessian spaces: partial regularity for critical points - [ArXiv] [QA].
- Characterisation of three-body loss in ${}^{166}$Er and optimised production of large Bose-Einstein condensates - [ArXiv] [QA].
- SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions - [ArXiv] [QA].
- Scalable quantum neural networks by few quantum resources - [ArXiv] [QA].
- Visual Instruction Tuning with Polite Flamingo - [ArXiv] [QA].
- NOMA-Assisted Grant-Free Transmission: How to Design Pre-Configured SNR Levels? - [ArXiv] [QA].
- Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset - [ArXiv] [QA].
- JourneyDB: A Benchmark for Generative Image Understanding - [ArXiv] [QA].
- Almost sure bounds for a weighted Steinhaus random multiplicative function - [ArXiv] [QA].
- DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment - [ArXiv] [QA].
- Personality Traits in Large Language Models - [ArXiv] [QA].
June 2023
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs - [ArXiv] [QA].
- Statler: State-Maintaining Language Models for Embodied Reasoning - [ArXiv] [QA].
- Preference Ranking Optimization for Human Alignment - [ArXiv] [QA].
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding - [ArXiv] [QA].
- End-to-end Autonomous Driving: Challenges and Frontiers - [ArXiv] [QA].
- KITE: Keypoint-Conditioned Policies for Semantic Manipulation - [ArXiv] [QA].
- Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language - [ArXiv] [QA].
- Inferring the Goals of Communicating Agents from Actions and Instructions - [ArXiv] [QA].
- Confidence Ranking for CTR Prediction - [ArXiv] [QA].
- Explainable Multimodal Emotion Reasoning - [ArXiv] [QA].
- MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation - [ArXiv] [QA].
- Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic - [ArXiv] [QA].
- Kosmos-2: Grounding Multimodal Large Language Models to the World - [ArXiv] [QA].
- MotionGPT: Human Motion as a Foreign Language - [ArXiv] [QA].
- SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality - [ArXiv] [QA].
- Aligning Large Multi-Modal Model with Robust Instruction Tuning - [ArXiv] [QA].
- DesCo: Learning Object Recognition with Rich Language Descriptions - [ArXiv] [QA].
- A Survey on Multimodal Large Language Models - [ArXiv] [QA].
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models - [ArXiv] [QA].
- Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces - [ArXiv] [QA].
- SoftGPT: Learn Goal-oriented Soft Object Manipulation Skills by Generative Pre-trained Heterogeneous Graph Transformer - [ArXiv] [QA].
- Local 3D Editing via 3D Distillation of CLIP Knowledge - [ArXiv] [QA].
- FFCV: Accelerating Training by Removing Data Bottlenecks - [ArXiv] [QA].
- Mass-Producing Failures of Multimodal Systems with Language Models - [ArXiv] [QA].
- SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling - [ArXiv] [QA].
- Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion - [ArXiv] [QA].
- RM-PRT: Realistic Robotic Manipulation Simulator and Benchmark with Progressive Reasoning Tasks - [ArXiv] [QA].
- MotionGPT: Finetuned LLMs are General-Purpose Motion Generators - [ArXiv] [QA].
- UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning - [ArXiv] [QA].
- CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents - [ArXiv] [QA].
- Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering - [ArXiv] [QA].
- LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning - [ArXiv] [QA].
- Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models - [ArXiv] [QA].
- LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models - [ArXiv] [QA].
- Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration - [ArXiv] [QA].
- Re-Benchmarking Pool-Based Active Learning for Binary Classification - [ArXiv] [QA].
- Toward Grounded Social Reasoning - [ArXiv] [QA].
- Language to Rewards for Robotic Skill Synthesis - [ArXiv] [QA].
- Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models - [ArXiv] [QA].
- AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn - [ArXiv] [QA].
- AVIS: Autonomous Visual Information Seeking with Large Language Models - [ArXiv] [QA].
- Neural Scene Chronology - [ArXiv] [QA].
- Instant Multi-View Head Capture through Learnable Registration - [ArXiv] [QA].
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark - [ArXiv] [QA].
- RestGPT: Connecting Large Language Models with Real-World RESTful APIs - [ArXiv] [QA].
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models - [ArXiv] [QA].
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning - [ArXiv] [QA].
- M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models - [ArXiv] [QA].
- ScaleDet: A Scalable Multi-Dataset Object Detector - [ArXiv] [QA].
- M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning - [ArXiv] [QA].
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks - [ArXiv] [QA].
- ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory - [ArXiv] [QA].
- Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach - [ArXiv] [QA].
- On Pitfalls of Test-Time Adaptation - [ArXiv] [QA].
- GaitGCI: Generative Counterfactual Intervention for Gait Recognition - [ArXiv] [QA].
- DVIS: Decoupled Video Instance Segmentation Framework - [ArXiv] [QA].
- Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents - [ArXiv] [QA].
- Neuralangelo: High-Fidelity Neural Surface Reconstruction - [ArXiv] [QA].
- BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields - [ArXiv] [QA].
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding - [ArXiv] [QA].
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4 - [ArXiv] [QA].
- RecAgent: A Novel Simulation Paradigm for Recommender Systems - [ArXiv] [QA].
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection - [ArXiv] [QA].
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day - [ArXiv] [QA].
- Microstructure quality control of steels using deep learning - [ArXiv] [QA].
- GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? - [ArXiv] [QA].
- Thought Cloning: Learning to Think while Acting by Imitating Human Thinking - [ArXiv] [QA].
May 2023
- Monotonic Location Attention for Length Generalization - [ArXiv] [QA].
- Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models - [ArXiv] [QA].
- Neural Kernel Surface Reconstruction - [ArXiv] [QA].
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate - [ArXiv] [QA].
- Independent Component Alignment for Multi-Task Learning - [ArXiv] [QA].
- VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions - [ArXiv] [QA].
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction - [ArXiv] [QA].
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model - [ArXiv] [QA].
- Contextual Object Detection with Multimodal Large Language Models - [ArXiv] [QA].
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models - [ArXiv] [QA].
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks - [ArXiv] [QA].
- MPCHAT: Towards Multimodal Persona-Grounded Conversation - [ArXiv] [QA].
- Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance - [ArXiv] [QA].
- Generating Images with Multimodal Language Models - [ArXiv] [QA].
- Large Language Models as Tool Makers - [ArXiv] [QA].
- Mindstorms in Natural Language-Based Societies of Mind - [ArXiv] [QA].
- Training Socially Aligned Language Models in Simulated Human Society - [ArXiv] [QA].
- On Evaluating Adversarial Robustness of Large Vision-Language Models - [ArXiv] [QA].
- MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting - [ArXiv] [QA].
- Playing repeated games with Large Language Models - [ArXiv] [QA].
- Randomized Positional Encodings Boost Length Generalization of Transformers - [ArXiv] [QA].
- Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark - [ArXiv] [QA].
- AdaPlanner: Adaptive Planning from Feedback with Language Models - [ArXiv] [QA].
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models - [ArXiv] [QA].
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory - [ArXiv] [QA].
- Landmark Attention: Random-Access Infinite Context Length for Transformers - [ArXiv] [QA].
- Voyager: An Open-Ended Embodied Agent with Large Language Models - [ArXiv] [QA].
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - [ArXiv] [QA].
- Role-Play with Large Language Models - [ArXiv] [QA].
- PandaGPT: One Model To Instruction-Follow Them All - [ArXiv] [QA].
- LayoutGPT: Compositional Visual Planning and Generation with Large Language Models - [ArXiv] [QA].
- Gorilla: Large Language Model Connected with Massive APIs - [ArXiv] [QA].
- Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration - [ArXiv] [QA].
- Dynamic Masking Rate Schedules for MLM Pretraining - [ArXiv] [QA].
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models - [ArXiv] [QA].
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought - [ArXiv] [QA].
- Reasoning with Language Model is Planning with World Model - [ArXiv] [QA].
- IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models - [ArXiv] [QA].
- Discriminator-Guided Multi-step Reasoning with Language Models - [ArXiv] [QA].
- PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts - [ArXiv] [QA].
- Adapting Language Models to Compress Contexts - [ArXiv] [QA].
- ExpertPrompting: Instructing Large Language Models to be Distinguished Experts - [ArXiv] [QA].
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement - [ArXiv] [QA].
- Automatic Model Selection with Large Language Models for Reasoning - [ArXiv] [QA].
- Improving Factuality and Reasoning in Language Models through Multiagent Debate - [ArXiv] [QA].
- ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models - [ArXiv] [QA].
- RET-LLM: Towards a General Read-Write Memory for Large Language Models - [ArXiv] [QA].
- CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation - [ArXiv] [QA].
- REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos - [ArXiv] [QA].
- Enhancing Chat Language Models by Scaling High-quality Instructional Conversations - [ArXiv] [QA].
- DetGPT: Detect What You Need via Reasoning - [ArXiv] [QA].
- Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction - [ArXiv] [QA].
- PaD: Program-aided Distillation Specializes Large Models in Reasoning - [ArXiv] [QA].
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration - [ArXiv] [QA].
- RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text - [ArXiv] [QA].
- Training Diffusion Models with Reinforcement Learning - [ArXiv] [QA].
- Interactive Natural Language Processing - [ArXiv] [QA].
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities - [ArXiv] [QA].
- Making Language Models Better Tool Learners with Execution Feedback - [ArXiv] [QA].
- RWKV: Reinventing RNNs for the Transformer Era - [ArXiv] [QA].
- Pengi: An Audio Language Model for Audio Tasks - [ArXiv] [QA].
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing - [ArXiv] [QA].
- Learning Global-aware Kernel for Image Harmonization - [ArXiv] [QA].
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - [ArXiv] [QA].
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought - [ArXiv] [QA].
- Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona - [ArXiv] [QA].
- Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue - [ArXiv] [QA].
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model - [ArXiv] [QA].
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks - [ArXiv] [QA].
- SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation - [ArXiv] [QA].
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation - [ArXiv] [QA].
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs - [ArXiv] [QA].
- An Android Robot Head as Embodied Conversational Agent - [ArXiv] [QA].
- 3D Registration with Maximal Cliques - [ArXiv] [QA].
- Listen, Think, and Understand - [ArXiv] [QA].
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding - [ArXiv] [QA].
- Boost Vision Transformer with GPU-Friendly Sparsity and Quantization - [ArXiv] [QA].
- Language Models Meet World Models: Embodied Experiences Enhance Language Models - [ArXiv] [QA].
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models - [ArXiv] [QA].
- IMAD: IMage-Augmented multi-modal Dialogue - [ArXiv] [QA].
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering - [ArXiv] [QA].
- Evaluating Object Hallucination in Large Vision-Language Models - [ArXiv] [QA].
- MemoryBank: Enhancing Large Language Models with Long-Term Memory - [ArXiv] [QA].
- Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations - [ArXiv] [QA].
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback - [ArXiv] [QA].
- Dual Semantic Knowledge Composed Multimodal Dialog Systems - [ArXiv] [QA].
- Towards Generalist Robots: A Promising Paradigm via Generative Simulation - [ArXiv] [QA].
- Small Models are Valuable Plug-ins for Large Language Models - [ArXiv] [QA].
- Attacking Perceptual Similarity Metrics - [ArXiv] [QA].
- A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment - [ArXiv] [QA].
- ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems - [ArXiv] [QA].
- In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making - [ArXiv] [QA].
- ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4 - [ArXiv] [QA].
- EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention - [ArXiv] [QA].
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning - [ArXiv] [QA].
- VideoChat: Chat-Centric Video Understanding - [ArXiv] [QA].
- SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds - [ArXiv] [QA].
- TidyBot: Personalized Robot Assistance with Large Language Models - [ArXiv] [QA].
- Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue - [ArXiv] [QA].
- Distilling Script Knowledge from Large Language Models for Constrained Language Planning - [ArXiv] [QA].
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance - [ArXiv] [QA].
- Knowledge-enhanced Agents for Interactive Text Games - [ArXiv] [QA].
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans - [ArXiv] [QA].
- Multi-Space Neural Radiance Fields - [ArXiv] [QA].
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages - [ArXiv] [QA].
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models - [ArXiv] [QA].
- Otter: A Multi-Modal Model with In-Context Instruction Tuning - [ArXiv] [QA].
- LMEye: An Interactive Perception Network for Large Language Models - [ArXiv] [QA].
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering - [ArXiv] [QA].
- TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition - [ArXiv] [QA].
- Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework - [ArXiv] [QA].
- ZipIt! Merging Models from Different Tasks without Training - [ArXiv] [QA].
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision - [ArXiv] [QA].
- A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects - [ArXiv] [QA].
- Caption Anything: Interactive Image Description with Diverse Multimodal Controls - [ArXiv] [QA].
- Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents - [ArXiv] [QA].
- Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings - [ArXiv] [QA].
- Multimodal Procedural Planning via Dual Text-Image Prompting - [ArXiv] [QA].
- Unlimiformer: Long-Range Transformers with Unlimited Length Input - [ArXiv] [QA].
- Transfer Visual Prompt Generator across LLMs - [ArXiv] [QA].
- The Role of Summarization in Generative Agents: A Preliminary Perspective - [ArXiv] [QA].
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability - [ArXiv] [QA].
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation - [ArXiv] [QA].
- Hypernuclear event detection in the nuclear emulsion with Monte Carlo simulation and machine learning - [ArXiv] [QA].
- Learning to Reason and Memorize with Self-Notes - [ArXiv] [QA].
April 2023
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model - [ArXiv] [QA].
- IMP: Iterative Matching and Pose Estimation with Adaptive Pooling - [ArXiv] [QA].
- ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System - [ArXiv] [QA].
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality - [ArXiv] [QA].
- ChatLog: Recording and Analyzing ChatGPT Across Time - [ArXiv] [QA].
- Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models - [ArXiv] [QA].
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond - [ArXiv] [QA].
- Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning - [ArXiv] [QA].
- Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System - [ArXiv] [QA].
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought - [ArXiv] [QA].
- Patch-based 3D Natural Scene Generation from a Single Example - [ArXiv] [QA].
- GlyphDiffusion: Text Generation as Image Generation - [ArXiv] [QA].
- WizardLM: Empowering Large Language Models to Follow Complex Instructions - [ArXiv] [QA].
- ChatLLM Network: More brains, More intelligence - [ArXiv] [QA].
- SketchXAI: A First Look at Explainability for Human Sketches - [ArXiv] [QA].
- Emergent and Predictable Memorization in Large Language Models - [ArXiv] [QA].
- ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT - [ArXiv] [QA].
- Can GPT-4 Perform Neural Architecture Search? - [ArXiv] [QA].
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - [ArXiv] [QA].
- Phoenix: Democratizing ChatGPT across Languages - [ArXiv] [QA].
- SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation - [ArXiv] [QA].
- SCoDA: Domain Adaptive Shape Completion for Real Scans - [ArXiv] [QA].
- Learning Bottleneck Concepts in Image Classification - [ArXiv] [QA].
- Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation - [ArXiv] [QA].
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models - [ArXiv] [QA].
- Network Pruning Spaces - [ArXiv] [QA].
- SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes - [ArXiv] [QA].
- Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections - [ArXiv] [QA].
- Visual Instruction Tuning - [ArXiv] [QA].
- Tool Learning with Foundation Models - [ArXiv] [QA].
- Chain of Thought Prompt Tuning in Vision Language Models - [ArXiv] [QA].
- Self-collaboration Code Generation via ChatGPT - [ArXiv] [QA].
- Tractable Control for Autoregressive Language Generation - [ArXiv] [QA].
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model - [ArXiv] [QA].
- Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text - [ArXiv] [QA].
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment - [ArXiv] [QA].
- Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning - [ArXiv] [QA].
- NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds - [ArXiv] [QA].
- Language Instructed Reinforcement Learning for Human-AI Coordination - [ArXiv] [QA].
- Hard Patches Mining for Masked Image Modeling - [ArXiv] [QA].
- Instance-Aware Domain Generalization for Face Anti-Spoofing - [ArXiv] [QA].
- ChemCrow: Augmenting large-language models with chemistry tools - [ArXiv] [QA].
- Toxicity in ChatGPT: Analyzing Persona-assigned Language Models - [ArXiv] [QA].
- Teaching Large Language Models to Self-Debug - [ArXiv] [QA].
- Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning - [ArXiv] [QA].
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise - [ArXiv] [QA].
- Improved Test-Time Adaptation for Domain Generalization - [ArXiv] [QA].
- Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT - [ArXiv] [QA].
- OpenAGI: When LLM Meets Domain Experts - [ArXiv] [QA].
- Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions - [ArXiv] [QA].
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training - [ArXiv] [QA].
- Hi Sheldon! Creating Deep Personalized Characters from TV Shows - [ArXiv] [QA].
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder - [ArXiv] [QA].
- ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application - [ArXiv] [QA].
- Why think step by step? Reasoning emerges from the locality of experience - [ArXiv] [QA].
- Generative Agents: Interactive Simulacra of Human Behavior - [ArXiv] [QA].
- ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks - [ArXiv] [QA].
- GINA-3D: Learning to Generate Implicit Neural Assets in the Wild - [ArXiv] [QA].
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling - [ArXiv] [QA].
- Asymptotic expansions for the maximum likelihood estimation errors of the rotating parameter of the gravitational wave from core-collapse supernovae - [ArXiv] [QA].
- Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data - [ArXiv] [QA].
- Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement - [ArXiv] [QA].
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model - [ArXiv] [QA].
- 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds - [ArXiv] [QA].
- Metrological detection of multipartite entanglement through dynamical symmetries - [ArXiv] [QA].
- When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus - [ArXiv] [QA].
March 2023
- Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation - [ArXiv] [QA].
- On stochastic MPC formulations with closed-loop guarantees: Analysis and a unifying framework - [ArXiv] [QA].
- A Survey of Large Language Models - [ArXiv] [QA].
- VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization - [ArXiv] [QA].
- Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning - [ArXiv] [QA].
- CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society - [ArXiv] [QA].
- Self-Refine: Iterative Refinement with Self-Feedback - [ArXiv] [QA].
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer - [ArXiv] [QA].
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face - [ArXiv] [QA].
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research - [ArXiv] [QA].
- Mixed Autoencoder for Self-supervised Visual Representation Learning - [ArXiv] [QA].
- ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance - [ArXiv] [QA].
- TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation - [ArXiv] [QA].
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment - [ArXiv] [QA].
- Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations - [ArXiv] [QA].
- Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks - [ArXiv] [QA].
- Multi-View Azimuth Stereo via Tangent Space Consistency - [ArXiv] [QA].
- TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - [ArXiv] [QA].
- ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models - [ArXiv] [QA].
- Are Data-driven Explanations Robust against Out-of-distribution Data? - [ArXiv] [QA].
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention - [ArXiv] [QA].
- F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories - [ArXiv] [QA].
- DisWOT: Student Architecture Search for Distillation WithOut Training - [ArXiv] [QA].
- Zero-shot Model Diagnosis - [ArXiv] [QA].
- Learning to Zoom and Unzoom - [ArXiv] [QA].
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization - [ArXiv] [QA].
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View - [ArXiv] [QA].
- Natural Language Reasoning, A Survey - [ArXiv] [QA].
- Learning Versatile 3D Shape Generation with Improved AR Models - [ArXiv] [QA].
- Learning video embedding space with Natural Language Supervision - [ArXiv] [QA].
- SUDS: Scalable Urban Dynamic Scenes - [ArXiv] [QA].
- Compacting Binary Neural Networks by Sparse Kernel Selection - [ArXiv] [QA].
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects - [ArXiv] [QA].
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference - [ArXiv] [QA].
- VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud - [ArXiv] [QA].
- IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients - [ArXiv] [QA].
- Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting - [ArXiv] [QA].
- Robust Test-Time Adaptation in Dynamic Scenarios - [ArXiv] [QA].
- Progressively Optimized Local Radiance Fields for Robust View Synthesis - [ArXiv] [QA].
- Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers - [ArXiv] [QA].
- Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment - [ArXiv] [QA].
- Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration - [ArXiv] [QA].
- Spherical Transformer for LiDAR-based 3D Recognition - [ArXiv] [QA].
- Correlational Image Modeling for Self-Supervised Visual Pre-Training - [ArXiv] [QA].
- Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation - [ArXiv] [QA].
- Logical Reasoning over Natural Language as Knowledge Representation: A Survey - [ArXiv] [QA].
- NeAT: Learning Neural Implicit Surfaces with Arbitrary Topologies from Multi-view Images - [ArXiv] [QA].
- Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection - [ArXiv] [QA].
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective - [ArXiv] [QA].
- Implicit Neural Representation for Cooperative Low-light Image Enhancement - [ArXiv] [QA].
- eP-ALM: Efficient Perceptual Augmentation of Language Models - [ArXiv] [QA].
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action - [ArXiv] [QA].
- Reflexion: Language Agents with Verbal Reinforcement Learning - [ArXiv] [QA].
- Learning Optical Flow from Event Camera with Rendered Dataset - [ArXiv] [QA].
- Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning - [ArXiv] [QA].
- DialogPaint: A Dialog-based Image Editing Model - [ArXiv] [QA].
- Adversarial Counterfactual Visual Explanations - [ArXiv] [QA].
- TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation - [ArXiv] [QA].
- CoLT5: Faster Long-Range Transformers with Conditional Computation - [ArXiv] [QA].
- CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos - [ArXiv] [QA].
- Human-AI Collaboration: The Effect of AI Delegation on Human Task Performance and Task Satisfaction - [ArXiv] [QA].
- ART: Automatic multi-step reasoning and tool-use for large language models - [ArXiv] [QA].
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge - [ArXiv] [QA].
- Can Large Language Models design a Robot? - [ArXiv] [QA].
- VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation - [ArXiv] [QA].
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting - [ArXiv] [QA].
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences - [ArXiv] [QA].
- Chat with the Environment: Interactive Multimodal Perception Using Large Language Models - [ArXiv] [QA].
- Rotation-Invariant Transformer for Point Cloud Matching - [ArXiv] [QA].
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis - [ArXiv] [QA].
- ViperGPT: Visual Inference via Python Execution for Reasoning - [ArXiv] [QA].
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images - [ArXiv] [QA].
- RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback - [ArXiv] [QA].
- The Life Cycle of Knowledge in Big Language Models: A Survey - [ArXiv] [QA].
- Audio Visual Language Maps for Robot Navigation - [ArXiv] [QA].
- Adaptive Data-Free Quantization - [ArXiv] [QA].
- Iterative Geometry Encoding Volume for Stereo Matching - [ArXiv] [QA].
- ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions - [ArXiv] [QA].
- ChatGPT Prompt Patterns for Improving Code Quality, Refactoring, Requirements Elicitation, and Software Design - [ArXiv] [QA].
- FAC: 3D Representation Learning via Foreground Aware Feature Contrast - [ArXiv] [QA].
- Task and Motion Planning with Large Language Models for Object Rearrangement - [ArXiv] [QA].
- MVImgNet: A Large-scale Dataset of Multi-view Images - [ArXiv] [QA].
- Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation - [ArXiv] [QA].
- Hardware Acceleration of Neural Graphics - [ArXiv] [QA].
- 3D Video Loops from Asynchronous Input - [ArXiv] [QA].
- Masked Image Modeling with Local Multi-Scale Reconstruction - [ArXiv] [QA].
- ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction - [ArXiv] [QA].
- X-Pruner: eXplainable Pruning for Vision Transformers - [ArXiv] [QA].
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - [ArXiv] [QA].
- DNBP: Differentiable Nonparametric Belief Propagation - [ArXiv] [QA].
- LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion - [ArXiv] [QA].
- Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation - [ArXiv] [QA].
- PaLM-E: An Embodied Multimodal Language Model - [ArXiv] [QA].
- Prismer: A Vision-Language Model with An Ensemble of Experts - [ArXiv] [QA].
- MathPrompter: Mathematical Reasoning using Large Language Models - [ArXiv] [QA].
- Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners - [ArXiv] [QA].
- EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization - [ArXiv] [QA].
- Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering - [ArXiv] [QA].
- Near Optimal Memory-Regret Tradeoff for Online Learning - [ArXiv] [QA].
- WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions - [ArXiv] [QA].
- First Order Quantum Phase Transition in the Hybrid Metal-Mott Insulator Transition Metal Dichalcogenide 4Hb-TaS2 - [ArXiv] [QA].
- Isotopic effects in molecular attosecond photoelectron interferometry - [ArXiv] [QA].
- Token Contrast for Weakly-Supervised Semantic Segmentation - [ArXiv] [QA].
- Eulerian-Lagrangian particle-based model for diffusional growth for the better parameterization of ISM clouds: A road map for improving climate model through small-scale model using observations - [ArXiv] [QA].
- Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation - [ArXiv] [QA].
- Open-World Object Manipulation using Pre-trained Vision-Language Models - [ArXiv] [QA].
- Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control - [ArXiv] [QA].
- A Practical Upper Bound for the Worst-Case Attribution Deviations - [ArXiv] [QA].
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework - [ArXiv] [QA].
February 2023
- A Comprehensive Perturbative Formalism for Phase Mixing in Perturbed Disks. II. Phase Spirals in an Inhomogeneous Disk Galaxy with a Non-responsive Dark Matter Halo - [ArXiv] [QA].
- Generic-to-Specific Distillation of Masked Autoencoders - [ArXiv] [QA].
- Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue - [ArXiv] [QA].
- GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation - [ArXiv] [QA].
- HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes with Iterative Intertwined Regularization - [ArXiv] [QA].
- Internet Explorer: Targeted Representation Learning on the Open Web - [ArXiv] [QA].
- Language Is Not All You Need: Aligning Perception with Language Models - [ArXiv] [QA].
- LLaMA: Open and Efficient Foundation Language Models - [ArXiv] [QA].
- Control flow in active inference systems - [ArXiv] [QA].
- Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data - [ArXiv] [QA].
- Active Prompting with Chain-of-Thought for Large Language Models - [ArXiv] [QA].
- Aligning Text-to-Image Models using Human Feedback - [ArXiv] [QA].
- Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? - [ArXiv] [QA].
- Distributionally Robust Recourse Action - [ArXiv] [QA].
- Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities - [ArXiv] [QA].
- ChatGPT for Robotics: Design Principles and Model Abilities - [ArXiv] [QA].
- Weakly Supervised Label Learning Flows - [ArXiv] [QA].
- Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey - [ArXiv] [QA].
- A survey on online active learning - [ArXiv] [QA].
- PersonNeRF: Personalized Reconstruction from Photo Collections - [ArXiv] [QA].
- Tuning computer vision models with task rewards - [ArXiv] [QA].
- Aligning Language Models with Preferences through f-divergence Minimization - [ArXiv] [QA].
- À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting - [ArXiv] [QA].
- Augmented Language Models: a Survey - [ArXiv] [QA].
- The Capacity for Moral Self-Correction in Large Language Models - [ArXiv] [QA].
- Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask - [ArXiv] [QA].
- The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation - [ArXiv] [QA].
- Stitchable Neural Networks - [ArXiv] [QA].
- A Reparameterized Discrete Diffusion Model for Text Generation - [ArXiv] [QA].
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers - [ArXiv] [QA].
- Toolformer: Language Models Can Teach Themselves to Use Tools - [ArXiv] [QA].
- GPTScore: Evaluate as You Desire - [ArXiv] [QA].
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity - [ArXiv] [QA].
- Controlling Personality Style in Dialogue with Zero-Shot Prompt-Based Learning - [ArXiv] [QA].
- Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need - [ArXiv] [QA].
- Robust Camera Pose Refinement for Multi-Resolution Hash Encoding - [ArXiv] [QA].
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents - [ArXiv] [QA].
- Inference in Non-stationary High-Dimensional VARs - [ArXiv] [QA].
- Accelerating Large Language Model Decoding with Speculative Sampling - [ArXiv] [QA].
- Multimodal Chain-of-Thought Reasoning in Language Models - [ArXiv] [QA].
- Collaborating with language models for embodied reasoning - [ArXiv] [QA].
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models - [ArXiv] [QA].
January 2023
- Large Language Models Can Be Easily Distracted by Irrelevant Context - [ArXiv] [QA].
- Grounding Language Models to Images for Multimodal Inputs and Outputs - [ArXiv] [QA].
- Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning - [ArXiv] [QA].
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning - [ArXiv] [QA].
- Faithful Chain-of-Thought Reasoning - [ArXiv] [QA].
- DepGraph: Towards Any Structural Pruning - [ArXiv] [QA].
- Specializing Smaller Language Models towards Multi-Step Reasoning - [ArXiv] [QA].
- Adversarial Style Augmentation for Domain Generalization - [ArXiv] [QA].
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models - [ArXiv] [QA].
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling - [ArXiv] [QA].
- Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation - [ArXiv] [QA].
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation - [ArXiv] [QA].
- Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons - [ArXiv] [QA].
- HexPlane: A Fast Representation for Dynamic Scenes - [ArXiv] [QA].
- FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer - [ArXiv] [QA].
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation - [ArXiv] [QA].
- Dissociating language and thought in large language models: a cognitive perspective - [ArXiv] [QA].
- TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World - [ArXiv] [QA].
- Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues - [ArXiv] [QA].
- Pruning Compact ConvNets for Efficient Inference - [ArXiv] [QA].
- You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona - [ArXiv] [QA].
- Robust Dynamic Radiance Fields - [ArXiv] [QA].
- SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph - [ArXiv] [QA].
- Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes - [ArXiv] [QA].
- Cross Modal Transformer: Towards Fast and Robust 3D Object Detection - [ArXiv] [QA].
- Rethinking Mobile Block for Efficient Attention-based Models - [ArXiv] [QA].
- One-Time Universal Hashing Quantum Digital Signatures without Perfect Keys - [ArXiv] [QA].
- Efficient On-device Training via Gradient Filtering - [ArXiv] [QA].
2022
December 2022
- Rethinking with Retrieval: Faithful Large Language Model Inference - [ArXiv] [QA].
- A Survey on In-context Learning - [ArXiv] [QA].
- Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples - [ArXiv] [QA].
- NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling - [ArXiv] [QA].
- Effects of Data Geometry in Early Deep Learning - [ArXiv] [QA].
- Discriminator-Cooperated Feature Map Distillation for GAN Compression - [ArXiv] [QA].
- SMMix: Self-Motivated Image Mixing for Vision Transformers - [ArXiv] [QA].
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization - [ArXiv] [QA].
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography - [ArXiv] [QA].
- Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise - [ArXiv] [QA].
- 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions - [ArXiv] [QA].
- Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble - [ArXiv] [QA].
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization - [ArXiv] [QA].
- Critic-Guided Decoding for Controlled Text Generation - [ArXiv] [QA].
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning - [ArXiv] [QA].
- MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions - [ArXiv] [QA].
- Ontologically Faithful Generation of Non-Player Character Dialogues - [ArXiv] [QA].
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers - [ArXiv] [QA].
- A Survey of Deep Learning for Mathematical Reasoning - [ArXiv] [QA].
- Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions - [ArXiv] [QA].
- LAMBADA: Backward Chaining for Automated Reasoning in Natural Language - [ArXiv] [QA].
- Controllable Text Generation with Language Constraints - [ArXiv] [QA].
- Towards Reasoning in Large Language Models: A Survey - [ArXiv] [QA].
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers - [ArXiv] [QA].
- Large Language Models Are Reasoning Teachers - [ArXiv] [QA].
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters - [ArXiv] [QA].
- Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments - [ArXiv] [QA].
- A Probabilistic Framework for Lifelong Test-Time Adaptation - [ArXiv] [QA].
- Reasoning with Language Model Prompting: A Survey - [ArXiv] [QA].
- Large Language Models are Better Reasoners with Self-Verification - [ArXiv] [QA].
- Latent Diffusion for Language Generation - [ArXiv] [QA].
- Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation - [ArXiv] [QA].
- Discovering Language Model Behaviors with Model-Written Evaluations - [ArXiv] [QA].
- PAL: Persona-Augmented Emotional Support Conversation Generation - [ArXiv] [QA].
- Emergent Analogical Reasoning in Large Language Models - [ArXiv] [QA].
- Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems - [ArXiv] [QA].
- Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model - [ArXiv] [QA].
- Let's Negotiate! A Survey of Negotiation Dialogue Systems - [ArXiv] [QA].
- The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning - [ArXiv] [QA].
- Teaching Small Language Models to Reason - [ArXiv] [QA].
- Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems - [ArXiv] [QA].
- On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning - [ArXiv] [QA].
- Real-Time Neural Light Field on Mobile Devices - [ArXiv] [QA].
- Constitutional AI: Harmlessness from AI Feedback - [ArXiv] [QA].
- NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior - [ArXiv] [QA].
- PD-Quant: Post-Training Quantization based on Prediction Difference Metric - [ArXiv] [QA].
- Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders - [ArXiv] [QA].
- Doubly Right Object Recognition: A Why Prompt for Visual Rationales - [ArXiv] [QA].
- Genie: Show Me the Data for Quantization - [ArXiv] [QA].
- BEVBert: Multimodal Map Pre-training for Language-guided Navigation - [ArXiv] [QA].
- Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation - [ArXiv] [QA].
- Successive Prompting for Decomposing Complex Questions - [ArXiv] [QA].
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models - [ArXiv] [QA].
- Teaching Matters: Investigating the Role of Supervision in Vision Transformers - [ArXiv] [QA].
- EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points - [ArXiv] [QA].
- Diffusion-SDF: Text-to-Shape via Voxelized Diffusion - [ArXiv] [QA].
- Momentum Decoding: Open-ended Text Generation As Graph Exploration - [ArXiv] [QA].
- Fast Point Cloud Generation with Straight Flows - [ArXiv] [QA].
- RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering - [ArXiv] [QA].
- ResFormer: Scaling ViTs with Multi-Resolution Training - [ArXiv] [QA].
- Safe Learning-Based Control of Elastic Joint Robots via Control Barrier Functions - [ArXiv] [QA].
- Language Model Pre-training on True Negatives - [ArXiv] [QA].
- Distilling Reasoning Capabilities into Smaller Language Models - [ArXiv] [QA].
November 2022
- Feature Selection with Distance Correlation - [ArXiv] [QA].
- Fast Inference from Transformers via Speculative Decoding - [ArXiv] [QA].
- PLA: Language-Driven Open-Vocabulary 3D Scene Understanding - [ArXiv] [QA].
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers - [ArXiv] [QA].
- Decentralized Learning with Multi-Headed Distillation - [ArXiv] [QA].
- Post-training Quantization on Diffusion Models - [ArXiv] [QA].
- SuS-X: Training-Free Name-Only Transfer of Vision-Language Models - [ArXiv] [QA].
- In-Hand 3D Object Scanning from an RGB Sequence - [ArXiv] [QA].
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models - [ArXiv] [QA].
- RUST: Latent Neural Scene Representations from Unposed Imagery - [ArXiv] [QA].
- NeuralUDF: Learning Unsigned Distance Fields for Multi-view Reconstruction of Surfaces with Arbitrary Topologies - [ArXiv] [QA].
- ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision - [ArXiv] [QA].
- SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow - [ArXiv] [QA].
- SfM-TTR: Using Structure from Motion for Test-Time Refinement of Single-View Depth Networks - [ArXiv] [QA].
- Video Test-Time Adaptation for Action Recognition - [ArXiv] [QA].
- TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering - [ArXiv] [QA].
- Robust Mean Teacher for Continual and Gradual Test-Time Adaptation - [ArXiv] [QA].
- ActMAD: Activation Matching to Align Distributions for Test-Time-Training - [ArXiv] [QA].
- BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields - [ArXiv] [QA].
- Integrally Pre-Trained Transformer Pyramid Networks - [ArXiv] [QA].
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks - [ArXiv] [QA].
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations - [ArXiv] [QA].
- OCTET: Object-aware Counterfactual Explanations - [ArXiv] [QA].
- Explaining Image Classifiers with Multiscale Directional Image Representation - [ArXiv] [QA].
- Level-S$^2$fM: Structure from Motion on Neural Level Set of Implicit Surfaces - [ArXiv] [QA].
- PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning - [ArXiv] [QA].
- MATE: Masked Autoencoders are Online 3D Test-Time Learners - [ArXiv] [QA].
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization - [ArXiv] [QA].
- Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification - [ArXiv] [QA].
- You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model - [ArXiv] [QA].
- DynIBaR: Neural Dynamic Image-Based Rendering - [ArXiv] [QA].
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation - [ArXiv] [QA].
- LidarGait: Benchmarking 3D Gait Recognition with Point Clouds - [ArXiv] [QA].
- PAL: Program-aided Language Models - [ArXiv] [QA].
- Visual Programming: Compositional visual reasoning without training - [ArXiv] [QA].
- CRAFT: Concept Recursive Activation FacTorization for Explainability - [ArXiv] [QA].
- AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders - [ArXiv] [QA].
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis - [ArXiv] [QA].
- Holistic Evaluation of Language Models - [ArXiv] [QA].
- Galactica: A Large Language Model for Science - [ArXiv] [QA].
- Stare at What You See: Masked Image Modeling without Reconstruction - [ArXiv] [QA].
- Consistent Direct Time-of-Flight Video Depth Super-Resolution - [ArXiv] [QA].
- Teaching Algorithmic Reasoning via In-context Learning - [ArXiv] [QA].
- EVA: Exploring the Limits of Masked Visual Representation Learning at Scale - [ArXiv] [QA].
- Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding - [ArXiv] [QA].
- PKCAM: Previous Knowledge Channel Attention Module - [ArXiv] [QA].
- What would Harry say? Building Dialogue Agents for Characters in a Story - [ArXiv] [QA].
- OpenGait: Revisiting Gait Recognition Toward Better Practicality - [ArXiv] [QA].
- Masked Contrastive Representation Learning - [ArXiv] [QA].
- MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation - [ArXiv] [QA].
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model - [ArXiv] [QA].
- Self-conditioned Embedding Diffusion for Text Generation - [ArXiv] [QA].
- Crosslingual Generalization through Multitask Finetuning - [ArXiv] [QA].
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales - [ArXiv] [QA].
- Flashlights: An Off-Caustic Lensed Star at Redshift $z$ = 1.26 in Abell 370 - [ArXiv] [QA].
- Late lumping of transformation-based feedback laws for boundary control systems - [ArXiv] [QA].
- Bipartite Mixed Membership Distribution-Free Model. A novel model for community detection in overlapping bipartite weighted networks - [ArXiv] [QA].
- CARE: Causality Reasoning for Empathetic Responses by Conditional Graph Generation - [ArXiv] [QA].
- Evaluating Impact of Social Media Posts by Executives on Stock Prices - [ArXiv] [QA].
October 2022
- SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control - [ArXiv] [QA].
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers - [ArXiv] [QA].
- DiffusER: Discrete Diffusion via Edit-based Reconstruction - [ArXiv] [QA].
- Contrastive Decoding: Open-ended Text Generation as Optimization - [ArXiv] [QA].
- Streaming Radiance Fields for 3D Video Synthesis - [ArXiv] [QA].
- Contrastive Search Is What You Need For Neural Text Generation - [ArXiv] [QA].
- FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation - [ArXiv] [QA].
- DANLI: Deliberative Agent for Following Natural Language Instructions - [ArXiv] [QA].
- Towards Efficient Dialogue Pre-training with Transferable and Interpretable Latent Structure - [ArXiv] [QA].
- Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation - [ArXiv] [QA].
- There Is No Standard Answer: Knowledge-Grounded Dialogue Generation with Adversarial Activated Multi-Reference Learning - [ArXiv] [QA].
- WikiWhy: Answering and Explaining Cause-and-Effect Questions - [ArXiv] [QA].
- Large Language Models Can Self-Improve - [ArXiv] [QA].
- Scaling Instruction-Finetuned Language Models - [ArXiv] [QA].
- Scaling Laws for Reward Model Overoptimization - [ArXiv] [QA].
- DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Generation - [ArXiv] [QA].
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them - [ArXiv] [QA].
- DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models - [ArXiv] [QA].
- Keep Me Updated! Memory Management in Long-term Conversations - [ArXiv] [QA].
- Data-Efficient Augmentation for Training Neural Networks - [ArXiv] [QA].
- DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation - [ArXiv] [QA].
- Language Models of Code are Few-Shot Commonsense Learners - [ArXiv] [QA].
- Explanations from Large Language Models Make Small Reasoners Better - [ArXiv] [QA].
- Large Language Models are few(1)-shot Table Reasoners - [ArXiv] [QA].
- Masked Motion Encoding for Self-Supervised Video Representation Learning - [ArXiv] [QA].
- Mind's Eye: Grounded Language Model Reasoning through Simulation - [ArXiv] [QA].
- Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning - [ArXiv] [QA].
- Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior - [ArXiv] [QA].
- Controllable Dialogue Simulation with In-Context Learning - [ArXiv] [QA].
- Don't Lose Yourself! Empathetic Response Generation via Explicit Self-Other Awareness - [ArXiv] [QA].
- Automatic Chain of Thought Prompting in Large Language Models - [ArXiv] [QA].
- Measuring and Narrowing the Compositionality Gap in Language Models - [ArXiv] [QA].
- FAST: Improving Controllability for Text Generation with Feedback Aware Self-Training - [ArXiv] [QA].
- VIMA: General Robot Manipulation with Multimodal Prompts - [ArXiv] [QA].
- Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering - [ArXiv] [QA].
- Language Models are Multilingual Chain-of-Thought Reasoners - [ArXiv] [QA].
- A Distributional Lens for Multi-Aspect Controllable Text Generation - [ArXiv] [QA].
- ReAct: Synergizing Reasoning and Acting in Language Models - [ArXiv] [QA].
- GLM-130B: An Open Bilingual Pre-trained Model - [ArXiv] [QA].
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks - [ArXiv] [QA].
- CorefDiffs: Co-referential and Differential Knowledge Flow in Document Grounded Conversations - [ArXiv] [QA].
- Group Personalized Federated Learning - [ArXiv] [QA].
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models - [ArXiv] [QA].
- Extraneousness-Aware Imitation Learning - [ArXiv] [QA].
- Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization - [ArXiv] [QA].
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought - [ArXiv] [QA].
- Complexity-Based Prompting for Multi-Step Reasoning - [ArXiv] [QA].
- "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction - [ArXiv] [QA].
- NeRF: Neural Radiance Field in 3D Vision, A Comprehensive Review - [ArXiv] [QA].
- Multimodal Analogical Reasoning over Knowledge Graphs - [ArXiv] [QA].
September 2022
- Compositional Semantic Parsing with Large Language Models - [ArXiv] [QA].
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning - [ArXiv] [QA].
- Improving alignment of dialogue agents via targeted human judgements - [ArXiv] [QA].
- Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts - [ArXiv] [QA].
- Target-Guided Open-Domain Conversation Planning - [ArXiv] [QA].
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering - [ArXiv] [QA].
- Loc-NeRF: Monte Carlo Localization using Neural Radiance Fields - [ArXiv] [QA].
- A Benchmark for Understanding and Generating Dialogue between Characters in Stories - [ArXiv] [QA].
- Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models - [ArXiv] [QA].
- A Geometric Perspective on Variational Autoencoders - [ArXiv] [QA].
- Selective Annotation Makes Language Models Better Few-Shot Learners - [ArXiv] [QA].
August 2022
- Radon concentration variations at the Yangyang underground laboratory - [ArXiv] [QA].
- Faithful Reasoning Using Large Language Models - [ArXiv] [QA].
- Masked Autoencoders Enable Efficient Knowledge Distillers - [ArXiv] [QA].
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - [ArXiv] [QA].
- Improving Personality Consistency in Conversation by Persona Extending - [ArXiv] [QA].
- CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation - [ArXiv] [QA].
- Follow Me: Conversation Planning for Target-driven Recommendation Dialogue Systems - [ArXiv] [QA].
- BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage - [ArXiv] [QA].
- Character Generation through Self-Supervised Vectorization - [ArXiv] [QA].
- Composable Text Controls in Latent Space with ODEs - [ArXiv] [QA].
July 2022
- MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures - [ArXiv] [QA].
- Visual correspondence-based explanations improve AI robustness and human-AI team accuracy - [ArXiv] [QA].
- Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning - [ArXiv] [QA].
- Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent - [ArXiv] [QA].
- Language Model Cascades - [ArXiv] [QA].
- Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability - [ArXiv] [QA].
- Language models show human-like content effects on reasoning - [ArXiv] [QA].
- Inner Monologue: Embodied Reasoning through Planning with Language Models - [ArXiv] [QA].
- Bootstrapping a User-Centered Task-Oriented Dialogue System - [ArXiv] [QA].
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action - [ArXiv] [QA].
- Back to the Source: Diffusion-Driven Test-Time Adaptation - [ArXiv] [QA].
- PVO: Panoptic Visual Odometry - [ArXiv] [QA].
- Rationale-Augmented Ensembles in Language Models - [ArXiv] [QA].
June 2022
- Solving Quantitative Reasoning Problems with Language Models - [ArXiv] [QA].
- Invariant Causal Mechanisms through Distribution Matching - [ArXiv] [QA].
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog - [ArXiv] [QA].
- KiloNeuS: A Versatile Neural Implicit Surface Representation for Real-Time Rendering - [ArXiv] [QA].
- Marginal Tail-Adaptive Normalizing Flows - [ArXiv] [QA].
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge - [ArXiv] [QA].
- Balancing Discriminability and Transferability for Source-Free Domain Adaptation - [ArXiv] [QA].
- Emergent Abilities of Large Language Models - [ArXiv] [QA].
- Confidence Score for Source-Free Unsupervised Domain Adaptation - [ArXiv] [QA].
- Transformers are Meta-Reinforcement Learners - [ArXiv] [QA].
- Language Models are General-Purpose Interfaces - [ArXiv] [QA].
- Mining Multi-Label Samples from Single Positive Labels - [ArXiv] [QA].
- Building a Personalized Dialogue System with Prompt-Tuning - [ArXiv] [QA].
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models - [ArXiv] [QA].
- Spatial-temporal Concept based Explanation of 3D ConvNets - [ArXiv] [QA].
- MobileOne: An Improved One millisecond Mobile Backbone - [ArXiv] [QA].
- Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering - [ArXiv] [QA].
- Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation - [ArXiv] [QA].
- Making Large Language Models Better Reasoners with Step-Aware Verifier - [ArXiv] [QA].
- PROMISSING: Pruning Missing Values in Neural Networks - [ArXiv] [QA].
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images - [ArXiv] [QA].
- Unified Recurrence Modeling for Video Action Anticipation - [ArXiv] [QA].
- NIPQ: Noise proxy-based Integrated Pseudo-Quantization - [ArXiv] [QA].
- Hopular: Modern Hopfield Networks for Tabular Data - [ArXiv] [QA].
- One- and two-dimensional solitons in spin-orbit-coupled Bose-Einstein condensates with fractional kinetic energy - [ArXiv] [QA].
- A Theoretical Framework for Inference Learning - [ArXiv] [QA].
May 2022
- New asymptotically flat static vacuum metrics with near Euclidean boundary data - [ArXiv] [QA].
- itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection - [ArXiv] [QA].
- Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning - [ArXiv] [QA].
- Robust Weight Perturbation for Adversarial Training - [ArXiv] [QA].
- CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI - [ArXiv] [QA].
- CoNT: Contrastive Neural Text Generation - [ArXiv] [QA].
- Controllable Text Generation with Neurally-Decomposed Oracle - [ArXiv] [QA].
- Diffusion-LM Improves Controllable Text Generation - [ArXiv] [QA].
- GIT: A Generative Image-to-text Transformer for Vision and Language - [ArXiv] [QA].
- Prototype Based Classification from Hierarchy to Fairness - [ArXiv] [QA].
- Quark: Controllable Text Generation with Reinforced Unlearning - [ArXiv] [QA].
- RSTGen: Imbuing Fine-Grained Interpretable Control into Long-Form Text Generators - [ArXiv] [QA].
- TALM: Tool Augmented Language Models - [ArXiv] [QA].
- Large Language Models are Zero-Shot Reasoners - [ArXiv] [QA].
- Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations - [ArXiv] [QA].
- PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection - [ArXiv] [QA].
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models - [ArXiv] [QA].
- RankGen: Improving Text Generation with Large Ranking Models - [ArXiv] [QA].
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning - [ArXiv] [QA].
- Learning Graph Structure from Convolutional Mixtures - [ArXiv] [QA].
- Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation - [ArXiv] [QA].
- Robust Losses for Learning Value Functions - [ArXiv] [QA].
- LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning - [ArXiv] [QA].
- Long-term Control for Dialogue Generation: Methods and Evaluation - [ArXiv] [QA].
- Reduce Information Loss in Transformers for Pluralistic Image Inpainting - [ArXiv] [QA].
- Towards a Progression-Aware Autonomous Dialogue Agent - [ArXiv] [QA].
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning - [ArXiv] [QA].
- Spiking Graph Convolutional Networks - [ArXiv] [QA].
- A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration - [ArXiv] [QA].
- Lexical Knowledge Internalization for Neural Dialog Generation - [ArXiv] [QA].
- Learning to Transfer Prompts for Text Generation - [ArXiv] [QA].
- OPT: Open Pre-trained Transformer Language Models - [ArXiv] [QA].
April 2022
- Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models - [ArXiv] [QA].
- Flamingo: a Visual Language Model for Few-Shot Learning - [ArXiv] [QA].
- Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation - [ArXiv] [QA].
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation - [ArXiv] [QA].
- Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances - [ArXiv] [QA].
- Sharper Utility Bounds for Differentially Private Models - [ArXiv] [QA].
- Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation - [ArXiv] [QA].
- Event Transition Planning for Open-ended Text Generation - [ArXiv] [QA].
- Visio-Linguistic Brain Encoding - [ArXiv] [QA].
- A Personalized Dialogue Generator with Implicit User Persona Detection - [ArXiv] [QA].
- LaMemo: Language Modeling with Look-Ahead Memory - [ArXiv] [QA].
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model - [ArXiv] [QA].
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback - [ArXiv] [QA].
- Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting - [ArXiv] [QA].
- Federated Learning with Partial Model Personalization - [ArXiv] [QA].
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy - [ArXiv] [QA].
- Knowledge Infused Decoding - [ArXiv] [QA].
- Towards An End-to-End Framework for Flow-Guided Video Inpainting - [ArXiv] [QA].
- There Are a Thousand Hamlets in a Thousand People's Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory - [ArXiv] [QA].
- Efficient Test-Time Model Adaptation without Forgetting - [ArXiv] [QA].
- C3KG: A Chinese Commonsense Conversation Knowledge Graph - [ArXiv] [QA].
- Can language models learn from explanations in context? - [ArXiv] [QA].
- PaLM: Scaling Language Modeling with Pathways - [ArXiv] [QA].
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation - [ArXiv] [QA].
- Learning Neural Acoustic Fields - [ArXiv] [QA].
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances - [ArXiv] [QA].
- Value Gradient weighted Model-Based Reinforcement Learning - [ArXiv] [QA].
- Probabilistic Implicit Scene Completion - [ArXiv] [QA].
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - [ArXiv] [QA].
March 2022
- R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis - [ArXiv] [QA].
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting - [ArXiv] [QA].
- Generalizing Few-Shot NAS with Gradient Matching - [ArXiv] [QA].
- STaR: Bootstrapping Reasoning With Reasoning - [ArXiv] [QA].
- Continual Test-Time Domain Adaptation - [ArXiv] [QA].
- MISC: A MIxed Strategy-Aware Model Integrating COMET for Emotional Support Conversation - [ArXiv] [QA].
- A Comparative Survey of Deep Active Learning - [ArXiv] [QA].
- Linking Emergent and Natural Languages via Corpus Transfer - [ArXiv] [QA].
- Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition - [ArXiv] [QA].
- Language modeling via stochastic processes - [ArXiv] [QA].
- Self-Consistency Improves Chain of Thought Reasoning in Language Models - [ArXiv] [QA].
- Teaching language models to support answers with verified quotes - [ArXiv] [QA].
- Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems - [ArXiv] [QA].
- On Robust Prefix-Tuning for Text Classification - [ArXiv] [QA].
- Generative Principal Component Analysis - [ArXiv] [QA].
- Monotonic Differentiable Sorting Networks - [ArXiv] [QA].
- A Framework and Benchmark for Deep Batch Active Learning for Regression - [ArXiv] [QA].
- RoMe: A Robust Metric for Evaluating Natural Language Generation - [ArXiv] [QA].
- PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation - [ArXiv] [QA].
- Memorizing Transformers - [ArXiv] [QA].
- Multi-Stage Prompting for Knowledgeable Dialogue Generation - [ArXiv] [QA].
- Differentiable DAG Sampling - [ArXiv] [QA].
- Iteratively Prompt Pre-trained Language Models for Chain of Thought - [ArXiv] [QA].
- Unified Visual Transformer Compression - [ArXiv] [QA].
- Vision-Based Manipulators Need to Also See from Their Hands - [ArXiv] [QA].
- Orchestrated Value Mapping for Reinforcement Learning - [ArXiv] [QA].
- BiBERT: Accurate Fully Binarized BERT - [ArXiv] [QA].
- MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting - [ArXiv] [QA].
- An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation - [ArXiv] [QA].
- Long Time No See! Open-Domain Conversation with Long-Term Persona Memory - [ArXiv] [QA].
- Source-free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition - [ArXiv] [QA].
- Kubric: A scalable dataset generator - [ArXiv] [QA].
- Adaptive Cross-Layer Attention for Image Restoration - [ArXiv] [QA].
- Neural Simulated Annealing - [ArXiv] [QA].
- Training language models to follow instructions with human feedback - [ArXiv] [QA].
- Self-Supervised Scene Flow Estimation with 4-D Automotive Radar - [ArXiv] [QA].
- Follow-Up of Extended Shells around B[e] Stars - [ArXiv] [QA].
- Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding - [ArXiv] [QA].
- MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning - [ArXiv] [QA].
February 2022
- Rethinking and Refining the Distinct Metric - [ArXiv] [QA].
- The Spectral Bias of Polynomial Neural Networks - [ArXiv] [QA].
- AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation - [ArXiv] [QA].
- Ask2Mask: Guided Data Selection for Masked Speech Modeling - [ArXiv] [QA].
- Auto-scaling Vision Transformers without Training - [ArXiv] [QA].
- COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics - [ArXiv] [QA].
- Pseudo Numerical Methods for Diffusion Models on Manifolds - [ArXiv] [QA].
- Bit-wise Training of Neural Network Weights - [ArXiv] [QA].
- Gaussian Mixture Convolution Networks - [ArXiv] [QA].
- cosFormer: Rethinking Softmax in Attention - [ArXiv] [QA].
- Task-Agnostic Graph Explanations - [ArXiv] [QA].
- Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis - [ArXiv] [QA].
- A precortical module for robust CNNs to light variations - [ArXiv] [QA].
- Domain Adaptation via Prompt Learning - [ArXiv] [QA].
- FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows - [ArXiv] [QA].
- A Contrastive Framework for Neural Text Generation - [ArXiv] [QA].
- Conditional Contrastive Learning with Kernel - [ArXiv] [QA].
- Domain Adversarial Training: A Game Perspective - [ArXiv] [QA].
- GiraffeDet: A Heavy-Neck Paradigm for Object Detection - [ArXiv] [QA].
- Survey of Hallucination in Natural Language Generation - [ArXiv] [QA].
- GrASP: Gradient-Based Affordance Selection for Planning - [ArXiv] [QA].
- Message Passing Neural PDE Solvers - [ArXiv] [QA].
- User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems - [ArXiv] [QA].
- A Survey on Retrieval-Augmented Text Generation - [ArXiv] [QA].
- CLA-NeRF: Category-Level Articulated Neural Radiance Field - [ArXiv] [QA].
January 2022
- Signing the Supermask: Keep, Hide, Invert - [ArXiv] [QA].
- Few-Shot Backdoor Attacks on Visual Object Tracking - [ArXiv] [QA].
- Robust Imitation Learning from Corrupted Demonstrations - [ArXiv] [QA].
- Counterfactual Plans under Distributional Ambiguity - [ArXiv] [QA].
- DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR - [ArXiv] [QA].
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model - [ArXiv] [QA].
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - [ArXiv] [QA].
- DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence - [ArXiv] [QA].
- Natural Language Descriptions of Deep Visual Features - [ArXiv] [QA].
- Explanatory Learning: Beyond Empiricism in Neural Networks - [ArXiv] [QA].
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models - [ArXiv] [QA].
- Learning Graph Augmentations to Learn Graph Representations - [ArXiv] [QA].
- Patches Are All You Need? - [ArXiv] [QA].
- Fast Differentiable Matrix Square Root - [ArXiv] [QA].
- LaMDA: Language Models for Dialog Applications - [ArXiv] [QA].
- Safe Deep RL in 3D Environments using Human Feedback - [ArXiv] [QA].
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - [ArXiv] [QA].
- Parameter-free Online Test-time Adaptation - [ArXiv] [QA].
- A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models - [ArXiv] [QA].
- Neural Circuit Architectural Priors for Embodied Control - [ArXiv] [QA].
- QuadTree Attention for Vision Transformers - [ArXiv] [QA].
- C2-CRS: Coarse-to-Fine Contrastive Learning for Conversational Recommender System - [ArXiv] [QA].
- Global existence and decay estimates for a viscoelastic plate equation with nonlinear damping and logarithmic nonlinearity - [ArXiv] [QA].
2021
December 2021
- Optimal Representations for Covariate Shift - [ArXiv] [QA].
- On the Role of Neural Collapse in Transfer Learning - [ArXiv] [QA].
- Self Reward Design with Fine-grained Interpretability - [ArXiv] [QA].
- Generative Kernel Continual learning - [ArXiv] [QA].
- Transformers Can Do Bayesian Inference - [ArXiv] [QA].
- WebGPT: Browser-assisted question-answering with human feedback - [ArXiv] [QA].
- NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics - [ArXiv] [QA].
- Reframing Human-AI Collaboration for Generating Free-Text Explanations - [ArXiv] [QA].
- Learning to Prompt for Continual Learning - [ArXiv] [QA].
- Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge - [ArXiv] [QA].
- Rethinking Nearest Neighbors for Visual Classification - [ArXiv] [QA].
- Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information - [ArXiv] [QA].
- Massive-scale Decoding for Text Generation using Lattices - [ArXiv] [QA].
- MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation - [ArXiv] [QA].
- Real-Time Neural Voice Camouflage - [ArXiv] [QA].
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts - [ArXiv] [QA].
- Step-unrolled Denoising Autoencoders for Text Generation - [ArXiv] [QA].
- CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability - [ArXiv] [QA].
- Self-Supervised Bot Play for Conversational Recommendation with Justifications - [ArXiv] [QA].
- On Convergence of Federated Averaging Langevin Dynamics - [ArXiv] [QA].
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher - [ArXiv] [QA].
- Pareto Domain Adaptation - [ArXiv] [QA].
- DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification - [ArXiv] [QA].
- Universalizing Weak Supervision - [ArXiv] [QA].
- Genetic Algorithm for Constrained Molecular Inverse Design - [ArXiv] [QA].
- Variational Wasserstein gradient flow - [ArXiv] [QA].
- Linear algebra with transformers - [ArXiv] [QA].
- Mind the gap in university rankings: a complex network approach towards fairness - [ArXiv] [QA].
- Magnetic correction to the Anomalous Magnetic Moment of Electron - [ArXiv] [QA].
- Neural Stochastic Dual Dynamic Programming - [ArXiv] [QA].
- A General Language Assistant as a Laboratory for Alignment - [ArXiv] [QA].
- Routing with Self-Attention for Multimodal Capsule Networks - [ArXiv] [QA].
November 2021
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective - [ArXiv] [QA].
- GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection - [ArXiv] [QA].
- Group equivariant neural posterior estimation - [ArXiv] [QA].
- Node-Level Differentially Private Graph Neural Networks - [ArXiv] [QA].
- Deep Point Cloud Reconstruction - [ArXiv] [QA].
- Lossless Compression with Probabilistic Circuits - [ArXiv] [QA].
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction - [ArXiv] [QA].
- Plant 'n' Seek: Can You Find the Winning Ticket? - [ArXiv] [QA].
- Deep Probability Estimation - [ArXiv] [QA].
- Are Vision Transformers Robust to Patch Perturbations? - [ArXiv] [QA].
- Deep Safe Multi-Task Learning - [ArXiv] [QA].
- Selective Ensembles for Consistent Predictions - [ArXiv] [QA].
- Bolstering Stochastic Gradient Descent with Model Building - [ArXiv] [QA].
- Sliced Recursive Transformer - [ArXiv] [QA].
- MT3: Multi-Task Multitrack Music Transcription - [ArXiv] [QA].
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - [ArXiv] [QA].
- DAGSurv: Directed Acyclic Graph Based Survival Analysis Using Deep Neural Networks - [ArXiv] [QA].
- Can Vision Transformers Perform Convolution? - [ArXiv] [QA].
- LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition - [ArXiv] [QA].
October 2021
- Template Filling for Controllable Commonsense Reasoning - [ArXiv] [QA].
- Improving Fairness via Federated Learning - [ArXiv] [QA].
- The magnitude vector of images - [ArXiv] [QA].
- Training Verifiers to Solve Math Word Problems - [ArXiv] [QA].
- s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning - [ArXiv] [QA].
- The Efficiency Misnomer - [ArXiv] [QA].
- Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? - [ArXiv] [QA].
- Center Loss Regularization for Continual Learning - [ArXiv] [QA].
- Fast Model Editing at Scale - [ArXiv] [QA].
- BERMo: What can BERT learn from ELMo? - [ArXiv] [QA].
- TLDR: Twin Learning for Dimensionality Reduction - [ArXiv] [QA].
- Natural Attribute-based Shift Detection - [ArXiv] [QA].
- Illiterate DALL-E Learns to Compose - [ArXiv] [QA].
- Multimodal Dialogue Response Generation - [ArXiv] [QA].
- Comparing Human and Machine Bias in Face Recognition - [ArXiv] [QA].
- Generated Knowledge Prompting for Commonsense Reasoning - [ArXiv] [QA].
- On Learning the Transformer Kernel - [ArXiv] [QA].
- Multitask Prompted Training Enables Zero-Shot Task Generalization - [ArXiv] [QA].
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems - [ArXiv] [QA].
- On-Policy Model Errors in Reinforcement Learning - [ArXiv] [QA].
- ContraQA: Question Answering under Contradicting Contexts - [ArXiv] [QA].
- RecInDial: A Unified Framework for Conversational Recommendation with Pretrained Language Models - [ArXiv] [QA].
- Parallel Deep Neural Networks Have Zero Duality Gap - [ArXiv] [QA].
- Causal discovery from conditionally stationary time-series - [ArXiv] [QA].
- Molecular Graph Generation via Geometric Scattering - [ArXiv] [QA].
- DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer - [ArXiv] [QA].
- Relative Molecule Self-Attention Transformer - [ArXiv] [QA].
- Certified Patch Robustness via Smoothed Vision Transformers - [ArXiv] [QA].
- Global Vision Transformer Pruning with Hessian-Aware Saliency - [ArXiv] [QA].
- Long Expressive Memory for Sequence Modeling - [ArXiv] [QA].
- Multi-Agent MDP Homomorphic Networks - [ArXiv] [QA].
- Neural Link Prediction with Walk Pooling - [ArXiv] [QA].
- FRL: Federated Rank Learning - [ArXiv] [QA].
- On the Limitations of Multimodal VAEs - [ArXiv] [QA].
- Token Pooling in Vision Transformers - [ArXiv] [QA].
- FOCUS: Familiar Objects in Common and Uncommon Settings - [ArXiv] [QA].
- Hyperparameter Tuning with Renyi Differential Privacy - [ArXiv] [QA].
- Adversarial Retriever-Ranker for dense text retrieval - [ArXiv] [QA].
- RAR: Region-Aware Point Cloud Registration - [ArXiv] [QA].
- Cartoon Explanations of Image Classifiers - [ArXiv] [QA].
- Situated Dialogue Learning through Procedural Environment Generation - [ArXiv] [QA].
- On the Optimal Memorization Power of ReLU Neural Networks - [ArXiv] [QA].
- Generative Modeling with Optimal Transport Maps - [ArXiv] [QA].
- Federated Learning via Plurality Vote - [ArXiv] [QA].
- Nested Policy Reinforcement Learning - [ArXiv] [QA].
- How BPE Affects Memorization in Transformers - [ArXiv] [QA].
- On The Transferability of Deep-Q Networks - [ArXiv] [QA].
- Test-time Batch Statistics Calibration for Covariate Shift - [ArXiv] [QA].
- Geometric Algebra Attention Networks for Small Point Clouds - [ArXiv] [QA].
- EntQA: Entity Linking as Question Answering - [ArXiv] [QA].
- Autoregressive Diffusion Models - [ArXiv] [QA].
- Generalized Kernel Thinning - [ArXiv] [QA].
- Batch size-invariance for policy optimization - [ArXiv] [QA].
- Dynamics of targeted ransomware negotiation - [ArXiv] [QA].
- Vision-Only Robot Navigation in a Neural Radiance World - [ArXiv] [QA].
September 2021
- Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System - [ArXiv] [QA].
- Stochastic Training is Not Necessary for Generalization - [ArXiv] [QA].
- IGLU: Efficient GCN Training via Lazy Updates - [ArXiv] [QA].
- OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts - [ArXiv] [QA].
- Learning Neural Templates for Recommender Dialogue System - [ArXiv] [QA].
- A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification - [ArXiv] [QA].
- Recursively Summarizing Books with Human Feedback - [ArXiv] [QA].
- Neural networks with trainable matrix activation functions - [ArXiv] [QA].
- PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation - [ArXiv] [QA].
- DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation - [ArXiv] [QA].
- Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes - [ArXiv] [QA].
- Scaling Laws for Neural Machine Translation - [ArXiv] [QA].
- Transferable Persona-Grounded Dialogues via Grounded Minimal Edits - [ArXiv] [QA].
- Benchmarking the Spectrum of Agent Capabilities - [ArXiv] [QA].
- Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation - [ArXiv] [QA].
- Space Time Recurrent Memory Network - [ArXiv] [QA].
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation - [ArXiv] [QA].
- CEM: Commonsense-aware Empathetic Response Generation - [ArXiv] [QA].
- Bootstrapped Meta-Learning - [ArXiv] [QA].
- A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation - [ArXiv] [QA].
- Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems - [ArXiv] [QA].
- Local Augmentation for Graph Neural Networks - [ArXiv] [QA].
- Sqrt(d) Dimension Dependence of Langevin Monte Carlo - [ArXiv] [QA].
- Learning Neural Causal Models with Active Interventions - [ArXiv] [QA].
- Learning to Prompt for Vision-Language Models - [ArXiv] [QA].
- The fractional chromatic number of double cones over graphs - [ArXiv] [QA].
- Regional Adversarial Training for Better Robust Generalization - [ArXiv] [QA].
- Boosting Search Engines with Interactive Agents - [ArXiv] [QA].
August 2021
- Subjective Learning for Open-Ended Data - [ArXiv] [QA].
- Dynamic processes in superconductors and the laws of thermodynamics - [ArXiv] [QA].
- Anarchic Federated Learning - [ArXiv] [QA].
- On the Opportunities and Risks of Foundation Models - [ArXiv] [QA].
- MMChat: Multi-Modal Chat Dataset on Social Media - [ArXiv] [QA].
- FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning - [ArXiv] [QA].
- Logit Attenuating Weight Normalization - [ArXiv] [QA].
- BIGRoC: Boosting Image Generation via a Robust Classifier - [ArXiv] [QA].
- Source-Free Domain Adaptation for Image Segmentation - [ArXiv] [QA].
- Internal Video Inpainting by Implicit Long-range Propagation - [ArXiv] [QA].
- Model-Based Opponent Modeling - [ArXiv] [QA].
- Offline Decentralized Multi-Agent Reinforcement Learning - [ArXiv] [QA].
- How to Evaluate Your Dialogue Models: A Review of Approaches - [ArXiv] [QA].
- Evaluating Deep Graph Neural Networks - [ArXiv] [QA].
July 2021
- Imbalanced Adversarial Training with Reweighting - [ArXiv] [QA].
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing - [ArXiv] [QA].
- Unsupervised Learning of Neurosymbolic Encoders - [ArXiv] [QA].
- Joint Shapley values: a measure of joint feature importance - [ArXiv] [QA].
- Conditional GANs with Auxiliary Discriminative Classifier - [ArXiv] [QA].
- Guided Generation of Cause and Effect - [ArXiv] [QA].
- Structured Stochastic Gradient MCMC - [ArXiv] [QA].
- FastSHAP: Real-Time Shapley Value Estimation - [ArXiv] [QA].
- How Much Can CLIP Benefit Vision-and-Language Tasks? - [ArXiv] [QA].
- Explore and Control with Adversarial Surprise - [ArXiv] [QA].
- ViTGAN: Training GANs with Vision Transformers - [ArXiv] [QA].
- Towards Robust Active Feature Acquisition - [ArXiv] [QA].
- Evaluating Large Language Models Trained on Code - [ArXiv] [QA].
- Understanding Intrinsic Robustness Using Label Uncertainty - [ArXiv] [QA].
- Neural Contextual Bandits without Regret - [ArXiv] [QA].
- Structured Denoising Diffusion Models in Discrete State-Spaces - [ArXiv] [QA].
- Depth-supervised NeRF: Fewer Views and Faster Training for Free - [ArXiv] [QA].
- Rethinking Positional Encoding - [ArXiv] [QA].
- When and How to Fool Explainable Models (and Humans) with Adversarial Examples - [ArXiv] [QA].
- Scale Mixtures of Neural Network Gaussian Processes - [ArXiv] [QA].
- On the Practicality of Deterministic Epistemic Uncertainty - [ArXiv] [QA].
- Exact verification of the strong BSD conjecture for some absolutely simple abelian surfaces - [ArXiv] [QA].
June 2021
- Automatically Select Emotion for Response via Personality-affected Emotion Transition - [ArXiv] [QA].
- Local Reweighting for Adversarial Training - [ArXiv] [QA].
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation - [ArXiv] [QA].
- Multimodal Few-Shot Learning with Frozen Language Models - [ArXiv] [QA].
- Animatable Neural Radiance Fields from Monocular RGB Videos - [ArXiv] [QA].
- DCoM: A Deep Column Mapper for Semantic Data Type Detection - [ArXiv] [QA].
- IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers - [ArXiv] [QA].
- Learning Multimodal VAEs through Mutual Supervision - [ArXiv] [QA].
- Sampling with Mirrored Stein Operators - [ArXiv] [QA].
- Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation - [ArXiv] [QA].
- CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot - [ArXiv] [QA].
- Secure Domain Adaptation with Multiple Sources - [ArXiv] [QA].
- Volume Rendering of Neural Implicit Surfaces - [ArXiv] [QA].
- Policy Smoothing for Provably Robust Reinforcement Learning - [ArXiv] [QA].
- Boundary Graph Neural Networks for 3D Simulations - [ArXiv] [QA].
- Analytically Tractable Bayesian Deep Q-Learning - [ArXiv] [QA].
- NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction - [ArXiv] [QA].
- Shuffle Private Stochastic Convex Optimization - [ArXiv] [QA].
- On Invariance Penalties for Risk Minimization - [ArXiv] [QA].
- Visual Correspondence Hallucination - [ArXiv] [QA].
- Poisoning and Backdooring Contrastive Learning - [ArXiv] [QA].
- Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation - [ArXiv] [QA].
- Unsupervised Enrichment of Persona-grounded Dialog with Background Stories - [ArXiv] [QA].
- Query Embedding on Hyper-relational Knowledge Graphs - [ArXiv] [QA].
- Constraining Linear-chain CRFs to Regular Languages - [ArXiv] [QA].
- Pre-Trained Models: Past, Present and Future - [ArXiv] [QA].
- Inverting Adversarially Robust Networks for Image Synthesis - [ArXiv] [QA].
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks - [ArXiv] [QA].
- Learning to Pool in Graph Neural Networks for Extrapolation - [ArXiv] [QA].
- Is Homophily a Necessity for Graph Neural Networks? - [ArXiv] [QA].
- Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation - [ArXiv] [QA].
- Fair Normalizing Flows - [ArXiv] [QA].
- A Neural Tangent Kernel Perspective of GANs - [ArXiv] [QA].
- Do Transformers Really Perform Bad for Graph Representation? - [ArXiv] [QA].
- DIGRAC: Digraph Clustering Based on Flow Imbalance - [ArXiv] [QA].
- It Takes Two to Tango: Mixup for Deep Metric Learning - [ArXiv] [QA].
- Mean-Shifted Contrastive Loss for Anomaly Detection - [ArXiv] [QA].
- RegMix: Data Mixing Augmentation for Regression - [ArXiv] [QA].
- Model Zoo: A Growing "Brain" That Learns Continually - [ArXiv] [QA].
- Context-Aware Sparse Deep Coordination Graphs - [ArXiv] [QA].
- Learning Curves for SGD on Structured Features - [ArXiv] [QA].
- Meta-Learning with Fewer Tasks through Task Interpolation - [ArXiv] [QA].
- Churn Reduction via Distillation - [ArXiv] [QA].
- Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances - [ArXiv] [QA].
- Convergent Graph Solvers - [ArXiv] [QA].
- Steerable 3D Spherical Neurons - [ArXiv] [QA].
- Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize - [ArXiv] [QA].
- Evidential Turing Processes - [ArXiv] [QA].
- Towards Emotional Support Dialog Systems - [ArXiv] [QA].
- Transition-Based Constrained DFT for the Robust and Reliable Treatment of Excitations in Supramolecular Systems - [ArXiv] [QA].
- Multiresolution Equivariant Graph Variational Autoencoder - [ArXiv] [QA].
- RevCore: Review-augmented Conversational Recommendation - [ArXiv] [QA].
- DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues - [ArXiv] [QA].
- DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation - [ArXiv] [QA].
- Towards Quantifiable Dialogue Coherence Evaluation - [ArXiv] [QA].
- Concurrent Adversarial Learning for Large-Batch Training - [ArXiv] [QA].
- Rethinking Pseudo Labels for Semi-Supervised Object Detection - [ArXiv] [QA].
May 2021
- Efficient and Modular Implicit Differentiation - [ArXiv] [QA].
- How Attentive are Graph Attention Networks? - [ArXiv] [QA].
- An Attention Free Transformer - [ArXiv] [QA].
- Gotta Go Fast When Generating Data with Score-Based Models - [ArXiv] [QA].
- OTTers: One-turn Topic Transitions for Open-Domain Dialogue - [ArXiv] [QA].
- Data Augmentation for Text Generation Without Any Augmented Data - [ArXiv] [QA].
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning - [ArXiv] [QA].
- KECRS: Towards Knowledge-Enriched Conversational Recommendation System - [ArXiv] [QA].
- RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling - [ArXiv] [QA].
- HyKnow: End-to-End Task-Oriented Dialog Modeling with Hybrid Knowledge Management - [ArXiv] [QA].
- The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting - [ArXiv] [QA].
- EL-Attention: Memory Efficient Lossless Attention for Generation - [ArXiv] [QA].
- Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey - [ArXiv] [QA].
- Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems - [ArXiv] [QA].
- A Survey of Data Augmentation Approaches for NLP - [ArXiv] [QA].
- PD-GAN: Probabilistic Diverse GAN for Image Inpainting - [ArXiv] [QA].
- Unsteady and inertial dynamics of an active particle in a fluid - [ArXiv] [QA].
April 2021
- If your data distribution shifts, use self-learning - [ArXiv] [QA].
- PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation - [ArXiv] [QA].
- UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction - [ArXiv] [QA].
- Gradient Matching for Domain Generalization - [ArXiv] [QA].
- Image Inpainting with External-internal Learning and Monochromic Bottleneck - [ArXiv] [QA].
- Explaining Answers with Entailment Trees - [ArXiv] [QA].
- $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering - [ArXiv] [QA].
- Sparse Attention with Linear Units - [ArXiv] [QA].
- Progressive Temporal Feature Alignment Network for Video Inpainting - [ArXiv] [QA].
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - [ArXiv] [QA].
- NeRF-VAE: A Geometry Aware 3D Scene Generative Model - [ArXiv] [QA].
- Improved Image Generation via Sparse Modeling - [ArXiv] [QA].
- Domain Invariant Adversarial Learning - [ArXiv] [QA].
March 2021
- CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields - [ArXiv] [QA].
- Contrastive Embedding for Generalized Zero-Shot Learning - [ArXiv] [QA].
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations - [ArXiv] [QA].
- Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers - [ArXiv] [QA].
- GNeRF: GAN-based Neural Radiance Field without Posed Camera - [ArXiv] [QA].
- Efficient Explanations from Empirical Explainers - [ArXiv] [QA].
- KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs - [ArXiv] [QA].
- DNN Quantization with Attention - [ArXiv] [QA].
- Concentric Spherical GNN for 3D Representation Learning - [ArXiv] [QA].
- FastNeRF: High-Fidelity Neural Rendering at 200FPS - [ArXiv] [QA].
- GLM: General Language Model Pretraining with Autoregressive Blank Infilling - [ArXiv] [QA].
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE - [ArXiv] [QA].
- ENCONTER: Entity Constrained Progressive Sequence Generation via Insertion-based Transformer - [ArXiv] [QA].
- Online Adversarial Attacks - [ArXiv] [QA].
- Mixture of Volumetric Primitives for Efficient Neural Rendering - [ArXiv] [QA].
February 2021
- Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing - [ArXiv] [QA].
- Deep ReLU Networks Preserve Expected Length - [ArXiv] [QA].
- Meta-Learning Dynamics Forecasting Using Task Inference - [ArXiv] [QA].
- ShaRF: Shape-conditioned Radiance Fields from a Single View - [ArXiv] [QA].
- DEUP: Direct Epistemic Uncertainty Prediction - [ArXiv] [QA].
- Topological Graph Neural Networks - [ArXiv] [QA].
- Contrastive Embeddings for Neural Architectures - [ArXiv] [QA].
- Hyperspherical embedding for novel class classification - [ArXiv] [QA].
- Learning Graph Embeddings for Compositional Zero-shot Learning - [ArXiv] [QA].
January 2021
- RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations - [ArXiv] [QA].
- Advances and Challenges in Conversational Recommender Systems: A Survey - [ArXiv] [QA].
- Evaluating Disentanglement of Structured Representations - [ArXiv] [QA].
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - [ArXiv] [QA].
- Max-Affine Spline Insights Into Deep Network Pruning - [ArXiv] [QA].
2020
December 2020
- Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation - [ArXiv] [QA].
- Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration - [ArXiv] [QA].
- ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language - [ArXiv] [QA].
- A Distributional Approach to Controlled Text Generation - [ArXiv] [QA].
- Transformer Interpretability Beyond Attention Visualization - [ArXiv] [QA].
- Neural Volume Rendering: NeRF And Beyond - [ArXiv] [QA].
- Keyword-Guided Neural Conversational Model - [ArXiv] [QA].
- CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts - [ArXiv] [QA].
- Image Inpainting Guided by Coherence Priors of Semantics and Textures - [ArXiv] [QA].
- Contrastive Learning with Adversarial Perturbations for Conditional Text Generation - [ArXiv] [QA].
- Active Learning: Problem Settings and Recent Developments - [ArXiv] [QA].
- Challenging common interpretability assumptions in feature attribution explanations - [ArXiv] [QA].
- Practical No-box Adversarial Attacks against DNNs - [ArXiv] [QA].
- pixelNeRF: Neural Radiance Fields from One or Few Images - [ArXiv] [QA].
- Learned Initializations for Optimizing Coordinate-Based Neural Representations - [ArXiv] [QA].
- Neural Prototype Trees for Interpretable Fine-grained Image Recognition - [ArXiv] [QA].
- CPM: A Large-scale Generative Chinese Pre-trained Language Model - [ArXiv] [QA].
November 2020
- DeRF: Decomposed Radiance Fields - [ArXiv] [QA].
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields - [ArXiv] [QA].
- Contextual Fusion For Adversarial Robustness - [ArXiv] [QA].
October 2020
- Learning to Actively Learn: A Robust Approach - [ArXiv] [QA].
- How Does the Task Landscape Affect MAML Performance? - [ArXiv] [QA].
- Interpretation of NLP models through input marginalization - [ArXiv] [QA].
- Towards falsifiable interpretability research - [ArXiv] [QA].
- CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation - [ArXiv] [QA].
- Improving Dialog Systems for Negotiation with Personality Modeling - [ArXiv] [QA].
- NeRF++: Analyzing and Improving Neural Radiance Fields - [ArXiv] [QA].
- Fairness-aware Agnostic Federated Learning - [ArXiv] [QA].
- GRF: Learning a General Radiance Field for 3D Representation and Rendering - [ArXiv] [QA].
- Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions - [ArXiv] [QA].
- MIME: MIMicking Emotions for Empathetic Response Generation - [ArXiv] [QA].
September 2020
- Learning to Plan and Realize Separately for Open-Ended Dialogue Systems - [ArXiv] [QA].
- From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation - [ArXiv] [QA].
- Understanding the Role of Individual Units in a Deep Neural Network - [ArXiv] [QA].
- Sample-Efficient Automated Deep Reinforcement Learning - [ArXiv] [QA].
- Learning to summarize from human feedback - [ArXiv] [QA].
August 2020
- A Survey of Deep Active Learning - [ArXiv] [QA].
- A Survey of Evaluation Metrics Used for NLG Systems - [ArXiv] [QA].
- A Survey of Active Learning for Text Classification using Deep Neural Networks - [ArXiv] [QA].
- Context-aware Feature Generation for Zero-shot Semantic Segmentation - [ArXiv] [QA].
- Adaptive Learning of Tensor Network Structures - [ArXiv] [QA].
- A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning - [ArXiv] [QA].
- Explainable Face Recognition - [ArXiv] [QA].
July 2020
- Learning Joint Spatial-Temporal Transformations for Video Inpainting - [ArXiv] [QA].
- Mixture Representation Learning with Coupled Autoencoders - [ArXiv] [QA].
- Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning - [ArXiv] [QA].
- Towards Deeper Graph Neural Networks - [ArXiv] [QA].
- DVI: Depth Guided Video Inpainting for Autonomous Driving - [ArXiv] [QA].
- Few-shot Scene-adaptive Anomaly Detection - [ArXiv] [QA].
- Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations - [ArXiv] [QA].
- GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis - [ArXiv] [QA].
- The Fyodorov-Hiary-Keating Conjecture. I - [ArXiv] [QA].
- Interactive Path Reasoning on Graph for Conversational Recommendation - [ArXiv] [QA].
June 2020
- PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning - [ArXiv] [QA].
- Generative causal explanations of black-box classifiers - [ArXiv] [QA].
- Unsupervised Evaluation of Interactive Dialog with DialoGPT - [ArXiv] [QA].
- Towards Understanding Label Smoothing - [ArXiv] [QA].
- Neural Parameter Allocation Search - [ArXiv] [QA].
- Augmented Sliced Wasserstein Distances - [ArXiv] [QA].
- DeeperGCN: All You Need to Train Deeper GCNs - [ArXiv] [QA].
- CoCon: A Self-Supervised Approach for Controlled Text Generation - [ArXiv] [QA].
- Situated and Interactive Multimodal Conversations - [ArXiv] [QA].
May 2020
- Language Models are Few-Shot Learners - [ArXiv] [QA].
- High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling - [ArXiv] [QA].
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - [ArXiv] [QA].
- Novel Policy Seeking with Constrained Optimization - [ArXiv] [QA].
- Mirror Descent Policy Optimization - [ArXiv] [QA].
- Normalized Attention Without Probability Cage - [ArXiv] [QA].
- Semantic Photo Manipulation with a Generative Image Prior - [ArXiv] [QA].
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation - [ArXiv] [QA].
- Learning an Unreferenced Metric for Online Dialogue Evaluation - [ArXiv] [QA].
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training - [ArXiv] [QA].
April 2020
- Consistent Video Depth Estimation - [ArXiv] [QA].
- Recipes for building an open-domain chatbot - [ArXiv] [QA].
- Multi-Domain Dialogue Acts and Response Co-Generation - [ArXiv] [QA].
- Federated Stochastic Gradient Langevin Dynamics - [ArXiv] [QA].
- Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling - [ArXiv] [QA].
- Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness - [ArXiv] [QA].
- TextGAIL: Generative Adversarial Imitation Learning for Text Generation - [ArXiv] [QA].
- There and Back Again: Revisiting Backpropagation Saliency Methods - [ArXiv] [QA].
- A Survey on Conversational Recommender Systems - [ArXiv] [QA].
March 2020
- Distributional Reinforcement Learning with Ensembles - [ArXiv] [QA].
- Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification - [ArXiv] [QA].
- XPersona: Evaluating Multilingual Personalized Chatbot - [ArXiv] [QA].
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes - [ArXiv] [QA].
- VCNet: A Robust Approach to Blind Image Inpainting - [ArXiv] [QA].
- Building and Interpreting Deep Similarity Models - [ArXiv] [QA].
- xCos: An Explainable Cosine Metric for Face Verification Task - [ArXiv] [QA].
- Benchmarking Graph Neural Networks - [ArXiv] [QA].
February 2020
- Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems - [ArXiv] [QA].
- Gradient Boosting Neural Networks: GrowNet - [ArXiv] [QA].
- Information Condensing Active Learning - [ArXiv] [QA].
- Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation - [ArXiv] [QA].
January 2020
2019
December 2019
- Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering - [ArXiv] [QA].
- Image Processing Using Multi-Code GAN Prior - [ArXiv] [QA].
November 2019
- Binarized Neural Architecture Search - [ArXiv] [QA].
- Region Normalization for Image Inpainting - [ArXiv] [QA].
- Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings - [ArXiv] [QA].
- Generating Persona Consistent Dialogues by Exploiting Natural Language Inference - [ArXiv] [QA].
- A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data - [ArXiv] [QA].
October 2019
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - [ArXiv] [QA].
- Understanding Deep Networks via Extremal Perturbations and Smooth Masks - [ArXiv] [QA].
- ALOHA: Artificial Learning of Human Attributes for Dialogue Agents - [ArXiv] [QA].
- A cost-effective method for improving and re-purposing large, pre-trained GANs by fine-tuning their class-embeddings - [ArXiv] [QA].
- Explaining image classifiers by removing input features using generative models - [ArXiv] [QA].
- Continual Learning in Neural Networks - [ArXiv] [QA].
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - [ArXiv] [QA].
September 2019
- Visual Explanation for Deep Metric Learning - [ArXiv] [QA].
- Improving Generative Visual Dialog by Answering Diverse Questions - [ArXiv] [QA].
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - [ArXiv] [QA].
- An Internal Learning Approach to Video Inpainting - [ArXiv] [QA].
- Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset - [ArXiv] [QA].
- CTRL: A Conditional Transformer Language Model for Controllable Generation - [ArXiv] [QA].
- ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons - [ArXiv] [QA].
- Image Inpainting with Learnable Bidirectional Attention Maps - [ArXiv] [QA].
- Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue - [ArXiv] [QA].
August 2019
- Copy-and-Paste Networks for Deep Video Inpainting - [ArXiv] [QA].
- Onion-Peel Networks for Deep Video Completion - [ArXiv] [QA].
- Efficient Deep Neural Networks - [ArXiv] [QA].
- StructureFlow: Image Inpainting via Structure-aware Appearance Flow - [ArXiv] [QA].
- Generative Image Inpainting with Submanifold Alignment - [ArXiv] [QA].
July 2019
- Benchmarking Attribution Methods with Relative Feature Importance - [ArXiv] [QA].
- Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning - [ArXiv] [QA].
- Generative Counterfactual Introspection for Explainable Deep Learning - [ArXiv] [QA].
- Learnable Gated Temporal Shift Module for Deep Video Inpainting - [ArXiv] [QA].
June 2019
- Improving performance of deep learning models with axiomatic attribution priors and expected gradients - [ArXiv] [QA].
- Factorized Mutual Information Maximization - [ArXiv] [QA].
- XRAI: Better Attributions Through Regions - [ArXiv] [QA].
- Image Synthesis with a Single (Robust) Classifier - [ArXiv] [QA].
- Zero-Shot Semantic Segmentation - [ArXiv] [QA].
- Rethinking Loss Design for Large-scale 3D Shape Retrieval - [ArXiv] [QA].
May 2019
- Align-and-Attend Network for Globally and Locally Coherent Video Inpainting - [ArXiv] [QA].
- Why do These Match? Explaining the Behavior of Image Similarity Models - [ArXiv] [QA].
- PEPSI++: Fast and Lightweight Network for Image Inpainting - [ArXiv] [QA].
- Deep Flow-Guided Video Inpainting - [ArXiv] [QA].
- Frame-Recurrent Video Inpainting by Robust Optical Flow Inference - [ArXiv] [QA].
- Deep Video Inpainting - [ArXiv] [QA].
April 2019
- Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN - [ArXiv] [QA].
- Deep Fusion Network for Image Completion - [ArXiv] [QA].
- Semantically Aligned Bias Reducing Zero Shot Learning - [ArXiv] [QA].
- Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting - [ArXiv] [QA].
- VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal - [ArXiv] [QA].
- On zero-shot recognition of generic objects - [ArXiv] [QA].
- Leveraging the Invariant Side of Generative Zero-Shot Learning - [ArXiv] [QA].
- Creativity Inspired Zero-Shot Learning - [ArXiv] [QA].
March 2019
- Pluralistic Image Completion - [ArXiv] [QA].
- Deep Reinforcement Learning of Volume-guided Progressive View Inpainting for 3D Point Scene Completion from a Single Depth Image - [ArXiv] [QA].
- CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog - [ArXiv] [QA].
- Stabilizing the Lottery Ticket Hypothesis - [ArXiv] [QA].
- Semantic-Guided Multi-Attention Localization for Zero-Shot Learning - [ArXiv] [QA].
February 2019
- SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color - [ArXiv] [QA].
- LS-Tree: Model Interpretation When the Data Are Linguistic - [ArXiv] [QA].
- Towards Automatic Concept-based Explanations - [ArXiv] [QA].
- Collaborative Sampling in Generative Adversarial Networks - [ArXiv] [QA].
January 2019
- Personalized Dialogue Generation with Diversified Traits - [ArXiv] [QA].
- Diffusion Variational Autoencoders - [ArXiv] [QA].
- Improving Sequence-to-Sequence Learning via Optimal Transport - [ArXiv] [QA].
- Foreground-aware Image Inpainting - [ArXiv] [QA].
- Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions - [ArXiv] [QA].
- Detecting Overfitting of Deep Generative Networks via Latent Recovery - [ArXiv] [QA].
- Visualizing Deep Similarity Networks - [ArXiv] [QA].
- EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning - [ArXiv] [QA].
- A Theoretical Analysis of Deep Q-Learning - [ArXiv] [QA].
2018
December 2018
- Adaptive Confidence Smoothing for Generalized Zero-Shot Learning - [ArXiv] [QA].
- Face Completion with Semantic Knowledge and Collaborative Adversarial Learning - [ArXiv] [QA].
- Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders - [ArXiv] [QA].
- Deep Inception Generative Network for Cognitive Image Inpainting - [ArXiv] [QA].
November 2018
- Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects - [ArXiv] [QA].
- Coordinate-based Texture Inpainting for Pose-Guided Image Generation - [ArXiv] [QA].
- GAN Dissection: Visualizing and Understanding Generative Adversarial Networks - [ArXiv] [QA].
- Generalized Zero-Shot Recognition based on Visually Semantic Embedding - [ArXiv] [QA].
- Scalable agent alignment via reward modeling: a research direction - [ArXiv] [QA].
- On Hallucinating Context and Background Pixels from a Face Mask using Multi-scale GANs - [ArXiv] [QA].
- Reward learning from human preferences and demonstrations in Atari - [ArXiv] [QA].
- CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling - [ArXiv] [QA].
- Generative Dual Adversarial Network for Generalized Zero-shot Learning - [ArXiv] [QA].
- Blockwise Parallel Decoding for Deep Autoregressive Models - [ArXiv] [QA].
- Image Chat: Engaging Grounded Conversations - [ArXiv] [QA].
October 2018
August 2018
- AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale - [ArXiv] [QA].
- Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning - [ArXiv] [QA].
July 2018
June 2018
- A Benchmark for Interpretability Methods in Deep Neural Networks - [ArXiv] [QA].
- This Looks Like That: Deep Learning for Interpretable Image Recognition - [ArXiv] [QA].
- Video Inpainting by Jointly Learning Temporal Structure and Spatial Details - [ArXiv] [QA].
- Free-Form Image Inpainting with Gated Convolution - [ArXiv] [QA].
- A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens - [ArXiv] [QA].
May 2018
- Rethinking Knowledge Graph Propagation for Zero-Shot Learning - [ArXiv] [QA].
- Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators - [ArXiv] [QA].
- Progressive Ensemble Networks for Zero-Shot Recognition - [ArXiv] [QA].
- Unsupervised Learning of Neural Networks to Explain Neural Networks - [ArXiv] [QA].
- A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations - [ArXiv] [QA].
- SPG-Net: Segmentation Prediction and Guidance Network for Image Inpainting - [ArXiv] [QA].
April 2018
- How convolutional neural network see the world - A survey of convolutional neural network visualization methods - [ArXiv] [QA].
- FaceShop: Deep Sketch-based Face Image Editing - [ArXiv] [QA].
- Subgoal Discovery for Hierarchical Dialogue Policy Learning - [ArXiv] [QA].
- Image Inpainting for Irregular Holes Using Partial Convolutions - [ArXiv] [QA].
March 2018
- Structural inpainting - [ArXiv] [QA].
- Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs - [ArXiv] [QA].
- Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge - [ArXiv] [QA].
- Preserving Semantic Relations for Zero-Shot Learning - [ArXiv] [QA].
February 2018
- Machine Theory of Mind - [ArXiv] [QA].
- Multimodal Explanations: Justifying Decisions and Pointing to the Evidence - [ArXiv] [QA].
- Singularities in Einstein-conformally coupled Higgs cosmological models - [ArXiv] [QA].
- Interpreting CNNs via Decision Trees - [ArXiv] [QA].
January 2018
2017
December 2017
- Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation - [ArXiv] [QA].
November 2017
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) - [ArXiv] [QA].
- Deep Image Prior - [ArXiv] [QA].
- Distilling a Neural Network Into a Soft Decision Tree - [ArXiv] [QA].
- Contextual-based Image Inpainting: Infer, Match, and Translate - [ArXiv] [QA].
October 2017
- Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks - [ArXiv] [QA].
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation - [ArXiv] [QA].
- Recent Advances in Zero-shot Recognition - [ArXiv] [QA].
September 2017
- Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces - [ArXiv] [QA].
- AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline - [ArXiv] [QA].
August 2017
July 2017
June 2017
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability - [ArXiv] [QA].
- SmoothGrad: removing noise by adding noise - [ArXiv] [QA].
- Attention Is All You Need - [ArXiv] [QA].
- Deep reinforcement learning from human preferences - [ArXiv] [QA].
May 2017
April 2017
January 2017
2016
November 2016
- High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis - [ArXiv] [QA].
- Gaze Embeddings for Zero-Shot Image Classification - [ArXiv] [QA].
- Visual Dialog - [ArXiv] [QA].
- Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation - [ArXiv] [QA].
- Learning a Deep Embedding Model for Zero-Shot Learning - [ArXiv] [QA].
October 2016
July 2016
- Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification - [ArXiv] [QA].
June 2016
May 2016
- An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild - [ArXiv] [QA].
April 2016
2015
2014