
ArXiv QA

(TBD) Automated ArXiv question answering via large language models

GitHub | Homepage | Simple QA - Model Database Space
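The [QA] pages linked below pair each paper with model-generated answers. A minimal sketch of that kind of pipeline is shown here: given a paper's title and abstract, build a paper-grounded prompt to send to an LLM. The template and the example question are illustrative assumptions, not the exact prompt used to produce this dataset.

```python
def build_qa_prompt(title: str, abstract: str, question: str) -> str:
    """Format a single paper-grounded QA prompt for an LLM.

    This template is a hypothetical illustration; the dataset's actual
    generation prompt may differ.
    """
    return (
        "You are answering questions about an arXiv paper.\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Example using one of the papers listed below.
    prompt = build_qa_prompt(
        title="Language Modeling Is Compression",
        abstract="(paper abstract goes here)",
        question="What is the main contribution of the paper?",
    )
    print(prompt)
```

The resulting string would then be passed to the language model of your choice; answers for every question are collected to form one [QA] entry per paper.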


List of Papers

2023

September 2023

  • SlimPajama-DC: Understanding Data Combinations for LLM Training - [ArXiv] [QA].
  • OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch - [ArXiv] [QA].
  • Language Modeling Is Compression - [ArXiv] [QA].
  • FoleyGen: Visually-Guided Audio Generation - [ArXiv] [QA].
  • Baichuan 2: Open Large-scale Language Models - [ArXiv] [QA].
  • 360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting - [ArXiv] [QA].
  • Stabilizing RLHF through Advantage Model and Selective Rehearsal - [ArXiv] [QA].
  • Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions - [ArXiv] [QA].
  • Multimodal Foundation Models: From Specialists to General-Purpose Assistants - [ArXiv] [QA].
  • MindAgent: Emergent Gaming Interaction - [ArXiv] [QA].
  • An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models - [ArXiv] [QA].
  • Adapting Large Language Models via Reading Comprehension - [ArXiv] [QA].
  • LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models - [ArXiv] [QA].
  • CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages - [ArXiv] [QA].
  • Augmenting text for spoken language understanding with Large Language Models - [ArXiv] [QA].
  • OWL: A Large Language Model for IT Operations - [ArXiv] [QA].
  • Contrastive Decoding Improves Reasoning in Large Language Models - [ArXiv] [QA].
  • Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) - [ArXiv] [QA].
  • Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? - [ArXiv] [QA].
  • PDFTriage: Question Answering over Long, Structured Documents - [ArXiv] [QA].
  • S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs - [ArXiv] [QA].
  • Stack-and-Delay: a new codebook pattern for music generation - [ArXiv] [QA].
  • Enhance audio generation controllability through representation similarity regularization - [ArXiv] [QA].
  • Sparse Autoencoders Find Highly Interpretable Features in Language Models - [ArXiv] [QA].
  • Compositional Foundation Models for Hierarchical Planning - [ArXiv] [QA].
  • Replacing softmax with ReLU in Vision Transformers - [ArXiv] [QA].
  • Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers - [ArXiv] [QA].
  • Scaling Laws for Sparsely-Connected Foundation Models - [ArXiv] [QA].
  • Cure the headache of Transformers via Collinear Constrained Attention - [ArXiv] [QA].
  • Investigating Answerability of LLMs for Long-Form Question Answering - [ArXiv] [QA].
  • LASER: LLM Agent with State-Space Exploration for Web Navigation - [ArXiv] [QA].
  • Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding - [ArXiv] [QA].
  • Retrieval-Augmented Text-to-Audio Generation - [ArXiv] [QA].
  • Leveraging Contextual Information for Effective Entity Salience Detection - [ArXiv] [QA].
  • Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models - [ArXiv] [QA].
  • A Data Source for Reasoning Embodied Agents - [ArXiv] [QA].
  • Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping - [ArXiv] [QA].
  • ALWOD: Active Learning for Weakly-Supervised Object Detection - [ArXiv] [QA].
  • Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning - [ArXiv] [QA].
  • TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting - [ArXiv] [QA].
  • Generative Image Dynamics - [ArXiv] [QA].
  • Ambiguity-Aware In-Context Learning with Large Language Models - [ArXiv] [QA].
  • Agents: An Open-source Framework for Autonomous Language Agents - [ArXiv] [QA].
  • TextBind: Multi-turn Interleaved Multimodal Instruction-following - [ArXiv] [QA].
  • OmnimatteRF: Robust Omnimatte with 3D Background Modeling - [ArXiv] [QA].
  • Efficiently Robustify Pre-trained Models - [ArXiv] [QA].
  • EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization - [ArXiv] [QA].
  • Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? - [ArXiv] [QA].
  • Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts - [ArXiv] [QA].
  • Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance - [ArXiv] [QA].
  • AudioSR: Versatile Audio Super-resolution at Scale - [ArXiv] [QA].
  • Text-Guided Generation and Editing of Compositional 3D Avatars - [ArXiv] [QA].
  • Tree-Structured Shading Decomposition - [ArXiv] [QA].
  • SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection - [ArXiv] [QA].
  • DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models - [ArXiv] [QA].
  • MagiCapture: High-Resolution Multi-Concept Portrait Customization - [ArXiv] [QA].
  • Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? - [ArXiv] [QA].
  • Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly - [ArXiv] [QA].
  • Dynamic NeRFs for Soccer Scenes - [ArXiv] [QA].
  • MPI-Flow: Learning Realistic Optical Flow with Multiplane Images - [ArXiv] [QA].
  • VLSlice: Interactive Vision-and-Language Slice Discovery - [ArXiv] [QA].
  • Generalizable Neural Fields as Partially Observed Neural Processes - [ArXiv] [QA].
  • Statistical Rejection Sampling Improves Preference Optimization - [ArXiv] [QA].
  • A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale - [ArXiv] [QA].
  • Learning Disentangled Avatars with Hybrid 3D Representations - [ArXiv] [QA].
  • LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning - [ArXiv] [QA].
  • InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation - [ArXiv] [QA].
  • Recovering from Privacy-Preserving Masking with Large Language Models - [ArXiv] [QA].
  • Modality Unifying Network for Visible-Infrared Person Re-Identification - [ArXiv] [QA].
  • Efficient Memory Management for Large Language Model Serving with PagedAttention - [ArXiv] [QA].
  • AstroLLaMA: Towards Specialized Foundation Models in Astronomy - [ArXiv] [QA].
  • Uncovering mesa-optimization algorithms in Transformers - [ArXiv] [QA].
  • Large Language Models for Compiler Optimization - [ArXiv] [QA].
  • SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors - [ArXiv] [QA].
  • PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models - [ArXiv] [QA].
  • Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips - [ArXiv] [QA].
  • Large Language Model for Science: A Study on P vs. NP - [ArXiv] [QA].
  • UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase - [ArXiv] [QA].
  • ITI-GEN: Inclusive Text-to-Image Generation - [ArXiv] [QA].
  • NExT-GPT: Any-to-Any Multimodal LLM - [ArXiv] [QA].
  • Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs - [ArXiv] [QA].
  • Textbooks Are All You Need II: phi-1.5 technical report - [ArXiv] [QA].
  • Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning - [ArXiv] [QA].
  • Class-Incremental Grouping Network for Continual Audio-Visual Learning - [ArXiv] [QA].
  • Multi3DRefer: Grounding Text Description to Multiple 3D Objects - [ArXiv] [QA].
  • Towards Viewpoint Robustness in Bird's Eye View Segmentation - [ArXiv] [QA].
  • Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color - [ArXiv] [QA].
  • 3D Implicit Transporter for Temporally Consistent Keypoint Discovery - [ArXiv] [QA].
  • Multi-view Self-supervised Disentanglement for General Image Denoising - [ArXiv] [QA].
  • Mitigating Word Bias in Zero-shot Prompt-based Classifiers - [ArXiv] [QA].
  • Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation - [ArXiv] [QA].
  • Effective Real Image Editing with Accelerated Iterative Diffusion Inversion - [ArXiv] [QA].
  • Neurons in Large Language Models: Dead, N-gram, Positional - [ArXiv] [QA].
  • Towards Real-World Burst Image Super-Resolution: Benchmark and Method - [ArXiv] [QA].
  • Towards Robust Model Watermark via Reducing Parametric Vulnerability - [ArXiv] [QA].
  • FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning - [ArXiv] [QA].
  • MADLAD-400: A Multilingual And Document-Level Large Audited Dataset - [ArXiv] [QA].
  • Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf - [ArXiv] [QA].
  • Dynamic Mesh-Aware Radiance Fields - [ArXiv] [QA].
  • When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale - [ArXiv] [QA].
  • Examining Autoexposure for Challenging Scenes - [ArXiv] [QA].
  • Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving - [ArXiv] [QA].
  • DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields - [ArXiv] [QA].
  • Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts - [ArXiv] [QA].
  • The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion - [ArXiv] [QA].
  • From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting - [ArXiv] [QA].
  • Towards Practical Capture of High-Fidelity Relightable Avatars - [ArXiv] [QA].
  • Unsupervised Object Localization with Representer Point Selection - [ArXiv] [QA].
  • Evaluation and Mitigation of Agnosia in Multimodal Large Language Models - [ArXiv] [QA].
  • CDFSL-V: Cross-Domain Few-Shot Learning for Videos - [ArXiv] [QA].
  • ImageBind-LLM: Multi-modality Instruction Tuning - [ArXiv] [QA].
  • Tracking Anything with Decoupled Video Segmentation - [ArXiv] [QA].
  • Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction - [ArXiv] [QA].
  • The Making and Breaking of Camouflage - [ArXiv] [QA].
  • ProPainter: Improving Propagation and Transformer for Video Inpainting - [ArXiv] [QA].
  • InstructDiffusion: A Generalist Modeling Interface for Vision Tasks - [ArXiv] [QA].
  • DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models - [ArXiv] [QA].
  • FLM-101B: An Open LLM and How to Train It with $100K Budget - [ArXiv] [QA].
  • Panoramas from Photons - [ArXiv] [QA].
  • SimNP: Learning Self-Similarity Priors Between Neural Points - [ArXiv] [QA].
  • Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption - [ArXiv] [QA].
  • Large-Scale Automatic Audiobook Creation - [ArXiv] [QA].
  • Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning - [ArXiv] [QA].
  • Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model - [ArXiv] [QA].
  • Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation - [ArXiv] [QA].
  • Temporal Collection and Distribution for Referring Video Object Segmentation - [ArXiv] [QA].
  • SyncDreamer: Generating Multiview-consistent Images from a Single-view Image - [ArXiv] [QA].
  • Large Language Models as Optimizers - [ArXiv] [QA].
  • Distribution-Aware Prompt Tuning for Vision-Language Models - [ArXiv] [QA].
  • Robotic Table Tennis: A Case Study into a High Speed Learning System - [ArXiv] [QA].
  • Matcha-TTS: A fast TTS architecture with conditional flow matching - [ArXiv] [QA].
  • Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields - [ArXiv] [QA].
  • SLiMe: Segment Like Me - [ArXiv] [QA].
  • ResFields: Residual Neural Fields for Spatiotemporal Signals - [ArXiv] [QA].
  • MyoDex: A Generalizable Prior for Dexterous Manipulation - [ArXiv] [QA].
  • Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction - [ArXiv] [QA].
  • GPT Can Solve Mathematical Problems Without a Calculator - [ArXiv] [QA].
  • Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning - [ArXiv] [QA].
  • Physically Grounded Vision-Language Models for Robotic Manipulation - [ArXiv] [QA].
  • A skeletonization algorithm for gradient-based optimization - [ArXiv] [QA].
  • GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction - [ArXiv] [QA].
  • Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach - [ArXiv] [QA].
  • EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding - [ArXiv] [QA].
  • Doppelgangers: Learning to Disambiguate Images of Similar Structures - [ArXiv] [QA].
  • Generating Realistic Images from In-the-wild Sounds - [ArXiv] [QA].
  • Prototype-based Dataset Comparison - [ArXiv] [QA].
  • CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning - [ArXiv] [QA].
  • Multi-label affordance mapping from egocentric vision - [ArXiv] [QA].
  • Iterative Superquadric Recomposition of 3D Objects from Multiple Views - [ArXiv] [QA].
  • Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples - [ArXiv] [QA].
  • RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image - [ArXiv] [QA].
  • NICE: CVPR 2023 Challenge on Zero-shot Image Captioning - [ArXiv] [QA].
  • Empowering Low-Light Image Enhancer through Customized Learnable Priors - [ArXiv] [QA].
  • Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations - [ArXiv] [QA].
  • Are Emergent Abilities in Large Language Models just In-Context Learning? - [ArXiv] [QA].
  • Mask-Attention-Free Transformer for 3D Instance Segmentation - [ArXiv] [QA].
  • AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion - [ArXiv] [QA].
  • Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification - [ArXiv] [QA].
  • EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity - [ArXiv] [QA].
  • SOAR: Scene-debiasing Open-set Action Recognition - [ArXiv] [QA].
  • Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning - [ArXiv] [QA].
  • LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models - [ArXiv] [QA].
  • EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment - [ArXiv] [QA].
  • Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration - [ArXiv] [QA].
  • CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection - [ArXiv] [QA].
  • Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning - [ArXiv] [QA].
  • ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - [ArXiv] [QA].
  • eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models - [ArXiv] [QA].
  • Two-in-One Depth: Bridging the Gap Between Monocular and Binocular Self-supervised Depth Estimation - [ArXiv] [QA].
  • Domain Generalization via Balancing Training Difficulty and Model Capability - [ArXiv] [QA].
  • Few shot font generation via transferring similarity guided global style and quantization local style - [ArXiv] [QA].
  • Instability of the solitary waves for the Generalized Benjamin-Bona-Mahony Equation - [ArXiv] [QA].
  • Contrastive Feature Masking Open-Vocabulary Vision Transformer - [ArXiv] [QA].
  • Searching for a Leptophilic Z' and a 3-3-1 symmetry at CLIC - [ArXiv] [QA].
  • Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following - [ArXiv] [QA].
  • CityDreamer: Compositional Generative Model of Unbounded 3D Cities - [ArXiv] [QA].
  • Rieger, Schwabe, Suess-de Vries: The Sunny Beats of Resonance - [ArXiv] [QA].
  • VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation - [ArXiv] [QA].
  • Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior - [ArXiv] [QA].
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback - [ArXiv] [QA].
  • A Massively Parallel Dynamic Programming for Approximate Rectangle Escape Problem - [ArXiv] [QA].
  • Object-Centric Multiple Object Tracking - [ArXiv] [QA].
  • Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation - [ArXiv] [QA].
  • Pseudo-magnetic fields in square lattices - [ArXiv] [QA].
  • Empirical Modeling of Variance in Medium Frequency R-Mode Time-of-Arrival Measurements - [ArXiv] [QA].

August 2023

  • Block occurrences in the binary expansion - [ArXiv] [QA].
  • YaRN: Efficient Context Window Extension of Large Language Models - [ArXiv] [QA].
  • SoDaCam: Software-defined Cameras via Single-Photon Imaging - [ArXiv] [QA].
  • FACET: Fairness in Computer Vision Evaluation Benchmark - [ArXiv] [QA].
  • PointLLM: Empowering Large Language Models to Understand Point Clouds - [ArXiv] [QA].
  • StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation - [ArXiv] [QA].
  • InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion - [ArXiv] [QA].
  • EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild - [ArXiv] [QA].
  • GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields - [ArXiv] [QA].
  • TouchStone: Evaluating Vision-Language Models by Language Models - [ArXiv] [QA].
  • The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants - [ArXiv] [QA].
  • SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation - [ArXiv] [QA].
  • Coarse-to-Fine Amodal Segmentation with Shape Prior - [ArXiv] [QA].
  • Can Programming Languages Boost Each Other via Instruction Tuning? - [ArXiv] [QA].
  • Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models - [ArXiv] [QA].
  • Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images - [ArXiv] [QA].
  • Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images - [ArXiv] [QA].
  • MVDream: Multi-view Diffusion for 3D Generation - [ArXiv] [QA].
  • PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction - [ArXiv] [QA].
  • Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models - [ArXiv] [QA].
  • Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery - [ArXiv] [QA].
  • BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge - [ArXiv] [QA].
  • Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff - [ArXiv] [QA].
  • Emergence of Segmentation with Minimalistic White-Box Transformers - [ArXiv] [QA].
  • Active Neural Mapping - [ArXiv] [QA].
  • Learning Vision-based Pursuit-Evasion Robot Policies - [ArXiv] [QA].
  • SAM-Med2D - [ArXiv] [QA].
  • MMVP: Motion-Matrix-based Video Prediction - [ArXiv] [QA].
  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models - [ArXiv] [QA].
  • Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion - [ArXiv] [QA].
  • RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation - [ArXiv] [QA].
  • WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model - [ArXiv] [QA].
  • LLaSM: Large Language and Speech Model - [ArXiv] [QA].
  • Reconstructing Groups of People with Hypergraph Relational Reasoning - [ArXiv] [QA].
  • Introducing Language Guidance in Prompt-based Continual Learning - [ArXiv] [QA].
  • WeatherBench 2: A benchmark for the next generation of data-driven global weather models - [ArXiv] [QA].
  • Canonical Factors for Hybrid Neural Fields - [ArXiv] [QA].
  • Shatter and Gather: Learning Referring Image Segmentation with Text Supervision - [ArXiv] [QA].
  • Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation - [ArXiv] [QA].
  • CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation - [ArXiv] [QA].
  • Evaluation and Analysis of Hallucination in Large Vision-Language Models - [ArXiv] [QA].
  • Learning to Upsample by Learning to Sample - [ArXiv] [QA].
  • Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery - [ArXiv] [QA].
  • Exploring Model Transferability through the Lens of Potential Energy - [ArXiv] [QA].
  • Pose-Free Neural Radiance Fields via Implicit Pose Regularization - [ArXiv] [QA].
  • Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models - [ArXiv] [QA].
  • Vision Grid Transformer for Document Layout Analysis - [ArXiv] [QA].
  • LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks - [ArXiv] [QA].
  • Read-only Prompt Optimization for Vision-Language Few-shot Learning - [ArXiv] [QA].
  • NSF: Neural Surface Fields for Human Modeling from Monocular Depth - [ArXiv] [QA].
  • CLNeRF: Continual Learning Meets NeRF - [ArXiv] [QA].
  • Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond - [ArXiv] [QA].
  • R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras - [ArXiv] [QA].
  • S-TREK: Sequential Translation and Rotation Equivariant Keypoints for local feature extraction - [ArXiv] [QA].
  • Referring Image Segmentation Using Text Supervision - [ArXiv] [QA].
  • LAC: Latent Action Composition for Skeleton-based Action Segmentation - [ArXiv] [QA].
  • Priority-Centric Human Motion Generation in Discrete Latent Space - [ArXiv] [QA].
  • Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor - [ArXiv] [QA].
  • Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection - [ArXiv] [QA].
  • HoloFusion: Towards Photo-realistic 3D Generative Modeling - [ArXiv] [QA].
  • Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks - [ArXiv] [QA].
  • Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers - [ArXiv] [QA].
  • Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario - [ArXiv] [QA].
  • MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records - [ArXiv] [QA].
  • 4D Myocardium Reconstruction with Decoupled Motion and Shape Model - [ArXiv] [QA].
  • Reconstructing Interacting Hands with Interaction Prior from Monocular Images - [ArXiv] [QA].
  • Nonrigid Object Contact Estimation With Regional Unwrapping Transformer - [ArXiv] [QA].
  • Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection - [ArXiv] [QA].
  • Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation - [ArXiv] [QA].
  • Calibrating Panoramic Depth Estimation for Practical Localization and Mapping - [ArXiv] [QA].
  • LDL: Line Distance Functions for Panoramic Localization - [ArXiv] [QA].
  • Prior-guided Source-free Domain Adaptation for Human Pose Estimation - [ArXiv] [QA].
  • Late Stopping: Avoiding Confidently Learning from Mislabeled Examples - [ArXiv] [QA].
  • Beyond One-to-One: Rethinking the Referring Image Segmentation - [ArXiv] [QA].
  • Point-Query Quadtree for Crowd Counting, Localization, and More - [ArXiv] [QA].
  • ORES: Open-vocabulary Responsible Visual Synthesis - [ArXiv] [QA].
  • Generalized Lightness Adaptation with Channel Selective Normalization - [ArXiv] [QA].
  • MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree - [ArXiv] [QA].
  • ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning - [ArXiv] [QA].
  • Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers - [ArXiv] [QA].
  • Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models - [ArXiv] [QA].
  • Nougat: Neural Optical Understanding for Academic Documents - [ArXiv] [QA].
  • SoTaNa: The Open-Source Software Development Assistant - [ArXiv] [QA].
  • Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning - [ArXiv] [QA].
  • Relighting Neural Radiance Fields with Shadow and Highlight Hints - [ArXiv] [QA].
  • Distribution-Aligned Diffusion for Human Mesh Recovery - [ArXiv] [QA].
  • ConSlide: Asynchronous Hierarchical Interaction Transformer with Breakup-Reorganize Rehearsal for Continual Whole Slide Image Analysis - [ArXiv] [QA].
  • SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation - [ArXiv] [QA].
  • Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation - [ArXiv] [QA].
  • Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory - [ArXiv] [QA].
  • ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking - [ArXiv] [QA].
  • MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning - [ArXiv] [QA].
  • IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization - [ArXiv] [QA].
  • Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model - [ArXiv] [QA].
  • OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models - [ArXiv] [QA].
  • MLLM-DataEngine: An Iterative Refinement Approach for MLLM - [ArXiv] [QA].
  • Preserving Modality Structure Improves Multi-Modal Learning - [ArXiv] [QA].
  • NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes - [ArXiv] [QA].
  • Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation - [ArXiv] [QA].
  • Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities - [ArXiv] [QA].
  • Dense Text-to-Image Generation with Attention Modulation - [ArXiv] [QA].
  • Motion-Guided Masking for Spatiotemporal Representation Learning - [ArXiv] [QA].
  • Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment - [ArXiv] [QA].
  • Code Llama: Open Foundation Models for Code - [ArXiv] [QA].
  • Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? - [ArXiv] [QA].
  • On Offline Evaluation of 3D Object Detection for Autonomous Driving - [ArXiv] [QA].
  • LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition - [ArXiv] [QA].
  • VIGC: Visual Instruction Generation and Correction - [ArXiv] [QA].
  • A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions - [ArXiv] [QA].
  • Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation - [ArXiv] [QA].
  • Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects - [ArXiv] [QA].
  • Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation - [ArXiv] [QA].
  • Hyperbolic Audio-visual Zero-shot Learning - [ArXiv] [QA].
  • Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking - [ArXiv] [QA].
  • Masked Autoencoders are Efficient Class Incremental Learners - [ArXiv] [QA].
  • CGMI: Configurable General Multi-Agent Interaction Framework - [ArXiv] [QA].
  • With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning - [ArXiv] [QA].
  • Vision Transformer Adapters for Generalizable Multitask Learning - [ArXiv] [QA].
  • AdVerb: Visually Guided Audio Dereverberation - [ArXiv] [QA].
  • Continual Zero-Shot Learning through Semantically Guided Generative Random Walks - [ArXiv] [QA].
  • Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation - [ArXiv] [QA].
  • CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images - [ArXiv] [QA].
  • Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning - [ArXiv] [QA].
  • SG-Former: Self-guided Transformer with Evolving Token Reallocation - [ArXiv] [QA].
  • CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No - [ArXiv] [QA].
  • Sign Language Translation with Iterative Prototype - [ArXiv] [QA].
  • SILT: Shadow-aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels - [ArXiv] [QA].
  • DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration - [ArXiv] [QA].
  • Aligning Language Models with Offline Reinforcement Learning from Human Feedback - [ArXiv] [QA].
  • Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages - [ArXiv] [QA].
  • RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D - [ArXiv] [QA].
  • From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models - [ArXiv] [QA].
  • RankMixup: Ranking-Based Mixup Training for Network Calibration - [ArXiv] [QA].
  • Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields - [ArXiv] [QA].
  • LFS-GAN: Lifelong Few-Shot Image Generation - [ArXiv] [QA].
  • ACLS: Adaptive and Conditional Label Smoothing for Network Calibration - [ArXiv] [QA].
  • Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification - [ArXiv] [QA].
  • Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack - [ArXiv] [QA].
  • SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets - [ArXiv] [QA].
  • Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch - [ArXiv] [QA].
  • Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts - [ArXiv] [QA].
  • Understanding Hessian Alignment for Domain Generalization - [ArXiv] [QA].
  • Efficient Controllable Multi-Task Architectures - [ArXiv] [QA].
  • Delving into Motion-Aware Matching for Monocular 3D Object Tracking - [ArXiv] [QA].
  • StoryBench: A Multifaceted Benchmark for Continuous Story Visualization - [ArXiv] [QA].
  • SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation - [ArXiv] [QA].
  • Multi-event Video-Text Retrieval - [ArXiv] [QA].
  • TrackFlow: Multi-Object Tracking with Normalizing Flows - [ArXiv] [QA].
  • Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition - [ArXiv] [QA].
  • Learning a More Continuous Zero Level Set in Unsigned Distance Fields through Level Set Projection - [ArXiv] [QA].
  • A Survey on Large Language Model based Autonomous Agents - [ArXiv] [QA].
  • ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes - [ArXiv] [QA].
  • How Much Temporal Long-Term Context is Needed for Action Segmentation? - [ArXiv] [QA].
  • Exemplar-Free Continual Transformer with Convolutions - [ArXiv] [QA].
  • ProAgent: Building Proactive Cooperative AI with Large Language Models - [ArXiv] [QA].
  • GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training - [ArXiv] [QA].
  • CiteTracker: Correlating Image and Text for Visual Tracking - [ArXiv] [QA].
  • CNN based Cuneiform Sign Detection Learned from Annotated 3D Renderings and Mapped Photographs with Illumination Augmentation - [ArXiv] [QA].
  • HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations - [ArXiv] [QA].
  • ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts - [ArXiv] [QA].
  • LDP-Feat: Image Features with Local Differential Privacy - [ArXiv] [QA].
  • DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment - [ArXiv] [QA].
  • ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data - [ArXiv] [QA].
  • Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models - [ArXiv] [QA].
  • MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation - [ArXiv] [QA].
  • ReFit: Recurrent Fitting Network for 3D Human Recovery - [ArXiv] [QA].
  • Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation - [ArXiv] [QA].
  • Domain Generalization via Rationale Invariance - [ArXiv] [QA].
  • Efficient View Synthesis with Neural Radiance Distribution Field - [ArXiv] [QA].
  • LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction - [ArXiv] [QA].
  • CAME: Contrastive Automated Model Evaluation - [ArXiv] [QA].
  • Recursive Video Lane Detection - [ArXiv] [QA].
  • MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers - [ArXiv] [QA].
  • Video OWL-ViT: Temporally-consistent open-world localization in video - [ArXiv] [QA].
  • Audio-Visual Class-Incremental Learning - [ArXiv] [QA].
  • TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection - [ArXiv] [QA].
  • Neural Amortized Inference for Nested Multi-agent Reasoning - [ArXiv] [QA].
  • MetaGCD: Learning to Continually Learn in Generalized Category Discovery - [ArXiv] [QA].
  • UnLoc: A Unified Framework for Video Localization Tasks - [ArXiv] [QA].
  • Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction - [ArXiv] [QA].
  • Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images - [ArXiv] [QA].
  • Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation - [ArXiv] [QA].
  • Can Language Models Learn to Listen? - [ArXiv] [QA].
  • EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition - [ArXiv] [QA].
  • Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction - [ArXiv] [QA].
  • Improving Continuous Sign Language Recognition with Cross-Lingual Signs - [ArXiv] [QA].
  • MGMAE: Motion Guided Masking for Video Masked Autoencoding - [ArXiv] [QA].
  • Instruction Tuning for Large Language Models: A Survey - [ArXiv] [QA].
  • WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models - [ArXiv] [QA].
  • On the Adversarial Robustness of Multi-Modal Foundation Models - [ArXiv] [QA].
  • Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction - [ArXiv] [QA].
  • Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification - [ArXiv] [QA].
  • A step towards understanding why classification helps regression - [ArXiv] [QA].
  • Image-free Classifier Injection for Zero-Shot Classification - [ArXiv] [QA].
  • CHORD: Category-level Hand-held Object Reconstruction via Shape Deformation - [ArXiv] [QA].
  • Self-Feedback DETR for Temporal Action Detection - [ArXiv] [QA].
  • Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations - [ArXiv] [QA].
  • QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection - [ArXiv] [QA].
  • Texture Generation on 3D Meshes with Point-UV Diffusion - [ArXiv] [QA].
  • ADNet: Lane Shape Prediction via Anchor Decomposition - [ArXiv] [QA].
  • STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning - [ArXiv] [QA].
  • Privacy-Preserving Face Recognition Using Random Frequency Components - [ArXiv] [QA].
  • Explore and Tell: Embodied Visual Captioning in 3D Environments - [ArXiv] [QA].
  • When Prompt-based Incremental Learning Does Not Meet Strong Pretraining - [ArXiv] [QA].
  • X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events - [ArXiv] [QA].
  • GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems - [ArXiv] [QA].
  • Diffusion Model as Representation Learner - [ArXiv] [QA].
  • Simple Baselines for Interactive Video Retrieval with Questions and Answers - [ArXiv] [QA].
  • Strata-NeRF: Neural Radiance Fields for Stratified Scenes - [ArXiv] [QA].
  • Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos - [ArXiv] [QA].
  • Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting - [ArXiv] [QA].
  • DVGaze: Dual-View Gaze Estimation - [ArXiv] [QA].
  • Representation Disparity-aware Distillation for 3D Object Detection - [ArXiv] [QA].
  • Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation - [ArXiv] [QA].
  • Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video - [ArXiv] [QA].
  • DomainAdaptor: A Novel Approach to Test-time Adaptation - [ArXiv] [QA].
  • DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization - [ArXiv] [QA].
  • CharacterChat: Learning towards Conversational AI with Personalized Social Support - [ArXiv] [QA].
  • StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data - [ArXiv] [QA].
  • GeT: Generative Target Structure Debiasing for Domain Adaptation - [ArXiv] [QA].
  • ViT-Lens: Towards Omni-modal Representations - [ArXiv] [QA].
  • Neural Interactive Keypoint Detection - [ArXiv] [QA].
  • VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation - [ArXiv] [QA].
  • FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory - [ArXiv] [QA].
  • Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection - [ArXiv] [QA].
  • ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer - [ArXiv] [QA].
  • OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision - [ArXiv] [QA].
  • ExpeL: LLM Agents Are Experiential Learners - [ArXiv] [QA].
  • March in Chat: Interactive Prompting for Remote Embodied Referring Expression - [ArXiv] [QA].
  • TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective - [ArXiv] [QA].
  • 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation - [ArXiv] [QA].
  • HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation - [ArXiv] [QA].
  • Robust Mixture-of-Expert Training for Convolutional Neural Networks - [ArXiv] [QA].
  • Root Pose Decomposition Towards Generic Non-rigid 3D Reconstruction with Monocular Videos - [ArXiv] [QA].
  • Single Image Reflection Separation via Component Synergy - [ArXiv] [QA].
  • Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation - [ArXiv] [QA].
  • Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts - [ArXiv] [QA].
  • ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment - [ArXiv] [QA].
  • Disposable Transfer Learning for Selective Source Task Unlearning - [ArXiv] [QA].
  • Tackling Vision Language Tasks Through Learning Inner Monologues - [ArXiv] [QA].
  • Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos - [ArXiv] [QA].
  • Scene-Aware Feature Matching - [ArXiv] [QA].
  • Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling - [ArXiv] [QA].
  • On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion - [ArXiv] [QA].
  • Understanding Self-attention Mechanism via Dynamical System Perspective - [ArXiv] [QA].
  • BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions - [ArXiv] [QA].
  • MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition - [ArXiv] [QA].
  • VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations - [ArXiv] [QA].
  • Scalable Video Object Segmentation with Simplified Framework - [ArXiv] [QA].
  • SwinLSTM: Improving Spatiotemporal Prediction Accuracy using Swin Transformer and LSTM - [ArXiv] [QA].
  • Calibrating Uncertainty for Semi-Supervised Crowd Counting - [ArXiv] [QA].
  • Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders - [ArXiv] [QA].
  • A Theory of Topological Derivatives for Inverse Rendering of Geometry - [ArXiv] [QA].
  • How susceptible are LLMs to Logical Fallacies? - [ArXiv] [QA].
  • VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control - [ArXiv] [QA].
  • Long-range Multimodal Pretraining for Movie Understanding - [ArXiv] [QA].
  • Smoothness Similarity Regularization for Few-Shot GAN Adaptation - [ArXiv] [QA].
  • Robust Monocular Depth Estimation under Challenging Conditions - [ArXiv] [QA].
  • LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark - [ArXiv] [QA].
  • ChatHaruhi: Reviving Anime Character in Reality via Large Language Model - [ArXiv] [QA].
  • StableVideo: Text-driven Consistency-aware Diffusion Video Editing - [ArXiv] [QA].
  • WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct - [ArXiv] [QA].
  • PUMGPT: A Large Vision-Language Model for Product Understanding - [ArXiv] [QA].
  • Meta-ZSDETR: Zero-shot DETR with Meta-learning - [ArXiv] [QA].
  • Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning - [ArXiv] [QA].
  • Leveraging Intrinsic Properties for Non-Rigid Garment Alignment - [ArXiv] [QA].
  • ResQ: Residual Quantization for Video Perception - [ArXiv] [QA].
  • Vision Relation Transformer for Unbiased Scene Graph Generation - [ArXiv] [QA].
  • MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection - [ArXiv] [QA].
  • Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization - [ArXiv] [QA].
  • DReg-NeRF: Deep Registration for Neural Radiance Fields - [ArXiv] [QA].
  • Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events - [ArXiv] [QA].
  • Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models - [ArXiv] [QA].
  • RLIPv2: Fast Scaling of Relational Language-Image Pre-training - [ArXiv] [QA].
  • Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching - [ArXiv] [QA].
  • Audio-Visual Glance Network for Efficient Video Recognition - [ArXiv] [QA].
  • Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation - [ArXiv] [QA].
  • Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge - [ArXiv] [QA].
  • DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability - [ArXiv] [QA].
  • Human Part-wise 3D Motion Context Learning for Sign Language Recognition - [ArXiv] [QA].
  • NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning - [ArXiv] [QA].
  • Self-Calibrated Cross Attention Network for Few-Shot Segmentation - [ArXiv] [QA].
  • Diverse Cotraining Makes Strong Semi-Supervised Segmentor - [ArXiv] [QA].
  • Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos - [ArXiv] [QA].
  • Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos - [ArXiv] [QA].
  • SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos - [ArXiv] [QA].
  • ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation - [ArXiv] [QA].
  • Generalized Sum Pooling for Metric Learning - [ArXiv] [QA].
  • FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning - [ArXiv] [QA].
  • The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation - [ArXiv] [QA].
  • ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection - [ArXiv] [QA].
  • SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning - [ArXiv] [QA].
  • Reinforced Self-Training (ReST) for Language Modeling - [ArXiv] [QA].
  • Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction - [ArXiv] [QA].
  • Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification - [ArXiv] [QA].
  • Event-Guided Procedure Planning from Instructional Videos with Text Supervision - [ArXiv] [QA].
  • Towards Semi-supervised Learning with Non-random Missing Labels - [ArXiv] [QA].
  • Spatially and Spectrally Consistent Deep Functional Maps - [ArXiv] [QA].
  • Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling - [ArXiv] [QA].
  • Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction - [ArXiv] [QA].
  • MixBag: Bag-Level Data Augmentation for Learning from Label Proportions - [ArXiv] [QA].
  • Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts - [ArXiv] [QA].
  • Long-Range Grouping Transformer for Multi-View 3D Reconstruction - [ArXiv] [QA].
  • V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints - [ArXiv] [QA].
  • TeCH: Text-guided Reconstruction of Lifelike Clothed Humans - [ArXiv] [QA].
  • MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions - [ArXiv] [QA].
  • Learning to Distill Global Representation for Sparse-View CT - [ArXiv] [QA].
  • ALIP: Adaptive Language-Image Pre-training with Synthetic Caption - [ArXiv] [QA].
  • Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer - [ArXiv] [QA].
  • Agglomerative Transformer for Human-Object Interaction Detection - [ArXiv] [QA].
  • Membrane Potential Batch Normalization for Spiking Neural Networks - [ArXiv] [QA].
  • Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations - [ArXiv] [QA].
  • Dual-Stream Diffusion Net for Text-to-Video Generation - [ArXiv] [QA].
  • SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes - [ArXiv] [QA].
  • MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation - [ArXiv] [QA].
  • Inherent Redundancy in Spiking Neural Networks - [ArXiv] [QA].
  • Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network - [ArXiv] [QA].
  • Unsupervised Domain Adaptive Detection with Network Stability Analysis - [ArXiv] [QA].
  • Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis - [ArXiv] [QA].
  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework - [ArXiv] [QA].
  • GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds - [ArXiv] [QA].
  • OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution - [ArXiv] [QA].
  • View Consistent Purification for Accurate Cross-View Localization - [ArXiv] [QA].
  • DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory - [ArXiv] [QA].
  • Teach LLMs to Personalize -- An Approach inspired by Writing Education - [ArXiv] [QA].
  • CoDeF: Content Deformation Fields for Temporally Consistent Video Processing - [ArXiv] [QA].
  • RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models - [ArXiv] [QA].
  • Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification - [ArXiv] [QA].
  • Helping Hands: An Object-Aware Ego-Centric Video Recognition Model - [ArXiv] [QA].
  • Relightable and Animatable Neural Avatar from Sparse-View Video - [ArXiv] [QA].
  • Memory-and-Anticipation Transformer for Online Action Understanding - [ArXiv] [QA].
  • Link-Context Learning for Multimodal LLMs - [ArXiv] [QA].
  • ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces - [ArXiv] [QA].
  • StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models - [ArXiv] [QA].
  • ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition - [ArXiv] [QA].
  • Learning to Identify Critical States for Reinforcement Learning from Videos - [ArXiv] [QA].
  • DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding - [ArXiv] [QA].
  • Identity-Consistent Aggregation for Video Object Detection - [ArXiv] [QA].
  • UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation - [ArXiv] [QA].
  • DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models - [ArXiv] [QA].
  • Boosting Multi-modal Model Performance with Adaptive Gradient Modulation - [ArXiv] [QA].
  • Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval - [ArXiv] [QA].
  • Backpropagation Path Search On Adversarial Transferability - [ArXiv] [QA].
  • Story Visualization by Online Text Augmentation with Context Memory - [ArXiv] [QA].
  • 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack - [ArXiv] [QA].
  • DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation - [ArXiv] [QA].
  • Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering - [ArXiv] [QA].
  • Text Injection for Capitalization and Turn-Taking Prediction in Speech Models - [ArXiv] [QA].
  • PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects - [ArXiv] [QA].
  • Platypus: Quick, Cheap, and Powerful Refinement of LLMs - [ArXiv] [QA].
  • Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation - [ArXiv] [QA].
  • Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation - [ArXiv] [QA].
  • The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation - [ArXiv] [QA].
  • RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs - [ArXiv] [QA].
  • Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning - [ArXiv] [QA].
  • ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate - [ArXiv] [QA].
  • OctoPack: Instruction Tuning Code Large Language Models - [ArXiv] [QA].
  • CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation - [ArXiv] [QA].
  • Masked Motion Predictors are Strong 3D Action Representation Learners - [ArXiv] [QA].
  • S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields - [ArXiv] [QA].
  • ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion - [ArXiv] [QA].
  • Global Features are All You Need for Image Retrieval and Reranking - [ArXiv] [QA].
  • Knowing Where to Focus: Event-aware Transformer for Video Grounding - [ArXiv] [QA].
  • CBA: Improving Online Continual Learning via Continual Bias Adaptor - [ArXiv] [QA].
  • CausalLM is not optimal for in-context learning - [ArXiv] [QA].
  • Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking - [ArXiv] [QA].
  • Towards Open-Set Test-Time Adaptation Utilizing the Wisdom of Crowds in Entropy Minimization - [ArXiv] [QA].
  • SpeechX: Neural Codec Language Model as a Versatile Speech Transformer - [ArXiv] [QA].
  • RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks - [ArXiv] [QA].
  • Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning - [ArXiv] [QA].
  • Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches - [ArXiv] [QA].
  • Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan - [ArXiv] [QA].
  • AerialVLN: Vision-and-Language Navigation for UAVs - [ArXiv] [QA].
  • IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models - [ArXiv] [QA].
  • Compositional Feature Augmentation for Unbiased Scene Graph Generation - [ArXiv] [QA].
  • Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation - [ArXiv] [QA].
  • Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training - [ArXiv] [QA].
  • 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking - [ArXiv] [QA].
  • VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use - [ArXiv] [QA].
  • Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction - [ArXiv] [QA].
  • Revisiting Vision Transformer from the View of Path Ensemble - [ArXiv] [QA].
  • SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning - [ArXiv] [QA].
  • BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation - [ArXiv] [QA].
  • One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training - [ArXiv] [QA].
  • Tiny and Efficient Model for the Edge Detection Generalization - [ArXiv] [QA].
  • Multi-Label Knowledge Distillation - [ArXiv] [QA].
  • Detecting and Preventing Hallucinations in Large Vision Language Models - [ArXiv] [QA].
  • U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds - [ArXiv] [QA].
  • Enhancing Network Management Using Code Generated by Large Language Models - [ArXiv] [QA].
  • Self-Alignment with Instruction Backtranslation - [ArXiv] [QA].
  • FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods - [ArXiv] [QA].
  • Improving Joint Speech-Text Representations Without Alignment - [ArXiv] [QA].
  • Composable Function-preserving Expansions for Transformer Architectures - [ArXiv] [QA].
  • BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents - [ArXiv] [QA].
  • PIPPA: A Partially Synthetic Conversational Dataset - [ArXiv] [QA].
  • PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs - [ArXiv] [QA].
  • Follow Anything: Open-set detection, tracking, and following in real-time - [ArXiv] [QA].
  • AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining - [ArXiv] [QA].
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models - [ArXiv] [QA].
  • PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers - [ArXiv] [QA].
  • 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds - [ArXiv] [QA].
  • Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network - [ArXiv] [QA].
  • Cross-Domain Product Representation Learning for Rich-Content E-Commerce - [ArXiv] [QA].
  • Look at the Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation - [ArXiv] [QA].
  • LLM As DBA - [ArXiv] [QA].
  • Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation - [ArXiv] [QA].
  • Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation - [ArXiv] [QA].
  • SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data - [ArXiv] [QA].
  • Learning Gabor Texture Features for Fine-Grained Recognition - [ArXiv] [QA].
  • Enhancing Trust in LLM-Based AI Automation Agents: New Considerations and Future Challenges - [ArXiv] [QA].
  • Interaction-aware Joint Attention Estimation Using People Attributes - [ArXiv] [QA].
  • Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment - [ArXiv] [QA].
  • Flexible Isosurface Extraction for Gradient-Based Mesh Optimization - [ArXiv] [QA].
  • Pseudo-label Alignment for Semi-supervised Instance Segmentation - [ArXiv] [QA].
  • OpenProteinSet: Training data for structural biology at scale - [ArXiv] [QA].
  • RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation - [ArXiv] [QA].
  • Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI - [ArXiv] [QA].
  • LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation - [ArXiv] [QA].
  • Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution - [ArXiv] [QA].
  • Robust Object Modeling for Visual Tracking - [ArXiv] [QA].
  • IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models - [ArXiv] [QA].
  • Foreground Object Search by Distilling Composite Image Feature - [ArXiv] [QA].
  • Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation - [ArXiv] [QA].
  • SelectNAdapt: Support Set Selection for Few-Shot Domain Adaptation - [ArXiv] [QA].
  • WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields - [ArXiv] [QA].
  • PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration - [ArXiv] [QA].
  • Objects do not disappear: Video object detection by single-frame object location anticipation - [ArXiv] [QA].
  • Bird's-Eye-View Scene Graph for Vision-Language Navigation - [ArXiv] [QA].
  • JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models - [ArXiv] [QA].
  • GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization - [ArXiv] [QA].
  • Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising - [ArXiv] [QA].
  • Accelerating LLM Inference with Staged Speculative Decoding - [ArXiv] [QA].
  • Rendering Humans from Object-Occluded Monocular Videos - [ArXiv] [QA].
  • Shepherd: A Critic for Language Model Generation - [ArXiv] [QA].
  • LATR: 3D Lane Detection from Monocular Images with Transformer - [ArXiv] [QA].
  • FocalFormer3D: Focusing on Hard Instance for 3D Object Detection - [ArXiv] [QA].
  • Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation - [ArXiv] [QA].
  • DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds - [ArXiv] [QA].
  • 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment - [ArXiv] [QA].
  • Exploring Transformers for Open-world Instance Segmentation - [ArXiv] [QA].
  • D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation - [ArXiv] [QA].
  • Under-Display Camera Image Restoration with Scattering Effect - [ArXiv] [QA].
  • Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions - [ArXiv] [QA].
  • OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation - [ArXiv] [QA].
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering - [ArXiv] [QA].
  • Gentopia: A Collaborative Platform for Tool-Augmented LLMs - [ArXiv] [QA].
  • AgentSims: An Open-Source Sandbox for Large Language Model Evaluation - [ArXiv] [QA].
  • Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning - [ArXiv] [QA].
  • Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval - [ArXiv] [QA].
  • PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection - [ArXiv] [QA].
  • TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models - [ArXiv] [QA].
  • From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal - [ArXiv] [QA].
  • 3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields - [ArXiv] [QA].
  • Tiny LVLM-eHub: Early Multimodal Experiments with Bard - [ArXiv] [QA].
  • AgentBench: Evaluating LLMs as Agents - [ArXiv] [QA].
  • Learning Concise and Descriptive Attributes for Visual Recognition - [ArXiv] [QA].
  • FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision - [ArXiv] [QA].
  • Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising - [ArXiv] [QA].
  • GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images - [ArXiv] [QA].
  • Heterogeneous Forgetting Compensation for Class-Incremental Learning - [ArXiv] [QA].
  • Dual Aggregation Transformer for Image Super-Resolution - [ArXiv] [QA].
  • Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots - [ArXiv] [QA].
  • SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs - [ArXiv] [QA].
  • Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation - [ArXiv] [QA].
  • A Benchmark for Chinese-English Scene Text Image Super-resolution - [ArXiv] [QA].
  • Source-free Domain Adaptive Human Pose Estimation - [ArXiv] [QA].
  • Prototypes-oriented Transductive Few-shot Learning with Conditional Transport - [ArXiv] [QA].
  • Learning Fine-Grained Features for Pixel-wise Video Correspondences - [ArXiv] [QA].
  • Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection - [ArXiv] [QA].
  • An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability - [ArXiv] [QA].
  • Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation - [ArXiv] [QA].
  • Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis - [ArXiv] [QA].
  • EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education - [ArXiv] [QA].
  • DeDrift: Robust Similarity Search under Content Drift - [ArXiv] [QA].
  • MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities - [ArXiv] [QA].
  • Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization - [ArXiv] [QA].
  • The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World - [ArXiv] [QA].
  • DETR Doesn't Need Multi-Scale or Locality Design - [ArXiv] [QA].
  • Scaling Relationship on Learning Mathematical Reasoning with Large Language Models - [ArXiv] [QA].
  • RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension - [ArXiv] [QA].
  • Point2Mask: Point-supervised Panoptic Segmentation via Optimal Transport - [ArXiv] [QA].
  • Ambient Adventures: Teaching ChatGPT on Developing Complex Stories - [ArXiv] [QA].
  • LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment - [ArXiv] [QA].
  • InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent - [ArXiv] [QA].
  • Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation - [ArXiv] [QA].
  • MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies - [ArXiv] [QA].
  • Multimodal Neurons in Pretrained Text-Only Transformers - [ArXiv] [QA].
  • TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations - [ArXiv] [QA].
  • Target-point Attention Transformer: A novel trajectory predict network for end-to-end autonomous driving - [ArXiv] [QA].
  • Efficient neural supersampling on a novel gaming dataset - [ArXiv] [QA].
  • HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions - [ArXiv] [QA].
  • On $κ$-solutions and canonical neighborhoods in 4d Ricci flow - [ArXiv] [QA].
  • OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models - [ArXiv] [QA].
  • DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales - [ArXiv] [QA].
  • Computational Long Exposure Mobile Photography - [ArXiv] [QA].
  • More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes - [ArXiv] [QA].
  • Revisiting DETR Pre-training for Object Detection - [ArXiv] [QA].
  • A Hyper-pixel-wise Contrastive Learning Augmented Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data - [ArXiv] [QA].
  • LSF-IDM: Automotive Intrusion Detection Model with Lightweight Attribution and Semantic Fusion - [ArXiv] [QA].
  • Geometric wakes in collimators and step transitions of arbitrary cross-sections: conformal mapping approach - [ArXiv] [QA].
  • One Tree to Rule Them All: Poly-Logarithmic Universal Steiner Tree - [ArXiv] [QA].
  • Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation - [ArXiv] [QA].
  • Three-level Dicke quantum battery - [ArXiv] [QA].
  • Multiobjective Optimization of Non-Smooth PDE-Constrained Problems - [ArXiv] [QA].
  • Black hole thermodynamics in Horndeski theories - [ArXiv] [QA].
  • MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening - [ArXiv] [QA].
  • Stability Analysis for a Class of Heterogeneous Catalysis Models - [ArXiv] [QA].
  • An improved infrastructure for the IceCube realtime system - [ArXiv] [QA].
  • Model-agnostic search for the quasinormal modes of gravitational wave echoes - [ArXiv] [QA].
  • Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach - [ArXiv] [QA].
  • From Sparse to Soft Mixtures of Experts - [ArXiv] [QA].
  • Cosmological Distance Measurement of 12 Nearby Supernovae IIP with ROTSE-IIIB - [ArXiv] [QA].
  • ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation - [ArXiv] [QA].
  • VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference - [ArXiv] [QA].
  • Weak localization in radiative transfer of acoustic waves in a randomly-fluctuating slab - [ArXiv] [QA].
  • Optimal design of plane elastic membranes using the convexified Föppl's model - [ArXiv] [QA].
  • Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction - [ArXiv] [QA].
  • LISA: Reasoning Segmentation via Large Language Model - [ArXiv] [QA].
  • Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models - [ArXiv] [QA].
  • Note: Stokes-Einstein relation without hydrodynamic diameter in the TIP4P/Ice water model - [ArXiv] [QA].
  • ELFNet: Evidential Local-global Fusion for Stereo Matching - [ArXiv] [QA].
  • Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model - [ArXiv] [QA].
  • Understanding URDF: A Dataset and Analysis - [ArXiv] [QA].
  • Stochastic Geometry Based Modeling and Analysis on Network NOMA in Downlink CoMP Systems - [ArXiv] [QA].
  • A many-sorted epistemic logic for chromatic hypergraphs - [ArXiv] [QA].
  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning - [ArXiv] [QA].
  • DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving - [ArXiv] [QA].
  • Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning - [ArXiv] [QA].
  • Deep Image Harmonization with Learnable Augmentation - [ArXiv] [QA].
  • Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation - [ArXiv] [QA].
  • MetaGPT: Meta Programming for Multi-Agent Collaborative Framework - [ArXiv] [QA].
  • Artifact: Measuring and Mitigating Gaps in Structural Testing - [ArXiv] [QA].
  • Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models - [ArXiv] [QA].
  • Online Prototype Learning for Online Continual Learning - [ArXiv] [QA].
  • CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering - [ArXiv] [QA].
  • Improving Pixel-based MIM by Reducing Wasted Modeling Capability - [ArXiv] [QA].
  • GOALS-JWST: Gas Dynamics and Excitation in NGC7469 revealed by NIRSpec - [ArXiv] [QA].

July 2023

  • Predicting masked tokens in stochastic locations improves masked image modeling - [ArXiv] [QA].
  • Learning to Model the World with Language - [ArXiv] [QA].
  • Discovering Adaptable Symbolic Algorithms from Scratch - [ArXiv] [QA].
  • Virtual Prompt Injection for Instruction-Tuned Large Language Models - [ArXiv] [QA].
  • Shortcut Partitions in Minor-Free Graphs: Steiner Point Removal, Distance Oracles, Tree Covers, and More - [ArXiv] [QA].
  • Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy - [ArXiv] [QA].
  • Random Sub-Samples Generation for Self-Supervised Real Image Denoising - [ArXiv] [QA].
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs - [ArXiv] [QA].
  • UniVTG: Towards Unified Video-Language Temporal Grounding - [ArXiv] [QA].
  • DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation - [ArXiv] [QA].
  • Guiding Image Captioning Models Toward More Specific Captions - [ArXiv] [QA].
  • CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification - [ArXiv] [QA].
  • Transferable Decoding with Visual Entities for Zero-Shot Image Captioning - [ArXiv] [QA].
  • Towards General Low-Light Raw Noise Synthesis and Modeling - [ArXiv] [QA].
  • MovieChat: From Dense Token to Sparse Memory for Long Video Understanding - [ArXiv] [QA].
  • DRAW: Defending Camera-shooted RAW against Image Manipulation - [ArXiv] [QA].
  • DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization - [ArXiv] [QA].
  • Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks - [ArXiv] [QA].
  • JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery - [ArXiv] [QA].
  • LP-MusicCaps: LLM-Based Pseudo Music Captioning - [ArXiv] [QA].
  • AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? - [ArXiv] [QA].
  • Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples - [ArXiv] [QA].
  • Evaluating ChatGPT and GPT-4 for Visual Programming - [ArXiv] [QA].
  • Unified Model for Image, Video, Audio and Language Tasks - [ArXiv] [QA].
  • Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models - [ArXiv] [QA].
  • SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension - [ArXiv] [QA].
  • XMem++: Production-level Video Segmentation From Few Annotated Frames - [ArXiv] [QA].
  • CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation - [ArXiv] [QA].
  • What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Network - [ArXiv] [QA].
  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - [ArXiv] [QA].
  • The Hydra Effect: Emergent Self-repair in Language Model Computations - [ArXiv] [QA].
  • MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking - [ArXiv] [QA].
  • Scaling Data Generation in Vision-and-Language Navigation - [ArXiv] [QA].
  • Robust Distortion-free Watermarks for Language Models - [ArXiv] [QA].
  • Exploring Format Consistency for Instruction Tuning - [ArXiv] [QA].
  • Uncertainty-aware Unsupervised Multi-Object Tracking - [ArXiv] [QA].
  • Supervised Homography Learning with Realistic Dataset Generation - [ArXiv] [QA].
  • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding - [ArXiv] [QA].
  • Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF - [ArXiv] [QA].
  • TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts - [ArXiv] [QA].
  • Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification - [ArXiv] [QA].
  • Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback - [ArXiv] [QA].
  • PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization - [ArXiv] [QA].
  • Med-Flamingo: a Multimodal Medical Few-shot Learner - [ArXiv] [QA].
  • Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields - [ArXiv] [QA].
  • To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation - [ArXiv] [QA].
  • Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation - [ArXiv] [QA].
  • Learning Depth Estimation for Transparent and Mirror Surfaces - [ArXiv] [QA].
  • Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models - [ArXiv] [QA].
  • TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis - [ArXiv] [QA].
  • Diverse Inpainting and Editing with GAN Inversion - [ArXiv] [QA].
  • How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges - [ArXiv] [QA].
  • Scaling TransNormer to 175 Billion Parameters - [ArXiv] [QA].
  • S³: Social-network Simulation System with Large Language Model-Empowered Agents - [ArXiv] [QA].
  • Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models - [ArXiv] [QA].
  • PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback - [ArXiv] [QA].
  • Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning - [ArXiv] [QA].
  • Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining - [ArXiv] [QA].
  • Test Time Adaptation for Blind Image Quality Assessment - [ArXiv] [QA].
  • P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds - [ArXiv] [QA].
  • Pre-training Vision Transformers with Very Limited Synthesized Images - [ArXiv] [QA].
  • Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation - [ArXiv] [QA].
  • 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking - [ArXiv] [QA].
  • NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection - [ArXiv] [QA].
  • TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation - [ArXiv] [QA].
  • Clustering based Point Cloud Representation Learning for 3D Analysis - [ArXiv] [QA].
  • Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition - [ArXiv] [QA].
  • MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation - [ArXiv] [QA].
  • Three Bricks to Consolidate Watermarks for Large Language Models - [ArXiv] [QA].
  • MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation - [ArXiv] [QA].
  • WavJourney: Compositional Audio Creation with Large Language Models - [ArXiv] [QA].
  • Towards Generalist Biomedical AI - [ArXiv] [QA].
  • G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory - [ArXiv] [QA].
  • Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences - [ArXiv] [QA].
  • ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation - [ArXiv] [QA].
  • Creative Birds: Self-Supervised Single-View 3D Style Transfer - [ArXiv] [QA].
  • Leveraging Implicit Feedback from Deployment Data in Dialogue - [ArXiv] [QA].
  • Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching - [ArXiv] [QA].
  • Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models - [ArXiv] [QA].
  • 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability - [ArXiv] [QA].
  • Controllable Guide-Space for Generalizable Face Forgery Detection - [ArXiv] [QA].
  • Adaptive Frequency Filters As Efficient Global Token Mixers - [ArXiv] [QA].
  • Tracking Anything in High Quality - [ArXiv] [QA].
  • AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception - [ArXiv] [QA].
  • Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception - [ArXiv] [QA].
  • trajdata: A Unified Interface to Multiple Human Trajectory Datasets - [ArXiv] [QA].
  • Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation - [ArXiv] [QA].
  • WebArena: A Realistic Web Environment for Building Autonomous Agents - [ArXiv] [QA].
  • How to Scale Your EMA - [ArXiv] [QA].
  • PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View - [ArXiv] [QA].
  • Composite Diffusion | whole >= Σparts - [ArXiv] [QA].
  • ARB: Advanced Reasoning Benchmark for Large Language Models - [ArXiv] [QA].
  • RecursiveDet: End-to-End Region-based Recursive Object Detection - [ArXiv] [QA].
  • Spectrum-guided Multi-granularity Referring Video Object Segmentation - [ArXiv] [QA].
  • Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection - [ArXiv] [QA].
  • FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios - [ArXiv] [QA].
  • Weakly-supervised 3D Pose Transfer with Keypoints - [ArXiv] [QA].
  • Predicting Code Coverage without Execution - [ArXiv] [QA].
  • Unmasking Anomalies in Road-Scene Segmentation - [ArXiv] [QA].
  • LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition - [ArXiv] [QA].
  • Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network - [ArXiv] [QA].
  • GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers - [ArXiv] [QA].
  • Strivec: Sparse Tri-Vector Radiance Fields - [ArXiv] [QA].
  • GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping - [ArXiv] [QA].
  • Contrastive Example-Based Control - [ArXiv] [QA].
  • LLM-Rec: Personalized Recommendation via Prompting Large Language Models - [ArXiv] [QA].
  • 3D-LLM: Injecting the 3D World into Large Language Models - [ArXiv] [QA].
  • Evaluating the Ripple Effects of Knowledge Editing in Language Models - [ArXiv] [QA].
  • RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment - [ArXiv] [QA].
  • GridMM: Grid Memory Map for Vision-and-Language Navigation - [ArXiv] [QA].
  • A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis - [ArXiv] [QA].
  • Multiscale Video Pretraining for Long-Term Activity Forecasting - [ArXiv] [QA].
  • Fast Full-frame Video Stabilization with Iterative Optimization - [ArXiv] [QA].
  • COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts - [ArXiv] [QA].
  • Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction - [ArXiv] [QA].
  • MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features - [ArXiv] [QA].
  • PG-RCNN: Semantic Surface Point Generation for 3D Object Detection - [ArXiv] [QA].
  • CTVIS: Consistent Training for Online Video Instance Segmentation - [ArXiv] [QA].
  • Less is More: Focus Attention for Efficient DETR - [ArXiv] [QA].
  • PRIOR: Prototype Representation Joint Learning from Medical Images and Reports - [ArXiv] [QA].
  • A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation - [ArXiv] [QA].
  • Interpolating between Images with Diffusion Models - [ArXiv] [QA].
  • PUMA: Secure Inference of LLaMA-7B in Five Minutes - [ArXiv] [QA].
  • TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition - [ArXiv] [QA].
  • Rethinking Data Distillation: Do Not Overlook Calibration - [ArXiv] [QA].
  • ProtoFL: Unsupervised Federated Learning via Prototypical Distillation - [ArXiv] [QA].
  • Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection - [ArXiv] [QA].
  • TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering - [ArXiv] [QA].
  • Downstream-agnostic Adversarial Examples - [ArXiv] [QA].
  • LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference - [ArXiv] [QA].
  • LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction - [ArXiv] [QA].
  • Optimized Network Architectures for Large Language Model Training with Billions of Parameters - [ArXiv] [QA].
  • Hallucination Improves the Performance of Unsupervised Visual Representation Learning - [ArXiv] [QA].
  • Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes - [ArXiv] [QA].
  • Discovering Spatio-Temporal Rationales for Video Question Answering - [ArXiv] [QA].
  • On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement - [ArXiv] [QA].
  • Learning Vision-and-Language Navigation from YouTube Videos - [ArXiv] [QA].
  • Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? - [ArXiv] [QA].
  • CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots - [ArXiv] [QA].
  • HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness - [ArXiv] [QA].
  • Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts - [ArXiv] [QA].
  • OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? - [ArXiv] [QA].
  • Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation - [ArXiv] [QA].
  • CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields - [ArXiv] [QA].
  • Prompting Large Language Models with Speech Recognition Abilities - [ArXiv] [QA].
  • FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields - [ArXiv] [QA].
  • Deep Directly-Trained Spiking Neural Networks for Object Detection - [ArXiv] [QA].
  • Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning - [ArXiv] [QA].
  • CLR: Channel-wise Lightweight Reprogramming for Continual Learning - [ArXiv] [QA].
  • Tuning Pre-trained Model via Moment Probing - [ArXiv] [QA].
  • Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields - [ArXiv] [QA].
  • DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport - [ArXiv] [QA].
  • MAS: Towards Resource-Efficient Federated Multiple-Task Learning - [ArXiv] [QA].
  • Brain2Music: Reconstructing Music from Human Brain Activity - [ArXiv] [QA].
  • AlignDet: Aligning Pre-training and Fine-tuning in Object Detection - [ArXiv] [QA].
  • Cascade-DETR: Delving into High-Quality Universal Object Detection - [ArXiv] [QA].
  • General Image-to-Image Translation with One-Shot Image Guidance - [ArXiv] [QA].
  • Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image - [ArXiv] [QA].
  • Improving Online Lane Graph Extraction by Object-Lane Clustering - [ArXiv] [QA].
  • Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery - [ArXiv] [QA].
  • PASTA: Pretrained Action-State Transformer Agents - [ArXiv] [QA].
  • FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets - [ArXiv] [QA].
  • Diffusion Sampling with Momentum for Mitigating Divergence Artifacts - [ArXiv] [QA].
  • The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning - [ArXiv] [QA].
  • BlendFace: Re-designing Identity Encoders for Face-Swapping - [ArXiv] [QA].
  • BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion - [ArXiv] [QA].
  • Meta-Transformer: A Unified Framework for Multimodal Learning - [ArXiv] [QA].
  • HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces - [ArXiv] [QA].
  • See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data - [ArXiv] [QA].
  • Urban Radiance Field Representation with Deformable Neural Mesh Primitives - [ArXiv] [QA].
  • Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV - [ArXiv] [QA].
  • Lighting up NeRF via Unsupervised Decomposition and Enhancement - [ArXiv] [QA].
  • SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models - [ArXiv] [QA].
  • Physics-Driven Turbulence Image Restoration with Stochastic Refinement - [ArXiv] [QA].
  • Flatness-Aware Minimization for Domain Generalization - [ArXiv] [QA].
  • Instruction-following Evaluation through Verbalizer Manipulation - [ArXiv] [QA].
  • EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization - [ArXiv] [QA].
  • TokenFlow: Consistent Diffusion Features for Consistent Video Editing - [ArXiv] [QA].
  • DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering - [ArXiv] [QA].
  • DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI - [ArXiv] [QA].
  • Challenges and Applications of Large Language Models - [ArXiv] [QA].
  • LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs - [ArXiv] [QA].
  • Improving Multimodal Datasets with Image Captioning - [ArXiv] [QA].
  • FABRIC: Personalizing Diffusion Models with Iterative Feedback - [ArXiv] [QA].
  • Android in the Wild: A Large-Scale Dataset for Android Device Control - [ArXiv] [QA].
  • Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples - [ArXiv] [QA].
  • MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions - [ArXiv] [QA].
  • Hierarchical Spatio-Temporal Representation Learning for Gait Recognition - [ArXiv] [QA].
  • What do neural networks learn in image classification? A frequency shortcut perspective - [ArXiv] [QA].
  • Density-invariant Features for Distant Point Cloud Registration - [ArXiv] [QA].
  • Text2Layer: Layered Image Generation using Latent Diffusion Model - [ArXiv] [QA].
  • Towards Building More Robust Models with Frequency Bias - [ArXiv] [QA].
  • Generative Prompt Model for Weakly Supervised Object Localization - [ArXiv] [QA].
  • Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation - [ArXiv] [QA].
  • CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation - [ArXiv] [QA].
  • AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks - [ArXiv] [QA].
  • Towards Saner Deep Image Registration - [ArXiv] [QA].
  • GlobalMapper: Arbitrary-Shaped Urban Layout Generation - [ArXiv] [QA].
  • Towards A Unified Agent with Foundation Models - [ArXiv] [QA].
  • Object-aware Gaze Target Detection - [ArXiv] [QA].
  • Promoting Exploration in Memory-Augmented Adam using Critical Momenta - [ArXiv] [QA].
  • Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration - [ArXiv] [QA].
  • ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning - [ArXiv] [QA].
  • Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla - [ArXiv] [QA].
  • OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation - [ArXiv] [QA].
  • Biomaker CA: a Biome Maker project using Cellular Automata - [ArXiv] [QA].
  • Llama 2: Open Foundation and Fine-Tuned Chat Models - [ArXiv] [QA].
  • Augmenting CLIP with Improved Visio-Linguistic Reasoning - [ArXiv] [QA].
  • NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF - [ArXiv] [QA].
  • How is ChatGPT's behavior changing over time? - [ArXiv] [QA].
  • GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution - [ArXiv] [QA].
  • Diffusion Models Beat GANs on Image Classification - [ArXiv] [QA].
  • AlpaGasus: Training A Better Alpaca with Fewer Data - [ArXiv] [QA].
  • TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT - [ArXiv] [QA].
  • Retentive Network: A Successor to Transformer for Large Language Models - [ArXiv] [QA].
  • BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs - [ArXiv] [QA].
  • Scale-Aware Modulation Meet Transformer - [ArXiv] [QA].
  • Does Visual Pretraining Help End-to-End Reasoning? - [ArXiv] [QA].
  • Cumulative Spatial Knowledge Distillation for Vision Transformers - [ArXiv] [QA].
  • DOT: A Distillation-Oriented Trainer - [ArXiv] [QA].
  • Measuring Faithfulness in Chain-of-Thought Reasoning - [ArXiv] [QA].
  • Question Decomposition Improves the Faithfulness of Model-Generated Reasoning - [ArXiv] [QA].
  • Planting a SEED of Vision in Large Language Model - [ArXiv] [QA].
  • Towards Viewpoint-Invariant Visual Recognition via Adversarial Training - [ArXiv] [QA].
  • Language Conditioned Traffic Generation - [ArXiv] [QA].
  • Communicative Agents for Software Development - [ArXiv] [QA].
  • INVE: Interactive Neural Video Editing - [ArXiv] [QA].
  • CoTracker: It is Better to Track Together - [ArXiv] [QA].
  • NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis - [ArXiv] [QA].
  • DreamTeacher: Pretraining Image Backbones with Deep Generative Models - [ArXiv] [QA].
  • Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts - [ArXiv] [QA].
  • Learning to Retrieve In-Context Examples for Large Language Models - [ArXiv] [QA].
  • Bootstrapping Vision-Language Learning with Decoupled Language Pre-training - [ArXiv] [QA].
  • DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations - [ArXiv] [QA].
  • HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models - [ArXiv] [QA].
  • In-context Autoencoder for Context Compression in a Large Language Model - [ArXiv] [QA].
  • InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation - [ArXiv] [QA].
  • Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation - [ArXiv] [QA].
  • mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs - [ArXiv] [QA].
  • Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models - [ArXiv] [QA].
  • Generating Benchmarks for Factuality Evaluation of Language Models - [ArXiv] [QA].
  • Copy Is All You Need - [ArXiv] [QA].
  • Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events - [ArXiv] [QA].
  • T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation - [ArXiv] [QA].
  • Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution - [ArXiv] [QA].
  • Instruction Mining: High-Quality Instruction Data Selection for Large Language Models - [ArXiv] [QA].
  • MMBench: Is Your Multi-modal Model an All-around Player? - [ArXiv] [QA].
  • SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning - [ArXiv] [QA].
  • VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View - [ArXiv] [QA].
  • PolyLM: An Open Source Polyglot Large Language Model - [ArXiv] [QA].
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models - [ArXiv] [QA].
  • Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations - [ArXiv] [QA].
  • Towards Robust and Efficient Continual Language Learning - [ArXiv] [QA].
  • Stack More Layers Differently: High-Rank Training Through Low-Rank Updates - [ArXiv] [QA].
  • Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives - [ArXiv] [QA].
  • Self-consistency for open-ended generations - [ArXiv] [QA].
  • EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone - [ArXiv] [QA].
  • Efficient 3D Articulated Human Generation with Layered Surface Volumes - [ArXiv] [QA].
  • Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features - [ArXiv] [QA].
  • Self-Supervised Learning with Lie Symmetries for Partial Differential Equations - [ArXiv] [QA].
  • Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration - [ArXiv] [QA].
  • Generative Pretraining in Multimodality - [ArXiv] [QA].
  • DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks - [ArXiv] [QA].
  • Test-Time Training on Video Streams - [ArXiv] [QA].
  • Monotone deep Boltzmann machines - [ArXiv] [QA].
  • Secrets of RLHF in Large Language Models Part I: PPO - [ArXiv] [QA].
  • Semantic-SAM: Segment and Recognize Anything at Any Granularity - [ArXiv] [QA].
  • SITTA: A Semantic Image-Text Alignment for Image Captioning - [ArXiv] [QA].
  • Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement - [ArXiv] [QA].
  • RoCo: Dialectic Multi-Robot Collaboration with Large Language Models - [ArXiv] [QA].
  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning - [ArXiv] [QA].
  • Large Language Models as General Pattern Machines - [ArXiv] [QA].
  • International Institutions for Advanced AI - [ArXiv] [QA].
  • VampNet: Music Generation via Masked Acoustic Token Modeling - [ArXiv] [QA].
  • AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System - [ArXiv] [QA].
  • RLTF: Reinforcement Learning from Unit Test Feedback - [ArXiv] [QA].
  • SVIT: Scaling up Visual Instruction Tuning - [ArXiv] [QA].
  • Toward Interactive Dictation - [ArXiv] [QA].
  • On decoder-only architecture for speech-to-text and large language model integration - [ArXiv] [QA].
  • Large Language Models for Supply Chain Optimization - [ArXiv] [QA].
  • Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation - [ArXiv] [QA].
  • AutoDecoding Latent 3D Diffusion Models - [ArXiv] [QA].
  • GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest - [ArXiv] [QA].
  • Solvent: A Framework for Protein Folding - [ArXiv] [QA].
  • Wireless Multi-Agent Generative AI: From Connected Intelligence to Collective Intelligence - [ArXiv] [QA].
  • Building Cooperative Embodied Agents Modularly with Large Language Models - [ArXiv] [QA].
  • What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? - [ArXiv] [QA].
  • Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners - [ArXiv] [QA].
  • Embodied Task Planning with Large Language Models - [ArXiv] [QA].
  • Collaborative Score Distillation for Consistent Visual Synthesis - [ArXiv] [QA].
  • mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding - [ArXiv] [QA].
  • On Hofstadter's G-sequence - [ArXiv] [QA].
  • Hybrid two-level MCMC for Bayesian Inverse Problems - [ArXiv] [QA].
  • Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection - [ArXiv] [QA].
  • Multi-Task Learning Improves Performance In Deep Argument Mining Models - [ArXiv] [QA].
  • EIGER IV: The cool 10$^4$K circumgalactic environment of high-$z$ galaxies reveals remarkably efficient IGM enrichment - [ArXiv] [QA].
  • Variational integrals on Hessian spaces: partial regularity for critical points - [ArXiv] [QA].
  • Characterisation of three-body loss in ${}^{166}$Er and optimised production of large Bose-Einstein condensates - [ArXiv] [QA].
  • SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions - [ArXiv] [QA].
  • Scalable quantum neural networks by few quantum resources - [ArXiv] [QA].
  • Visual Instruction Tuning with Polite Flamingo - [ArXiv] [QA].
  • NOMA-Assisted Grant-Free Transmission: How to Design Pre-Configured SNR Levels? - [ArXiv] [QA].
  • Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset - [ArXiv] [QA].
  • JourneyDB: A Benchmark for Generative Image Understanding - [ArXiv] [QA].
  • Almost sure bounds for a weighted Steinhaus random multiplicative function - [ArXiv] [QA].
  • DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment - [ArXiv] [QA].
  • Personality Traits in Large Language Models - [ArXiv] [QA].

June 2023

  • SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs - [ArXiv] [QA].
  • Statler: State-Maintaining Language Models for Embodied Reasoning - [ArXiv] [QA].
  • Preference Ranking Optimization for Human Alignment - [ArXiv] [QA].
  • LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding - [ArXiv] [QA].
  • End-to-end Autonomous Driving: Challenges and Frontiers - [ArXiv] [QA].
  • KITE: Keypoint-Conditioned Policies for Semantic Manipulation - [ArXiv] [QA].
  • Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language - [ArXiv] [QA].
  • Inferring the Goals of Communicating Agents from Actions and Instructions - [ArXiv] [QA].
  • Confidence Ranking for CTR Prediction - [ArXiv] [QA].
  • Explainable Multimodal Emotion Reasoning - [ArXiv] [QA].
  • MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation - [ArXiv] [QA].
  • Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic - [ArXiv] [QA].
  • Kosmos-2: Grounding Multimodal Large Language Models to the World - [ArXiv] [QA].
  • MotionGPT: Human Motion as a Foreign Language - [ArXiv] [QA].
  • SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality - [ArXiv] [QA].
  • Aligning Large Multi-Modal Model with Robust Instruction Tuning - [ArXiv] [QA].
  • DesCo: Learning Object Recognition with Rich Language Descriptions - [ArXiv] [QA].
  • A Survey on Multimodal Large Language Models - [ArXiv] [QA].
  • MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models - [ArXiv] [QA].
  • Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces - [ArXiv] [QA].
  • SoftGPT: Learn Goal-oriented Soft Object Manipulation Skills by Generative Pre-trained Heterogeneous Graph Transformer - [ArXiv] [QA].
  • Local 3D Editing via 3D Distillation of CLIP Knowledge - [ArXiv] [QA].
  • FFCV: Accelerating Training by Removing Data Bottlenecks - [ArXiv] [QA].
  • Mass-Producing Failures of Multimodal Systems with Language Models - [ArXiv] [QA].
  • SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling - [ArXiv] [QA].
  • Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion - [ArXiv] [QA].
  • RM-PRT: Realistic Robotic Manipulation Simulator and Benchmark with Progressive Reasoning Tasks - [ArXiv] [QA].
  • MotionGPT: Finetuned LLMs are General-Purpose Motion Generators - [ArXiv] [QA].
  • UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning - [ArXiv] [QA].
  • CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents - [ArXiv] [QA].
  • Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering - [ArXiv] [QA].
  • LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning - [ArXiv] [QA].
  • Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models - [ArXiv] [QA].
  • LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models - [ArXiv] [QA].
  • Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration - [ArXiv] [QA].
  • Re-Benchmarking Pool-Based Active Learning for Binary Classification - [ArXiv] [QA].
  • Toward Grounded Social Reasoning - [ArXiv] [QA].
  • Language to Rewards for Robotic Skill Synthesis - [ArXiv] [QA].
  • Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models - [ArXiv] [QA].
  • AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn - [ArXiv] [QA].
  • AVIS: Autonomous Visual Information Seeking with Large Language Models - [ArXiv] [QA].
  • Neural Scene Chronology - [ArXiv] [QA].
  • Instant Multi-View Head Capture through Learnable Registration - [ArXiv] [QA].
  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark - [ArXiv] [QA].
  • RestGPT: Connecting Large Language Models with Real-World RESTful APIs - [ArXiv] [QA].
  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models - [ArXiv] [QA].
  • MIMIC-IT: Multi-Modal In-Context Instruction Tuning - [ArXiv] [QA].
  • M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models - [ArXiv] [QA].
  • ScaleDet: A Scalable Multi-Dataset Object Detector - [ArXiv] [QA].
  • M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning - [ArXiv] [QA].
  • Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks - [ArXiv] [QA].
  • ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory - [ArXiv] [QA].
  • Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach - [ArXiv] [QA].
  • On Pitfalls of Test-Time Adaptation - [ArXiv] [QA].
  • GaitGCI: Generative Counterfactual Intervention for Gait Recognition - [ArXiv] [QA].
  • DVIS: Decoupled Video Instance Segmentation Framework - [ArXiv] [QA].
  • Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents - [ArXiv] [QA].
  • Neuralangelo: High-Fidelity Neural Surface Reconstruction - [ArXiv] [QA].
  • BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields - [ArXiv] [QA].
  • Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding - [ArXiv] [QA].
  • Orca: Progressive Learning from Complex Explanation Traces of GPT-4 - [ArXiv] [QA].
  • RecAgent: A Novel Simulation Paradigm for Recommender Systems - [ArXiv] [QA].
  • Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection - [ArXiv] [QA].
  • LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day - [ArXiv] [QA].
  • Microstructure quality control of steels using deep learning - [ArXiv] [QA].
  • GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? - [ArXiv] [QA].
  • Thought Cloning: Learning to Think while Acting by Imitating Human Thinking - [ArXiv] [QA].

May 2023

  • Monotonic Location Attention for Length Generalization - [ArXiv] [QA].
  • Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models - [ArXiv] [QA].
  • Neural Kernel Surface Reconstruction - [ArXiv] [QA].
  • Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate - [ArXiv] [QA].
  • Independent Component Alignment for Multi-Task Learning - [ArXiv] [QA].
  • VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions - [ArXiv] [QA].
  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction - [ArXiv] [QA].
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model - [ArXiv] [QA].
  • Contextual Object Detection with Multimodal Large Language Models - [ArXiv] [QA].
  • Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models - [ArXiv] [QA].
  • SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks - [ArXiv] [QA].
  • MPCHAT: Towards Multimodal Persona-Grounded Conversation - [ArXiv] [QA].
  • Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance - [ArXiv] [QA].
  • Generating Images with Multimodal Language Models - [ArXiv] [QA].
  • Large Language Models as Tool Makers - [ArXiv] [QA].
  • Mindstorms in Natural Language-Based Societies of Mind - [ArXiv] [QA].
  • Training Socially Aligned Language Models in Simulated Human Society - [ArXiv] [QA].
  • On Evaluating Adversarial Robustness of Large Vision-Language Models - [ArXiv] [QA].
  • MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting - [ArXiv] [QA].
  • Playing repeated games with Large Language Models - [ArXiv] [QA].
  • Randomized Positional Encodings Boost Length Generalization of Transformers - [ArXiv] [QA].
  • Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark - [ArXiv] [QA].
  • AdaPlanner: Adaptive Planning from Feedback with Language Models - [ArXiv] [QA].
  • Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models - [ArXiv] [QA].
  • Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory - [ArXiv] [QA].
  • Landmark Attention: Random-Access Infinite Context Length for Transformers - [ArXiv] [QA].
  • Voyager: An Open-Ended Embodied Agent with Large Language Models - [ArXiv] [QA].
  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - [ArXiv] [QA].
  • Role-Play with Large Language Models - [ArXiv] [QA].
  • PandaGPT: One Model To Instruction-Follow Them All - [ArXiv] [QA].
  • LayoutGPT: Compositional Visual Planning and Generation with Large Language Models - [ArXiv] [QA].
  • Gorilla: Large Language Model Connected with Massive APIs - [ArXiv] [QA].
  • Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration - [ArXiv] [QA].
  • Dynamic Masking Rate Schedules for MLM Pretraining - [ArXiv] [QA].
  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models - [ArXiv] [QA].
  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought - [ArXiv] [QA].
  • Reasoning with Language Model is Planning with World Model - [ArXiv] [QA].
  • IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models - [ArXiv] [QA].
  • Discriminator-Guided Multi-step Reasoning with Language Models - [ArXiv] [QA].
  • PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts - [ArXiv] [QA].
  • Adapting Language Models to Compress Contexts - [ArXiv] [QA].
  • ExpertPrompting: Instructing Large Language Models to be Distinguished Experts - [ArXiv] [QA].
  • Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement - [ArXiv] [QA].
  • Automatic Model Selection with Large Language Models for Reasoning - [ArXiv] [QA].
  • Improving Factuality and Reasoning in Language Models through Multiagent Debate - [ArXiv] [QA].
  • ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models - [ArXiv] [QA].
  • RET-LLM: Towards a General Read-Write Memory for Large Language Models - [ArXiv] [QA].
  • CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation - [ArXiv] [QA].
  • REC-MV: REconstructing 3D Dynamic Cloth from Monocular Videos - [ArXiv] [QA].
  • Enhancing Chat Language Models by Scaling High-quality Instructional Conversations - [ArXiv] [QA].
  • DetGPT: Detect What You Need via Reasoning - [ArXiv] [QA].
  • Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction - [ArXiv] [QA].
  • PaD: Program-aided Distillation Specializes Large Models in Reasoning - [ArXiv] [QA].
  • Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration - [ArXiv] [QA].
  • RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text - [ArXiv] [QA].
  • Training Diffusion Models with Reinforcement Learning - [ArXiv] [QA].
  • Interactive Natural Language Processing - [ArXiv] [QA].
  • LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities - [ArXiv] [QA].
  • Making Language Models Better Tool Learners with Execution Feedback - [ArXiv] [QA].
  • RWKV: Reinventing RNNs for the Transformer Era - [ArXiv] [QA].
  • Pengi: An Audio Language Model for Audio Tasks - [ArXiv] [QA].
  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing - [ArXiv] [QA].
  • Learning Global-aware Kernel for Image Harmonization - [ArXiv] [QA].
  • ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings - [ArXiv] [QA].
  • RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought - [ArXiv] [QA].
  • Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona - [ArXiv] [QA].
  • Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue - [ArXiv] [QA].
  • Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model - [ArXiv] [QA].
  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks - [ArXiv] [QA].
  • SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation - [ArXiv] [QA].
  • LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation - [ArXiv] [QA].
  • DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs - [ArXiv] [QA].
  • An Android Robot Head as Embodied Conversational Agent - [ArXiv] [QA].
  • 3D Registration with Maximal Cliques - [ArXiv] [QA].
  • Listen, Think, and Understand - [ArXiv] [QA].
  • OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding - [ArXiv] [QA].
  • Boost Vision Transformer with GPU-Friendly Sparsity and Quantization - [ArXiv] [QA].
  • Language Models Meet World Models: Embodied Experiences Enhance Language Models - [ArXiv] [QA].
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models - [ArXiv] [QA].
  • IMAD: IMage-Augmented multi-modal Dialogue - [ArXiv] [QA].
  • PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering - [ArXiv] [QA].
  • Evaluating Object Hallucination in Large Vision-Language Models - [ArXiv] [QA].
  • MemoryBank: Enhancing Large Language Models with Long-Term Memory - [ArXiv] [QA].
  • Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations - [ArXiv] [QA].
  • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback - [ArXiv] [QA].
  • Dual Semantic Knowledge Composed Multimodal Dialog Systems - [ArXiv] [QA].
  • Towards Generalist Robots: A Promising Paradigm via Generative Simulation - [ArXiv] [QA].
  • Small Models are Valuable Plug-ins for Large Language Models - [ArXiv] [QA].
  • Attacking Perceptual Similarity Metrics - [ArXiv] [QA].
  • A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment - [ArXiv] [QA].
  • ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems - [ArXiv] [QA].
  • In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making - [ArXiv] [QA].
  • ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4 - [ArXiv] [QA].
  • EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention - [ArXiv] [QA].
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning - [ArXiv] [QA].
  • VideoChat: Chat-Centric Video Understanding - [ArXiv] [QA].
  • SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds - [ArXiv] [QA].
  • TidyBot: Personalized Robot Assistance with Large Language Models - [ArXiv] [QA].
  • Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue - [ArXiv] [QA].
  • Distilling Script Knowledge from Large Language Models for Constrained Language Planning - [ArXiv] [QA].
  • FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance - [ArXiv] [QA].
  • Knowledge-enhanced Agents for Interactive Text Games - [ArXiv] [QA].
  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans - [ArXiv] [QA].
  • Multi-Space Neural Radiance Fields - [ArXiv] [QA].
  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages - [ArXiv] [QA].
  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models - [ArXiv] [QA].
  • Otter: A Multi-Modal Model with In-Context Instruction Tuning - [ArXiv] [QA].
  • LMEye: An Interactive Perception Network for Large Language Models - [ArXiv] [QA].
  • T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering - [ArXiv] [QA].
  • TransESC: Smoothing Emotional Support Conversation via Turn-Level State Transition - [ArXiv] [QA].
  • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework - [ArXiv] [QA].
  • ZipIt! Merging Models from Different Tasks without Training - [ArXiv] [QA].
  • Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision - [ArXiv] [QA].
  • A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects - [ArXiv] [QA].
  • Caption Anything: Interactive Image Description with Diverse Multimodal Controls - [ArXiv] [QA].
  • Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents - [ArXiv] [QA].
  • Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings - [ArXiv] [QA].
  • Multimodal Procedural Planning via Dual Text-Image Prompting - [ArXiv] [QA].
  • Unlimiformer: Long-Range Transformers with Unlimited Length Input - [ArXiv] [QA].
  • Transfer Visual Prompt Generator across LLMs - [ArXiv] [QA].
  • The Role of Summarization in Generative Agents: A Preliminary Perspective - [ArXiv] [QA].
  • ArK: Augmented Reality with Knowledge Interactive Emergent Ability - [ArXiv] [QA].
  • Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation - [ArXiv] [QA].
  • Hypernuclear event detection in the nuclear emulsion with Monte Carlo simulation and machine learning - [ArXiv] [QA].
  • Learning to Reason and Memorize with Self-Notes - [ArXiv] [QA].

April 2023

  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model - [ArXiv] [QA].
  • IMP: Iterative Matching and Pose Estimation with Adaptive Pooling - [ArXiv] [QA].
  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System - [ArXiv] [QA].
  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality - [ArXiv] [QA].
  • ChatLog: Recording and Analyzing ChatGPT Across Time - [ArXiv] [QA].
  • Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models - [ArXiv] [QA].
  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond - [ArXiv] [QA].
  • Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning - [ArXiv] [QA].
  • Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System - [ArXiv] [QA].
  • Answering Questions by Meta-Reasoning over Multiple Chains of Thought - [ArXiv] [QA].
  • Patch-based 3D Natural Scene Generation from a Single Example - [ArXiv] [QA].
  • GlyphDiffusion: Text Generation as Image Generation - [ArXiv] [QA].
  • WizardLM: Empowering Large Language Models to Follow Complex Instructions - [ArXiv] [QA].
  • ChatLLM Network: More brains, More intelligence - [ArXiv] [QA].
  • SketchXAI: A First Look at Explainability for Human Sketches - [ArXiv] [QA].
  • Emergent and Predictable Memorization in Large Language Models - [ArXiv] [QA].
  • ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT - [ArXiv] [QA].
  • Can GPT-4 Perform Neural Architecture Search? - [ArXiv] [QA].
  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - [ArXiv] [QA].
  • Phoenix: Democratizing ChatGPT across Languages - [ArXiv] [QA].
  • SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation - [ArXiv] [QA].
  • SCoDA: Domain Adaptive Shape Completion for Real Scans - [ArXiv] [QA].
  • Learning Bottleneck Concepts in Image Classification - [ArXiv] [QA].
  • Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation - [ArXiv] [QA].
  • Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models - [ArXiv] [QA].
  • Network Pruning Spaces - [ArXiv] [QA].
  • SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes - [ArXiv] [QA].
  • Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections - [ArXiv] [QA].
  • Visual Instruction Tuning - [ArXiv] [QA].
  • Tool Learning with Foundation Models - [ArXiv] [QA].
  • Chain of Thought Prompt Tuning in Vision Language Models - [ArXiv] [QA].
  • Self-collaboration Code Generation via ChatGPT - [ArXiv] [QA].
  • Tractable Control for Autoregressive Language Generation - [ArXiv] [QA].
  • DCFace: Synthetic Face Generation with Dual Condition Diffusion Model - [ArXiv] [QA].
  • Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text - [ArXiv] [QA].
  • RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment - [ArXiv] [QA].
  • Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning - [ArXiv] [QA].
  • NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds - [ArXiv] [QA].
  • Language Instructed Reinforcement Learning for Human-AI Coordination - [ArXiv] [QA].
  • Hard Patches Mining for Masked Image Modeling - [ArXiv] [QA].
  • Instance-Aware Domain Generalization for Face Anti-Spoofing - [ArXiv] [QA].
  • ChemCrow: Augmenting large-language models with chemistry tools - [ArXiv] [QA].
  • Toxicity in ChatGPT: Analyzing Persona-assigned Language Models - [ArXiv] [QA].
  • Teaching Large Language Models to Self-Debug - [ArXiv] [QA].
  • Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning - [ArXiv] [QA].
  • A Cheaper and Better Diffusion Language Model with Soft-Masked Noise - [ArXiv] [QA].
  • Improved Test-Time Adaptation for Domain Generalization - [ArXiv] [QA].
  • Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT - [ArXiv] [QA].
  • OpenAGI: When LLM Meets Domain Experts - [ArXiv] [QA].
  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions - [ArXiv] [QA].
  • Token Boosting for Robust Self-Supervised Visual Transformer Pre-training - [ArXiv] [QA].
  • Hi Sheldon! Creating Deep Personalized Characters from TV Shows - [ArXiv] [QA].
  • Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder - [ArXiv] [QA].
  • ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application - [ArXiv] [QA].
  • Why think step by step? Reasoning emerges from the locality of experience - [ArXiv] [QA].
  • Generative Agents: Interactive Simulacra of Human Behavior - [ArXiv] [QA].
  • ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks - [ArXiv] [QA].
  • GINA-3D: Learning to Generate Implicit Neural Assets in the Wild - [ArXiv] [QA].
  • Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling - [ArXiv] [QA].
  • Asymptotic expansions for the maximum likelihood estimation errors of the rotating parameter of the gravitational wave from core-collapse supernovae - [ArXiv] [QA].
  • Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data - [ArXiv] [QA].
  • Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement - [ArXiv] [QA].
  • ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model - [ArXiv] [QA].
  • 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds - [ArXiv] [QA].
  • Metrological detection of multipartite entanglement through dynamical symmetries - [ArXiv] [QA].
  • When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus - [ArXiv] [QA].

March 2023

  • Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation - [ArXiv] [QA].
  • On stochastic MPC formulations with closed-loop guarantees: Analysis and a unifying framework - [ArXiv] [QA].
  • A Survey of Large Language Models - [ArXiv] [QA].
  • VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization - [ArXiv] [QA].
  • Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning - [ArXiv] [QA].
  • CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society - [ArXiv] [QA].
  • Self-Refine: Iterative Refinement with Self-Feedback - [ArXiv] [QA].
  • SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer - [ArXiv] [QA].
  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face - [ArXiv] [QA].
  • WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research - [ArXiv] [QA].
  • Mixed Autoencoder for Self-supervised Visual Representation Learning - [ArXiv] [QA].
  • ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance - [ArXiv] [QA].
  • TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation - [ArXiv] [QA].
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment - [ArXiv] [QA].
  • Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations - [ArXiv] [QA].
  • Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks - [ArXiv] [QA].
  • Multi-View Azimuth Stereo via Tangent Space Consistency - [ArXiv] [QA].
  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs - [ArXiv] [QA].
  • ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models - [ArXiv] [QA].
  • Are Data-driven Explanations Robust against Out-of-distribution Data? - [ArXiv] [QA].
  • LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention - [ArXiv] [QA].
  • F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories - [ArXiv] [QA].
  • DisWOT: Student Architecture Search for Distillation WithOut Training - [ArXiv] [QA].
  • Zero-shot Model Diagnosis - [ArXiv] [QA].
  • Learning to Zoom and Unzoom - [ArXiv] [QA].
  • SimpleNet: A Simple Network for Image Anomaly Detection and Localization - [ArXiv] [QA].
  • UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View - [ArXiv] [QA].
  • Natural Language Reasoning, A Survey - [ArXiv] [QA].
  • Learning Versatile 3D Shape Generation with Improved AR Models - [ArXiv] [QA].
  • Learning video embedding space with Natural Language Supervision - [ArXiv] [QA].
  • SUDS: Scalable Urban Dynamic Scenes - [ArXiv] [QA].
  • Compacting Binary Neural Networks by Sparse Kernel Selection - [ArXiv] [QA].
  • NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects - [ArXiv] [QA].
  • Human Preference Score: Better Aligning Text-to-Image Models with Human Preference - [ArXiv] [QA].
  • VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud - [ArXiv] [QA].
  • IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients - [ArXiv] [QA].
  • Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting - [ArXiv] [QA].
  • Robust Test-Time Adaptation in Dynamic Scenarios - [ArXiv] [QA].
  • Progressively Optimized Local Radiance Fields for Robust View Synthesis - [ArXiv] [QA].
  • Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers - [ArXiv] [QA].
  • Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment - [ArXiv] [QA].
  • Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration - [ArXiv] [QA].
  • Spherical Transformer for LiDAR-based 3D Recognition - [ArXiv] [QA].
  • Correlational Image Modeling for Self-Supervised Visual Pre-Training - [ArXiv] [QA].
  • Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation - [ArXiv] [QA].
  • Logical Reasoning over Natural Language as Knowledge Representation: A Survey - [ArXiv] [QA].
  • NeAT: Learning Neural Implicit Surfaces with Arbitrary Topologies from Multi-view Images - [ArXiv] [QA].
  • Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection - [ArXiv] [QA].
  • Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective - [ArXiv] [QA].
  • Implicit Neural Representation for Cooperative Low-light Image Enhancement - [ArXiv] [QA].
  • eP-ALM: Efficient Perceptual Augmentation of Language Models - [ArXiv] [QA].
  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action - [ArXiv] [QA].
  • Reflexion: Language Agents with Verbal Reinforcement Learning - [ArXiv] [QA].
  • Learning Optical Flow from Event Camera with Rendered Dataset - [ArXiv] [QA].
  • Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning - [ArXiv] [QA].
  • DialogPaint: A Dialog-based Image Editing Model - [ArXiv] [QA].
  • Adversarial Counterfactual Visual Explanations - [ArXiv] [QA].
  • TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation - [ArXiv] [QA].
  • CoLT5: Faster Long-Range Transformers with Conditional Computation - [ArXiv] [QA].
  • CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos - [ArXiv] [QA].
  • Human-AI Collaboration: The Effect of AI Delegation on Human Task Performance and Task Satisfaction - [ArXiv] [QA].
  • ART: Automatic multi-step reasoning and tool-use for large language models - [ArXiv] [QA].
  • MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge - [ArXiv] [QA].
  • Can Large Language Models design a Robot? - [ArXiv] [QA].
  • VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation - [ArXiv] [QA].
  • Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting - [ArXiv] [QA].
  • MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences - [ArXiv] [QA].
  • Chat with the Environment: Interactive Multimodal Perception Using Large Language Models - [ArXiv] [QA].
  • Rotation-Invariant Transformer for Point Cloud Matching - [ArXiv] [QA].
  • Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis - [ArXiv] [QA].
  • ViperGPT: Visual Inference via Python Execution for Reasoning - [ArXiv] [QA].
  • NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images - [ArXiv] [QA].
  • RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback - [ArXiv] [QA].
  • The Life Cycle of Knowledge in Big Language Models: A Survey - [ArXiv] [QA].
  • Audio Visual Language Maps for Robot Navigation - [ArXiv] [QA].
  • Adaptive Data-Free Quantization - [ArXiv] [QA].
  • Iterative Geometry Encoding Volume for Stereo Matching - [ArXiv] [QA].
  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions - [ArXiv] [QA].
  • ChatGPT Prompt Patterns for Improving Code Quality, Refactoring, Requirements Elicitation, and Software Design - [ArXiv] [QA].
  • FAC: 3D Representation Learning via Foreground Aware Feature Contrast - [ArXiv] [QA].
  • Task and Motion Planning with Large Language Models for Object Rearrangement - [ArXiv] [QA].
  • MVImgNet: A Large-scale Dataset of Multi-view Images - [ArXiv] [QA].
  • Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation - [ArXiv] [QA].
  • Hardware Acceleration of Neural Graphics - [ArXiv] [QA].
  • 3D Video Loops from Asynchronous Input - [ArXiv] [QA].
  • Masked Image Modeling with Local Multi-Scale Reconstruction - [ArXiv] [QA].
  • ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction - [ArXiv] [QA].
  • X-Pruner: eXplainable Pruning for Vision Transformers - [ArXiv] [QA].
  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models - [ArXiv] [QA].
  • DNBP: Differentiable Nonparametric Belief Propagation - [ArXiv] [QA].
  • LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion - [ArXiv] [QA].
  • Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation - [ArXiv] [QA].
  • PaLM-E: An Embodied Multimodal Language Model - [ArXiv] [QA].
  • Prismer: A Vision-Language Model with An Ensemble of Experts - [ArXiv] [QA].
  • MathPrompter: Mathematical Reasoning using Large Language Models - [ArXiv] [QA].
  • Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners - [ArXiv] [QA].
  • EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization - [ArXiv] [QA].
  • Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering - [ArXiv] [QA].
  • Near Optimal Memory-Regret Tradeoff for Online Learning - [ArXiv] [QA].
  • WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions - [ArXiv] [QA].
  • First Order Quantum Phase Transition in the Hybrid Metal-Mott Insulator Transition Metal Dichalcogenide 4Hb-TaS2 - [ArXiv] [QA].
  • Isotopic effects in molecular attosecond photoelectron interferometry - [ArXiv] [QA].
  • Token Contrast for Weakly-Supervised Semantic Segmentation - [ArXiv] [QA].
  • Eulerian-Lagrangian particle-based model for diffusional growth for the better parameterization of ISM clouds: A road map for improving climate model through small-scale model using observations - [ArXiv] [QA].
  • Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation - [ArXiv] [QA].
  • Open-World Object Manipulation using Pre-trained Vision-Language Models - [ArXiv] [QA].
  • Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control - [ArXiv] [QA].
  • A Practical Upper Bound for the Worst-Case Attribution Deviations - [ArXiv] [QA].
  • Can ChatGPT Assess Human Personalities? A General Evaluation Framework - [ArXiv] [QA].

February 2023

  • A Comprehensive Perturbative Formalism for Phase Mixing in Perturbed Disks. II. Phase Spirals in an Inhomogeneous Disk Galaxy with a Non-responsive Dark Matter Halo - [ArXiv] [QA].
  • Generic-to-Specific Distillation of Masked Autoencoders - [ArXiv] [QA].
  • Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue - [ArXiv] [QA].
  • GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation - [ArXiv] [QA].
  • HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes with Iterative Intertwined Regularization - [ArXiv] [QA].
  • Internet Explorer: Targeted Representation Learning on the Open Web - [ArXiv] [QA].
  • Language Is Not All You Need: Aligning Perception with Language Models - [ArXiv] [QA].
  • LLaMA: Open and Efficient Foundation Language Models - [ArXiv] [QA].
  • Control flow in active inference systems - [ArXiv] [QA].
  • Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data - [ArXiv] [QA].
  • Active Prompting with Chain-of-Thought for Large Language Models - [ArXiv] [QA].
  • Aligning Text-to-Image Models using Human Feedback - [ArXiv] [QA].
  • Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? - [ArXiv] [QA].
  • Distributionally Robust Recourse Action - [ArXiv] [QA].
  • Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities - [ArXiv] [QA].
  • ChatGPT for Robotics: Design Principles and Model Abilities - [ArXiv] [QA].
  • Weakly Supervised Label Learning Flows - [ArXiv] [QA].
  • Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey - [ArXiv] [QA].
  • A survey on online active learning - [ArXiv] [QA].
  • PersonNeRF: Personalized Reconstruction from Photo Collections - [ArXiv] [QA].
  • Tuning computer vision models with task rewards - [ArXiv] [QA].
  • Aligning Language Models with Preferences through f-divergence Minimization - [ArXiv] [QA].
  • À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting - [ArXiv] [QA].
  • Augmented Language Models: a Survey - [ArXiv] [QA].
  • The Capacity for Moral Self-Correction in Large Language Models - [ArXiv] [QA].
  • Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask - [ArXiv] [QA].
  • The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation - [ArXiv] [QA].
  • Stitchable Neural Networks - [ArXiv] [QA].
  • A Reparameterized Discrete Diffusion Model for Text Generation - [ArXiv] [QA].
  • The Wisdom of Hindsight Makes Language Models Better Instruction Followers - [ArXiv] [QA].
  • Toolformer: Language Models Can Teach Themselves to Use Tools - [ArXiv] [QA].
  • GPTScore: Evaluate as You Desire - [ArXiv] [QA].
  • A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity - [ArXiv] [QA].
  • Controlling Personality Style in Dialogue with Zero-Shot Prompt-Based Learning - [ArXiv] [QA].
  • Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need - [ArXiv] [QA].
  • Robust Camera Pose Refinement for Multi-Resolution Hash Encoding - [ArXiv] [QA].
  • Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents - [ArXiv] [QA].
  • Inference in Non-stationary High-Dimensional VARs - [ArXiv] [QA].
  • Accelerating Large Language Model Decoding with Speculative Sampling - [ArXiv] [QA].
  • Multimodal Chain-of-Thought Reasoning in Language Models - [ArXiv] [QA].
  • Collaborating with language models for embodied reasoning - [ArXiv] [QA].
  • Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models - [ArXiv] [QA].

January 2023

  • Large Language Models Can Be Easily Distracted by Irrelevant Context - [ArXiv] [QA].
  • Grounding Language Models to Images for Multimodal Inputs and Outputs - [ArXiv] [QA].
  • Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning - [ArXiv] [QA].
  • The Flan Collection: Designing Data and Methods for Effective Instruction Tuning - [ArXiv] [QA].
  • Faithful Chain-of-Thought Reasoning - [ArXiv] [QA].
  • DepGraph: Towards Any Structural Pruning - [ArXiv] [QA].
  • Specializing Smaller Language Models towards Multi-Step Reasoning - [ArXiv] [QA].
  • Adversarial Style Augmentation for Domain Generalization - [ArXiv] [QA].
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models - [ArXiv] [QA].
  • Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling - [ArXiv] [QA].
  • Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation - [ArXiv] [QA].
  • Cut and Learn for Unsupervised Object Detection and Instance Segmentation - [ArXiv] [QA].
  • Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons - [ArXiv] [QA].
  • HexPlane: A Fast Representation for Dynamic Scenes - [ArXiv] [QA].
  • FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer - [ArXiv] [QA].
  • OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation - [ArXiv] [QA].
  • Dissociating language and thought in large language models: a cognitive perspective - [ArXiv] [QA].
  • TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World - [ArXiv] [QA].
  • Learning to Memorize Entailment and Discourse Relations for Persona-Consistent Dialogues - [ArXiv] [QA].
  • Pruning Compact ConvNets for Efficient Inference - [ArXiv] [QA].
  • You Truly Understand What I Need: Intellectual and Friendly Dialogue Agents grounding Knowledge and Persona - [ArXiv] [QA].
  • Robust Dynamic Radiance Fields - [ArXiv] [QA].
  • SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph - [ArXiv] [QA].
  • Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes - [ArXiv] [QA].
  • Cross Modal Transformer: Towards Fast and Robust 3D Object Detection - [ArXiv] [QA].
  • Rethinking Mobile Block for Efficient Attention-based Models - [ArXiv] [QA].
  • One-Time Universal Hashing Quantum Digital Signatures without Perfect Keys - [ArXiv] [QA].
  • Efficient On-device Training via Gradient Filtering - [ArXiv] [QA].

2022

December 2022

  • Rethinking with Retrieval: Faithful Large Language Model Inference - [ArXiv] [QA].
  • A Survey on In-context Learning - [ArXiv] [QA].
  • Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples - [ArXiv] [QA].
  • NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling - [ArXiv] [QA].
  • Effects of Data Geometry in Early Deep Learning - [ArXiv] [QA].
  • Discriminator-Cooperated Feature Map Distillation for GAN Compression - [ArXiv] [QA].
  • SMMix: Self-Motivated Image Mixing for Vision Transformers - [ArXiv] [QA].
  • OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization - [ArXiv] [QA].
  • Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography - [ArXiv] [QA].
  • Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise - [ArXiv] [QA].
  • 3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions - [ArXiv] [QA].
  • Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble - [ArXiv] [QA].
  • TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization - [ArXiv] [QA].
  • Critic-Guided Decoding for Controlled Text Generation - [ArXiv] [QA].
  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning - [ArXiv] [QA].
  • MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions - [ArXiv] [QA].
  • Ontologically Faithful Generation of Non-Player Character Dialogues - [ArXiv] [QA].
  • Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers - [ArXiv] [QA].
  • A Survey of Deep Learning for Mathematical Reasoning - [ArXiv] [QA].
  • Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions - [ArXiv] [QA].
  • LAMBADA: Backward Chaining for Automated Reasoning in Natural Language - [ArXiv] [QA].
  • Controllable Text Generation with Language Constraints - [ArXiv] [QA].
  • Towards Reasoning in Large Language Models: A Survey - [ArXiv] [QA].
  • SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers - [ArXiv] [QA].
  • Large Language Models Are Reasoning Teachers - [ArXiv] [QA].
  • Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters - [ArXiv] [QA].
  • Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments - [ArXiv] [QA].
  • A Probabilistic Framework for Lifelong Test-Time Adaptation - [ArXiv] [QA].
  • Reasoning with Language Model Prompting: A Survey - [ArXiv] [QA].
  • Large Language Models are Better Reasoners with Self-Verification - [ArXiv] [QA].
  • Latent Diffusion for Language Generation - [ArXiv] [QA].
  • Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation - [ArXiv] [QA].
  • Discovering Language Model Behaviors with Model-Written Evaluations - [ArXiv] [QA].
  • PAL: Persona-Augmented Emotional Support Conversation Generation - [ArXiv] [QA].
  • Emergent Analogical Reasoning in Large Language Models - [ArXiv] [QA].
  • Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems - [ArXiv] [QA].
  • Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model - [ArXiv] [QA].
  • Let's Negotiate! A Survey of Negotiation Dialogue Systems - [ArXiv] [QA].
  • The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning - [ArXiv] [QA].
  • Teaching Small Language Models to Reason - [ArXiv] [QA].
  • Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems - [ArXiv] [QA].
  • On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning - [ArXiv] [QA].
  • Real-Time Neural Light Field on Mobile Devices - [ArXiv] [QA].
  • Constitutional AI: Harmlessness from AI Feedback - [ArXiv] [QA].
  • NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior - [ArXiv] [QA].
  • PD-Quant: Post-Training Quantization based on Prediction Difference Metric - [ArXiv] [QA].
  • Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders - [ArXiv] [QA].
  • Doubly Right Object Recognition: A Why Prompt for Visual Rationales - [ArXiv] [QA].
  • Genie: Show Me the Data for Quantization - [ArXiv] [QA].
  • BEVBert: Multimodal Map Pre-training for Language-guided Navigation - [ArXiv] [QA].
  • Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation - [ArXiv] [QA].
  • Successive Prompting for Decomposing Complex Questions - [ArXiv] [QA].
  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models - [ArXiv] [QA].
  • Teaching Matters: Investigating the Role of Supervision in Vision Transformers - [ArXiv] [QA].
  • EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points - [ArXiv] [QA].
  • Diffusion-SDF: Text-to-Shape via Voxelized Diffusion - [ArXiv] [QA].
  • Momentum Decoding: Open-ended Text Generation As Graph Exploration - [ArXiv] [QA].
  • Fast Point Cloud Generation with Straight Flows - [ArXiv] [QA].
  • RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering - [ArXiv] [QA].
  • ResFormer: Scaling ViTs with Multi-Resolution Training - [ArXiv] [QA].
  • Safe Learning-Based Control of Elastic Joint Robots via Control Barrier Functions - [ArXiv] [QA].
  • Language Model Pre-training on True Negatives - [ArXiv] [QA].
  • Distilling Reasoning Capabilities into Smaller Language Models - [ArXiv] [QA].

November 2022

  • Feature Selection with Distance Correlation - [ArXiv] [QA].
  • Fast Inference from Transformers via Speculative Decoding - [ArXiv] [QA].
  • PLA: Language-Driven Open-Vocabulary 3D Scene Understanding - [ArXiv] [QA].
  • NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers - [ArXiv] [QA].
  • Decentralized Learning with Multi-Headed Distillation - [ArXiv] [QA].
  • Post-training Quantization on Diffusion Models - [ArXiv] [QA].
  • SuS-X: Training-Free Name-Only Transfer of Vision-Language Models - [ArXiv] [QA].
  • In-Hand 3D Object Scanning from an RGB Sequence - [ArXiv] [QA].
  • DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models - [ArXiv] [QA].
  • RUST: Latent Neural Scene Representations from Unposed Imagery - [ArXiv] [QA].
  • NeuralUDF: Learning Unsigned Distance Fields for Multi-view Reconstruction of Surfaces with Arbitrary Topologies - [ArXiv] [QA].
  • ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision - [ArXiv] [QA].
  • SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow - [ArXiv] [QA].
  • SfM-TTR: Using Structure from Motion for Test-Time Refinement of Single-View Depth Networks - [ArXiv] [QA].
  • Video Test-Time Adaptation for Action Recognition - [ArXiv] [QA].
  • TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering - [ArXiv] [QA].
  • Robust Mean Teacher for Continual and Gradual Test-Time Adaptation - [ArXiv] [QA].
  • ActMAD: Activation Matching to Align Distributions for Test-Time-Training - [ArXiv] [QA].
  • BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields - [ArXiv] [QA].
  • Integrally Pre-Trained Transformer Pyramid Networks - [ArXiv] [QA].
  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks - [ArXiv] [QA].
  • Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations - [ArXiv] [QA].
  • OCTET: Object-aware Counterfactual Explanations - [ArXiv] [QA].
  • Explaining Image Classifiers with Multiscale Directional Image Representation - [ArXiv] [QA].
  • Level-S$^2$fM: Structure from Motion on Neural Level Set of Implicit Surfaces - [ArXiv] [QA].
  • PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning - [ArXiv] [QA].
  • MATE: Masked Autoencoders are Online 3D Test-Time Learners - [ArXiv] [QA].
  • NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization - [ArXiv] [QA].
  • Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification - [ArXiv] [QA].
  • You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model - [ArXiv] [QA].
  • DynIBaR: Neural Dynamic Image-Based Rendering - [ArXiv] [QA].
  • Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation - [ArXiv] [QA].
  • LidarGait: Benchmarking 3D Gait Recognition with Point Clouds - [ArXiv] [QA].
  • PAL: Program-aided Language Models - [ArXiv] [QA].
  • Visual Programming: Compositional visual reasoning without training - [ArXiv] [QA].
  • CRAFT: Concept Recursive Activation FacTorization for Explainability - [ArXiv] [QA].
  • AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders - [ArXiv] [QA].
  • MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis - [ArXiv] [QA].
  • Holistic Evaluation of Language Models - [ArXiv] [QA].
  • Galactica: A Large Language Model for Science - [ArXiv] [QA].
  • Stare at What You See: Masked Image Modeling without Reconstruction - [ArXiv] [QA].
  • Consistent Direct Time-of-Flight Video Depth Super-Resolution - [ArXiv] [QA].
  • Teaching Algorithmic Reasoning via In-context Learning - [ArXiv] [QA].
  • EVA: Exploring the Limits of Masked Visual Representation Learning at Scale - [ArXiv] [QA].
  • Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding - [ArXiv] [QA].
  • PKCAM: Previous Knowledge Channel Attention Module - [ArXiv] [QA].
  • What would Harry say? Building Dialogue Agents for Characters in a Story - [ArXiv] [QA].
  • OpenGait: Revisiting Gait Recognition Toward Better Practicality - [ArXiv] [QA].
  • Masked Contrastive Representation Learning - [ArXiv] [QA].
  • MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation - [ArXiv] [QA].
  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model - [ArXiv] [QA].
  • Self-conditioned Embedding Diffusion for Text Generation - [ArXiv] [QA].
  • Crosslingual Generalization through Multitask Finetuning - [ArXiv] [QA].
  • PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales - [ArXiv] [QA].
  • Flashlights: An Off-Caustic Lensed Star at Redshift $z$ = 1.26 in Abell 370 - [ArXiv] [QA].
  • Late lumping of transformation-based feedback laws for boundary control systems - [ArXiv] [QA].
  • Bipartite Mixed Membership Distribution-Free Model. A novel model for community detection in overlapping bipartite weighted networks - [ArXiv] [QA].
  • CARE: Causality Reasoning for Empathetic Responses by Conditional Graph Generation - [ArXiv] [QA].
  • Evaluating Impact of Social Media Posts by Executives on Stock Prices - [ArXiv] [QA].

October 2022

  • SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control - [ArXiv] [QA].
  • GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers - [ArXiv] [QA].
  • DiffusER: Discrete Diffusion via Edit-based Reconstruction - [ArXiv] [QA].
  • Contrastive Decoding: Open-ended Text Generation as Optimization - [ArXiv] [QA].
  • Streaming Radiance Fields for 3D Video Synthesis - [ArXiv] [QA].
  • Contrastive Search Is What You Need For Neural Text Generation - [ArXiv] [QA].
  • FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation - [ArXiv] [QA].
  • DANLI: Deliberative Agent for Following Natural Language Instructions - [ArXiv] [QA].
  • Towards Efficient Dialogue Pre-training with Transferable and Interpretable Latent Structure - [ArXiv] [QA].
  • Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation - [ArXiv] [QA].
  • There Is No Standard Answer: Knowledge-Grounded Dialogue Generation with Adversarial Activated Multi-Reference Learning - [ArXiv] [QA].
  • WikiWhy: Answering and Explaining Cause-and-Effect Questions - [ArXiv] [QA].
  • Large Language Models Can Self-Improve - [ArXiv] [QA].
  • Scaling Instruction-Finetuned Language Models - [ArXiv] [QA].
  • Scaling Laws for Reward Model Overoptimization - [ArXiv] [QA].
  • DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Generation - [ArXiv] [QA].
  • Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them - [ArXiv] [QA].
  • DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models - [ArXiv] [QA].
  • Keep Me Updated! Memory Management in Long-term Conversations - [ArXiv] [QA].
  • Data-Efficient Augmentation for Training Neural Networks - [ArXiv] [QA].
  • DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation - [ArXiv] [QA].
  • Language Models of Code are Few-Shot Commonsense Learners - [ArXiv] [QA].
  • Explanations from Large Language Models Make Small Reasoners Better - [ArXiv] [QA].
  • Large Language Models are few(1)-shot Table Reasoners - [ArXiv] [QA].
  • Masked Motion Encoding for Self-Supervised Video Representation Learning - [ArXiv] [QA].
  • Mind's Eye: Grounded Language Model Reasoning through Simulation - [ArXiv] [QA].
  • Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning - [ArXiv] [QA].
  • Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior - [ArXiv] [QA].
  • Controllable Dialogue Simulation with In-Context Learning - [ArXiv] [QA].
  • Don't Lose Yourself! Empathetic Response Generation via Explicit Self-Other Awareness - [ArXiv] [QA].
  • Automatic Chain of Thought Prompting in Large Language Models - [ArXiv] [QA].
  • Measuring and Narrowing the Compositionality Gap in Language Models - [ArXiv] [QA].
  • FAST: Improving Controllability for Text Generation with Feedback Aware Self-Training - [ArXiv] [QA].
  • VIMA: General Robot Manipulation with Multimodal Prompts - [ArXiv] [QA].
  • Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering - [ArXiv] [QA].
  • Language Models are Multilingual Chain-of-Thought Reasoners - [ArXiv] [QA].
  • A Distributional Lens for Multi-Aspect Controllable Text Generation - [ArXiv] [QA].
  • ReAct: Synergizing Reasoning and Acting in Language Models - [ArXiv] [QA].
  • GLM-130B: An Open Bilingual Pre-trained Model - [ArXiv] [QA].
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks - [ArXiv] [QA].
  • CorefDiffs: Co-referential and Differential Knowledge Flow in Document Grounded Conversations - [ArXiv] [QA].
  • Group Personalized Federated Learning - [ArXiv] [QA].
  • Knowledge Unlearning for Mitigating Privacy Risks in Language Models - [ArXiv] [QA].
  • Extraneousness-Aware Imitation Learning - [ArXiv] [QA].
  • Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization - [ArXiv] [QA].
  • Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought - [ArXiv] [QA].
  • Complexity-Based Prompting for Multi-Step Reasoning - [ArXiv] [QA].
  • "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction - [ArXiv] [QA].
  • NeRF: Neural Radiance Field in 3D Vision, A Comprehensive Review - [ArXiv] [QA].
  • Multimodal Analogical Reasoning over Knowledge Graphs - [ArXiv] [QA].

September 2022

  • Compositional Semantic Parsing with Large Language Models - [ArXiv] [QA].
  • Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning - [ArXiv] [QA].
  • Improving alignment of dialogue agents via targeted human judgements - [ArXiv] [QA].
  • Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts - [ArXiv] [QA].
  • Target-Guided Open-Domain Conversation Planning - [ArXiv] [QA].
  • Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering - [ArXiv] [QA].
  • Loc-NeRF: Monte Carlo Localization using Neural Radiance Fields - [ArXiv] [QA].
  • A Benchmark for Understanding and Generating Dialogue between Characters in Stories - [ArXiv] [QA].
  • Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models - [ArXiv] [QA].
  • A Geometric Perspective on Variational Autoencoders - [ArXiv] [QA].
  • Selective Annotation Makes Language Models Better Few-Shot Learners - [ArXiv] [QA].

August 2022

  • Radon concentration variations at the Yangyang underground laboratory - [ArXiv] [QA].
  • Faithful Reasoning Using Large Language Models - [ArXiv] [QA].
  • Masked Autoencoders Enable Efficient Knowledge Distillers - [ArXiv] [QA].
  • Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - [ArXiv] [QA].
  • Improving Personality Consistency in Conversation by Persona Extending - [ArXiv] [QA].
  • CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation - [ArXiv] [QA].
  • Follow Me: Conversation Planning for Target-driven Recommendation Dialogue Systems - [ArXiv] [QA].
  • BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage - [ArXiv] [QA].
  • Character Generation through Self-Supervised Vectorization - [ArXiv] [QA].
  • Composable Text Controls in Latent Space with ODEs - [ArXiv] [QA].

July 2022

  • MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures - [ArXiv] [QA].
  • Visual correspondence-based explanations improve AI robustness and human-AI team accuracy - [ArXiv] [QA].
  • Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning - [ArXiv] [QA].
  • Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent - [ArXiv] [QA].
  • Language Model Cascades - [ArXiv] [QA].
  • Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability - [ArXiv] [QA].
  • Language models show human-like content effects on reasoning - [ArXiv] [QA].
  • Inner Monologue: Embodied Reasoning through Planning with Language Models - [ArXiv] [QA].
  • Bootstrapping a User-Centered Task-Oriented Dialogue System - [ArXiv] [QA].
  • LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action - [ArXiv] [QA].
  • Back to the Source: Diffusion-Driven Test-Time Adaptation - [ArXiv] [QA].
  • PVO: Panoptic Visual Odometry - [ArXiv] [QA].
  • Rationale-Augmented Ensembles in Language Models - [ArXiv] [QA].

June 2022

  • Solving Quantitative Reasoning Problems with Language Models - [ArXiv] [QA].
  • Invariant Causal Mechanisms through Distribution Matching - [ArXiv] [QA].
  • GODEL: Large-Scale Pre-Training for Goal-Directed Dialog - [ArXiv] [QA].
  • KiloNeuS: A Versatile Neural Implicit Surface Representation for Real-Time Rendering - [ArXiv] [QA].
  • Marginal Tail-Adaptive Normalizing Flows - [ArXiv] [QA].
  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge - [ArXiv] [QA].
  • Balancing Discriminability and Transferability for Source-Free Domain Adaptation - [ArXiv] [QA].
  • Emergent Abilities of Large Language Models - [ArXiv] [QA].
  • Confidence Score for Source-Free Unsupervised Domain Adaptation - [ArXiv] [QA].
  • Transformers are Meta-Reinforcement Learners - [ArXiv] [QA].
  • Language Models are General-Purpose Interfaces - [ArXiv] [QA].
  • Mining Multi-Label Samples from Single Positive Labels - [ArXiv] [QA].
  • Building a Personalized Dialogue System with Prompt-Tuning - [ArXiv] [QA].
  • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models - [ArXiv] [QA].
  • Spatial-temporal Concept based Explanation of 3D ConvNets - [ArXiv] [QA].
  • MobileOne: An Improved One millisecond Mobile Backbone - [ArXiv] [QA].
  • Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering - [ArXiv] [QA].
  • Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation - [ArXiv] [QA].
  • Making Large Language Models Better Reasoners with Step-Aware Verifier - [ArXiv] [QA].
  • PROMISSING: Pruning Missing Values in Neural Networks - [ArXiv] [QA].
  • PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images - [ArXiv] [QA].
  • Unified Recurrence Modeling for Video Action Anticipation - [ArXiv] [QA].
  • NIPQ: Noise proxy-based Integrated Pseudo-Quantization - [ArXiv] [QA].
  • Hopular: Modern Hopfield Networks for Tabular Data - [ArXiv] [QA].
  • One- and two-dimensional solitons in spin-orbit-coupled Bose-Einstein condensates with fractional kinetic energy - [ArXiv] [QA].
  • A Theoretical Framework for Inference Learning - [ArXiv] [QA].

May 2022

  • New asymptotically flat static vacuum metrics with near Euclidean boundary data - [ArXiv] [QA].
  • itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection - [ArXiv] [QA].
  • Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning - [ArXiv] [QA].
  • Robust Weight Perturbation for Adversarial Training - [ArXiv] [QA].
  • CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI - [ArXiv] [QA].
  • CoNT: Contrastive Neural Text Generation - [ArXiv] [QA].
  • Controllable Text Generation with Neurally-Decomposed Oracle - [ArXiv] [QA].
  • Diffusion-LM Improves Controllable Text Generation - [ArXiv] [QA].
  • GIT: A Generative Image-to-text Transformer for Vision and Language - [ArXiv] [QA].
  • Prototype Based Classification from Hierarchy to Fairness - [ArXiv] [QA].
  • Quark: Controllable Text Generation with Reinforced Unlearning - [ArXiv] [QA].
  • RSTGen: Imbuing Fine-Grained Interpretable Control into Long-FormText Generators - [ArXiv] [QA].
  • TALM: Tool Augmented Language Models - [ArXiv] [QA].
  • Large Language Models are Zero-Shot Reasoners - [ArXiv] [QA].
  • Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations - [ArXiv] [QA].
  • PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection - [ArXiv] [QA].
  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models - [ArXiv] [QA].
  • RankGen: Improving Text Generation with Large Ranking Models - [ArXiv] [QA].
  • Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning - [ArXiv] [QA].
  • Learning Graph Structure from Convolutional Mixtures - [ArXiv] [QA].
  • Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation - [ArXiv] [QA].
  • Robust Losses for Learning Value Functions - [ArXiv] [QA].
  • LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning - [ArXiv] [QA].
  • Long-term Control for Dialogue Generation: Methods and Evaluation - [ArXiv] [QA].
  • Reduce Information Loss in Transformers for Pluralistic Image Inpainting - [ArXiv] [QA].
  • Towards a Progression-Aware Autonomous Dialogue Agent - [ArXiv] [QA].
  • The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning - [ArXiv] [QA].
  • Spiking Graph Convolutional Networks - [ArXiv] [QA].
  • A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration - [ArXiv] [QA].
  • Lexical Knowledge Internalization for Neural Dialog Generation - [ArXiv] [QA].
  • Learning to Transfer Prompts for Text Generation - [ArXiv] [QA].
  • OPT: Open Pre-trained Transformer Language Models - [ArXiv] [QA].

April 2022

  • Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models - [ArXiv] [QA].
  • Flamingo: a Visual Language Model for Few-Shot Learning - [ArXiv] [QA].
  • Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation - [ArXiv] [QA].
  • MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation - [ArXiv] [QA].
  • Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances - [ArXiv] [QA].
  • Sharper Utility Bounds for Differentially Private Models - [ArXiv] [QA].
  • Towards Multi-Turn Empathetic Dialogs with Positive Emotion Elicitation - [ArXiv] [QA].
  • Event Transition Planning for Open-ended Text Generation - [ArXiv] [QA].
  • Visio-Linguistic Brain Encoding - [ArXiv] [QA].
  • A Personalized Dialogue Generator with Implicit User Persona Detection - [ArXiv] [QA].
  • LaMemo: Language Modeling with Look-Ahead Memory - [ArXiv] [QA].
  • GPT-NeoX-20B: An Open-Source Autoregressive Language Model - [ArXiv] [QA].
  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback - [ArXiv] [QA].
  • Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting - [ArXiv] [QA].
  • Federated Learning with Partial Model Personalization - [ArXiv] [QA].
  • Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy - [ArXiv] [QA].
  • Knowledge Infused Decoding - [ArXiv] [QA].
  • Towards An End-to-End Framework for Flow-Guided Video Inpainting - [ArXiv] [QA].
  • There Are a Thousand Hamlets in a Thousand People's Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory - [ArXiv] [QA].
  • Efficient Test-Time Model Adaptation without Forgetting - [ArXiv] [QA].
  • C3KG: A Chinese Commonsense Conversation Knowledge Graph - [ArXiv] [QA].
  • Can language models learn from explanations in context? - [ArXiv] [QA].
  • PaLM: Scaling Language Modeling with Pathways - [ArXiv] [QA].
  • $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation - [ArXiv] [QA].
  • Learning Neural Acoustic Fields - [ArXiv] [QA].
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances - [ArXiv] [QA].
  • Value Gradient weighted Model-Based Reinforcement Learning - [ArXiv] [QA].
  • Probabilistic Implicit Scene Completion - [ArXiv] [QA].
  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - [ArXiv] [QA].

March 2022

  • R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis - [ArXiv] [QA].
  • MAT: Mask-Aware Transformer for Large Hole Image Inpainting - [ArXiv] [QA].
  • Generalizing Few-Shot NAS with Gradient Matching - [ArXiv] [QA].
  • STaR: Bootstrapping Reasoning With Reasoning - [ArXiv] [QA].
  • Continual Test-Time Domain Adaptation - [ArXiv] [QA].
  • MISC: A MIxed Strategy-Aware Model Integrating COMET for Emotional Support Conversation - [ArXiv] [QA].
  • A Comparative Survey of Deep Active Learning - [ArXiv] [QA].
  • Linking Emergent and Natural Languages via Corpus Transfer - [ArXiv] [QA].
  • Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition - [ArXiv] [QA].
  • Language modeling via stochastic processes - [ArXiv] [QA].
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models - [ArXiv] [QA].
  • Teaching language models to support answers with verified quotes - [ArXiv] [QA].
  • Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems - [ArXiv] [QA].
  • On Robust Prefix-Tuning for Text Classification - [ArXiv] [QA].
  • Generative Principal Component Analysis - [ArXiv] [QA].
  • Monotonic Differentiable Sorting Networks - [ArXiv] [QA].
  • A Framework and Benchmark for Deep Batch Active Learning for Regression - [ArXiv] [QA].
  • RoMe: A Robust Metric for Evaluating Natural Language Generation - [ArXiv] [QA].
  • PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation - [ArXiv] [QA].
  • Memorizing Transformers - [ArXiv] [QA].
  • Multi-Stage Prompting for Knowledgeable Dialogue Generation - [ArXiv] [QA].
  • Differentiable DAG Sampling - [ArXiv] [QA].
  • Iteratively Prompt Pre-trained Language Models for Chain of Thought - [ArXiv] [QA].
  • Unified Visual Transformer Compression - [ArXiv] [QA].
  • Vision-Based Manipulators Need to Also See from Their Hands - [ArXiv] [QA].
  • Orchestrated Value Mapping for Reinforcement Learning - [ArXiv] [QA].
  • BiBERT: Accurate Fully Binarized BERT - [ArXiv] [QA].
  • MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting - [ArXiv] [QA].
  • An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation - [ArXiv] [QA].
  • Long Time No See! Open-Domain Conversation with Long-Term Persona Memory - [ArXiv] [QA].
  • Source-free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition - [ArXiv] [QA].
  • Kubric: A scalable dataset generator - [ArXiv] [QA].
  • Adaptive Cross-Layer Attention for Image Restoration - [ArXiv] [QA].
  • Neural Simulated Annealing - [ArXiv] [QA].
  • Training language models to follow instructions with human feedback - [ArXiv] [QA].
  • Self-Supervised Scene Flow Estimation with 4-D Automotive Radar - [ArXiv] [QA].
  • Follow-Up of Extended Shells around B[e] Stars - [ArXiv] [QA].
  • Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding - [ArXiv] [QA].
  • MERIt: Meta-Path Guided Contrastive Learning for Logical Reasoning - [ArXiv] [QA].

February 2022

  • Rethinking and Refining the Distinct Metric - [ArXiv] [QA].
  • The Spectral Bias of Polynomial Neural Networks - [ArXiv] [QA].
  • AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation - [ArXiv] [QA].
  • Ask2Mask: Guided Data Selection for Masked Speech Modeling - [ArXiv] [QA].
  • Auto-scaling Vision Transformers without Training - [ArXiv] [QA].
  • COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics - [ArXiv] [QA].
  • Pseudo Numerical Methods for Diffusion Models on Manifolds - [ArXiv] [QA].
  • Bit-wise Training of Neural Network Weights - [ArXiv] [QA].
  • Gaussian Mixture Convolution Networks - [ArXiv] [QA].
  • cosFormer: Rethinking Softmax in Attention - [ArXiv] [QA].
  • Task-Agnostic Graph Explanations - [ArXiv] [QA].
  • Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis - [ArXiv] [QA].
  • A precortical module for robust CNNs to light variations - [ArXiv] [QA].
  • Domain Adaptation via Prompt Learning - [ArXiv] [QA].
  • FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows - [ArXiv] [QA].
  • A Contrastive Framework for Neural Text Generation - [ArXiv] [QA].
  • Conditional Contrastive Learning with Kernel - [ArXiv] [QA].
  • Domain Adversarial Training: A Game Perspective - [ArXiv] [QA].
  • GiraffeDet: A Heavy-Neck Paradigm for Object Detection - [ArXiv] [QA].
  • Survey of Hallucination in Natural Language Generation - [ArXiv] [QA].
  • GrASP: Gradient-Based Affordance Selection for Planning - [ArXiv] [QA].
  • Message Passing Neural PDE Solvers - [ArXiv] [QA].
  • User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems - [ArXiv] [QA].
  • A Survey on Retrieval-Augmented Text Generation - [ArXiv] [QA].
  • CLA-NeRF: Category-Level Articulated Neural Radiance Field - [ArXiv] [QA].

January 2022

  • Signing the Supermask: Keep, Hide, Invert - [ArXiv] [QA].
  • Few-Shot Backdoor Attacks on Visual Object Tracking - [ArXiv] [QA].
  • Robust Imitation Learning from Corrupted Demonstrations - [ArXiv] [QA].
  • Counterfactual Plans under Distributional Ambiguity - [ArXiv] [QA].
  • DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR - [ArXiv] [QA].
  • Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model - [ArXiv] [QA].
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - [ArXiv] [QA].
  • DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence - [ArXiv] [QA].
  • Natural Language Descriptions of Deep Visual Features - [ArXiv] [QA].
  • Explanatory Learning: Beyond Empiricism in Neural Networks - [ArXiv] [QA].
  • RePaint: Inpainting using Denoising Diffusion Probabilistic Models - [ArXiv] [QA].
  • Learning Graph Augmentations to Learn Graph Representations - [ArXiv] [QA].
  • Patches Are All You Need? - [ArXiv] [QA].
  • Fast Differentiable Matrix Square Root - [ArXiv] [QA].
  • LaMDA: Language Models for Dialog Applications - [ArXiv] [QA].
  • Safe Deep RL in 3D Environments using Human Feedback - [ArXiv] [QA].
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - [ArXiv] [QA].
  • Parameter-free Online Test-time Adaptation - [ArXiv] [QA].
  • A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models - [ArXiv] [QA].
  • Neural Circuit Architectural Priors for Embodied Control - [ArXiv] [QA].
  • QuadTree Attention for Vision Transformers - [ArXiv] [QA].
  • C2-CRS: Coarse-to-Fine Contrastive Learning for Conversational Recommender System - [ArXiv] [QA].
  • Global existence and decay estimates for a viscoelastic plate equation with nonlinear damping and logarithmic nonlinearity - [ArXiv] [QA].

2021

December 2021

  • Optimal Representations for Covariate Shift - [ArXiv] [QA].
  • On the Role of Neural Collapse in Transfer Learning - [ArXiv] [QA].
  • Self Reward Design with Fine-grained Interpretability - [ArXiv] [QA].
  • Generative Kernel Continual learning - [ArXiv] [QA].
  • Transformers Can Do Bayesian Inference - [ArXiv] [QA].
  • WebGPT: Browser-assisted question-answering with human feedback - [ArXiv] [QA].
  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics - [ArXiv] [QA].
  • Reframing Human-AI Collaboration for Generating Free-Text Explanations - [ArXiv] [QA].
  • Learning to Prompt for Continual Learning - [ArXiv] [QA].
  • Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge - [ArXiv] [QA].
  • Rethinking Nearest Neighbors for Visual Classification - [ArXiv] [QA].
  • Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information - [ArXiv] [QA].
  • Massive-scale Decoding for Text Generation using Lattices - [ArXiv] [QA].
  • MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation - [ArXiv] [QA].
  • Real-Time Neural Voice Camouflage - [ArXiv] [QA].
  • GLaM: Efficient Scaling of Language Models with Mixture-of-Experts - [ArXiv] [QA].
  • Step-unrolled Denoising Autoencoders for Text Generation - [ArXiv] [QA].
  • CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability - [ArXiv] [QA].
  • Self-Supervised Bot Play for Conversational Recommendation with Justifications - [ArXiv] [QA].
  • On Convergence of Federated Averaging Langevin Dynamics - [ArXiv] [QA].
  • Scaling Language Models: Methods, Analysis & Insights from Training Gopher - [ArXiv] [QA].
  • Pareto Domain Adaptation - [ArXiv] [QA].
  • DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification - [ArXiv] [QA].
  • Universalizing Weak Supervision - [ArXiv] [QA].
  • Genetic Algorithm for Constrained Molecular Inverse Design - [ArXiv] [QA].
  • Variational Wasserstein gradient flow - [ArXiv] [QA].
  • Linear algebra with transformers - [ArXiv] [QA].
  • Mind the gap in university rankings: a complex network approach towards fairness - [ArXiv] [QA].
  • Magnetic correction to the Anomalous Magnetic Moment of Electron - [ArXiv] [QA].
  • Neural Stochastic Dual Dynamic Programming - [ArXiv] [QA].
  • A General Language Assistant as a Laboratory for Alignment - [ArXiv] [QA].
  • Routing with Self-Attention for Multimodal Capsule Networks - [ArXiv] [QA].

November 2021

  • Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective - [ArXiv] [QA].
  • GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection - [ArXiv] [QA].
  • Group equivariant neural posterior estimation - [ArXiv] [QA].
  • Node-Level Differentially Private Graph Neural Networks - [ArXiv] [QA].
  • Deep Point Cloud Reconstruction - [ArXiv] [QA].
  • Lossless Compression with Probabilistic Circuits - [ArXiv] [QA].
  • Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction - [ArXiv] [QA].
  • Plant 'n' Seek: Can You Find the Winning Ticket? - [ArXiv] [QA].
  • Deep Probability Estimation - [ArXiv] [QA].
  • Are Vision Transformers Robust to Patch Perturbations? - [ArXiv] [QA].
  • Deep Safe Multi-Task Learning - [ArXiv] [QA].
  • Selective Ensembles for Consistent Predictions - [ArXiv] [QA].
  • Bolstering Stochastic Gradient Descent with Model Building - [ArXiv] [QA].
  • Sliced Recursive Transformer - [ArXiv] [QA].
  • MT3: Multi-Task Multitrack Music Transcription - [ArXiv] [QA].
  • LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - [ArXiv] [QA].
  • DAGSurv: Directed Acyclic Graph Based Survival Analysis Using Deep Neural Networks - [ArXiv] [QA].
  • Can Vision Transformers Perform Convolution? - [ArXiv] [QA].
  • LSTA-Net: Long short-term Spatio-Temporal Aggregation Network for Skeleton-based Action Recognition - [ArXiv] [QA].

October 2021

  • Template Filling for Controllable Commonsense Reasoning - [ArXiv] [QA].
  • Improving Fairness via Federated Learning - [ArXiv] [QA].
  • The magnitude vector of images - [ArXiv] [QA].
  • Training Verifiers to Solve Math Word Problems - [ArXiv] [QA].
  • s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning - [ArXiv] [QA].
  • The Efficiency Misnomer - [ArXiv] [QA].
  • Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? - [ArXiv] [QA].
  • Center Loss Regularization for Continual Learning - [ArXiv] [QA].
  • Fast Model Editing at Scale - [ArXiv] [QA].
  • BERMo: What can BERT learn from ELMo? - [ArXiv] [QA].
  • TLDR: Twin Learning for Dimensionality Reduction - [ArXiv] [QA].
  • Natural Attribute-based Shift Detection - [ArXiv] [QA].
  • Illiterate DALL-E Learns to Compose - [ArXiv] [QA].
  • Multimodal Dialogue Response Generation - [ArXiv] [QA].
  • Comparing Human and Machine Bias in Face Recognition - [ArXiv] [QA].
  • Generated Knowledge Prompting for Commonsense Reasoning - [ArXiv] [QA].
  • On Learning the Transformer Kernel - [ArXiv] [QA].
  • Multitask Prompted Training Enables Zero-Shot Task Generalization - [ArXiv] [QA].
  • Few-Shot Bot: Prompt-Based Learning for Dialogue Systems - [ArXiv] [QA].
  • On-Policy Model Errors in Reinforcement Learning - [ArXiv] [QA].
  • ContraQA: Question Answering under Contradicting Contexts - [ArXiv] [QA].
  • RecInDial: A Unified Framework for Conversational Recommendation with Pretrained Language Models - [ArXiv] [QA].
  • Parallel Deep Neural Networks Have Zero Duality Gap - [ArXiv] [QA].
  • Causal discovery from conditionally stationary time-series - [ArXiv] [QA].
  • Molecular Graph Generation via Geometric Scattering - [ArXiv] [QA].
  • DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer - [ArXiv] [QA].
  • Relative Molecule Self-Attention Transformer - [ArXiv] [QA].
  • Certified Patch Robustness via Smoothed Vision Transformers - [ArXiv] [QA].
  • Global Vision Transformer Pruning with Hessian-Aware Saliency - [ArXiv] [QA].
  • Long Expressive Memory for Sequence Modeling - [ArXiv] [QA].
  • Multi-Agent MDP Homomorphic Networks - [ArXiv] [QA].
  • Neural Link Prediction with Walk Pooling - [ArXiv] [QA].
  • FRL: Federated Rank Learning - [ArXiv] [QA].
  • On the Limitations of Multimodal VAEs - [ArXiv] [QA].
  • Token Pooling in Vision Transformers - [ArXiv] [QA].
  • FOCUS: Familiar Objects in Common and Uncommon Settings - [ArXiv] [QA].
  • Hyperparameter Tuning with Renyi Differential Privacy - [ArXiv] [QA].
  • Adversarial Retriever-Ranker for dense text retrieval - [ArXiv] [QA].
  • RAR: Region-Aware Point Cloud Registration - [ArXiv] [QA].
  • Cartoon Explanations of Image Classifiers - [ArXiv] [QA].
  • Situated Dialogue Learning through Procedural Environment Generation - [ArXiv] [QA].
  • On the Optimal Memorization Power of ReLU Neural Networks - [ArXiv] [QA].
  • Generative Modeling with Optimal Transport Maps - [ArXiv] [QA].
  • Federated Learning via Plurality Vote - [ArXiv] [QA].
  • Nested Policy Reinforcement Learning - [ArXiv] [QA].
  • How BPE Affects Memorization in Transformers - [ArXiv] [QA].
  • On The Transferability of Deep-Q Networks - [ArXiv] [QA].
  • Test-time Batch Statistics Calibration for Covariate Shift - [ArXiv] [QA].
  • Geometric Algebra Attention Networks for Small Point Clouds - [ArXiv] [QA].
  • EntQA: Entity Linking as Question Answering - [ArXiv] [QA].
  • Autoregressive Diffusion Models - [ArXiv] [QA].
  • Generalized Kernel Thinning - [ArXiv] [QA].
  • Batch size-invariance for policy optimization - [ArXiv] [QA].
  • Dynamics of targeted ransomware negotiation - [ArXiv] [QA].
  • Vision-Only Robot Navigation in a Neural Radiance World - [ArXiv] [QA].

September 2021

  • Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System - [ArXiv] [QA].
  • Stochastic Training is Not Necessary for Generalization - [ArXiv] [QA].
  • IGLU: Efficient GCN Training via Lazy Updates - [ArXiv] [QA].
  • OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts - [ArXiv] [QA].
  • Learning Neural Templates for Recommender Dialogue System - [ArXiv] [QA].
  • A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification - [ArXiv] [QA].
  • Recursively Summarizing Books with Human Feedback - [ArXiv] [QA].
  • Neural networks with trainable matrix activation functions - [ArXiv] [QA].
  • PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation - [ArXiv] [QA].
  • DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation - [ArXiv] [QA].
  • Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes - [ArXiv] [QA].
  • Scaling Laws for Neural Machine Translation - [ArXiv] [QA].
  • Transferable Persona-Grounded Dialogues via Grounded Minimal Edits - [ArXiv] [QA].
  • Benchmarking the Spectrum of Agent Capabilities - [ArXiv] [QA].
  • Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation - [ArXiv] [QA].
  • Space Time Recurrent Memory Network - [ArXiv] [QA].
  • Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation - [ArXiv] [QA].
  • CEM: Commonsense-aware Empathetic Response Generation - [ArXiv] [QA].
  • Bootstrapped Meta-Learning - [ArXiv] [QA].
  • A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation - [ArXiv] [QA].
  • Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems - [ArXiv] [QA].
  • Local Augmentation for Graph Neural Networks - [ArXiv] [QA].
  • Sqrt(d) Dimension Dependence of Langevin Monte Carlo - [ArXiv] [QA].
  • Learning Neural Causal Models with Active Interventions - [ArXiv] [QA].
  • Learning to Prompt for Vision-Language Models - [ArXiv] [QA].
  • The fractional chromatic number of double cones over graphs - [ArXiv] [QA].
  • Regional Adversarial Training for Better Robust Generalization - [ArXiv] [QA].
  • Boosting Search Engines with Interactive Agents - [ArXiv] [QA].

August 2021

  • Subjective Learning for Open-Ended Data - [ArXiv] [QA].
  • Dynamic processes in superconductors and the laws of thermodynamics - [ArXiv] [QA].
  • Anarchic Federated Learning - [ArXiv] [QA].
  • On the Opportunities and Risks of Foundation Models - [ArXiv] [QA].
  • MMChat: Multi-Modal Chat Dataset on Social Media - [ArXiv] [QA].
  • FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning - [ArXiv] [QA].
  • Logit Attenuating Weight Normalization - [ArXiv] [QA].
  • BIGRoC: Boosting Image Generation via a Robust Classifier - [ArXiv] [QA].
  • Source-Free Domain Adaptation for Image Segmentation - [ArXiv] [QA].
  • Internal Video Inpainting by Implicit Long-range Propagation - [ArXiv] [QA].
  • Model-Based Opponent Modeling - [ArXiv] [QA].
  • Offline Decentralized Multi-Agent Reinforcement Learning - [ArXiv] [QA].
  • How to Evaluate Your Dialogue Models: A Review of Approaches - [ArXiv] [QA].
  • Evaluating Deep Graph Neural Networks - [ArXiv] [QA].

July 2021

  • Imbalanced Adversarial Training with Reweighting - [ArXiv] [QA].
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing - [ArXiv] [QA].
  • Unsupervised Learning of Neurosymbolic Encoders - [ArXiv] [QA].
  • Joint Shapley values: a measure of joint feature importance - [ArXiv] [QA].
  • Conditional GANs with Auxiliary Discriminative Classifier - [ArXiv] [QA].
  • Guided Generation of Cause and Effect - [ArXiv] [QA].
  • Structured Stochastic Gradient MCMC - [ArXiv] [QA].
  • FastSHAP: Real-Time Shapley Value Estimation - [ArXiv] [QA].
  • How Much Can CLIP Benefit Vision-and-Language Tasks? - [ArXiv] [QA].
  • Explore and Control with Adversarial Surprise - [ArXiv] [QA].
  • ViTGAN: Training GANs with Vision Transformers - [ArXiv] [QA].
  • Towards Robust Active Feature Acquisition - [ArXiv] [QA].
  • Evaluating Large Language Models Trained on Code - [ArXiv] [QA].
  • Understanding Intrinsic Robustness Using Label Uncertainty - [ArXiv] [QA].
  • Neural Contextual Bandits without Regret - [ArXiv] [QA].
  • Structured Denoising Diffusion Models in Discrete State-Spaces - [ArXiv] [QA].
  • Depth-supervised NeRF: Fewer Views and Faster Training for Free - [ArXiv] [QA].
  • Rethinking Positional Encoding - [ArXiv] [QA].
  • When and How to Fool Explainable Models (and Humans) with Adversarial Examples - [ArXiv] [QA].
  • Scale Mixtures of Neural Network Gaussian Processes - [ArXiv] [QA].
  • On the Practicality of Deterministic Epistemic Uncertainty - [ArXiv] [QA].
  • Exact verification of the strong BSD conjecture for some absolutely simple abelian surfaces - [ArXiv] [QA].

June 2021

  • Automatically Select Emotion for Response via Personality-affected Emotion Transition - [ArXiv] [QA].
  • Local Reweighting for Adversarial Training - [ArXiv] [QA].
  • Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation - [ArXiv] [QA].
  • Multimodal Few-Shot Learning with Frozen Language Models - [ArXiv] [QA].
  • Animatable Neural Radiance Fields from Monocular RGB Videos - [ArXiv] [QA].
  • DCoM: A Deep Column Mapper for Semantic Data Type Detection - [ArXiv] [QA].
  • IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers - [ArXiv] [QA].
  • Learning Multimodal VAEs through Mutual Supervision - [ArXiv] [QA].
  • Sampling with Mirrored Stein Operators - [ArXiv] [QA].
  • Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation - [ArXiv] [QA].
  • CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot - [ArXiv] [QA].
  • Secure Domain Adaptation with Multiple Sources - [ArXiv] [QA].
  • Volume Rendering of Neural Implicit Surfaces - [ArXiv] [QA].
  • Policy Smoothing for Provably Robust Reinforcement Learning - [ArXiv] [QA].
  • Boundary Graph Neural Networks for 3D Simulations - [ArXiv] [QA].
  • Analytically Tractable Bayesian Deep Q-Learning - [ArXiv] [QA].
  • NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction - [ArXiv] [QA].
  • Shuffle Private Stochastic Convex Optimization - [ArXiv] [QA].
  • On Invariance Penalties for Risk Minimization - [ArXiv] [QA].
  • Visual Correspondence Hallucination - [ArXiv] [QA].
  • Poisoning and Backdooring Contrastive Learning - [ArXiv] [QA].
  • Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation - [ArXiv] [QA].
  • Unsupervised Enrichment of Persona-grounded Dialog with Background Stories - [ArXiv] [QA].
  • Query Embedding on Hyper-relational Knowledge Graphs - [ArXiv] [QA].
  • Constraining Linear-chain CRFs to Regular Languages - [ArXiv] [QA].
  • Pre-Trained Models: Past, Present and Future - [ArXiv] [QA].
  • Inverting Adversarially Robust Networks for Image Synthesis - [ArXiv] [QA].
  • Prompting Contrastive Explanations for Commonsense Reasoning Tasks - [ArXiv] [QA].
  • Learning to Pool in Graph Neural Networks for Extrapolation - [ArXiv] [QA].
  • Is Homophily a Necessity for Graph Neural Networks? - [ArXiv] [QA].
  • Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation - [ArXiv] [QA].
  • Fair Normalizing Flows - [ArXiv] [QA].
  • A Neural Tangent Kernel Perspective of GANs - [ArXiv] [QA].
  • Do Transformers Really Perform Bad for Graph Representation? - [ArXiv] [QA].
  • DIGRAC: Digraph Clustering Based on Flow Imbalance - [ArXiv] [QA].
  • It Takes Two to Tango: Mixup for Deep Metric Learning - [ArXiv] [QA].
  • Mean-Shifted Contrastive Loss for Anomaly Detection - [ArXiv] [QA].
  • RegMix: Data Mixing Augmentation for Regression - [ArXiv] [QA].
  • Model Zoo: A Growing "Brain" That Learns Continually - [ArXiv] [QA].
  • Context-Aware Sparse Deep Coordination Graphs - [ArXiv] [QA].
  • Learning Curves for SGD on Structured Features - [ArXiv] [QA].
  • Meta-Learning with Fewer Tasks through Task Interpolation - [ArXiv] [QA].
  • Churn Reduction via Distillation - [ArXiv] [QA].
  • Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances - [ArXiv] [QA].
  • Convergent Graph Solvers - [ArXiv] [QA].
  • Steerable 3D Spherical Neurons - [ArXiv] [QA].
  • Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize - [ArXiv] [QA].
  • Evidential Turing Processes - [ArXiv] [QA].
  • Towards Emotional Support Dialog Systems - [ArXiv] [QA].
  • Transition-Based Constrained DFT for the Robust and Reliable Treatment of Excitations in Supramolecular Systems - [ArXiv] [QA].
  • Multiresolution Equivariant Graph Variational Autoencoder - [ArXiv] [QA].
  • RevCore: Review-augmented Conversational Recommendation - [ArXiv] [QA].
  • DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues - [ArXiv] [QA].
  • DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Text Generation - [ArXiv] [QA].
  • Towards Quantifiable Dialogue Coherence Evaluation - [ArXiv] [QA].
  • Concurrent Adversarial Learning for Large-Batch Training - [ArXiv] [QA].
  • Rethinking Pseudo Labels for Semi-Supervised Object Detection - [ArXiv] [QA].

May 2021

  • Efficient and Modular Implicit Differentiation - [ArXiv] [QA].
  • How Attentive are Graph Attention Networks? - [ArXiv] [QA].
  • An Attention Free Transformer - [ArXiv] [QA].
  • Gotta Go Fast When Generating Data with Score-Based Models - [ArXiv] [QA].
  • OTTers: One-turn Topic Transitions for Open-Domain Dialogue - [ArXiv] [QA].
  • Data Augmentation for Text Generation Without Any Augmented Data - [ArXiv] [QA].
  • Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning - [ArXiv] [QA].
  • KECRS: Towards Knowledge-Enriched Conversational Recommendation System - [ArXiv] [QA].
  • RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling - [ArXiv] [QA].
  • HyKnow: End-to-End Task-Oriented Dialog Modeling with Hybrid Knowledge Management - [ArXiv] [QA].
  • The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting - [ArXiv] [QA].
  • EL-Attention: Memory Efficient Lossless Attention for Generation - [ArXiv] [QA].
  • Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey - [ArXiv] [QA].
  • Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems - [ArXiv] [QA].
  • A Survey of Data Augmentation Approaches for NLP - [ArXiv] [QA].
  • PD-GAN: Probabilistic Diverse GAN for Image Inpainting - [ArXiv] [QA].
  • Unsteady and inertial dynamics of an active particle in a fluid - [ArXiv] [QA].

April 2021

  • If your data distribution shifts, use self-learning - [ArXiv] [QA].
  • PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation - [ArXiv] [QA].
  • UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction - [ArXiv] [QA].
  • Gradient Matching for Domain Generalization - [ArXiv] [QA].
  • Image Inpainting with External-internal Learning and Monochromic Bottleneck - [ArXiv] [QA].
  • Explaining Answers with Entailment Trees - [ArXiv] [QA].
  • $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering - [ArXiv] [QA].
  • Sparse Attention with Linear Units - [ArXiv] [QA].
  • Progressive Temporal Feature Alignment Network for Video Inpainting - [ArXiv] [QA].
  • Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - [ArXiv] [QA].
  • NeRF-VAE: A Geometry Aware 3D Scene Generative Model - [ArXiv] [QA].
  • Improved Image Generation via Sparse Modeling - [ArXiv] [QA].
  • Domain Invariant Adversarial Learning - [ArXiv] [QA].

March 2021

  • CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields - [ArXiv] [QA].
  • Contrastive Embedding for Generalized Zero-Shot Learning - [ArXiv] [QA].
  • TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations - [ArXiv] [QA].
  • Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers - [ArXiv] [QA].
  • GNeRF: GAN-based Neural Radiance Field without Posed Camera - [ArXiv] [QA].
  • Efficient Explanations from Empirical Explainers - [ArXiv] [QA].
  • KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs - [ArXiv] [QA].
  • DNN Quantization with Attention - [ArXiv] [QA].
  • Concentric Spherical GNN for 3D Representation Learning - [ArXiv] [QA].
  • FastNeRF: High-Fidelity Neural Rendering at 200FPS - [ArXiv] [QA].
  • GLM: General Language Model Pretraining with Autoregressive Blank Infilling - [ArXiv] [QA].
  • Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE - [ArXiv] [QA].
  • ENCONTER: Entity Constrained Progressive Sequence Generation via Insertion-based Transformer - [ArXiv] [QA].
  • Online Adversarial Attacks - [ArXiv] [QA].
  • Mixture of Volumetric Primitives for Efficient Neural Rendering - [ArXiv] [QA].

February 2021

  • Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing - [ArXiv] [QA].
  • Deep ReLU Networks Preserve Expected Length - [ArXiv] [QA].
  • Meta-Learning Dynamics Forecasting Using Task Inference - [ArXiv] [QA].
  • ShaRF: Shape-conditioned Radiance Fields from a Single View - [ArXiv] [QA].
  • DEUP: Direct Epistemic Uncertainty Prediction - [ArXiv] [QA].
  • Topological Graph Neural Networks - [ArXiv] [QA].
  • Contrastive Embeddings for Neural Architectures - [ArXiv] [QA].
  • Hyperspherical embedding for novel class classification - [ArXiv] [QA].
  • Learning Graph Embeddings for Compositional Zero-shot Learning - [ArXiv] [QA].

January 2021

  • RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations - [ArXiv] [QA].
  • Advances and Challenges in Conversational Recommender Systems: A Survey - [ArXiv] [QA].
  • Evaluating Disentanglement of Structured Representations - [ArXiv] [QA].
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - [ArXiv] [QA].
  • Max-Affine Spline Insights Into Deep Network Pruning - [ArXiv] [QA].

2020

December 2020

  • Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation - [ArXiv] [QA].
  • Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration - [ArXiv] [QA].
  • ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language - [ArXiv] [QA].
  • A Distributional Approach to Controlled Text Generation - [ArXiv] [QA].
  • Transformer Interpretability Beyond Attention Visualization - [ArXiv] [QA].
  • Neural Volume Rendering: NeRF And Beyond - [ArXiv] [QA].
  • Keyword-Guided Neural Conversational Model - [ArXiv] [QA].
  • CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts - [ArXiv] [QA].
  • Image Inpainting Guided by Coherence Priors of Semantics and Textures - [ArXiv] [QA].
  • Contrastive Learning with Adversarial Perturbations for Conditional Text Generation - [ArXiv] [QA].
  • Active Learning: Problem Settings and Recent Developments - [ArXiv] [QA].
  • Challenging common interpretability assumptions in feature attribution explanations - [ArXiv] [QA].
  • Practical No-box Adversarial Attacks against DNNs - [ArXiv] [QA].
  • pixelNeRF: Neural Radiance Fields from One or Few Images - [ArXiv] [QA].
  • Learned Initializations for Optimizing Coordinate-Based Neural Representations - [ArXiv] [QA].
  • Neural Prototype Trees for Interpretable Fine-grained Image Recognition - [ArXiv] [QA].
  • CPM: A Large-scale Generative Chinese Pre-trained Language Model - [ArXiv] [QA].

November 2020

  • DeRF: Decomposed Radiance Fields - [ArXiv] [QA].
  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields - [ArXiv] [QA].
  • Contextual Fusion For Adversarial Robustness - [ArXiv] [QA].

October 2020

  • Learning to Actively Learn: A Robust Approach - [ArXiv] [QA].
  • How Does the Task Landscape Affect MAML Performance? - [ArXiv] [QA].
  • Interpretation of NLP models through input marginalization - [ArXiv] [QA].
  • Towards falsifiable interpretability research - [ArXiv] [QA].
  • CR-Walker: Tree-Structured Graph Reasoning and Dialog Acts for Conversational Recommendation - [ArXiv] [QA].
  • Improving Dialog Systems for Negotiation with Personality Modeling - [ArXiv] [QA].
  • NeRF++: Analyzing and Improving Neural Radiance Fields - [ArXiv] [QA].
  • Fairness-aware Agnostic Federated Learning - [ArXiv] [QA].
  • GRF: Learning a General Radiance Field for 3D Representation and Rendering - [ArXiv] [QA].
  • Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions - [ArXiv] [QA].
  • MIME: MIMicking Emotions for Empathetic Response Generation - [ArXiv] [QA].

September 2020

  • Learning to Plan and Realize Separately for Open-Ended Dialogue Systems - [ArXiv] [QA].
  • From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation - [ArXiv] [QA].
  • Understanding the Role of Individual Units in a Deep Neural Network - [ArXiv] [QA].
  • Sample-Efficient Automated Deep Reinforcement Learning - [ArXiv] [QA].
  • Learning to summarize from human feedback - [ArXiv] [QA].

August 2020

  • A Survey of Deep Active Learning - [ArXiv] [QA].
  • A Survey of Evaluation Metrics Used for NLG Systems - [ArXiv] [QA].
  • A Survey of Active Learning for Text Classification using Deep Neural Networks - [ArXiv] [QA].
  • Context-aware Feature Generation for Zero-shot Semantic Segmentation - [ArXiv] [QA].
  • Adaptive Learning of Tensor Network Structures - [ArXiv] [QA].
  • A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning - [ArXiv] [QA].
  • Explainable Face Recognition - [ArXiv] [QA].

July 2020

  • Learning Joint Spatial-Temporal Transformations for Video Inpainting - [ArXiv] [QA].
  • Mixture Representation Learning with Coupled Autoencoders - [ArXiv] [QA].
  • Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning - [ArXiv] [QA].
  • Towards Deeper Graph Neural Networks - [ArXiv] [QA].
  • DVI: Depth Guided Video Inpainting for Autonomous Driving - [ArXiv] [QA].
  • Few-shot Scene-adaptive Anomaly Detection - [ArXiv] [QA].
  • Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations - [ArXiv] [QA].
  • GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis - [ArXiv] [QA].
  • The Fyodorov-Hiary-Keating Conjecture. I - [ArXiv] [QA].
  • Interactive Path Reasoning on Graph for Conversational Recommendation - [ArXiv] [QA].

June 2020

  • PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning - [ArXiv] [QA].
  • Generative causal explanations of black-box classifiers - [ArXiv] [QA].
  • Unsupervised Evaluation of Interactive Dialog with DialoGPT - [ArXiv] [QA].
  • Towards Understanding Label Smoothing - [ArXiv] [QA].
  • Neural Parameter Allocation Search - [ArXiv] [QA].
  • Augmented Sliced Wasserstein Distances - [ArXiv] [QA].
  • DeeperGCN: All You Need to Train Deeper GCNs - [ArXiv] [QA].
  • CoCon: A Self-Supervised Approach for Controlled Text Generation - [ArXiv] [QA].
  • Situated and Interactive Multimodal Conversations - [ArXiv] [QA].

May 2020

  • Language Models are Few-Shot Learners - [ArXiv] [QA].
  • High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling - [ArXiv] [QA].
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - [ArXiv] [QA].
  • Novel Policy Seeking with Constrained Optimization - [ArXiv] [QA].
  • Mirror Descent Policy Optimization - [ArXiv] [QA].
  • Normalized Attention Without Probability Cage - [ArXiv] [QA].
  • Semantic Photo Manipulation with a Generative Image Prior - [ArXiv] [QA].
  • Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation - [ArXiv] [QA].
  • Learning an Unreferenced Metric for Online Dialogue Evaluation - [ArXiv] [QA].
  • POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training - [ArXiv] [QA].

April 2020

  • Consistent Video Depth Estimation - [ArXiv] [QA].
  • Recipes for building an open-domain chatbot - [ArXiv] [QA].
  • Multi-Domain Dialogue Acts and Response Co-Generation - [ArXiv] [QA].
  • Federated Stochastic Gradient Langevin Dynamics - [ArXiv] [QA].
  • Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling - [ArXiv] [QA].
  • Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness - [ArXiv] [QA].
  • TextGAIL: Generative Adversarial Imitation Learning for Text Generation - [ArXiv] [QA].
  • There and Back Again: Revisiting Backpropagation Saliency Methods - [ArXiv] [QA].
  • A Survey on Conversational Recommender Systems - [ArXiv] [QA].

March 2020

  • Distributional Reinforcement Learning with Ensembles - [ArXiv] [QA].
  • Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification - [ArXiv] [QA].
  • XPersona: Evaluating Multilingual Personalized Chatbot - [ArXiv] [QA].
  • Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes - [ArXiv] [QA].
  • VCNet: A Robust Approach to Blind Image Inpainting - [ArXiv] [QA].
  • Building and Interpreting Deep Similarity Models - [ArXiv] [QA].
  • xCos: An Explainable Cosine Metric for Face Verification Task - [ArXiv] [QA].
  • Benchmarking Graph Neural Networks - [ArXiv] [QA].

February 2020

  • Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems - [ArXiv] [QA].
  • Gradient Boosting Neural Networks: GrowNet - [ArXiv] [QA].
  • Information Condensing Active Learning - [ArXiv] [QA].
  • Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation - [ArXiv] [QA].

January 2020

  • Scaling Laws for Neural Language Models - [ArXiv] [QA].
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - [ArXiv] [QA].

2019

December 2019

  • Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering - [ArXiv] [QA].
  • Image Processing Using Multi-Code GAN Prior - [ArXiv] [QA].

November 2019

  • Binarized Neural Architecture Search - [ArXiv] [QA].
  • Region Normalization for Image Inpainting - [ArXiv] [QA].
  • Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings - [ArXiv] [QA].
  • Generating Persona Consistent Dialogues by Exploiting Natural Language Inference - [ArXiv] [QA].
  • A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data - [ArXiv] [QA].

October 2019

  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - [ArXiv] [QA].
  • Understanding Deep Networks via Extremal Perturbations and Smooth Masks - [ArXiv] [QA].
  • ALOHA: Artificial Learning of Human Attributes for Dialogue Agents - [ArXiv] [QA].
  • A cost-effective method for improving and re-purposing large, pre-trained GANs by fine-tuning their class-embeddings - [ArXiv] [QA].
  • Explaining image classifiers by removing input features using generative models - [ArXiv] [QA].
  • Continual Learning in Neural Networks - [ArXiv] [QA].
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - [ArXiv] [QA].

September 2019

  • Visual Explanation for Deep Metric Learning - [ArXiv] [QA].
  • Improving Generative Visual Dialog by Answering Diverse Questions - [ArXiv] [QA].
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - [ArXiv] [QA].
  • An Internal Learning Approach to Video Inpainting - [ArXiv] [QA].
  • Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset - [ArXiv] [QA].
  • CTRL: A Conditional Transformer Language Model for Controllable Generation - [ArXiv] [QA].
  • ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons - [ArXiv] [QA].
  • Image Inpainting with Learnable Bidirectional Attention Maps - [ArXiv] [QA].
  • Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue - [ArXiv] [QA].

August 2019

  • Copy-and-Paste Networks for Deep Video Inpainting - [ArXiv] [QA].
  • Onion-Peel Networks for Deep Video Completion - [ArXiv] [QA].
  • Efficient Deep Neural Networks - [ArXiv] [QA].
  • StructureFlow: Image Inpainting via Structure-aware Appearance Flow - [ArXiv] [QA].
  • Generative Image Inpainting with Submanifold Alignment - [ArXiv] [QA].

July 2019

  • Benchmarking Attribution Methods with Relative Feature Importance - [ArXiv] [QA].
  • Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning - [ArXiv] [QA].
  • Generative Counterfactual Introspection for Explainable Deep Learning - [ArXiv] [QA].
  • Learnable Gated Temporal Shift Module for Deep Video Inpainting - [ArXiv] [QA].

June 2019

  • Improving performance of deep learning models with axiomatic attribution priors and expected gradients - [ArXiv] [QA].
  • Factorized Mutual Information Maximization - [ArXiv] [QA].
  • XRAI: Better Attributions Through Regions - [ArXiv] [QA].
  • Image Synthesis with a Single (Robust) Classifier - [ArXiv] [QA].
  • Zero-Shot Semantic Segmentation - [ArXiv] [QA].
  • Rethinking Loss Design for Large-scale 3D Shape Retrieval - [ArXiv] [QA].

May 2019

  • Align-and-Attend Network for Globally and Locally Coherent Video Inpainting - [ArXiv] [QA].
  • Why do These Match? Explaining the Behavior of Image Similarity Models - [ArXiv] [QA].
  • PEPSI++: Fast and Lightweight Network for Image Inpainting - [ArXiv] [QA].
  • Deep Flow-Guided Video Inpainting - [ArXiv] [QA].
  • Frame-Recurrent Video Inpainting by Robust Optical Flow Inference - [ArXiv] [QA].
  • Deep Video Inpainting - [ArXiv] [QA].

April 2019

  • Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN - [ArXiv] [QA].
  • Deep Fusion Network for Image Completion - [ArXiv] [QA].
  • Semantically Aligned Bias Reducing Zero Shot Learning - [ArXiv] [QA].
  • Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting - [ArXiv] [QA].
  • VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal - [ArXiv] [QA].
  • On zero-shot recognition of generic objects - [ArXiv] [QA].
  • Leveraging the Invariant Side of Generative Zero-Shot Learning - [ArXiv] [QA].
  • Creativity Inspired Zero-Shot Learning - [ArXiv] [QA].

March 2019

  • Pluralistic Image Completion - [ArXiv] [QA].
  • Deep Reinforcement Learning of Volume-guided Progressive View Inpainting for 3D Point Scene Completion from a Single Depth Image - [ArXiv] [QA].
  • CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog - [ArXiv] [QA].
  • Stabilizing the Lottery Ticket Hypothesis - [ArXiv] [QA].
  • Semantic-Guided Multi-Attention Localization for Zero-Shot Learning - [ArXiv] [QA].

February 2019

  • SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color - [ArXiv] [QA].
  • LS-Tree: Model Interpretation When the Data Are Linguistic - [ArXiv] [QA].
  • Towards Automatic Concept-based Explanations - [ArXiv] [QA].
  • Collaborative Sampling in Generative Adversarial Networks - [ArXiv] [QA].

January 2019

  • Personalized Dialogue Generation with Diversified Traits - [ArXiv] [QA].
  • Diffusion Variational Autoencoders - [ArXiv] [QA].
  • Improving Sequence-to-Sequence Learning via Optimal Transport - [ArXiv] [QA].
  • Foreground-aware Image Inpainting - [ArXiv] [QA].
  • Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions - [ArXiv] [QA].
  • Detecting Overfitting of Deep Generative Networks via Latent Recovery - [ArXiv] [QA].
  • Visualizing Deep Similarity Networks - [ArXiv] [QA].
  • EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning - [ArXiv] [QA].
  • A Theoretical Analysis of Deep Q-Learning - [ArXiv] [QA].

2018

December 2018

  • Adaptive Confidence Smoothing for Generalized Zero-Shot Learning - [ArXiv] [QA].
  • Face Completion with Semantic Knowledge and Collaborative Adversarial Learning - [ArXiv] [QA].
  • Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders - [ArXiv] [QA].
  • Deep Inception Generative Network for Cognitive Image Inpainting - [ArXiv] [QA].

November 2018

  • Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects - [ArXiv] [QA].
  • Coordinate-based Texture Inpainting for Pose-Guided Image Generation - [ArXiv] [QA].
  • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks - [ArXiv] [QA].
  • Generalized Zero-Shot Recognition based on Visually Semantic Embedding - [ArXiv] [QA].
  • Scalable agent alignment via reward modeling: a research direction - [ArXiv] [QA].
  • On Hallucinating Context and Background Pixels from a Face Mask using Multi-scale GANs - [ArXiv] [QA].
  • Reward learning from human preferences and demonstrations in Atari - [ArXiv] [QA].
  • CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling - [ArXiv] [QA].
  • Generative Dual Adversarial Network for Generalized Zero-shot Learning - [ArXiv] [QA].
  • Blockwise Parallel Decoding for Deep Autoregressive Models - [ArXiv] [QA].
  • Image Chat: Engaging Grounded Conversations - [ArXiv] [QA].

October 2018

  • Image Inpainting via Generative Multi-column Convolutional Neural Networks - [ArXiv] [QA].

August 2018

  • AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale - [ArXiv] [QA].
  • Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning - [ArXiv] [QA].

July 2018

  • Talk the Walk: Navigating New York City through Grounded Dialogue - [ArXiv] [QA].

June 2018

  • A Benchmark for Interpretability Methods in Deep Neural Networks - [ArXiv] [QA].
  • This Looks Like That: Deep Learning for Interpretable Image Recognition - [ArXiv] [QA].
  • Video Inpainting by Jointly Learning Temporal Structure and Spatial Details - [ArXiv] [QA].
  • Free-Form Image Inpainting with Gated Convolution - [ArXiv] [QA].
  • A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens - [ArXiv] [QA].

May 2018

  • Rethinking Knowledge Graph Propagation for Zero-Shot Learning - [ArXiv] [QA].
  • Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators - [ArXiv] [QA].
  • Progressive Ensemble Networks for Zero-Shot Recognition - [ArXiv] [QA].
  • Unsupervised Learning of Neural Networks to Explain Neural Networks - [ArXiv] [QA].
  • A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations - [ArXiv] [QA].
  • SPG-Net: Segmentation Prediction and Guidance Network for Image Inpainting - [ArXiv] [QA].

April 2018

  • How convolutional neural network see the world - A survey of convolutional neural network visualization methods - [ArXiv] [QA].
  • FaceShop: Deep Sketch-based Face Image Editing - [ArXiv] [QA].
  • Subgoal Discovery for Hierarchical Dialogue Policy Learning - [ArXiv] [QA].
  • Image Inpainting for Irregular Holes Using Partial Convolutions - [ArXiv] [QA].

March 2018

  • Structural inpainting - [ArXiv] [QA].
  • Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs - [ArXiv] [QA].
  • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge - [ArXiv] [QA].
  • Preserving Semantic Relations for Zero-Shot Learning - [ArXiv] [QA].

February 2018

  • Machine Theory of Mind - [ArXiv] [QA].
  • Multimodal Explanations: Justifying Decisions and Pointing to the Evidence - [ArXiv] [QA].
  • Singularities in Einstein-conformally coupled Higgs cosmological models - [ArXiv] [QA].
  • Interpreting CNNs via Decision Trees - [ArXiv] [QA].

January 2018

  • Shift-Net: Image Inpainting via Deep Feature Rearrangement - [ArXiv] [QA].
  • Generative Image Inpainting with Contextual Attention - [ArXiv] [QA].
  • Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks - [ArXiv] [QA].

2017

December 2017

  • Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation - [ArXiv] [QA].

November 2017

  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) - [ArXiv] [QA].
  • Deep Image Prior - [ArXiv] [QA].
  • Distilling a Neural Network Into a Soft Decision Tree - [ArXiv] [QA].
  • Contextual-based Image Inpainting: Infer, Match, and Translate - [ArXiv] [QA].

October 2017

  • Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks - [ArXiv] [QA].
  • Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation - [ArXiv] [QA].
  • Recent Advances in Zero-shot Recognition - [ArXiv] [QA].

September 2017

  • Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces - [ArXiv] [QA].
  • AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline - [ArXiv] [QA].

August 2017

  • Twin Networks: Matching the Future for Sequence Generation - [ArXiv] [QA].

July 2017

  • Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly - [ArXiv] [QA].

June 2017

  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability - [ArXiv] [QA].
  • SmoothGrad: removing noise by adding noise - [ArXiv] [QA].
  • Attention Is All You Need - [ArXiv] [QA].
  • Deep reinforcement learning from human preferences - [ArXiv] [QA].

May 2017

  • Learning how to explain neural networks: PatternNet and PatternAttribution - [ArXiv] [QA].

April 2017

  • Towards Building Large Scale Multimodal Domain-Aware Conversation Systems - [ArXiv] [QA].

January 2017

  • Interactive Learning from Policy-Dependent Human Feedback - [ArXiv] [QA].

2016

November 2016

  • High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis - [ArXiv] [QA].
  • Gaze Embeddings for Zero-Shot Image Classification - [ArXiv] [QA].
  • Visual Dialog - [ArXiv] [QA].
  • Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation - [ArXiv] [QA].
  • Learning a Deep Embedding Model for Zero-Shot Learning - [ArXiv] [QA].

October 2016

  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization - [ArXiv] [QA].

July 2016

  • Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification - [ArXiv] [QA].

June 2016

  • The Mythos of Model Interpretability - [ArXiv] [QA].

May 2016

  • An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild - [ArXiv] [QA].

April 2016

  • Context Encoders: Feature Learning by Inpainting - [ArXiv] [QA].

2015

December 2015

  • Explaining NonLinear Classification Decisions with Deep Taylor Decomposition - [ArXiv] [QA].

June 2015

  • Inverting Visual Representations with Convolutional Networks - [ArXiv] [QA].
  • Visualizing and Understanding Recurrent Networks - [ArXiv] [QA].

March 2015

  • Label-Embedding for Image Classification - [ArXiv] [QA].

January 2015

  • Transductive Multi-view Zero-Shot Learning - [ArXiv] [QA].

2014

December 2014

  • Object Detectors Emerge in Deep Scene CNNs - [ArXiv] [QA].

November 2014

  • Understanding Deep Image Representations by Inverting Them - [ArXiv] [QA].

May 2014

  • Microsoft COCO: Common Objects in Context - [ArXiv] [QA].

2009

September 2009

  • Chaos in Partial Differential Equations - [ArXiv] [QA].

August 2009

  • Sparse Canonical Correlation Analysis - [ArXiv] [QA].