Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365v1 | null |
2024-07-24 | Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism | Anhao Zhao et.al. | 2407.17011v1 | null |
2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552v2 | null |
2024-07-22 | AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models | Florian Felice et.al. | 2407.15987v1 | null |
2024-07-22 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu et.al. | 2407.15720v1 | link |
2024-07-22 | Dissecting Multiplication in Transformers: Insights into LLMs | Luyu Qiu et.al. | 2407.15360v1 | null |
2024-07-21 | Explaining Decisions of Agents in Mixed-Motive Games | Maayan Orner et.al. | 2407.15255v1 | null |
2024-07-21 | XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models | Erik Cambria et.al. | 2407.15248v1 | null |
2024-07-20 | Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models | Ze Yu Zhang et.al. | 2407.14845v1 | null |
2024-07-21 | Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization | Orson Mengara et.al. | 2407.14573v1 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487v1 | link |
2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452v1 | null |
2024-07-18 | The Software Complexity of Nations | Sándor Juhász et.al. | 2407.13880v1 | null |
2024-07-24 | The Honorific Effect: Exploring the Impact of Japanese Linguistic Formalities on AI-Generated Physics Explanations | Keisuke Sato et.al. | 2407.13787v2 | null |
2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648v1 | null |
2024-07-18 | SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models | Qi Yang et.al. | 2407.13117v1 | null |
2024-07-17 | Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models | Alexander R. Pelletier et.al. | 2407.12888v1 | null |
2024-07-16 | InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification | Yujia Hu et.al. | 2407.12882v1 | link |
2024-07-03 | Truth is Universal: Robust Detection of Lies in LLMs | Lennart Bürger et.al. | 2407.12831v1 | null |
2024-07-16 | InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | Yinzhu Quan et.al. | 2407.11384v1 | link |
2024-06-03 | The Life Cycle of Large Language Models: A Review of Biases in Education | Jinsook Lee et.al. | 2407.11203v1 | null |
2024-06-25 | RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | Robert Friel et.al. | 2407.11005v1 | null |
2024-06-24 | Visualization Literacy of Multimodal Large Language Models: A Comparative Study | Zhimin Li et.al. | 2407.10996v1 | null |
2024-06-23 | Do Large Language Models Understand Verbal Indicators of Romantic Attraction? | Sandra C. Matz et.al. | 2407.10989v1 | null |
2024-07-15 | GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework | Hannah Sansford et.al. | 2407.10793v1 | null |
2024-07-16 | Transforming Agency. On the mode of existence of Large Language Models | Xabier E. Barandiaran et.al. | 2407.10735v2 | null |
2024-07-15 | Learning Dynamics of LLM Finetuning | Yi Ren et.al. | 2407.10490v1 | link |
2024-07-19 | Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine | Omid Rohanian et.al. | 2407.10086v2 | null |
2024-07-13 | Building pre-train LLM Dataset for the INDIC Languages: a case study on Hindi | Shantipriya Parida et.al. | 2407.09855v1 | null |
2024-07-17 | Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models | Dong Shu et.al. | 2407.09292v2 | null |
2024-07-12 | DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection | Sangpil Youm et.al. | 2407.09283v1 | null |
2024-07-11 | Fault Diagnosis in Power Grids with Large Language Model | Liu Jing et.al. | 2407.08836v1 | null |
2024-07-11 | Towards Explainable Evolution Strategies with Large Language Models | Jill Baumann et.al. | 2407.08331v1 | null |
2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890v1 | link |
2024-07-10 | A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability | Ting Fang Tan et.al. | 2407.07666v1 | null |
2024-07-08 | SimPal: Towards a Meta-Conversational Framework to Understand Teacher's Instructional Goals for K-12 Physics | Effat Farhana et.al. | 2407.06241v1 | null |
2024-07-07 | Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information | Vishnu S. Pendyala et.al. | 2407.05464v1 | null |
2024-07-07 | Exploring the Educational Landscape of AI: Large Language Models' Approaches to Explaining Conservation of Momentum in Physics | Keisuke Sato et.al. | 2407.05308v1 | null |
2024-07-04 | From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI | Stefanie Krause et.al. | 2407.03778v1 | null |
2024-07-04 | Improving Self Consistency in LLMs through Probabilistic Tokenization | Ashutosh Sathe et.al. | 2407.03678v1 | null |
2024-07-04 | The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model | Brenden Smith et.al. | 2407.03621v1 | link |
2024-07-03 | LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation | Hongke Zhao et.al. | 2407.02833v1 | null |
2024-07-01 | Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components | Phillip Schneider et.al. | 2407.00997v1 | null |
2024-07-08 | LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation | Longchao Da et.al. | 2407.00994v2 | null |
2024-07-03 | HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability | Yanfang Chen et.al. | 2407.00668v2 | link |
2024-06-29 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Jinsheng Huang et.al. | 2407.00468v1 | link |
2024-06-28 | Evaluating Human Alignment and Model Faithfulness of LLM Rationale | Mohsen Fayyaz et.al. | 2407.00219v1 | null |
2024-06-28 | Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach | Sai Krishna Revanth Vuruma et.al. | 2407.00167v1 | null |
2024-06-28 | Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring | Jiazheng Li et.al. | 2406.19949v1 | null |
2024-06-27 | xTower: A Multilingual LLM for Explaining and Correcting Translation Errors | Marcos Treviso et.al. | 2406.19482v1 | null |
2024-06-26 | "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | Grace Li et.al. | 2406.18512v1 | null |
2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505v1 | null |
2024-06-26 | Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming | Zhenghao Zhou et.al. | 2406.18501v1 | null |
2024-06-25 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake et.al. | 2406.17692v1 | link |
2024-06-25 | Banishing LLM Hallucinations Requires Rethinking Generalization | Johnny Li et.al. | 2406.17642v1 | null |
2024-06-23 | Unveiling LLM Mechanisms Through Neural ODEs and Control Theory | Yukun Zhang et.al. | 2406.16985v1 | null |
2024-06-24 | Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Ronak Pradeep et.al. | 2406.16828v1 | link |
2024-06-24 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Peng Hu et.al. | 2406.16655v1 | link |
2024-06-24 | UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models | Zhanyue Qin et.al. | 2406.16382v1 | null |
2024-06-23 | Preference Tuning For Toxicity Mitigation Generalizes Across Languages | Xiaochen Li et.al. | 2406.16235v1 | link |
2024-06-23 | Effectiveness of ChatGPT in explaining complex medical reports to patients | Mengxuan Sun et.al. | 2406.15963v1 | null |
2024-06-30 | LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning | Guangsi Shi et.al. | 2406.15859v2 | null |
2024-06-21 | Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | Badr AlKhamissi et.al. | 2406.15109v1 | link |
2024-06-21 | Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction | Jinge Wu et.al. | 2406.15045v1 | null |
2024-06-20 | Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task? | Zhiqiang Pi et.al. | 2406.14737v1 | null |
2024-06-20 | Self-supervised Interpretable Concept-based Models for Text Classification | Francesco De Santis et.al. | 2406.14335v1 | null |
2024-06-20 | Definition generation for lexical semantic change detection | Mariia Fedorova et.al. | 2406.14167v1 | link |
2024-06-22 | Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach | Xuehao Zhai et.al. | 2406.13558v2 | null |
2024-06-16 | Current state of LLM Risks and AI Guardrails | Suriya Ganesh Ayyamperumal et.al. | 2406.12934v1 | null |
2024-06-19 | Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models | Hengyi Wang et.al. | 2406.12649v2 | null |
2024-06-18 | An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs | Daking Rai et.al. | 2406.12288v1 | link |
2024-06-18 | Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization | Kwangwook Seo et.al. | 2406.12269v1 | null |
2024-06-18 | A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning | Lijie Hu et.al. | 2406.12255v1 | null |
2024-06-29 | Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Huaxin Zhang et.al. | 2406.12235v2 | link |
2024-06-28 | WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions | Seyedali Mohammadi et.al. | 2406.12058v3 | null |
2024-05-31 | Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies | Srijoni Majumdar et.al. | 2406.11871v1 | null |
2024-06-17 | CELL your Model: Contrastive Explanation Methods for Large Language Models | Ronny Luss et.al. | 2406.11785v1 | null |
2024-06-17 | GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Rick Wilming et.al. | 2406.11547v1 | link |
2024-06-17 | A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences | Leonardo Bertolazzi et.al. | 2406.11341v1 | null |
2024-06-17 | TIFG: Text-Informed Feature Generation with Large Language Models | Xinhao Zhang et.al. | 2406.11177v1 | null |
2024-06-16 | LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction | Meiyun Wang et.al. | 2406.10811v1 | null |
2024-06-15 | A Comprehensive Survey of Foundation Models in Medicine | Wasif Khan et.al. | 2406.10729v1 | null |
2024-06-15 | Multilingual Large Language Models and Curse of Multilinguality | Daniil Gurgurov et.al. | 2406.10602v1 | null |
2024-06-14 | Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models | Qiheng Mao et.al. | 2406.09701v1 | null |
2024-06-13 | Automated Molecular Concept Generation and Labeling with Large Language Models | Shichang Zhang et.al. | 2406.09612v1 | null |
2024-06-12 | LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions | Nhat Hoang-Xuan et.al. | 2406.08572v1 | null |
2024-06-13 | CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems | Qianli Wang et.al. | 2406.08101v2 | link |
2024-06-12 | A Concept-Based Explainability Framework for Large Multimodal Models | Jayneel Parekh et.al. | 2406.08074v1 | null |
2024-06-13 | LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing | Hongxiang Zhang et.al. | 2406.07714v2 | null |
2024-06-15 | What's in an embedding? Would a rose by any embedding smell as sweet? | Venkat Venkatasubramanian et.al. | 2406.06870v3 | null |
2024-06-10 | Evaluating Zero-Shot Long-Context LLM Compression | Chenyu Wang et.al. | 2406.06773v1 | null |
2024-06-09 | Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | Saman Pordanesh et.al. | 2406.06637v1 | null |
2024-06-06 | Reinterpreting 'the Company a Word Keeps': Towards Explainable and Ontologically Grounded Language Models | Walid S. Saba et.al. | 2406.06610v1 | null |
2024-06-06 | Are Large Language Models the New Interface for Data Pipelines? | Sylvio Barbon Junior et.al. | 2406.06596v1 | null |
2024-06-13 | From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models | Xiaofeng Zhang et.al. | 2406.06579v2 | null |
2024-06-10 | Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course | Aadarsh Padiyath et.al. | 2406.06451v1 | null |
2024-07-05 | Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue | Simone Alghisi et.al. | 2406.06399v2 | null |
2024-07-03 | MedExQA: Medical Question Answering Benchmark with Multiple Explanations | Yunsoo Kim et.al. | 2406.06331v2 | link |
2024-06-10 | Safety Alignment Should Be Made More Than Just a Few Tokens Deep | Xiangyu Qi et.al. | 2406.05946v1 | link |
2024-06-13 | How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States | Zhenhong Zhou et.al. | 2406.05644v2 | link |
2024-06-08 | Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification | Yunhe Gao et.al. | 2406.05596v1 | null |
2024-06-07 | Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models | Michał Romaszewski et.al. | 2406.04926v1 | null |
2024-06-07 | Think out Loud: Emotion Deducing Explanation in Dialogues | Jiangnan Li et.al. | 2406.04758v1 | null |
2024-06-07 | Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions | Jingtan Wang et.al. | 2406.04606v1 | link |
2024-06-08 | What Do Language Models Learn in Context? The Structured Task Hypothesis | Jiaoda Li et.al. | 2406.04216v2 | link |
2024-06-06 | Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective | Xinhao Yao et.al. | 2406.03768v1 | link |
2024-06-04 | Dynamic and Adaptive Feature Generation with LLM | Xinhao Zhang et.al. | 2406.03505v1 | null |
2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474v1 | null |
2024-06-06 | Large Language Models as Evaluators for Recommendation Explanations | Xiaoyu Zhang et.al. | 2406.03248v2 | link |
2024-06-05 | Missci: Reconstructing Fallacies in Misrepresented Science | Max Glockner et.al. | 2406.03181v1 | link |
2024-06-04 | XRec: Large Language Models for Explainable Recommendation | Qiyao Ma et.al. | 2406.02377v1 | link |
2024-06-04 | I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering | Valeriya Goloviznina et.al. | 2406.02060v1 | null |
2024-06-20 | What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores | Ebrahim Feghhi et.al. | 2406.01538v2 | link |
2024-06-04 | Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study | Martin J. Hetz et.al. | 2406.01428v2 | null |
2024-06-03 | TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine | Wenjing Yue et.al. | 2406.01126v1 | null |
2024-06-03 | Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution | Shicheng Xu et.al. | 2406.00944v1 | null |
2024-06-01 | Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners | Zhi Zheng et.al. | 2406.00430v1 | null |
2024-05-31 | How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures | Kevin Christian Wibisono et.al. | 2406.00131v1 | link |
2024-05-27 | How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors? | Subhankar Maity et.al. | 2406.00039v1 | null |
2024-05-24 | Large Language Model Pruning | Hanjuan Huang et.al. | 2406.00030v1 | null |
2024-06-05 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974v2 | link |
2024-06-03 | Large Language Models are Zero-Shot Next Location Predictors | Ciro Beneduce et.al. | 2405.20962v2 | link |
2024-05-31 | FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores | Alyssa Huang et.al. | 2405.20613v1 | link |
2024-05-30 | XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution | Yurui Chang et.al. | 2405.20404v1 | null |
2024-05-29 | Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models | Venkat Venkatasubramanian et.al. | 2405.19561v1 | null |
2024-05-29 | Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners | Jiachun Li et.al. | 2405.18915v1 | null |
2024-06-11 | Faithful Logical Reasoning via Symbolic Chain-of-Thought | Jundong Xu et.al. | 2405.18357v2 | link |
2024-05-28 | Active Use of Latent Constituency Representation in both Humans and Large Language Models | Wei Liu et.al. | 2405.18241v1 | link |
2024-05-28 | Exploring Activation Patterns of Parameters in Language Models | Yudong Wang et.al. | 2405.17799v1 | null |
2024-05-28 | Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based Experiments | Toru Ishida et.al. | 2405.17728v1 | null |
2024-05-27 | PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends | Apurva Sinha et.al. | 2405.17533v1 | null |
2024-07-02 | TEII: Think, Explain, Interact and Iterate with Large Language Models to Solve Cross-lingual Emotion Detection | Long Cheng et.al. | 2405.17129v2 | link |
2024-05-27 | The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective | Nils Philipp Walter et.al. | 2405.16918v1 | null |
2024-05-25 | Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks | Emily Jensen et.al. | 2405.16344v1 | null |
2024-06-20 | Finetuning Large Language Model for Personalized Ranking | Zhuoxi Bai et.al. | 2405.16127v2 | link |
2024-05-24 | Transformers represent belief state geometry in their residual stream | Adam S. Shai et.al. | 2405.15943v1 | null |
2024-05-24 | Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun et.al. | 2405.15624v1 | null |
2024-07-03 | ChatGPT Code Detection: Techniques for Uncovering the Source of Code | Marc Oedingen et.al. | 2405.15512v2 | link |
2024-05-24 | From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks | Jacob Russin et.al. | 2405.15164v1 | null |
2024-05-28 | Explaining Multi-modal Large Language Models by Analyzing their Vision Perception | Loris Giulivi et.al. | 2405.14612v2 | link |
2024-05-23 | Large Language Models for Explainable Decisions in Dynamic Digital Twins | Nan Zhang et.al. | 2405.14411v1 | link |
2024-05-26 | Explainable Few-shot Knowledge Tracing | Haoxuan Li et.al. | 2405.14391v2 | link |
2024-05-23 | Knowledge Localization: Mission Not Accomplished? Enter Query Localization! | Yuheng Chen et.al. | 2405.14117v1 | null |
2024-05-22 | Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation | Cyril Chhun et.al. | 2405.13769v1 | link |
2024-05-22 | Mining Action Rules for Defect Reduction Planning | Khouloud Oueslati et.al. | 2405.13740v1 | null |
2024-05-22 | Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain | Yizhe Zhang et.al. | 2405.13560v1 | null |
2024-05-22 | HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model | Mustafa Yildirim et.al. | 2405.13547v1 | null |
2024-05-21 | Investigating Symbolic Capabilities of Large Language Models | Neisarg Dave et.al. | 2405.13209v1 | null |
2024-05-11 | RAGE Against the Machine: Retrieval-Augmented LLM Explanations | Joel Rorseth et.al. | 2405.13000v1 | null |
2024-05-20 | Directed Metric Structures arising in Large Language Models | Stéphane Gaubert et.al. | 2405.12264v1 | null |
2024-05-19 | Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | Subhankar Maity et.al. | 2405.11579v1 | null |
2024-05-17 | SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | Michael Shliselberg et.al. | 2405.10700v1 | null |
2024-05-15 | LoRA Learns Less and Forgets Less | Dan Biderman et.al. | 2405.09673v1 | null |
2024-05-15 | Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models | Majid Zarharan et.al. | 2405.09454v1 | link |
2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502v1 | link |
2024-05-14 | Challenges and Opportunities in Text Generation Explainability | Kenza Amara et.al. | 2405.08468v1 | null |
2024-05-14 | Understanding the performance gap between online and offline alignment algorithms | Yunhao Tang et.al. | 2405.08448v1 | null |
2024-05-12 | ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis | Mohammad Amaz Uddin et.al. | 2405.08026v1 | null |
2024-05-13 | Can Language Models Explain Their Own Classification Behavior? | Dane Sherburn et.al. | 2405.07436v1 | link |
2024-05-10 | LLM-Generated Black-box Explanations Can Be Adversarially Helpful | Rohan Ajwani et.al. | 2405.06800v1 | null |
2024-05-15 | Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling | Subhendu Khatuya et.al. | 2405.06671v2 | link |
2024-06-03 | XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare | Fatemeh Nazary et.al. | 2405.06270v3 | null |
2024-05-09 | Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding? | Yutong Hu et.al. | 2405.06105v1 | null |
2024-05-09 | LLMs for XAI: Future Directions for Explaining Explanations | Alexandra Zytek et.al. | 2405.06064v1 | null |
2024-05-09 | Investigating Interaction Modes and User Agency in Human-LLM Collaboration for Domain-Specific Data Analysis | Jiajing Guo et.al. | 2405.05548v1 | null |
2024-05-08 | The Effect of Model Size on LLM Post-hoc Explainability via LIME | Henning Heyen et.al. | 2405.05348v1 | link |
2024-05-09 | LLMs with Personalities in Multi-issue Negotiation Games | Sean Noh et.al. | 2405.05248v2 | null |
2024-05-08 | Zero-shot LLM-guided Counterfactual Generation for Text | Amrita Bhattacharjee et.al. | 2405.04793v1 | null |
2024-05-09 | Large Language Models for Cyber Security: A Systematic Literature Review | HanXiang Xu et.al. | 2405.04760v2 | null |
2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382v1 | null |
2024-05-07 | Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | Atharvan Dogra et.al. | 2405.04325v1 | null |
2024-05-07 | Granite Code Models: A Family of Open Foundation Models for Code Intelligence | Mayank Mishra et.al. | 2405.04324v1 | link |
2024-05-07 | Semantic API Alignment: Linking High-level User Goals to APIs | Robert Feldt et.al. | 2405.04236v1 | null |
2024-05-07 | NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions | Elliot Gestrin et.al. | 2405.04215v1 | null |
2024-05-07 | A Causal Explainable Guardrails for Large Language Models | Zhixuan Chu et.al. | 2405.04160v1 | null |
2024-05-06 | FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering | Silan Hu et.al. | 2405.03734v1 | null |
2024-05-06 | Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom | Bo Wang et.al. | 2405.03371v1 | link |
2024-05-03 | What does the Knowledge Neuron Thesis Have to do with Knowledge? | Jingcheng Niu et.al. | 2405.02421v1 | link |
2024-05-07 | A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model | Jiexia Ye et.al. | 2405.02358v2 | link |
2024-05-03 | Argumentative Large Language Models for Explainable and Contestable Decision-Making | Gabriel Freedman et.al. | 2405.02079v1 | null |
2024-05-03 | Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts | Felicia Riethmüller et.al. | 2405.01904v1 | null |
2024-05-02 | CoS: Enhancing Personalization and Mitigating Bias with Context Steering | Jerry Zhi-Yang He et.al. | 2405.01768v1 | null |
2024-05-08 | Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | Xin Quan et.al. | 2405.01379v2 | null |
2024-04-26 | LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study | Van Bach Nguyen et.al. | 2405.00722v1 | null |
2024-05-01 | RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models | Mohamed Manzour Hussien et.al. | 2405.00449v1 | null |
2024-05-01 | Social Life Simulation for Non-Cognitive Skills Learning | Zihan Yan et.al. | 2405.00273v1 | null |
2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729v1 | null |
2024-04-30 | On Training a Neural Network to Explain Binaries | Alexander Interrante-Grant et.al. | 2404.19631v1 | null |
2024-04-29 | Large Language Models as Conversational Movie Recommenders: A User Study | Ruixuan Sun et.al. | 2404.19093v1 | null |
2024-04-30 | Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability | Meng Li et.al. | 2404.18533v2 | link |
2024-04-30 | Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages | David Ifeoluwa Adelani et.al. | 2404.18286v2 | null |
2024-04-27 | Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification | Himanshu Pandey et.al. | 2404.17977v1 | null |
2024-04-11 | Rumour Evaluation with Very Large Language Models | Dahlia Shehata et.al. | 2404.16859v1 | link |
2024-04-25 | TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning | Liang Zhang et.al. | 2404.16635v1 | link |
2024-04-04 | Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation | Mohammadmehdi Ataei et.al. | 2404.16045v1 | null |
2024-04-24 | Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach | Linyu Liu et.al. | 2404.15993v1 | null |
2024-04-25 | Detecting Conceptual Abstraction in LLMs | Michaela Regneri et.al. | 2404.15848v2 | null |
2024-04-22 | Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication | John R. Lawson et.al. | 2404.15166v1 | null |
2024-06-04 | Graph Machine Learning in the Era of Large Language Models (LLMs) | Wenqi Fan et.al. | 2404.14928v2 | null |
2024-05-10 | Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304v2 | link |
2024-04-22 | Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach | Yao Wan et.al. | 2404.14296v1 | link |
2024-04-22 | EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | Mingjie Ma et.al. | 2404.13847v1 | null |
2024-04-29 | Large Language Models for Networking: Workflow, Advances and Challenges | Chang Liu et.al. | 2404.12901v2 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372v1 | null |
2024-04-18 | Concept Induction using LLMs: a user experiment for assessment | Adrita Barua et.al. | 2404.11875v1 | null |
2024-05-01 | Course Recommender Systems Need to Consider the Job Market | Jibril Frej et.al. | 2404.10876v2 | link |
2024-06-03 | Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model | Hengyuan Zhang et.al. | 2404.10306v4 | link |
2024-04-11 | Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs | Jierui Li et.al. | 2404.08148v1 | null |
2024-05-29 | Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts | Namasivayam Kalithasan et.al. | 2404.07774v2 | null |
2024-04-11 | Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models | Marvin Pafla et.al. | 2404.07725v1 | null |
2024-04-07 | Explaining EDA synthesis errors with LLMs | Siyu Qiu et.al. | 2404.07235v1 | null |
2024-04-11 | From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications | Yongqiang Ma et.al. | 2404.07108v2 | null |
2024-05-15 | A Mathematical Theory for Learning Semantic Languages by Abstract Learners | Kuo-Yu Liao et.al. | 2404.07009v3 | null |
2024-04-10 | WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers | Yuexi Chen et.al. | 2404.07005v1 | null |
2024-04-09 | CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models | Yu Zhou et.al. | 2404.06349v1 | null |
2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332v1 | null |
2024-04-07 | StockGPT: A GenAI Model for Stock Prediction and Trading | Dat Mai et.al. | 2404.05101v1 | null |
2024-04-07 | Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead | Irene Pagliai et.al. | 2404.04838v1 | link |
2024-04-06 | Binary Classifier Optimization for Large Language Model Alignment | Seungjae Jung et.al. | 2404.04656v1 | null |
2024-04-04 | Language Model Evolution: An Iterated Learning Perspective | Yi Ren et.al. | 2404.04286v1 | link |
2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623v1 | null |
2024-04-04 | Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models | Yantao Liu et.al. | 2404.03577v1 | link |
2024-04-04 | Edisum: Summarizing and Explaining Wikipedia Edits at Scale | Marija Šakota et.al. | 2404.03428v1 | link |
2024-04-04 | Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics | Fangru Lin et.al. | 2404.03301v1 | link |
2024-04-04 | DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models | Yuchen Liu et.al. | 2404.03275v1 | null |
2024-04-03 | LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models | Gabriela Ben Melech Stan et.al. | 2404.03118v1 | null |
2024-04-10 | An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models | Emmy Liu et.al. | 2404.03028v2 | null |
2024-04-13 | Explainable Traffic Flow Prediction with Large Language Models | Xusen Guo et.al. | 2404.02937v3 | null |
2024-04-03 | Towards detecting unanticipated bias in Large Language Models | Anna Kruspe et.al. | 2404.02650v1 | null |
2024-04-03 | Task Agnostic Architecture for Algorithm Induction via Implicit Composition | Sahil J. Sindhi et.al. | 2404.02450v1 | null |
2024-04-01 | Enhancing Reasoning Capacity of SLM using Cognitive Enhancement | Jonathan Pan et.al. | 2404.01135v1 | null |
2024-04-01 | Query Performance Prediction using Relevance Judgments Generated by Large Language Models | Chuan Meng et.al. | 2404.01012v1 | link |
2024-04-12 | Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing | Zhenyu Qian et.al. | 2404.00589v2 | link |
2024-03-28 | "I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices | Shivani Kapania et.al. | 2403.19876v1 | null |
2024-03-27 | Measuring Political Bias in Large Language Models: What Is Said and How It Is Said | Yejin Bang et.al. | 2403.18932v1 | null |
2024-03-26 | Targeted Visualization of the Backbone of Encoder LLMs | Isaac Roberts et.al. | 2403.18872v1 | link |
2024-03-27 | A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks | Axel Constant et.al. | 2403.18537v1 | null |
2024-03-27 | LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models | Mingxing Peng et.al. | 2403.18344v1 | null |
2024-03-27 | Exploring the Privacy Protection Capabilities of Chinese Large Language Models | Yuqi Yang et.al. | 2403.18205v1 | null |
2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873v1 | null |
2024-03-26 | Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons | Shijia Zhou et.al. | 2403.17760v1 | link |
2024-03-25 | A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection | Benjamin Steenhoek et.al. | 2403.17218v1 | null |
2024-03-25 | Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making | Shuai Ma et.al. | 2403.16812v1 | null |
2024-03-26 | RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict | Yirong Zeng et.al. | 2403.16662v2 | link |
2024-03-25 | ChatDBG: An AI-Powered Debugging Assistant | Kyla Levin et.al. | 2403.16354v1 | link |
2024-03-26 | Towards a RAG-based Summarization Agent for the Electron-Ion Collider | Karthik Suresh et.al. | 2403.15729v2 | null |
2024-03-22 | Large language models for crowd decision making based on prompt design strategies using ChatGPT: models, analysis and challenges | Cristina Zuheros et.al. | 2403.15587v1 | null |
2024-04-02 | Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis | Junyoung Kim et.al. | 2403.14801v2 | null |
2024-03-21 | A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | Clayton Cohn et.al. | 2403.14565v1 | null |
2024-04-08 | MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation | Longzheng Wang et.al. | 2403.14171v3 | link |
2024-03-21 | From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation | Haofei Zhao et.al. | 2403.14118v1 | null |
2024-03-21 | PE-GPT: A Physics-Informed Interactive Large Language Model for Power Converter Modulation Design | Fanfan Lin et.al. | 2403.14059v1 | null |
2024-03-12 | Duwak: Dual Watermarks in Large Language Models | Chaoyi Zhu et.al. | 2403.13000v1 | null |
2024-03-19 | INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations | Lirui Luo et.al. | 2403.12451v1 | null |
2024-05-08 | Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales | Ayushi Nirmal et.al. | 2403.12403v2 | link |
2024-05-09 | From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality? | Guangming Huang et.al. | 2403.11894v3 | null |
2024-03-18 | DEE: Dual-stage Explainable Evaluation Method for Text Generation | Shenyu Zhang et.al. | 2403.11509v1 | null |
2024-04-30 | Correcting misinformation on social media with a large language model | Xinyi Zhou et.al. | 2403.11169v3 | link |
2024-03-17 | Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | Baiyan Zhang et.al. | 2403.11129v1 | null |
2024-03-26 | SelfIE: Self-Interpretation of Large Language Model Embeddings | Haozhe Chen et.al. | 2403.10949v2 | link |
2024-03-16 | Depression Detection on Social Media with Large Language Models | Xiaochong Lan et.al. | 2403.10750v1 | null |
2024-03-15 | Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization | Ratnadira Widyasari et.al. | 2403.10507v1 | null |
2024-03-22 | Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst? | Bruno de Melo et.al. | 2403.10482v2 | null |
2024-03-15 | A Question on the Explainability of Large Language Models and the Word-Level Univariate First-Order Plausibility Assumption | Jeremie Bogaert et.al. | 2403.10275v1 | null |
2024-03-15 | Language to Map: Topological map generation from natural language path instructions | Hideki Deguchi et.al. | 2403.10008v1 | null |
2024-03-14 | Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey | Xiaoyu Liu et.al. | 2403.09606v1 | null |
2024-04-23 | Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models | Laura Fernández-Becerra et.al. | 2403.09567v2 | null |
2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | Yequan Bie et.al. | 2403.09410v1 | null |
2024-03-14 | Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance | Kai Xiong et.al. | 2403.09085v1 | null |
2024-03-13 | Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era | Xuansheng Wu et.al. | 2403.08946v1 | link |
2024-03-13 | TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation | Dingbang Li et.al. | 2403.08833v1 | null |
2024-03-13 | Can Large Language Models Identify Authorship? | Baixiang Huang et.al. | 2403.08213v1 | link |
2024-03-12 | generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation | Thilo Spinner et.al. | 2403.07627v1 | null |
2024-03-12 | Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code | Zhou Yang et.al. | 2403.07506v1 | null |
2024-03-11 | Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena | Leonie Weissweiler et.al. | 2403.06965v1 | null |
2024-03-11 | RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems | Jianxun Lian et.al. | 2403.06465v1 | link |
2024-03-10 | ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes | Shengxin Hong et.al. | 2403.06294v1 | null |
2024-03-10 | Low-dose CT Denoising with Language-engaged Dual-space Alignment | Zhihao Chen et.al. | 2403.06128v1 | link |
2024-03-10 | Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills | Paul Denny et.al. | 2403.06050v1 | null |
2024-03-08 | Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings | Wei Zhou et.al. | 2403.05338v1 | null |
2024-03-08 | Aligning Large Language Models for Controllable Recommendations | Wensheng Lu et.al. | 2403.05063v1 | null |
2024-03-07 | Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Wei-Lin Chiang et.al. | 2403.04132v1 | null |
2024-04-26 | Multimodal Large Language Models to Support Real-World Fact-Checking | Jiahui Geng et.al. | 2403.03627v2 | null |
2024-03-06 | RouteExplainer: An Explanation Framework for Vehicle Routing Problem | Daisuke Kikuta et.al. | 2403.03585v1 | link |
2024-03-06 | Explaining Genetic Programming Trees using Large Language Models | Paula Maddigan et.al. | 2403.03397v1 | null |
2024-03-05 | SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Peng Qi et.al. | 2403.03170v1 | null |
2024-03-05 | Word Importance Explains How Prompts Affect Language Model Outputs | Stefan Hackmann et.al. | 2403.03028v1 | null |
2024-03-05 | FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model | Xiangyu Li et.al. | 2403.02647v1 | link |
2024-03-04 | Evaluating the Explainability of Neural Rankers | Saran Pandian et.al. | 2403.01981v1 | null |
2024-03-03 | SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos | Yulei Niu et.al. | 2403.01599v1 | null |
2024-03-03 | Logic Rules as Explanations for Legal Case Retrieval | Zhongxiang Sun et.al. | 2403.01457v1 | link |
2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304v1 | link |
2024-03-02 | STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models | Linhai Zhang et.al. | 2403.01165v1 | link |
2024-02-25 | Cognitive Bias in High-Stakes Decision-Making with LLMs | Jessica Echterhoff et.al. | 2403.00811v1 | null |
2024-03-16 | ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework | Zhongqi Yang et.al. | 2403.00781v2 | null |
2024-02-29 | FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition | Xiaoqiang Wang et.al. | 2403.00126v1 | null |
2024-02-29 | Dual Operating Modes of In-Context Learning | Ziqian Lin et.al. | 2402.18819v1 | link |
2024-04-15 | Cause and Effect: Can Large Language Models Truly Understand Causality? | Swagata Ashwani et.al. | 2402.18139v2 | null |
2024-03-13 | Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Hanjie Chen et.al. | 2402.18060v3 | link |
2024-03-04 | A Language Model based Framework for New Concept Placement in Ontologies | Hang Dong et.al. | 2402.17897v2 | link |
2024-04-12 | Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses | Juyeon Kim et.al. | 2402.17097v2 | link |
2024-02-26 | Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling | Hang Jiang et.al. | 2402.17019v1 | link |
2024-02-28 | Defending LLMs against Jailbreaking Attacks via Backtranslation | Yihan Wang et.al. | 2402.16459v2 | link |
2024-02-26 | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | Zhexin Zhang et.al. | 2402.16444v1 | link |
2024-02-26 | Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models | Tianyi Tang et.al. | 2402.16438v1 | null |
2024-03-11 | Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models | Jeonghwan Kim et.al. | 2402.16315v2 | null |
2024-02-24 | HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition | Yuxuan Liu et.al. | 2402.15754v1 | null |
2024-02-24 | Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning | Yong Liu et.al. | 2402.15751v1 | null |
2024-03-04 | LLMs Can Defend Themselves Against Jailbreaking in a Practical Manner: A Vision Paper | Daoyuan Wu et.al. | 2402.15727v2 | null |
2024-02-26 | Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition | Yufei Huang et.al. | 2402.15175v2 | null |
2024-02-22 | Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark | Xiuying Chen et.al. | 2402.14359v1 | null |
2024-02-22 | Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization | Jiliang Li et.al. | 2402.14182v1 | null |
2024-02-21 | An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach | Mohammad Amaz Uddin et.al. | 2402.13871v1 | null |
2024-02-21 | Factual Consistency Evaluation of Summarisation in the Era of Large Language Models | Zheheng Luo et.al. | 2402.13758v1 | null |
2024-03-08 | SaGE: Evaluating Moral Consistency in Large Language Models | Vamshi Krishna Bonagiri et.al. | 2402.13709v2 | link |
2024-02-19 | Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Nishant Balepur et.al. | 2402.12483v1 | link |
2024-02-19 | Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models | Puxuan Yu et.al. | 2402.12276v1 | link |
2024-02-18 | Opening the black box of language acquisition | Jérôme Michaud et.al. | 2402.11681v1 | link |
2024-02-23 | Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Bias Detection | Valeria Pastorino et.al. | 2402.11621v2 | null |
2024-02-18 | Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network | Lin Chen et.al. | 2402.11518v1 | null |
2024-02-18 | Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction | Yinghui Li et.al. | 2402.11420v1 | null |
2024-02-17 | Dissecting Human and LLM Preferences | Junlong Li et.al. | 2402.11296v1 | link |
2024-02-17 | GenDec: A robust generative Question-decomposition method for Multi-hop reasoning | Jian Wu et.al. | 2402.11166v1 | null |
2024-02-16 | Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models | Zihao Lin et.al. | 2402.11122v1 | null |
2024-02-21 | Exploring Value Biases: How LLMs Deviate Towards the Ideal | Sarath Sivaprasad et.al. | 2402.11005v2 | null |
2024-03-15 | Zero-shot Explainable Mental Health Analysis on Social Media by Incorporating Mental Scales | Wenyu Li et.al. | 2402.10948v2 | null |
2024-02-19 | Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities | Mingyu Jin et.al. | 2402.10835v2 | null |
2024-02-16 | RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model | Jianhao Yuan et.al. | 2402.10828v1 | null |
2024-02-16 | Quantifying the Persona Effect in LLM Simulations | Tiancheng Hu et.al. | 2402.10811v1 | null |
2024-02-16 | Properties and Challenges of LLM-Generated Explanations | Jenny Kunz et.al. | 2402.10532v1 | null |
2024-02-15 | Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review | Jing Su et.al. | 2402.10350v1 | null |
2024-02-15 | Case Study: Testing Model Capabilities in Some Reasoning Tasks | Min Zhang et.al. | 2402.09967v1 | null |
2024-02-15 | Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States | Hanyu Duan et.al. | 2402.09733v1 | null |
2024-02-21 | CodeMind: A Framework to Challenge Large Language Models for Code Reasoning | Changshu Liu et.al. | 2402.09664v3 | link |
2024-02-14 | Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems | Liang Zhang et.al. | 2402.09584v1 | null |
2024-02-14 | SyntaxShap: Syntax-aware Explainability Method for Text Generation | Kenza Amara et.al. | 2402.09259v1 | null |
2024-02-12 | Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking | Anjali Khurana et.al. | 2402.08030v1 | null |
2024-02-02 | Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis | Delong Du et.al. | 2402.07920v1 | null |
2024-01-29 | Experimental Interface for Multimodal and Large Language Model Based Explanations of Educational Recommender Systems | Hasan Abu-Rasheed et.al. | 2402.07910v1 | null |
2024-02-12 | TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection | Hui Liu et.al. | 2402.07776v1 | link |
2024-02-12 | Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate | Kyungha Kim et.al. | 2402.07401v1 | null |
2024-02-11 | TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation | Peng Wang et.al. | 2402.07233v1 | null |
2024-02-11 | X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design | Eric L. Buehler et.al. | 2402.07148v1 | link |
2024-02-08 | Integrating LLMs for Explainable Fault Diagnosis in Complex Systems | Akshay J. Dave et.al. | 2402.06695v1 | null |
2024-02-09 | The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model | Gregory Coppola et.al. | 2402.06557v1 | link |
2024-02-06 | Personalized Language Modeling from Personalized Human Feedback | Xinyu Li et.al. | 2402.05133v1 | null |
2024-02-05 | Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering | Aryan Agrawal et.al. | 2402.05127v1 | null |
2024-02-07 | Large Language Models As Faithful Explainers | Yu-Neng Chuang et.al. | 2402.04678v1 | null |
2024-03-14 | Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models | Chirag Agarwal et.al. | 2402.04614v3 | null |
2024-02-06 | Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models | David Sobrín-Hidalgo et.al. | 2402.04206v1 | null |
2024-02-29 | Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models | Kelvin J. L. Koa et.al. | 2402.03659v3 | link |
2024-01-31 | Uncertainty-Aware Explainable Recommendation with Large Language Models | Yicui Peng et.al. | 2402.03366v1 | null |
2024-02-05 | The Matrix: A Bayesian learning model for LLMs | Siddhartha Dalal et.al. | 2402.03175v1 | null |
2024-02-05 | Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models | Michele Mastromattei et.al. | 2402.03142v1 | link |
2024-02-05 | How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning | Zeping Yu et.al. | 2402.02872v1 | null |
2024-02-04 | Selecting Large Language Model to Fine-tune via Rectified Scaling Law | Haowei Lin et.al. | 2402.02314v1 | null |
2024-02-03 | Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times | Byung-Doh Oh et.al. | 2402.02255v1 | link |
2024-02-06 | Large Language Model Agent for Hyper-Parameter Optimization | Siyi Liu et.al. | 2402.01881v2 | null |
2024-02-02 | The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models | Moschoula Pternea et.al. | 2402.01874v1 | null |
2024-02-02 | Ecologically rational meta-learned inference explains human category learning | Akshay K. Jagadish et.al. | 2402.01821v1 | null |
2024-02-01 | When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Norah Alzahrani et.al. | 2402.01781v1 | null |
2024-01-30 | Rethinking Interpretability in the Era of Large Language Models | Chandan Singh et.al. | 2402.01761v1 | link |
2024-02-24 | Contextualization Distillation from Large Language Model for Knowledge Graph Completion | Dawei Li et.al. | 2402.01729v3 | null |
2024-03-01 | Measuring Moral Inconsistencies in Large Language Models | Vamshi Krishna Bonagiri et.al. | 2402.01719v3 | null |
2024-02-16 | Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications | Yuhang Zhou et.al. | 2402.01681v2 | null |
2024-02-05 | SymbolicAI: A framework for logic-based approaches combining generative models and solvers | Marius-Constantin Dinu et.al. | 2402.00854v2 | link |
2024-02-01 | Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement | Xin Quan et.al. | 2402.00745v1 | link |
2024-02-01 | IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators | Luyang Lin et.al. | 2402.00345v1 | null |
2024-02-01 | Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective | Qun Ma et.al. | 2402.00262v1 | null |
2024-01-31 | Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT | Diego Machado Reyes et.al. | 2402.00137v1 | null |
2024-03-10 | Arrows of Time for Large Language Models | Vassilis Papadopoulos et.al. | 2401.17505v2 | null |
2024-01-30 | Detecting mental disorder on social media: a ChatGPT-augmented explainable approach | Loris Belcastro et.al. | 2401.17477v1 | link |
2024-02-10 | Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations | Benjamin Antunes et.al. | 2401.17345v2 | null |
2024-01-30 | Incoherent Probability Judgments in Large Language Models | Jian-Qiao Zhu et.al. | 2401.16646v1 | null |
2024-02-27 | How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability | Ivan DeAndres-Tame et.al. | 2401.13641v2 | link |
2024-01-24 | Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models | Hongzhan Lin et.al. | 2401.13298v1 | link |
2024-01-23 | XAI for All: Can Large Language Models Simplify Explainable AI? | Philip Mavrepis et.al. | 2401.13110v1 | null |
2024-02-22 | From Understanding to Utilization: A Survey on Explainability for Large Language Models | Haoyan Luo et.al. | 2401.12874v2 | null |
2024-01-23 | How well can large language models explain business processes? | Dirk Fahland et.al. | 2401.12846v1 | null |
2024-02-23 | Generating Zero-shot Abstractive Explanations for Rumour Verification | Iman Munire Bilal et.al. | 2401.12713v3 | link |
2024-01-23 | LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools | Qianli Wang et.al. | 2401.12576v1 | link |
2024-01-21 | Over-Reasoning and Redundant Calculation of Large Language Models | Cheng-Han Chiang et.al. | 2401.11467v1 | link |
2024-01-20 | Analyzing Task-Encoding Tokens in Large Language Models | Yu Bai et.al. | 2401.11323v1 | null |
2024-01-17 | Vlogger: Make Your Dream A Vlog | Shaobin Zhuang et.al. | 2401.09414v1 | link |
2024-01-24 | Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring | Hasan Abu-Rasheed et.al. | 2401.08517v3 | null |
2024-01-16 | LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation | Zhixuan Chu et.al. | 2401.08217v1 | null |
2024-02-15 | Are self-explanations from Large Language Models faithful? | Andreas Madsen et.al. | 2401.07927v3 | link |
2024-01-15 | Quantum Transfer Learning for Acceptability Judgements | Giuseppe Buonaiuto et.al. | 2401.07777v1 | null |
2024-01-14 | Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study | Ahmadul Karim Chowdhury et.al. | 2401.07310v1 | null |
2024-01-12 | TestSpark: IntelliJ IDEA's Ultimate Test Generation Companion | Arkadii Sapozhnikov et.al. | 2401.06580v1 | link |
2024-01-12 | Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models | Asma Ghandeharioun et.al. | 2401.06102v2 | null |
2024-01-11 | Video Anomaly Detection and Explanation via Large Language Models | Hui Lv et.al. | 2401.05702v1 | null |
2024-01-11 | REBUS: A Robust Evaluation Benchmark of Understanding Symbols | Andrew Gritsevskiy et.al. | 2401.05604v1 | link |
2024-01-08 | LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems | Mohamad Fakih et.al. | 2401.05443v1 | link |
2024-01-10 | Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis | Lanling Xu et.al. | 2401.04997v1 | null |
2024-01-08 | ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features | J-Anne Yow et.al. | 2401.03701v1 | null |
2024-01-06 | Autonomous Crowdsensing: Operating and Organizing Crowdsensing for Sensing Automation | Wansen Wu et.al. | 2401.03229v1 | null |
2024-01-02 | Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education | Vahid Ashrafimoghari et.al. | 2401.02985v1 | null |
2024-01-05 | Large Language Models in Plant Biology | Hilbert Yuen In Lam et.al. | 2401.02789v1 | null |
2024-01-02 | VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics | Ammar A. Siddiqui et.al. | 2401.01414v1 | null |
2023-12-30 | The Problem of Alignment | Tsvetelina Hristova et.al. | 2401.00210v1 | null |
2023-12-29 | Building Efficient Universal Classifiers with Natural Language Inference | Moritz Laurer et.al. | 2312.17543v1 | link |
2023-12-23 | An Explainable AI Approach to Large Language Model Assisted Causal Model Auditing and Development | Yanming Zhang et.al. | 2312.16211v1 | null |
2024-01-03 | Unlocking the Potential of Large Language Models for Explainable Recommendations | Yucong Luo et.al. | 2312.15661v3 | link |
2023-12-11 | Transportation Transformed: A Comprehensive Review of Dynamic Rerouting in Multimodal Networks | Suyash Pratap et.al. | 2312.14953v1 | null |
2023-12-22 | VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation | Max Ku et.al. | 2312.14867v1 | null |
2023-12-21 | Deep de Finetti: Recovering Topic Distributions from Large Language Models | Liyi Zhang et.al. | 2312.14226v1 | null |
2023-12-16 | Learning Interpretable Queries for Explainable Image Classification with Information Pursuit | Stefan Kolek et.al. | 2312.11548v1 | null |
2023-12-19 | The Good, The Bad, and Why: Unveiling Emotions in Generative AI | Cheng Li et.al. | 2312.11111v2 | null |
2023-12-17 | Can persistent homology whiten Transformer-based black-box models? A case study on BERT compression | Luis Balderas et.al. | 2312.10702v1 | null |
2024-01-17 | LLM-SQL-Solver: Can LLMs Determine SQL Equivalence? | Fuheng Zhao et.al. | 2312.10321v2 | null |
2023-12-15 | GPT-doctor: Customizing Large Language Models for Medical Consultation | Wen Wang et.al. | 2312.10225v1 | null |
2023-12-04 | A collection of principles for guiding and evaluating large language models | Konstantin Hebenstreit et.al. | 2312.10059v1 | null |
2023-12-15 | Prompting Datasets: Data Discovery with Conversational Agents | Johanna Walker et.al. | 2312.09947v1 | null |
2023-12-15 | SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models | Lee Hyun et.al. | 2312.09818v1 | link |
2023-12-14 | Successor Heads: Recurring, Interpretable Attention Heads In The Wild | Rhys Gould et.al. | 2312.09230v1 | null |
2023-12-27 | Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation | Wenting Chen et.al. | 2312.08078v4 | null |
2023-12-13 | Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning | Jinta Weng et.al. | 2312.08027v1 | null |
2023-12-12 | Tell, don't show: Declarative facts influence how LLMs generalize | Alexander Meinke et.al. | 2312.07779v1 | null |
2023-12-05 | Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety | Manas Gaur et.al. | 2312.06798v1 | null |
2023-12-10 | Evidence-based Interpretable Open-domain Fact-checking with Large Language Models | Xin Tan et.al. | 2312.05834v1 | null |
2023-11-30 | Applying Large Language Models and Chain-of-Thought for Automatic Scoring | Gyeong-Geon Lee et.al. | 2312.03748v1 | null |
2023-12-06 | XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering | Joel Stremmel et.al. | 2312.03567v1 | null |
2023-12-03 | TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents | James Enouen et.al. | 2312.01279v1 | null |
2023-11-30 | Large Language Models for Travel Behavior Prediction | Baichuan Mo et.al. | 2312.00819v1 | null |
2023-11-30 | CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation | Pei Ke et.al. | 2311.18702v1 | link |
2023-11-30 | Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | Akira Kawabata et.al. | 2311.18353v1 | null |
2023-11-29 | Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation | Xijia Zhang et.al. | 2311.18062v1 | null |
2023-11-29 | Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning | Xiaoqian Wu et.al. | 2311.17365v1 | null |
2023-11-29 | Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering | Zeqing Wang et.al. | 2311.17331v1 | null |
2024-02-12 | Large language models can enhance persuasion through linguistic feature alignment | Minkyu Shin et.al. | 2311.16466v2 | null |
2023-11-16 | Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities | Avishree Khare et.al. | 2311.16169v1 | null |
2023-11-27 | Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models | Stephen MacNeil et.al. | 2311.16017v1 | null |
2023-11-27 | Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications | Sabine Wehnert et.al. | 2311.15716v1 | null |
2023-11-27 | Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination | Haoqiang Kang et.al. | 2311.15548v1 | null |
2023-11-25 | Code Generation Based Grading: Evaluating an Auto-grading Mechanism for "Explain-in-Plain-English" Questions | David H. Smith IV et.al. | 2311.14903v1 | null |
2023-11-10 | ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management | Angela Zhang et.al. | 2311.14703v1 | null |
2023-11-23 | Towards Auditing Large Language Models: Improving Text-based Stereotype Detection | Wu Zekun et.al. | 2311.14126v1 | null |
2023-11-23 | Towards Explainable Strategy Templates using NLP Transformers | Pallavi Bagga et.al. | 2311.14061v1 | null |
2023-11-22 | Large Language Models in Education: Vision and Opportunities | Wensheng Gan et.al. | 2311.13160v1 | null |
2023-11-21 | A Survey on Large Language Models for Personalized and Explainable Recommendations | Junyi Chen et.al. | 2311.12338v1 | null |
2023-11-20 | Unifying Corroborative and Contributive Attributions in Large Language Models | Theodora Worledge et.al. | 2311.12233v1 | null |
2023-11-20 | LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions | Songhao Han et.al. | 2311.11904v1 | null |
2023-11-20 | Large Language Models and Explainable Law: a Hybrid Methodology | Marco Billi et.al. | 2311.11811v1 | null |
2023-11-20 | Exploring Prompting Large Language Models as Explainable Metrics | Ghazaleh Mahmoudi et.al. | 2311.11552v1 | link |
2023-11-19 | Using Causal Threads to Explain Changes in a Dynamic System | Robert B. Allen et.al. | 2311.11334v1 | null |
2023-12-17 | Rethinking Large Language Models in Mental Health Applications | Shaoxiong Ji et.al. | 2311.11267v2 | null |
2023-11-16 | ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond | Kanhai S. Amin et.al. | 2311.10075v1 | null |
2023-11-16 | Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts | Mingqian Zheng et.al. | 2311.10054v1 | null |
2023-11-15 | Explaining Explanation: An Empirical Study on Explanation in Code Reviews | Ratnadira Widyasari et.al. | 2311.09020v1 | null |
2023-11-15 | Data Similarity is Not Enough to Explain Language Model Performance | Gregory Yauney et.al. | 2311.09006v1 | link |
2023-11-15 | XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making | Zichen Chen et.al. | 2311.08614v1 | null |
2023-11-14 | UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations | Wenting Zhao et.al. | 2311.08469v1 | null |
2023-11-16 | Are Large Language Models Temporally Grounded? | Yifu Qiu et.al. | 2311.08398v2 | link |
2023-11-13 | In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax | Aaron Mueller et.al. | 2311.07811v1 | link |
2023-11-13 | On Measuring Faithfulness of Natural Language Explanations | Letitia Parcalabescu et.al. | 2311.07466v1 | link |
2023-11-12 | SELF-EXPLAIN: Teaching Large Language Models to Reason Complex Questions by Themselves | Jiachen Zhao et.al. | 2311.06985v1 | null |
2023-11-10 | Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks | Pouya Pezeshkpour et.al. | 2311.06383v1 | link |
2023-11-08 | DEMASQ: Unmasking the ChatGPT Wordsmith | Kavita Kumari et.al. | 2311.05019v1 | null |
2023-11-01 | From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems | Samyar Janatian et.al. | 2311.04911v1 | link |
2023-11-07 | Extracting human interpretable structure-property relationships in chemistry using XAI and large language models | Geemi P. Wellawatte et.al. | 2311.04047v1 | link |
2023-11-07 | Which is better? Exploring Prompting Strategy For LLM-based Metrics | Joonghoon Kim et.al. | 2311.03754v1 | link |
2023-11-07 | Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning | Ruosen Li et.al. | 2311.03734v1 | link |
2023-11-04 | Can ChatGPT support software verification? | Christian Janßen et.al. | 2311.02433v1 | null |
2023-11-12 | Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models | Sean Xie et.al. | 2311.01732v2 | link |
2023-09-26 | Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI | Muhammad Aurangzeb Ahmad et.al. | 2311.01463v1 | null |
2023-11-01 | Emotion Detection for Misinformation: A Review | Zhiwei Liu et.al. | 2311.00671v1 | null |
2023-11-22 | HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning | Yongjin Yang et.al. | 2311.00321v2 | link |
2023-11-01 | ChatGPT-Powered Hierarchical Comparisons for Image Classification | Zhiyuan Ren et.al. | 2311.00206v1 | null |
2023-11-14 | Learning From Mistakes Makes LLM Better Reasoner | Shengnan An et.al. | 2310.20689v2 | link |
2023-10-31 | Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests | Max J. van Duijn et.al. | 2310.20320v1 | null |
2023-10-30 | The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics | Christoph Leiter et.al. | 2310.19792v1 | link |
2023-10-30 | Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection | Noah Ziems et.al. | 2310.19658v1 | null |
2023-10-28 | The Synergy of Speculative Decoding and Batching in Serving Large Language Models | Qidong Su et.al. | 2310.18813v1 | null |
2023-11-01 | Will releasing the weights of future large language models grant widespread access to pandemic agents? | Anjali Gopal et.al. | 2310.18233v2 | null |
2023-10-26 | Beyond MLE: Convex Learning for Text Generation | Chenze Shao et.al. | 2310.17217v1 | null |
2023-10-26 | DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models | Ge Zheng et.al. | 2310.16436v2 | null |
2023-10-25 | Graph Agent: Explicit Reasoning Agent for Graphs | Qinyong Wang et.al. | 2310.16421v1 | null |
2023-12-29 | Evaluating General-Purpose AI with Psychometrics | Xiting Wang et.al. | 2310.16379v2 | null |
2023-10-24 | UI Layout Generation with LLMs Guided by UI Grammar | Yuwen Lu et.al. | 2310.15455v1 | null |
2023-10-22 | Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models | Hongli Zhan et.al. | 2310.14389v1 | link |
2023-10-22 | Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis | Inez Okulska et.al. | 2310.14325v1 | null |
2023-10-21 | Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation | Anastasia Kritharoula et.al. | 2310.14025v1 | link |
2023-10-20 | Ecologically Valid Explanations for Label Variation in NLI | Nan-Jiang Jiang et.al. | 2310.13850v1 | link |
2023-10-30 | Why Can Large Language Models Generate Correct Chain-of-Thoughts? | Rasul Tutunov et.al. | 2310.13571v2 | null |
2023-10-20 | The Perils & Promises of Fact-checking with Large Language Models | Dorian Quelle et.al. | 2310.13549v1 | null |
2023-10-20 | Explaining Interactions Between Text Spans | Sagnik Ray Choudhury et.al. | 2310.13506v1 | link |
2023-10-19 | Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Ziqi Pang et.al. | 2310.12973v1 | link |
2023-10-28 | Probing LLMs for hate speech detection: strengths and vulnerabilities | Sarthak Roy et.al. | 2310.12860v2 | null |
2023-10-19 | Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong | Chenglei Si et.al. | 2310.12558v1 | null |
2023-10-17 | Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | Shiyuan Huang et.al. | 2310.11207v1 | null |
2023-11-11 | Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms | Seungju Han et.al. | 2310.10418v2 | link |
2023-10-15 | EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification | Huanhuan Ma et.al. | 2310.09754v1 | link |
2023-10-13 | A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models | Takuma Udagawa et.al. | 2310.08797v1 | null |
2023-10-12 | Circuit Component Reuse Across Tasks in Transformer Language Models | Jack Merullo et.al. | 2310.08744v1 | null |
2023-10-12 | Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification | Chia-Yu Hung et.al. | 2310.08123v1 | null |
2023-10-12 | Large Language Models for Scientific Synthesis, Inference and Explanation | Yizhen Zheng et.al. | 2310.07984v1 | link |
2023-10-11 | Large Language Models Are Zero-Shot Time Series Forecasters | Nate Gruver et.al. | 2310.07820v1 | link |
2023-10-10 | Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Zhenlan Ji et.al. | 2310.06680v1 | null |
2023-10-10 | SCAR: Power Side-Channel Analysis at RTL-Level | Amisha Srivastava et.al. | 2310.06257v1 | null |
2023-10-11 | The Importance of Prompt Tuning for Automated Neuron Explanations | Justin Lee et.al. | 2310.06200v2 | null |
2023-10-09 | A Meta-Learning Perspective on Transformers for Causal Language Modeling | Xinbo Wu et.al. | 2310.05884v1 | null |
2023-10-10 | Are Large Language Models Post Hoc Explainers? | Nicholas Kroeger et.al. | 2310.05797v2 | link |
2023-10-09 | A Closer Look into Automatic Evaluation Using Large Language Models | Cheng-Han Chiang et.al. | 2310.05657v1 | link |
2023-10-09 | Explaining the Complex Task Reasoning of Large Language Models with Template-Content Structure | Haotong Yang et.al. | 2310.05452v1 | null |
2023-10-20 | Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models | Haoran Wang et.al. | 2310.05253v2 | link |
2023-10-08 | Scaling Laws of RoPE-based Extrapolation | Xiaoran Liu et.al. | 2310.05209v1 | null |
2023-10-08 | Harnessing the Power of ChatGPT in Fake News: An In-Depth Exploration in Generation, Detection and Explanation | Yue Huang et.al. | 2310.05046v1 | null |
2023-10-08 | Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading | Howard Chen et.al. | 2310.05029v1 | null |
2023-10-08 | Domain Knowledge Graph Construction Via A Simple Checker | Yueling Zeng et.al. | 2310.04949v1 | null |
2023-11-11 | FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | Neng Wang et.al. | 2310.04793v2 | link |
2023-10-03 | Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | Naiming Liu et.al. | 2310.02439v1 | null |
2023-10-13 | Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving | Long Chen et.al. | 2310.01957v2 | link |
2023-11-28 | DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models | Albert Garde et.al. | 2310.01870v2 | link |
2023-12-07 | UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | Hejia Geng et.al. | 2310.01441v2 | null |
2023-10-02 | Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback | Jacob Whitehill et.al. | 2310.01132v1 | null |
2023-10-08 | Back to the Future: Towards Explainable Temporal Reasoning with Large Language Models | Chenhan Yuan et.al. | 2310.01074v2 | link |
2023-10-01 | Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning | Mustafa Shukor et.al. | 2310.00647v1 | link |
2023-11-22 | Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals | Yair Gat et.al. | 2310.00603v2 | null |
2023-09-29 | Tell Me a Story! Narrative-Driven XAI with Large Language Models | David Martens et.al. | 2309.17057v1 | link |
2023-09-28 | T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems | Ming Wang et.al. | 2309.16146v1 | link |
2023-09-28 | TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration | Hongru Wang et.al. | 2309.16090v1 | null |
2023-09-27 | HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) | Tarek Ali et.al. | 2309.16021v1 | null |
2023-09-27 | MindGPT: Interpreting What You See with Non-invasive Brain Recordings | Jiaxuan Chen et.al. | 2309.15729v1 | link |
2023-09-23 | LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers? | Amrita Bhattacharjee et.al. | 2309.13340v1 | null |
2023-09-21 | JobRecoGPT -- Explainable job recommendations using LLMs | Preetam Ghosh et.al. | 2309.11805v1 | null |
2023-09-20 | Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction | Masahiro Kaneko et.al. | 2309.11439v1 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-24 | How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? | Leo Yu-Ho Lo et.al. | 2407.17291v1 | null |
2024-07-24 | SAFETY-J: Evaluating Safety with Critique | Yixiu Liu et.al. | 2407.17075v1 | null |
2024-07-24 | Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism | Anhao Zhao et.al. | 2407.17011v1 | null |
2024-07-23 | PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets | Jaeyoung Kim et.al. | 2407.16329v1 | null |
2024-07-22 | Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs | Abhay Sheshadri et.al. | 2407.15549v1 | null |
2024-07-22 | Decoding BACnet Packets: A Large Language Model Approach for Packet Interpretation | Rashi Sharma et.al. | 2407.15428v1 | null |
2024-07-22 | Dissecting Multiplication in Transformers: Insights into LLMs | Luyu Qiu et.al. | 2407.15360v1 | null |
2024-07-23 | LLMExplainer: Large Language Model based Bayesian Inference for Graph Explanation Generation | Jiaxing Zhang et.al. | 2407.15351v2 | null |
2024-07-21 | XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models | Erik Cambria et.al. | 2407.15248v1 | null |
2024-07-19 | Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context | Nilanjana Das et.al. | 2407.14644v1 | null |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506v1 | null |
2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467v1 | null |
2024-07-02 | Predictive Simultaneous Interpretation: Harnessing Large Language Models for Democratizing Real-Time Multilingual Communication | Kurando Iida et.al. | 2407.14269v1 | null |
2024-07-19 | KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models | Kemou Jiang et.al. | 2407.14239v1 | null |
2024-07-19 | LeKUBE: A Legal Knowledge Update BEnchmark | Changyue Wang et.al. | 2407.14192v1 | null |
2024-07-19 | ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Siddhant Waghjale et.al. | 2407.14044v1 | link |
2024-07-18 | PRAGyan -- Connecting the Dots in Tweets | Rahul Ravi et.al. | 2407.13909v1 | null |
2024-07-18 | X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Sirnam Swetha et.al. | 2407.13851v1 | null |
2024-07-24 | The Honorific Effect: Exploring the Impact of Japanese Linguistic Formalities on AI-Generated Physics Explanations | Keisuke Sato et.al. | 2407.13787v2 | null |
2024-07-03 | RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring | Ali Ghiasvand Mohammadkhani et.al. | 2407.13781v1 | null |
2024-07-20 | EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596v2 | link |
2024-07-18 | CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | Junying Chen et.al. | 2407.13301v1 | null |
2024-07-18 | SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models | Qi Yang et.al. | 2407.13117v1 | null |
2024-07-18 | TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models | Ling Yue et.al. | 2407.13115v1 | null |
2024-07-10 | Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) | Krishnaram Kenthapadi et.al. | 2407.12858v1 | null |
2024-07-01 | AutoFlow: Automated Workflow Generation for Large Language Model Agents | Zelong Li et.al. | 2407.12821v1 | link |
2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613v1 | link |
2024-07-17 | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | Gengze Zhou et.al. | 2407.12366v1 | link |
2024-07-16 | GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text | Kyle Hamilton et.al. | 2407.11827v1 | null |
2024-07-15 | Mechanistic interpretability of large language models with applications to the financial services industry | Ashkan Golgoon et.al. | 2407.11215v1 | null |
2024-06-27 | Does ChatGPT Have a Mind? | Simon Goldstein et.al. | 2407.11015v1 | null |
2024-06-24 | Visualization Literacy of Multimodal Large Language Models: A Comparative Study | Zhimin Li et.al. | 2407.10996v1 | null |
2024-07-15 | Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval | Shengjie Ma et.al. | 2407.10805v1 | null |
2024-07-15 | Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping | Wenhao Zhu et.al. | 2407.10795v1 | link |
2024-07-15 | Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities | Nhat Le et.al. | 2407.10785v1 | null |
2024-07-15 | Learning Dynamics of LLM Finetuning | Yi Ren et.al. | 2407.10490v1 | link |
2024-07-17 | LAB-Bench: Measuring Capabilities of Language Models for Biology Research | Jon M. Laurent et.al. | 2407.10362v3 | null |
2024-07-22 | TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation | Roni Goldshmidt et.al. | 2407.10114v2 | null |
2024-07-14 | Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation | Ge Gao et.al. | 2407.10091v1 | null |
2024-07-13 | Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | Shengbin Yue et.al. | 2407.09893v1 | link |
2024-07-13 | Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3) | Alkesh K. Srivastava et.al. | 2407.09890v1 | null |
2024-06-26 | Prompting Whole Slide Image Based Genetic Biomarker Prediction | Ling Zhang et.al. | 2407.09540v1 | null |
2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413v1 | link |
2024-07-11 | Fault Diagnosis in Power Grids with Large Language Model | Liu Jing et.al. | 2407.08836v1 | null |
2024-07-11 | Tamil Language Computing: the Present and the Future | Kengatharaiyer Sarveswaran et.al. | 2407.08618v1 | null |
2024-07-11 | Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility | Yuchen Xia et.al. | 2407.08550v1 | null |
2024-07-11 | Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | Ying Zhang et.al. | 2407.08532v1 | null |
2024-07-11 | On the attribution of confidence to large language models | Geoff Keeling et.al. | 2407.08388v1 | null |
2024-07-11 | Towards Explainable Evolution Strategies with Large Language Models | Jill Baumann et.al. | 2407.08331v1 | null |
2024-07-11 | GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration | Beni Ifland et.al. | 2407.08249v1 | null |
2024-07-10 | On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments | Jingchao Fang et.al. | 2407.08067v1 | null |
2024-07-10 | Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models | Yuji Zhang et.al. | 2407.08039v1 | null |
2024-07-10 | Transformer Alignment in Large Language Models | Murdock Aubry et.al. | 2407.07810v1 | null |
2024-07-10 | Interpretable Differential Diagnosis with Dual-Inference Large Language Models | Shuang Zhou et.al. | 2407.07330v1 | null |
2024-07-09 | Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges | Emilio Ferrara et.al. | 2407.07196v1 | null |
2024-07-09 | Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models | Flor Miriam Plaza-del-Arco et.al. | 2407.06908v1 | null |
2024-07-10 | Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts | Shuangkang Fang et.al. | 2407.06842v2 | null |
2024-07-09 | Combining Knowledge Graphs and Large Language Models | Amanda Kau et.al. | 2407.06564v1 | null |
2024-07-09 | Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons | Yongqi Leng et.al. | 2407.06488v1 | null |
2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093v1 | null |
2024-07-08 | GenFollower: Enhancing Car-Following Prediction with Large Language Models | Xianda Chen et.al. | 2407.05611v1 | null |
2024-07-07 | Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information | Vishnu S. Pendyala et.al. | 2407.05464v1 | null |
2024-07-06 | Enhance the Robustness of Text-Centric Multimodal Alignments | Ting-Yu Yen et.al. | 2407.05036v1 | null |
2024-07-05 | MobileFlow: A Multimodal LLM For Mobile GUI Agent | Songqin Nong et.al. | 2407.04346v1 | null |
2024-07-05 | Crafting Large Language Models for Enhanced Interpretability | Chung-En Sun et.al. | 2407.04307v1 | null |
2024-07-17 | DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Chengpeng Li et.al. | 2407.04078v3 | link |
2024-07-04 | A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations | Md Tahmid Rahman Laskar et.al. | 2407.04069v1 | null |
2024-07-04 | Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM | Peiran Yao et.al. | 2407.04067v1 | link |
2024-07-15 | LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking | Amy Xin et.al. | 2407.04020v2 | link |
2024-07-04 | Generative Technology for Human Emotion Recognition: A Scope Review | Fei Ma et.al. | 2407.03640v1 | null |
2024-07-04 | The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model | Brenden Smith et.al. | 2407.03621v1 | link |
2024-07-03 | Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Zhaohe Liao et.al. | 2407.03008v1 | null |
2024-07-03 | FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering | Xiaochen Wang et.al. | 2407.02964v1 | null |
2024-07-03 | Model-Enhanced LLM-Driven VUI Testing of VPA Apps | Suwan Li et.al. | 2407.02791v1 | null |
2024-06-27 | Meta Large Language Model Compiler: Foundation Models of Compiler Optimization | Chris Cummins et.al. | 2407.02524v1 | null |
2024-06-23 | INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness | Hung Le et.al. | 2407.02518v1 | null |
2024-07-02 | GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | Zhisheng Tang et.al. | 2407.01892v1 | link |
2024-06-29 | Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare | Forhan Bin Emdad et.al. | 2407.01627v1 | null |
2024-07-01 | Agentless: Demystifying LLM-based Software Engineering Agents | Chunqiu Steven Xia et.al. | 2407.01489v1 | link |
2024-07-01 | Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models | Xiaolin Xing et.al. | 2407.01358v1 | link |
2024-07-01 | Calibrated Large Language Models for Binary Question Answering | Patrizio Giovannotti et.al. | 2407.01122v1 | null |
2024-07-01 | Human-like object concept representations emerge naturally in multimodal large language models | Changde Du et.al. | 2407.01067v1 | null |
2024-07-01 | Background-aware Multi-source Fusion Financial Trend Forecasting Mechanism | Fengting Mo et.al. | 2407.00904v1 | null |
2024-06-29 | Financial Knowledge Large Language Model | Cehao Yang et.al. | 2407.00365v1 | null |
2024-06-29 | LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods | Zhenhua Wang et.al. | 2407.00322v1 | null |
2024-06-27 | Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks | Ibrahim Abdelaziz et.al. | 2407.00121v1 | null |
2024-06-17 | A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression | Yufan Zhu et.al. | 2407.00065v1 | null |
2024-06-28 | Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | Anisha Gunjal et.al. | 2406.20079v1 | link |
2024-06-28 | Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation | Chenlong Deng et.al. | 2406.19760v1 | link |
2024-06-27 | PathAlign: A vision-language model for whole slide images in histopathology | Faruk Ahmed et.al. | 2406.19578v1 | null |
2024-06-27 | DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez et.al. | 2406.19356v1 | null |
2024-06-27 | Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding | Yue Fan et.al. | 2406.19263v1 | link |
2024-06-27 | Towards Learning Abductive Reasoning using VSA Distributed Representations | Giacomo Camposampiero et.al. | 2406.19121v1 | link |
2024-06-27 | LayoutCopilot: An LLM-powered Multi-agent Collaborative Framework for Interactive Analog Layout Design | Bingyang Liu et.al. | 2406.18873v1 | null |
2024-06-27 | DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment | Ke-Han Lu et.al. | 2406.18871v1 | null |
2024-06-27 | ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation | Jizheng Chen et.al. | 2406.18825v1 | null |
2024-06-26 | Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism | Shi Zong et.al. | 2406.18762v1 | null |
2024-07-15 | Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models | Georgios Tziafas et.al. | 2406.18746v2 | null |
2024-06-26 | Themis: Towards Flexible and Interpretable NLG Evaluation | Xinyu Hu et.al. | 2406.18365v1 | link |
2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346v1 | null |
2024-06-26 | A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter | Mohamed Salah Bouafif et.al. | 2406.18075v1 | null |
2024-06-26 | Diagnosis Assistant for Liver Cancer Utilizing a Large Language Model with Three Types of Knowledge | Xuzhou Wu et.al. | 2406.18039v1 | null |
2024-06-26 | Automated Clinical Data Extraction with Knowledge Conditioned LLMs | Diya Li et.al. | 2406.18027v1 | null |
2024-06-25 | Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective | Hanqi Yan et.al. | 2406.17969v1 | null |
2024-06-25 | Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Zhongtao Miao et.al. | 2406.17873v1 | link |
2024-06-25 | Human-Object Interaction from Human-Level Instructions | Zhen Wu et.al. | 2406.17840v1 | null |
2024-06-22 | MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? | Xirui Li et.al. | 2406.17806v1 | null |
2024-06-25 | Banishing LLM Hallucinations Requires Rethinking Generalization | Johnny Li et.al. | 2406.17642v1 | null |
2024-06-25 | Large Language Models are Interpretable Learners | Ruochen Wang et.al. | 2406.17224v1 | link |
2024-07-01 | Large Language Models Assume People are More Rational than We Really are | Ryan Liu et.al. | 2406.17055v2 | link |
2024-06-23 | Unveiling LLM Mechanisms Through Neural ODEs and Control Theory | Yukun Zhang et.al. | 2406.16985v1 | null |
2024-06-24 | USDC: A Dataset of User Stance and Dogmatism in Long Conversations | Mounika Marreddy et.al. | 2406.16833v1 | null |
2024-06-25 | RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Beck LaBash et.al. | 2406.16801v2 | link |
2024-06-24 | OCALM: Object-Centric Assessment with Language Models | Timo Kaufmann et.al. | 2406.16748v1 | null |
2024-06-29 | EmoLLM: Multimodal Emotional Understanding Meets Large Language Models | Qu Yang et.al. | 2406.16442v2 | link |
2024-06-25 | Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis | Ajan Subramanian et.al. | 2406.16252v2 | null |
2024-06-23 | Preference Tuning For Toxicity Mitigation Generalizes Across Languages | Xiaochen Li et.al. | 2406.16235v1 | link |
2024-06-23 | Towards Natural Language-Driven Assembly Using Foundation Models | Omkar Joglekar et.al. | 2406.16093v1 | null |
2024-06-23 | Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models | Tianyi Men et.al. | 2406.16033v1 | null |
2024-06-25 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang et.al. | 2406.16020v2 | link |
2024-06-23 | Memorizing Documents with Guidance in Large Language Models | Bumjin Park et.al. | 2406.15996v1 | null |
2024-06-30 | LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning | Guangsi Shi et.al. | 2406.15859v2 | null |
2024-06-22 | DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models | Wei Guan et.al. | 2406.15781v1 | link |
2024-06-22 | MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Guanqun Wang et.al. | 2406.15768v1 | null |
2024-06-21 | Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Roman Vashurin et.al. | 2406.15627v1 | null |
2024-06-19 | Dr.E Bridges Graphs with Large Language Models through Words | Zipeng Liu et.al. | 2406.15504v1 | null |
2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227v1 | null |
2024-06-21 | Unsupervised Extraction of Dialogue Policies from Conversations | Makesh Narsimhan Sreedhar et.al. | 2406.15214v1 | null |
2024-06-21 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556v2 | null |
2024-06-20 | LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors | Sheikh Asif Imran et.al. | 2406.14498v1 | link |
2024-06-20 | Self-supervised Interpretable Concept-based Models for Text Classification | Francesco De Santis et.al. | 2406.14335v1 | null |
2024-07-01 | QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis | Chao Hui Huang et.al. | 2406.14307v2 | link |
2024-06-20 | Definition generation for lexical semantic change detection | Mariia Fedorova et.al. | 2406.14167v1 | link |
2024-06-20 | Finding Safety Neurons in Large Language Models | Jianhui Chen et.al. | 2406.14144v1 | null |
2024-06-19 | Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | Yuval Shalev et.al. | 2406.13858v1 | null |
2024-06-19 | Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines | Kangtong Mo et.al. | 2406.13626v1 | null |
2024-06-27 | VDebugger: Harnessing Execution Feedback for Debugging Visual Programs | Xueqing Wu et.al. | 2406.13444v2 | link |
2024-06-19 | Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | Sumanth Doddapaneni et.al. | 2406.13439v1 | link |
2024-06-19 | Data Contamination Can Cross Language Barriers | Feng Yao et.al. | 2406.13236v1 | link |
2024-06-19 | Locating and Extracting Relational Concepts in Large Language Models | Zijian Wang et.al. | 2406.13184v1 | link |
2024-06-19 | LLMatDesign: Autonomous Materials Discovery with Large Language Models | Shuyi Jia et.al. | 2406.13163v1 | null |
2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845v1 | link |
2024-06-18 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM et.al. | 2406.12793v1 | link |
2024-06-18 | UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Xunzhi Wang et.al. | 2406.12784v1 | link |
2024-06-18 | Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Bingchen Zhao et.al. | 2406.12742v1 | link |
2024-06-18 | On the Robustness of Language Models for Tabular Question Answering | Kushal Raj Bhandari et.al. | 2406.12719v1 | null |
2024-06-18 | Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction | Haoqiu Yan et.al. | 2406.12707v1 | link |
2024-06-18 | MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL | Arian Askari et.al. | 2406.12692v1 | null |
2024-06-18 | Estimating Knowledge in Large Language Models Without Generating a Single Token | Daniela Gottesman et.al. | 2406.12673v1 | null |
2024-06-18 | Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics | Huan Xu et.al. | 2406.12651v1 | null |
2024-06-19 | Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models | Hengyi Wang et.al. | 2406.12649v2 | null |
2024-06-19 | Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Eldar Kurtic et.al. | 2406.12572v2 | link |
2024-06-18 | LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation | Yuhao Wang et.al. | 2406.12529v1 | null |
2024-06-18 | Interpreting Bias in Large Language Models: A Feature-Based Approach | Nirmalendu Prakash et.al. | 2406.12347v1 | null |
2024-06-18 | A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning | Lijie Hu et.al. | 2406.12255v1 | null |
2024-06-29 | Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Huaxin Zhang et.al. | 2406.12235v2 | link |
2024-06-24 | Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector | Gangwei Jiang et.al. | 2406.12227v2 | null |
2024-06-17 | Satyrn: A Platform for Analytics Augmented Generation | Marko Sterbentz et.al. | 2406.12069v1 | null |
2024-06-17 | ARTIST: Improving the Generation of Text-rich Images by Disentanglement | Jianyi Zhang et.al. | 2406.12044v1 | null |
2024-06-17 | Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | Junmo Kang et.al. | 2406.12034v1 | null |
2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813v1 | null |
2024-06-17 | WaDec: Decompile WebAssembly Using Large Language Model | Xinyu She et.al. | 2406.11346v1 | null |
2024-06-17 | Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs | Muhammad Arslan Manzoor et.al. | 2406.11250v1 | null |
2024-06-17 | Enabling robots to follow abstract instructions and complete complex dynamic tasks | Ruaridh Mon-Williams et.al. | 2406.11231v1 | null |
2024-06-17 | Compound Schema Registry | Silvery D. Fu et.al. | 2406.11227v1 | null |
2024-06-17 | MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jiahao Huo et.al. | 2406.11193v1 | null |
2024-06-18 | DELRec: Distilling Sequential Pattern to Enhance LLM-based Recommendation | Guohao Sun et.al. | 2406.11156v2 | null |
2024-07-01 | The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models | Bolei Ma et.al. | 2406.11096v2 | null |
2024-06-16 | Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens | Weiyao Luo et.al. | 2406.10985v1 | null |
2024-06-18 | City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | Zihao Jiao et.al. | 2406.10958v2 | null |
2024-06-28 | Large Language Model Enhanced Clustering for News Event Detection | Adane Nega Tarekegn et.al. | 2406.10552v3 | null |
2024-06-17 | Requirements are All You Need: From Requirements to Code with LLMs | Bingyang Wei et.al. | 2406.10101v2 | link |
2024-06-14 | Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation | Xiaoman Wang et.al. | 2406.10091v1 | null |
2024-06-14 | Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Nabor C. Mendonça et.al. | 2406.09671v1 | link |
2024-06-12 | LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions | Nhat Hoang-Xuan et.al. | 2406.08572v1 | null |
2024-06-12 | Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Jaehyun Nam et.al. | 2406.08527v1 | null |
2024-06-12 | Leveraging Large Language Models for Web Scraping | Aman Ahluwalia et.al. | 2406.08246v1 | null |
2024-06-12 | AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection | Pia Pachinger et.al. | 2406.08080v1 | null |
2024-06-12 | A Concept-Based Explainability Framework for Large Multimodal Models | Jayneel Parekh et.al. | 2406.08074v1 | null |
2024-06-12 | Toward a Method to Generate Capability Ontologies from Natural Language Descriptions | Luis Miguel Vieira da Silva et.al. | 2406.07962v1 | null |
2024-06-11 | Estimating the Hallucination Rate of Generative AI | Andrew Jesson et.al. | 2406.07457v1 | null |
2024-06-11 | Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities | Delfina Sol Martinez Pandiani et.al. | 2406.07353v1 | link |
2024-06-11 | Instruct Large Language Models to Drive like Humans | Ruijun Zhang et.al. | 2406.07296v1 | link |
2024-06-10 | Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110 | Mark A. Kramer et.al. | 2406.06657v1 | null |
2024-06-09 | Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | Saman Pordanesh et.al. | 2406.06637v1 | null |
2024-06-09 | LLM Questionnaire Completion for Automatic Psychiatric Assessment | Gony Rosenman et.al. | 2406.06636v1 | null |
2024-06-07 | LinkQ: An LLM-Assisted Visual Interface for Knowledge Graph Question-Answering | Harry Li et.al. | 2406.06621v1 | link |
2024-06-06 | Prototypical Reward Network for Data-Efficient RLHF | Jinghan Zhang et.al. | 2406.06606v1 | null |
2024-06-13 | From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models | Xiaofeng Zhang et.al. | 2406.06579v2 | null |
2024-06-18 | OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | Owen Dugan et.al. | 2406.06576v2 | null |
2024-06-02 | Inverse Constitutional AI: Compressing Preferences into Principles | Arduin Findeis et.al. | 2406.06560v1 | link |
2024-06-11 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464v2 | null |
2024-06-10 | Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization | Yi Gu et.al. | 2406.06382v1 | link |
2024-06-10 | MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows | Xingjian Zhang et.al. | 2406.06357v1 | link |
2024-06-11 | iMotion-LLM: Motion Prediction Instruction Tuning | Abdulwahab Felemban et.al. | 2406.06211v2 | null |
2024-06-10 | Prompting Large Language Models with Audio for General-Purpose Speech Summarization | Wonjune Kang et.al. | 2406.05968v1 | link |
2024-06-16 | RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation | Kiseung Kim et.al. | 2406.05794v2 | null |
2024-06-08 | VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification | Jianmeng Liu et.al. | 2406.05543v1 | null |
2024-06-08 | MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention | Prince Jha et.al. | 2406.05344v1 | link |
2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107v1 | null |
2024-06-07 | LLM-based speaker diarization correction: A generalizable approach | Georgios Efstathiadis et.al. | 2406.04927v1 | link |
2024-06-07 | Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models | Michał Romaszewski et.al. | 2406.04926v1 | null |
2024-06-07 | WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | Bill Yuchen Lin et.al. | 2406.04770v1 | link |
2024-06-07 | LogiCode: an LLM-Driven Framework for Logical Anomaly Detection | Yiheng Zhang et.al. | 2406.04687v1 | link |
2024-06-07 | Large Language Model-guided Document Selection | Xiang Kong et.al. | 2406.04638v1 | null |
2024-06-07 | OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework | Wei Zhou et.al. | 2406.04598v1 | null |
2024-06-06 | MAIRA-2: Grounded Radiology Report Generation | Shruthi Bannur et.al. | 2406.04449v1 | null |
2024-06-01 | Large Language Model Confidence Estimation via Black-Box Access | Tejaswini Pedapati et.al. | 2406.04370v1 | null |
2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344v1 | null |
2024-06-06 | Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People | Dun-Ming Huang et.al. | 2406.04278v1 | link |
2024-06-06 | Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts | Shubham Kumar Nigam et.al. | 2406.04136v1 | link |
2024-06-06 | Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | Xiaohu Du et.al. | 2406.03718v1 | link |
2024-06-13 | Ranking Manipulation for Conversational Search Engines | Samuel Pfrommer et.al. | 2406.03589v2 | link |
2024-06-04 | Dynamic and Adaptive Feature Generation with LLM | Xinhao Zhang et.al. | 2406.03505v1 | null |
2024-06-05 | Cycles of Thought: Measuring LLM Confidence through Stable Explanations | Evan Becker et.al. | 2406.03441v1 | null |
2024-06-05 | Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models | Qiang Sun et.al. | 2406.02962v1 | link |
2024-06-06 | Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers | Brian K Chen et.al. | 2406.02847v2 | null |
2024-06-04 | Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks | Tianyu He et.al. | 2406.02550v1 | link |
2024-06-04 | Iteration Head: A Mechanistic Study of Chain-of-Thought | Vivien Cabannes et.al. | 2406.02128v1 | null |
2024-06-04 | I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering | Valeriya Goloviznina et.al. | 2406.02060v1 | null |
2024-06-04 | Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | Nik Bear Brown et.al. | 2406.01943v1 | null |
2024-06-05 | Dishonesty in Helpful and Harmless Alignment | Youcheng Huang et.al. | 2406.01931v2 | null |
2024-06-21 | Large Language Model-Enabled Multi-Agent Manufacturing Systems | Jonghan Lim et.al. | 2406.01893v2 | null |
2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et.al. | 2406.01587v2 | null |
2024-06-03 | LoFiT: Localized Fine-tuning on LLM Representations | Fangcong Yin et.al. | 2406.01563v1 | link |
2024-06-20 | What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores | Ebrahim Feghhi et.al. | 2406.01538v2 | link |
2024-06-03 | The Geometry of Categorical and Hierarchical Concepts in Large Language Models | Kiho Park et.al. | 2406.01506v1 | link |
2024-06-11 | AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation | Junhao Cheng et.al. | 2406.01388v2 | link |
2024-06-03 | Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach | Borui Zhang et.al. | 2406.00974v1 | null |
2024-06-04 | Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic | Xinglin Chen et.al. | 2406.00965v2 | null |
2024-06-10 | Are you still on track!? Catching LLM Task Drift with Activations | Sahar Abdelnabi et.al. | 2406.00799v2 | null |
2024-06-02 | An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging | Sulaiman Khan et.al. | 2406.00667v1 | null |
2024-06-02 | Presence or Absence: Are Unknown Word Usages in Dictionaries? | Xianghe Ma et.al. | 2406.00656v1 | link |
2024-06-11 | InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation | Jacob Si et.al. | 2406.00426v3 | link |
2024-06-01 | Controlling Large Language Model Agents with Entropic Activation Steering | Nate Rahn et.al. | 2406.00244v1 | null |
2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985v1 | null |
2024-05-31 | Improving Reward Models with Synthetic Critiques | Zihuiwen Ye et.al. | 2405.20850v1 | null |
2024-05-31 | Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning | Cheng Tan et.al. | 2405.20834v1 | null |
2024-05-31 | UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation | Hanzhang Zhou et.al. | 2405.20612v1 | null |
2024-05-30 | XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution | Yurui Chang et.al. | 2405.20404v1 | null |
2024-05-30 | Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks | Chen Xiong et.al. | 2405.20099v1 | null |
2024-05-30 | Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models | Yuxiao Luo et.al. | 2405.19850v1 | null |
2024-05-30 | Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model | Chaochen Gao et.al. | 2405.19846v1 | null |
2024-05-30 | Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback | Jingwei Sun et.al. | 2405.19686v1 | null |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328v1 | null |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326v1 | null |
2024-05-29 | Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery | Sounak Lahiri et.al. | 2405.19164v1 | null |
2024-06-02 | Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Markus J. Buehler et.al. | 2405.19076v2 | link |
2024-06-03 | Genshin: General Shield for Natural Language Processing with Large Language Models | Xiao Peng et.al. | 2405.18741v2 | null |
2024-06-02 | LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification | Renyi Qu et.al. | 2405.18672v2 | null |
2024-05-28 | Large Language Models as Partners in Student Essay Evaluation | Toru Ishida et.al. | 2405.18632v1 | null |
2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380v1 | link |
2024-05-28 | FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models | Yang Zhang et.al. | 2405.18218v1 | null |
2024-05-28 | Exploring Context Window of Large Language Models via Decomposed Positional Vectors | Zican Dong et.al. | 2405.18009v1 | null |
2024-05-28 | SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions | Juexiao Zhou et.al. | 2405.18004v1 | null |
2024-05-28 | Knowledge Circuits in Pretrained Transformers | Yunzhi Yao et.al. | 2405.17969v1 | link |
2024-05-28 | Arithmetic Reasoning with LLM: Prolog Generation & Permutation | Xiaocheng Yang et.al. | 2405.17893v1 | null |
2024-05-27 | Mechanistic Interpretability of Binary and Ternary Transformers | Jason Li et.al. | 2405.17703v1 | link |
2024-05-27 | Deployment of NLP and LLM Techniques to Control Mobile Robots at the Edge: A Case Study Using GPT-4-Turbo and LLaMA 2 | Pascal Sikorski et.al. | 2405.17670v1 | null |
2024-05-27 | Enhanced Robot Arm at the Edge with NLP and Vision Systems | Pascal Sikorski et.al. | 2405.17665v1 | null |
2024-05-27 | BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments | Yusuf Roohani et.al. | 2405.17631v1 | link |
2024-05-25 | Revisit, Extend, and Enhance Hessian-Free Influence Functions | Ziao Yang et.al. | 2405.17490v1 | null |
2024-05-28 | LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | Haoyu Zhao et.al. | 2405.17104v2 | null |
2024-05-27 | Exploring the LLM Journey from Cognition to Expression with Linear Representations | Yuzi Yan et.al. | 2405.16964v1 | null |
2024-05-27 | TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing | Xinyu Zhang et.al. | 2405.16803v1 | null |
2024-05-26 | Crafting Interpretable Embeddings by Asking LLMs Questions | Vinamra Benara et.al. | 2405.16714v1 | link |
2024-05-26 | Attaining Human's Desirable Outcomes in Human-AI Interaction via Structural Causal Games | Anjie Liu et.al. | 2405.16588v1 | null |
2024-05-26 | Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search | Max Liu et.al. | 2405.16450v1 | null |
2024-05-26 | Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level | Runlin Lei et.al. | 2405.16405v1 | null |
2024-05-25 | Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks | Emily Jensen et.al. | 2405.16344v1 | null |
2024-06-03 | Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge | Brendan Park et.al. | 2405.16277v3 | link |
2024-05-25 | Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention | Andrew Li et.al. | 2405.16042v1 | null |
2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684v1 | null |
2024-05-24 | Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges | Jonas Becker et.al. | 2405.15604v1 | link |
2024-05-24 | ChatGPT Code Detection: Techniques for Uncovering the Source of Code | Marc Oedingen et.al. | 2405.15512v1 | link |
2024-05-24 | Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Nicola Dainese et.al. | 2405.15383v1 | null |
2024-05-24 | Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection | Jun Liu et.al. | 2405.15370v1 | null |
2024-05-24 | V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Abdur Rahman et.al. | 2405.15341v1 | null |
2024-05-24 | Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework | Minzhi Li et.al. | 2405.15329v1 | null |
2024-05-24 | Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation | Ge Qu et.al. | 2405.15307v1 | link |
2024-05-23 | AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Bin Lei et.al. | 2405.14906v1 | link |
2024-05-28 | Explaining Multi-modal Large Language Models by Analyzing their Vision Perception | Loris Giulivi et.al. | 2405.14612v2 | link |
2024-05-23 | Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning | Jiapu Wang et.al. | 2405.14170v1 | null |
2024-05-28 | DeTox: Toxic Subspace Projection for Model Editing | Rheeya Uppaal et.al. | 2405.13967v3 | link |
2024-05-22 | Large Language Models are Good Spontaneous Multilingual Learners: Is the Multilingual Annotated Data Necessary? | Shimao Zhang et.al. | 2405.13816v1 | link |
2024-05-22 | Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Gauthier Guinet et.al. | 2405.13622v1 | null |
2024-05-24 | ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing | Wei Zhang et.al. | 2405.13548v2 | null |
2024-05-22 | HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model | Mustafa Yildirim et.al. | 2405.13547v1 | null |
2024-05-21 | A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings | Vanya Cohen et.al. | 2405.13245v1 | null |
2024-05-21 | GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation | Govind Ramesh et.al. | 2405.13077v1 | null |
2024-05-19 | Human-Centered LLM-Agent User Interface: A Position Paper | Daniel Chin et.al. | 2405.13050v1 | null |
2024-05-15 | IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues | Diji Yang et.al. | 2405.13021v1 | null |
2024-05-21 | Quantifying Emergence in Large Language Models | Hang Chen et.al. | 2405.12617v1 | link |
2024-05-21 | Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models | Charles O'Neill et.al. | 2405.12522v1 | null |
2024-05-20 | Directed Metric Structures arising in Large Language Models | Stéphane Gaubert et.al. | 2405.12264v1 | null |
2024-05-20 | "Set It Up!": Functional Object Arrangement with Compositional Generative Models | Yiqing Xu et.al. | 2405.11928v1 | null |
2024-05-20 | Unveiling and Manipulating Prompt Influence in Large Language Models | Zijian Feng et.al. | 2405.11891v1 | link |
2024-05-21 | Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts | Baolong Bi et.al. | 2405.11613v2 | link |
2024-05-17 | Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models | Paula Akemi Aoyagui et.al. | 2405.11048v1 | null |
2024-05-20 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | Lucius Bushnaq et.al. | 2405.10928v2 | link |
2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893v1 | null |
2024-05-17 | MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains | Zhaohuan Zhan et.al. | 2405.10620v1 | null |
2024-05-20 | Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks | Anwoy Chatterjee et.al. | 2405.10548v2 | null |
2024-05-14 | Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | Akhila Yerukola et.al. | 2405.08760v1 | null |
2024-05-14 | Challenges and Opportunities in Text Generation Explainability | Kenza Amara et.al. | 2405.08468v1 | null |
2024-05-14 | Compositional Text-to-Image Generation with Dense Blob Representations | Weili Nie et.al. | 2405.08246v1 | null |
2024-05-13 | Interpreting Latent Student Knowledge Representations in Programming Assignments | Nigel Fernandez et.al. | 2405.08213v1 | null |
2024-05-11 | Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive Analytics | Phoebe Jing et.al. | 2405.08017v1 | null |
2024-05-13 | A Generalist Learner for Multifaceted Medical Image Interpretation | Hong-Yu Zhou et.al. | 2405.07988v1 | null |
2024-05-13 | MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Shuo Yin et.al. | 2405.07551v1 | null |
2024-05-13 | Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions | Xinglin Chen et.al. | 2405.07474v1 | null |
2024-05-12 | Human-interpretable clustering of short-text using large language models | Justin K. Miller et.al. | 2405.07278v1 | null |
2024-05-11 | Automating Thematic Analysis: How LLMs Analyse Controversial Topics | Awais Hameed Khan et.al. | 2405.06919v1 | null |
2024-05-21 | AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Shuyuan Xu et.al. | 2405.06907v2 | link |
2024-05-10 | MEIC: Re-thinking RTL Debug Automation using LLMs | Ke Xu et.al. | 2405.06840v1 | null |
2024-05-10 | Large Language Model in Financial Regulatory Interpretation | Zhiyu Cao et.al. | 2405.06808v1 | null |
2024-05-15 | On the Shape of Brainscores for Large Language Models (LLMs) | Jingkai Li et.al. | 2405.06725v3 | link |
2024-05-09 | Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses | Gaurav Kumar Gupta et.al. | 2405.06712v1 | null |
2024-05-08 | Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance | Goran Muric et.al. | 2405.06703v1 | null |
2024-05-13 | Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling | Lyumanshan Ye et.al. | 2405.06495v2 | null |
2024-05-10 | Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL | Ning Cheng et.al. | 2405.06410v1 | null |
2024-05-09 | LLMs for XAI: Future Directions for Explaining Explanations | Alexandra Zytek et.al. | 2405.06064v1 | null |
2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956v1 | link |
2024-05-09 | One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations | Yoonjoo Lee et.al. | 2405.05581v1 | null |
2024-05-11 | Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals | Joshua Clymer et.al. | 2405.05466v2 | null |
2024-05-08 | Empathy Through Multimodality in Conversational Interfaces | Mahyar Abbasian et.al. | 2405.04777v1 | null |
2024-05-09 | Large Language Models for Cyber Security: A Systematic Literature Review | HanXiang Xu et.al. | 2405.04760v2 | null |
2024-05-13 | A Transformer with Stack Attention | Jiaoda Li et.al. | 2405.04515v2 | link |
2024-05-06 | In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker | Savvas Petridis et.al. | 2405.03806v1 | null |
2024-05-06 | Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames | Keith Burghardt et.al. | 2405.03688v1 | link |
2024-05-23 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553v2 | link |
2024-05-06 | MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline | Mohamed Yaseen Jabarulla et.al. | 2405.03359v1 | link |
2024-05-06 | WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Yuanhan Zhang et.al. | 2405.03272v1 | null |
2024-05-06 | A Philosophical Introduction to Language Models - Part II: The Way Forward | Raphaël Millière et.al. | 2405.03207v1 | null |
2024-05-23 | Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Ruizhe Li et.al. | 2405.03205v2 | link |
2024-05-06 | Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines | Md Main Uddin Rony et.al. | 2405.03153v1 | null |
2024-05-05 | Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management | Bingzhang Wang et.al. | 2405.03076v1 | null |
2024-05-22 | A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs) | Lingyao Li et.al. | 2405.03066v2 | null |
2024-05-07 | Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models | Tianze Xu et.al. | 2405.02801v2 | link |
2024-05-04 | TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants | Mohammad Aliannejadi et.al. | 2405.02637v1 | link |
2024-05-03 | What does the Knowledge Neuron Thesis Have to do with Knowledge? | Jingcheng Niu et.al. | 2405.02421v1 | link |
2024-05-03 | LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model | Yulin Luo et.al. | 2405.02363v1 | null |
2024-04-18 | NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection | Abhinav Lalwani et.al. | 2405.02318v1 | null |
2024-05-03 | Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows | Jasmine Y. Shih et.al. | 2405.02260v1 | null |
2024-05-03 | Argumentative Large Language Models for Explainable and Contestable Decision-Making | Gabriel Freedman et.al. | 2405.02079v1 | null |
2024-05-02 | A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law | Zhiyu Zoey Chen et.al. | 2405.01769v1 | null |
2024-05-02 | ALCM: Autonomous LLM-Augmented Causal Discovery Framework | Elahe Khatibi et.al. | 2405.01744v1 | null |
2024-05-01 | GOLD: Geometry Problem Solver with Natural Language Description | Jiaxin Zhang et.al. | 2405.00494v1 | link |
2024-05-01 | The Pyramid of Captions | Delong Chen et.al. | 2405.00485v1 | null |
2024-05-01 | CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model | Wei Zhang et.al. | 2405.00435v1 | null |
2024-04-30 | PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification | Leon Garza et.al. | 2404.19744v1 | null |
2024-05-22 | Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings | Guobin Shen et.al. | 2404.19438v3 | null |
2024-04-30 | Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Jiading Fang et.al. | 2404.19221v1 | null |
2024-04-29 | SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications | Liang Xu et.al. | 2404.19063v1 | null |
2024-04-29 | AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | Wenxiang Zhao et.al. | 2404.18816v1 | null |
2024-04-29 | PECC: Problem Extraction and Coding Challenges | Patrick Haller et.al. | 2404.18766v1 | link |
2024-04-29 | HFT: Half Fine-Tuning for Large Language Models | Tingfeng Hui et.al. | 2404.18466v1 | null |
2024-04-28 | Logic Agent: Enhancing Validity with Logic Rule Invocation | Hanmeng Liu et.al. | 2404.18130v1 | null |
2024-04-27 | MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch | Nadia Saeed et.al. | 2404.17999v1 | link |
2024-04-27 | Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning | Dapeng Li et.al. | 2404.17780v1 | null |
2024-04-29 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524v2 | null |
2024-04-26 | Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study | Yang Wu et.al. | 2404.17136v1 | link |
2024-04-25 | AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks | Jeremy Harper et.al. | 2404.17017v1 | null |
2024-04-25 | Evolve Cost-aware Acquisition Functions Using Large Language Models | Yiming Yao et.al. | 2404.16906v1 | null |
2024-04-11 | Rumour Evaluation with Very Large Language Models | Dahlia Shehata et.al. | 2404.16859v1 | link |
2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754v1 | null |
2024-04-25 | Evolutionary Large Language Models for Hardware Security: A Comparative Survey | Mohammad Akyash et.al. | 2404.16651v1 | null |
2024-04-25 | Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains | Zijie Wang et.al. | 2404.16262v1 | link |
2024-04-24 | Return of EM: Entity-driven Answer Set Expansion for QA Evaluation | Dongryeol Lee et.al. | 2404.15650v1 | null |
2024-04-27 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549v2 | null |
2024-04-01 | Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT | Ruikun Hou et.al. | 2404.15310v1 | null |
2024-04-23 | Aligning LLM Agents by Learning Latent Preference from User Edits | Ge Gao et.al. | 2404.15269v1 | link |
2024-04-22 | Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication | John R. Lawson et.al. | 2404.15166v1 | null |
2024-04-23 | Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans | Vittoria Dentella et.al. | 2404.14883v1 | null |
2024-04-23 | Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | Qingrong He et.al. | 2404.14705v1 | null |
2024-04-26 | Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Mengzhao Jia et.al. | 2404.14604v3 | null |
2024-04-22 | Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning | Mohammed Abugurain et.al. | 2404.14547v1 | null |
2024-04-22 | CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment | Kanglei Zhou et.al. | 2404.13999v1 | link |
2024-05-23 | Towards General Conceptual Model Editing via Adversarial Representation Engineering | Yihao Zhang et.al. | 2404.13752v2 | link |
2024-04-21 | FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | Zhaopeng Gu et.al. | 2404.13671v1 | null |
2024-04-21 | Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge | Narek Maloyan et.al. | 2404.13660v1 | null |
2024-04-21 | ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval | Kelong Mao et.al. | 2404.13556v1 | link |
2024-04-20 | "I Wish There Were an AI": Challenges and AI Potential in Cancer Patient-Provider Communication | Ziqi Yang et.al. | 2404.13409v1 | null |
2024-04-20 | Large Language Models as Test Case Generators: Performance Evaluation and Enhancement | Kefan Li et.al. | 2404.13340v1 | null |
2024-04-19 | CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | Manish Bhatt et.al. | 2404.13161v1 | link |
2024-04-19 | Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation | Guanhua Chen et.al. | 2404.12879v1 | null |
2024-04-19 | Large Language Model Supply Chain: A Research Agenda | Shenao Wang et.al. | 2404.12736v1 | null |
2024-04-19 | Just Like Me: The Role of Opinions and Personal Experiences in The Perception of Explanations in Subjective Decision-Making | Sharon Ferguson et.al. | 2404.12558v1 | null |
2024-04-18 | BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models | Yu Feng et.al. | 2404.12494v1 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372v1 | null |
2024-04-23 | Large Language Models for Synthetic Participatory Planning of Synergistic Transportation Systems | Jiangbo Yu et.al. | 2404.12317v3 | null |
2024-04-18 | Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Yusuke Sakai et.al. | 2404.12299v1 | null |
2024-04-18 | Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM | Michelle S. Lam et.al. | 2404.12259v1 | link |
2024-04-18 | EVIT: Event-Oriented Instruction Tuning for Event Reasoning | Zhengwei Tao et.al. | 2404.11978v1 | null |
2024-04-18 | Aligning Language Models to Explicitly Handle Ambiguity | Hyuhng Joon Kim et.al. | 2404.11972v1 | null |
2024-04-18 | Concept Induction using LLMs: a user experiment for assessment | Adrita Barua et.al. | 2404.11875v1 | null |
2024-04-17 | MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory | Ali Modarressi et.al. | 2404.11672v1 | null |
2024-04-16 | Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases | Yanze Li et.al. | 2404.10595v1 | null |
2024-04-16 | Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning | Xiao Wang et.al. | 2404.10552v1 | null |
2024-04-15 | Evolving Interpretable Visual Classifiers with Large Language Models | Mia Chiquier et.al. | 2404.09941v1 | null |
2024-04-15 | Reimagining Self-Adaptation in the Age of Large Language Models | Raghav Donakanti et.al. | 2404.09866v1 | null |
2024-04-16 | How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models | Xiuwei Shang et.al. | 2404.09836v2 | null |
2024-04-15 | Resilience of Large Language Models for Noisy Instructions | Bin Wang et.al. | 2404.09754v1 | null |
2024-04-15 | Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction | David Sobrín-Hidalgo et.al. | 2404.09705v1 | null |
2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632v1 | link |
2024-04-15 | MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems | Kaixin Li et.al. | 2404.09486v1 | link |
2024-04-14 | Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions | Taojun Hu et.al. | 2404.09135v1 | null |
2024-04-17 | Incremental Residual Concept Bottleneck Models | Chenming Shang et.al. | 2404.08978v2 | null |
2024-04-13 | Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension | Mengnan Qi et.al. | 2404.08885v1 | null |
2024-04-12 | LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Junchi Wang et.al. | 2404.08767v1 | link |
2024-04-12 | Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases | Xiang Zhang et.al. | 2404.08727v1 | null |
2024-04-05 | Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions | Zhuochun Li et.al. | 2404.08674v1 | null |
2024-03-25 | Linear Cross-document Event Coreference Resolution with X-AMR | Shafiuddin Rehan Ahmed et.al. | 2404.08656v1 | link |
2024-04-12 | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | Xuan Xie et.al. | 2404.08517v1 | null |
2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424v1 | null |
2024-03-22 | Content Knowledge Identification with Multi-Agent Large Language Models (LLMs) | Kaiqi Yang et.al. | 2404.07960v1 | null |
2024-04-11 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation | Anna C. Doris et.al. | 2404.07917v1 | link |
2024-04-12 | Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics | Masashi Osada et.al. | 2404.07717v2 | link |
2024-04-11 | Can Large Language Models Assess Serendipity in Recommender Systems? | Yu Tokutake et.al. | 2404.07499v1 | null |
2024-04-10 | Vision-Language Model-based Physical Reasoning for Robot Liquid Perception | Wenqiang Lai et.al. | 2404.06904v1 | null |
2024-04-09 | Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language? | Omid Ghahroodi et.al. | 2404.06644v1 | null |
2024-04-09 | Building A Knowledge Graph to Enrich ChatGPT Responses in Manufacturing Service Discovery | Yunqing Li et.al. | 2404.06571v1 | null |
2024-04-09 | Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Valdecy Pereira et.al. | 2404.06370v1 | link |
2024-04-21 | AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning | Senkang Hu et.al. | 2404.06345v2 | null |
2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332v1 | null |
2024-04-08 | LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Chuwei Luo et.al. | 2404.05225v1 | link |
2024-04-08 | LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Shibo Hao et.al. | 2404.05221v1 | null |
2024-04-07 | Facial Affective Behavior Analysis with Instruction Tuning | Yifan Li et.al. | 2404.05052v1 | null |
2024-04-07 | FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | Yukti Makhija et.al. | 2404.04817v1 | null |
2024-04-06 | Multicalibration for Confidence Scoring in LLMs | Gianluca Detommaso et.al. | 2404.04689v1 | null |
2024-04-06 | Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology | Dyke Ferber et.al. | 2404.04667v1 | null |
2024-04-06 | Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model | Zhonghan Zhao et.al. | 2404.04619v1 | null |
2024-04-05 | Scope Ambiguities in Large Language Models | Gaurav Kamath et.al. | 2404.04332v1 | link |
2024-04-05 | Assessing the quality of information extraction | Filip Seitl et.al. | 2404.04068v1 | null |
2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623v1 | null |
2024-04-04 | Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity | Jake Varley et.al. | 2404.03570v1 | null |
2024-04-03 | LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models | Gabriela Ben Melech Stan et.al. | 2404.03118v1 | null |
2024-04-03 | Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding | Gaia Carenini et.al. | 2404.02983v1 | null |
2024-04-13 | Explainable Traffic Flow Prediction with Large Language Models | Xusen Guo et.al. | 2404.02937v3 | null |
2024-04-13 | Toward Informal Language Processing: Knowledge of Slang in Large Language Models | Zhewei Sun et.al. | 2404.02323v2 | null |
2024-04-02 | ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models | Vishnunandan L. N. Venkatesh et.al. | 2404.02318v1 | null |
2024-04-02 | Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation | Veronica Valeros et.al. | 2404.01940v1 | null |
2024-04-02 | InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis | Luoxuan Weng et.al. | 2404.01644v1 | null |
2024-03-29 | Wait, It's All Token Noise? Always Has Been: Interpreting LLM Behavior Using Shapley Value | Behnam Mohammadi et.al. | 2404.01332v1 | null |
2024-04-01 | Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training | Donggang Jia et.al. | 2404.01063v1 | null |
2024-04-11 | Source-Aware Training Enables Knowledge Attribution in Language Models | Muhammad Khalifa et.al. | 2404.01019v2 | link |
2024-04-01 | Query Performance Prediction using Relevance Judgments Generated by Large Language Models | Chuan Meng et.al. | 2404.01012v1 | link |
2024-04-01 | Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey | Weicong Qin et.al. | 2404.00990v1 | null |
2024-04-12 | Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing | Zhenyu Qian et.al. | 2404.00589v2 | link |
2024-03-30 | PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | Muhammad Asif Ali et.al. | 2404.00489v1 | null |
2024-03-30 | Do Vision-Language Models Understand Compound Nouns? | Sonal Kumar et.al. | 2404.00419v1 | null |
2024-03-30 | EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs | Cheng Jiayang et.al. | 2404.00209v1 | link |
2024-03-29 | User Modeling Challenges in Interactive AI Assistant Systems | Megan Su et.al. | 2403.20134v1 | null |
2024-03-28 | Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Akshay Gopalkrishnan et.al. | 2403.19838v1 | link |
2024-03-28 | AlloyBERT: Alloy Property Prediction with Large Language Models | Akshat Chaudhari et.al. | 2403.19783v1 | null |
2024-03-28 | Enhancing Anomaly Detection in Financial Markets with an LLM-based Multi-Agent Framework | Taejin Park et.al. | 2403.19735v1 | null |
2024-04-01 | Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | Chenyang Liu et.al. | 2403.19646v2 | link |
2024-03-28 | Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation | Yutong He et.al. | 2403.19103v1 | null |
2024-03-27 | A Survey on Large Language Models from Concept to Implementation | Chen Wang et.al. | 2403.18969v1 | null |
2024-03-27 | CheckEval: Robust Evaluation Framework using Large Language Model via Checklist | Yukyung Lee et.al. | 2403.18771v1 | null |
2024-04-03 | Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | Meiqi Chen et.al. | 2403.18346v3 | null |
2024-03-27 | LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models | Mingxing Peng et.al. | 2403.18344v1 | null |
2024-03-27 | Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications | Rushang Karia et.al. | 2403.18327v1 | null |
2024-03-26 | Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications | Fouad Trad et.al. | 2403.17787v1 | null |
2024-03-25 | Generation of Asset Administration Shell with Large Language Model Agents: Interoperability in Digital Twins with Semantic Node | Yuchen Xia et.al. | 2403.17209v1 | null |
2024-03-25 | The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition | Georgios Chochlakis et.al. | 2403.17125v1 | null |
2024-03-25 | Grounding Language Plans in Demonstrations Through Counterfactual Perturbations | Yanwei Wang et.al. | 2403.17124v1 | null |
2024-03-25 | Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models | Hao Shao et.al. | 2403.16999v1 | link |
2024-03-25 | PropTest: Automatic Property Testing for Improved Visual Programming | Jaywon Koo et.al. | 2403.16921v1 | null |
2024-04-22 | Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography | Jiayue Zhang et.al. | 2403.16687v3 | null |
2024-03-28 | Can Language Models Pretend Solvers? Logic Code Simulation with LLMs | Minyu Chen et.al. | 2403.16097v2 | null |
2024-04-15 | Computational Sentence-level Metrics Predicting Human Sentence Comprehension | Kun Sun et.al. | 2403.15822v2 | null |
2024-03-23 | EDDA: A Encoder-Decoder Data Augmentation Framework for Zero-Shot Stance Detection | Daijun Ding et.al. | 2403.15715v1 | link |
2024-04-03 | Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs | Yiliang Zhou et.al. | 2403.15528v2 | null |
2024-03-21 | Open Source Conversational LLMs do not know most Spanish words | Javier Conde et.al. | 2403.15491v1 | null |
2024-03-15 | ChatPattern: Layout Pattern Customization via Natural Language | Zixiao Wang et.al. | 2403.15434v1 | null |
2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371v1 | null |
2024-04-03 | AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models | Chaoyun Zhang et.al. | 2403.15157v2 | null |
2024-03-22 | Comprehensive Lipidomic Automation Workflow using Large Language Models | Connor Beveridge et.al. | 2403.15076v1 | null |
2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Renrui Zhang et.al. | 2403.14624v1 | null |
2024-03-21 | Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology | Dimitrios P. Panagoulias et.al. | 2403.14243v1 | null |
2024-04-08 | MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation | Longzheng Wang et.al. | 2403.14171v3 | link |
2024-03-20 | CoMo: Controllable Motion Generation through Language Guided Pose Code Editing | Yiming Huang et.al. | 2403.13900v1 | null |
2024-03-20 | Encoding the Subsurface in 3D with Seismic | Ben Lasscock et.al. | 2403.13593v1 | null |
2024-03-20 | IndiTag: An Online Media Bias Analysis and Annotation System Using Fine-Grained Bias Indicators | Luyang Lin et.al. | 2403.13446v1 | link |
2024-03-19 | A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training | Heila Precel et.al. | 2403.13073v1 | null |
2024-04-02 | AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models | Shuo Jiang et.al. | 2403.13002v2 | null |
2024-03-19 | Semantic Layering in Room Segmentation via LLMs | Taehyeon Kim et.al. | 2403.12920v1 | null |
2024-03-19 | Pragmatic Competence Evaluation of Large Language Models for Korean | Dojun Park et.al. | 2403.12675v1 | null |
2024-04-02 | Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code | Andreas Florath et.al. | 2403.12627v2 | null |
2024-03-19 | AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Xiang Li et.al. | 2403.12582v1 | link |
2024-03-19 | INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations | Lirui Luo et.al. | 2403.12451v1 | null |
2024-03-19 | Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales | Ayushi Nirmal et.al. | 2403.12403v1 | null |
2024-03-19 | Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models | Ying-Chun Lin et.al. | 2403.12388v1 | null |
2024-04-02 | Investigating Markers and Drivers of Gender Bias in Machine Translations | Peter J Barclay et.al. | 2403.11896v2 | null |
2024-03-18 | Metaphor Understanding Challenge Dataset for LLMs | Xiaoyu Tong et.al. | 2403.11810v1 | null |
2024-03-22 | Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | Rao Fu et.al. | 2403.11401v2 | null |
2024-04-10 | StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows | Yiran Wu et.al. | 2403.11322v3 | link |
2024-03-17 | ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Siyuan Huang et.al. | 2403.11289v1 | link |
2024-03-26 | SelfIE: Self-Interpretation of Large Language Model Embeddings | Haozhe Chen et.al. | 2403.10949v2 | link |
2024-03-16 | A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment | Tianhe Wu et.al. | 2403.10854v1 | link |
2024-03-16 | LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices | Jingping Nie et.al. | 2403.10779v1 | null |
2024-03-16 | NARRATE: Versatile Language Architecture for Optimal Control in Robotics | Seif Ismail et.al. | 2403.10762v1 | null |
2024-03-15 | Uncovering Latent Themes of Messaging on Social Media by Integrating LLMs: A Case Study on Climate Campaigns | Tunazzina Islam et.al. | 2403.10707v1 | null |
2024-03-22 | Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction | Chen Chen et.al. | 2403.10581v2 | null |
2024-03-15 | TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale | Pengcheng Jiang et.al. | 2403.10351v1 | null |
2024-03-14 | Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors | Guanghua Li et.al. | 2403.09747v1 | null |
2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | Yequan Bie et.al. | 2403.09410v1 | null |
2024-03-14 | UniCode: Learning a Unified Codebook for Multimodal Large Language Models | Sipeng Zheng et.al. | 2403.09072v1 | null |
2024-02-21 | Diet-ODIN: A Novel Framework for Opioid Misuse Detection with Interpretable Dietary Patterns | Zheyuan Zhang et.al. | 2403.08820v1 | link |
2024-03-13 | A Picture Is Worth a Thousand Words: Exploring Diagram and Video-Based OOP Exercises to Counter LLM Over-Reliance | Bruno Pereira Cipriano et.al. | 2403.08396v1 | null |
2024-03-13 | Embedded Translations for Low-resource Automated Glossing | Changbing Yang et.al. | 2403.08189v1 | null |
2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376v1 | link |
2024-03-11 | From English to ASIC: Hardware Implementation with Large Language Model | Emil Goh et.al. | 2403.07039v1 | link |
2024-03-11 | Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning | Zijian Zhou et.al. | 2403.06728v1 | null |
2024-03-11 | FashionReGen: LLM-Empowered Fashion Report Generation | Yujuan Ding et.al. | 2403.06660v1 | null |
2024-03-10 | Are You Being Tracked? Discover the Power of Zero-Shot Trajectory Tracing with LLMs! | Huanqi Yang et.al. | 2403.06201v1 | null |
2024-03-10 | Reframe Anything: LLM Agent for Open World Video Reframing | Jiawang Cao et.al. | 2403.06070v1 | null |
2024-03-09 | LEVA: Using Large Language Models to Enhance Visual Analytics | Yuheng Zhao et.al. | 2403.05816v1 | null |
2024-03-08 | Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach | Zhen Tan et.al. | 2403.05636v1 | null |
2024-03-08 | ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment | Xiwei Hu et.al. | 2403.05135v1 | null |
2024-03-11 | Embracing Large Language and Multimodal Models for Prosthetic Technologies | Sharmita Dey et.al. | 2403.04974v2 | null |
2024-03-07 | Automatic and Universal Prompt Injection Attacks against Large Language Models | Xiaogeng Liu et.al. | 2403.04957v1 | link |
2024-03-07 | iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries | Adam Coscia et.al. | 2403.04760v1 | link |
2024-03-07 | KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts | Adam Coscia et.al. | 2403.04758v1 | link |
2024-03-07 | Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition | Aneta Koleva et.al. | 2403.04577v1 | link |
2024-03-08 | Do Large Language Model Understand Multi-Intent Spoken Language? | Shangjian Yin et.al. | 2403.04481v2 | link |
2024-03-18 | Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models | Changjiang Gao et.al. | 2403.04325v2 | null |
2024-03-13 | Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Deepanway Ghosal et.al. | 2403.03864v3 | link |
2024-03-06 | Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery | Wei Zhang et.al. | 2403.03790v1 | null |
2024-03-06 | GPTopic: Dynamic and Interactive Topic Representations | Arik Reuter et.al. | 2403.03628v1 | null |
2024-03-06 | Explaining Genetic Programming Trees using Large Language Models | Paula Maddigan et.al. | 2403.03397v1 | null |
2024-03-05 | Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement | Rafaela Martelo et.al. | 2403.03188v1 | link |
2024-03-05 | HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents | Sam Yu-Te Lee et.al. | 2403.02752v1 | null |
2024-03-05 | HARGPT: Are LLMs Zero-Shot Human Activity Recognizers? | Sijie Ji et.al. | 2403.02727v1 | null |
2024-03-05 | Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research | Brenda Y. Miao et.al. | 2403.02558v1 | link |
2024-03-26 | FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction | Alessandro Scirè et.al. | 2403.02270v2 | null |
2024-03-04 | Towards Intent-Based Network Management: Large Language Models for Intent Extraction in 5G Core Networks | Dimitrios Michael Manias et.al. | 2403.02238v1 | null |
2024-03-04 | Evaluating the Explainability of Neural Rankers | Saran Pandian et.al. | 2403.01981v1 | null |
2024-03-03 | Logic Rules as Explanations for Legal Case Retrieval | Zhongxiang Sun et.al. | 2403.01457v1 | link |
2024-03-02 | Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers | Melanie Subbiah et.al. | 2403.01061v1 | link |
2024-03-01 | Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries | Zelalem Gero et.al. | 2403.01002v1 | link |
2024-02-26 | InteraRec: Interactive Recommendations Using Multimodal Large Language Models | Saketh Reddy Karra et.al. | 2403.00822v1 | null |
2024-02-25 | Bootstrapping Cognitive Agents with a Large Language Model | Feiyu Zhu et.al. | 2403.00810v1 | null |
2024-02-18 | Ploutos: Towards interpretable stock movement prediction with financial large language model | Hanshuang Tong et.al. | 2403.00782v1 | null |
2024-02-18 | ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework | Zhongqi Yang et.al. | 2403.00781v1 | null |
2024-03-27 | LLMs in Political Science: Heralding a New Era of Visual Analysis | Yu Wang et.al. | 2403.00154v2 | null |
2024-02-29 | FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition | Xiaoqiang Wang et.al. | 2403.00126v1 | null |
2024-02-29 | Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines | Lijia Ma et.al. | 2402.19421v1 | null |
2024-03-12 | Data Interpreter: An LLM Agent For Data Science | Sirui Hong et.al. | 2402.18679v3 | link |
2024-02-28 | Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning | Jiachun Li et.al. | 2402.18344v1 | null |
2024-02-29 | MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | Feihong Lu et.al. | 2402.18169v2 | null |
2024-02-28 | Cause and Effect: Can Large Language Models Truly Understand Causality? | Swagata Ashwani et.al. | 2402.18139v1 | null |
2024-02-28 | ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection | Takashi Koide et.al. | 2402.18093v1 | null |
2024-02-27 | Automated Statistical Model Discovery with Language Models | Michael Y. Li et.al. | 2402.17879v1 | null |
2024-03-07 | ByteComposer: a Human-like Melody Composition Method based on Language Model Agent | Xia Liang et.al. | 2402.17785v2 | null |
2024-02-27 | Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Xiao Liu et.al. | 2402.17644v1 | link |
2024-02-27 | Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides | Kaikai An et.al. | 2402.17531v1 | null |
2024-02-27 | Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | Xiaolong Wang et.al. | 2402.17226v1 | null |
2024-03-20 | OSCaR: Object State Captioning and State Change Representation | Nguyen Nguyen et.al. | 2402.17128v3 | link |
2024-02-24 | Enforcing Temporal Constraints on Generative Agent Behavior with Reactive Synthesis | Raven Rothkopf et.al. | 2402.16905v1 | null |
2024-02-26 | Mysterious Projections: Multimodal LLMs Gain Domain-Specific Visual Capabilities Without Richer Cross-Modal Projections | Gaurav Verma et.al. | 2402.16832v1 | null |
2024-02-28 | StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | Alex Zhuang et.al. | 2402.16671v2 | null |
2024-03-04 | Improving LLM-based Machine Translation with Systematic Self-Correction | Zhaopeng Feng et.al. | 2402.16379v2 | link |
2024-02-25 | AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Yasheng Sun et.al. | 2402.16124v1 | null |
2024-02-25 | Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression | Xinze Li et.al. | 2402.16058v1 | link |
2024-02-25 | LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding | Yuxuan Wang et.al. | 2402.16050v1 | link |
2024-02-23 | Language-Based User Profiles for Recommendation | Joyce Zhou et.al. | 2402.15623v1 | null |
2024-02-19 | Detecting misinformation through Framing Theory: the Frame Element-based Model | Guan Wang et.al. | 2402.15525v1 | null |
2024-02-23 | Explorations of Self-Repair in Language Models | Cody Rushing et.al. | 2402.15390v1 | link |
2024-02-23 | Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning | Joseph D. Clark et.al. | 2402.15181v1 | null |
2024-02-23 | Large Multimodal Agents: A Survey | Junlin Xie et.al. | 2402.15116v1 | null |
2024-03-08 | LLMBind: A Unified Modality-Task Integration Framework | Bin Zhu et.al. | 2402.14891v3 | null |
2024-02-21 | Driving Generative Agents With Their Personality | Lawrence J. Klinkert et.al. | 2402.14879v1 | null |
2024-02-20 | A Dual-Prompting for Interpretable Mental Health Language Models | Hyolim Jeon et.al. | 2402.14854v1 | null |
2024-02-19 | RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | Congyun Jin et.al. | 2402.14840v1 | null |
2024-02-23 | A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health | Nikhil Behari et.al. | 2402.14807v2 | null |
2024-02-22 | Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | Jiawei Wang et.al. | 2402.14744v1 | null |
2024-02-22 | COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling | Baihan Lin et.al. | 2402.14701v1 | null |
2024-02-28 | OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | Tianyu Zheng et.al. | 2402.14658v2 | null |
2024-02-22 | Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond | Xinyu Wang et.al. | 2402.14522v1 | null |
2024-02-22 | Data Science with LLMs and Interpretable Models | Sebastian Bordt et.al. | 2402.14474v1 | link |
2024-02-21 | MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms | Yiqiao Jin et.al. | 2402.14154v1 | null |
2024-02-21 | DeiSAM: Segment Anything with Deictic Prompting | Hikaru Shindo et.al. | 2402.14123v1 | link |
2024-02-21 | An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach | Mohammad Amaz Uddin et.al. | 2402.13871v1 | null |
2024-02-21 | LLM4SBR: A Lightweight and Effective Framework for Integrating Large Language Models in Session-based Recommendation | Shutong Qiao et.al. | 2402.13840v1 | null |
2024-03-15 | CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | Fuwen Luo et.al. | 2402.13607v2 | null |
2024-02-21 | Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment | Yunxin Li et.al. | 2402.13561v1 | null |
2024-02-21 | Round Trip Translation Defence against Large Language Model Jailbreaking Attacks | Canaan Yung et.al. | 2402.13517v1 | link |
2024-02-20 | SymBa: Symbolic Backward Chaining for Multi-step Natural Language Reasoning | Jinu Lee et.al. | 2402.12806v1 | null |
2024-02-20 | Are Large Language Models Rational Investors? | Yuhang Zhou et.al. | 2402.12713v1 | null |
2024-02-18 | scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation | Cong Li et.al. | 2402.12405v1 | null |
2024-02-19 | Reformatted Alignment | Run-Ze Fan et.al. | 2402.12219v1 | link |
2024-02-19 | ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning | Renqiu Xia et.al. | 2402.12185v1 | link |
2024-02-19 | Distilling Large Language Models for Text-Attributed Graph Learning | Bo Pan et.al. | 2402.12022v1 | null |
2024-02-25 | How Interpretable are Reasoning Explanations from Prompting Large Language Models? | Wei Jie Yeo et.al. | 2402.11863v2 | link |
2024-02-22 | ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | Fengqing Jiang et.al. | 2402.11753v2 | null |
2024-02-18 | A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models | Jaylen Jones et.al. | 2402.11676v1 | link |
2024-02-18 | Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals | Francesco Ortu et.al. | 2402.11655v1 | link |
2024-02-17 | TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks | Benjamin Feuer et.al. | 2402.11137v1 | link |
2024-02-09 | Zero-shot Explainable Mental Health Analysis on Social Media by incorporating Mental Scales | Wenyu Li et.al. | 2402.10948v1 | null |
2024-02-16 | How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs? | Ehsan Doostmohammadi et.al. | 2402.10770v1 | null |
2024-02-16 | Inference to the Best Explanation in Large Language Models | Dhairya Dalal et.al. | 2402.10767v1 | null |
2024-02-16 | Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability | Haiyan Zhao et.al. | 2402.10688v1 | null |
2024-02-16 | LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | Minsuk Kahng et.al. | 2402.10524v1 | null |
2024-02-15 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Shubham Toshniwal et.al. | 2402.10176v1 | link |
2024-02-15 | Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States | Hanyu Duan et.al. | 2402.09733v1 | null |
2024-02-15 | Answer is All You Need: Instruction-following Text Embedding via Answering the Question | Letian Peng et.al. | 2402.09642v1 | link |
2024-02-14 | Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems | Liang Zhang et.al. | 2402.09584v1 | null |
2024-02-14 | AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach | Maryam Amirizaniani et.al. | 2402.09334v1 | null |
2024-02-14 | Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code | Vahid Majdinasab et.al. | 2402.09299v1 | null |
2024-02-14 | SyntaxShap: Syntax-aware Explainability Method for Text Generation | Kenza Amara et.al. | 2402.09259v1 | null |
2024-02-14 | Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models | Goutham Rajendran et.al. | 2402.09236v1 | null |
2024-02-13 | Large Language Models for the Automated Analysis of Optimization Algorithms | Camilo Chacón Sartori et.al. | 2402.08472v1 | link |
2024-02-13 | Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | Jusung Lee et.al. | 2402.08360v1 | null |
2024-02-17 | LLaGA: Large Language and Graph Assistant | Runjin Chen et.al. | 2402.08170v2 | link |
2024-02-25 | Policy Improvement using Language Feedback Models | Victor Zhong et.al. | 2402.07876v3 | null |
2024-02-12 | Game Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and Behavior Branch | Ray Ito et.al. | 2402.07442v1 | null |
2024-02-14 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2402.07157v2 | null |
2024-02-09 | InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Huaiyuan Ying et.al. | 2402.06332v1 | link |
2024-02-09 | ContPhy: Continuum Physical Concept Learning and Reasoning from Videos | Zhicheng Zheng et.al. | 2402.06119v1 | null |
2024-02-02 | Character-based Outfit Generation with Vision-augmented Style Extraction via LLMs | Najmeh Forouzandehmehr et.al. | 2402.05941v1 | null |
2024-02-08 | Driving Everywhere with Large Language Model Policy Adaptation | Boyi Li et.al. | 2402.05932v1 | null |
2024-02-05 | Zero-Shot Clinical Trial Patient Matching with LLMs | Michael Wornow et.al. | 2402.05125v1 | null |
2024-02-07 | Opening the AI black box: program synthesis via mechanistic interpretability | Eric J. Michaud et.al. | 2402.05110v1 | link |
2024-02-07 | Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach | Zhuang Li et.al. | 2402.04609v1 | null |
2024-02-06 | Chatbot Meets Pipeline: Augment Large Language Model with Definite Finite Automaton | Yiyou Sun et.al. | 2402.04411v1 | null |
2024-02-06 | Assured LLM-Based Software Engineering | Nadia Alshahwan et.al. | 2402.04380v1 | null |
2024-02-06 | Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models | David Sobrín-Hidalgo et.al. | 2402.04206v1 | null |
2024-02-06 | SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models | Yichen Shi et.al. | 2402.04178v1 | link |
2024-02-06 | Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science | Pengfei Liu et.al. | 2402.04119v1 | link |
2024-02-07 | Position Paper: Against Spurious Sparks | Patrick Altmeyer et.al. | 2402.03962v2 | null |
2024-02-06 | Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience | Xilin Jiang et.al. | 2402.03710v1 | null |
2024-02-27 | Distinguishing the Knowable from the Unknowable with Language Models | Gustaf Ahdritz et.al. | 2402.03563v2 | link |
2024-01-25 | When Geoscience Meets Generative AI and Large Language Models: Foundations, Trends, and Future Challenges | Abdenour Hadid et.al. | 2402.03349v1 | null |
2024-03-04 | English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts | Patrick Barreiß et.al. | 2402.03223v2 | null |
2024-02-22 | PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? | Chinmay Mittal et.al. | 2402.02611v2 | null |
2024-02-04 | Integration of cognitive tasks into artificial general intelligence test for large models | Youzhi Qu et.al. | 2402.02547v1 | null |
2024-02-03 | A Data Generation Perspective to the Mechanism of In-Context Learning | Haitao Mao et.al. | 2402.02212v1 | null |
2024-02-03 | Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Luca Podo et.al. | 2402.02167v1 | link |
2024-02-13 | PresAIse, A Prescriptive AI Solution for Enterprises | Wei Sun et.al. | 2402.02006v2 | null |
2024-02-02 | The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning | Daniel Cunnington et.al. | 2402.01889v1 | null |
2024-02-06 | Large Language Model Agent for Hyper-Parameter Optimization | Siyi Liu et.al. | 2402.01881v2 | null |
2024-02-02 | The Political Preferences of LLMs | David Rozado et.al. | 2402.01789v1 | null |
2024-01-30 | Rethinking Interpretability in the Era of Large Language Models | Chandan Singh et.al. | 2402.01761v1 | link |
2024-01-29 | Compensatory Biases Under Cognitive Load: Reducing Selection Bias in Large Language Models | J. E. Eicher et.al. | 2402.01740v1 | null |
2024-01-25 | ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment Analysis | Alessio Buscemi et.al. | 2402.01715v1 | null |
2024-01-23 | Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study | Zhe He et.al. | 2402.01693v1 | null |
2024-02-16 | Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications | Yuhang Zhou et.al. | 2402.01681v2 | null |
2024-02-02 | BAT: Learning to Reason about Spatial Sounds with Large Language Models | Zhisheng Zheng et.al. | 2402.01591v1 | null |
2024-02-02 | From Words to Molecules: A Survey of Large Language Models in Chemistry | Chang Liao et.al. | 2402.01439v1 | null |
2024-02-02 | Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis | Zeeshan Rasheed et.al. | 2402.01386v1 | null |
2024-02-02 | Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions | Pouya Pezeshkpour et.al. | 2402.01108v1 | null |
2024-02-01 | Executable Code Actions Elicit Better LLM Agents | Xingyao Wang et.al. | 2402.01030v1 | link |
2024-02-01 | Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement | Xin Quan et.al. | 2402.00745v1 | link |
2024-02-01 | Transforming and Combining Rewards for Aligning Large Language Models | Zihao Wang et.al. | 2402.00742v1 | null |
2024-02-01 | AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs | Wenji Fang et.al. | 2402.00386v1 | null |
2024-02-01 | IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators | Luyang Lin et.al. | 2402.00345v1 | null |
2024-02-01 | Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning | Yao-Hung Hubert Tsai et.al. | 2402.00251v1 | null |
2024-01-31 | Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT | Diego Machado Reyes et.al. | 2402.00137v1 | null |
2024-01-31 | ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation | Bhabesh Mali et.al. | 2402.00093v1 | null |
2024-02-07 | Detecting Multimedia Generated by Large AI Models: A Survey | Li Lin et.al. | 2402.00045v3 | link |
2024-01-21 | Training microrobots to swim by a large language model | Zhuoqun Xu et.al. | 2402.00044v1 | null |
2024-02-05 | Comparative Analysis of LLaMA and ChatGPT Embeddings for Molecule Embedding | Shaghayegh Sadeghi et.al. | 2402.00024v2 | link |
2024-02-03 | EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation | Jonathan W. Kim et.al. | 2401.18006v2 | null |
2024-01-31 | Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study | Qirui Jiao et.al. | 2401.17981v1 | null |
2024-01-31 | Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction | Philipp Wicke et.al. | 2401.17858v1 | null |
2024-01-30 | Detecting mental disorder on social media: a ChatGPT-augmented explainable approach | Loris Belcastro et.al. | 2401.17477v1 | link |
2024-02-05 | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Wei Zhang et.al. | 2401.16822v2 | null |
2024-01-30 | A Cross-Language Investigation into Jailbreak Attacks in Large Language Models | Jie Li et.al. | 2401.16765v1 | null |
2024-02-03 | Engineering A Large Language Model From Scratch | Abiodun Finbarrs Oketunji et.al. | 2401.16736v3 | null |
2024-01-29 | Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures | Michael Hersche et.al. | 2401.16024v1 | link |
2024-01-29 | APIGen: Generative API Method Recommendation | Yujia Chen et.al. | 2401.15843v1 | link |
2024-02-12 | Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks | Zackary Okun Dunivin et.al. | 2401.15170v2 | null |
2024-01-26 | Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias | Yu He Ke et.al. | 2401.14589v1 | null |
2024-01-25 | LongHealth: A Question Answering Benchmark with Long Clinical Documents | Lisa Adams et.al. | 2401.14490v1 | link |
2024-01-25 | GPTVoiceTasker: LLM-Powered Virtual Assistant for Smartphone | Minh Duc Vu et.al. | 2401.14268v1 | null |
2024-01-25 | CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks | Andrei Tomut et.al. | 2401.14109v1 | null |
2024-01-25 | A Survey of Deep Learning and Foundation Models for Time Series Forecasting | John A. Miller et.al. | 2401.13912v1 | null |
2024-01-24 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Chang Ma et.al. | 2401.13178v1 | link |
2024-01-23 | From Understanding to Utilization: A Survey on Explainability for Large Language Models | Haoyan Luo et.al. | 2401.12874v1 | null |
2024-01-23 | How well can large language models explain business processes? | Dirk Fahland et.al. | 2401.12846v1 | null |
2024-01-27 | C2Ideas: Supporting Creative Interior Color Design Ideation with Large Language Model | Yihan Hou et.al. | 2401.12586v2 | null |
2024-01-30 | SLANG: New Concept Comprehension of Large Language Models | Lingrui Mei et.al. | 2401.12585v2 | null |
2024-01-23 | LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools | Qianli Wang et.al. | 2401.12576v1 | link |
2024-01-23 | Automated Fact-Checking of Climate Change Claims with Large Language Models | Markus Leippold et.al. | 2401.12566v1 | null |
2024-01-22 | CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation | Zhihong Chen et.al. | 2401.12208v1 | null |
2024-01-21 | Integration of Large Language Models in Control of EHD Pumps for Precise Color Synthesis | Yanhong Peng et.al. | 2401.11500v1 | null |
2024-01-18 | LangProp: A code optimization framework using Language Models applied to driving | Shu Ishida et.al. | 2401.10314v1 | link |
2024-01-18 | Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Kohei Uehara et.al. | 2401.10005v1 | null |
2024-01-18 | Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models | Li Sun et.al. | 2401.09861v1 | null |
2024-01-17 | Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Haonan Guo et.al. | 2401.09083v1 | link |
2024-01-17 | What makes for a 'good' social actor? Using respect as a lens to evaluate interactions with language agents | Lize Alberts et.al. | 2401.09082v1 | null |
2024-01-16 | AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media | Alessandro Gambetti et.al. | 2401.08825v1 | null |
2024-01-15 | Assistant, Parrot, or Colonizing Loudspeaker? ChatGPT Metaphors for Developing Critical AI Literacies | Anuj Gupta et.al. | 2401.08711v1 | null |
2024-01-16 | Anchor function: a type of benchmark functions for studying language models | Zhongwang Zhang et.al. | 2401.08309v1 | null |
2024-01-16 | AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception | Yipo Huang et.al. | 2401.08276v1 | link |
2024-01-16 | LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation | Zhixuan Chu et.al. | 2401.08217v1 | null |
2024-02-16 | MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Minpeng Liao et.al. | 2401.08190v2 | link |
2024-02-15 | Are self-explanations from Large Language Models faithful? | Andreas Madsen et.al. | 2401.07927v3 | link |
2024-01-17 | See the Unseen: Better Context-Consistent Knowledge-Editing by Noises | Youcheng Huang et.al. | 2401.07544v2 | null |
2024-01-12 | Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data | Yubin Kim et.al. | 2401.06866v1 | null |
2024-01-12 | Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought | Zaijing Li et.al. | 2401.06836v1 | null |
2024-01-12 | From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape | Changrong Xiao et.al. | 2401.06431v1 | link |
2024-01-23 | How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | Yi Zeng et.al. | 2401.06373v2 | link |
2024-01-12 | Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models | Asma Ghandeharioun et.al. | 2401.06102v2 | null |
2024-01-11 | Large Language Models vs. Search Engines: Evaluating User Preferences Across Varied Information Retrieval Scenarios | Kevin Matthe Caramancion et.al. | 2401.05761v1 | null |
2024-01-11 | Towards Conversational Diagnostic AI | Tao Tu et.al. | 2401.05654v1 | null |
2024-01-17 | Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion? | Mudit Verma et.al. | 2401.05302v2 | null |
2024-01-10 | Aligning Translation-Specific Understanding to General Understanding in Large Language Models | Yichong Huang et.al. | 2401.05072v1 | null |
2024-01-10 | ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain | Bingchao Wang et.al. | 2401.04898v1 | null |
2024-01-08 | Evaluating Brain-Inspired Modular Training in Automated Circuit Discovery for Mechanistic Interpretability | Jatin Nainani et.al. | 2401.03646v1 | null |
2024-01-05 | UMIE: Unified Multimodal Information Extraction with Instruction Tuning | Lin Sun et.al. | 2401.03082v1 | link |
2024-02-01 | Object-Centric Instruction Augmentation for Robotic Manipulation | Junjie Wen et.al. | 2401.02814v2 | null |
2024-02-06 | VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model | Pengying Wu et.al. | 2401.02695v2 | null |
2024-01-05 | Correctness Comparison of ChatGPT-4, Bard, Claude-2, and Copilot for Spatial Tasks | Hartwig H. Hochmair et.al. | 2401.02404v2 | null |
2024-01-04 | DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models | Wendi Cui et.al. | 2401.02132v1 | link |
2024-01-03 | Large Language Models Relearn Removed Concepts | Michelle Lo et.al. | 2401.01814v1 | link |
2024-01-12 | WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope | Jun-Yan He et.al. | 2401.01699v2 | null |
2024-01-02 | VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics | Ammar A. Siddiqui et.al. | 2401.01414v1 | null |
2024-01-02 | A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models | Daniel Wankit Yip et.al. | 2401.00991v1 | null |
2023-12-31 | AllSpark: a multimodal spatiotemporal general model | Run Shao et.al. | 2401.00546v1 | null |
2023-12-31 | keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM | Chaojie Wang et.al. | 2401.00426v1 | null |
2024-01-12 | Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation | Reza Fayyazi et.al. | 2401.00280v2 | null |
2023-12-30 | Is Knowledge All Large Language Models Needed for Causal Reasoning? | Hengrui Cai et.al. | 2401.00139v1 | link |
2023-12-27 | Conversational Question Answering with Reformulations over Knowledge Graph | Lihui Liu et.al. | 2312.17269v1 | null |
2023-12-29 | Large Language Model for Causal Decision Making | Haitao Jiang et.al. | 2312.17122v2 | null |
2023-12-27 | Rethinking Tabular Data Understanding with Large Language Models | Tianyang Liu et.al. | 2312.16702v1 | link |
2023-12-26 | Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers | Jacob Dunefsky et.al. | 2312.16291v1 | link |
2023-12-26 | Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models | Fan Liu et.al. | 2312.16275v1 | null |
2023-12-26 | Large Language Models as Traffic Signal Control Agents: Capacity and Opportunity | Siqi Lai et.al. | 2312.16044v1 | link |
2024-01-29 | ChartBench: A Benchmark for Complex Visual Reasoning in Charts | Zhengzhuo Xu et.al. | 2312.15915v2 | null |
2023-12-26 | Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models | Xinke Jiang et.al. | 2312.15883v1 | null |
2023-12-22 | Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention | Zhen Tan et.al. | 2312.15033v1 | null |
2023-12-22 | Theory of Hallucinations based on Equivariance | Hisaichi Shibata et.al. | 2312.14504v1 | null |
2023-12-22 | Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models | Priyesh Vakharia et.al. | 2312.14346v1 | null |
2023-12-19 | Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning | Xiaodan Zhang et.al. | 2312.14184v1 | null |
2023-12-21 | Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs | Juraj Vladika et.al. | 2312.13881v1 | null |
2023-12-21 | A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties | Junfei Xiao et.al. | 2312.13764v1 | link |
2023-12-20 | ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training | Rongsheng Wang et.al. | 2312.13316v1 | link |
2023-12-21 | AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion | Beibei Jing et.al. | 2312.12763v2 | null |
2023-12-21 | A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges | Roberto Francisco de Lima Junior et.al. | 2312.12598v2 | null |
2024-01-30 | Locating Factual Knowledge in Large Language Models: Exploring the Residual Stream and Analyzing Subvalues in Vocabulary Space | Zeping Yu et.al. | 2312.12141v2 | null |
2023-12-19 | Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach | Weiyu Ma et.al. | 2312.11865v1 | link |
2023-12-16 | Learning Interpretable Queries for Explainable Image Classification with Information Pursuit | Stefan Kolek et.al. | 2312.11548v1 | null |
2023-12-22 | A mathematical perspective on Transformers | Borjan Geshkovski et.al. | 2312.10794v2 | link |
2023-12-17 | kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning | Wenting Zhao et.al. | 2312.10771v1 | null |
2023-12-17 | Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier | Sergey A. Saltykov et.al. | 2312.10746v1 | null |
2023-12-17 | Can persistent homology whiten Transformer-based black-box models? A case study on BERT compression | Luis Balderas et.al. | 2312.10702v1 | null |
2023-12-16 | Continuous Prompt Generation from Linear Combination of Discrete Prompt Embeddings | Pascal Passigan et.al. | 2312.10323v1 | null |
2023-12-23 | Shedding Light on Software Engineering-specific Metaphors and Idioms | Mia Mohammad Imran et.al. | 2312.10297v2 | link |
2023-12-15 | A Review of Repository Level Prompting for LLMs | Douglas Schonholtz et.al. | 2312.10101v1 | null |
2023-12-04 | Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work | Rishab Jain et.al. | 2312.10057v1 | null |
2023-12-15 | Neurosymbolic Value-Inspired AI (Why, What, and How) | Amit Sheth et.al. | 2312.09928v1 | null |
2023-12-15 | GPT-4 Surpassing Human Performance in Linguistic Pragmatics | Ljubisa Bojic et.al. | 2312.09545v1 | null |
2023-12-14 | Large Language Models for Autonomous Driving: Real-World Experiments | Can Cui et.al. | 2312.09397v1 | null |
2023-12-14 | Successor Heads: Recurring, Interpretable Attention Heads In The Wild | Rhys Gould et.al. | 2312.09230v1 | null |
2023-12-14 | Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Zhiyuan You et.al. | 2312.08962v1 | null |
2023-12-14 | Learning Safety Constraints From Demonstration Using One-Class Decision Trees | Mattijs Baert et.al. | 2312.08837v1 | null |
2023-12-13 | Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning | Jinta Weng et.al. | 2312.08027v1 | null |
2023-12-07 | Large Language Models for Intent-Driven Session Recommendations | Zhu Sun et.al. | 2312.07552v1 | link |
2023-12-12 | Efficiently Programming Large Language Models using SGLang | Lianmin Zheng et.al. | 2312.07104v1 | link |
2023-12-12 | Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation | Nikhil Kashyap et.al. | 2312.06965v1 | null |
2023-12-27 | Steering Llama 2 via Contrastive Activation Addition | Nina Rimsky et.al. | 2312.06681v2 | link |
2023-12-11 | AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes | Zehao Wen et.al. | 2312.06644v1 | null |
2023-12-11 | DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics | Zhiao Huang et.al. | 2312.06408v1 | null |
2023-12-11 | GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models | Jiaxu Zhao et.al. | 2312.06315v1 | null |
2023-12-11 | ProtoCode: Leveraging Large Language Models for Automated Generation of Machine-Readable Protocols from Scientific Publications | Shuo Jiang et.al. | 2312.06241v1 | null |
2023-12-10 | Evidence-based Interpretable Open-domain Fact-checking with Large Language Models | Xin Tan et.al. | 2312.05834v1 | null |
2023-12-19 | Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Subhabrata Dutta et.al. | 2312.05571v2 | link |
2023-12-09 | Image and Data Mining in Reticular Chemistry Using GPT-4V | Zhiling Zheng et.al. | 2312.05468v1 | null |
2023-12-09 | Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation | Atoosa Chegini et.al. | 2312.05464v1 | null |
2023-12-08 | GlitchBench: Can large multimodal models detect video game glitches? | Mohammad Reza Taesiri et.al. | 2312.05291v1 | null |
2023-12-08 | Retrieval-based Video Language Model for Efficient Long Video Question Answering | Jiaqi Xu et.al. | 2312.04931v1 | null |
2023-12-08 | Ophtha-LLaMA2: A Large Language Model for Ophthalmology | Huan Zhao et.al. | 2312.04906v1 | null |
2024-01-10 | KwaiAgents: Generalized Information-seeking Agent System with Large Language Models | Haojie Pan et.al. | 2312.04889v3 | link |
2023-12-07 | AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making | Shusen Liu et.al. | 2312.04494v1 | null |
2023-12-07 | LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs | Yunsheng Ma et.al. | 2312.04372v1 | null |
2023-12-27 | Towards Knowledge-driven Autonomous Driving | Xin Li et.al. | 2312.04316v3 | link |
2023-12-07 | Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models | Yijie Zhang et.al. | 2312.04019v1 | null |
2023-12-05 | How should the advent of large language models affect the practice of science? | Marcel Binz et.al. | 2312.03759v1 | null |
2023-12-04 | Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models | Chenguang Wang et.al. | 2312.03755v1 | null |
2023-12-08 | Methods to Estimate Large Language Model Confidence | Maia Kotelanski et.al. | 2312.03733v2 | null |
2023-12-06 | GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models | Haicheng Liao et.al. | 2312.03543v1 | link |
2023-12-05 | FlexModel: A Framework for Interpretability of Distributed Large Language Models | Matthew Choi et.al. | 2312.03140v1 | link |
2023-12-07 | Evaluating Agents using Social Choice Theory | Marc Lanctot et.al. | 2312.03121v2 | link |
2023-12-05 | Breast Ultrasound Report Generation using LangChain | Jaeyoung Huh et.al. | 2312.03013v1 | null |
2023-12-05 | Harmonizing Global Voices: Culturally-Aware Models for Enhanced Content Moderation | Alex J. Chan et.al. | 2312.02401v1 | null |
2023-12-04 | LLMs Accelerate Annotation for Medical Information Extraction | Akshay Goel et.al. | 2312.02296v1 | null |
2023-12-04 | Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition | Chengyou Jia et.al. | 2312.02226v1 | null |
2023-11-28 | Training Chain-of-Thought via Latent-Variable Inference | Du Phan et.al. | 2312.02179v1 | null |
2023-12-04 | Learning Machine Morality through Experience and Interaction | Elizaveta Tennant et.al. | 2312.01818v1 | null |
2023-12-26 | Jellyfish: A Large Language Model for Data Preprocessing | Haochen Zhang et.al. | 2312.01678v3 | null |
2023-12-11 | Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation | Randall Balestriero et.al. | 2312.01648v2 | link |
2023-12-04 | The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning | Bill Yuchen Lin et.al. | 2312.01552v1 | null |
2023-12-03 | SAGE: Bridging Semantic and Actionable Parts for GEneralizable Articulated-Object Manipulation under Language Instructions | Haoran Geng et.al. | 2312.01307v1 | null |
2023-12-03 | TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents | James Enouen et.al. | 2312.01279v1 | null |
2023-12-02 | From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews | Alex Liu et.al. | 2312.01202v1 | null |
2023-12-01 | Leveraging Large Language Models to Improve REST API Testing | Myeongsoo Kim et.al. | 2312.00894v1 | null |
2023-12-18 | Empowering Autonomous Driving with Large Language Models: A Safety Perspective | Yixuan Wang et.al. | 2312.00812v3 | null |
2023-11-30 | Towards Accurate Differential Diagnosis with Large Language Models | Daniel McDuff et.al. | 2312.00164v1 | null |
2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836v1 | null |
2023-11-30 | CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | Zineng Tang et.al. | 2311.18775v1 | null |
2023-12-05 | AlignBench: Benchmarking Chinese Alignment of Large Language Models | Xiao Liu et.al. | 2311.18743v3 | link |
2023-11-30 | Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent | Yuxiao Chen et.al. | 2311.18307v1 | null |
2023-11-29 | Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings | Andrea W Wen-Yi et.al. | 2311.18034v1 | link |
2023-11-28 | Unlocking Spatial Comprehension in Text-to-Image Diffusion Models | Mohammad Mahdi Derakhshani et.al. | 2311.17937v1 | null |
2023-11-29 | VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following | Yujie Lu et.al. | 2311.17647v1 | null |
2023-11-29 | Exploring Large Language Models for Human Mobility Prediction under Public Events | Yuebing Liang et.al. | 2311.17351v1 | null |
2023-11-29 | Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering | Zeqing Wang et.al. | 2311.17331v1 | null |
2023-11-28 | Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis | Xiaohui Chen et.al. | 2311.17126v1 | null |
2023-11-30 | Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following | Yutong Feng et.al. | 2311.17002v2 | null |
2023-12-27 | StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models | Kazuki Yamauchi et.al. | 2311.16509v2 | null |
2023-12-10 | LLMGA: Multimodal Large Language Model based Generation Assistant | Bin Xia et.al. | 2311.16500v2 | link |
2023-11-27 | ChartLlama: A Multimodal LLM for Chart Understanding and Generation | Yucheng Han et.al. | 2311.16483v1 | null |
2023-11-27 | Have we built machines that think like people? | Luca M. Schulze Buschoff et.al. | 2311.16093v1 | link |
2023-11-27 | Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models | Stephen MacNeil et.al. | 2311.16017v1 | null |
2023-11-27 | Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers | Yilun Liu et.al. | 2311.15983v1 | link |
2023-11-27 | Dawning of a New Era in Gravitational Wave Data Analysis: Unveiling Cosmic Mysteries via Artificial Intelligence -- A Systematic Review | Tianyu Zhao et.al. | 2311.15585v1 | null |
2023-12-03 | See and Think: Embodied Agent in Virtual Environment | Zhonghan Zhao et.al. | 2311.15209v2 | null |
2023-11-25 | Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching | James Campbell et.al. | 2311.15131v1 | null |
2023-11-19 | Zero-Shot Question Answering over Financial Documents using Large Language Models | Karmvir Singh Phogat et.al. | 2311.14722v1 | null |
2023-11-24 | Benchmarking Large Language Models for Log Analysis, Security, and Interpretation | Egil Karlsen et.al. | 2311.14519v1 | null |
2023-11-30 | A density estimation perspective on learning from pairwise human preferences | Vincent Dumoulin et.al. | 2311.14115v2 | link |
2023-11-23 | Towards Explainable Strategy Templates using NLP Transformers | Pallavi Bagga et.al. | 2311.14061v1 | null |
2023-11-23 | Challenges of Large Language Models for Mental Health Counseling | Neo Christopher Chung et.al. | 2311.13857v1 | null |
2023-12-03 | FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design | Yangyang Yu et.al. | 2311.13743v2 | link |
2023-11-22 | Vamos: Versatile Action Models for Video Understanding | Shijie Wang et.al. | 2311.13627v1 | null |
2023-11-22 | ADriver-I: A General World Model for Autonomous Driving | Fan Jia et.al. | 2311.13549v1 | null |
2023-12-15 | Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs | Yonghui Wang et.al. | 2311.13194v2 | link |
2023-11-25 | From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models | Zachary Englhardt et.al. | 2311.13063v2 | null |
2023-11-21 | ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models | Jiankai Tang et.al. | 2311.12524v1 | link |
2023-11-21 | Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications | Samira Ghodratnama et.al. | 2311.12287v1 | null |
2023-11-20 | Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents | Zhuosheng Zhang et.al. | 2311.11797v1 | link |
2023-11-20 | Incorporating LLM Priors into Tabular Learners | Max Zhu et.al. | 2311.11628v1 | null |
2023-11-20 | GPT in Data Science: A Practical Exploration of Model Selection | Nathalia Nascimento et.al. | 2311.11516v1 | null |
2023-11-20 | Meta Prompting for AGI Systems | Yifan Zhang et.al. | 2311.11482v1 | link |
2023-12-17 | Rethinking Large Language Models in Mental Health Applications | Shaoxiong Ji et.al. | 2311.11267v2 | null |
2023-11-18 | Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models | Haoran Zhao et.al. | 2311.11012v1 | null |
2023-11-18 | RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability | Yuxuan Lei et.al. | 2311.10947v1 | null |
2023-11-17 | Flexible Model Interpretability through Natural Language Model Editing | Karel D'Oosterlinck et.al. | 2311.10905v1 | null |
2023-11-27 | A Language Agent for Autonomous Driving | Jiageng Mao et.al. | 2311.10813v3 | link |
2023-11-15 | MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning | Fuxiao Liu et.al. | 2311.10774v1 | link |
2023-11-16 | MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | Xiangru Tang et.al. | 2311.10537v1 | link |
2023-11-16 | Interpreting User Requests in the Context of Natural Language Standing Instructions | Nikita Moghe et.al. | 2311.09796v1 | null |
2023-11-16 | On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering | Linyong Nan et.al. | 2311.09721v1 | null |
2023-11-16 | Evaluating In-Context Learning of Libraries for Code Generation | Arkil Patel et.al. | 2311.09635v1 | null |
2023-11-16 | Efficient End-to-End Visual Document Understanding with Rationale Distillation | Wang Zhu et.al. | 2311.09612v1 | null |
2023-11-16 | Pachinko: Patching Interpretable QA Models through Natural Language Feedback | Chaitanya Malaviya et.al. | 2311.09558v1 | link |
2023-11-09 | Chain of Images for Intuitively Reasoning | Fanxu Meng et.al. | 2311.09241v1 | link |
2023-11-15 | TableLlama: Towards Open Large Generalist Models for Tables | Tianshu Zhang et.al. | 2311.09206v1 | null |
2023-11-15 | MELA: Multilingual Evaluation of Linguistic Acceptability | Ziyin Zhang et.al. | 2311.09033v1 | null |
2023-11-15 | Identifying Linear Relational Concepts in Large Language Models | David Chanin et.al. | 2311.08968v1 | null |
2023-11-15 | I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots | Giulio Antonio Abbo et.al. | 2311.08957v1 | null |
2023-11-15 | HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence | Junyi Bian et.al. | 2311.08896v1 | null |
2023-11-15 | Token Prediction as Implicit Classification to Identify LLM-Generated Text | Yutian Chen et.al. | 2311.08723v1 | link |
2023-11-15 | Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling | Bairu Hou et.al. | 2311.08718v1 | link |
2023-11-15 | XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making | Zichen Chen et.al. | 2311.08614v1 | null |
2023-11-15 | Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures | David F. Jenny et.al. | 2311.08605v1 | link |
2023-11-14 | Towards Evaluating AI Systems for Moral Status Using Self-Reports | Ethan Perez et.al. | 2311.08576v1 | null |
2023-11-14 | Taxonomy, Semantic Data Schema, and Schema Alignment for Open Data in Urban Building Energy Modeling | Liang Zhang et.al. | 2311.08535v1 | null |
2023-11-14 | Plum: Prompt Learning using Metaheuristic | Rui Pan et.al. | 2311.08364v1 | link |
2023-11-14 | Human-Centric Autonomous Systems With LLMs for User Command Reasoning | Yi Yang et.al. | 2311.08206v1 | link |
2023-11-11 | Conceptual Model Interpreter for Large Language Models | Felix Härer et.al. | 2311.07605v1 | link |
2023-11-13 | It's Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models | Nishant Balepur et.al. | 2311.07532v1 | link |
2023-11-13 | Finding and Editing Multi-Modal Neurons in Pre-Trained Transformer | Haowen Pan et.al. | 2311.07470v1 | null |
2023-11-13 | On Measuring Faithfulness of Natural Language Explanations | Letitia Parcalabescu et.al. | 2311.07466v1 | link |
2023-11-13 | Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models | Junpeng Li et.al. | 2311.07314v1 | null |
2023-11-12 | Assessing the Interpretability of Programmatic Policies with Large Language Models | Zahra Bashir et.al. | 2311.06979v1 | null |
2023-11-12 | Simulating Public Administration Crisis: A Novel Generative Agent-Based Simulation System to Lower Technology Barriers in Social Science Research | Bushi Xiao et.al. | 2311.06957v1 | null |
2023-11-10 | ChatGPT in the context of precision agriculture data analytics | Ilyas Potamitis et.al. | 2311.06390v1 | link |
2023-11-09 | Deep Natural Language Feature Learning for Interpretable Prediction | Felipe Urrutia et.al. | 2311.05754v1 | link |
2023-11-09 | Do personality tests generalize to Large Language Models? | Florian E. Dorner et.al. | 2311.05297v1 | null |
2023-11-02 | Chain of Empathy: Enhancing Empathetic Response of Large Language Models Based on Psychotherapy Models | Yoon Kyung Lee et.al. | 2311.04915v1 | null |
2023-11-08 | SEMQA: Semi-Extractive Multi-Source Question Answering | Tal Schuster et.al. | 2311.04886v1 | link |
2023-11-07 | Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning | Sai Munikoti et.al. | 2311.04348v1 | null |
2023-11-07 | Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves | Yihe Deng et.al. | 2311.04205v1 | link |
2023-11-07 | Perturbed examples reveal invariances shared by language models | Ruchit Rawal et.al. | 2311.04166v1 | null |
2023-11-07 | Extracting human interpretable structure-property relationships in chemistry using XAI and large language models | Geemi P. Wellawatte et.al. | 2311.04047v1 | link |
2023-11-07 | Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models | Yichao Cao et.al. | 2311.03799v1 | link |
2023-11-07 | Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning | Ruosen Li et.al. | 2311.03734v1 | link |
2023-11-07 | The Linear Representation Hypothesis and the Geometry of Large Language Models | Kiho Park et.al. | 2311.03658v1 | link |
2023-11-06 | Beyond Words: A Mathematical Framework for Interpreting Large Language Models | Javier González et.al. | 2311.03033v1 | null |
2023-11-06 | QualEval: Qualitative Evaluation for Model Improvement | Vishvak Murahari et.al. | 2311.02807v1 | link |
2023-11-03 | Don't Make Your LLM an Evaluation Benchmark Cheater | Kun Zhou et.al. | 2311.01964v1 | null |
2023-11-06 | Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT | Mario Sänger et.al. | 2311.01825v2 | null |
2023-11-12 | Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models | Sean Xie et.al. | 2311.01732v2 | link |
2023-11-02 | TopicGPT: A Prompt-based Topic Modeling Framework | Chau Minh Pham et.al. | 2311.01449v1 | link |
2023-11-02 | REAL: Resilience and Adaptation using Large Language Models on Autonomous Aerial Robots | Andrea Tagliabue et.al. | 2311.01403v1 | null |
2023-11-02 | Revisiting the Knowledge Injection Frameworks | Peng Fu et.al. | 2311.01150v1 | null |
2023-11-02 | Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game | Sam Toyer et.al. | 2311.01011v1 | null |
2023-11-02 | Vision-Language Interpreter for Robot Task Planning | Keisuke Shirai et.al. | 2311.00967v1 | link |
2023-11-02 | M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place | Wentao Yuan et.al. | 2311.00926v1 | null |
2023-11-01 | Emotion Detection for Misinformation: A Review | Zhiwei Liu et.al. | 2311.00671v1 | null |
2023-11-01 | De-Diffusion Makes Text a Strong Cross-Modal Interface | Chen Wei et.al. | 2311.00618v1 | null |
2023-11-01 | The Mystery and Fascination of LLMs: A Comprehensive Survey on the Interpretation and Analysis of Emergent Abilities | Yuxiang Zhou et.al. | 2311.00237v1 | null |
2023-11-01 | Is GPT Powerful Enough to Analyze the Emotions of Memes? | Jingjing Wang et.al. | 2311.00223v1 | null |
2023-10-31 | Large Language Model Can Interpret Latent Space of Sequential Recommender | Zhengyi Yang et.al. | 2310.20487v1 | link |
2023-10-31 | The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models | Jorge Abreu-Vicente et.al. | 2310.20440v1 | link |
2023-10-30 | Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design | Markus J. Buehler et.al. | 2310.19998v1 | null |
2023-10-30 | GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models | Seongwon Kim et.al. | 2310.19915v1 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404v1 | null |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349v1 | null |
2024-07-24 | LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Zijian Wu et.al. | 2407.17227v1 | null |
2024-07-24 | Fusing LLMs and KGs for Formal Causal Reasoning behind Financial Risk Contagion | Guanyuan Yu et.al. | 2407.17190v1 | null |
2024-07-24 | Reinforced Prompt Personalization for Recommendation with Large Language Models | Wenyu Mao et.al. | 2407.17115v1 | link |
2024-07-24 | A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs | Jake R. Watts et.al. | 2407.16994v1 | null |
2024-07-24 | ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering | Xiuying Chen et.al. | 2407.16931v1 | null |
2024-07-23 | CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs | Jihyung Kil et.al. | 2407.16837v1 | link |
2024-07-23 | PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment | Jiahuan Li et.al. | 2407.16222v1 | null |
2024-07-23 | Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models | Shi Lin et.al. | 2407.16205v1 | null |
2024-07-23 | UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Liu Qi et.al. | 2407.16160v1 | null |
2024-07-22 | Enhancing Temporal Understanding in LLMs for Semi-structured Tables | Irwin Deng et.al. | 2407.16030v1 | null |
2024-07-22 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu et.al. | 2407.15720v1 | link |
2024-07-22 | CrashEventLLM: Predicting System Crashes with Large Language Models | Priyanka Mudgal et.al. | 2407.15716v1 | null |
2024-07-22 | HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Zhecan Wang et.al. | 2407.15680v1 | null |
2024-07-22 | Dissecting Multiplication in Transformers: Insights into LLMs | Luyu Qiu et.al. | 2407.15360v1 | null |
2024-07-21 | Evidence-Based Temporal Fact Verification | Anab Maulana Barik et.al. | 2407.15291v1 | null |
2024-07-21 | MIBench: Evaluating Multimodal Large Language Models over Multiple Images | Haowei Liu et.al. | 2407.15272v1 | null |
2024-07-21 | Multi-Agent Causal Discovery Using Large Language Models | Hao Duong Le et.al. | 2407.15073v1 | null |
2024-07-22 | Knowledge Mechanisms in Large Language Models: A Survey and Perspective | Mengru Wang et.al. | 2407.15017v1 | null |
2024-07-20 | Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Antonis Antoniades et.al. | 2407.14985v1 | null |
2024-07-20 | TraveLLM: Could you plan my new public transit route in face of a network disruption? | Bowen Fang et.al. | 2407.14926v1 | null |
2024-07-20 | Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models | Ze Yu Zhang et.al. | 2407.14845v1 | null |
2024-07-20 | Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Harsh Lunia et.al. | 2407.14834v1 | null |
2024-07-20 | On the Design and Analysis of LLM-Based Algorithms | Yanxi Chen et.al. | 2407.14788v1 | link |
2024-07-19 | Adversarial Databases Improve Success in Retrieval-based Large Language Models | Sean Wu et.al. | 2407.14609v1 | null |
2024-07-18 | Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought | Xiaoyu Tan et.al. | 2407.14562v1 | null |
2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507v1 | link |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506v1 | null |
2024-07-18 | ViLLa: Video Reasoning Segmentation with Large Language Model | Rongkun Zheng et.al. | 2407.14500v1 | link |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487v1 | link |
2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et.al. | 2407.14279v1 | null |
2024-07-19 | LeKUBE: A Legal Knowledge Update BEnchmark | Changyue Wang et.al. | 2407.14192v1 | null |
2024-07-19 | Visual Text Generation in the Wild | Yuanzhi Zhu et.al. | 2407.14138v1 | link |
2024-07-19 | Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models | Quan Li et.al. | 2407.13989v1 | null |
2024-07-18 | Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction | Suma Bailis et.al. | 2407.13943v1 | null |
2024-07-18 | PRAGyan -- Connecting the Dots in Tweets | Rahul Ravi et.al. | 2407.13909v1 | null |
2024-07-18 | X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Sirnam Swetha et.al. | 2407.13851v1 | null |
2024-07-18 | Which objects help me to act effectively? Reasoning about physically-grounded affordances | Anne Kemmeren et.al. | 2407.13811v1 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761v1 | null |
2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692v1 | null |
2024-07-18 | Weak-to-Strong Reasoning | Yuqing Yang et.al. | 2407.13647v1 | link |
2024-07-18 | KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration | Youfu Yan et.al. | 2407.13598v1 | null |
2024-07-18 | Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation | Hassan Ali et.al. | 2407.13505v1 | null |
2024-07-18 | Combining Constraint Programming Reasoning with Large Language Model Predictions | Florian Régin et.al. | 2407.13490v1 | null |
2024-07-18 | BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Moon Ye-Bin et.al. | 2407.13442v1 | null |
2024-07-18 | Reconstruct the Pruned Model without Any Retraining | Pingjie Wang et.al. | 2407.13331v1 | null |
2024-07-18 | CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | Junying Chen et.al. | 2407.13301v1 | null |
2024-07-18 | Are Large Language Models Capable of Generating Human-Level Narratives? | Yufei Tian et.al. | 2407.13248v1 | null |
2024-07-18 | Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data | Wufei Ma et.al. | 2407.13094v1 | null |
2024-07-17 | Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models | Sadegh Mahdavi et.al. | 2407.12979v1 | null |
2024-07-16 | BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Hongjin Su et.al. | 2407.12883v1 | null |
2024-07-16 | Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection | Ye Jiang et.al. | 2407.12879v1 | null |
2024-07-16 | Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning | Yaswanth Narsupalli et.al. | 2407.12877v1 | null |
2024-07-12 | Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jung Hyun Lee et.al. | 2407.12863v1 | null |
2024-07-10 | Analyzing Large language models chatbots: An experimental approach using a probability test | Melise Peruchini et.al. | 2407.12862v1 | null |
2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725v1 | null |
2024-07-17 | Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | Xihe Qiu et.al. | 2407.12532v1 | null |
2024-07-17 | Struct-X: Enhancing Large Language Models Reasoning with Structured Data | Xiaoyu Tan et.al. | 2407.12522v1 | null |
2024-07-17 | Case2Code: Learning Inductive Reasoning with Synthetic Data | Yunfan Shao et.al. | 2407.12504v1 | link |
2024-07-17 | Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning | Mustafa Dogan et.al. | 2407.12498v1 | null |
2024-07-17 | F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions | Jie Yang et.al. | 2407.12435v1 | null |
2024-07-17 | TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Arda Yüksel et.al. | 2407.12402v1 | null |
2024-07-17 | Mamba-PTQ: Outlier Channels in Recurrent Large Language Models | Alessandro Pierro et.al. | 2407.12397v1 | null |
2024-07-17 | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | Gengze Zhou et.al. | 2407.12366v1 | link |
2024-07-17 | LLM-based query paraphrasing for video search | Jiaxin Wu et.al. | 2407.12341v1 | null |
2024-07-16 | Private prediction for large-scale synthetic text generation | Kareem Amin et.al. | 2407.12108v1 | null |
2024-07-16 | Better RAG using Relevant Information Gain | Marc Pickett et.al. | 2407.12101v1 | link |
2024-07-16 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li et.al. | 2407.11963v1 | link |
2024-07-17 | Harnessing Large Language Models for Multimodal Product Bundling | Xiaohao Liu et.al. | 2407.11712v2 | null |
2024-07-16 | A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | He Chang et.al. | 2407.11638v1 | null |
2024-07-16 | Reasoning with Large Language Models, a Survey | Aske Plaat et.al. | 2407.11511v1 | null |
2024-07-16 | SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions | Shicheng Liu et.al. | 2407.11417v1 | null |
2024-07-19 | Reliable Reasoning Beyond Natural Language | Nasim Borazjanizadeh et.al. | 2407.11373v2 | null |
2024-07-16 | VISA: Reasoning Video Object Segmentation via Large Language Models | Cilin Yan et.al. | 2407.11325v1 | link |
2024-07-15 | Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game | Tim Merino et.al. | 2407.11240v1 | null |
2024-07-17 | Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Gonçalo Hora de Carvalho et.al. | 2407.11068v2 | link |
2024-07-15 | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? | Yaoting Wang et.al. | 2407.10947v1 | link |
2024-07-15 | Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval | Shengjie Ma et.al. | 2407.10805v1 | null |
2024-07-15 | Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping | Wenhao Zhu et.al. | 2407.10795v1 | link |
2024-07-15 | Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education | Rui Yang et.al. | 2407.10794v1 | link |
2024-07-16 | Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | Yulong Wang et.al. | 2407.10718v2 | link |
2024-07-18 | Qwen2 Technical Report | An Yang et.al. | 2407.10671v3 | link |
2024-07-17 | LAB-Bench: Measuring Capabilities of Language Models for Biology Research | Jon M. Laurent et.al. | 2407.10362v3 | null |
2024-07-20 | Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models | Yuchen Yang et.al. | 2407.10299v2 | link |
2024-07-14 | GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? | Barah Fazili et.al. | 2407.10245v1 | null |
2024-07-20 | BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs | Zhiting Fan et.al. | 2407.10241v2 | null |
2024-07-22 | Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model | Xunyu Zhu et.al. | 2407.10167v2 | null |
2024-07-14 | ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning | Zhongsheng Wang et.al. | 2407.10162v1 | link |
2024-07-19 | Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine | Omid Rohanian et.al. | 2407.10086v2 | null |
2024-07-14 | All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | Bo Chen et.al. | 2407.10081v1 | null |
2024-07-13 | Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis | Zhicheng Yang et.al. | 2407.09887v1 | link |
2024-07-13 | IoT-LM: Large Multisensory Language Models for the Internet of Things | Shentong Mo et.al. | 2407.09801v1 | link |
2024-07-17 | Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study | Yulong Yang et.al. | 2407.09295v2 | null |
2024-07-17 | Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models | Dong Shu et.al. | 2407.09292v2 | null |
2024-07-12 | Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning | Thuy Ngoc Nguyen et.al. | 2407.09281v1 | null |
2024-07-12 | Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Nico Daheim et.al. | 2407.09136v1 | link |
2024-07-12 | STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs | Yiheng Huang et.al. | 2407.09096v1 | null |
2024-07-12 | SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | Yuzhang Tian et.al. | 2407.09025v1 | null |
2024-07-12 | Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Yingming Pu et.al. | 2407.08922v1 | link |
2024-07-11 | Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jennifer Healey et.al. | 2407.08842v1 | null |
2024-07-11 | MAVIS: Mathematical Visual Instruction Tuning | Renrui Zhang et.al. | 2407.08739v1 | link |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735v1 | null |
2024-07-11 | Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou et.al. | 2407.08733v1 | null |
2024-07-11 | GTA: A Benchmark for General Tool Agents | Jize Wang et.al. | 2407.08713v1 | link |
2024-07-11 | Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight | Zhiqiang Xie et.al. | 2407.08694v1 | null |
2024-07-15 | Emergent Visual-Semantic Hierarchies in Image-Text Representations | Morris Alper et.al. | 2407.08521v2 | null |
2024-07-16 | Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents | Haoyi Xiong et.al. | 2407.08516v2 | null |
2024-07-11 | Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024 | Ilias Chalkidis et.al. | 2407.08495v1 | null |
2024-07-11 | Lynx: An Open Source Hallucination Evaluation Model | Selvan Sunitha Ravi et.al. | 2407.08488v1 | null |
2024-07-17 | Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Liang Zeng et.al. | 2407.08348v2 | null |
2024-07-12 | RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL | Zhenhe Wu et.al. | 2407.08273v2 | null |
2024-07-16 | Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Minghui Wu et.al. | 2407.08150v2 | null |
2024-07-10 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang et.al. | 2407.08044v1 | link |
2024-07-10 | A Critical Review of Causal Reasoning Benchmarks for Large Language Models | Linying Yang et.al. | 2407.08029v1 | null |
2024-07-04 | CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation | Rui Yang et.al. | 2407.07913v1 | null |
2024-07-12 | A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends | Daizong Liu et.al. | 2407.07403v2 | link |
2024-07-10 | LokiLM: Technical Report | Justin Kiefel et.al. | 2407.07370v1 | null |
2024-07-10 | Interpretable Differential Diagnosis with Dual-Inference Large Language Models | Shuang Zhou et.al. | 2407.07330v1 | null |
2024-07-10 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053v2 | link |
2024-07-09 | Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective | Shahana Ibrahim et.al. | 2407.06902v1 | null |
2024-07-08 | A Single Transformer for Scalable Vision-Language Modeling | Yangyi Chen et.al. | 2407.06438v1 | link |
2024-07-08 | Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps | Chuanbo Hu et.al. | 2407.06309v1 | null |
2024-07-08 | CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Zeyu Leo Liu et.al. | 2407.06249v1 | null |
2024-07-08 | SimPal: Towards a Meta-Conversational Framework to Understand Teacher's Instructional Goals for K-12 Physics | Effat Farhana et.al. | 2407.06241v1 | null |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189v1 | link |
2024-07-08 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang et.al. | 2407.06025v1 | link |
2024-07-09 | Distilling System 2 into System 1 | Ping Yu et.al. | 2407.06023v2 | null |
2024-07-08 | Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop | Anum Afzal et.al. | 2407.05925v1 | null |
2024-07-08 | When is the consistent prediction likely to be a correct prediction? | Alex Nguyen et.al. | 2407.05778v1 | null |
2024-07-08 | Large Language Models Understand Layouts | Weiming Li et.al. | 2407.05750v1 | null |
2024-07-08 | Empirical Study of Symmetrical Reasoning in Conversational Chatbots | Daniela N. Rim et.al. | 2407.05734v1 | null |
2024-07-08 | Retrieved In-Context Principles from Previous Mistakes | Hao Sun et.al. | 2407.05682v1 | null |
2024-07-07 | Training Task Experts through Retrieval Based Distillation | Jiaxin Ge et.al. | 2407.05463v1 | null |
2024-07-07 | LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models | Weizhi Tang et.al. | 2407.05434v1 | link |
2024-07-10 | SBoRA: Low-Rank Adaptation with Regional Weight Updates | Lai-Man Po et.al. | 2407.05413v2 | link |
2024-07-07 | ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Xiyuan Zhou et.al. | 2407.05365v1 | link |
2024-07-07 | VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool | Yan Wang et.al. | 2407.05355v1 | null |
2024-07-07 | WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks | Léo Boisvert et.al. | 2407.05291v1 | link |
2024-07-07 | Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions | Zhiwen You et.al. | 2407.05271v1 | link |
2024-07-06 | Lucy: Think and Reason to Solve Text-to-SQL | Nina Narodytska et.al. | 2407.05153v1 | null |
2024-07-06 | Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? | Kuei-Chun Kao et.al. | 2407.05134v1 | null |
2024-07-06 | Progress or Regress? Self-Improvement Reversal in Post-training | Ting Wu et.al. | 2407.05013v1 | null |
2024-07-06 | LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Yijia Xiao et.al. | 2407.04973v1 | link |
2024-07-06 | MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models | Yunjia Xi et.al. | 2407.04960v1 | link |
2024-07-06 | Safe Generative Chats in a WhatsApp Intelligent Tutoring System | Zachary Levonian et.al. | 2407.04915v1 | null |
2024-07-06 | Algorithmic Language Models with Neurally Compiled Libraries | Lucas Saldyt et.al. | 2407.04899v1 | null |
2024-07-12 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton et.al. | 2407.04622v2 | null |
2024-07-05 | Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model | Duy M. H. Nguyen et.al. | 2407.04489v1 | null |
2024-07-05 | cosmosage: A Natural-Language Assistant for Cosmologists | Tijmen de Haan et.al. | 2407.04420v1 | link |
2024-07-05 | AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Petr Anokhin et.al. | 2407.04363v1 | link |
2024-07-05 | Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR | Shogo Morita et.al. | 2407.04362v1 | null |
2024-07-05 | WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning | Yiheng Li et.al. | 2407.04281v1 | null |
2024-07-09 | DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Chengpeng Li et.al. | 2407.04078v2 | link |
2024-07-04 | Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM | Peiran Yao et.al. | 2407.04067v1 | link |
2024-07-04 | A Survey on Natural Language Counterfactual Generation | Yongjie Wang et.al. | 2407.03993v1 | null |
2024-07-04 | MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices | Jiayi Zhang et.al. | 2407.03913v1 | null |
2024-07-04 | From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI | Stefanie Krause et.al. | 2407.03778v1 | null |
2024-07-04 | STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering | Zhenyu Bi et.al. | 2407.03687v1 | null |
2024-07-04 | Improving Self Consistency in LLMs through Probabilistic Tokenization | Ashutosh Sathe et.al. | 2407.03678v1 | null |
2024-07-14 | Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction | Amanda Dsouza et.al. | 2407.03651v2 | link |
2024-07-04 | Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models | Chang-Sheng Kao et.al. | 2407.03615v1 | link |
2024-07-03 | UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization | Md Nayem Uddin et.al. | 2407.03525v1 | null |
2024-07-03 | On Large Language Models in National Security Applications | William N. Caballero et.al. | 2407.03453v1 | null |
2024-07-03 | How Does Quantization Affect Multilingual LLMs? | Kelly Marchisio et.al. | 2407.03211v1 | null |
2024-07-03 | TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Ruida Wang et.al. | 2407.03203v1 | link |
2024-07-03 | Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | Haritz Puerto et.al. | 2407.03181v1 | link |
2024-07-03 | Investigating Decoder-only Large Language Models for Speech-to-text Translation | Chao-Wei Huang et.al. | 2407.03169v1 | null |
2024-07-03 | Social Bias Evaluation for Large Language Models Requires Prompt Variations | Rem Hida et.al. | 2407.03129v1 | link |
2024-07-03 | ALTER: Augmentation for Large-Table-Based Reasoning | Han Zhang et.al. | 2407.03061v1 | link |
2024-07-03 | Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Zhaohe Liao et.al. | 2407.03008v1 | null |
2024-07-03 | SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research | Meghal Dani et.al. | 2407.03004v1 | null |
2024-07-03 | Large Language Models as Evaluators for Scientific Synthesis | Julia Evans et.al. | 2407.02977v1 | null |
2024-07-03 | FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering | Xiaochen Wang et.al. | 2407.02964v1 | null |
2024-07-03 | GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Zike Yuan et.al. | 2407.02936v1 | link |
2024-07-03 | LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation | Hongke Zhao et.al. | 2407.02833v1 | null |
2024-07-02 | Reasoning in Large Language Models: A Geometric Perspective | Romain Cosentino et.al. | 2407.02678v1 | null |
2024-07-02 | An AI-Based System Utilizing IoT-Enabled Ambient Sensors and LLMs for Complex Activity Tracking | Yuan Sun et.al. | 2407.02606v1 | null |
2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473v1 | null |
2024-07-02 | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li et.al. | 2407.02392v1 | link |
2024-07-02 | Generative Large Language Models in Automated Fact-Checking: A Survey | Ivan Vykopal et.al. | 2407.02351v1 | null |
2024-07-02 | RVISA: Reasoning and Verification for Implicit Sentiment Analysis | Wenna Lai et.al. | 2407.02340v1 | null |
2024-07-02 | Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Adrian Rebmann et.al. | 2407.02310v1 | link |
2024-07-02 | Multilingual Trolley Problems for Language Models | Zhijing Jin et.al. | 2407.02273v1 | link |
2024-07-04 | Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models | Xiangrui Kong et.al. | 2407.02220v2 | null |
2024-07-02 | Automatic Adaptation Rule Optimization via Large Language Models | Yusei Ishimizu et.al. | 2407.02203v1 | null |
2024-07-02 | Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Nishant Balepur et.al. | 2407.01992v1 | null |
2024-07-04 | Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction | Chenlong Deng et.al. | 2407.01964v3 | link |
2024-07-02 | Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness | Khyathi Raghavi Chandu et.al. | 2407.01942v1 | null |
2024-07-02 | GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | Zhisheng Tang et.al. | 2407.01892v1 | link |
2024-07-01 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | Bodhisattwa Prasad Majumder et.al. | 2407.01725v1 | link |
2024-07-01 | Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning | Akshara Prabhakar et.al. | 2407.01687v1 | link |
2024-07-01 | KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Jiayi Yuan et.al. | 2407.01527v1 | null |
2024-07-02 | Empowering 3D Visual Grounding with Reasoning Capabilities | Chenming Zhu et.al. | 2407.01525v2 | null |
2024-07-01 | TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind | Guiyang Hou et.al. | 2407.01455v1 | null |
2024-07-01 | MIRAI: Evaluating LLM Agents for Event Forecasting | Chenchen Ye et.al. | 2407.01231v1 | null |
2024-07-01 | EconNLI: Evaluating Large Language Models on Economics Reasoning | Yue Guo et.al. | 2407.01212v1 | link |
2024-07-01 | IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | Senyu Han et.al. | 2407.01093v1 | link |
2024-07-03 | FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Yiyuan Li et.al. | 2407.01046v2 | link |
2024-07-01 | DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models | Jiabao Pan et.al. | 2407.01009v1 | null |
2024-07-01 | Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense | Yi Yu et.al. | 2407.00995v1 | null |
2024-07-01 | Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | Shihan Deng et.al. | 2407.00993v1 | null |
2024-07-01 | Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving | Ran Tian et.al. | 2407.00959v1 | null |
2024-07-01 | MalAlgoQA: A Pedagogical Approach for Evaluating Counterfactual Reasoning Abilities | Naiming Liu et.al. | 2407.00938v1 | null |
2024-07-01 | MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula | Shubhra Mishra et.al. | 2407.00900v1 | link |
2024-07-01 | Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks | Yue Zhou et.al. | 2407.00869v1 | null |
2024-07-02 | Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Zimu Lu et.al. | 2407.00782v2 | link |
2024-06-30 | Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs | Yifei Zhang et.al. | 2407.00653v1 | null |
2024-06-29 | LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement | Jiahao Ying et.al. | 2407.00497v1 | null |
2024-06-29 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Jinsheng Huang et.al. | 2407.00468v1 | link |
2024-06-29 | Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs | Tamzeed Mahfuz et.al. | 2407.00416v1 | null |
2024-06-29 | Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | Mingqian He et.al. | 2407.00390v1 | null |
2024-06-28 | Evaluating Human Alignment and Model Faithfulness of LLM Rationale | Mohsen Fayyaz et.al. | 2407.00219v1 | null |
2024-06-27 | From Efficient Multimodal Models to World Models: A Survey | Xinji Mai et.al. | 2407.00118v1 | null |
2024-06-26 | Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Mohammed Elhenawy et.al. | 2407.00092v1 | null |
2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095v1 | link |
2024-06-28 | Scaling Synthetic Data Creation with 1,000,000,000 Personas | Xin Chan et.al. | 2406.20094v1 | link |
2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085v1 | null |
2024-07-02 | BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration | Noel Crawford et.al. | 2406.20041v3 | null |
2024-06-28 | ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models | Yuxiang Zhang et.al. | 2406.20015v1 | link |
2024-06-28 | Into the Unknown: Generating Geospatial Descriptions for New Environments | Tzuf Paz-Argaman et.al. | 2406.19967v1 | null |
2024-06-28 | BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering | Zheng Chu et.al. | 2406.19820v1 | null |
2024-06-28 | Belief Revision: The Adaptability of Large Language Models Reasoning | Bryan Wilie et.al. | 2406.19764v1 | null |
2024-07-02 | ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning | Christopher E. Mower et.al. | 2406.19741v2 | link |
2024-06-28 | MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? | Jinming Li et.al. | 2406.19693v1 | null |
2024-06-27 | Rethinking harmless refusals when fine-tuning foundation models | Florin Pop et.al. | 2406.19552v1 | null |
2024-06-27 | Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations | Ritam Dutt et.al. | 2406.19545v1 | link |
2024-06-27 | Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems | Dan Schumacher et.al. | 2406.19538v1 | null |
2024-07-04 | Using Large Language Models to Assist Video Content Analysis: An Exploratory Study of Short Videos on Depression | Jiaying Liu et.al. | 2406.19528v2 | null |
2024-06-27 | Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning | Miyoung Ko et.al. | 2406.19502v1 | link |
2024-07-02 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392v2 | link |
2024-06-27 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Zheyang Xiong et.al. | 2406.19292v1 | null |
2024-06-27 | Aligning Teacher with Student Preferences for Tailored Training Data Generation | Yantao Liu et.al. | 2406.19227v1 | null |
2024-06-27 | Towards Learning Abductive Reasoning using VSA Distributed Representations | Giacomo Camposampiero et.al. | 2406.19121v1 | link |
2024-06-27 | STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis | Wenbin Li et.al. | 2406.19065v1 | link |
2024-06-28 | UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models | Siyuan Wu et.al. | 2406.18966v2 | link |
2024-06-27 | Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | Elham J. Barezi et.al. | 2406.18839v1 | null |
2024-06-26 | Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism | Shi Zong et.al. | 2406.18762v1 | null |
2024-06-26 | Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models | Georgios Tziafas et.al. | 2406.18746v1 | null |
2024-07-01 | Towards Open-World Grasping with Large Vision-Language Models | Georgios Tziafas et.al. | 2406.18722v2 | null |
2024-06-26 | Learning to Correct for QA Reasoning with Black-box LLMs | Jaehyung Kim et.al. | 2406.18695v1 | link |
2024-06-26 | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | Guanting Dong et.al. | 2406.18676v1 | link |
2024-06-26 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Xin Lai et.al. | 2406.18629v1 | link |
2024-06-26 | An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery | Oskar Wysocki et.al. | 2406.18626v1 | null |
2024-06-26 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang et.al. | 2406.18521v1 | link |
2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505v1 | null |
2024-06-26 | MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Meng Fang et.al. | 2406.18321v1 | null |
2024-06-26 | AI-native Memory: A Pathway from LLMs Towards AGI | Jingbo Shang et.al. | 2406.18312v1 | null |
2024-06-26 | SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding | Zhenglin Wang et.al. | 2406.18200v1 | null |
2024-06-26 | Knowledge Graph Enhanced Retrieval-Augmented Generation for Failure Mode and Effects Analysis | Lukas Bahr et.al. | 2406.18114v1 | link |
2024-06-26 | Multi-step Knowledge Retrieval and Inference over Unstructured Data | Aditya Kalyanpur et.al. | 2406.17987v1 | null |
2024-06-25 | NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization | Md Mahadi Hasan Nahid et.al. | 2406.17961v1 | null |
2024-06-25 | Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Zhongtao Miao et.al. | 2406.17873v1 | link |
2024-06-22 | MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? | Xirui Li et.al. | 2406.17806v1 | null |
2024-06-25 | LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Aditya Kalyanpur et.al. | 2406.17663v1 | null |
2024-06-25 | Banishing LLM Hallucinations Requires Rethinking Generalization | Johnny Li et.al. | 2406.17642v1 | null |
2024-06-25 | "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations? | Beiduo Chen et.al. | 2406.17600v1 | null |
2024-06-26 | LongIns: A Challenging Long-context Instruction-based Exam for LLMs | Shawn Gavin et.al. | 2406.17588v2 | null |
2024-06-25 | Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats | Ryan Pavlich et.al. | 2406.17574v1 | null |
2024-06-25 | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | Guilherme Penedo et.al. | 2406.17557v1 | null |
2024-06-25 | Tell Me Where You Are: Multimodal LLMs Meet Place Recognition | Zonglin Lyu et.al. | 2406.17520v1 | null |
2024-06-25 | Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Minzheng Wang et.al. | 2406.17419v1 | link |
2024-06-25 | Leveraging LLMs for Dialogue Quality Measurement | Jinghan Jia et.al. | 2406.17304v1 | null |
2024-06-26 | Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Wenhao Shi et.al. | 2406.17294v2 | link |
2024-06-25 | DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph | Zhehao Zhang et.al. | 2406.17271v1 | link |
2024-06-24 | CogExplore: Contextual Exploration with Language-Encoded Environment Representations | Harel Biggie et.al. | 2406.17180v1 | null |
2024-06-24 | Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Nisarg Patel et.al. | 2406.17169v1 | link |
2024-06-24 | USDC: A Dataset of User Stance and Dogmatism in Long Conversations | Mounika Marreddy et.al. | 2406.16833v1 | null |
2024-06-25 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda et.al. | 2406.16797v2 | link |
2024-06-24 | Scaling Laws for Linear Complexity Language Models | Xuyang Shen et.al. | 2406.16690v1 | link |
2024-06-24 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Peng Hu et.al. | 2406.16655v1 | link |
2024-06-25 | OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Lu Zhang et.al. | 2406.16620v2 | null |
2024-06-24 | Evaluating the Ability of Large Language Models to Reason about Cardinal Directions | Anthony G Cohn et.al. | 2406.16528v1 | null |
2024-06-24 | eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure | Hoorieh Sabzevari et.al. | 2406.16490v1 | link |
2024-06-24 | Evaluating and Analyzing Relationship Hallucinations in LVLMs | Mingrui Wu et.al. | 2406.16449v1 | link |
2024-06-29 | EmoLLM: Multimodal Emotional Understanding Meets Large Language Models | Qu Yang et.al. | 2406.16442v2 | link |
2024-06-24 | UniCoder: Scaling Code Large Language Model via Universal Code | Tao Sun et.al. | 2406.16441v1 | null |
2024-06-24 | Anomaly Detection of Tabular Data Using LLMs | Aodong Li et.al. | 2406.16308v1 | null |
2024-06-23 | GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets | Qiming Wu et.al. | 2406.16176v1 | null |
2024-06-23 | Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step | Zezhong Wang et.al. | 2406.16144v1 | null |
2024-06-23 | PORT: Preference Optimization on Reasoning Traces | Salem Lahlou et.al. | 2406.16061v1 | null |
2024-06-23 | Can LLM Graph Reasoning Generalize beyond Pattern Memorization? | Yizhuo Zhang et.al. | 2406.15992v1 | null |
2024-06-26 | BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | Terry Yue Zhuo et.al. | 2406.15877v2 | link |
2024-06-30 | LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning | Guangsi Shi et.al. | 2406.15859v2 | null |
2024-06-22 | MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception | Guanqun Wang et.al. | 2406.15768v1 | null |
2024-06-22 | video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Guangzhi Sun et.al. | 2406.15704v1 | link |
2024-06-21 | Robust Reinforcement Learning from Corrupted Human Feedback | Alexander Bukharin et.al. | 2406.15568v1 | null |
2024-06-18 | On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models | Pedro Cisneros-Velarde et.al. | 2406.15492v1 | null |
2024-06-21 | Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | Badr AlKhamissi et.al. | 2406.15109v1 | link |
2024-06-21 | MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens | Yongqi Fan et.al. | 2406.15019v1 | link |
2024-06-21 | Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers | Manuel Mondal et.al. | 2406.14986v1 | null |
2024-06-21 | ICLEval: Evaluating In-Context Learning Ability of Large Language Models | Wentong Chen et.al. | 2406.14955v1 | link |
2024-06-21 | Autonomous Agents for Collaborative Task under Information Asymmetry | Wei Liu et.al. | 2406.14928v1 | link |
2024-06-21 | Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video | Zhengbang Yang et.al. | 2406.14877v1 | null |
2024-06-21 | DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | Kyle Wong et.al. | 2406.14867v1 | link |
2024-06-21 | Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jiayu Wang et.al. | 2406.14852v1 | null |
2024-06-20 | ACR: A Benchmark for Automatic Cohort Retrieval | Dung Ngoc Thai et.al. | 2406.14780v1 | null |
2024-06-20 | A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering | Lingxi Zhang et.al. | 2406.14763v1 | null |
2024-06-20 | Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task? | Zhiqiang Pi et.al. | 2406.14737v1 | null |
2024-06-20 | Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell | Taiming Lu et.al. | 2406.14673v1 | link |
2024-06-20 | HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation | Jin Wang et.al. | 2406.14655v1 | null |
2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562v1 | null |
2024-06-21 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556v2 | null |
2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546v1 | link |
2024-06-20 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Yuxuan Qiao et.al. | 2406.14544v1 | link |
2024-06-25 | SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | Gayane Ghazaryan et.al. | 2406.14425v2 | null |
2024-06-20 | The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing | Yuannan Li et.al. | 2406.14358v1 | null |
2024-06-20 | medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs | Mingyi Jia et.al. | 2406.14326v1 | null |
2024-06-27 | Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Chaojie Wang et.al. | 2406.14283v3 | null |
2024-06-20 | SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots | Weixing Wang et.al. | 2406.14208v1 | null |
2024-06-20 | Timo: Towards Better Temporal Reasoning for Language Models | Zhaochen Su et.al. | 2406.14192v1 | link |
2024-06-20 | Definition generation for lexical semantic change detection | Mariia Fedorova et.al. | 2406.14167v1 | link |
2024-07-01 | Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration | Haokun Liu et.al. | 2406.14097v2 | null |
2024-06-20 | MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models | Zhongshen Zeng et.al. | 2406.13975v1 | null |
2024-06-20 | Causal Inference with Latent Variables: Recent Advances and Future Prospectives | Yaochen Zhu et.al. | 2406.13966v1 | null |
2024-06-20 | CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jie Feng et.al. | 2406.13948v1 | null |
2024-06-20 | AspirinSum: an Aspect-based utility-preserved de-identification Summarization framework | Ya-Lun Li et.al. | 2406.13947v1 | null |
2024-06-19 | Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events | Mohammad Abu Tami et.al. | 2406.13894v1 | null |
2024-06-19 | Adaptable Logical Control for Large Language Models | Honghua Zhang et.al. | 2406.13892v1 | link |
2024-06-19 | Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | Yuval Shalev et.al. | 2406.13858v1 | null |
2024-06-27 | Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning? | Nirjhor Rouf et.al. | 2406.13808v3 | null |
2024-06-19 | WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia | Yufang Hou et.al. | 2406.13805v1 | null |
2024-06-19 | Semantic Structure-Mapping in LLM and Human Analogical Reasoning | Sam Musker et.al. | 2406.13803v1 | link |
2024-06-19 | Can LLMs Reason in the Wild with Programs? | Yuan Yang et.al. | 2406.13764v1 | link |
2024-06-19 | Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models | Zhawnen Chen et.al. | 2406.13763v1 | null |
2024-06-19 | Improving Visual Commonsense in Language Models via Multiple Image Generation | Guy Yariv et.al. | 2406.13621v1 | link |
2024-06-27 | VDebugger: Harnessing Execution Feedback for Debugging Visual Programs | Xueqing Wu et.al. | 2406.13444v2 | link |
2024-06-19 | Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | Sumanth Doddapaneni et.al. | 2406.13439v1 | link |
2024-06-19 | MoreHopQA: More Than Multi-hop Reasoning | Julian Schnitzler et.al. | 2406.13397v1 | link |
2024-06-19 | ALiiCE: Evaluating Positional Fine-grained Citation Generation | Yilong Xu et.al. | 2406.13375v1 | null |
2024-06-19 | Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets | Lucas Druart et.al. | 2406.13269v1 | null |
2024-06-19 | Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology | Xiaoxi Kang et.al. | 2406.13217v1 | null |
2024-06-19 | Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata | Mykhailo Poliakov et.al. | 2406.13213v1 | link |
2024-06-19 | DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents | Jiho Kim et.al. | 2406.13144v1 | link |
2024-06-19 | Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation | Yuhang Zhou et.al. | 2406.13114v1 | null |
2024-06-18 | Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method | Jerson Francia et.al. | 2406.13049v1 | null |
2024-06-18 | MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction | Yuyan Liu et.al. | 2406.12950v1 | link |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846v1 | null |
2024-06-18 | LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | Seyedarmin Azizi et.al. | 2406.12832v1 | link |
2024-06-18 | UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Xunzhi Wang et.al. | 2406.12784v1 | link |
2024-06-18 | Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries | Eden Biran et.al. | 2406.12775v1 | link |
2024-06-18 | OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Zhen Huang et.al. | 2406.12753v1 | link |
2024-06-18 | Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Bingchen Zhao et.al. | 2406.12742v1 | link |
2024-06-18 | MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL | Arian Askari et.al. | 2406.12692v1 | null |
2024-06-18 | DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Zhouhong Gu et.al. | 2406.12641v1 | link |
2024-06-18 | Ask-before-Plan: Proactive Language Agents for Real-World Planning | Xuan Zhang et.al. | 2406.12639v1 | link |
2024-06-18 | Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics | Chenggang Cui et.al. | 2406.12628v1 | null |
2024-06-18 | Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges | Aman Singh Thakur et.al. | 2406.12624v1 | null |
2024-06-18 | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | Yao-Ching Yu et.al. | 2406.12585v1 | link |
2024-06-19 | Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Eldar Kurtic et.al. | 2406.12572v2 | link |
2024-06-18 | Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models | Philipp Mondorf et.al. | 2406.12546v1 | null |
2024-06-18 | LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation | Yuhao Wang et.al. | 2406.12529v1 | null |
2024-06-18 | LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization | Masafumi Enomoto et.al. | 2406.12494v1 | null |
2024-06-18 | RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding | Linrui Xu et.al. | 2406.12479v1 | link |
2024-06-18 | IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Qiyao Wang et.al. | 2406.12386v1 | link |
2024-06-18 | Problem-Solving in Language Model Networks | Ciaran Regan et.al. | 2406.12374v1 | link |
2024-06-18 | Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding | Weizhi Fei et.al. | 2406.12331v1 | null |
2024-06-18 | PRePair: Pointwise Reasoning Enhance Pairwise Evaluating for Robust Instruction-Following Assessments | Hawon Jeong et.al. | 2406.12319v1 | null |
2024-06-18 | An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs | Daking Rai et.al. | 2406.12288v1 | null |
2024-06-18 | Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization | Kwangwook Seo et.al. | 2406.12269v1 | null |
2024-06-18 | A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning | Lijie Hu et.al. | 2406.12255v1 | null |
2024-06-24 | Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector | Gangwei Jiang et.al. | 2406.12227v2 | null |
2024-06-18 | Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration | Xinzhu Liu et.al. | 2406.12224v1 | null |
2024-06-18 | Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems | Nasim Borazjanizadeh et.al. | 2406.12172v1 | null |
2024-06-19 | Is poisoning a real threat to LLM alignment? Maybe more so than you think | Pankayaraj Pathmanathan et.al. | 2406.12091v2 | link |
2024-06-17 | InternalInspector | Mohammad Beigi et.al. | 2406.12053v1 | null |
2024-06-17 | MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Nikhil Khandekar et.al. | 2406.12036v1 | link |
2024-06-17 | Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | Junmo Kang et.al. | 2406.12034v1 | null |
2024-06-17 | GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models | Yi Fang et.al. | 2406.11945v1 | link |
2024-06-16 | A Notion of Complexity for Theory of Mind via Discrete World Models | X. Angelo Huang et.al. | 2406.11911v1 | link |
2024-06-15 | A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges | Yuqi Nie et.al. | 2406.11903v1 | null |
2024-06-17 | Improving Multi-Agent Debate with Sparse Communication Topology | Yunxuan Li et.al. | 2406.11776v1 | null |
2024-06-17 | Meta Reasoning for Large Language Models | Peizhong Gao et.al. | 2406.11698v1 | null |
2024-06-17 | TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy | Yiqun Chen et.al. | 2406.11678v1 | link |
2024-06-17 | A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4 | Ming Gu et.al. | 2406.11651v1 | link |
2024-06-17 | Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models | Sheng Feng et.al. | 2406.11568v1 | link |
2024-06-17 | MEMLA: Enhancing Multilingual Knowledge Editing with Neuron-Masked Low-Rank Adaptation | Jiakuan Xie et.al. | 2406.11566v1 | null |
2024-06-17 | AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation | Chuyan Xiong et.al. | 2406.11548v1 | null |
2024-06-17 | Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs | Yi Fang et.al. | 2406.11514v1 | null |
2024-06-17 | Can AI with High Reasoning Ability Replicate Human-like Decision Making in Economic Experiments? | Ayato Kitadai et.al. | 2406.11426v1 | null |
2024-06-17 | P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models | Shuo Yang et.al. | 2406.11391v1 | null |
2024-06-17 | A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences | Leonardo Bertolazzi et.al. | 2406.11341v1 | null |
2024-06-17 | ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding | Tianren Ma et.al. | 2406.11327v1 | null |
2024-06-17 | Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization | Minda Hu et.al. | 2406.11258v1 | null |
2024-06-18 | AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval | Shirley Wu et.al. | 2406.11200v2 | link |
2024-06-17 | Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Zebang Cheng et.al. | 2406.11161v1 | link |
2024-06-21 | Contextual Knowledge Graph | Chengjin Xu et.al. | 2406.11160v2 | null |
2024-06-19 | Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG | Xueying Du et.al. | 2406.11147v2 | null |
2024-06-17 | RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents | Weizhe Chen et.al. | 2406.11132v1 | null |
2024-06-17 | Exploring Safety-Utility Trade-Offs in Personalized Language Models | Anvesh Rao Vijjini et.al. | 2406.11107v1 | null |
2024-06-16 | A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Bowen Jiang et.al. | 2406.11050v1 | null |
2024-06-16 | RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models | Yuqing Wang et.al. | 2406.11020v1 | null |
2024-06-18 | Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game | Prisha Samadarshi et.al. | 2406.11012v2 | link |
2024-06-16 | Not All Bias is Bad: Balancing Rational Deviations and Cognitive Biases in Large Language Model Reasoning | Liman Wang et.al. | 2406.10999v1 | null |
2024-06-18 | City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization | Zihao Jiao et.al. | 2406.10958v2 | null |
2024-06-16 | E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models | Zhenyu Zhang et.al. | 2406.10950v1 | null |
2024-06-16 | Effective Generative AI: The Human-Algorithm Centaur | Soroush Saghafian et.al. | 2406.10942v1 | null |
2024-06-16 | Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies | Hung-Ting Su et.al. | 2406.10923v1 | null |
2024-06-16 | RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Zhuoran Jin et.al. | 2406.10890v1 | link |
2024-06-16 | Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions | Yiming Tang et.al. | 2406.10878v1 | null |
2024-06-16 | Step-level Value Preference Optimization for Mathematical Reasoning | Guoxin Chen et.al. | 2406.10858v1 | null |
2024-06-16 | Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Joykirat Singh et.al. | 2406.10834v1 | null |
2024-06-16 | Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses | Zhiwen Fan et.al. | 2406.10789v1 | null |
2024-06-15 | FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models | Zhikai Zhang et.al. | 2406.10740v1 | null |
2024-06-15 | Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions | Yexin Liu et.al. | 2406.10638v1 | link |
2024-06-15 | On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models | Sree Harsha Tanneru et.al. | 2406.10625v1 | null |
2024-06-15 | Reactor Mk.1 performances: MMLU, HumanEval and BBH test results | TJ Dunham et.al. | 2406.10515v1 | null |
2024-06-14 | What is the Visual Cognition Gap between Humans and Multimodal LLMs? | Xu Cao et.al. | 2406.10424v1 | link |
2024-06-14 | Self-Reflection Outcome is Sensitive to Prompt Construction | Fengyuan Liu et.al. | 2406.10400v1 | link |
2024-06-18 | Efficient Prompting for LLM-based Generative Internet of Things | Bin Xiao et.al. | 2406.10382v2 | null |
2024-06-14 | Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models | Jie Chen et.al. | 2406.10305v1 | null |
2024-06-12 | Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models | Francisco Eiras et.al. | 2406.10288v1 | null |
2024-06-11 | FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination | Pengfei Zhou et.al. | 2406.10261v1 | null |
2024-06-10 | The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs | Mert Yazan et.al. | 2406.10251v1 | null |
2024-06-14 | BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Yuri Kuratov et.al. | 2406.10149v1 | null |
2024-06-14 | Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning | Jiaqi Li et.al. | 2406.10099v1 | null |
2024-06-18 | First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models | Enming Zhang et.al. | 2406.10057v2 | link |
2024-06-14 | Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Manas Jhalani et.al. | 2406.09994v1 | null |
2024-06-14 | A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization | KuanChao Chu et.al. | 2406.09972v1 | null |
2024-06-14 | Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Nabor C. Mendonça et.al. | 2406.09671v1 | link |
2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613v1 | link |
2024-06-12 | Pandora: Towards General World Model with Natural Language Actions and Video States | Jiannan Xiang et.al. | 2406.09455v1 | null |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418v1 | link |
2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397v1 | null |
2024-06-13 | GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning | Zhen Xiang et.al. | 2406.09187v1 | null |
2024-06-13 | ReMI: A Dataset for Reasoning with Multiple Images | Mehran Kazemi et.al. | 2406.09175v1 | null |
2024-06-13 | Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | Bahare Fatemi et.al. | 2406.09170v1 | null |
2024-06-13 | Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Xuan Zhang et.al. | 2406.09136v1 | link |
2024-06-13 | MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era | Jiahao Nie et.al. | 2406.09121v1 | link |
2024-06-13 | Chain-of-Thought (CoT) prompting strategies for medical error detection and correction | Zhaolong Wu et.al. | 2406.09103v1 | null |
2024-06-13 | SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models | Kehua Feng et.al. | 2406.09098v1 | link |
2024-06-13 | Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Zhaochen Su et.al. | 2406.09072v1 | link |
2024-06-13 | MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Hanqing Wang et.al. | 2406.09044v1 | null |
2024-06-14 | Language Models are Crossword Solvers | Soumadeep Saha et.al. | 2406.09043v2 | null |
2024-06-13 | ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models | Jing Liu et.al. | 2406.09041v1 | null |
2024-06-13 | Cognitively Inspired Energy-Based World Models | Alexi Gladstone et.al. | 2406.08862v1 | null |
2024-06-13 | LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions | Rumaisa Azeem et.al. | 2406.08824v1 | null |
2024-06-13 | Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models | Minghao Wu et.al. | 2406.08811v1 | null |
2024-06-13 | A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices | Sania Sinha et.al. | 2406.08787v1 | null |
2024-06-12 | Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs | Chen Zheng et.al. | 2406.08657v1 | null |
2024-06-12 | LLM-Craft: Robotic Crafting of Elasto-Plastic Objects with Large Language Models | Alison Bartsch et.al. | 2406.08648v1 | null |
2024-06-12 | CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery | Xiaoshuai Song et.al. | 2406.08587v1 | link |
2024-06-12 | Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Jaehyun Nam et.al. | 2406.08527v1 | null |
2024-06-12 | Research Trends for the Interplay between Large Language Models and Knowledge Graphs | Hanieh Khorashadizadeh et.al. | 2406.08223v1 | null |
2024-06-12 | ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs | Irene Huang et.al. | 2406.08164v1 | link |
2024-06-16 | Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests | Amogh Mannekote et.al. | 2406.07794v2 | null |
2024-06-11 | Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions | Leonardo Cotta et.al. | 2406.07685v1 | null |
2024-06-11 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li et.al. | 2406.07528v1 | link |
2024-06-11 | TextGrad: Automatic "Differentiation" via Text | Mert Yuksekgonul et.al. | 2406.07496v1 | link |
2024-06-17 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476v2 | link |
2024-06-11 | On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations | Shiao Meng et.al. | 2406.07444v1 | link |
2024-06-13 | Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Di Zhang et.al. | 2406.07394v2 | link |
2024-06-11 | Limited Out-of-Context Knowledge Reasoning in Large Language Models | Peng Hu et.al. | 2406.07393v1 | null |
2024-06-11 | Large Language Models for Constrained-Based Causal Discovery | Kai-Hendrik Cohrs et.al. | 2406.07378v1 | link |
2024-06-11 | Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities | Delfina Sol Martinez Pandiani et.al. | 2406.07353v1 | null |
2024-06-11 | Instruct Large Language Models to Drive like Humans | Ruijun Zhang et.al. | 2406.07296v1 | link |
2024-06-11 | Needle In A Multimodal Haystack | Weiyun Wang et.al. | 2406.07230v1 | link |
2024-06-11 | Scaling Large-Language-Model-based Multi-Agent Collaboration | Chen Qian et.al. | 2406.07155v1 | link |
2024-06-11 | Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees | Sijia Chen et.al. | 2406.07115v1 | null |
2024-06-17 | Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph | Sergey Linok et.al. | 2406.07113v2 | null |
2024-06-11 | DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs | Haishuo Fang et.al. | 2406.07080v1 | link |
2024-06-11 | CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only | Junhee Cho et.al. | 2406.06947v1 | link |
2024-06-15 | What's in an embedding? Would a rose by any embedding smell as sweet? | Venkat Venkatasubramanian et.al. | 2406.06870v3 | null |
2024-06-11 | Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems | Mohammed Elhenawy et.al. | 2406.06865v1 | null |
2024-06-11 | Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity | Tam N. Nguyen et.al. | 2406.06863v1 | link |
2024-06-07 | GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | Anthony Costarelli et.al. | 2406.06613v1 | link |
2024-06-06 | Reinterpreting 'the Company a Word Keeps': Towards Explainable and Ontologically Grounded Language Models | Walid S. Saba et.al. | 2406.06610v1 | null |
2024-06-05 | Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Liangchen Luo et.al. | 2406.06592v1 | null |
2024-06-05 | Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models | Flavio Petruzzellis et.al. | 2406.06588v1 | null |
2024-06-05 | Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining | Shuqi Liu et.al. | 2406.06586v1 | null |
2024-06-04 | Break the Chain: Large Language Models Can be Shortcut Reasoners | Mengru Ding et.al. | 2406.06580v1 | null |
2024-06-04 | From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models | Xiaofeng Zhang et.al. | 2406.06579v1 | null |
2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474v1 | null |
2024-06-11 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464v2 | null |
2024-06-15 | Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies | Junlin Wang et.al. | 2406.06461v3 | null |
2024-06-15 | Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching | Xiaoying Zhang et.al. | 2406.06326v3 | null |
2024-06-11 | LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | Andrew M. Bean et.al. | 2406.06196v2 | link |
2024-06-10 | Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation | Aadharsh Aadhithya A et.al. | 2406.06124v1 | null |
2024-06-10 | Prompting Large Language Models with Audio for General-Purpose Speech Summarization | Wonjune Kang et.al. | 2406.05968v1 | link |
2024-06-10 | CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | David Romero et.al. | 2406.05967v1 | null |
2024-06-10 | Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models | Xi Li et.al. | 2406.05948v1 | null |
2024-06-09 | Hello Again! LLM-powered Personalized Agent for Long-term Dialogue | Hao Li et.al. | 2406.05925v1 | link |
2024-06-09 | Why Don't Prompt-Based Fairness Metrics Correlate? | Abdelrahman Zayed et.al. | 2406.05918v1 | null |
2024-06-09 | LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning | Utsav Singh et.al. | 2406.05881v1 | null |
2024-06-09 | A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components | Xinzhe Li et.al. | 2406.05804v1 | null |
2024-06-09 | Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | Fangxu Yu et.al. | 2406.05673v1 | link |
2024-06-09 | Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses | Maryam Amirizaniani et.al. | 2406.05659v1 | null |
2024-06-08 | Verbalized Probabilistic Graphical Modeling with Large Language Models | Hengguan Huang et.al. | 2406.05516v1 | null |
2024-06-08 | Towards a Benchmark for Causal Business Process Reasoning with LLMs | Fabiana Fournier et.al. | 2406.05506v1 | null |
2024-06-08 | Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | Neeraj Varshney et.al. | 2406.05494v1 | null |
2024-06-08 | Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios | Yuhang Zhou et.al. | 2406.05322v1 | null |
2024-06-07 | LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Arash Gholami Davoodi et.al. | 2406.05194v1 | link |
2024-06-07 | Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Shi-Yu Tian et.al. | 2406.05055v1 | null |
2024-06-07 | Quantifying Geospatial in the Common Crawl Corpus | Ilya Ilyankou et.al. | 2406.04952v1 | null |
2024-06-07 | Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models | Michał Romaszewski et.al. | 2406.04926v1 | null |
2024-06-07 | ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering | Raphael Gruber et.al. | 2406.04866v1 | link |
2024-06-07 | Experiences from Integrating Large Language Model Chatbots into the Classroom | Arto Hellas et.al. | 2406.04817v1 | null |
2024-06-07 | Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in Large Language Models | Weizhi Tang et.al. | 2406.04800v1 | null |
2024-06-07 | Think out Loud: Emotion Deducing Explanation in Dialogues | Jiangnan Li et.al. | 2406.04758v1 | null |
2024-06-07 | LogiCode: an LLM-Driven Framework for Logical Anomaly Detection | Yiheng Zhang et.al. | 2406.04687v1 | link |
2024-06-07 | LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model | Dongkai Wang et.al. | 2406.04659v1 | link |
2024-06-07 | LinkGPT: Teaching Large Language Models To Predict Missing Links | Zhongmou He et.al. | 2406.04640v1 | null |
2024-06-07 | What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models | Enis Berk Çoban et.al. | 2406.04615v1 | null |
2024-06-07 | StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation | Weike Fang et.al. | 2406.04568v1 | null |
2024-06-07 | SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models | Md Imbesat Hassan Rizvi et.al. | 2406.04566v1 | link |
2024-06-06 | FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models | Max Zhu et.al. | 2406.04501v1 | null |
2024-06-06 | Time Sensitive Knowledge Editing through Efficient Finetuning | Xiou Ge et.al. | 2406.04496v1 | null |
2024-06-06 | On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing | Alexander Kovrigin et.al. | 2406.04464v1 | link |
2024-06-06 | MAIRA-2: Grounded Radiology Report Generation | Shruthi Bannur et.al. | 2406.04449v1 | null |
2024-06-06 | MoralBench: Moral Evaluation of LLMs | Jianchao Ji et.al. | 2406.04428v1 | link |
2024-06-06 | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | Jiaming Liu et.al. | 2406.04339v1 | null |
2024-06-06 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen et.al. | 2406.04300v1 | null |
2024-06-06 | Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | Han Zhang et.al. | 2406.04276v1 | null |
2024-06-06 | Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Ling Yang et.al. | 2406.04271v1 | link |
2024-06-06 | DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | Shangqing Tu et.al. | 2406.04197v1 | link |
2024-06-06 | ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints | Divij Handa et.al. | 2406.04046v1 | null |
2024-06-06 | Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Zonghao Ying et.al. | 2406.04031v1 | link |
2024-06-14 | POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models | Jianben He et.al. | 2406.03843v2 | null |
2024-06-06 | Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering | Yanming Liu et.al. | 2406.03807v1 | link |
2024-06-06 | Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective | Xinhao Yao et.al. | 2406.03768v1 | link |
2024-06-06 | VisLTR: Visualization-in-the-Loop Table Reasoning | Jianing Hao et.al. | 2406.03753v1 | null |
2024-06-06 | A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions | Lei Liu et.al. | 2406.03712v1 | null |
2024-06-06 | Evaluating the World Model Implicit in a Generative Model | Keyon Vafa et.al. | 2406.03689v1 | link |
2024-06-05 | TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools | Avi Caciularu et.al. | 2406.03618v1 | null |
2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474v1 | null |
2024-06-05 | Pre-trained Large Language Models Use Fourier Features to Compute Addition | Tianyi Zhou et.al. | 2406.03445v1 | null |
2024-06-05 | IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | David Ifeoluwa Adelani et.al. | 2406.03368v1 | null |
2024-06-05 | CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning | Xinrui Lin et.al. | 2406.03367v1 | null |
2024-06-06 | Large Language Models as Evaluators for Recommendation Explanations | Xiaoyu Zhang et.al. | 2406.03248v2 | link |
2024-06-05 | Missci: Reconstructing Fallacies in Misrepresented Science | Max Glockner et.al. | 2406.03181v1 | link |
2024-06-05 | Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation | Tingjia Shen et.al. | 2406.03085v1 | null |
2024-06-05 | How Truncating Weights Improves Reasoning in Language Models | Lei Chen et.al. | 2406.03068v1 | null |
2024-06-05 | Verified Code Transpilation with LLMs | Sahil Bhatia et.al. | 2406.03003v1 | null |
2024-06-05 | NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Ancheng Xu et.al. | 2406.02864v1 | link |
2024-06-05 | LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation | Yi-Pei Chen et.al. | 2406.02863v1 | null |
2024-06-05 | Item-Language Model for Conversational Recommendation | Li Yang et.al. | 2406.02844v1 | null |
2024-06-04 | Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | Yusen Zhang et.al. | 2406.02818v1 | null |
2024-06-04 | | François Roewer-Després et.al. | 2406.02804v1 | link |
2024-06-04 | Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities | Wenyue Hua et.al. | 2406.02787v1 | null |
2024-06-04 | Adaptive Preference Scaling for Reinforcement Learning with Human Feedback | Ilgee Hong et.al. | 2406.02764v1 | null |
2024-06-09 | RATT: A Thought Structure for Coherent and Correct LLM Reasoning | Jinghan Zhang et.al. | 2406.02746v2 | null |
2024-06-04 | Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller | Min Cai et.al. | 2406.02721v1 | link |
2024-06-04 | Multiple Choice Questions and Large Language Models: A Case Study with Fictional Medical Data | Maxime Griot et.al. | 2406.02394v1 | link |
2024-06-04 | Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks | Andrew Gambardella et.al. | 2406.02356v1 | null |
2024-06-04 | mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Huiyuan Lai et.al. | 2406.02301v1 | link |
2024-06-04 | Iteration Head: A Mechanistic Study of Chain-of-Thought | Vivien Cabannes et.al. | 2406.02128v1 | null |
2024-06-04 | MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset | Weiqi Wang et.al. | 2406.02106v1 | link |
2024-06-04 | Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data | Haolong Li et.al. | 2406.02100v1 | null |
2024-06-05 | Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | Marianna Nezhurina et.al. | 2406.02061v2 | link |
2024-06-05 | Multimodal Reasoning with Multimodal Knowledge Graph | Junlin Lee et.al. | 2406.02030v2 | null |
2024-06-04 | Why Would You Suggest That? Human Trust in Language Model Responses | Manasi Sharma et.al. | 2406.02018v1 | null |
2024-06-04 | Process-Driven Autoformalization in Lean 4 | Jianqiao Lu et.al. | 2406.01940v1 | link |
2024-06-04 | PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning | Yupeng Zheng et.al. | 2406.01587v2 | null |
2024-06-03 | LoFiT: Localized Fine-tuning on LLM Representations | Fangcong Yin et.al. | 2406.01563v1 | link |
2024-06-03 | FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs | Sushant Gautam et.al. | 2406.01311v1 | null |
2024-06-03 | EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs | Zixuan Dong et.al. | 2406.01238v1 | null |
2024-06-03 | Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph | Guangyi Liu et.al. | 2406.01145v1 | null |
2024-06-03 | SemCoder: Training Code Language Models with Comprehensive Semantics | Yangruibo Ding et.al. | 2406.01006v1 | null |
2024-06-04 | Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic | Xinglin Chen et.al. | 2406.00965v2 | null |
2024-06-04 | MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning | Shuyue Stella Li et.al. | 2406.00922v2 | link |
2024-06-02 | Pretrained Hybrids with MAD Skills | Nicholas Roberts et.al. | 2406.00894v1 | null |
2024-06-02 | OLIVE: Object Level In-Context Visual Embeddings | Timothy Ossowski et.al. | 2406.00872v1 | link |
2024-06-02 | Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection | Chentao Cao et.al. | 2406.00806v1 | null |
2024-06-02 | Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Xiaoyuan Li et.al. | 2406.00755v1 | link |
2024-06-01 | Task Planning for Object Rearrangement in Multi-room Environments | Karan Mirakhor et.al. | 2406.00451v1 | null |
2024-06-01 | Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners | Zhi Zheng et.al. | 2406.00430v1 | null |
2024-06-01 | A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters | Long Hei Matthew Lam et.al. | 2406.00284v1 | link |
2024-06-01 | Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs | Mohammed Saidul Islam et.al. | 2406.00257v1 | null |
2024-06-05 | Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey | Bowen Jiang et.al. | 2406.00252v2 | link |
2024-05-31 | Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training | Maximillian Chen et.al. | 2406.00222v1 | null |
2024-05-31 | Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation | Bernd Bohnet et.al. | 2406.00179v1 | null |
2024-05-31 | QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation | Zhuo Chen et.al. | 2406.00132v1 | null |
2024-05-31 | Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction | Hanxian Huang et.al. | 2406.00115v1 | null |
2024-05-31 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Feiteng Fang et.al. | 2405.20978v1 | null |
2024-06-05 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974v2 | link |
2024-06-03 | Large Language Models are Zero-Shot Next Location Predictors | Ciro Beneduce et.al. | 2405.20962v2 | link |
2024-05-31 | Preemptive Answer "Attacks" on Chain-of-Thought Reasoning | Rongwu Xu et.al. | 2405.20902v1 | null |
2024-05-31 | Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning | Cheng Tan et.al. | 2405.20834v1 | null |
2024-05-27 | Exploring Backdoor Attacks against Large Language Model-based Decision Making | Ruochen Jiao et.al. | 2405.20774v1 | null |
2024-05-31 | Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning | Atharva Gundawar et.al. | 2405.20625v1 | null |
2024-05-30 | Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning | Xinlu Zhang et.al. | 2405.20535v1 | null |
2024-05-30 | SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory | Dipkamal Bhusal et.al. | 2405.20441v1 | null |
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340v1 | null |
2024-05-30 | TAIA: Large Language Models are Out-of-Distribution Data Learners | Shuyang Jiang et.al. | 2405.20192v1 | link |
2024-05-30 | Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory | Hangyeol Kang et.al. | 2405.20189v1 | null |
2024-05-30 | Reasoning about concepts with LLMs: Inconsistencies abound | Rosario Uceda-Sosa et.al. | 2405.20163v1 | null |
2024-05-30 | GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | Costas Mavromatis et.al. | 2405.20139v1 | link |
2024-05-30 | Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation | Chengwei Dai et.al. | 2405.19842v1 | link |
2024-05-30 | VQA Training Sets are Self-play Environments for Generating Few-shot Pools | Tautvydas Misiunas et.al. | 2405.19773v1 | null |
2024-05-30 | Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation | Chengwei Dai et.al. | 2405.19737v1 | link |
2024-05-30 | Enhancing Large Vision Language Models with Self-Training on Image Comprehension | Yihe Deng et.al. | 2405.19716v1 | null |
2024-05-30 | AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization | Jiawei Chen et.al. | 2405.19668v1 | null |
2024-06-01 | Easy Problems That LLMs Get Wrong | Sean Williams et.al. | 2405.19616v2 | link |
2024-05-30 | The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models | Denish Omondi Otieno et.al. | 2405.19578v1 | null |
2024-05-29 | Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models | Venkat Venkatasubramanian et.al. | 2405.19561v1 | null |
2024-05-29 | MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Zhenwen Liang et.al. | 2405.19444v1 | link |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335v1 | null |
2024-06-02 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327v3 | link |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326v1 | null |
2024-05-29 | Towards Next-Generation Urban Decision Support Systems through AI-Powered Generation of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation | Jose Tupayachi et.al. | 2405.19255v1 | null |
2024-05-29 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Ziyang Wang et.al. | 2405.19209v1 | link |
2024-05-29 | Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery | Sounak Lahiri et.al. | 2405.19164v1 | null |
2024-05-29 | PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering | Fangzhi Xu et.al. | 2405.19109v1 | null |
2024-06-02 | Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design | Markus J. Buehler et.al. | 2405.19076v2 | link |
2024-05-29 | Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners | Jiachun Li et.al. | 2405.18915v1 | null |
2024-05-31 | LLMs achieve adult human performance on higher-order theory of mind tasks | Winnie Street et.al. | 2405.18870v2 | null |
2024-06-02 | Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts | S. Mostafa Mousavi et.al. | 2405.18732v2 | null |
2024-05-29 | Efficient Model-agnostic Alignment via Bayesian Persuasion | Fengshuo Bai et.al. | 2405.18718v1 | null |
2024-05-29 | Calibrating Reasoning in Language Models with Internal Consistency | Zhihui Xie et.al. | 2405.18711v1 | null |
2024-05-30 | Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning | Tiansheng Huang et.al. | 2405.18641v2 | link |
2024-05-28 | Don't Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414v1 | null |
2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380v1 | link |
2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377v1 | null |
2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375v1 | link |
2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369v1 | null |
2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361v1 | null |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358v1 | null |
2024-05-28 | Faithful Logical Reasoning via Symbolic Chain-of-Thought | Jundong Xu et.al. | 2405.18357v1 | link |
2024-05-28 | Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning | Renzhi Wang et.al. | 2405.18292v1 | null |
2024-05-28 | A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models | Chengxing Xie et.al. | 2405.18208v1 | null |
2024-05-28 | LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins | Yuchen Xia et.al. | 2405.18092v1 | link |
2024-05-28 | Towards Dialogues for Joint Human-AI Reasoning and Value Alignment | Elfia Bezou-Vrakatseli et.al. | 2405.18073v1 | null |
2024-05-28 | TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | Jaewoo Ahn et.al. | 2405.18027v1 | null |
2024-05-28 | Knowledge Circuits in Pretrained Transformers | Yunzhi Yao et.al. | 2405.17969v1 | link |
2024-05-28 | Self-Guiding Exploration for Combinatorial Problems | Zangir Iklassov et.al. | 2405.17950v1 | link |
2024-05-28 | Arithmetic Reasoning with LLM: Prolog Generation & Permutation | Xiaocheng Yang et.al. | 2405.17893v1 | null |
2024-05-28 | Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action | Zhenyu Pan et.al. | 2405.17822v1 | null |
2024-05-28 | XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference | Shengnan Wang et.al. | 2405.17755v1 | null |
2024-05-28 | CLAIM Your Data: Enhancing Imputation Accuracy with Contextual Large Language Models | Ahatsham Hayat et.al. | 2405.17712v1 | null |
2024-05-27 | Video Enriched Retrieval Augmented Generation Using Aligned Video Captions | Kevin Dela Rosa et.al. | 2405.17706v1 | link |
2024-05-27 | BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments | Yusuf Roohani et.al. | 2405.17631v1 | link |
2024-05-30 | Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | Hao Tang et.al. | 2405.17503v2 | null |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430v1 | null |
2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427v1 | link |
2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418v1 | null |
2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386v1 | link |
2024-05-27 | Assessing LLMs Suitability for Knowledge Graph Completion | Vasile Ionut Remus Iga et.al. | 2405.17249v1 | link |
2024-05-27 | LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | Ziyang Li et.al. | 2405.17238v1 | null |
2024-05-29 | Position: Foundation Agents as the Paradigm Shift for Decision Making | Xiaoqian Liu et.al. | 2405.17009v3 | link |
2024-05-28 | Entity Alignment with Noisy Annotations from Large Language Models | Shengyuan Chen et.al. | 2405.16806v2 | link |
2024-05-27 | TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing | Xinyu Zhang et.al. | 2405.16803v1 | null |
2024-05-29 | AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation | Jianqiao Lu et.al. | 2405.16802v3 | link |
2024-05-28 | Large Scale Knowledge Washing | Yu Wang et.al. | 2405.16720v2 | null |
2024-05-26 | RLSF: Reinforcement Learning via Symbolic Feedback | Piyush Jha et.al. | 2405.16661v1 | null |
2024-05-30 | Meta-Task Planning for Language Agents | Cong Zhang et.al. | 2405.16510v3 | null |
2024-05-26 | M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought | Qiguang Chen et.al. | 2405.16473v1 | link |
2024-05-26 | Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search | Max Liu et.al. | 2405.16450v1 | null |
2024-05-26 | Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models | Jiankun Wang et.al. | 2405.16413v1 | null |
2024-05-28 | SpinQuant: LLM quantization with learned rotations | Zechun Liu et.al. | 2405.16406v2 | null |
2024-05-28 | STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making | Chuanhao Li et.al. | 2405.16376v2 | link |
2024-06-03 | Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge | Brendan Park et.al. | 2405.16277v3 | link |
2024-05-25 | MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | Jikun Kang et.al. | 2405.16265v1 | null |
2024-05-25 | Finetuning Large Language Model for Personalized Ranking | Zhuoxi Bai et.al. | 2405.16127v1 | null |
2024-05-25 | Keypoint-based Progressive Chain-of-Thought Distillation for LLMs | Kaituo Feng et.al. | 2405.16064v1 | null |
2024-05-25 | Streaming Long Video Understanding with Large Language Models | Rui Qian et.al. | 2405.16009v1 | null |
2024-05-30 | SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation | Kun Zhao et.al. | 2405.15924v3 | link |
2024-05-24 | HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis | Shraddha Barke et.al. | 2405.15880v1 | null |
2024-05-24 | Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications | Yang Li et.al. | 2405.15877v1 | null |
2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684v1 | null |
2024-05-24 | M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | Hongyu Wang et.al. | 2405.15638v1 | link |
2024-05-24 | Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges | Jonas Becker et.al. | 2405.15604v1 | link |
2024-05-24 | Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation | Ge Qu et.al. | 2405.15307v1 | link |
2024-05-24 | Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation | Zhiwei Wang et.al. | 2405.15302v1 | null |
2024-05-24 | Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth | Riku Arakawa et.al. | 2405.15250v1 | null |
2024-05-24 | A Solution-based LLM API-using Methodology for Academic Information Seeking | Yuanchun Wang et.al. | 2405.15165v1 | link |
2024-05-24 | From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks | Jacob Russin et.al. | 2405.15164v1 | null |
2024-05-24 | OptLLM: Optimal Assignment of Queries to Large Language Models | Yueyue Liu et.al. | 2405.15130v1 | link |
2024-05-24 | Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning | Yuyue Zhao et.al. | 2405.15114v1 | null |
2024-05-23 | Dissociation of Faithful and Unfaithful Reasoning in LLMs | Evelyn Yee et.al. | 2405.15092v1 | link |
2024-05-23 | OAC: Output-adaptive Calibration for Accurate Post-training Quantization | Ali Edalati et.al. | 2405.15025v1 | null |
2024-05-23 | Agentic Skill Discovery | Xufeng Zhao et.al. | 2405.15019v1 | null |
2024-05-23 | A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | Asaf Yehudai et.al. | 2405.14863v1 | null |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862v1 | null |
2024-05-23 | Efficient Medical Question Answering with Knowledge-Augmented Question Generation | Julien Khlaut et.al. | 2405.14654v1 | null |
2024-05-24 | Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models | Jiyang Zhang et.al. | 2405.14619v2 | null |
2024-05-26 | Explainable Few-shot Knowledge Tracing | Haoxuan Li et.al. | 2405.14391v2 | link |
2024-05-23 | Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | Thomas Greatrix et.al. | 2405.14379v1 | null |
2024-05-23 | JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | Kun Zhou et.al. | 2405.14365v1 | null |
2024-05-23 | DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | Huajian Xin et.al. | 2405.14333v1 | null |
2024-05-26 | Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration | Yang Zhang et.al. | 2405.14314v2 | null |
2024-05-23 | Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning | Jiapu Wang et.al. | 2405.14170v1 | null |
2024-05-23 | Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography | Nhat Chung et.al. | 2405.14169v1 | null |
2024-05-23 | Large Language Models Can Self-Correct with Minimal Effort | Zhenyu Wu et.al. | 2405.14092v1 | null |
2024-05-23 | | Chengkun Cai et.al. | 2405.14075v1 | null |
2024-05-22 | On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | Mudit Verma et.al. | 2405.13966v1 | null |
2024-05-22 | PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | Runlong He et.al. | 2405.13949v1 | link |
2024-05-22 | FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering | Yuan Sui et.al. | 2405.13873v1 | null |
2024-05-29 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | Qiji Zhou et.al. | 2405.13872v2 | null |
2024-05-22 | Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation | Cyril Chhun et.al. | 2405.13769v1 | link |
2024-05-22 | HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model | Mustafa Yildirim et.al. | 2405.13547v1 | null |
2024-05-22 | LIRE: listwise reward enhancement for preference alignment | Mingye Zhu et.al. | 2405.13516v1 | null |
2024-05-22 | Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning | Yuanhao Yue et.al. | 2405.13448v1 | null |
2024-05-22 | Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction | Tingchen Fu et.al. | 2405.13432v1 | null |
2024-05-21 | Investigating Symbolic Capabilities of Large Language Models | Neisarg Dave et.al. | 2405.13209v1 | null |
2024-05-21 | Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding | Rong Gao et.al. | 2405.13206v1 | null |
2024-05-20 | Can Github issues be solved with Tree Of Thoughts? | Ricardo La Rosa et.al. | 2405.13057v1 | link |
2024-05-17 | Surgical Feature-Space Decomposition of LLMs: Why, When and How? | Arnav Chavan et.al. | 2405.13039v1 | null |
2024-05-16 | Can formal argumentative reasoning enhance LLMs performances? | Federico Castagna et.al. | 2405.13036v1 | null |
2024-05-15 | IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues | Diji Yang et.al. | 2405.13021v1 | null |
2024-05-14 | QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models | Wei Wang et.al. | 2405.13014v1 | null |
2024-05-12 | MathDivide: Improved mathematical reasoning by large language models | Saksham Sahai Srivastava et.al. | 2405.13004v1 | null |
2024-05-21 | Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | Zhangyue Yin et.al. | 2405.12939v1 | link |
2024-05-21 | Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | Bilgehan Sel et.al. | 2405.12933v1 | null |
2024-05-21 | DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge | Bufang Yang et.al. | 2405.12541v1 | null |
2024-05-21 | LLM+Reasoning+Planning for supporting incomplete user queries in presence of APIs | Sudhir Agarwal et.al. | 2405.12433v1 | null |
2024-05-20 | Eliciting Problem Specifications via Large Language Models | Robert E. Wray et.al. | 2405.12147v1 | null |
2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130v1 | link |
2024-05-20 | DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | Hao Chen et.al. | 2405.12100v1 | null |
2024-05-20 | KG-RAG: Bridging the Gap Between Knowledge and Creativity | Diego Sanmartin et.al. | 2405.12035v1 | null |
2024-05-20 | Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs | Siyu Lou et.al. | 2405.11880v1 | null |
2024-05-20 | Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities | Junqi Wang et.al. | 2405.11841v1 | link |
2024-05-19 | Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | Zishan Gu et.al. | 2405.11640v1 | null |
2024-05-19 | MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation | Jianbo Dai et.al. | 2405.11430v1 | link |
2024-05-17 | Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations | José Luiz Nunes et.al. | 2405.11100v1 | null |
2024-05-17 | From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | Jace Grandinetti et.al. | 2405.11040v1 | null |
2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825v1 | null |
2024-05-17 | Efficient Multimodal Large Language Models: A Survey | Yizhang Jin et.al. | 2405.10739v1 | link |
2024-05-17 | MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains | Zhaohuan Zhan et.al. | 2405.10620v1 | null |
2024-05-17 | RDRec: Rationale Distillation for LLM-based Recommendation | Xinfeng Wang et.al. | 2405.10587v1 | link |
2024-05-17 | Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Jie Zhu et.al. | 2405.10542v1 | link |
2024-05-16 | Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification | Jinge Wu et.al. | 2405.10440v1 | null |
2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255v1 | null |
2024-05-16 | A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | Xuanfan Ni et.al. | 2405.10251v1 | null |
2024-05-16 | LFED: A Literary Fiction Evaluation Dataset for Large Language Models | Linhao Yu et.al. | 2405.10166v1 | link |
2024-05-16 | SEEK: Semantic Reasoning for Object Goal Navigation in Real World Inspection Tasks | Muhammad Fadhil Ginting et.al. | 2405.09822v1 | null |
2024-05-16 | LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery | Pingchuan Ma et.al. | 2405.09783v1 | null |
2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395v1 | null |
2024-05-15 | Exploring the Potential of Large Language Models for Automation in Technical Customer Service | Jochen Wulf et.al. | 2405.09161v1 | null |
2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603v1 | null |
2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502v1 | link |
2024-05-14 | PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles | Satya Kesav Gundabathula et.al. | 2405.08373v1 | null |
2024-05-13 | LLM Theory of Mind and Alignment: Opportunities and Risks | Winnie Street et.al. | 2405.08154v1 | null |
2024-05-13 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Yinzhu Quan et.al. | 2405.07938v1 | null |
2024-05-13 | Generating Human Motion in 3D Scenes from Text Descriptions | Zhi Cen et.al. | 2405.07784v1 | null |
2024-05-13 | Backdoor Removal for Generative Large Language Models | Haoran Li et.al. | 2405.07667v1 | null |
2024-05-13 | MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Shuo Yin et.al. | 2405.07551v1 | null |
2024-05-13 | Oedipus: LLM-enchanced Reasoning CAPTCHA Solver | Gelei Deng et.al. | 2405.07496v1 | null |
2024-05-14 | MedConceptsQA: Open Source Medical Concepts QA Benchmark | Ofir Ben Shoham et.al. | 2405.07348v2 | link |
2024-05-12 | Learnable Tokenizer for LLM-based Generative Recommendation | Wenjie Wang et.al. | 2405.07314v1 | null |
2024-05-12 | MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks | Xiaocui Yang et.al. | 2405.07229v1 | link |
2024-05-11 | Automating Thematic Analysis: How LLMs Analyse Controversial Topics | Awais Hameed Khan et.al. | 2405.06919v1 | null |
2024-05-09 | Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models | Yitian Li et.al. | 2405.06707v1 | null |
2024-05-09 | LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | Zhuoxuan Jiang et.al. | 2405.06705v1 | null |
2024-05-07 | SUTRA: Scalable Multilingual Language Model Architecture | Abhijit Bendale et.al. | 2405.06694v1 | null |
2024-05-07 | Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering | Akhil Arora et.al. | 2405.06691v1 | null |
2024-05-05 | Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Jun Zhao et.al. | 2405.06680v1 | null |
2024-05-10 | Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus | Filipe Marinho Rocha et.al. | 2405.06399v1 | null |
2024-05-09 | LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models | Ruihao Gong et.al. | 2405.06001v1 | link |
2024-05-09 | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | Dan Qiao et.al. | 2405.05957v1 | link |
2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956v1 | link |
2024-05-09 | Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes | Ziang Guo et.al. | 2405.05885v1 | null |
2024-05-09 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov et.al. | 2405.05824v1 | link |
2024-05-09 | Redefining Information Retrieval of Structured Database via Large Language Models | Mingzhu Wang et.al. | 2405.05508v1 | null |
2024-05-08 | SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants | Masoud Moghani et.al. | 2405.05226v1 | null |
2024-05-08 | MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning | Inderjeet Nair et.al. | 2405.05189v1 | null |
2024-05-08 | QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs | Weijia Zhang et.al. | 2405.05109v1 | null |
2024-05-08 | Federated Adaptation for Foundation Model-based Recommendations | Chunxu Zhang et.al. | 2405.04840v1 | link |
2024-05-08 | ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation | Ana Brassard et.al. | 2405.04818v1 | link |
2024-05-08 | Chain of Thoughtlessness: An Analysis of CoT in Planning | Kaya Stechly et.al. | 2405.04776v1 | null |
2024-05-08 | BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models | Chu Fei Luo et.al. | 2405.04756v1 | link |
2024-05-07 | Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | Emre Can Acikgoz et.al. | 2405.04685v1 | null |
2024-05-07 | Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics | Hanlin Zhu et.al. | 2405.04669v1 | null |
2024-05-07 | ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning | Jing Lin et.al. | 2405.04533v1 | null |
2024-05-08 | Unveiling Disparities in Web Task Handling Between Human and Web Agent | Kihoon Son et.al. | 2405.04497v2 | null |
2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382v1 | null |
2024-05-07 | NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions | Elliot Gestrin et.al. | 2405.04215v1 | null |
2024-05-07 | D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models | Duygu Altinok et.al. | 2405.04170v1 | link |
2024-05-07 | Optimizing Language Model's Reasoning Abilities with Weak Supervision | Yongqi Tong et.al. | 2405.04086v1 | null |
2024-05-14 | Generating Probabilistic Scenario Programs from Natural Language | Karim Elmaaroufi et.al. | 2405.03709v2 | null |
2024-05-08 | How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690v2 | null |
2024-05-06 | Language-Image Models with 3D Understanding | Jang Hyun Cho et.al. | 2405.03685v1 | null |
2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594v1 | null |
2024-05-23 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553v2 | link |
2024-05-15 | MAmmoTH2: Scaling Instructions from the Web | Xiang Yue et.al. | 2405.03548v3 | null |
2024-05-06 | Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning | Yubo Mai et.al. | 2405.03509v1 | null |
2024-05-06 | Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom | Bo Wang et.al. | 2405.03371v1 | link |
2024-05-06 | MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline | Mohamed Yaseen Jabarulla et.al. | 2405.03359v1 | link |
2024-05-06 | WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Yuanhan Zhang et.al. | 2405.03272v1 | null |
2024-05-06 | CRAFT: Extracting and Tuning Cultural Instructions from the Wild | Bin Wang et.al. | 2405.03138v1 | link |
2024-05-05 | High Order Reasoning for Time Critical Recommendation in Evidence-based Medicine | Manjiang Yu et.al. | 2405.03010v1 | null |
2024-05-05 | MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning | Wenqi Shi et.al. | 2405.03000v1 | null |
2024-05-05 | Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy | Aftab Hussain et.al. | 2405.02828v1 | null |
2024-05-04 | CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions | Hanchong Zhang et.al. | 2405.02712v1 | link |
2024-05-04 | A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare | Thomas Yu Chow Tam et.al. | 2405.02559v1 | null |
2024-05-20 | GigSense: An LLM-Infused Tool for Workers' Collective Intelligence | Kashif Imteyaz et.al. | 2405.02528v2 | null |
2024-05-09 | REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs | Deepa Tilwani et.al. | 2405.02228v2 | null |
2024-05-03 | Argumentative Large Language Models for Explainable and Contestable Decision-Making | Gabriel Freedman et.al. | 2405.02079v1 | null |
2024-05-03 | Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo | Mahmoud Masoud et.al. | 2405.01997v1 | null |
2024-05-03 | Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems | Chuang Li et.al. | 2405.01868v1 | null |
2024-05-02 | ALCM: Autonomous LLM-Augmented Causal Discovery Framework | Elahe Khatibi et.al. | 2405.01744v1 | null |
2024-05-08 | Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning | Tianle Xia et.al. | 2405.01649v3 | null |
2024-04-30 | Large Language Model Agent for Fake News Detection | Xinyi Li et.al. | 2405.01593v1 | null |
2024-04-28 | Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications | Sujit Khanna et.al. | 2405.01585v1 | null |
2024-05-02 | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning | Shihao Wang et.al. | 2405.01533v1 | link |
2024-05-02 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin et.al. | 2405.01502v1 | link |
2024-05-08 | Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | Xin Quan et.al. | 2405.01379v2 | null |
2024-05-02 | GAIA: A General AI Assistant for Intelligent Accelerator Operations | Frank Mayet et.al. | 2405.01359v1 | null |
2024-05-02 | The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights | Wenhao Zhu et.al. | 2405.01345v1 | link |
2024-05-02 | Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation | David Eric Austin et.al. | 2405.00981v1 | null |
2024-05-02 | CACTUS: Chemistry Agent Connecting Tool-Usage to Science | Andrew D. McNaughton et.al. | 2405.00972v1 | link |
2024-04-25 | Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models | Xu Ji et.al. | 2405.00718v1 | null |
2024-04-25 | Large Language Models in Healthcare: A Comprehensive Benchmark | Andrew Liu et.al. | 2405.00716v1 | null |
2024-05-01 | HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models | Ningke Li et.al. | 2405.00648v1 | null |
2024-05-01 | Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Yuxi Xie et.al. | 2405.00451v1 | null |
2024-05-01 | RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models | Mohamed Manzour Hussien et.al. | 2405.00449v1 | null |
2024-05-01 | Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | Leonardo Ranaldi et.al. | 2405.00402v1 | null |
2024-05-01 | AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts | Zefang Liu et.al. | 2405.00361v1 | link |
2024-05-03 | Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model | Yu Cui et.al. | 2405.00338v2 | null |
2024-05-03 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic | Hugh Zhang et.al. | 2405.00332v3 | null |
2024-05-01 | DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training | Bhuvanesh Verma et.al. | 2405.00321v1 | null |
2024-04-30 | General Purpose Verification for Chain of Thought Prompting | Robert Vacareanu et.al. | 2405.00204v1 | null |
2024-04-30 | Better & Faster Large Language Models via Multi-token Prediction | Fabian Gloeckle et.al. | 2404.19737v1 | null |
2024-04-30 | Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Chun Feng et.al. | 2404.19696v1 | null |
2024-04-30 | Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom | Shisen Yue et.al. | 2404.19509v1 | link |
2024-05-01 | Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings | Guobin Shen et.al. | 2404.19438v2 | null |
2024-04-30 | Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships | D. Panas et.al. | 2404.19432v1 | null |
2024-04-30 | Evaluating Telugu Proficiency in Large Language Models: A Comparative Analysis of ChatGPT and Gemini | Katikela Sreeharsha Kishore et.al. | 2404.19369v1 | null |
2024-04-30 | Multi-hop Question Answering over Knowledge Graphs using Large Language Models | Abir Chakraborty et.al. | 2404.19234v1 | null |
2024-04-30 | Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Jiading Fang et.al. | 2404.19221v1 | null |
2024-04-29 | SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications | Liang Xu et.al. | 2404.19063v1 | null |
2024-04-29 | Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models | Houjun Liu et.al. | 2404.19055v1 | null |
2024-04-29 | Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs | Bahar Radmehr et.al. | 2404.18978v1 | null |
2024-04-29 | Benchmarking Benchmark Leakage in Large Language Models | Ruijie Xu et.al. | 2404.18824v1 | link |
2024-04-29 | PECC: Problem Extraction and Coding Challenges | Patrick Haller et.al. | 2404.18766v1 | link |
2024-04-29 | Injecting Salesperson's Dialogue Strategies in Large Language Models with Chain-of-Thought Reasoning | Wen-Yu Chang et.al. | 2404.18564v1 | null |
2024-04-29 | Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in | Utkarsh Agarwal et.al. | 2404.18460v1 | null |
2024-04-29 | FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | Wei Li et.al. | 2404.18359v1 | null |
2024-04-30 | Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages | David Ifeoluwa Adelani et.al. | 2404.18286v2 | null |
2024-04-28 | Logic Agent: Enhancing Validity with Logic Rule Invocation | Hanmeng Liu et.al. | 2404.18130v1 | null |
2024-04-28 | Generative AI for Low-Carbon Artificial Intelligence of Things | Jinbo Wen et.al. | 2404.18077v1 | null |
2024-04-27 | CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | Kaixuan Huang et.al. | 2404.18021v1 | null |
2024-04-27 | Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction | Guozheng Li et.al. | 2404.17809v1 | null |
2024-04-26 | CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving | Pei Chen et.al. | 2404.17729v1 | link |
2024-04-26 | PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games | Qinglin Zhu et.al. | 2404.17662v1 | link |
2024-05-09 | Large Language Model Agent as a Mechanical Designer | Yayati Jadhav et.al. | 2404.17525v2 | null |
2024-04-29 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524v2 | null |
2024-04-26 | Enhancing Legal Compliance and Regulation Analysis with Large Language Models | Shabnam Hassani et.al. | 2404.17522v1 | null |
2024-04-26 | A Comprehensive Evaluation on Event Reasoning of Large Language Models | Zhengwei Tao et.al. | 2404.17513v1 | link |
2024-04-26 | Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System | Robin Schmucker et.al. | 2404.17460v1 | null |
2024-04-26 | Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Yunxiang Zhang et.al. | 2404.17140v1 | null |
2024-04-26 | Make Your LLM Fully Utilize the Context | Shengnan An et.al. | 2404.16811v2 | link |
2024-04-25 | Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning | Tianhui Zhang et.al. | 2404.16807v1 | null |
2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754v1 | null |
2024-04-25 | Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents | Giorgio Piatti et.al. | 2404.16698v1 | null |
2024-04-25 | EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning | Hongxia Xie et.al. | 2404.16670v1 | link |
2024-04-25 | Evolutionary Large Language Models for Hardware Security: A Comparative Survey | Mohammad Akyash et.al. | 2404.16651v1 | null |
2024-04-25 | Evaluating Consistency and Reasoning Capabilities of Large Language Models | Yash Saxena et.al. | 2404.16478v1 | null |
2024-04-25 | List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | An Yan et.al. | 2404.16375v1 | link |
2024-04-24 | The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms | Yu Gao et.al. | 2404.16158v1 | null |
2024-04-24 | Cantor: Inspiring Multimodal Chain-of-Thought of MLLM | Timin Gao et.al. | 2404.16033v1 | null |
2024-04-24 | GeckOpt: LLM System Efficiency via Intent-Based Tool Selection | Michael Fore et.al. | 2404.15804v1 | null |
2024-04-24 | Leveraging Large Language Models for Multimodal Search | Oriol Barbany et.al. | 2404.15790v1 | null |
2024-04-24 | Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs | Yu Xia et.al. | 2404.15676v1 | null |
2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578v1 | null |
2024-04-23 | Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar et.al. | 2404.15522v1 | link |
2024-04-25 | ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models | Weizhi Tang et.al. | 2404.15515v2 | null |
2024-04-23 | Re-Thinking Inverse Graphics With Large Language Models | Peter Kulits et.al. | 2404.15228v1 | null |
2024-04-23 | Regressive Side Effects of Training Language Models to Mimic Student Misconceptions | Shashank Sonkar et.al. | 2404.15156v1 | null |
2024-04-23 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild et.al. | 2404.15146v1 | null |
2024-04-28 | Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Reasoners | Qihuang Zhong et.al. | 2404.14963v2 | null |
2024-04-23 | Graph Machine Learning in the Era of Large Language Models (LLMs) | Wenqi Fan et.al. | 2404.14928v1 | null |
2024-04-23 | Pattern-Aware Chain-of-Thought Prompting in Large Language Models | Yufeng Zhang et.al. | 2404.14812v1 | null |
2024-04-23 | A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications | Wenbo Shang et.al. | 2404.14809v1 | null |
2024-04-23 | Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches | Clément Christophe et.al. | 2404.14779v1 | null |
2024-04-23 | CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning | Ling Yue et.al. | 2404.14777v1 | null |
2024-04-23 | Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Amir Saeidi et.al. | 2404.14723v1 | null |
2024-04-23 | Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | Qingrong He et.al. | 2404.14705v1 | null |
2024-04-23 | NExT: Teaching Large Language Models to Reason about Code Execution | Ansong Ni et.al. | 2404.14662v1 | null |
2024-04-26 | Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Mengzhao Jia et.al. | 2404.14604v3 | null |
2024-04-22 | Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering | Li Jiapeng et.al. | 2404.14464v1 | null |
2024-04-14 | Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing | Qiang Hu et.al. | 2404.14419v1 | null |
2024-04-22 | An Artificial Neuron for Enhanced Problem Solving in Large Language Models | Sumedh Rasal et.al. | 2404.14222v1 | null |
2024-04-22 | Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction | Zheye Deng et.al. | 2404.14215v1 | link |
2024-04-24 | Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion | Yingxuan Li et.al. | 2404.13993v2 | null |
2024-04-22 | Information Re-Organization Improves Reasoning in Large Language Models | Xiaoxia Cheng et.al. | 2404.13985v1 | null |
2024-04-22 | MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Boning Zhang et.al. | 2404.13925v1 | link |
2024-04-22 | Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models | Yukyung Lee et.al. | 2404.13919v1 | null |
2024-04-22 | EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning | Mingjie Ma et.al. | 2404.13847v1 | null |
2024-04-24 | MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | Yifan Jiang et.al. | 2404.13591v2 | link |
2024-04-20 | Large Language Models as Test Case Generators: Performance Evaluation and Enhancement | Kefan Li et.al. | 2404.13340v1 | null |
2024-05-03 | LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models | Mouhamed Amine Bouchiha et.al. | 2404.13236v2 | link |
2024-04-19 | Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging | Chia-Hsuan Chang et.al. | 2404.13149v1 | null |
2024-04-17 | TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection | Xuechen Zhang et.al. | 2404.13082v1 | null |
2024-04-14 | Evidence from counterfactual tasks supports emergent analogical reasoning in large language models | Taylor Webb et.al. | 2404.13070v1 | link |
2024-04-19 | Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Biyang Guo et.al. | 2404.13033v1 | link |
2024-04-24 | Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models | Yian Li et.al. | 2404.12966v2 | null |
2024-04-29 | Large Language Models for Networking: Workflow, Advances and Challenges | Chang Liu et.al. | 2404.12901v2 | null |
2024-04-19 | Towards Logically Consistent Language Models via Probabilistic Reasoning | Diego Calanzone et.al. | 2404.12843v1 | null |
2024-04-19 | TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Jingqun Tang et.al. | 2404.12803v1 | null |
2024-04-19 | Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? | Chengwei Qin et.al. | 2404.12728v1 | null |
2024-04-19 | Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration | Yichong Huang et.al. | 2404.12715v1 | null |
2024-04-22 | Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs | Boyang Yang et.al. | 2404.12636v2 | null |
2024-04-18 | BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models | Yu Feng et.al. | 2404.12494v1 | null |
2024-04-18 | NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models | Abhinav Rao et.al. | 2404.12464v1 | null |
2024-04-25 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390v2 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372v1 | null |
2024-04-18 | Large Language Models in Targeted Sentiment Analysis | Nicolay Rusnachenko et.al. | 2404.12342v1 | link |
2024-04-18 | Normative Requirements Operationalization with Large Language Models | Nick Feng et.al. | 2404.12335v1 | null |
2024-04-18 | Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Ye Tian et.al. | 2404.12253v1 | null |
2024-04-19 | AccidentBlip2: Accident Detection With Multi-View MotionBlip2 | Yihua Shao et.al. | 2404.12149v2 | link |
2024-04-18 | RAGAR, Your Falsehood RADAR: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models | M. Abdul Khaliq et.al. | 2404.12065v1 | null |
2024-04-18 | EVIT: Event-Oriented Instruction Tuning for Event Reasoning | Zhengwei Tao et.al. | 2404.11978v1 | null |
2024-04-18 | Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools | Yilun Hao et.al. | 2404.11891v1 | null |
2024-04-18 | CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models | Minjung Shin et.al. | 2404.11835v1 | null |
2024-04-19 | Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study | Zooey Nguyen et.al. | 2404.11792v2 | null |
2024-04-21 | Missed Connections: Lateral Thinking Puzzles for Large Language Models | Graham Todd et.al. | 2404.11730v2 | null |
2024-04-17 | How often are errors in natural language reasoning due to paraphrastic variability? | Neha Srikanth et.al. | 2404.11717v1 | null |
2024-04-17 | Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Yue Zhou et.al. | 2404.11500v1 | link |
2024-04-17 | Exploring the Transferability of Visual Prompting for Multimodal Large Language Models | Yichi Zhang et.al. | 2404.11207v1 | link |
2024-04-17 | Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales | Minghe Gao et.al. | 2404.11129v1 | null |
2024-04-17 | TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment | Qinfeng Li et.al. | 2404.11121v1 | null |
2024-04-18 | ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models | Trong-Hieu Nguyen et.al. | 2404.11086v2 | null |
2024-04-17 | On the Empirical Complexity of Reasoning and Planning in LLMs | Liwei Kang et.al. | 2404.11041v1 | null |
2024-04-17 | Empowering Large Language Models on Robotic Manipulation with Affordance Prompting | Guangran Cheng et.al. | 2404.11027v1 | null |
2024-04-17 | Many-Shot In-Context Learning | Rishabh Agarwal et.al. | 2404.11018v1 | null |
2024-04-16 | Self-playing Adversarial Language Game Enhances LLM Reasoning | Pengyu Cheng et.al. | 2404.10642v1 | link |
2024-04-16 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618v1 | null |
2024-04-16 | Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases | Yanze Li et.al. | 2404.10595v1 | null |
2024-04-16 | CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity | Moshe Berchansky et.al. | 2404.10513v1 | null |
2024-04-16 | MEEL: Multi-Modal Event Evolution Learning | Zhengwei Tao et.al. | 2404.10429v1 | link |
2024-04-16 | Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering | Yuqi Wang et.al. | 2404.10384v1 | null |
2024-04-16 | Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards | Hyeonbin Hwang et.al. | 2404.10346v1 | link |
2024-04-28 | RLRF: Reinforcement Learning from Reflection through Debates as Feedback for Bias Mitigation in LLMs | Ruoxi Cheng et.al. | 2404.10160v2 | null |
2024-04-15 | TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition | Md Mahadi Hasan Nahid et.al. | 2404.10150v1 | link |
2024-04-15 | ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis | Aashish Anantha Ramakrishnan et.al. | 2404.10141v1 | link |
2024-04-15 | A Survey on Deep Learning for Theorem Proving | Zhaoyu Li et.al. | 2404.09939v1 | link |
2024-04-15 | Compression Represents Intelligence Linearly | Yuzhen Huang et.al. | 2404.09937v1 | link |
2024-04-15 | AI-Driven Statutory Reasoning via Software Engineering Methods | Rohan Padhye et.al. | 2404.09868v1 | null |
2024-04-15 | Reimagining Self-Adaptation in the Age of Large Language Models | Raghav Donakanti et.al. | 2404.09866v1 | null |
2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717v1 | null |
2024-04-15 | Generative AI for Game Theory-based Mobile Networking | Long He et.al. | 2404.09699v1 | null |
2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632v1 | link |
2024-04-15 | Bridging the Gap between Different Vocabularies for LLM Ensemble | Yangyifan Xu et.al. | 2404.09492v1 | link |
2024-04-15 | Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning | Sungwon Han et.al. | 2404.09491v1 | link |
2024-04-15 | MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems | Kaixin Li et.al. | 2404.09486v1 | link |
2024-04-14 | A Survey on Integration of Large Language Models with Intelligent Robots | Yeseung Kim et.al. | 2404.09228v1 | null |
2024-04-16 | Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models | Xiaoshu Chen et.al. | 2404.09170v2 | null |
2024-04-14 | When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models | Yanhong Li et.al. | 2404.09129v1 | null |
2024-04-13 | CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting | Zukang Yang et.al. | 2404.09077v1 | link |
2024-04-12 | "Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations | James F. Mullen Jr et.al. | 2404.08827v1 | null |
2024-04-12 | LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Junchi Wang et.al. | 2404.08767v1 | link |
2024-04-11 | MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting | Avinash Anand et.al. | 2404.08704v1 | null |
2024-04-10 | Apollonion: Profile-centric Dialog Agent | Shangyu Chen et.al. | 2404.08692v1 | null |
2024-04-06 | ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Simone Tedeschi et.al. | 2404.08676v1 | link |
2024-04-12 | Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir et.al. | 2404.08589v1 | link |
2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506v1 | link |
2024-04-12 | Strategic Interactions between Large Language Models-based Agents in Beauty Contests | Siting Lu et.al. | 2404.08492v1 | null |
2024-04-12 | Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian | Stefano De Paoli et.al. | 2404.08488v1 | null |
2024-04-11 | Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs | Jierui Li et.al. | 2404.08148v1 | null |
2024-04-11 | Data-Augmentation-Based Dialectal Adaptation for LLMs | Fahim Faisal et.al. | 2404.08092v1 | link |
2024-04-10 | Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition | Kehua Feng et.al. | 2404.08008v1 | link |
2024-04-17 | LaVy: Vietnamese Multimodal Large Language Model | Chi Tran et.al. | 2404.07922v4 | link |
2024-04-11 | ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs | Lei Sun et.al. | 2404.07677v1 | null |
2024-04-11 | WESE: Weak Exploration to Strong Exploitation for LLM Agents | Xu Huang et.al. | 2404.07456v1 | null |
2024-04-11 | Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Kanchana Ranasinghe et.al. | 2404.07449v1 | null |
2024-04-10 | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Bowen Jin et.al. | 2404.07103v1 | link |
2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078v1 | link |
2024-04-10 | Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study | Hongru Du et.al. | 2404.06962v1 | link |
2024-04-10 | Vision-Language Model-based Physical Reasoning for Robot Liquid Perception | Wenqiang Lai et.al. | 2404.06904v1 | null |
2024-04-09 | GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks | Kaylee Burns et.al. | 2404.06645v1 | null |
2024-04-09 | Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language? | Omid Ghahroodi et.al. | 2404.06644v1 | null |
2024-04-09 | AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Luca Gioacchini et.al. | 2404.06411v1 | link |
2024-04-09 | Model Generation from Requirements with LLMs: an Exploratory Study | Alessio Ferrari et.al. | 2404.06371v1 | null |
2024-04-21 | AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning | Senkang Hu et.al. | 2404.06345v2 | null |
2024-04-09 | DRE: Generating Recommendation Explanations by Aligning Large Language Models at Data-level | Shen Gao et.al. | 2404.06311v1 | null |
2024-04-09 | Multimodal Road Network Generation Based on Large Language Model | Jiajing Chen et.al. | 2404.06227v1 | null |
2024-04-08 | Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | Ruiqi Zhang et.al. | 2404.05868v1 | null |
2024-04-08 | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Keen You et.al. | 2404.05719v1 | null |
2024-04-08 | Evaluating Mathematical Reasoning Beyond Accuracy | Shijie Xia et.al. | 2404.05692v1 | link |
2024-04-18 | CoReS: Orchestrating the Dance of Reasoning and Segmentation | Xiaoyi Bao et.al. | 2404.05673v2 | null |
2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590v1 | null |
2024-04-08 | Evaluating Interventional Reasoning Capabilities of Large Language Models | Tejas Kasetty et.al. | 2404.05545v1 | null |
2024-04-08 | HAMMR: HierArchical MultiModal React agents for generic VQA | Lluis Castrejon et.al. | 2404.05465v1 | null |
2024-04-11 | RoT: Enhancing Large Language Models with Reflection on Search Trees | Wenyang Hui et.al. | 2404.05449v2 | link |
2024-04-08 | Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models | Yutao Ouyang et.al. | 2404.05291v1 | null |
2024-04-08 | LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | Shibo Hao et.al. | 2404.05221v1 | null |
2024-04-08 | LLM-BT: Performing Robotic Adaptive Tasks based on Large Language Models and Behavior Trees | Haotian Zhou et.al. | 2404.05134v1 | null |
2024-04-07 | Facial Affective Behavior Analysis with Instruction Tuning | Yifan Li et.al. | 2404.05052v1 | null |
2024-04-07 | MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Zihao Wei et.al. | 2404.04990v1 | link |
2024-04-07 | SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials | Mael Jullien et.al. | 2404.04963v1 | null |
2024-04-07 | RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models | Qi Lv et.al. | 2404.04929v1 | null |
2024-04-07 | LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead | Junda He et.al. | 2404.04834v1 | null |
2024-04-07 | FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | Yukti Makhija et.al. | 2404.04817v1 | null |
2024-04-07 | GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling | Hritik Bansal et.al. | 2404.04763v1 | null |
2024-04-06 | Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | Peihan Li et.al. | 2404.04752v1 | null |
2024-04-06 | Navigating the Landscape of Hint Generation Research: From the Past to the Future | Anubhav Jangra et.al. | 2404.04728v1 | null |
2024-04-06 | Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology | Dyke Ferber et.al. | 2404.04667v1 | null |
2024-04-06 | Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | Zaid Khan et.al. | 2404.04627v1 | null |
2024-04-06 | IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials | Shreyasi Mandal et.al. | 2404.04510v1 | link |
2024-04-05 | Exploring Autonomous Agents through the Lens of Large Language Models: A Review | Saikat Barua et.al. | 2404.04442v1 | null |
2024-04-05 | Assisting humans in complex comparisons: automated information comparison at scale | Truman Yuen et.al. | 2404.04351v1 | null |
2024-04-05 | Koala: Key frame-conditioned long video-LLM | Reuben Tan et.al. | 2404.04346v1 | null |
2024-04-04 | CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering | Nirmalie Wiratunga et.al. | 2404.04302v1 | link |
2024-04-04 | Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding | Yanda Li et.al. | 2404.04293v1 | null |
2024-04-05 | Physical Property Understanding from Language-Embedded Feature Fields | Albert J. Zhai et.al. | 2404.04242v1 | null |
2024-04-05 | Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Harsh Kohli et.al. | 2404.04237v1 | null |
2024-04-05 | Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer | Hele-Andra Kuulmets et.al. | 2404.04042v1 | null |
2024-04-05 | Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning | Gawon Choi et.al. | 2404.03891v1 | link |
2024-04-08 | SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models | Hyeonwoo Kim et.al. | 2404.03887v2 | null |
2024-04-04 | Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian et.al. | 2404.03647v1 | null |
2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623v1 | null |
2024-04-04 | Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Wenshan Wu et.al. | 2404.03622v1 | null |
2024-04-04 | Sailor: Open Language Models for South-East Asia | Longxu Dou et.al. | 2404.03608v1 | link |
2024-04-04 | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi et.al. | 2404.03602v1 | link |
2024-04-04 | Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models | Yantao Liu et.al. | 2404.03577v1 | link |
2024-04-04 | Edisum: Summarizing and Explaining Wikipedia Edits at Scale | Marija Šakota et.al. | 2404.03428v1 | link |
2024-04-04 | Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | Jooyoung Lee et.al. | 2404.03414v1 | null |
2024-04-04 | nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States | Nicolay Rusnachenko et.al. | 2404.03361v1 | link |
2024-04-04 | Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics | Fangru Lin et.al. | 2404.03301v1 | link |
2024-04-04 | The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models | Noah Y. Siegel et.al. | 2404.03189v1 | null |
2024-04-04 | Robust Pronoun Use Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased? | Vagrant Gautam et.al. | 2404.03134v1 | link |
2024-04-10 | An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models | Emmy Liu et.al. | 2404.03028v2 | null |
2024-04-03 | Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding | Gaia Carenini et.al. | 2404.02983v1 | null |
2024-04-03 | Explainable Traffic Flow Prediction with Large Language Models | Xusen Guo et.al. | 2404.02937v1 | null |
2024-04-03 | KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking | Jiawei Zhang et.al. | 2404.02935v1 | link |
2024-04-03 | GreedLlama: Performance of Financial Value-Aligned Large Language Models in Moral Reasoning | Jeffy Yu et.al. | 2404.02934v1 | null |
2024-04-03 | I-Design: Personalized LLM Interior Designer | Ata Çelen et.al. | 2404.02838v1 | null |
2024-04-03 | Empowering Biomedical Discovery with AI Agents | Shanghua Gao et.al. | 2404.02831v1 | null |
2024-04-05 | A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches | Zhigen Zhao et.al. | 2404.02817v2 | null |
2024-04-03 | Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models | Hyungjoo Chae et.al. | 2404.02575v1 | null |
2024-04-03 | VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments | Bufang Yang et.al. | 2404.02508v1 | null |
2024-04-03 | Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Amirhossein Abaskohi et.al. | 2404.02403v1 | link |
2024-04-02 | | Gurusha Juneja et.al. | 2404.02255v1 | null |
2024-04-02 | Advancing LLM Reasoning Generalists with Preference Trees | Lifan Yuan et.al. | 2404.02078v1 | link |
2024-04-04 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060v2 | link |
2024-04-02 | Large Language Models for Orchestrating Bimanual Robots | Kun Chu et.al. | 2404.02018v1 | null |
2024-04-13 | HyperCLOVA X Technical Report | Kang Min Yoo et.al. | 2404.01954v2 | null |
2024-04-02 | Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey | Philipp Mondorf et.al. | 2404.01869v1 | null |
2024-04-02 | Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation | Shanshan Feng et.al. | 2404.01855v1 | link |
2024-04-03 | Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation | Zhouhao Sun et.al. | 2404.01677v2 | null |
2024-04-02 | METAL: Towards Multilingual Meta-Evaluation | Rishav Hada et.al. | 2404.01667v1 | null |
2024-04-02 | InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis | Luoxuan Weng et.al. | 2404.01644v1 | null |
2024-04-01 | Syntactic Robustness for LLM-based Code Generation | Laboni Sarker et.al. | 2404.01535v1 | null |
2024-04-01 | Are large language models superhuman chemists? | Adrian Mirza et.al. | 2404.01475v1 | null |
2024-04-01 | Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs | Pengda Wang et.al. | 2404.01461v1 | null |
2024-03-31 | CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs | Jingzhe Shi et.al. | 2404.01343v1 | null |
2024-04-01 | FABLES: Evaluating faithfulness and content selection in book-length summarization | Yekyung Kim et.al. | 2404.01261v1 | link |
2024-04-01 | A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules | Xiang Li et.al. | 2404.01245v1 | null |
2024-04-01 | LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models | Yadong Zhang et.al. | 2404.01230v1 | null |
2024-04-01 | Enhancing Reasoning Capacity of SLM using Cognitive Enhancement | Jonathan Pan et.al. | 2404.01135v1 | null |
2024-04-01 | Enabling Memory Safety of C Programs using LLMs | Nausheen Mohammed et.al. | 2404.01096v1 | null |
2024-04-01 | Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning | Rongjie Li et.al. | 2404.00909v1 | null |
2024-04-02 | An Abundance of Katherines: The Game Theory of Baby Naming | Katy Blumer et.al. | 2404.00732v2 | null |
2024-03-30 | Multi-hop Question Answering under Temporal Knowledge Editing | Keyuan Cheng et.al. | 2404.00492v1 | null |
2024-04-04 | Planning and Editing What You Retrieve for Enhanced Tool Learning | Tenghao Huang et.al. | 2404.00450v2 | link |
2024-03-30 | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | Hyunjae Kim et.al. | 2404.00376v1 | null |
2024-03-30 | Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Ankit Satpute et.al. | 2404.00344v1 | link |
2024-03-30 | Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World | Guande Wu et.al. | 2404.00246v1 | link |
2024-03-30 | Aligning Large Language Models with Recommendation Knowledge | Yuwei Cao et.al. | 2404.00245v1 | null |
2024-03-30 | DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference | Jinwei Yao et.al. | 2404.00242v1 | null |
2024-03-30 | Multi-Conditional Ranking with Large Language Models | Pouya Pezeshkpour et.al. | 2404.00211v1 | link |
2024-03-30 | EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs | Cheng Jiayang et.al. | 2404.00209v1 | link |
2024-03-30 | Conceptual and Unbiased Reasoning in Language Models | Ben Zhou et.al. | 2404.00205v1 | null |
2024-03-29 | Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections | Ahmad Diab et.al. | 2404.00141v1 | null |
2024-03-29 | Measuring Taiwanese Mandarin Language Understanding | Po-Heng Chen et.al. | 2403.20180v1 | null |
2024-03-29 | ITCMA: A Generative Agent Based on a Computational Consciousness Structure | Hanzhong Zhang et.al. | 2403.20097v1 | null |
2024-03-29 | On Large Language Models' Hallucination with Regard to Known Facts | Che Jiang et.al. | 2403.20009v1 | null |
2024-03-29 | Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | Qinhao Zhou et.al. | 2403.19962v1 | null |
2024-03-28 | LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces | Xiaomin Ouyang et.al. | 2403.19857v1 | null |
2024-03-28 | Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Akshay Gopalkrishnan et.al. | 2403.19838v1 | link |
2024-03-28 | Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | Yucheng Shi et.al. | 2403.19631v1 | null |
2024-03-28 | BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation | Yuhong He et.al. | 2403.19414v1 | null |
2024-03-28 | RAIL: Robot Affordance Imagination with Large Language Models | Ceng Zhang et.al. | 2403.19369v1 | null |
2024-03-28 | IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation | Jiacui Huang et.al. | 2403.19336v1 | null |
2024-03-28 | Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models | Jiaxing Chen et.al. | 2403.19322v1 | null |
2024-04-01 | TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios | Xiaokang Zhang et.al. | 2403.19318v2 | link |
2024-03-28 | Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering | Yexin Wu et.al. | 2403.19167v1 | null |
2024-03-28 | MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering | Che Guan et.al. | 2403.19116v1 | null |
2024-03-28 | Learning From Correctness Without Prompting Makes LLM Efficient Reasoner | Yuxuan Yao et.al. | 2403.19094v1 | null |
2024-03-27 | LITA: Language Instructed Temporal-Localization Assistant | De-An Huang et.al. | 2403.19046v1 | link |
2024-03-27 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Yanwei Li et.al. | 2403.18814v1 | link |
2024-04-03 | Long-form factuality in large language models | Jerry Wei et.al. | 2403.18802v3 | link |
2024-03-27 | A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks | Axel Constant et.al. | 2403.18537v1 | null |
2024-03-27 | TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions | Jamshid Mozafari et.al. | 2403.18426v1 | link |
2024-03-27 | The Topos of Transformer Networks | Mattia Jacopo Villani et.al. | 2403.18415v1 | null |
2024-03-27 | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Wonkyun Kim et.al. | 2403.18406v1 | link |
2024-03-27 | Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval | Shengjie Ma et.al. | 2403.18405v1 | null |
2024-03-27 | BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models | Haitao Li et.al. | 2403.18365v1 | null |
2024-04-03 | Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | Meiqi Chen et.al. | 2403.18346v3 | null |
2024-03-27 | LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models | Mingxing Peng et.al. | 2403.18344v1 | null |
2024-03-27 | Dual Instruction Tuning with Large Language Models for Mathematical Reasoning | Yongwei Zhou et.al. | 2403.18295v1 | null |
2024-03-27 | Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models | Yiwu Zhong et.al. | 2403.18252v1 | link |
2024-03-27 | Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation | Chuwen Wang et.al. | 2403.18230v1 | link |
2024-03-28 | Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models | Kartikeya Bhardwaj et.al. | 2403.18159v2 | null |
2024-03-26 | Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Jin Peng Zhou et.al. | 2403.18120v1 | link |
2024-03-26 | ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition | Samuel Li et.al. | 2403.18062v1 | null |
2024-03-26 | MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | Wei Tao et.al. | 2403.17927v1 | null |
2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | Zhelun Shi et.al. | 2403.17830v1 | null |
2024-03-26 | Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons | Shijia Zhou et.al. | 2403.17760v1 | link |
2024-03-26 | Large Language Models Enhanced Collaborative Filtering | Zhongxiang Sun et.al. | 2403.17688v1 | null |
2024-03-26 | DGoT: Dynamic Graph of Thoughts for Scientific Abstract Generation | Xinyu Ning et.al. | 2403.17491v1 | link |
2024-03-26 | ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales? | Fan Huang et.al. | 2403.17368v1 | link |
2024-03-26 | Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | Zhenyu Pan et.al. | 2403.17359v1 | null |
2024-03-25 | TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models | Ishika Singh et.al. | 2403.17246v1 | null |
2024-03-25 | A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection | Benjamin Steenhoek et.al. | 2403.17218v1 | null |
2024-03-25 | Grounding Language Plans in Demonstrations Through Counterfactual Perturbations | Yanwei Wang et.al. | 2403.17124v1 | null |
2024-03-25 | Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models | Hao Shao et.al. | 2403.16999v1 | link |
2024-03-25 | PropTest: Automatic Property Testing for Improved Visual Programming | Jaywon Koo et.al. | 2403.16921v1 | null |
2024-03-25 | Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art | Neeloy Chakraborty et.al. | 2403.16527v1 | null |
2024-03-25 | Harnessing the power of LLMs for normative reasoning in MASs | Bastin Tony Roy Savarimuthu et.al. | 2403.16524v1 | null |
2024-03-25 | Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study | Shawn He et.al. | 2403.16517v1 | null |
2024-03-25 | Evaluating Large Language Models with Runtime Behavior of Program Execution | Junkai Chen et.al. | 2403.16437v1 | null |
2024-03-27 | Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation | Ziyan Wang et.al. | 2403.16427v3 | null |
2024-03-28 | Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Zhuowan Li et.al. | 2403.16385v2 | null |
2024-03-28 | Can Language Models Pretend Solvers? Logic Code Simulation with LLMs | Minyu Chen et.al. | 2403.16097v2 | null |
2024-03-24 | Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications | Wei Ma et.al. | 2403.16073v1 | null |
2024-03-23 | Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning | Zhouhang Xie et.al. | 2403.15737v1 | null |
2024-03-23 | LLMs Instruct LLMs: An Extraction and Editing Method | Xin Zhang et.al. | 2403.15736v1 | null |
2024-03-21 | Open Source Conversational LLMs do not know most Spanish words | Javier Conde et.al. | 2403.15491v1 | null |
2024-03-19 | LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction | Hejie Cui et.al. | 2403.15464v1 | null |
2024-04-01 | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang et.al. | 2403.15388v3 | null |
2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371v1 | null |
2024-03-22 | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan et.al. | 2403.15362v1 | null |
2024-03-22 | Sphere Neural-Networks for Rational Reasoning | Tiansi Dong et.al. | 2403.15297v1 | null |
2024-03-22 | MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.15209v1 | null |
2024-03-22 | CACA Agent: Capability Collaboration based AI Agent | Peng Xu et.al. | 2403.15137v1 | null |
2024-04-03 | MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts | Md Nishat Raihan et.al. | 2403.14982v2 | null |
2024-03-22 | Attention-Driven Reasoning: Unlocking the Potential of Large Language Models | Bingli Liao et.al. | 2403.14932v1 | null |
2024-03-25 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743v2 | null |
2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Renrui Zhang et.al. | 2403.14624v1 | null |
2024-03-21 | A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | Clayton Cohn et.al. | 2403.14565v1 | null |
2024-03-21 | ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting | Xiaoxue Cheng et.al. | 2403.14312v1 | link |
2024-03-21 | ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification | Sehee Lim et.al. | 2403.14255v1 | null |
2024-03-23 | K-Act2Emo: Korean Commonsense Knowledge Graph for Indirect Emotional Expression | Kyuhee Kim et.al. | 2403.14253v2 | link |
2024-03-21 | Empowering Segmentation Ability to Multi-modal Large Language Models | Yuqi Yang et.al. | 2403.14141v1 | null |
2024-03-21 | Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Jiaxing Sun et.al. | 2403.14112v1 | link |
2024-03-21 | Empowering Personalized Learning through a Conversation-based Tutoring System with Student Modeling | Minju Park et.al. | 2403.14071v1 | null |
2024-03-14 | Circuit Transformer: End-to-end Circuit Design by Predicting the Next Gate | Xihan Li et.al. | 2403.13838v1 | null |
2024-03-23 | Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts | Guangzeng Han et.al. | 2403.13786v2 | null |
2024-03-22 | Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs | Ilias Chalkidis et.al. | 2403.13592v2 | link |
2024-03-20 | PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns | Yew Ken Chia et.al. | 2403.13315v1 | link |
2024-03-20 | LeanReasoner: Boosting Complex Logical Reasoning with Lean | Dongwei Jiang et.al. | 2403.13312v1 | link |
2024-03-20 | Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs | Zhihong Sun et.al. | 2403.13271v1 | null |
2024-03-19 | VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning | Yongshuo Zong et.al. | 2403.13164v1 | link |
2024-03-13 | AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models | Shuo Jiang et.al. | 2403.13002v1 | null |
2024-03-11 | Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | On Tai Wu et.al. | 2403.12999v1 | null |
2024-03-19 | Dated Data: Tracing Knowledge Cutoffs in Large Language Models | Jeffrey Cheng et.al. | 2403.12958v1 | null |
2024-03-19 | Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models | Joana Ribeiro de Faria et.al. | 2403.12936v1 | null |
2024-03-19 | mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Anwen Hu et.al. | 2403.12895v1 | link |
2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884v1 | null |
2024-03-19 | Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Zehui Chen et.al. | 2403.12881v1 | link |
2024-03-19 | Compositional 3D Scene Synthesis with Scene Graph Guided Layout-Shape Generation | Yao Wei et.al. | 2403.12848v1 | null |
2024-03-19 | RelationVLM: Making Large Vision-Language Models Understand Visual Relations | Zhipeng Huang et.al. | 2403.12801v1 | null |
2024-03-18 | NovelQA: A Benchmark for Long-Range Novel Question Answering | Cunxiang Wang et.al. | 2403.12766v1 | link |
2024-03-19 | Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Zhenyu Wu et.al. | 2403.12744v1 | link |
2024-03-19 | Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | Victor Carbune et.al. | 2403.12596v1 | null |
2024-03-19 | AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Xiang Li et.al. | 2403.12582v1 | link |
2024-03-19 | To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions | Daniel Tanneberg et.al. | 2403.12533v1 | null |
2024-03-19 | Embodied LLM Agents Learn to Cooperate in Organized Teams | Xudong Guo et.al. | 2403.12482v1 | null |
2024-03-19 | Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering | Yuan Gao et.al. | 2403.12393v1 | null |
2024-03-22 | RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners | Chi Hu et.al. | 2403.12373v3 | null |
2024-03-18 | OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Chuang Liu et.al. | 2403.12316v1 | null |
2024-03-18 | TnT-LLM: Text Mining at Scale with Large Language Models | Mengting Wan et.al. | 2403.12173v1 | null |
2024-03-18 | EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | Abhay Zala et.al. | 2403.12014v1 | null |
2024-03-18 | QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction | Xiang Huang et.al. | 2403.11886v1 | null |
2024-03-18 | Agent3D-Zero: An Agent for Zero-shot 3D Understanding | Sha Zhang et.al. | 2403.11835v1 | null |
2024-03-18 | Metaphor Understanding Challenge Dataset for LLMs | Xiaoyu Tong et.al. | 2403.11810v1 | null |
2024-03-25 | Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models | Mingyang Song et.al. | 2403.11802v2 | link |
2024-03-18 | Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus | Seungpil Lee et.al. | 2403.11793v1 | null |
2024-03-20 | LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Shu Wang et.al. | 2403.11552v2 | link |
2024-03-22 | Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | Rao Fu et.al. | 2403.11401v2 | null |
2024-03-17 | ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Siyuan Huang et.al. | 2403.11289v1 | link |
2024-03-17 | Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | Baiyan Zhang et.al. | 2403.11129v1 | null |
2024-03-17 | GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment | Lance Ying et.al. | 2403.11075v1 | null |
2024-03-26 | SelfIE: Self-Interpretation of Large Language Model Embeddings | Haozhe Chen et.al. | 2403.10949v2 | link |
2024-03-16 | BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English | Sheikh Shafayat et.al. | 2403.10900v1 | link |
2024-03-16 | A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment | Tianhe Wu et.al. | 2403.10854v1 | link |
2024-03-16 | NARRATE: Versatile Language Architecture for Optimal Control in Robotics | Seif Ismail et.al. | 2403.10762v1 | null |
2024-03-15 | VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Xiaohan Wang et.al. | 2403.10517v1 | null |
2024-03-15 | Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization | Ratnadira Widyasari et.al. | 2403.10507v1 | null |
2024-03-15 | HawkEye: Training Video-Text LLMs for Grounding Text in Videos | Yueqian Wang et.al. | 2403.10228v1 | link |
2024-03-15 | AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation | Arkajit Datta et.al. | 2403.10171v1 | null |
2024-03-15 | RAFT: Adapting Language Model to Domain Specific RAG | Tianjun Zhang et.al. | 2403.10131v1 | link |
2024-03-15 | Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning | Hang Zhang et.al. | 2403.10107v1 | null |
2024-03-15 | Knowledge Condensation and Reasoning for Knowledge-based VQA | Dongze Hao et.al. | 2403.10037v1 | null |
2024-03-15 | ViTCN: Vision Transformer Contrastive Network For Reasoning | Bo Song et.al. | 2403.09962v1 | null |
2024-03-14 | Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models | Zhuoqun Li et.al. | 2403.09750v1 | link |
2024-03-14 | Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors | Guanghua Li et.al. | 2403.09747v1 | null |
2024-03-13 | Do Large Language Models Solve ARC Visual Analogies Like People Do? | Gustaw Opiełka et.al. | 2403.09734v1 | null |
2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631v1 | null |
2024-03-22 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Brandon McKinzie et.al. | 2403.09611v3 | null |
2024-03-14 | Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey | Xiaoyu Liu et.al. | 2403.09606v1 | null |
2024-03-14 | Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis | Gregory Coppola et.al. | 2403.09599v1 | null |
2024-03-15 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583v2 | null |
2024-03-22 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Yunhao Gou et.al. | 2403.09572v2 | null |
2024-03-21 | Less is More: Data Value Estimation for Visual Instruction Tuning | Zikang Liu et.al. | 2403.09559v2 | null |
2024-03-14 | Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | Li Yizhen et.al. | 2403.09164v1 | null |
2024-03-14 | Caveat Lector: Large Language Models in Legal Practice | Eliza Mik et.al. | 2403.09163v1 | null |
2024-03-14 | USimAgent: Large Language Models for Simulating Search Users | Erhan Zhang et.al. | 2403.09142v1 | null |
2024-03-14 | Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance | Kai Xiong et.al. | 2403.09085v1 | null |
2024-03-14 | Query Rewriting via Large Language Models | Jie Liu et.al. | 2403.09060v1 | null |
2024-03-13 | Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era | Xuansheng Wu et.al. | 2403.08946v1 | link |
2024-03-13 | AcademiaOS: Automating Grounded Theory Development in Qualitative Research with Large Language Models | Thomas Übellacker et.al. | 2403.08844v1 | link |
2024-03-13 | TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation | Dingbang Li et.al. | 2403.08833v1 | null |
2024-03-13 | Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework | Jingling Li et.al. | 2403.08743v1 | null |
2024-03-13 | The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models | Carlo Nicolini et.al. | 2403.08739v1 | null |
2024-03-14 | Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation | Daniel Honerkamp et.al. | 2403.08605v2 | link |
2024-03-13 | Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments | Sitao Cheng et.al. | 2403.08593v1 | null |
2024-03-13 | CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model | Cheng Chen et.al. | 2403.08350v1 | link |
2024-03-13 | LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments | Maonan Wang et.al. | 2403.08337v1 | link |
2024-03-13 | Can Large Language Models Identify Authorship? | Baixiang Huang et.al. | 2403.08213v1 | link |
2024-03-13 | Large Language Models are Contrastive Reasoners | Liang Yao et.al. | 2403.08211v1 | link |
2024-03-12 | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | William Xie et.al. | 2403.07832v1 | null |
2024-03-12 | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Sainbayar Sukhbaatar et.al. | 2403.07816v1 | null |
2024-03-12 | Fine-tuning Large Language Models with Sequential Instructions | Hanxu Hu et.al. | 2403.07794v1 | link |
2024-03-15 | Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | Carlos Jose Xavier Cruz et.al. | 2403.07769v3 | link |
2024-03-12 | FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Yan Liu et.al. | 2403.07747v1 | null |
2024-03-12 | Multi-modal Auto-regressive Modeling via Visual Words | Tianshuo Peng et.al. | 2403.07720v1 | link |
2024-03-12 | DrPlanner: Diagnosis and Repair of Motion Planners Using Large Language Models | Yuanfei Lin et.al. | 2403.07470v1 | link |
2024-03-12 | Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Tianqing Fang et.al. | 2403.07398v1 | null |
2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376v1 | link |
2024-03-11 | Narrating Causal Graphs with Large Language Models | Atharva Phatak et.al. | 2403.07118v1 | null |
2024-03-13 | Naming, Describing, and Quantifying Visual Objects in Humans and LLMs | Alberto Testoni et.al. | 2403.06935v2 | link |
2024-03-11 | ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis | Yanming Liu et.al. | 2403.06932v1 | link |
2024-03-11 | RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | Yanming Liu et.al. | 2403.06840v1 | link |
2024-03-11 | KELLMRec: Knowledge-Enhanced Large Language Models for Recommendation | Weiqing Luo et.al. | 2403.06642v1 | null |
2024-03-11 | **Guiding Clinical Reasoning with Large Language Models via K |