LLMs-arxiv-daily.json
{"LLM - Explainable": {"2311.18702": "|**2023-11-30**|**CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation**|Pei Ke et.al.|[2311.18702v1](http://arxiv.org/abs/2311.18702v1)|**[link](https://github.com/thu-coai/critiquellm)**|\n", "2311.18353": "|**2023-11-30**|**Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension**|Akira Kawabata et.al.|[2311.18353v1](http://arxiv.org/abs/2311.18353v1)|null|\n", "2311.18062": "|**2023-11-29**|**Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation**|Xijia Zhang et.al.|[2311.18062v1](http://arxiv.org/abs/2311.18062v1)|null|\n", "2311.17365": "|**2023-11-29**|**Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning**|Xiaoqian Wu et.al.|[2311.17365v1](http://arxiv.org/abs/2311.17365v1)|null|\n", "2311.17331": "|**2023-11-29**|**Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering**|Zeqing Wang et.al.|[2311.17331v1](http://arxiv.org/abs/2311.17331v1)|null|\n", "2311.16017": "|**2023-11-27**|**Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models**|Stephen MacNeil et.al.|[2311.16017v1](http://arxiv.org/abs/2311.16017v1)|null|\n", "2311.15716": "|**2023-11-27**|**Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications**|Sabine Wehnert et.al.|[2311.15716v1](http://arxiv.org/abs/2311.15716v1)|null|\n", "2311.15548": "|**2023-11-27**|**Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination**|Haoqiang Kang et.al.|[2311.15548v1](http://arxiv.org/abs/2311.15548v1)|null|\n", "2311.14903": "|**2023-11-25**|**Code Generation Based Grading: Evaluating an Auto-grading Mechanism for \"Explain-in-Plain-English\" Questions**|David H. 
Smith IV et.al.|[2311.14903v1](http://arxiv.org/abs/2311.14903v1)|null|\n", "2311.14126": "|**2023-11-23**|**Towards Auditing Large Language Models: Improving Text-based Stereotype Detection**|Wu Zekun et.al.|[2311.14126v1](http://arxiv.org/abs/2311.14126v1)|null|\n", "2311.14061": "|**2023-11-23**|**Towards Explainable Strategy Templates using NLP Transformers**|Pallavi Bagga et.al.|[2311.14061v1](http://arxiv.org/abs/2311.14061v1)|null|\n", "2311.13160": "|**2023-11-22**|**Large Language Models in Education: Vision and Opportunities**|Wensheng Gan et.al.|[2311.13160v1](http://arxiv.org/abs/2311.13160v1)|null|\n", "2311.12338": "|**2023-11-21**|**A Survey on Large Language Models for Personalized and Explainable Recommendations**|Junyi Chen et.al.|[2311.12338v1](http://arxiv.org/abs/2311.12338v1)|null|\n", "2311.12233": "|**2023-11-20**|**Unifying Corroborative and Contributive Attributions in Large Language Models**|Theodora Worledge et.al.|[2311.12233v1](http://arxiv.org/abs/2311.12233v1)|null|\n", "2311.11904": "|**2023-11-20**|**LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions**|Songhao Han et.al.|[2311.11904v1](http://arxiv.org/abs/2311.11904v1)|null|\n", "2311.11811": "|**2023-11-20**|**Large Language Models and Explainable Law: a Hybrid Methodology**|Marco Billi et.al.|[2311.11811v1](http://arxiv.org/abs/2311.11811v1)|null|\n", "2311.11552": "|**2023-11-20**|**Exploring Prompting Large Language Models as Explainable Metrics**|Ghazaleh Mahmoudi et.al.|[2311.11552v1](http://arxiv.org/abs/2311.11552v1)|**[link](https://github.com/ghazaleh-mahmoodi/Prompting_LLMs_AS_Explainable_Metrics)**|\n", "2311.11334": "|**2023-11-19**|**Using Causal Threads to Explain Changes in a Dynamic System**|Robert B. 
Allen et.al.|[2311.11334v1](http://arxiv.org/abs/2311.11334v1)|null|\n", "2311.11267": "|**2023-12-17**|**Rethinking Large Language Models in Mental Health Applications**|Shaoxiong Ji et.al.|[2311.11267v2](http://arxiv.org/abs/2311.11267v2)|null|\n", "2311.10075": "|**2023-11-16**|**ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond**|Kanhai S. Amin et.al.|[2311.10075v1](http://arxiv.org/abs/2311.10075v1)|null|\n", "2311.10054": "|**2023-11-16**|**Is \"A Helpful Assistant\" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts**|Mingqian Zheng et.al.|[2311.10054v1](http://arxiv.org/abs/2311.10054v1)|null|\n", "2311.16169": "|**2023-11-16**|**Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities**|Avishree Khare et.al.|[2311.16169v1](http://arxiv.org/abs/2311.16169v1)|null|\n", "2311.09020": "|**2023-11-15**|**Explaining Explanation: An Empirical Study on Explanation in Code Reviews**|Ratnadira Widyasari et.al.|[2311.09020v1](http://arxiv.org/abs/2311.09020v1)|null|\n", "2311.09006": "|**2023-11-15**|**Data Similarity is Not Enough to Explain Language Model Performance**|Gregory Yauney et.al.|[2311.09006v1](http://arxiv.org/abs/2311.09006v1)|**[link](https://github.com/gyauney/data-similarity-is-not-enough)**|\n", "2311.08614": "|**2023-11-15**|**XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making**|Zichen Chen et.al.|[2311.08614v1](http://arxiv.org/abs/2311.08614v1)|null|\n", "2311.08469": "|**2023-11-14**|**UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations**|Wenting Zhao et.al.|[2311.08469v1](http://arxiv.org/abs/2311.08469v1)|null|\n", "2311.08398": "|**2023-11-16**|**Are Large Language Models Temporally Grounded?**|Yifu Qiu et.al.|[2311.08398v2](http://arxiv.org/abs/2311.08398v2)|**[link](https://github.com/yfqiu-nlp/temporal-llms)**|\n", 
"2311.07811": "|**2023-11-13**|**In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax**|Aaron Mueller et.al.|[2311.07811v1](http://arxiv.org/abs/2311.07811v1)|**[link](https://github.com/aaronmueller/syntax-icl)**|\n", "2311.07466": "|**2023-11-13**|**On Measuring Faithfulness of Natural Language Explanations**|Letitia Parcalabescu et.al.|[2311.07466v1](http://arxiv.org/abs/2311.07466v1)|**[link](https://github.com/heidelberg-nlp/cc-shap)**|\n", "2311.06985": "|**2023-11-12**|**SELF-EXPLAIN: Teaching Large Language Models to Reason Complex Questions by Themselves**|Jiachen Zhao et.al.|[2311.06985v1](http://arxiv.org/abs/2311.06985v1)|null|\n", "2311.06383": "|**2023-11-10**|**Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks**|Pouya Pezeshkpour et.al.|[2311.06383v1](http://arxiv.org/abs/2311.06383v1)|**[link](https://github.com/megagonlabs/rjdb)**|\n", "2311.14703": "|**2023-11-10**|**ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management**|Angela Zhang et.al.|[2311.14703v1](http://arxiv.org/abs/2311.14703v1)|null|\n", "2311.05019": "|**2023-11-08**|**DEMASQ: Unmasking the ChatGPT Wordsmith**|Kavita Kumari et.al.|[2311.05019v1](http://arxiv.org/abs/2311.05019v1)|null|\n", "2311.04047": "|**2023-11-07**|**Extracting human interpretable structure-property relationships in chemistry using XAI and large language models**|Geemi P. Wellawatte et.al.|[2311.04047v1](http://arxiv.org/abs/2311.04047v1)|**[link](https://github.com/geemi725/xpertai)**|\n", "2311.03754": "|**2023-11-07**|**Which is better? 
Exploring Prompting Strategy For LLM-based Metrics**|Joonghoon Kim et.al.|[2311.03754v1](http://arxiv.org/abs/2311.03754v1)|**[link](https://github.com/SeoroMin/Prompt4LLM-Eval)**|\n", "2311.03734": "|**2023-11-07**|**Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning**|Ruosen Li et.al.|[2311.03734v1](http://arxiv.org/abs/2311.03734v1)|**[link](https://github.com/bcdnlp/structure-qa)**|\n", "2311.02433": "|**2023-11-04**|**Can ChatGPT support software verification?**|Christian Jan\u00dfen et.al.|[2311.02433v1](http://arxiv.org/abs/2311.02433v1)|null|\n", "2311.01732": "|**2023-11-12**|**Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models**|Sean Xie et.al.|[2311.01732v2](http://arxiv.org/abs/2311.01732v2)|**[link](https://github.com/yx131/proto-lm)**|\n", "2311.04911": "|**2023-11-01**|**From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems**|Samyar Janatian et.al.|[2311.04911v1](http://arxiv.org/abs/2311.04911v1)|**[link](https://github.com/samyarj/jcapg-jurix2023)**|\n", "2311.00671": "|**2023-11-01**|**Emotion Detection for Misinformation: A Review**|Zhiwei Liu et.al.|[2311.00671v1](http://arxiv.org/abs/2311.00671v1)|null|\n", "2311.00321": "|**2023-11-22**|**HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning**|Yongjin Yang et.al.|[2311.00321v2](http://arxiv.org/abs/2311.00321v2)|**[link](https://github.com/joonkeekim/hare-hate-speech)**|\n", "2311.00206": "|**2023-11-01**|**ChatGPT-Powered Hierarchical Comparisons for Image Classification**|Zhiyuan Ren et.al.|[2311.00206v1](http://arxiv.org/abs/2311.00206v1)|null|\n", "2310.20689": "|**2023-11-14**|**Learning From Mistakes Makes LLM Better Reasoner**|Shengnan An et.al.|[2310.20689v2](http://arxiv.org/abs/2310.20689v2)|**[link](https://github.com/microsoft/lema)**|\n", "2310.20320": "|**2023-10-31**|**Theory of Mind in Large Language Models: Examining 
Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests**|Max J. van Duijn et.al.|[2310.20320v1](http://arxiv.org/abs/2310.20320v1)|null|\n", "2310.19792": "|**2023-10-30**|**The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics**|Christoph Leiter et.al.|[2310.19792v1](http://arxiv.org/abs/2310.19792v1)|**[link](https://github.com/eval4nlp/sharedtask2023)**|\n", "2310.19658": "|**2023-10-30**|**Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection**|Noah Ziems et.al.|[2310.19658v1](http://arxiv.org/abs/2310.19658v1)|null|\n", "2310.18813": "|**2023-10-28**|**The Synergy of Speculative Decoding and Batching in Serving Large Language Models**|Qidong Su et.al.|[2310.18813v1](http://arxiv.org/abs/2310.18813v1)|null|\n", "2310.17217": "|**2023-10-26**|**Beyond MLE: Convex Learning for Text Generation**|Chenze Shao et.al.|[2310.17217v1](http://arxiv.org/abs/2310.17217v1)|null|\n", "2310.18233": "|**2023-11-01**|**Will releasing the weights of future large language models grant widespread access to pandemic agents?**|Anjali Gopal et.al.|[2310.18233v2](http://arxiv.org/abs/2310.18233v2)|null|\n", "2310.16436": "|**2023-10-26**|**DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models**|Ge Zheng et.al.|[2310.16436v2](http://arxiv.org/abs/2310.16436v2)|null|\n", "2310.16421": "|**2023-10-25**|**Graph Agent: Explicit Reasoning Agent for Graphs**|Qinyong Wang et.al.|[2310.16421v1](http://arxiv.org/abs/2310.16421v1)|null|\n", "2310.15455": "|**2023-10-24**|**UI Layout Generation with LLMs Guided by UI Grammar**|Yuwen Lu et.al.|[2310.15455v1](http://arxiv.org/abs/2310.15455v1)|null|\n", "2310.14389": "|**2023-10-22**|**Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models**|Hongli Zhan et.al.|[2310.14389v1](http://arxiv.org/abs/2310.14389v1)|**[link](https://github.com/honglizhan/covidet-appraisals-public)**|\n", "2310.14325": 
"|**2023-10-22**|**Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis**|Inez Okulska et.al.|[2310.14325v1](http://arxiv.org/abs/2310.14325v1)|null|\n", "2310.14025": "|**2023-10-21**|**Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation**|Anastasia Kritharoula et.al.|[2310.14025v1](http://arxiv.org/abs/2310.14025v1)|**[link](https://github.com/anastasiakrith/multimodal-retrieval-for-vwsd)**|\n", "2310.13850": "|**2023-10-20**|**Ecologically Valid Explanations for Label Variation in NLI**|Nan-Jiang Jiang et.al.|[2310.13850v1](http://arxiv.org/abs/2310.13850v1)|**[link](https://github.com/njjiang/livenli)**|\n", "2310.13571": "|**2023-10-30**|**Why Can Large Language Models Generate Correct Chain-of-Thoughts?**|Rasul Tutunov et.al.|[2310.13571v2](http://arxiv.org/abs/2310.13571v2)|null|\n", "2310.13549": "|**2023-10-20**|**The Perils & Promises of Fact-checking with Large Language Models**|Dorian Quelle et.al.|[2310.13549v1](http://arxiv.org/abs/2310.13549v1)|null|\n", "2310.13506": "|**2023-10-20**|**Explaining Interactions Between Text Spans**|Sagnik Ray Choudhury et.al.|[2310.13506v1](http://arxiv.org/abs/2310.13506v1)|**[link](https://github.com/copenlu/spanex)**|\n", "2310.12973": "|**2023-10-19**|**Frozen Transformers in Language Models Are Effective Visual Encoder Layers**|Ziqi Pang et.al.|[2310.12973v1](http://arxiv.org/abs/2310.12973v1)|**[link](https://github.com/ziqipang/lm4visualencoding)**|\n", "2310.12860": "|**2023-10-28**|**Probing LLMs for hate speech detection: strengths and vulnerabilities**|Sarthak Roy et.al.|[2310.12860v2](http://arxiv.org/abs/2310.12860v2)|null|\n", "2310.12558": "|**2023-10-19**|**Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong**|Chenglei Si et.al.|[2310.12558v1](http://arxiv.org/abs/2310.12558v1)|null|\n", "2310.11207": "|**2023-10-17**|**Can Large Language Models Explain Themselves? 
A Study of LLM-Generated Self-Explanations**|Shiyuan Huang et.al.|[2310.11207v1](http://arxiv.org/abs/2310.11207v1)|null|\n", "2310.10418": "|**2023-11-11**|**Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms**|Seungju Han et.al.|[2310.10418v2](http://arxiv.org/abs/2310.10418v2)|**[link](https://github.com/wade3han/normlens)**|\n", "2310.09754": "|**2023-10-15**|**EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification**|Huanhuan Ma et.al.|[2310.09754v1](http://arxiv.org/abs/2310.09754v1)|**[link](https://github.com/dependentsign/EX-FEVER)**|\n", "2310.08797": "|**2023-10-13**|**A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models**|Takuma Udagawa et.al.|[2310.08797v1](http://arxiv.org/abs/2310.08797v1)|null|\n", "2310.08744": "|**2023-10-12**|**Circuit Component Reuse Across Tasks in Transformer Language Models**|Jack Merullo et.al.|[2310.08744v1](http://arxiv.org/abs/2310.08744v1)|null|\n", "2310.08123": "|**2023-10-12**|**Who Wrote it and Why? 
Prompting Large-Language Models for Authorship Verification**|Chia-Yu Hung et.al.|[2310.08123v1](http://arxiv.org/abs/2310.08123v1)|null|\n", "2310.07984": "|**2023-10-12**|**Large Language Models for Scientific Synthesis, Inference and Explanation**|Yizhen Zheng et.al.|[2310.07984v1](http://arxiv.org/abs/2310.07984v1)|**[link](https://github.com/zyzisastudyreallyhardguy/llm4sd)**|\n", "2310.07820": "|**2023-10-11**|**Large Language Models Are Zero-Shot Time Series Forecasters**|Nate Gruver et.al.|[2310.07820v1](http://arxiv.org/abs/2310.07820v1)|**[link](https://github.com/ngruver/llmtime)**|\n", "2310.06680": "|**2023-10-10**|**Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach**|Zhenlan Ji et.al.|[2310.06680v1](http://arxiv.org/abs/2310.06680v1)|null|\n", "2310.06257": "|**2023-10-10**|**SCAR: Power Side-Channel Analysis at RTL-Level**|Amisha Srivastava et.al.|[2310.06257v1](http://arxiv.org/abs/2310.06257v1)|null|\n", "2310.06200": "|**2023-10-11**|**The Importance of Prompt Tuning for Automated Neuron Explanations**|Justin Lee et.al.|[2310.06200v2](http://arxiv.org/abs/2310.06200v2)|null|\n", "2310.05884": "|**2023-10-09**|**A Meta-Learning Perspective on Transformers for Causal Language Modeling**|Xinbo Wu et.al.|[2310.05884v1](http://arxiv.org/abs/2310.05884v1)|null|\n", "2310.05797": "|**2023-10-10**|**Are Large Language Models Post Hoc Explainers?**|Nicholas Kroeger et.al.|[2310.05797v2](http://arxiv.org/abs/2310.05797v2)|**[link](https://github.com/AI4LIFE-GROUP/LLM_Explainer)**|\n", "2310.05657": "|**2023-10-09**|**A Closer Look into Automatic Evaluation Using Large Language Models**|Cheng-Han Chiang et.al.|[2310.05657v1](http://arxiv.org/abs/2310.05657v1)|**[link](https://github.com/d223302/a-closer-look-to-llm-evaluation)**|\n", "2310.05452": "|**2023-10-09**|**Explaining the Complex Task Reasoning of Large Language Models with Template-Content Structure**|Haotong Yang 
et.al.|[2310.05452v1](http://arxiv.org/abs/2310.05452v1)|null|\n", "2310.05253": "|**2023-10-20**|**Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models**|Haoran Wang et.al.|[2310.05253v2](http://arxiv.org/abs/2310.05253v2)|**[link](https://github.com/wang2226/folk)**|\n", "2310.05209": "|**2023-10-08**|**Scaling Laws of RoPE-based Extrapolation**|Xiaoran Liu et.al.|[2310.05209v1](http://arxiv.org/abs/2310.05209v1)|null|\n", "2310.05046": "|**2023-10-08**|**Harnessing the Power of ChatGPT in Fake News: An In-Depth Exploration in Generation, Detection and Explanation**|Yue Huang et.al.|[2310.05046v1](http://arxiv.org/abs/2310.05046v1)|null|\n", "2310.05029": "|**2023-10-08**|**Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading**|Howard Chen et.al.|[2310.05029v1](http://arxiv.org/abs/2310.05029v1)|null|\n", "2310.04949": "|**2023-10-08**|**Domain Knowledge Graph Construction Via A Simple Checker**|Yueling Zeng et.al.|[2310.04949v1](http://arxiv.org/abs/2310.04949v1)|null|\n", "2310.04793": "|**2023-11-11**|**FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets**|Neng Wang et.al.|[2310.04793v2](http://arxiv.org/abs/2310.04793v2)|**[link](https://github.com/ai4finance-foundation/fingpt)**|\n", "2310.02439": "|**2023-10-03**|**Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions**|Naiming Liu et.al.|[2310.02439v1](http://arxiv.org/abs/2310.02439v1)|null|\n", "2310.01957": "|**2023-10-13**|**Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving**|Long Chen et.al.|[2310.01957v2](http://arxiv.org/abs/2310.01957v2)|**[link](https://github.com/wayveai/driving-with-llms)**|\n", "2310.01870": "|**2023-11-28**|**DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models**|Albert Garde 
et.al.|[2310.01870v2](http://arxiv.org/abs/2310.01870v2)|**[link](https://github.com/apartresearch/deepdecipher)**|\n", "2310.01132": "|**2023-10-02**|**Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback**|Jacob Whitehill et.al.|[2310.01132v1](http://arxiv.org/abs/2310.01132v1)|null|\n", "2310.01074": "|**2023-10-08**|**Back to the Future: Towards Explainable Temporal Reasoning with Large Language Models**|Chenhan Yuan et.al.|[2310.01074v2](http://arxiv.org/abs/2310.01074v2)|**[link](https://github.com/chenhan97/timellama)**|\n", "2310.00647": "|**2023-10-01**|**Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning**|Mustafa Shukor et.al.|[2310.00647v1](http://arxiv.org/abs/2310.00647v1)|**[link](https://github.com/mshukor/EvALign-ICL)**|\n", "2310.00603": "|**2023-11-22**|**Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals**|Yair Gat et.al.|[2310.00603v2](http://arxiv.org/abs/2310.00603v2)|null|\n", "2310.01441": "|**2023-12-07**|**UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities**|Hejia Geng et.al.|[2310.01441v2](http://arxiv.org/abs/2310.01441v2)|null|\n", "2309.17057": "|**2023-09-29**|**Tell Me a Story! 
Narrative-Driven XAI with Large Language Models**|David Martens et.al.|[2309.17057v1](http://arxiv.org/abs/2309.17057v1)|**[link](https://github.com/admantwerp/xaistories)**|\n", "2309.16146": "|**2023-09-28**|**T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems**|Ming Wang et.al.|[2309.16146v1](http://arxiv.org/abs/2309.16146v1)|**[link](https://github.com/neu-datamining/t-col)**|\n", "2309.16090": "|**2023-09-28**|**TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration**|Hongru Wang et.al.|[2309.16090v1](http://arxiv.org/abs/2309.16090v1)|null|\n", "2309.16021": "|**2023-09-27**|**HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs)**|Tarek Ali et.al.|[2309.16021v1](http://arxiv.org/abs/2309.16021v1)|null|\n", "2309.15729": "|**2023-09-27**|**MindGPT: Interpreting What You See with Non-invasive Brain Recordings**|Jiaxuan Chen et.al.|[2309.15729v1](http://arxiv.org/abs/2309.15729v1)|**[link](https://github.com/jxuanc/mindgpt)**|\n", "2311.01463": "|**2023-09-26**|**Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI**|Muhammad Aurangzeb Ahmad et.al.|[2311.01463v1](http://arxiv.org/abs/2311.01463v1)|null|\n", "2309.13340": "|**2023-09-23**|**LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?**|Amrita Bhattacharjee et.al.|[2309.13340v1](http://arxiv.org/abs/2309.13340v1)|null|\n", "2309.11805": "|**2023-09-21**|**JobRecoGPT -- Explainable job recommendations using LLMs**|Preetam Ghosh et.al.|[2309.11805v1](http://arxiv.org/abs/2309.11805v1)|null|\n", "2309.11439": "|**2023-09-20**|**Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction**|Masahiro Kaneko et.al.|[2309.11439v1](http://arxiv.org/abs/2309.11439v1)|**[link](https://github.com/kanekomasahiro/gec-explanation)**|\n", 
"2312.01279": "|**2023-12-03**|**TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents**|James Enouen et.al.|[2312.01279v1](http://arxiv.org/abs/2312.01279v1)|null|\n", "2312.00819": "|**2023-11-30**|**Large Language Models for Travel Behavior Prediction**|Baichuan Mo et.al.|[2312.00819v1](http://arxiv.org/abs/2312.00819v1)|null|\n", "2312.03567": "|**2023-12-06**|**XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering**|Joel Stremmel et.al.|[2312.03567v1](http://arxiv.org/abs/2312.03567v1)|null|\n", "2312.03748": "|**2023-11-30**|**Applying Large Language Models and Chain-of-Thought for Automatic Scoring**|Gyeong-Geon Lee et.al.|[2312.03748v1](http://arxiv.org/abs/2312.03748v1)|null|\n", "2312.05834": "|**2023-12-10**|**Evidence-based Interpretable Open-domain Fact-checking with Large Language Models**|Xin Tan et.al.|[2312.05834v1](http://arxiv.org/abs/2312.05834v1)|null|\n", "2312.06798": "|**2023-12-05**|**Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety**|Manas Gaur et.al.|[2312.06798v1](http://arxiv.org/abs/2312.06798v1)|null|\n", "2312.08078": "|**2023-12-27**|**Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation**|Wenting Chen et.al.|[2312.08078v4](http://arxiv.org/abs/2312.08078v4)|null|\n", "2312.08027": "|**2023-12-13**|**Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning**|Jinta Weng et.al.|[2312.08027v1](http://arxiv.org/abs/2312.08027v1)|null|\n", "2312.07779": "|**2023-12-12**|**Tell, don't show: Declarative facts influence how LLMs generalize**|Alexander Meinke et.al.|[2312.07779v1](http://arxiv.org/abs/2312.07779v1)|null|\n", "2312.10702": "|**2023-12-17**|**Can persistent homology whiten Transformer-based black-box models? 
A case study on BERT compression**|Luis Balderas et.al.|[2312.10702v1](http://arxiv.org/abs/2312.10702v1)|null|\n", "2312.10321": "|**2024-01-17**|**LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?**|Fuheng Zhao et.al.|[2312.10321v2](http://arxiv.org/abs/2312.10321v2)|null|\n", "2312.10225": "|**2023-12-15**|**GPT-doctor: Customizing Large Language Models for Medical Consultation**|Wen Wang et.al.|[2312.10225v1](http://arxiv.org/abs/2312.10225v1)|null|\n", "2312.09947": "|**2023-12-15**|**Prompting Datasets: Data Discovery with Conversational Agents**|Johanna Walker et.al.|[2312.09947v1](http://arxiv.org/abs/2312.09947v1)|null|\n", "2312.09818": "|**2023-12-15**|**SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models**|Lee Hyun et.al.|[2312.09818v1](http://arxiv.org/abs/2312.09818v1)|**[link](https://github.com/smile-data/smile)**|\n", "2312.09230": "|**2023-12-14**|**Successor Heads: Recurring, Interpretable Attention Heads In The Wild**|Rhys Gould et.al.|[2312.09230v1](http://arxiv.org/abs/2312.09230v1)|null|\n", "2312.10059": "|**2023-12-04**|**A collection of principles for guiding and evaluating large language models**|Konstantin Hebenstreit et.al.|[2312.10059v1](http://arxiv.org/abs/2312.10059v1)|null|\n", "2312.11111": "|**2023-12-19**|**The Good, The Bad, and Why: Unveiling Emotions in Generative AI**|Cheng Li et.al.|[2312.11111v2](http://arxiv.org/abs/2312.11111v2)|null|\n", "2312.11548": "|**2023-12-16**|**Learning Interpretable Queries for Explainable Image Classification with Information Pursuit**|Stefan Kolek et.al.|[2312.11548v1](http://arxiv.org/abs/2312.11548v1)|null|\n", "2312.14867": "|**2023-12-22**|**VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation**|Max Ku et.al.|[2312.14867v1](http://arxiv.org/abs/2312.14867v1)|null|\n", "2312.14226": "|**2023-12-21**|**Deep de Finetti: Recovering Topic Distributions from Large Language Models**|Liyi Zhang 
et.al.|[2312.14226v1](http://arxiv.org/abs/2312.14226v1)|null|\n", "2312.15661": "|**2024-01-03**|**Unlocking the Potential of Large Language Models for Explainable Recommendations**|Yucong Luo et.al.|[2312.15661v3](http://arxiv.org/abs/2312.15661v3)|**[link](https://github.com/godfire66666/llm_rec_explanation)**|\n", "2312.14953": "|**2023-12-11**|**Transportation Transformed: A Comprehensive Review of Dynamic Rerouting in Multimodal Networks**|Suyash Pratap et.al.|[2312.14953v1](http://arxiv.org/abs/2312.14953v1)|null|\n", "2312.16211": "|**2023-12-23**|**An Explainable AI Approach to Large Language Model Assisted Causal Model Auditing and Development**|Yanming Zhang et.al.|[2312.16211v1](http://arxiv.org/abs/2312.16211v1)|null|\n", "2312.17543": "|**2023-12-29**|**Building Efficient Universal Classifiers with Natural Language Inference**|Moritz Laurer et.al.|[2312.17543v1](http://arxiv.org/abs/2312.17543v1)|**[link](https://github.com/moritzlaurer/zeroshot-classifier)**|\n", "2310.16379": "|**2023-12-29**|**Evaluating General-Purpose AI with Psychometrics**|Xiting Wang et.al.|[2310.16379v2](http://arxiv.org/abs/2310.16379v2)|null|\n", "2401.01414": "|**2024-01-02**|**VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics**|Ammar A. 
Siddiqui et.al.|[2401.01414v1](http://arxiv.org/abs/2401.01414v1)|null|\n", "2401.00210": "|**2023-12-30**|**The Problem of Alignment**|Tsvetelina Hristova et.al.|[2401.00210v1](http://arxiv.org/abs/2401.00210v1)|null|\n", "2401.02789": "|**2024-01-05**|**Large Language Models in Plant Biology**|Hilbert Yuen In Lam et.al.|[2401.02789v1](http://arxiv.org/abs/2401.02789v1)|null|\n", "2401.03701": "|**2024-01-08**|**ExTraCT -- Explainable Trajectory Corrections from language inputs using Textual description of features**|J-Anne Yow et.al.|[2401.03701v1](http://arxiv.org/abs/2401.03701v1)|null|\n", "2401.03229": "|**2024-01-06**|**Autonomous Crowdsensing: Operating and Organizing Crowdsensing for Sensing Automation**|Wansen Wu et.al.|[2401.03229v1](http://arxiv.org/abs/2401.03229v1)|null|\n", "2401.02985": "|**2024-01-02**|**Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education**|Vahid Ashrafimoghari et.al.|[2401.02985v1](http://arxiv.org/abs/2401.02985v1)|null|\n", "2401.04997": "|**2024-01-10**|**Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis**|Lanling Xu et.al.|[2401.04997v1](http://arxiv.org/abs/2401.04997v1)|null|\n", "2401.06102": "|**2024-01-12**|**Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models**|Asma Ghandeharioun et.al.|[2401.06102v2](http://arxiv.org/abs/2401.06102v2)|null|\n", "2401.05702": "|**2024-01-11**|**Video Anomaly Detection and Explanation via Large Language Models**|Hui Lv et.al.|[2401.05702v1](http://arxiv.org/abs/2401.05702v1)|null|\n", "2401.05604": "|**2024-01-11**|**REBUS: A Robust Evaluation Benchmark of Understanding Symbols**|Andrew Gritsevskiy et.al.|[2401.05604v1](http://arxiv.org/abs/2401.05604v1)|**[link](https://github.com/cvndsh/rebus)**|\n", "2401.05443": "|**2024-01-08**|**LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems**|Mohamad 
Fakih et.al.|[2401.05443v1](http://arxiv.org/abs/2401.05443v1)|**[link](https://github.com/AICPS/LLM_4_PLC)**|\n", "2401.06580": "|**2024-01-12**|**TestSpark: IntelliJ IDEA's Ultimate Test Generation Companion**|Arkadii Sapozhnikov et.al.|[2401.06580v1](http://arxiv.org/abs/2401.06580v1)|**[link](https://github.com/jetbrains-research/testspark)**|\n", "2401.08217": "|**2024-01-16**|**LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation**|Zhixuan Chu et.al.|[2401.08217v1](http://arxiv.org/abs/2401.08217v1)|null|\n", "2401.07927": "|**2024-02-15**|**Are self-explanations from Large Language Models faithful?**|Andreas Madsen et.al.|[2401.07927v3](http://arxiv.org/abs/2401.07927v3)|**[link](https://github.com/AndreasMadsen/llm-introspection)**|\n", "2401.07777": "|**2024-01-15**|**Quantum Transfer Learning for Acceptability Judgements**|Giuseppe Buonaiuto et.al.|[2401.07777v1](http://arxiv.org/abs/2401.07777v1)|null|\n", "2401.07310": "|**2024-01-14**|**Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study**|Ahmadul Karim Chowdhury et.al.|[2401.07310v1](http://arxiv.org/abs/2401.07310v1)|null|\n", "2401.09414": "|**2024-01-17**|**Vlogger: Make Your Dream A Vlog**|Shaobin Zhuang et.al.|[2401.09414v1](http://arxiv.org/abs/2401.09414v1)|**[link](https://github.com/zhuangshaobin/vlogger)**|\n", "2401.08517": "|**2024-01-24**|**Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring**|Hasan Abu-Rasheed et.al.|[2401.08517v3](http://arxiv.org/abs/2401.08517v3)|null|\n", "2401.11467": "|**2024-01-21**|**Over-Reasoning and Redundant Calculation of Large Language Models**|Cheng-Han Chiang et.al.|[2401.11467v1](http://arxiv.org/abs/2401.11467v1)|**[link](https://github.com/d223302/over-reasoning-of-llms)**|\n", "2401.11323": "|**2024-01-20**|**Analyzing 
Task-Encoding Tokens in Large Language Models**|Yu Bai et.al.|[2401.11323v1](http://arxiv.org/abs/2401.11323v1)|null|\n", "2401.12874": "|**2024-02-22**|**From Understanding to Utilization: A Survey on Explainability for Large Language Models**|Haoyan Luo et.al.|[2401.12874v2](http://arxiv.org/abs/2401.12874v2)|null|\n", "2401.12846": "|**2024-01-23**|**How well can large language models explain business processes?**|Dirk Fahland et.al.|[2401.12846v1](http://arxiv.org/abs/2401.12846v1)|null|\n", "2401.12713": "|**2024-02-23**|**Generating Zero-shot Abstractive Explanations for Rumour Verification**|Iman Munire Bilal et.al.|[2401.12713v3](http://arxiv.org/abs/2401.12713v3)|**[link](https://github.com/bilaliman/rv_explainability)**|\n", "2401.12576": "|**2024-01-23**|**LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools**|Qianli Wang et.al.|[2401.12576v1](http://arxiv.org/abs/2401.12576v1)|**[link](https://github.com/dfki-nlp/llmcheckup)**|\n", "2401.13641": "|**2024-02-27**|**How Good is ChatGPT at Face Biometrics? 
A First Look into Recognition, Soft Biometrics, and Explainability**|Ivan DeAndres-Tame et.al.|[2401.13641v2](http://arxiv.org/abs/2401.13641v2)|**[link](https://github.com/bidalab/chatgpt_facebiometrics)**|\n", "2401.13298": "|**2024-01-24**|**Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models**|Hongzhan Lin et.al.|[2401.13298v1](http://arxiv.org/abs/2401.13298v1)|**[link](https://github.com/hkbunlp/explainhm-www2024)**|\n", "2401.13110": "|**2024-01-23**|**XAI for All: Can Large Language Models Simplify Explainable AI?**|Philip Mavrepis et.al.|[2401.13110v1](http://arxiv.org/abs/2401.13110v1)|null|\n", "2401.16646": "|**2024-01-30**|**Incoherent Probability Judgments in Large Language Models**|Jian-Qiao Zhu et.al.|[2401.16646v1](http://arxiv.org/abs/2401.16646v1)|null|\n", "2401.17505": "|**2024-03-10**|**Arrows of Time for Large Language Models**|Vassilis Papadopoulos et.al.|[2401.17505v2](http://arxiv.org/abs/2401.17505v2)|null|\n", "2401.17477": "|**2024-01-30**|**Detecting mental disorder on social media: a ChatGPT-augmented explainable approach**|Loris Belcastro et.al.|[2401.17477v1](http://arxiv.org/abs/2401.17477v1)|**[link](https://github.com/scalabunical/bert-xdd)**|\n", "2401.17345": "|**2024-02-10**|**Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations**|Benjamin Antunes et.al.|[2401.17345v2](http://arxiv.org/abs/2401.17345v2)|null|\n", "2402.00854": "|**2024-02-05**|**SymbolicAI: A framework for logic-based approaches combining generative models and solvers**|Marius-Constantin Dinu et.al.|[2402.00854v2](http://arxiv.org/abs/2402.00854v2)|**[link](https://github.com/extensityai/benchmark)**|\n", "2402.00745": "|**2024-02-01**|**Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement**|Xin Quan 
et.al.|[2402.00745v1](http://arxiv.org/abs/2402.00745v1)|**[link](https://github.com/neuro-symbolic-ai/explanation_based_ethical_reasoning)**|\n", "2402.00345": "|**2024-02-01**|**IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators**|Luyang Lin et.al.|[2402.00345v1](http://arxiv.org/abs/2402.00345v1)|null|\n", "2402.00262": "|**2024-02-01**|**Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective**|Qun Ma et.al.|[2402.00262v1](http://arxiv.org/abs/2402.00262v1)|null|\n", "2402.00137": "|**2024-01-31**|**Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT**|Diego Machado Reyes et.al.|[2402.00137v1](http://arxiv.org/abs/2402.00137v1)|null|\n", "2402.03175": "|**2024-02-05**|**The Matrix: A Bayesian learning model for LLMs**|Siddhartha Dalal et.al.|[2402.03175v1](http://arxiv.org/abs/2402.03175v1)|null|\n", "2402.03142": "|**2024-02-05**|**Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models**|Michele Mastromattei et.al.|[2402.03142v1](http://arxiv.org/abs/2402.03142v1)|**[link](https://github.com/itsmattei/ken)**|\n", "2402.02872": "|**2024-02-05**|**How do Large Language Models Learn In-Context? 
Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning**|Zeping Yu et.al.|[2402.02872v1](http://arxiv.org/abs/2402.02872v1)|null|\n", "2402.02314": "|**2024-02-04**|**Selecting Large Language Model to Fine-tune via Rectified Scaling Law**|Haowei Lin et.al.|[2402.02314v1](http://arxiv.org/abs/2402.02314v1)|null|\n", "2402.02255": "|**2024-02-03**|**Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times**|Byung-Doh Oh et.al.|[2402.02255v1](http://arxiv.org/abs/2402.02255v1)|**[link](https://github.com/byungdoh/llm_surprisal)**|\n", "2402.01881": "|**2024-02-06**|**Large Language Model Agent for Hyper-Parameter Optimization**|Siyi Liu et.al.|[2402.01881v2](http://arxiv.org/abs/2402.01881v2)|null|\n", "2402.01874": "|**2024-02-02**|**The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models**|Moschoula Pternea et.al.|[2402.01874v1](http://arxiv.org/abs/2402.01874v1)|null|\n", "2402.01821": "|**2024-02-02**|**Ecologically rational meta-learned inference explains human category learning**|Akshay K. 
Jagadish et.al.|[2402.01821v1](http://arxiv.org/abs/2402.01821v1)|null|\n", "2402.01781": "|**2024-02-01**|**When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards**|Norah Alzahrani et.al.|[2402.01781v1](http://arxiv.org/abs/2402.01781v1)|null|\n", "2402.01761": "|**2024-01-30**|**Rethinking Interpretability in the Era of Large Language Models**|Chandan Singh et.al.|[2402.01761v1](http://arxiv.org/abs/2402.01761v1)|**[link](https://github.com/csinva/imodelsX)**|\n", "2402.01729": "|**2024-02-24**|**Contextualization Distillation from Large Language Model for Knowledge Graph Completion**|Dawei Li et.al.|[2402.01729v3](http://arxiv.org/abs/2402.01729v3)|null|\n", "2402.01719": "|**2024-03-01**|**Measuring Moral Inconsistencies in Large Language Models**|Vamshi Krishna Bonagiri et.al.|[2402.01719v3](http://arxiv.org/abs/2402.01719v3)|null|\n", "2402.01681": "|**2024-02-16**|**Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications**|Yuhang Zhou et.al.|[2402.01681v2](http://arxiv.org/abs/2402.01681v2)|null|\n", "2402.04206": "|**2024-02-06**|**Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models**|David Sobr\u00edn-Hidalgo et.al.|[2402.04206v1](http://arxiv.org/abs/2402.04206v1)|null|\n", "2402.03659": "|**2024-02-29**|**Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models**|Kelvin J. L. Koa et.al.|[2402.03659v3](http://arxiv.org/abs/2402.03659v3)|**[link](https://github.com/koa-fin/sep)**|\n", "2402.03366": "|**2024-01-31**|**Uncertainty-Aware Explainable Recommendation with Large Language Models**|Yicui Peng et.al.|[2402.03366v1](http://arxiv.org/abs/2402.03366v1)|null|\n", "2402.04678": "|**2024-02-07**|**Large Language Models As Faithful Explainers**|Yu-Neng Chuang et.al.|[2402.04678v1](http://arxiv.org/abs/2402.04678v1)|null|\n", "2402.04614": "|**2024-03-14**|**Faithfulness vs. 
Plausibility: On the (Un)Reliability of Explanations from Large Language Models**|Chirag Agarwal et.al.|[2402.04614v3](http://arxiv.org/abs/2402.04614v3)|null|\n", "2402.05133": "|**2024-02-06**|**Personalized Language Modeling from Personalized Human Feedback**|Xinyu Li et.al.|[2402.05133v1](http://arxiv.org/abs/2402.05133v1)|null|\n", "2402.05127": "|**2024-02-05**|**Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering**|Aryan Agrawal et.al.|[2402.05127v1](http://arxiv.org/abs/2402.05127v1)|null|\n", "2402.06557": "|**2024-02-09**|**The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model**|Gregory Coppola et.al.|[2402.06557v1](http://arxiv.org/abs/2402.06557v1)|**[link](https://github.com/gregorycoppola/bayes-star)**|\n", "2402.07776": "|**2024-02-12**|**TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection**|Hui Liu et.al.|[2402.07776v1](http://arxiv.org/abs/2402.07776v1)|**[link](https://github.com/less-and-less-bugs/trust_teller)**|\n", "2402.07401": "|**2024-02-12**|**Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate**|Kyungha Kim et.al.|[2402.07401v1](http://arxiv.org/abs/2402.07401v1)|null|\n", "2402.07233": "|**2024-02-11**|**TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation**|Peng Wang et.al.|[2402.07233v1](http://arxiv.org/abs/2402.07233v1)|null|\n", "2402.07148": "|**2024-02-11**|**X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design**|Eric L. Buehler et.al.|[2402.07148v1](http://arxiv.org/abs/2402.07148v1)|**[link](https://github.com/ericlbuehler/xlora)**|\n", "2402.06695": "|**2024-02-08**|**Integrating LLMs for Explainable Fault Diagnosis in Complex Systems**|Akshay J. 
Dave et.al.|[2402.06695v1](http://arxiv.org/abs/2402.06695v1)|null|\n", "2311.16466": "|**2024-02-12**|**Large language models can enhance persuasion through linguistic feature alignment**|Minkyu Shin et.al.|[2311.16466v2](http://arxiv.org/abs/2311.16466v2)|null|\n", "2402.08030": "|**2024-02-12**|**Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-Seeking**|Anjali Khurana et.al.|[2402.08030v1](http://arxiv.org/abs/2402.08030v1)|null|\n", "2402.07920": "|**2024-02-02**|**Exploring patient trust in clinical advice from AI-driven LLMs like ChatGPT for self-diagnosis**|Delong Du et.al.|[2402.07920v1](http://arxiv.org/abs/2402.07920v1)|null|\n", "2402.07910": "|**2024-01-29**|**Experimental Interface for Multimodal and Large Language Model Based Explanations of Educational Recommender Systems**|Hasan Abu-Rasheed et.al.|[2402.07910v1](http://arxiv.org/abs/2402.07910v1)|null|\n", "2402.09259": "|**2024-02-14**|**SyntaxShap: Syntax-aware Explainability Method for Text Generation**|Kenza Amara et.al.|[2402.09259v1](http://arxiv.org/abs/2402.09259v1)|null|\n", "2402.09967": "|**2024-02-15**|**Case Study: Testing Model Capabilities in Some Reasoning Tasks**|Min Zhang et.al.|[2402.09967v1](http://arxiv.org/abs/2402.09967v1)|null|\n", "2402.09733": "|**2024-02-15**|**Do LLMs Know about Hallucination? 
An Empirical Investigation of LLM's Hidden States**|Hanyu Duan et.al.|[2402.09733v1](http://arxiv.org/abs/2402.09733v1)|null|\n", "2402.09584": "|**2024-02-14**|**Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems**|Liang Zhang et.al.|[2402.09584v1](http://arxiv.org/abs/2402.09584v1)|null|\n", "2402.10835": "|**2024-02-19**|**Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities**|Mingyu Jin et.al.|[2402.10835v2](http://arxiv.org/abs/2402.10835v2)|null|\n", "2402.10828": "|**2024-02-16**|**RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model**|Jianhao Yuan et.al.|[2402.10828v1](http://arxiv.org/abs/2402.10828v1)|null|\n", "2402.10811": "|**2024-02-16**|**Quantifying the Persona Effect in LLM Simulations**|Tiancheng Hu et.al.|[2402.10811v1](http://arxiv.org/abs/2402.10811v1)|null|\n", "2402.10532": "|**2024-02-16**|**Properties and Challenges of LLM-Generated Explanations**|Jenny Kunz et.al.|[2402.10532v1](http://arxiv.org/abs/2402.10532v1)|null|\n", "2402.10350": "|**2024-02-15**|**Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review**|Jing Su et.al.|[2402.10350v1](http://arxiv.org/abs/2402.10350v1)|null|\n", "2402.12276": "|**2024-02-19**|**Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models**|Puxuan Yu et.al.|[2402.12276v1](http://arxiv.org/abs/2402.12276v1)|**[link](https://github.com/pxyu/llm-nle-for-calibration)**|\n", "2402.11681": "|**2024-02-18**|**Opening the black box of language acquisition**|J\u00e9r\u00f4me Michaud et.al.|[2402.11681v1](http://arxiv.org/abs/2402.11681v1)|**[link](https://github.com/michaudj/languagelearner)**|\n", "2402.11621": "|**2024-02-23**|**Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Bias Detection**|Valeria Pastorino 
et.al.|[2402.11621v2](http://arxiv.org/abs/2402.11621v2)|null|\n", "2402.11518": "|**2024-02-18**|**Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network**|Lin Chen et.al.|[2402.11518v1](http://arxiv.org/abs/2402.11518v1)|null|\n", "2402.11420": "|**2024-02-18**|**Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction**|Yinghui Li et.al.|[2402.11420v1](http://arxiv.org/abs/2402.11420v1)|null|\n", "2402.11296": "|**2024-02-17**|**Dissecting Human and LLM Preferences**|Junlong Li et.al.|[2402.11296v1](http://arxiv.org/abs/2402.11296v1)|**[link](https://github.com/gair-nlp/preference-dissection)**|\n", "2402.11166": "|**2024-02-17**|**GenDec: A robust generative Question-decomposition method for Multi-hop reasoning**|Jian Wu et.al.|[2402.11166v1](http://arxiv.org/abs/2402.11166v1)|null|\n", "2402.11122": "|**2024-02-16**|**Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models**|Zihao Lin et.al.|[2402.11122v1](http://arxiv.org/abs/2402.11122v1)|null|\n", "2402.11005": "|**2024-02-21**|**Exploring Value Biases: How LLMs Deviate Towards the Ideal**|Sarath Sivaprasad et.al.|[2402.11005v2](http://arxiv.org/abs/2402.11005v2)|null|\n", "2402.10948": "|**2024-03-15**|**Zero-shot Explainable Mental Health Analysis on Social Media by Incorporating Mental Scales**|Wenyu Li et.al.|[2402.10948v2](http://arxiv.org/abs/2402.10948v2)|null|\n", "2402.12483": "|**2024-02-19**|**Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?**|Nishant Balepur et.al.|[2402.12483v1](http://arxiv.org/abs/2402.12483v1)|**[link](https://github.com/nbalepur/mcqa-artifacts)**|\n", "2402.13871": "|**2024-02-21**|**An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach**|Mohammad Amaz Uddin et.al.|[2402.13871v1](http://arxiv.org/abs/2402.13871v1)|null|\n", "2402.13758": "|**2024-02-21**|**Factual 
Consistency Evaluation of Summarisation in the Era of Large Language Models**|Zheheng Luo et.al.|[2402.13758v1](http://arxiv.org/abs/2402.13758v1)|null|\n", "2402.13709": "|**2024-03-08**|**SaGE: Evaluating Moral Consistency in Large Language Models**|Vamshi Krishna Bonagiri et.al.|[2402.13709v2](http://arxiv.org/abs/2402.13709v2)|**[link](https://github.com/vnnm404/SaGE)**|\n", "2402.14359": "|**2024-02-22**|**Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark**|Xiuying Chen et.al.|[2402.14359v1](http://arxiv.org/abs/2402.14359v1)|null|\n", "2402.14182": "|**2024-02-22**|**Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization**|Jiliang Li et.al.|[2402.14182v1](http://arxiv.org/abs/2402.14182v1)|null|\n", "2402.09664": "|**2024-02-21**|**CodeMind: A Framework to Challenge Large Language Models for Code Reasoning**|Changshu Liu et.al.|[2402.09664v3](http://arxiv.org/abs/2402.09664v3)|**[link](https://github.com/intelligent-cat-lab/codemind)**|\n", "2402.15175": "|**2024-02-26**|**Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition**|Yufei Huang et.al.|[2402.15175v2](http://arxiv.org/abs/2402.15175v2)|null|\n", "2402.16459": "|**2024-02-28**|**Defending LLMs against Jailbreaking Attacks via Backtranslation**|Yihan Wang et.al.|[2402.16459v2](http://arxiv.org/abs/2402.16459v2)|**[link](https://github.com/yihanwang617/llm-jailbreaking-defense-backtranslation)**|\n", "2402.16444": "|**2024-02-26**|**ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors**|Zhexin Zhang et.al.|[2402.16444v1](http://arxiv.org/abs/2402.16444v1)|**[link](https://github.com/thu-coai/shieldlm)**|\n", "2402.16438": "|**2024-02-26**|**Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models**|Tianyi Tang et.al.|[2402.16438v1](http://arxiv.org/abs/2402.16438v1)|null|\n", 
"2402.16315": "|**2024-03-11**|**Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models**|Jeonghwan Kim et.al.|[2402.16315v2](http://arxiv.org/abs/2402.16315v2)|null|\n", "2402.15754": "|**2024-02-24**|**HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition**|Yuxuan Liu et.al.|[2402.15754v1](http://arxiv.org/abs/2402.15754v1)|null|\n", "2402.15751": "|**2024-02-24**|**Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning**|Yong Liu et.al.|[2402.15751v1](http://arxiv.org/abs/2402.15751v1)|null|\n", "2402.15727": "|**2024-03-04**|**LLMs Can Defend Themselves Against Jailbreaking in a Practical Manner: A Vision Paper**|Daoyuan Wu et.al.|[2402.15727v2](http://arxiv.org/abs/2402.15727v2)|null|\n", "2402.17019": "|**2024-02-26**|**Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling**|Hang Jiang et.al.|[2402.17019v1](http://arxiv.org/abs/2402.17019v1)|**[link](https://github.com/hjian42/legalstories)**|\n", "2402.18139": "|**2024-04-15**|**Cause and Effect: Can Large Language Models Truly Understand Causality?**|Swagata Ashwani et.al.|[2402.18139v2](http://arxiv.org/abs/2402.18139v2)|null|\n", "2402.18060": "|**2024-03-13**|**Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions**|Hanjie Chen et.al.|[2402.18060v3](http://arxiv.org/abs/2402.18060v3)|**[link](https://github.com/hanjiechen/challengeclinicalqa)**|\n", "2402.17897": "|**2024-03-04**|**A Language Model based Framework for New Concept Placement in Ontologies**|Hang Dong et.al.|[2402.17897v2](http://arxiv.org/abs/2402.17897v2)|**[link](https://github.com/krr-oxford/lm-ontology-concept-placement)**|\n", "2402.18819": "|**2024-02-29**|**Dual Operating Modes of In-Context Learning**|Ziqian Lin 
et.al.|[2402.18819v1](http://arxiv.org/abs/2402.18819v1)|**[link](https://github.com/uw-madison-lee-lab/dual_operating_modes_of_icl)**|\n", "2403.01304": "|**2024-03-02**|**Improving the Validity of Automatically Generated Feedback via Reinforcement Learning**|Alexander Scarlatos et.al.|[2403.01304v1](http://arxiv.org/abs/2403.01304v1)|**[link](https://github.com/umass-ml4ed/feedback-gen-dpo)**|\n", "2403.01165": "|**2024-03-02**|**STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models**|Linhai Zhang et.al.|[2403.01165v1](http://arxiv.org/abs/2403.01165v1)|**[link](https://github.com/callanwu/star)**|\n", "2403.00126": "|**2024-02-29**|**FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition**|Xiaoqiang Wang et.al.|[2403.00126v1](http://arxiv.org/abs/2403.00126v1)|null|\n", "2403.00811": "|**2024-02-25**|**Cognitive Bias in High-Stakes Decision-Making with LLMs**|Jessica Echterhoff et.al.|[2403.00811v1](http://arxiv.org/abs/2403.00811v1)|null|\n", "2403.00781": "|**2024-03-16**|**ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework**|Zhongqi Yang et.al.|[2403.00781v2](http://arxiv.org/abs/2403.00781v2)|null|\n", "2403.03170": "|**2024-03-05**|**SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection**|Peng Qi et.al.|[2403.03170v1](http://arxiv.org/abs/2403.03170v1)|null|\n", "2403.03028": "|**2024-03-05**|**Word Importance Explains How Prompts Affect Language Model Outputs**|Stefan Hackmann et.al.|[2403.03028v1](http://arxiv.org/abs/2403.03028v1)|null|\n", "2403.02647": "|**2024-03-05**|**FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model**|Xiangyu Li et.al.|[2403.02647v1](http://arxiv.org/abs/2403.02647v1)|**[link](https://github.com/frinkleko/finreport)**|\n", "2403.01981": "|**2024-03-04**|**Evaluating the Explainability of Neural 
Rankers**|Saran Pandian et.al.|[2403.01981v1](http://arxiv.org/abs/2403.01981v1)|null|\n", "2403.01599": "|**2024-03-03**|**SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos**|Yulei Niu et.al.|[2403.01599v1](http://arxiv.org/abs/2403.01599v1)|null|\n", "2403.01457": "|**2024-03-03**|**Logic Rules as Explanations for Legal Case Retrieval**|Zhongxiang Sun et.al.|[2403.01457v1](http://arxiv.org/abs/2403.01457v1)|**[link](https://github.com/ke-01/ns-lcr)**|\n", "2403.03627": "|**2024-04-26**|**Multimodal Large Language Models to Support Real-World Fact-Checking**|Jiahui Geng et.al.|[2403.03627v2](http://arxiv.org/abs/2403.03627v2)|null|\n", "2403.03585": "|**2024-03-06**|**RouteExplainer: An Explanation Framework for Vehicle Routing Problem**|Daisuke Kikuta et.al.|[2403.03585v1](http://arxiv.org/abs/2403.03585v1)|**[link](https://github.com/ntt-dkiku/route-explainer)**|\n", "2403.03397": "|**2024-03-06**|**Explaining Genetic Programming Trees using Large Language Models**|Paula Maddigan et.al.|[2403.03397v1](http://arxiv.org/abs/2403.03397v1)|null|\n", "2403.04132": "|**2024-03-07**|**Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference**|Wei-Lin Chiang et.al.|[2403.04132v1](http://arxiv.org/abs/2403.04132v1)|null|\n", "2403.05338": "|**2024-03-08**|**Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings**|Wei Zhou et.al.|[2403.05338v1](http://arxiv.org/abs/2403.05338v1)|null|\n", "2403.05063": "|**2024-03-08**|**Aligning Large Language Models for Controllable Recommendations**|Wensheng Lu et.al.|[2403.05063v1](http://arxiv.org/abs/2403.05063v1)|null|\n", "2403.06965": "|**2024-03-11**|**Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena**|Leonie Weissweiler et.al.|[2403.06965v1](http://arxiv.org/abs/2403.06965v1)|null|\n", "2403.06465": "|**2024-03-11**|**RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems**|Jianxun 
Lian et.al.|[2403.06465v1](http://arxiv.org/abs/2403.06465v1)|**[link](https://github.com/microsoft/recai)**|\n", "2403.06294": "|**2024-03-10**|**ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes**|Shengxin Hong et.al.|[2403.06294v1](http://arxiv.org/abs/2403.06294v1)|null|\n", "2403.06128": "|**2024-03-10**|**Low-dose CT Denoising with Language-engaged Dual-space Alignment**|Zhihao Chen et.al.|[2403.06128v1](http://arxiv.org/abs/2403.06128v1)|**[link](https://github.com/hao1635/leda)**|\n", "2403.06050": "|**2024-03-10**|**Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills**|Paul Denny et.al.|[2403.06050v1](http://arxiv.org/abs/2403.06050v1)|null|\n", "2403.07627": "|**2024-03-12**|**generAItor: Tree-in-the-Loop Text Generation for Language Model Explainability and Adaptation**|Thilo Spinner et.al.|[2403.07627v1](http://arxiv.org/abs/2403.07627v1)|null|\n", "2403.07506": "|**2024-03-12**|**Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code**|Zhou Yang et.al.|[2403.07506v1](http://arxiv.org/abs/2403.07506v1)|null|\n", "2403.08213": "|**2024-03-13**|**Can Large Language Models Identify Authorship?**|Baixiang Huang et.al.|[2403.08213v1](http://arxiv.org/abs/2403.08213v1)|**[link](https://github.com/baixianghuang/authorship-llm)**|\n", "2403.09606": "|**2024-03-14**|**Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey**|Xiaoyu Liu et.al.|[2403.09606v1](http://arxiv.org/abs/2403.09606v1)|null|\n", "2403.09567": "|**2024-04-23**|**Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models**|Laura Fern\u00e1ndez-Becerra et.al.|[2403.09567v2](http://arxiv.org/abs/2403.09567v2)|null|\n", "2403.09410": "|**2024-03-14**|**XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via 
Concept-guided Context Optimization**|Yequan Bie et.al.|[2403.09410v1](http://arxiv.org/abs/2403.09410v1)|null|\n", "2403.09085": "|**2024-03-14**|**Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance**|Kai Xiong et.al.|[2403.09085v1](http://arxiv.org/abs/2403.09085v1)|null|\n", "2403.08946": "|**2024-03-13**|**Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era**|Xuansheng Wu et.al.|[2403.08946v1](http://arxiv.org/abs/2403.08946v1)|**[link](https://github.com/jacksonwuxs/usablexai_llm)**|\n", "2403.08833": "|**2024-03-13**|**TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation**|Dingbang Li et.al.|[2403.08833v1](http://arxiv.org/abs/2403.08833v1)|null|\n", "2403.10507": "|**2024-03-15**|**Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization**|Ratnadira Widyasari et.al.|[2403.10507v1](http://arxiv.org/abs/2403.10507v1)|null|\n", "2403.10482": "|**2024-03-22**|**Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst?**|Bruno de Melo et.al.|[2403.10482v2](http://arxiv.org/abs/2403.10482v2)|null|\n", "2403.10275": "|**2024-03-15**|**A Question on the Explainability of Large Language Models and the Word-Level Univariate First-Order Plausibility Assumption**|Jeremie Bogaert et.al.|[2403.10275v1](http://arxiv.org/abs/2403.10275v1)|null|\n", "2403.10008": "|**2024-03-15**|**Language to Map: Topological map generation from natural language path instructions**|Hideki Deguchi et.al.|[2403.10008v1](http://arxiv.org/abs/2403.10008v1)|null|\n", "2403.11509": "|**2024-03-18**|**DEE: Dual-stage Explainable Evaluation Method for Text Generation**|Shenyu Zhang et.al.|[2403.11509v1](http://arxiv.org/abs/2403.11509v1)|null|\n", "2403.11169": "|**2024-04-30**|**Correcting misinformation on social media with a large language model**|Xinyi Zhou 
et.al.|[2403.11169v3](http://arxiv.org/abs/2403.11169v3)|**[link](https://github.com/social-futures-lab/muse)**|\n", "2403.11129": "|**2024-03-17**|**Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering**|Baiyan Zhang et.al.|[2403.11129v1](http://arxiv.org/abs/2403.11129v1)|null|\n", "2403.10949": "|**2024-03-26**|**SelfIE: Self-Interpretation of Large Language Model Embeddings**|Haozhe Chen et.al.|[2403.10949v2](http://arxiv.org/abs/2403.10949v2)|**[link](https://github.com/tonychenxyz/selfie)**|\n", "2403.10750": "|**2024-03-16**|**Depression Detection on Social Media with Large Language Models**|Xiaochong Lan et.al.|[2403.10750v1](http://arxiv.org/abs/2403.10750v1)|null|\n", "2403.12451": "|**2024-03-19**|**INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations**|Lirui Luo et.al.|[2403.12451v1](http://arxiv.org/abs/2403.12451v1)|null|\n", "2403.12403": "|**2024-05-08**|**Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales**|Ayushi Nirmal et.al.|[2403.12403v2](http://arxiv.org/abs/2403.12403v2)|**[link](https://github.com/amritabh/shield)**|\n", "2403.13000": "|**2024-03-12**|**Duwak: Dual Watermarks in Large Language Models**|Chaoyi Zhu et.al.|[2403.13000v1](http://arxiv.org/abs/2403.13000v1)|null|\n", "2403.14565": "|**2024-03-21**|**A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science**|Clayton Cohn et.al.|[2403.14565v1](http://arxiv.org/abs/2403.14565v1)|null|\n", "2403.14171": "|**2024-04-08**|**MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation**|Longzheng Wang et.al.|[2403.14171v3](http://arxiv.org/abs/2403.14171v3)|**[link](https://github.com/wishever/mmidr)**|\n", "2403.14118": "|**2024-03-21**|**From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation**|Haofei Zhao 
et.al.|[2403.14118v1](http://arxiv.org/abs/2403.14118v1)|null|\n", "2403.14059": "|**2024-03-21**|**PE-GPT: A Physics-Informed Interactive Large Language Model for Power Converter Modulation Design**|Fanfan Lin et.al.|[2403.14059v1](http://arxiv.org/abs/2403.14059v1)|null|\n", "2403.14801": "|**2024-04-02**|**Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis**|Junyoung Kim et.al.|[2403.14801v2](http://arxiv.org/abs/2403.14801v2)|null|\n", "2403.16812": "|**2024-03-25**|**Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making**|Shuai Ma et.al.|[2403.16812v1](http://arxiv.org/abs/2403.16812v1)|null|\n", "2403.16662": "|**2024-03-26**|**RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict**|Yirong Zeng et.al.|[2403.16662v2](http://arxiv.org/abs/2403.16662v2)|**[link](https://github.com/zeng-yirong/ru22fact)**|\n", "2403.16354": "|**2024-03-25**|**ChatDBG: An AI-Powered Debugging Assistant**|Kyla Levin et.al.|[2403.16354v1](http://arxiv.org/abs/2403.16354v1)|**[link](https://github.com/plasma-umass/chatdbg)**|\n", "2403.15729": "|**2024-03-26**|**Towards a RAG-based Summarization Agent for the Electron-Ion Collider**|Karthik Suresh et.al.|[2403.15729v2](http://arxiv.org/abs/2403.15729v2)|null|\n", "2403.15587": "|**2024-03-22**|**Large language models for crowd decision making based on prompt design strategies using ChatGPT: models, analysis and challenges**|Cristina Zuheros et.al.|[2403.15587v1](http://arxiv.org/abs/2403.15587v1)|null|\n", "2403.17873": "|**2024-03-26**|**Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach**|Andrea Ferrario et.al.|[2403.17873v1](http://arxiv.org/abs/2403.17873v1)|null|\n", "2403.17760": "|**2024-03-26**|**Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons**|Shijia Zhou 
et.al.|[2403.17760v1](http://arxiv.org/abs/2403.17760v1)|**[link](https://github.com/shijiazh/constructions-are-so-difficult)**|\n", "2403.17218": "|**2024-03-25**|**A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection**|Benjamin Steenhoek et.al.|[2403.17218v1](http://arxiv.org/abs/2403.17218v1)|null|\n", "2403.18537": "|**2024-03-27**|**A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks**|Axel Constant et.al.|[2403.18537v1](http://arxiv.org/abs/2403.18537v1)|null|\n", "2403.18344": "|**2024-03-27**|**LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models**|Mingxing Peng et.al.|[2403.18344v1](http://arxiv.org/abs/2403.18344v1)|null|\n", "2403.18205": "|**2024-03-27**|**Exploring the Privacy Protection Capabilities of Chinese Large Language Models**|Yuqi Yang et.al.|[2403.18205v1](http://arxiv.org/abs/2403.18205v1)|null|\n", "2403.18932": "|**2024-03-27**|**Measuring Political Bias in Large Language Models: What Is Said and How It Is Said**|Yejin Bang et.al.|[2403.18932v1](http://arxiv.org/abs/2403.18932v1)|null|\n", "2403.18872": "|**2024-03-26**|**Targeted Visualization of the Backbone of Encoder LLMs**|Isaac Roberts et.al.|[2403.18872v1](http://arxiv.org/abs/2403.18872v1)|**[link](https://github.com/LucaHermes/DeepView)**|\n", "2403.19876": "|**2024-03-28**|**\"I'm categorizing LLM as a productivity tool\": Examining ethics of LLM use in HCI research practices**|Shivani Kapania et.al.|[2403.19876v1](http://arxiv.org/abs/2403.19876v1)|null|\n", "2404.01135": "|**2024-04-01**|**Enhancing Reasoning Capacity of SLM using Cognitive Enhancement**|Jonathan Pan et.al.|[2404.01135v1](http://arxiv.org/abs/2404.01135v1)|null|\n", "2404.01012": "|**2024-04-01**|**Query Performance Prediction using Relevance Judgments Generated by Large 
Language Models**|Chuan Meng et.al.|[2404.01012v1](http://arxiv.org/abs/2404.01012v1)|**[link](https://github.com/chuanmeng/qpp-genre)**|\n", "2404.00589": "|**2024-04-12**|**Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing**|Zhenyu Qian et.al.|[2404.00589v2](http://arxiv.org/abs/2404.00589v2)|**[link](https://github.com/code4paper-2024/code4paper)**|\n", "2404.02650": "|**2024-04-03**|**Towards detecting unanticipated bias in Large Language Models**|Anna Kruspe et.al.|[2404.02650v1](http://arxiv.org/abs/2404.02650v1)|null|\n", "2404.02450": "|**2024-04-03**|**Task Agnostic Architecture for Algorithm Induction via Implicit Composition**|Sahil J. Sindhi et.al.|[2404.02450v1](http://arxiv.org/abs/2404.02450v1)|null|\n", "2404.03623": "|**2024-04-04**|**Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph**|Marco Bronzini et.al.|[2404.03623v1](http://arxiv.org/abs/2404.03623v1)|null|\n", "2404.03577": "|**2024-04-04**|**Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models**|Yantao Liu et.al.|[2404.03577v1](http://arxiv.org/abs/2404.03577v1)|**[link](https://github.com/thu-keg/knot)**|\n", "2404.03428": "|**2024-04-04**|**Edisum: Summarizing and Explaining Wikipedia Edits at Scale**|Marija \u0160akota et.al.|[2404.03428v1](http://arxiv.org/abs/2404.03428v1)|**[link](https://github.com/epfl-dlab/edisum)**|\n", "2404.03301": "|**2024-04-04**|**Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics**|Fangru Lin et.al.|[2404.03301v1](http://arxiv.org/abs/2404.03301v1)|**[link](https://github.com/fangru-lin/llm_scalar_adj)**|\n", "2404.03275": "|**2024-04-04**|**DELTA: Decomposed Efficient Long-Term Robot Task Planning using Large Language Models**|Yuchen Liu et.al.|[2404.03275v1](http://arxiv.org/abs/2404.03275v1)|null|\n", "2404.03118": "|**2024-04-03**|**LVLM-Intrepret: An Interpretability Tool for Large 
Vision-Language Models**|Gabriela Ben Melech Stan et.al.|[2404.03118v1](http://arxiv.org/abs/2404.03118v1)|null|\n", "2404.03028": "|**2024-04-10**|**An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models**|Emmy Liu et.al.|[2404.03028v2](http://arxiv.org/abs/2404.03028v2)|null|\n", "2404.02937": "|**2024-04-13**|**Explainable Traffic Flow Prediction with Large Language Models**|Xusen Guo et.al.|[2404.02937v3](http://arxiv.org/abs/2404.02937v3)|null|\n", "2404.05101": "|**2024-04-07**|**StockGPT: A GenAI Model for Stock Prediction and Trading**|Dat Mai et.al.|[2404.05101v1](http://arxiv.org/abs/2404.05101v1)|null|\n", "2404.04838": "|**2024-04-07**|**Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead**|Irene Pagliai et.al.|[2404.04838v1](http://arxiv.org/abs/2404.04838v1)|**[link](https://github.com/ltu-machine-learning/bipolmulti)**|\n", "2404.04656": "|**2024-04-06**|**Binary Classifier Optimization for Large Language Model Alignment**|Seungjae Jung et.al.|[2404.04656v1](http://arxiv.org/abs/2404.04656v1)|null|\n", "2404.04286": "|**2024-04-04**|**Language Model Evolution: An Iterated Learning Perspective**|Yi Ren et.al.|[2404.04286v1](http://arxiv.org/abs/2404.04286v1)|**[link](https://github.com/joshua-ren/iicl)**|\n", "2404.06349": "|**2024-04-09**|**CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models**|Yu Zhou et.al.|[2404.06349v1](http://arxiv.org/abs/2404.06349v1)|null|\n", "2404.06332": "|**2024-04-07**|**X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model**|Jan Held et.al.|[2404.06332v1](http://arxiv.org/abs/2404.06332v1)|null|\n", "2404.07108": "|**2024-04-11**|**From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications**|Yongqiang Ma et.al.|[2404.07108v2](http://arxiv.org/abs/2404.07108v2)|null|\n", "2404.07009": 
"|**2024-05-15**|**A Mathematical Theory for Learning Semantic Languages by Abstract Learners**|Kuo-Yu Liao et.al.|[2404.07009v3](http://arxiv.org/abs/2404.07009v3)|null|\n", "2404.07005": "|**2024-04-10**|**WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers**|Yuexi Chen et.al.|[2404.07005v1](http://arxiv.org/abs/2404.07005v1)|null|\n", "2404.07725": "|**2024-04-11**|**Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models**|Marvin Pafla et.al.|[2404.07725v1](http://arxiv.org/abs/2404.07725v1)|null|\n", "2404.07235": "|**2024-04-07**|**Explaining EDA synthesis errors with LLMs**|Siyu Qiu et.al.|[2404.07235v1](http://arxiv.org/abs/2404.07235v1)|null|\n", "2404.08148": "|**2024-04-11**|**Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs**|Jierui Li et.al.|[2404.08148v1](http://arxiv.org/abs/2404.08148v1)|null|\n", "2402.17097": "|**2024-04-12**|**Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses**|Juyeon Kim et.al.|[2402.17097v2](http://arxiv.org/abs/2402.17097v2)|**[link](https://github.com/juyeonnn/reex)**|\n", "2404.10306": "|**2024-06-03**|**Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model**|Hengyuan Zhang et.al.|[2404.10306v4](http://arxiv.org/abs/2404.10306v4)|**[link](https://github.com/rattlesnakey/cofitune)**|\n", "2404.12372": "|**2024-04-18**|**MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale**|Xiaotang Gai et.al.|[2404.12372v1](http://arxiv.org/abs/2404.12372v1)|null|\n", "2404.11875": "|**2024-04-18**|**Concept Induction using LLMs: a user experiment for assessment**|Adrita Barua et.al.|[2404.11875v1](http://arxiv.org/abs/2404.11875v1)|null|\n", "2404.10876": "|**2024-05-01**|**Course Recommender Systems Need to Consider the Job Market**|Jibril Frej 
et.al.|[2404.10876v2](http://arxiv.org/abs/2404.10876v2)|**[link](https://github.com/jibril-frej/jcrec)**|\n", "2404.12901": "|**2024-04-29**|**Large Language Models for Networking: Workflow, Advances and Challenges**|Chang Liu et.al.|[2404.12901v2](http://arxiv.org/abs/2404.12901v2)|null|\n", "2404.14304": "|**2024-05-10**|**Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)**|Xiang Yin et.al.|[2404.14304v2](http://arxiv.org/abs/2404.14304v2)|**[link](https://github.com/XiangYin2021/RAE)**|\n", "2404.14296": "|**2024-04-22**|**Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach**|Yao Wan et.al.|[2404.14296v1](http://arxiv.org/abs/2404.14296v1)|**[link](https://github.com/CGCL-codes/naturalcc)**|\n", "2404.13847": "|**2024-04-22**|**EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning**|Mingjie Ma et.al.|[2404.13847v1](http://arxiv.org/abs/2404.13847v1)|null|\n", "2404.14928": "|**2024-06-04**|**Graph Machine Learning in the Era of Large Language Models (LLMs)**|Wenqi Fan et.al.|[2404.14928v2](http://arxiv.org/abs/2404.14928v2)|null|\n", "2404.15166": "|**2024-04-22**|**Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication**|John R. 
Lawson et.al.|[2404.15166v1](http://arxiv.org/abs/2404.15166v1)|null|\n", "2404.16635": "|**2024-04-25**|**TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning**|Liang Zhang et.al.|[2404.16635v1](http://arxiv.org/abs/2404.16635v1)|**[link](https://github.com/x-plug/mplug-docowl)**|\n", "2404.15993": "|**2024-04-24**|**Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach**|Linyu Liu et.al.|[2404.15993v1](http://arxiv.org/abs/2404.15993v1)|null|\n", "2404.15848": "|**2024-04-25**|**Detecting Conceptual Abstraction in LLMs**|Michaela Regneri et.al.|[2404.15848v2](http://arxiv.org/abs/2404.15848v2)|null|\n", "2404.16045": "|**2024-04-04**|**Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation**|Mohammadmehdi Ataei et.al.|[2404.16045v1](http://arxiv.org/abs/2404.16045v1)|null|\n", "2404.16859": "|**2024-04-11**|**Rumour Evaluation with Very Large Language Models**|Dahlia Shehata et.al.|[2404.16859v1](http://arxiv.org/abs/2404.16859v1)|**[link](https://github.com/dahlia-chehata/rumoureval-with-vllms)**|\n", "2404.18533": "|**2024-04-30**|**Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability**|Meng Li et.al.|[2404.18533v2](http://arxiv.org/abs/2404.18533v2)|**[link](https://github.com/hr-jin/concept-explaination-evaluation)**|\n", "2404.18286": "|**2024-04-30**|**Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages**|David Ifeoluwa Adelani et.al.|[2404.18286v2](http://arxiv.org/abs/2404.18286v2)|null|\n", "2404.17977": "|**2024-04-27**|**Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification**|Himanshu Pandey et.al.|[2404.17977v1](http://arxiv.org/abs/2404.17977v1)|null|\n", "2404.19729": "|**2024-04-30**|**A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI 
Applications**|Steph Buongiorno et.al.|[2404.19729v1](http://arxiv.org/abs/2404.19729v1)|null|\n", "2404.19631": "|**2024-04-30**|**On Training a Neural Network to Explain Binaries**|Alexander Interrante-Grant et.al.|[2404.19631v1](http://arxiv.org/abs/2404.19631v1)|null|\n", "2404.19093": "|**2024-04-29**|**Large Language Models as Conversational Movie Recommenders: A User Study**|Ruixuan Sun et.al.|[2404.19093v1](http://arxiv.org/abs/2404.19093v1)|null|\n", "2405.00449": "|**2024-05-01**|**RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models**|Mohamed Manzour Hussien et.al.|[2405.00449v1](http://arxiv.org/abs/2405.00449v1)|null|\n", "2405.00273": "|**2024-05-01**|**Social Life Simulation for Non-Cognitive Skills Learning**|Zihan Yan et.al.|[2405.00273v1](http://arxiv.org/abs/2405.00273v1)|null|\n", "2405.01379": "|**2024-05-08**|**Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving**|Xin Quan et.al.|[2405.01379v2](http://arxiv.org/abs/2405.01379v2)|null|\n", "2405.00722": "|**2024-04-26**|**LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study**|Van Bach Nguyen et.al.|[2405.00722v1](http://arxiv.org/abs/2405.00722v1)|null|\n", "2405.02079": "|**2024-05-03**|**Argumentative Large Language Models for Explainable and Contestable Decision-Making**|Gabriel Freedman et.al.|[2405.02079v1](http://arxiv.org/abs/2405.02079v1)|null|\n", "2405.01904": "|**2024-05-03**|**Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts**|Felicia Riethm\u00fcller et.al.|[2405.01904v1](http://arxiv.org/abs/2405.01904v1)|null|\n", "2405.01768": "|**2024-05-02**|**CoS: Enhancing Personalization and Mitigating Bias with Context Steering**|Jerry Zhi-Yang He et.al.|[2405.01768v1](http://arxiv.org/abs/2405.01768v1)|null|\n", "2403.11894": "|**2024-05-09**|**From Explainable to Interpretable Deep Learning 
for Natural Language Processing in Healthcare: How Far from Reality?**|Guangming Huang et.al.|[2403.11894v3](http://arxiv.org/abs/2403.11894v3)|null|\n", "2405.03371": "|**2024-05-06**|**Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom**|Bo Wang et.al.|[2405.03371v1](http://arxiv.org/abs/2405.03371v1)|**[link](https://github.com/wangbo9719/L-Defense_EFND)**|\n", "2405.02421": "|**2024-05-03**|**What does the Knowledge Neuron Thesis Have to do with Knowledge?**|Jingcheng Niu et.al.|[2405.02421v1](http://arxiv.org/abs/2405.02421v1)|**[link](https://github.com/frankniujc/kn_thesis)**|\n", "2405.04382": "|**2024-05-07**|**Large Language Models Cannot Explain Themselves**|Advait Sarkar et.al.|[2405.04382v1](http://arxiv.org/abs/2405.04382v1)|null|\n", "2405.04325": "|**2024-05-07**|**Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**|Atharvan Dogra et.al.|[2405.04325v1](http://arxiv.org/abs/2405.04325v1)|null|\n", "2405.04324": "|**2024-05-07**|**Granite Code Models: A Family of Open Foundation Models for Code Intelligence**|Mayank Mishra et.al.|[2405.04324v1](http://arxiv.org/abs/2405.04324v1)|**[link](https://github.com/ibm-granite/granite-code-models)**|\n", "2405.04236": "|**2024-05-07**|**Semantic API Alignment: Linking High-level User Goals to APIs**|Robert Feldt et.al.|[2405.04236v1](http://arxiv.org/abs/2405.04236v1)|null|\n", "2405.04215": "|**2024-05-07**|**NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions**|Elliot Gestrin et.al.|[2405.04215v1](http://arxiv.org/abs/2405.04215v1)|null|\n", "2405.04160": "|**2024-05-07**|**A Causal Explainable Guardrails for Large Language Models**|Zhixuan Chu et.al.|[2405.04160v1](http://arxiv.org/abs/2405.04160v1)|null|\n", "2405.03734": "|**2024-05-06**|**FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering**|Silan Hu 
et.al.|[2405.03734v1](http://arxiv.org/abs/2405.03734v1)|null|\n", "2405.02358": "|**2024-05-07**|**A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model**|Jiexia Ye et.al.|[2405.02358v2](http://arxiv.org/abs/2405.02358v2)|**[link](https://github.com/start2020/awesome-timeseries-llm-fm)**|\n", "2405.05248": "|**2024-05-09**|**LLMs with Personalities in Multi-issue Negotiation Games**|Sean Noh et.al.|[2405.05248v2](http://arxiv.org/abs/2405.05248v2)|null|\n", "2405.04793": "|**2024-05-08**|**Zero-shot LLM-guided Counterfactual Generation for Text**|Amrita Bhattacharjee et.al.|[2405.04793v1](http://arxiv.org/abs/2405.04793v1)|null|\n", "2405.04760": "|**2024-05-09**|**Large Language Models for Cyber Security: A Systematic Literature Review**|HanXiang Xu et.al.|[2405.04760v2](http://arxiv.org/abs/2405.04760v2)|null|\n", "2405.05548": "|**2024-05-09**|**Investigating Interaction Modes and User Agency in Human-LLM Collaboration for Domain-Specific Data Analysis**|Jiajing Guo et.al.|[2405.05548v1](http://arxiv.org/abs/2405.05548v1)|null|\n", "2405.05348": "|**2024-05-08**|**The Effect of Model Size on LLM Post-hoc Explainability via LIME**|Henning Heyen et.al.|[2405.05348v1](http://arxiv.org/abs/2405.05348v1)|**[link](https://github.com/henningheyen/scalability-of-llm-posthoc-explanations)**|\n", "2405.06270": "|**2024-06-03**|**XAI4LLM. 
Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare**|Fatemeh Nazary et.al.|[2405.06270v3](http://arxiv.org/abs/2405.06270v3)|null|\n", "2405.06105": "|**2024-05-09**|**Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?**|Yutong Hu et.al.|[2405.06105v1](http://arxiv.org/abs/2405.06105v1)|null|\n", "2405.06064": "|**2024-05-09**|**LLMs for XAI: Future Directions for Explaining Explanations**|Alexandra Zytek et.al.|[2405.06064v1](http://arxiv.org/abs/2405.06064v1)|null|\n", "2405.07436": "|**2024-05-13**|**Can Language Models Explain Their Own Classification Behavior?**|Dane Sherburn et.al.|[2405.07436v1](http://arxiv.org/abs/2405.07436v1)|**[link](https://github.com/danesherbs/articulate-rules)**|\n", "2405.06800": "|**2024-05-10**|**LLM-Generated Black-box Explanations Can Be Adversarially Helpful**|Rohan Ajwani et.al.|[2405.06800v1](http://arxiv.org/abs/2405.06800v1)|null|\n", "2405.06671": "|**2024-05-15**|**Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling**|Subhendu Khatuya et.al.|[2405.06671v2](http://arxiv.org/abs/2405.06671v2)|**[link](https://github.com/subhendukhatuya/FLAN-FinXC)**|\n", "2405.08502": "|**2024-05-14**|**Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure**|Odysseas S. 
Chlapanis et.al.|[2405.08502v1](http://arxiv.org/abs/2405.08502v1)|**[link](https://github.com/nlpaueb/multiple-choice-mutation)**|\n", "2405.08468": "|**2024-05-14**|**Challenges and Opportunities in Text Generation Explainability**|Kenza Amara et.al.|[2405.08468v1](http://arxiv.org/abs/2405.08468v1)|null|\n", "2405.08448": "|**2024-05-14**|**Understanding the performance gap between online and offline alignment algorithms**|Yunhao Tang et.al.|[2405.08448v1](http://arxiv.org/abs/2405.08448v1)|null|\n", "2405.08026": "|**2024-05-12**|**ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis**|Mohammad Amaz Uddin et.al.|[2405.08026v1](http://arxiv.org/abs/2405.08026v1)|null|\n", "2405.09454": "|**2024-05-15**|**Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models**|Majid Zarharan et.al.|[2405.09454v1](http://arxiv.org/abs/2405.09454v1)|**[link](https://github.com/Zarharan/NLE-for-fact-checking)**|\n", "2405.09673": "|**2024-05-15**|**LoRA Learns Less and Forgets Less**|Dan Biderman et.al.|[2405.09673v1](http://arxiv.org/abs/2405.09673v1)|null|\n", "2405.10700": "|**2024-05-17**|**SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks**|Michael Shliselberg et.al.|[2405.10700v1](http://arxiv.org/abs/2405.10700v1)|null|\n", "2405.12264": "|**2024-05-20**|**Directed Metric Structures arising in Large Language Models**|St\u00e9phane Gaubert et.al.|[2405.12264v1](http://arxiv.org/abs/2405.12264v1)|null|\n", "2405.11579": "|**2024-05-19**|**Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications**|Subhankar Maity et.al.|[2405.11579v1](http://arxiv.org/abs/2405.11579v1)|null|\n", "2405.14612": "|**2024-05-28**|**Explaining Multi-modal Large Language Models by Analyzing their Vision Perception**|Loris Giulivi 
et.al.|[2405.14612v2](http://arxiv.org/abs/2405.14612v2)|**[link](https://github.com/loris2222/ExplainingMLLMs)**|\n", "2405.14411": "|**2024-05-23**|**Large Language Models for Explainable Decisions in Dynamic Digital Twins**|Nan Zhang et.al.|[2405.14411v1](http://arxiv.org/abs/2405.14411v1)|**[link](https://github.com/explainable-digital-twins/rag-dddas)**|\n", "2405.14391": "|**2024-05-26**|**Explainable Few-shot Knowledge Tracing**|Haoxuan Li et.al.|[2405.14391v2](http://arxiv.org/abs/2405.14391v2)|**[link](https://github.com/leavesli1015/explainable-few-shot-knowledge-tracing)**|\n", "2405.14117": "|**2024-05-23**|**Knowledge Localization: Mission Not Accomplished? Enter Query Localization!**|Yuheng Chen et.al.|[2405.14117v1](http://arxiv.org/abs/2405.14117v1)|null|\n", "2405.13769": "|**2024-05-22**|**Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation**|Cyril Chhun et.al.|[2405.13769v1](http://arxiv.org/abs/2405.13769v1)|**[link](https://github.com/dig-team/hanna-benchmark-asg)**|\n", "2405.13740": "|**2024-05-22**|**Mining Action Rules for Defect Reduction Planning**|Khouloud Oueslati et.al.|[2405.13740v1](http://arxiv.org/abs/2405.13740v1)|null|\n", "2405.13560": "|**2024-05-22**|**Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain**|Yizhe Zhang et.al.|[2405.13560v1](http://arxiv.org/abs/2405.13560v1)|null|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547v1](http://arxiv.org/abs/2405.13547v1)|null|\n", "2405.13209": "|**2024-05-21**|**Investigating Symbolic Capabilities of Large Language Models**|Neisarg Dave et.al.|[2405.13209v1](http://arxiv.org/abs/2405.13209v1)|null|\n", "2405.13000": "|**2024-05-11**|**RAGE Against the Machine: Retrieval-Augmented LLM Explanations**|Joel Rorseth 
et.al.|[2405.13000v1](http://arxiv.org/abs/2405.13000v1)|null|\n", "2405.15624": "|**2024-05-24**|**Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment**|Hao Sun et.al.|[2405.15624v1](http://arxiv.org/abs/2405.15624v1)|null|\n", "2405.15512": "|**2024-07-03**|**ChatGPT Code Detection: Techniques for Uncovering the Source of Code**|Marc Oedingen et.al.|[2405.15512v2](http://arxiv.org/abs/2405.15512v2)|**[link](https://github.com/MarcOedingen/ChatGPT-Code-Detection)**|\n", "2405.15164": "|**2024-05-24**|**From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks**|Jacob Russin et.al.|[2405.15164v1](http://arxiv.org/abs/2405.15164v1)|null|\n", "2405.17129": "|**2024-07-02**|**TEII: Think, Explain, Interact and Iterate with Large Language Models to Solve Cross-lingual Emotion Detection**|Long Cheng et.al.|[2405.17129v2](http://arxiv.org/abs/2405.17129v2)|**[link](https://github.com/cl-victor1/exalt_2024_bcsz)**|\n", "2405.16918": "|**2024-05-27**|**The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective**|Nils Philipp Walter et.al.|[2405.16918v1](http://arxiv.org/abs/2405.16918v1)|null|\n", "2405.16344": "|**2024-05-25**|**Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks**|Emily Jensen et.al.|[2405.16344v1](http://arxiv.org/abs/2405.16344v1)|null|\n", "2405.16127": "|**2024-06-20**|**Finetuning Large Language Model for Personalized Ranking**|Zhuoxi Bai et.al.|[2405.16127v2](http://arxiv.org/abs/2405.16127v2)|**[link](https://github.com/BZX667/DMPO)**|\n", "2405.15943": "|**2024-05-24**|**Transformers represent belief state geometry in their residual stream**|Adam S. 
Shai et.al.|[2405.15943v1](http://arxiv.org/abs/2405.15943v1)|null|\n", "2405.18357": "|**2024-06-11**|**Faithful Logical Reasoning via Symbolic Chain-of-Thought**|Jundong Xu et.al.|[2405.18357v2](http://arxiv.org/abs/2405.18357v2)|**[link](https://github.com/aiden0526/symbcot)**|\n", "2405.18241": "|**2024-05-28**|**Active Use of Latent Constituency Representation in both Humans and Large Language Models**|Wei Liu et.al.|[2405.18241v1](http://arxiv.org/abs/2405.18241v1)|**[link](https://github.com/y1ny/WordDeletion)**|\n", "2405.17799": "|**2024-05-28**|**Exploring Activation Patterns of Parameters in Language Models**|Yudong Wang et.al.|[2405.17799v1](http://arxiv.org/abs/2405.17799v1)|null|\n", "2405.17728": "|**2024-05-28**|**Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based Experiments**|Toru Ishida et.al.|[2405.17728v1](http://arxiv.org/abs/2405.17728v1)|null|\n", "2405.17533": "|**2024-05-27**|**PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends**|Apurva Sinha et.al.|[2405.17533v1](http://arxiv.org/abs/2405.17533v1)|null|\n", "2405.18915": "|**2024-05-29**|**Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners**|Jiachun Li et.al.|[2405.18915v1](http://arxiv.org/abs/2405.18915v1)|null|\n", "2404.07774": "|**2024-05-29**|**Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts**|Namasivayam Kalithasan et.al.|[2404.07774v2](http://arxiv.org/abs/2404.07774v2)|null|\n", "2405.19561": "|**2024-05-29**|**Quo Vadis ChatGPT? 
From Large Language Models to Large Knowledge Models**|Venkat Venkatasubramanian et.al.|[2405.19561v1](http://arxiv.org/abs/2405.19561v1)|null|\n", "2405.20974": "|**2024-06-05**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu et.al.|[2405.20974v2](http://arxiv.org/abs/2405.20974v2)|**[link](https://github.com/xu1868/sayself)**|\n", "2405.20962": "|**2024-06-03**|**Large Language Models are Zero-Shot Next Location Predictors**|Ciro Beneduce et.al.|[2405.20962v2](http://arxiv.org/abs/2405.20962v2)|**[link](https://github.com/ssai-trento/llm-zero-shot-nl)**|\n", "2405.20613": "|**2024-05-31**|**FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores**|Alyssa Huang et.al.|[2405.20613v1](http://arxiv.org/abs/2405.20613v1)|**[link](https://github.com/rajpurkarlab/FineRadScore)**|\n", "2405.20404": "|**2024-05-30**|**XPrompt:Explaining Large Language Model's Generation via Joint Prompt Attribution**|Yurui Chang et.al.|[2405.20404v1](http://arxiv.org/abs/2405.20404v1)|null|\n", "2406.02377": "|**2024-06-04**|**XRec: Large Language Models for Explainable Recommendation**|Qiyao Ma et.al.|[2406.02377v1](http://arxiv.org/abs/2406.02377v1)|**[link](https://github.com/hkuds/xrec)**|\n", "2406.02060": "|**2024-06-04**|**I've got the \"Answer\"! Interpretation of LLMs Hidden States in Question Answering**|Valeriya Goloviznina et.al.|[2406.02060v1](http://arxiv.org/abs/2406.02060v1)|null|\n", "2406.01538": "|**2024-06-20**|**What Are Large Language Models Mapping to in the Brain? 
A Case Against Over-Reliance on Brain Scores**|Ebrahim Feghhi et.al.|[2406.01538v2](http://arxiv.org/abs/2406.01538v2)|**[link](https://github.com/ebrahimfeghhi/beyond-brainscore)**|\n", "2406.01428": "|**2024-06-04**|**Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study**|Martin J. Hetz et.al.|[2406.01428v2](http://arxiv.org/abs/2406.01428v2)|null|\n", "2406.01126": "|**2024-06-03**|**TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine**|Wenjing Yue et.al.|[2406.01126v1](http://arxiv.org/abs/2406.01126v1)|null|\n", "2406.00944": "|**2024-06-03**|**Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution**|Shicheng Xu et.al.|[2406.00944v1](http://arxiv.org/abs/2406.00944v1)|null|\n", "2406.00430": "|**2024-06-01**|**Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners**|Zhi Zheng et.al.|[2406.00430v1](http://arxiv.org/abs/2406.00430v1)|null|\n", "2406.00131": "|**2024-05-31**|**How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures**|Kevin Christian Wibisono et.al.|[2406.00131v1](http://arxiv.org/abs/2406.00131v1)|**[link](https://github.com/yixinw-lab/icl-unstructured)**|\n", "2406.00039": "|**2024-05-27**|**How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors?**|Subhankar Maity et.al.|[2406.00039v1](http://arxiv.org/abs/2406.00039v1)|null|\n", "2406.00030": "|**2024-05-24**|**Large Language Model Pruning**|Hanjuan Huang et.al.|[2406.00030v1](http://arxiv.org/abs/2406.00030v1)|null|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang et.al.|[2406.03474v1](http://arxiv.org/abs/2406.03474v1)|null|\n", "2406.03248": 
"|**2024-06-06**|**Large Language Models as Evaluators for Recommendation Explanations**|Xiaoyu Zhang et.al.|[2406.03248v2](http://arxiv.org/abs/2406.03248v2)|**[link](https://github.com/xiaoyu-sz/llmasevaluator)**|\n", "2406.03181": "|**2024-06-05**|**Missci: Reconstructing Fallacies in Misrepresented Science**|Max Glockner et.al.|[2406.03181v1](http://arxiv.org/abs/2406.03181v1)|**[link](https://github.com/UKPLab/acl2024-missci)**|\n", "2406.04216": "|**2024-06-08**|**What Do Language Models Learn in Context? The Structured Task Hypothesis**|Jiaoda Li et.al.|[2406.04216v2](http://arxiv.org/abs/2406.04216v2)|**[link](https://github.com/eth-lre/llm_icl)**|\n", "2406.03768": "|**2024-06-06**|**Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective**|Xinhao Yao et.al.|[2406.03768v1](http://arxiv.org/abs/2406.03768v1)|**[link](https://github.com/chen123ctrls/enhancingicl_svdpruning)**|\n", "2406.03505": "|**2024-06-04**|**Dynamic and Adaptive Feature Generation with LLM**|Xinhao Zhang et.al.|[2406.03505v1](http://arxiv.org/abs/2406.03505v1)|null|\n", "2406.04926": "|**2024-06-07**|**Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models**|Micha\u0142 Romaszewski et.al.|[2406.04926v1](http://arxiv.org/abs/2406.04926v1)|null|\n", "2406.04758": "|**2024-06-07**|**Think out Loud: Emotion Deducing Explanation in Dialogues**|Jiangnan Li et.al.|[2406.04758v1](http://arxiv.org/abs/2406.04758v1)|null|\n", "2406.04606": "|**2024-06-07**|**Helpful or Harmful Data? 
Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions**|Jingtan Wang et.al.|[2406.04606v1](http://arxiv.org/abs/2406.04606v1)|**[link](https://github.com/jtwang2000/freeshap)**|\n", "2406.06451": "|**2024-06-10**|**Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course**|Aadarsh Padiyath et.al.|[2406.06451v1](http://arxiv.org/abs/2406.06451v1)|null|\n", "2406.06399": "|**2024-07-05**|**Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue**|Simone Alghisi et.al.|[2406.06399v2](http://arxiv.org/abs/2406.06399v2)|null|\n", "2406.06331": "|**2024-07-03**|**MedExQA: Medical Question Answering Benchmark with Multiple Explanations**|Yunsoo Kim et.al.|[2406.06331v2](http://arxiv.org/abs/2406.06331v2)|**[link](https://github.com/knowlab/medexqa)**|\n", "2406.05946": "|**2024-06-10**|**Safety Alignment Should Be Made More Than Just a Few Tokens Deep**|Xiangyu Qi et.al.|[2406.05946v1](http://arxiv.org/abs/2406.05946v1)|**[link](https://github.com/unispac/shallow-vs-deep-alignment)**|\n", "2406.05644": "|**2024-06-13**|**How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States**|Zhenhong Zhou et.al.|[2406.05644v2](http://arxiv.org/abs/2406.05644v2)|**[link](https://github.com/ydyjya/llm-ihs-explanation)**|\n", "2406.05596": "|**2024-06-08**|**Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification**|Yunhe Gao et.al.|[2406.05596v1](http://arxiv.org/abs/2406.05596v1)|null|\n", "2406.06870": "|**2024-06-15**|**What's in an embedding? 
Would a rose by any embedding smell as sweet?**|Venkat Venkatasubramanian et.al.|[2406.06870v3](http://arxiv.org/abs/2406.06870v3)|null|\n", "2406.06773": "|**2024-06-10**|**Evaluating Zero-Shot Long-Context LLM Compression**|Chenyu Wang et.al.|[2406.06773v1](http://arxiv.org/abs/2406.06773v1)|null|\n", "2406.06637": "|**2024-06-09**|**Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering**|Saman Pordanesh et.al.|[2406.06637v1](http://arxiv.org/abs/2406.06637v1)|null|\n", "2406.06610": "|**2024-06-06**|**Reinterpreting 'the Company a Word Keeps': Towards Explainable and Ontologically Grounded Language Models**|Walid S. Saba et.al.|[2406.06610v1](http://arxiv.org/abs/2406.06610v1)|null|\n", "2406.06596": "|**2024-06-06**|**Are Large Language Models the New Interface for Data Pipelines?**|Sylvio Barbon Junior et.al.|[2406.06596v1](http://arxiv.org/abs/2406.06596v1)|null|\n", "2406.06579": "|**2024-06-13**|**From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models**|Xiaofeng Zhang et.al.|[2406.06579v2](http://arxiv.org/abs/2406.06579v2)|null|\n", "2406.08101": "|**2024-06-13**|**CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems**|Qianli Wang et.al.|[2406.08101v2](http://arxiv.org/abs/2406.08101v2)|**[link](https://github.com/DFKI-NLP/CoXQL)**|\n", "2406.08074": "|**2024-06-12**|**A Concept-Based Explainability Framework for Large Multimodal Models**|Jayneel Parekh et.al.|[2406.08074v1](http://arxiv.org/abs/2406.08074v1)|null|\n", "2406.07714": "|**2024-06-13**|**LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing**|Hongxiang Zhang et.al.|[2406.07714v2](http://arxiv.org/abs/2406.07714v2)|null|\n", "2406.08572": "|**2024-06-12**|**LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions**|Nhat Hoang-Xuan et.al.|[2406.08572v1](http://arxiv.org/abs/2406.08572v1)|null|\n", "2406.09701": "|**2024-06-14**|**Towards Effectively Detecting and 
Explaining Vulnerabilities Using Large Language Models**|Qiheng Mao et.al.|[2406.09701v1](http://arxiv.org/abs/2406.09701v1)|null|\n", "2406.09612": "|**2024-06-13**|**Automated Molecular Concept Generation and Labeling with Large Language Models**|Shichang Zhang et.al.|[2406.09612v1](http://arxiv.org/abs/2406.09612v1)|null|\n", "2406.11785": "|**2024-06-17**|**CELL your Model: Contrastive Explanation Methods for Large Language Models**|Ronny Luss et.al.|[2406.11785v1](http://arxiv.org/abs/2406.11785v1)|null|\n", "2406.11547": "|**2024-06-17**|**GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations**|Rick Wilming et.al.|[2406.11547v1](http://arxiv.org/abs/2406.11547v1)|**[link](https://github.com/braindatalab/gecobench)**|\n", "2406.11341": "|**2024-06-17**|**A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences**|Leonardo Bertolazzi et.al.|[2406.11341v1](http://arxiv.org/abs/2406.11341v1)|null|\n", "2406.11177": "|**2024-06-17**|**TIFG: Text-Informed Feature Generation with Large Language Models**|Xinhao Zhang et.al.|[2406.11177v1](http://arxiv.org/abs/2406.11177v1)|null|\n", "2406.10811": "|**2024-06-16**|**LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction**|Meiyun Wang et.al.|[2406.10811v1](http://arxiv.org/abs/2406.10811v1)|null|\n", "2406.10729": "|**2024-06-15**|**A Comprehensive Survey of Foundation Models in Medicine**|Wasif Khan et.al.|[2406.10729v1](http://arxiv.org/abs/2406.10729v1)|null|\n", "2406.10602": "|**2024-06-15**|**Multilingual Large Language Models and Curse of Multilinguality**|Daniil Gurgurov et.al.|[2406.10602v1](http://arxiv.org/abs/2406.10602v1)|null|\n", "2406.12649": "|**2024-06-19**|**Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models**|Hengyi Wang et.al.|[2406.12649v2](http://arxiv.org/abs/2406.12649v2)|null|\n", "2406.12288": "|**2024-06-18**|**An 
Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs**|Daking Rai et.al.|[2406.12288v1](http://arxiv.org/abs/2406.12288v1)|**[link](https://github.com/dakingrai/neuron-analysis-cot-arithmetic-reasoning)**|\n", "2406.12269": "|**2024-06-18**|**Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization**|Kwangwook Seo et.al.|[2406.12269v1](http://arxiv.org/abs/2406.12269v1)|null|\n", "2406.12255": "|**2024-06-18**|**A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning**|Lijie Hu et.al.|[2406.12255v1](http://arxiv.org/abs/2406.12255v1)|null|\n", "2406.12235": "|**2024-06-29**|**Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM**|Huaxin Zhang et.al.|[2406.12235v2](http://arxiv.org/abs/2406.12235v2)|**[link](https://github.com/pipixin321/holmesvad)**|\n", "2406.12058": "|**2024-06-28**|**WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions**|Seyedali Mohammadi et.al.|[2406.12058v3](http://arxiv.org/abs/2406.12058v3)|null|\n", "2406.11871": "|**2024-05-31**|**Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies**|Srijoni Majumdar et.al.|[2406.11871v1](http://arxiv.org/abs/2406.11871v1)|null|\n", "2406.14335": "|**2024-06-20**|**Self-supervised Interpretable Concept-based Models for Text Classification**|Francesco De Santis et.al.|[2406.14335v1](http://arxiv.org/abs/2406.14335v1)|null|\n", "2406.14167": "|**2024-06-20**|**Definition generation for lexical semantic change detection**|Mariia Fedorova et.al.|[2406.14167v1](http://arxiv.org/abs/2406.14167v1)|**[link](https://github.com/ltgoslo/Definition-generation-for-LSCD)**|\n", "2406.13558": "|**2024-06-22**|**Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach**|Xuehao Zhai 
et.al.|[2406.13558v2](http://arxiv.org/abs/2406.13558v2)|null|\n", "2406.12934": "|**2024-06-16**|**Current state of LLM Risks and AI Guardrails**|Suriya Ganesh Ayyamperumal et.al.|[2406.12934v1](http://arxiv.org/abs/2406.12934v1)|null|\n", "2406.15109": "|**2024-06-21**|**Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network**|Badr AlKhamissi et.al.|[2406.15109v1](http://arxiv.org/abs/2406.15109v1)|**[link](https://github.com/bkhmsi/brain-language-suma)**|\n", "2406.15045": "|**2024-06-21**|**Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction**|Jinge Wu et.al.|[2406.15045v1](http://arxiv.org/abs/2406.15045v1)|null|\n", "2406.14737": "|**2024-06-20**|**Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?**|Zhiqiang Pi et.al.|[2406.14737v1](http://arxiv.org/abs/2406.14737v1)|null|\n", "2406.16828": "|**2024-06-24**|**Ragnar\u00f6k: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track**|Ronak Pradeep et.al.|[2406.16828v1](http://arxiv.org/abs/2406.16828v1)|**[link](https://github.com/castorini/ragnarok)**|\n", "2406.16655": "|**2024-06-24**|**Large Language Models Are Cross-Lingual Knowledge-Free Reasoners**|Peng Hu et.al.|[2406.16655v1](http://arxiv.org/abs/2406.16655v1)|**[link](https://github.com/NJUNLP/Knowledge-Free-Reasoning)**|\n", "2406.16382": "|**2024-06-24**|**UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models**|Zhanyue Qin et.al.|[2406.16382v1](http://arxiv.org/abs/2406.16382v1)|null|\n", "2406.16235": "|**2024-06-23**|**Preference Tuning For Toxicity Mitigation Generalizes Across Languages**|Xiaochen Li et.al.|[2406.16235v1](http://arxiv.org/abs/2406.16235v1)|**[link](https://github.com/batsresearch/cross-lingual-detox)**|\n", "2406.15963": "|**2024-06-23**|**Effectiveness of ChatGPT in explaining complex medical reports to patients**|Mengxuan Sun 
et.al.|[2406.15963v1](http://arxiv.org/abs/2406.15963v1)|null|\n", "2406.15859": "|**2024-06-30**|**LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning**|Guangsi Shi et.al.|[2406.15859v2](http://arxiv.org/abs/2406.15859v2)|null|\n", "2406.17692": "|**2024-06-25**|**From Distributional to Overton Pluralism: Investigating Large Language Model Alignment**|Thom Lake et.al.|[2406.17692v1](http://arxiv.org/abs/2406.17692v1)|**[link](https://github.com/thomlake/investigating-alignment)**|\n", "2406.17642": "|**2024-06-25**|**Banishing LLM Hallucinations Requires Rethinking Generalization**|Johnny Li et.al.|[2406.17642v1](http://arxiv.org/abs/2406.17642v1)|null|\n", "2406.16985": "|**2024-06-23**|**Unveiling LLM Mechanisms Through Neural ODEs and Control Theory**|Yukun Zhang et.al.|[2406.16985v1](http://arxiv.org/abs/2406.16985v1)|null|\n", "2406.18512": "|**2024-06-26**|**\"Is ChatGPT a Better Explainer than My Professor?\": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline**|Grace Li et.al.|[2406.18512v1](http://arxiv.org/abs/2406.18512v1)|null|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505v1](http://arxiv.org/abs/2406.18505v1)|null|\n", "2406.18501": "|**2024-06-26**|**Is In-Context Learning a Type of Gradient-Based Learning? 
Evidence from the Inverse Frequency Effect in Structural Priming**|Zhenghao Zhou et.al.|[2406.18501v1](http://arxiv.org/abs/2406.18501v1)|null|\n", "2406.19949": "|**2024-06-28**|**Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring**|Jiazheng Li et.al.|[2406.19949v1](http://arxiv.org/abs/2406.19949v1)|null|\n", "2406.19482": "|**2024-06-27**|**xTower: A Multilingual LLM for Explaining and Correcting Translation Errors**|Marcos Treviso et.al.|[2406.19482v1](http://arxiv.org/abs/2406.19482v1)|null|\n", "2407.02833": "|**2024-07-03**|**LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation**|Hongke Zhao et.al.|[2407.02833v1](http://arxiv.org/abs/2407.02833v1)|null|\n", "2407.00997": "|**2024-07-01**|**Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components**|Phillip Schneider et.al.|[2407.00997v1](http://arxiv.org/abs/2407.00997v1)|null|\n", "2407.00994": "|**2024-07-08**|**LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation**|Longchao Da et.al.|[2407.00994v2](http://arxiv.org/abs/2407.00994v2)|null|\n", "2407.00668": "|**2024-07-03**|**HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability**|Yanfang Chen et.al.|[2407.00668v2](http://arxiv.org/abs/2407.00668v2)|**[link](https://github.com/hush-cd/HRDE)**|\n", "2407.00468": "|**2024-06-29**|**MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation**|Jinsheng Huang et.al.|[2407.00468v1](http://arxiv.org/abs/2407.00468v1)|**[link](https://github.com/chenllliang/mmevalpro)**|\n", "2407.00219": "|**2024-06-28**|**Evaluating Human Alignment and Model Faithfulness of LLM Rationale**|Mohsen Fayyaz et.al.|[2407.00219v1](http://arxiv.org/abs/2407.00219v1)|null|\n", "2407.00167": "|**2024-06-28**|**Can GPT-4 
Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach**|Sai Krishna Revanth Vuruma et.al.|[2407.00167v1](http://arxiv.org/abs/2407.00167v1)|null|\n", "2407.03778": "|**2024-07-04**|**From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI**|Stefanie Krause et.al.|[2407.03778v1](http://arxiv.org/abs/2407.03778v1)|null|\n", "2407.03678": "|**2024-07-04**|**Improving Self Consistency in LLMs through Probabilistic Tokenization**|Ashutosh Sathe et.al.|[2407.03678v1](http://arxiv.org/abs/2407.03678v1)|null|\n", "2407.03621": "|**2024-07-04**|**The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model**|Brenden Smith et.al.|[2407.03621v1](http://arxiv.org/abs/2407.03621v1)|**[link](https://github.com/dragnlabs/injectable-alignment-model)**|\n", "2407.07890": "|**2024-07-10**|**Training on the Test Task Confounds Evaluation and Emergence**|Ricardo Dominguez-Olmedo et.al.|[2407.07890v1](http://arxiv.org/abs/2407.07890v1)|**[link](https://github.com/socialfoundations/training-on-the-test-task)**|\n", "2407.07666": "|**2024-07-10**|**A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability**|Ting Fang Tan et.al.|[2407.07666v1](http://arxiv.org/abs/2407.07666v1)|null|\n", "2407.06241": "|**2024-07-08**|**SimPal: Towards a Meta-Conversational Framework to Understand Teacher's Instructional Goals for K-12 Physics**|Effat Farhana et.al.|[2407.06241v1](http://arxiv.org/abs/2407.06241v1)|null|\n", "2407.05464": "|**2024-07-07**|**Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information**|Vishnu S. 
Pendyala et.al.|[2407.05464v1](http://arxiv.org/abs/2407.05464v1)|null|\n", "2407.05308": "|**2024-07-07**|**Exploring the Educational Landscape of AI: Large Language Models' Approaches to Explaining Conservation of Momentum in Physics**|Keisuke Sato et.al.|[2407.05308v1](http://arxiv.org/abs/2407.05308v1)|null|\n", "2407.08331": "|**2024-07-11**|**Towards Explainable Evolution Strategies with Large Language Models**|Jill Baumann et.al.|[2407.08331v1](http://arxiv.org/abs/2407.08331v1)|null|\n", "2407.09292": "|**2024-07-17**|**Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models**|Dong Shu et.al.|[2407.09292v2](http://arxiv.org/abs/2407.09292v2)|null|\n", "2407.09283": "|**2024-07-12**|**DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection**|Sangpil Youm et.al.|[2407.09283v1](http://arxiv.org/abs/2407.09283v1)|null|\n", "2407.08836": "|**2024-07-11**|**Fault Diagnosis in Power Grids with Large Language Model**|Liu Jing et.al.|[2407.08836v1](http://arxiv.org/abs/2407.08836v1)|null|\n", "2407.10793": "|**2024-07-15**|**GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework**|Hannah Sansford et.al.|[2407.10793v1](http://arxiv.org/abs/2407.10793v1)|null|\n", "2407.10735": "|**2024-07-16**|**Transforming Agency. On the mode of existence of Large Language Models**|Xabier E. 
Barandiaran et.al.|[2407.10735v2](http://arxiv.org/abs/2407.10735v2)|null|\n", "2407.10490": "|**2024-07-15**|**Learning Dynamics of LLM Finetuning**|Yi Ren et.al.|[2407.10490v1](http://arxiv.org/abs/2407.10490v1)|**[link](https://github.com/joshua-ren/learning_dynamics_llm)**|\n", "2407.10086": "|**2024-07-19**|**Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine**|Omid Rohanian et.al.|[2407.10086v2](http://arxiv.org/abs/2407.10086v2)|null|\n", "2407.09855": "|**2024-07-13**|**Building pre-train LLM Dataset for the INDIC Languages: a case study on Hindi**|Shantipriya Parida et.al.|[2407.09855v1](http://arxiv.org/abs/2407.09855v1)|null|\n", "2407.11384": "|**2024-07-16**|**InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains**|Yinzhu Quan et.al.|[2407.11384v1](http://arxiv.org/abs/2407.11384v1)|**[link](https://github.com/zefang-liu/InvAgent)**|\n", "2407.11005": "|**2024-06-25**|**RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems**|Robert Friel et.al.|[2407.11005v1](http://arxiv.org/abs/2407.11005v1)|null|\n", "2407.10996": "|**2024-06-24**|**Visualization Literacy of Multimodal Large Language Models: A Comparative Study**|Zhimin Li et.al.|[2407.10996v1](http://arxiv.org/abs/2407.10996v1)|null|\n", "2407.10989": "|**2024-06-23**|**Do Large Language Models Understand Verbal Indicators of Romantic Attraction?**|Sandra C. 
Matz et.al.|[2407.10989v1](http://arxiv.org/abs/2407.10989v1)|null|\n", "2407.11203": "|**2024-06-03**|**The Life Cycle of Large Language Models: A Review of Biases in Education**|Jinsook Lee et.al.|[2407.11203v1](http://arxiv.org/abs/2407.11203v1)|null|\n", "2407.13648": "|**2024-07-18**|**COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization**|Skyler Grandel et.al.|[2407.13648v1](http://arxiv.org/abs/2407.13648v1)|null|\n", "2407.13117": "|**2024-07-18**|**SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models**|Qi Yang et.al.|[2407.13117v1](http://arxiv.org/abs/2407.13117v1)|null|\n", "2407.12888": "|**2024-07-17**|**Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models**|Alexander R. Pelletier et.al.|[2407.12888v1](http://arxiv.org/abs/2407.12888v1)|null|\n", "2407.12882": "|**2024-07-16**|**InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification**|Yujia Hu et.al.|[2407.12882v1](http://arxiv.org/abs/2407.12882v1)|**[link](https://github.com/Social-AI-Studio/InstructAV)**|\n", "2407.12831": "|**2024-07-03**|**Truth is Universal: Robust Detection of Lies in LLMs**|Lennart B\u00fcrger et.al.|[2407.12831v1](http://arxiv.org/abs/2407.12831v1)|null|\n", "2407.15720": "|**2024-07-22**|**Do Large Language Models Have Compositional Ability? 
An Investigation into Limitations and Scalability**|Zhuoyan Xu et.al.|[2407.15720v1](http://arxiv.org/abs/2407.15720v1)|**[link](https://github.com/oliverxuzy/llm_compose)**|\n", "2407.15360": "|**2024-07-22**|**Dissecting Multiplication in Transformers: Insights into LLMs**|Luyu Qiu et.al.|[2407.15360v1](http://arxiv.org/abs/2407.15360v1)|null|\n", "2407.15255": "|**2024-07-21**|**Explaining Decisions of Agents in Mixed-Motive Games**|Maayan Orner et.al.|[2407.15255v1](http://arxiv.org/abs/2407.15255v1)|null|\n", "2407.15248": "|**2024-07-21**|**XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models**|Erik Cambria et.al.|[2407.15248v1](http://arxiv.org/abs/2407.15248v1)|null|\n", "2407.14573": "|**2024-07-21**|**Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization**|Orson Mengara et.al.|[2407.14573v1](http://arxiv.org/abs/2407.14573v1)|null|\n", "2407.14845": "|**2024-07-20**|**Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models**|Ze Yu Zhang et.al.|[2407.14845v1](http://arxiv.org/abs/2407.14845v1)|null|\n", "2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487v1](http://arxiv.org/abs/2407.14487v1)|**[link](https://github.com/k-randl/self-explaining_llms)**|\n", "2407.14452": "|**2024-07-19**|**Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier**|Zachary Wojtowicz et.al.|[2407.14452v1](http://arxiv.org/abs/2407.14452v1)|null|\n", "2407.13880": "|**2024-07-18**|**The Software Complexity of Nations**|S\u00e1ndor Juh\u00e1sz et.al.|[2407.13880v1](http://arxiv.org/abs/2407.13880v1)|null|\n", "2407.13787": "|**2024-07-24**|**The Honorific Effect: Exploring the Impact of Japanese Linguistic Formalities on AI-Generated Physics Explanations**|Keisuke Sato et.al.|[2407.13787v2](http://arxiv.org/abs/2407.13787v2)|null|\n", "2407.16552": 
"|**2024-07-24**|**MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues**|Liyun Zhang et.al.|[2407.16552v2](http://arxiv.org/abs/2407.16552v2)|null|\n", "2407.15987": "|**2024-07-22**|**AI for Handball: predicting and explaining the 2024 Olympic Games tournament with Deep Learning and Large Language Models**|Florian Felice et.al.|[2407.15987v1](http://arxiv.org/abs/2407.15987v1)|null|\n", "2407.17365": "|**2024-07-24**|**ViPer: Visual Personalization of Generative Models via Individual Preference Learning**|Sogand Salehi et.al.|[2407.17365v1](http://arxiv.org/abs/2407.17365v1)|null|\n", "2407.17011": "|**2024-07-24**|**Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism**|Anhao Zhao et.al.|[2407.17011v1](http://arxiv.org/abs/2407.17011v1)|null|\n"}, "LLM - Interpretable": {"2311.18836": "|**2023-11-30**|**PoseGPT: Chatting about 3D Human Pose**|Yao Feng et.al.|[2311.18836v1](http://arxiv.org/abs/2311.18836v1)|null|\n", "2311.18775": "|**2023-11-30**|**CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation**|Zineng Tang et.al.|[2311.18775v1](http://arxiv.org/abs/2311.18775v1)|null|\n", "2311.18743": "|**2023-12-05**|**AlignBench: Benchmarking Chinese Alignment of Large Language Models**|Xiao Liu et.al.|[2311.18743v3](http://arxiv.org/abs/2311.18743v3)|**[link](https://github.com/thudm/alignbench)**|\n", "2311.18307": "|**2023-11-30**|**Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent**|Yuxiao Chen et.al.|[2311.18307v1](http://arxiv.org/abs/2311.18307v1)|null|\n", "2311.18034": "|**2023-11-29**|**Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings**|Andrea W Wen-Yi et.al.|[2311.18034v1](http://arxiv.org/abs/2311.18034v1)|**[link](https://github.com/andreawwenyi/hyperpolyglot)**|\n", "2311.17647": "|**2023-11-29**|**VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction 
Following**|Yujie Lu et.al.|[2311.17647v1](http://arxiv.org/abs/2311.17647v1)|null|\n", "2311.17351": "|**2023-11-29**|**Exploring Large Language Models for Human Mobility Prediction under Public Events**|Yuebing Liang et.al.|[2311.17351v1](http://arxiv.org/abs/2311.17351v1)|null|\n", "2311.17331": "|**2023-11-29**|**Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering**|Zeqing Wang et.al.|[2311.17331v1](http://arxiv.org/abs/2311.17331v1)|null|\n", "2311.17937": "|**2023-11-28**|**Unlocking Spatial Comprehension in Text-to-Image Diffusion Models**|Mohammad Mahdi Derakhshani et.al.|[2311.17937v1](http://arxiv.org/abs/2311.17937v1)|null|\n", "2311.17002": "|**2023-11-30**|**Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following**|Yutong Feng et.al.|[2311.17002v2](http://arxiv.org/abs/2311.17002v2)|null|\n", "2311.17126": "|**2023-11-28**|**Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis**|Xiaohui Chen et.al.|[2311.17126v1](http://arxiv.org/abs/2311.17126v1)|null|\n", "2311.16509": "|**2023-12-27**|**StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models**|Kazuki Yamauchi et.al.|[2311.16509v2](http://arxiv.org/abs/2311.16509v2)|null|\n", "2311.16093": "|**2023-11-27**|**Have we built machines that think like people?**|Luca M. 
Schulze Buschoff et.al.|[2311.16093v1](http://arxiv.org/abs/2311.16093v1)|**[link](https://github.com/lsbuschoff/multimodal)**|\n", "2311.16017": "|**2023-11-27**|**Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models**|Stephen MacNeil et.al.|[2311.16017v1](http://arxiv.org/abs/2311.16017v1)|null|\n", "2311.15983": "|**2023-11-27**|**Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers**|Yilun Liu et.al.|[2311.15983v1](http://arxiv.org/abs/2311.15983v1)|**[link](https://github.com/difanj0713/sparsify-then-classify)**|\n", "2311.16483": "|**2023-11-27**|**ChartLlama: A Multimodal LLM for Chart Understanding and Generation**|Yucheng Han et.al.|[2311.16483v1](http://arxiv.org/abs/2311.16483v1)|null|\n", "2311.16500": "|**2023-12-10**|**LLMGA: Multimodal Large Language Model based Generation Assistant**|Bin Xia et.al.|[2311.16500v2](http://arxiv.org/abs/2311.16500v2)|**[link](https://github.com/dvlab-research/LLMGA)**|\n", "2311.15585": "|**2023-11-27**|**Dawning of a New Era in Gravitational Wave Data Analysis: Unveiling Cosmic Mysteries via Artificial Intelligence -- A Systematic Review**|Tianyu Zhao et.al.|[2311.15585v1](http://arxiv.org/abs/2311.15585v1)|null|\n", "2311.15209": "|**2023-12-03**|**See and Think: Embodied Agent in Virtual Environment**|Zhonghan Zhao et.al.|[2311.15209v2](http://arxiv.org/abs/2311.15209v2)|null|\n", "2311.15131": "|**2023-11-25**|**Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching**|James Campbell et.al.|[2311.15131v1](http://arxiv.org/abs/2311.15131v1)|null|\n", "2311.14519": "|**2023-11-24**|**Benchmarking Large Language Models for Log Analysis, Security, and Interpretation**|Egil Karlsen et.al.|[2311.14519v1](http://arxiv.org/abs/2311.14519v1)|null|\n", "2311.14115": "|**2023-11-30**|**A density estimation perspective on learning from pairwise human 
preferences**|Vincent Dumoulin et.al.|[2311.14115v2](http://arxiv.org/abs/2311.14115v2)|**[link](https://github.com/google-deepmind/pbde)**|\n", "2311.14061": "|**2023-11-23**|**Towards Explainable Strategy Templates using NLP Transformers**|Pallavi Bagga et.al.|[2311.14061v1](http://arxiv.org/abs/2311.14061v1)|null|\n", "2311.13857": "|**2023-11-23**|**Challenges of Large Language Models for Mental Health Counseling**|Neo Christopher Chung et.al.|[2311.13857v1](http://arxiv.org/abs/2311.13857v1)|null|\n", "2311.13743": "|**2023-12-03**|**FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design**|Yangyang Yu et.al.|[2311.13743v2](http://arxiv.org/abs/2311.13743v2)|**[link](https://github.com/pipiku915/finmem-llm-stocktrading)**|\n", "2311.13549": "|**2023-11-22**|**ADriver-I: A General World Model for Autonomous Driving**|Fan Jia et.al.|[2311.13549v1](http://arxiv.org/abs/2311.13549v1)|null|\n", "2311.13627": "|**2023-11-22**|**Vamos: Versatile Action Models for Video Understanding**|Shijie Wang et.al.|[2311.13627v1](http://arxiv.org/abs/2311.13627v1)|null|\n", "2311.13194": "|**2023-12-15**|**Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs**|Yonghui Wang et.al.|[2311.13194v2](http://arxiv.org/abs/2311.13194v2)|**[link](https://github.com/harrytea/tgdoc)**|\n", "2311.13063": "|**2023-11-25**|**From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models**|Zachary Englhardt et.al.|[2311.13063v2](http://arxiv.org/abs/2311.13063v2)|null|\n", "2311.12524": "|**2023-11-21**|**ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models**|Jiankai Tang et.al.|[2311.12524v1](http://arxiv.org/abs/2311.12524v1)|**[link](https://github.com/mcjacktang/llm-healthassistant)**|\n", "2311.12287": "|**2023-11-21**|**Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications**|Samira 
Ghodratnama et.al.|[2311.12287v1](http://arxiv.org/abs/2311.12287v1)|null|\n", "2311.11797": "|**2023-11-20**|**Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents**|Zhuosheng Zhang et.al.|[2311.11797v1](http://arxiv.org/abs/2311.11797v1)|**[link](https://github.com/zoeyyao27/cot-igniting-agent)**|\n", "2311.11628": "|**2023-11-20**|**Incorporating LLM Priors into Tabular Learners**|Max Zhu et.al.|[2311.11628v1](http://arxiv.org/abs/2311.11628v1)|null|\n", "2311.11516": "|**2023-11-20**|**GPT in Data Science: A Practical Exploration of Model Selection**|Nathalia Nascimento et.al.|[2311.11516v1](http://arxiv.org/abs/2311.11516v1)|null|\n", "2311.11482": "|**2023-11-20**|**Meta Prompting for AGI Systems**|Yifan Zhang et.al.|[2311.11482v1](http://arxiv.org/abs/2311.11482v1)|**[link](https://github.com/meta-prompting/meta-prompting)**|\n", "2311.14722": "|**2023-11-19**|**Zero-Shot Question Answering over Financial Documents using Large Language Models**|Karmvir Singh Phogat et.al.|[2311.14722v1](http://arxiv.org/abs/2311.14722v1)|null|\n", "2311.11267": "|**2023-12-17**|**Rethinking Large Language Models in Mental Health Applications**|Shaoxiong Ji et.al.|[2311.11267v2](http://arxiv.org/abs/2311.11267v2)|null|\n", "2311.11012": "|**2023-11-18**|**Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models**|Haoran Zhao et.al.|[2311.11012v1](http://arxiv.org/abs/2311.11012v1)|null|\n", "2311.10947": "|**2023-11-18**|**RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability**|Yuxuan Lei et.al.|[2311.10947v1](http://arxiv.org/abs/2311.10947v1)|null|\n", "2311.10905": "|**2023-11-17**|**Flexible Model Interpretability through Natural Language Model Editing**|Karel D'Oosterlinck et.al.|[2311.10905v1](http://arxiv.org/abs/2311.10905v1)|null|\n", "2311.10813": "|**2023-11-27**|**A Language Agent for Autonomous Driving**|Jiageng Mao 
et.al.|[2311.10813v3](http://arxiv.org/abs/2311.10813v3)|**[link](https://github.com/usc-gvl/agent-driver)**|\n", "2311.10537": "|**2023-11-16**|**MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning**|Xiangru Tang et.al.|[2311.10537v1](http://arxiv.org/abs/2311.10537v1)|**[link](https://github.com/gersteinlab/medagents)**|\n", "2311.09796": "|**2023-11-16**|**Interpreting User Requests in the Context of Natural Language Standing Instructions**|Nikita Moghe et.al.|[2311.09796v1](http://arxiv.org/abs/2311.09796v1)|null|\n", "2311.09721": "|**2023-11-16**|**On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering**|Linyong Nan et.al.|[2311.09721v1](http://arxiv.org/abs/2311.09721v1)|null|\n", "2311.09635": "|**2023-11-16**|**Evaluating In-Context Learning of Libraries for Code Generation**|Arkil Patel et.al.|[2311.09635v1](http://arxiv.org/abs/2311.09635v1)|null|\n", "2311.09612": "|**2023-11-16**|**Efficient End-to-End Visual Document Understanding with Rationale Distillation**|Wang Zhu et.al.|[2311.09612v1](http://arxiv.org/abs/2311.09612v1)|null|\n", "2311.09558": "|**2023-11-16**|**Pachinko: Patching Interpretable QA Models through Natural Language Feedback**|Chaitanya Malaviya et.al.|[2311.09558v1](http://arxiv.org/abs/2311.09558v1)|**[link](https://github.com/chaitanyamalaviya/pachinko)**|\n", "2311.10774": "|**2023-11-15**|**MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning**|Fuxiao Liu et.al.|[2311.10774v1](http://arxiv.org/abs/2311.10774v1)|**[link](https://github.com/fuxiaoliu/mmc)**|\n", "2311.09206": "|**2023-11-15**|**TableLlama: Towards Open Large Generalist Models for Tables**|Tianshu Zhang et.al.|[2311.09206v1](http://arxiv.org/abs/2311.09206v1)|null|\n", "2311.09033": "|**2023-11-15**|**MELA: Multilingual Evaluation of Linguistic Acceptability**|Ziyin Zhang et.al.|[2311.09033v1](http://arxiv.org/abs/2311.09033v1)|null|\n", "2311.08968": 
"|**2023-11-15**|**Identifying Linear Relational Concepts in Large Language Models**|David Chanin et.al.|[2311.08968v1](http://arxiv.org/abs/2311.08968v1)|null|\n", "2311.08957": "|**2023-11-15**|**I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots**|Giulio Antonio Abbo et.al.|[2311.08957v1](http://arxiv.org/abs/2311.08957v1)|null|\n", "2311.08896": "|**2023-11-15**|**HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence**|Junyi Bian et.al.|[2311.08896v1](http://arxiv.org/abs/2311.08896v1)|null|\n", "2311.08723": "|**2023-11-15**|**Token Prediction as Implicit Classification to Identify LLM-Generated Text**|Yutian Chen et.al.|[2311.08723v1](http://arxiv.org/abs/2311.08723v1)|**[link](https://github.com/markchenyutian/t5-sentinel-public)**|\n", "2311.08718": "|**2023-11-15**|**Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling**|Bairu Hou et.al.|[2311.08718v1](http://arxiv.org/abs/2311.08718v1)|**[link](https://github.com/ucsb-nlp-chang/llm_uncertainty)**|\n", "2311.08614": "|**2023-11-15**|**XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making**|Zichen Chen et.al.|[2311.08614v1](http://arxiv.org/abs/2311.08614v1)|null|\n", "2311.08605": "|**2023-11-15**|**Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures**|David F. 
Jenny et.al.|[2311.08605v1](http://arxiv.org/abs/2311.08605v1)|**[link](https://github.com/david-jenny/llm-political-study)**|\n", "2311.08576": "|**2023-11-14**|**Towards Evaluating AI Systems for Moral Status Using Self-Reports**|Ethan Perez et.al.|[2311.08576v1](http://arxiv.org/abs/2311.08576v1)|null|\n", "2311.08535": "|**2023-11-14**|**Taxonomy, Semantic Data Schema, and Schema Alignment for Open Data in Urban Building Energy Modeling**|Liang Zhang et.al.|[2311.08535v1](http://arxiv.org/abs/2311.08535v1)|null|\n", "2311.08364": "|**2023-11-14**|**Plum: Prompt Learning using Metaheuristic**|Rui Pan et.al.|[2311.08364v1](http://arxiv.org/abs/2311.08364v1)|**[link](https://github.com/research4pan/plum)**|\n", "2311.08206": "|**2023-11-14**|**Human-Centric Autonomous Systems With LLMs for User Command Reasoning**|Yi Yang et.al.|[2311.08206v1](http://arxiv.org/abs/2311.08206v1)|**[link](https://github.com/kth-rpl/drivecmd_llm)**|\n", "2311.07532": "|**2023-11-13**|**It's Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models**|Nishant Balepur et.al.|[2311.07532v1](http://arxiv.org/abs/2311.07532v1)|**[link](https://github.com/nbalepur/poe)**|\n", "2311.07470": "|**2023-11-13**|**Finding and Editing Multi-Modal Neurons in Pre-Trained Transformer**|Haowen Pan et.al.|[2311.07470v1](http://arxiv.org/abs/2311.07470v1)|null|\n", "2311.07466": "|**2023-11-13**|**On Measuring Faithfulness of Natural Language Explanations**|Letitia Parcalabescu et.al.|[2311.07466v1](http://arxiv.org/abs/2311.07466v1)|**[link](https://github.com/heidelberg-nlp/cc-shap)**|\n", "2311.07314": "|**2023-11-13**|**Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models**|Junpeng Li et.al.|[2311.07314v1](http://arxiv.org/abs/2311.07314v1)|null|\n", "2311.06979": "|**2023-11-12**|**Assessing the Interpretability of Programmatic Policies with Large Language Models**|Zahra Bashir 
et.al.|[2311.06979v1](http://arxiv.org/abs/2311.06979v1)|null|\n", "2311.06957": "|**2023-11-12**|**Simulating Public Administration Crisis: A Novel Generative Agent-Based Simulation System to Lower Technology Barriers in Social Science Research**|Bushi Xiao et.al.|[2311.06957v1](http://arxiv.org/abs/2311.06957v1)|null|\n", "2311.07605": "|**2023-11-11**|**Conceptual Model Interpreter for Large Language Models**|Felix H\u00e4rer et.al.|[2311.07605v1](http://arxiv.org/abs/2311.07605v1)|**[link](https://github.com/fhaer/llm-cmi)**|\n", "2311.06390": "|**2023-11-10**|**ChatGPT in the context of precision agriculture data analytics**|Ilyas Potamitis et.al.|[2311.06390v1](http://arxiv.org/abs/2311.06390v1)|**[link](https://github.com/potamitis123/chatgpt-in-the-context-of-precision-agriculture-data-analytics)**|\n", "2311.05754": "|**2023-11-09**|**Deep Natural Language Feature Learning for Interpretable Prediction**|Felipe Urrutia et.al.|[2311.05754v1](http://arxiv.org/abs/2311.05754v1)|**[link](https://github.com/furrutiav/nllf-emnlp-2023)**|\n", "2311.05297": "|**2023-11-09**|**Do personality tests generalize to Large Language Models?**|Florian E. 
Dorner et.al.|[2311.05297v1](http://arxiv.org/abs/2311.05297v1)|null|\n", "2311.09241": "|**2023-11-09**|**Chain of Images for Intuitively Reasoning**|Fanxu Meng et.al.|[2311.09241v1](http://arxiv.org/abs/2311.09241v1)|**[link](https://github.com/graphpku/coi)**|\n", "2311.04886": "|**2023-11-08**|**SEMQA: Semi-Extractive Multi-Source Question Answering**|Tal Schuster et.al.|[2311.04886v1](http://arxiv.org/abs/2311.04886v1)|**[link](https://github.com/google-research-datasets/quotesum)**|\n", "2311.04348": "|**2023-11-07**|**Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning**|Sai Munikoti et.al.|[2311.04348v1](http://arxiv.org/abs/2311.04348v1)|null|\n", "2311.04205": "|**2023-11-07**|**Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves**|Yihe Deng et.al.|[2311.04205v1](http://arxiv.org/abs/2311.04205v1)|**[link](https://github.com/uclaml/Rephrase-and-Respond)**|\n", "2311.04166": "|**2023-11-07**|**Perturbed examples reveal invariances shared by language models**|Ruchit Rawal et.al.|[2311.04166v1](http://arxiv.org/abs/2311.04166v1)|null|\n", "2311.04047": "|**2023-11-07**|**Extracting human interpretable structure-property relationships in chemistry using XAI and large language models**|Geemi P. 
Wellawatte et.al.|[2311.04047v1](http://arxiv.org/abs/2311.04047v1)|**[link](https://github.com/geemi725/xpertai)**|\n", "2311.03799": "|**2023-11-07**|**Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models**|Yichao Cao et.al.|[2311.03799v1](http://arxiv.org/abs/2311.03799v1)|**[link](https://github.com/caoyichao/unihoi)**|\n", "2311.03734": "|**2023-11-07**|**Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning**|Ruosen Li et.al.|[2311.03734v1](http://arxiv.org/abs/2311.03734v1)|**[link](https://github.com/bcdnlp/structure-qa)**|\n", "2311.03658": "|**2023-11-07**|**The Linear Representation Hypothesis and the Geometry of Large Language Models**|Kiho Park et.al.|[2311.03658v1](http://arxiv.org/abs/2311.03658v1)|**[link](https://github.com/kihopark/linear_rep_geometry)**|\n", "2311.03033": "|**2023-11-06**|**Beyond Words: A Mathematical Framework for Interpreting Large Language Models**|Javier Gonz\u00e1lez et.al.|[2311.03033v1](http://arxiv.org/abs/2311.03033v1)|null|\n", "2311.02807": "|**2023-11-06**|**QualEval: Qualitative Evaluation for Model Improvement**|Vishvak Murahari et.al.|[2311.02807v1](http://arxiv.org/abs/2311.02807v1)|**[link](https://github.com/vmurahari3/qualeval)**|\n", "2311.01964": "|**2023-11-03**|**Don't Make Your LLM an Evaluation Benchmark Cheater**|Kun Zhou et.al.|[2311.01964v1](http://arxiv.org/abs/2311.01964v1)|null|\n", "2311.01825": "|**2023-11-06**|**Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT**|Mario S\u00e4nger et.al.|[2311.01825v2](http://arxiv.org/abs/2311.01825v2)|null|\n", "2311.01732": "|**2023-11-12**|**Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models**|Sean Xie et.al.|[2311.01732v2](http://arxiv.org/abs/2311.01732v2)|**[link](https://github.com/yx131/proto-lm)**|\n", "2311.01449": 
"|**2023-11-02**|**TopicGPT: A Prompt-based Topic Modeling Framework**|Chau Minh Pham et.al.|[2311.01449v1](http://arxiv.org/abs/2311.01449v1)|**[link](https://github.com/chtmp223/topicgpt)**|\n", "2311.01403": "|**2023-11-02**|**REAL: Resilience and Adaptation using Large Language Models on Autonomous Aerial Robots**|Andrea Tagliabue et.al.|[2311.01403v1](http://arxiv.org/abs/2311.01403v1)|null|\n", "2311.01150": "|**2023-11-02**|**Revisiting the Knowledge Injection Frameworks**|Peng Fu et.al.|[2311.01150v1](http://arxiv.org/abs/2311.01150v1)|null|\n", "2311.01011": "|**2023-11-02**|**Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game**|Sam Toyer et.al.|[2311.01011v1](http://arxiv.org/abs/2311.01011v1)|null|\n", "2311.00967": "|**2023-11-02**|**Vision-Language Interpreter for Robot Task Planning**|Keisuke Shirai et.al.|[2311.00967v1](http://arxiv.org/abs/2311.00967v1)|**[link](https://github.com/omron-sinicx/vilain)**|\n", "2311.04915": "|**2023-11-02**|**Chain of Empathy: Enhancing Empathetic Response of Large Language Models Based on Psychotherapy Models**|Yoon Kyung Lee et.al.|[2311.04915v1](http://arxiv.org/abs/2311.04915v1)|null|\n", "2311.00926": "|**2023-11-02**|**M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place**|Wentao Yuan et.al.|[2311.00926v1](http://arxiv.org/abs/2311.00926v1)|null|\n", "2311.00671": "|**2023-11-01**|**Emotion Detection for Misinformation: A Review**|Zhiwei Liu et.al.|[2311.00671v1](http://arxiv.org/abs/2311.00671v1)|null|\n", "2311.00618": "|**2023-11-01**|**De-Diffusion Makes Text a Strong Cross-Modal Interface**|Chen Wei et.al.|[2311.00618v1](http://arxiv.org/abs/2311.00618v1)|null|\n", "2311.00237": "|**2023-11-01**|**The Mystery and Fascination of LLMs: A Comprehensive Survey on the Interpretation and Analysis of Emergent Abilities**|Yuxiang Zhou et.al.|[2311.00237v1](http://arxiv.org/abs/2311.00237v1)|null|\n", "2311.00223": "|**2023-11-01**|**Is GPT Powerful Enough to Analyze the 
Emotions of Memes?**|Jingjing Wang et.al.|[2311.00223v1](http://arxiv.org/abs/2311.00223v1)|null|\n", "2310.20487": "|**2023-10-31**|**Large Language Model Can Interpret Latent Space of Sequential Recommender**|Zhengyi Yang et.al.|[2310.20487v1](http://arxiv.org/abs/2310.20487v1)|**[link](https://github.com/yangzhengyi98/recinterpreter)**|\n", "2310.20440": "|**2023-10-31**|**The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models**|Jorge Abreu-Vicente et.al.|[2310.20440v1](http://arxiv.org/abs/2310.20440v1)|**[link](https://github.com/source-data/soda-data)**|\n", "2310.19998": "|**2023-10-30**|**Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design**|Markus J. Buehler et.al.|[2310.19998v1](http://arxiv.org/abs/2310.19998v1)|null|\n", "2310.19915": "|**2023-10-30**|**GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models**|Seongwon Kim et.al.|[2310.19915v1](http://arxiv.org/abs/2310.19915v1)|null|\n", "2312.00164": "|**2023-11-30**|**Towards Accurate Differential Diagnosis with Large Language Models**|Daniel McDuff et.al.|[2312.00164v1](http://arxiv.org/abs/2312.00164v1)|null|\n", "2312.01818": "|**2023-12-04**|**Learning Machine Morality through Experience and Interaction**|Elizaveta Tennant et.al.|[2312.01818v1](http://arxiv.org/abs/2312.01818v1)|null|\n", "2312.01678": "|**2023-12-26**|**Jellyfish: A Large Language Model for Data Preprocessing**|Haochen Zhang et.al.|[2312.01678v3](http://arxiv.org/abs/2312.01678v3)|null|\n", "2312.01648": "|**2023-12-11**|**Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation**|Randall Balestriero et.al.|[2312.01648v2](http://arxiv.org/abs/2312.01648v2)|**[link](https://github.com/randallbalestriero/splinellm)**|\n", "2312.01552": "|**2023-12-04**|**The Unlocking Spell on Base LLMs: Rethinking Alignment via 
In-Context Learning**|Bill Yuchen Lin et.al.|[2312.01552v1](http://arxiv.org/abs/2312.01552v1)|null|\n", "2312.01307": "|**2023-12-03**|**SAGE: Bridging Semantic and Actionable Parts for GEneralizable Articulated-Object Manipulation under Language Instructions**|Haoran Geng et.al.|[2312.01307v1](http://arxiv.org/abs/2312.01307v1)|null|\n", "2312.01279": "|**2023-12-03**|**TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents**|James Enouen et.al.|[2312.01279v1](http://arxiv.org/abs/2312.01279v1)|null|\n", "2312.01202": "|**2023-12-02**|**From Voices to Validity: Leveraging Large Language Models (LLMs) for Textual Analysis of Policy Stakeholder Interviews**|Alex Liu et.al.|[2312.01202v1](http://arxiv.org/abs/2312.01202v1)|null|\n", "2312.00894": "|**2023-12-01**|**Leveraging Large Language Models to Improve REST API Testing**|Myeongsoo Kim et.al.|[2312.00894v1](http://arxiv.org/abs/2312.00894v1)|null|\n", "2312.00812": "|**2023-12-18**|**Empowering Autonomous Driving with Large Language Models: A Safety Perspective**|Yixuan Wang et.al.|[2312.00812v3](http://arxiv.org/abs/2312.00812v3)|null|\n", "2312.02401": "|**2023-12-05**|**Harmonizing Global Voices: Culturally-Aware Models for Enhanced Content Moderation**|Alex J. 
Chan et.al.|[2312.02401v1](http://arxiv.org/abs/2312.02401v1)|null|\n", "2312.02296": "|**2023-12-04**|**LLMs Accelerate Annotation for Medical Information Extraction**|Akshay Goel et.al.|[2312.02296v1](http://arxiv.org/abs/2312.02296v1)|null|\n", "2312.02226": "|**2023-12-04**|**Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition**|Chengyou Jia et.al.|[2312.02226v1](http://arxiv.org/abs/2312.02226v1)|null|\n", "2312.02179": "|**2023-11-28**|**Training Chain-of-Thought via Latent-Variable Inference**|Du Phan et.al.|[2312.02179v1](http://arxiv.org/abs/2312.02179v1)|null|\n", "2312.03543": "|**2023-12-06**|**GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models**|Haicheng Liao et.al.|[2312.03543v1](http://arxiv.org/abs/2312.03543v1)|**[link](https://github.com/petrichor625/talk2car_cavg)**|\n", "2312.03140": "|**2023-12-05**|**FlexModel: A Framework for Interpretability of Distributed Large Language Models**|Matthew Choi et.al.|[2312.03140v1](http://arxiv.org/abs/2312.03140v1)|**[link](https://github.com/vectorinstitute/flex_model)**|\n", "2312.03121": "|**2023-12-07**|**Evaluating Agents using Social Choice Theory**|Marc Lanctot et.al.|[2312.03121v2](http://arxiv.org/abs/2312.03121v2)|**[link](https://github.com/google-deepmind/open_spiel/tree/master/open_spiel/python/voting)**|\n", "2312.03013": "|**2023-12-05**|**Breast Ultrasound Report Generation using LangChain**|Jaeyoung Huh et.al.|[2312.03013v1](http://arxiv.org/abs/2312.03013v1)|null|\n", "2312.04494": "|**2023-12-07**|**AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making**|Shusen Liu et.al.|[2312.04494v1](http://arxiv.org/abs/2312.04494v1)|null|\n", "2312.04372": "|**2023-12-07**|**LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs**|Yunsheng Ma et.al.|[2312.04372v1](http://arxiv.org/abs/2312.04372v1)|null|\n", "2312.04316": 
"|**2023-12-27**|**Towards Knowledge-driven Autonomous Driving**|Xin Li et.al.|[2312.04316v3](http://arxiv.org/abs/2312.04316v3)|**[link](https://github.com/pjlab-adg/awesome-knowledge-driven-ad)**|\n", "2312.04019": "|**2023-12-07**|**Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models**|Yijie Zhang et.al.|[2312.04019v1](http://arxiv.org/abs/2312.04019v1)|null|\n", "2312.03759": "|**2023-12-05**|**How should the advent of large language models affect the practice of science?**|Marcel Binz et.al.|[2312.03759v1](http://arxiv.org/abs/2312.03759v1)|null|\n", "2312.03755": "|**2023-12-04**|**Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models**|Chenguang Wang et.al.|[2312.03755v1](http://arxiv.org/abs/2312.03755v1)|null|\n", "2312.03733": "|**2023-12-08**|**Methods to Estimate Large Language Model Confidence**|Maia Kotelanski et.al.|[2312.03733v2](http://arxiv.org/abs/2312.03733v2)|null|\n", "2312.04931": "|**2023-12-08**|**Retrieval-based Video Language Model for Efficient Long Video Question Answering**|Jiaqi Xu et.al.|[2312.04931v1](http://arxiv.org/abs/2312.04931v1)|null|\n", "2312.04906": "|**2023-12-08**|**Ophtha-LLaMA2: A Large Language Model for Ophthalmology**|Huan Zhao et.al.|[2312.04906v1](http://arxiv.org/abs/2312.04906v1)|null|\n", "2312.04889": "|**2024-01-10**|**KwaiAgents: Generalized Information-seeking Agent System with Large Language Models**|Haojie Pan et.al.|[2312.04889v3](http://arxiv.org/abs/2312.04889v3)|**[link](https://github.com/kwaikeg/kwaiagents)**|\n", "2312.06644": "|**2023-12-11**|**AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes**|Zehao Wen et.al.|[2312.06644v1](http://arxiv.org/abs/2312.06644v1)|null|\n", "2312.06408": "|**2023-12-11**|**DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics**|Zhiao Huang et.al.|[2312.06408v1](http://arxiv.org/abs/2312.06408v1)|null|\n", 
"2312.06315": "|**2023-12-11**|**GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models**|Jiaxu Zhao et.al.|[2312.06315v1](http://arxiv.org/abs/2312.06315v1)|null|\n", "2312.06241": "|**2023-12-11**|**ProtoCode: Leveraging Large Language Models for Automated Generation of Machine-Readable Protocols from Scientific Publications**|Shuo Jiang et.al.|[2312.06241v1](http://arxiv.org/abs/2312.06241v1)|null|\n", "2312.05834": "|**2023-12-10**|**Evidence-based Interpretable Open-domain Fact-checking with Large Language Models**|Xin Tan et.al.|[2312.05834v1](http://arxiv.org/abs/2312.05834v1)|null|\n", "2312.05571": "|**2023-12-19**|**Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning**|Subhabrata Dutta et.al.|[2312.05571v2](http://arxiv.org/abs/2312.05571v2)|**[link](https://github.com/joykirat18/syrelm)**|\n", "2312.05468": "|**2023-12-09**|**Image and Data Mining in Reticular Chemistry Using GPT-4V**|Zhiling Zheng et.al.|[2312.05468v1](http://arxiv.org/abs/2312.05468v1)|null|\n", "2312.05464": "|**2023-12-09**|**Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation**|Atoosa Chegini et.al.|[2312.05464v1](http://arxiv.org/abs/2312.05464v1)|null|\n", "2312.05291": "|**2023-12-08**|**GlitchBench: Can large multimodal models detect video game glitches?**|Mohammad Reza Taesiri et.al.|[2312.05291v1](http://arxiv.org/abs/2312.05291v1)|null|\n", "2312.07104": "|**2023-12-12**|**Efficiently Programming Large Language Models using SGLang**|Lianmin Zheng et.al.|[2312.07104v1](http://arxiv.org/abs/2312.07104v1)|**[link](https://github.com/sgl-project/sglang)**|\n", "2312.06965": "|**2023-12-12**|**Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation**|Nikhil Kashyap et.al.|[2312.06965v1](http://arxiv.org/abs/2312.06965v1)|null|\n", "2312.06681": "|**2023-12-27**|**Steering Llama 2 via Contrastive Activation Addition**|Nina Rimsky 
et.al.|[2312.06681v2](http://arxiv.org/abs/2312.06681v2)|**[link](https://github.com/nrimsky/sycophancysteering)**|\n", "2312.08027": "|**2023-12-13**|**Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning**|Jinta Weng et.al.|[2312.08027v1](http://arxiv.org/abs/2312.08027v1)|null|\n", "2312.07552": "|**2023-12-07**|**Large Language Models for Intent-Driven Session Recommendations**|Zhu Sun et.al.|[2312.07552v1](http://arxiv.org/abs/2312.07552v1)|**[link](https://github.com/llm4sr/po4isr)**|\n", "2312.10794": "|**2023-12-22**|**A mathematical perspective on Transformers**|Borjan Geshkovski et.al.|[2312.10794v2](http://arxiv.org/abs/2312.10794v2)|**[link](https://github.com/borjang/2023-transformers-rotf)**|\n", "2312.10771": "|**2023-12-17**|**kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning**|Wenting Zhao et.al.|[2312.10771v1](http://arxiv.org/abs/2312.10771v1)|null|\n", "2312.10746": "|**2023-12-17**|**Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier**|Sergey A. Saltykov et.al.|[2312.10746v1](http://arxiv.org/abs/2312.10746v1)|null|\n", "2312.10702": "|**2023-12-17**|**Can persistent homology whiten Transformer-based black-box models? 
A case study on BERT compression**|Luis Balderas et.al.|[2312.10702v1](http://arxiv.org/abs/2312.10702v1)|null|\n", "2312.10323": "|**2023-12-16**|**Continuous Prompt Generation from Linear Combination of Discrete Prompt Embeddings**|Pascal Passigan et.al.|[2312.10323v1](http://arxiv.org/abs/2312.10323v1)|null|\n", "2312.10297": "|**2023-12-23**|**Shedding Light on Software Engineering-specific Metaphors and Idioms**|Mia Mohammad Imran et.al.|[2312.10297v2](http://arxiv.org/abs/2312.10297v2)|**[link](https://github.com/vcu-swim-lab/se-figurative-language)**|\n", "2312.09928": "|**2023-12-15**|**Neurosymbolic Value-Inspired AI (Why, What, and How)**|Amit Sheth et.al.|[2312.09928v1](http://arxiv.org/abs/2312.09928v1)|null|\n", "2312.09545": "|**2023-12-15**|**GPT-4 Surpassing Human Performance in Linguistic Pragmatics**|Ljubisa Bojic et.al.|[2312.09545v1](http://arxiv.org/abs/2312.09545v1)|null|\n", "2312.10101": "|**2023-12-15**|**A Review of Repository Level Prompting for LLMs**|Douglas Schonholtz et.al.|[2312.10101v1](http://arxiv.org/abs/2312.10101v1)|null|\n", "2312.09397": "|**2023-12-14**|**Large Language Models for Autonomous Driving: Real-World Experiments**|Can Cui et.al.|[2312.09397v1](http://arxiv.org/abs/2312.09397v1)|null|\n", "2312.09230": "|**2023-12-14**|**Successor Heads: Recurring, Interpretable Attention Heads In The Wild**|Rhys Gould et.al.|[2312.09230v1](http://arxiv.org/abs/2312.09230v1)|null|\n", "2312.08962": "|**2023-12-14**|**Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models**|Zhiyuan You et.al.|[2312.08962v1](http://arxiv.org/abs/2312.08962v1)|null|\n", "2312.08837": "|**2023-12-14**|**Learning Safety Constraints From Demonstration Using One-Class Decision Trees**|Mattijs Baert et.al.|[2312.08837v1](http://arxiv.org/abs/2312.08837v1)|null|\n", "2312.10057": "|**2023-12-04**|**Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work**|Rishab Jain 
et.al.|[2312.10057v1](http://arxiv.org/abs/2312.10057v1)|null|\n", "2312.11865": "|**2023-12-19**|**Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach**|Weiyu Ma et.al.|[2312.11865v1](http://arxiv.org/abs/2312.11865v1)|**[link](https://github.com/histmeisah/large-language-models-play-starcraftii)**|\n", "2312.11548": "|**2023-12-16**|**Learning Interpretable Queries for Explainable Image Classification with Information Pursuit**|Stefan Kolek et.al.|[2312.11548v1](http://arxiv.org/abs/2312.11548v1)|null|\n", "2312.12763": "|**2023-12-21**|**AMD:Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion**|Beibei Jing et.al.|[2312.12763v2](http://arxiv.org/abs/2312.12763v2)|null|\n", "2312.12598": "|**2023-12-21**|**A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges**|Roberto Francisco de Lima Junior et.al.|[2312.12598v2](http://arxiv.org/abs/2312.12598v2)|null|\n", "2312.13881": "|**2023-12-21**|**Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs**|Juraj Vladika et.al.|[2312.13881v1](http://arxiv.org/abs/2312.13881v1)|null|\n", "2312.13764": "|**2023-12-21**|**A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties**|Junfei Xiao et.al.|[2312.13764v1](http://arxiv.org/abs/2312.13764v1)|**[link](https://github.com/lambert-x/prolab)**|\n", "2312.13316": "|**2023-12-20**|**ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training**|Rongsheng Wang et.al.|[2312.13316v1](http://arxiv.org/abs/2312.13316v1)|**[link](https://github.com/tonichopp/ecamp)**|\n", "2312.14504": "|**2023-12-22**|**Theory of Hallucinations based on Equivariance**|Hisaichi Shibata et.al.|[2312.14504v1](http://arxiv.org/abs/2312.14504v1)|null|\n", "2312.14346": "|**2023-12-22**|**Don't Believe Everything You Read: Enhancing Summarization Interpretability 
through Automatic Identification of Hallucinations in Large Language Models**|Priyesh Vakharia et.al.|[2312.14346v1](http://arxiv.org/abs/2312.14346v1)|null|\n", "2312.14184": "|**2023-12-19**|**Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning**|Xiaodan Zhang et.al.|[2312.14184v1](http://arxiv.org/abs/2312.14184v1)|null|\n", "2312.16044": "|**2023-12-26**|**Large Language Models as Traffic Signal Control Agents: Capacity and Opportunity**|Siqi Lai et.al.|[2312.16044v1](http://arxiv.org/abs/2312.16044v1)|**[link](https://github.com/usail-hkust/llmtscs)**|\n", "2312.15883": "|**2023-12-26**|**Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models**|Xinke Jiang et.al.|[2312.15883v1](http://arxiv.org/abs/2312.15883v1)|null|\n", "2312.15033": "|**2023-12-22**|**Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention**|Zhen Tan et.al.|[2312.15033v1](http://arxiv.org/abs/2312.15033v1)|null|\n", "2312.17122": "|**2023-12-29**|**Large Language Model for Causal Decision Making**|Haitao Jiang et.al.|[2312.17122v2](http://arxiv.org/abs/2312.17122v2)|null|\n", "2312.16702": "|**2023-12-27**|**Rethinking Tabular Data Understanding with Large Language Models**|Tianyang Liu et.al.|[2312.16702v1](http://arxiv.org/abs/2312.16702v1)|**[link](https://github.com/Leolty/tablellm)**|\n", "2312.16291": "|**2023-12-26**|**Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers**|Jacob Dunefsky et.al.|[2312.16291v1](http://arxiv.org/abs/2312.16291v1)|**[link](https://github.com/jacobdunefsky/observablepropagation)**|\n", "2312.16275": "|**2023-12-26**|**Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models**|Fan Liu et.al.|[2312.16275v1](http://arxiv.org/abs/2312.16275v1)|null|\n", "2312.17269": "|**2023-12-27**|**Conversational Question Answering with Reformulations over 
Knowledge Graph**|Lihui Liu et.al.|[2312.17269v1](http://arxiv.org/abs/2312.17269v1)|null|\n", "2401.02404": "|**2024-01-05**|**Correctness Comparison of ChatGPT-4, Bard, Claude-2, and Copilot for Spatial Tasks**|Hartwig H. Hochmair et.al.|[2401.02404v2](http://arxiv.org/abs/2401.02404v2)|null|\n", "2401.02132": "|**2024-01-04**|**DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models**|Wendi Cui et.al.|[2401.02132v1](http://arxiv.org/abs/2401.02132v1)|**[link](https://github.com/intuit-ai-research/dcr-consistency)**|\n", "2401.01814": "|**2024-01-03**|**Large Language Models Relearn Removed Concepts**|Michelle Lo et.al.|[2401.01814v1](http://arxiv.org/abs/2401.01814v1)|**[link](https://github.com/fbarez/neuroplasticity)**|\n", "2401.01699": "|**2024-01-12**|**WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope**|Jun-Yan He et.al.|[2401.01699v2](http://arxiv.org/abs/2401.01699v2)|null|\n", "2401.01414": "|**2024-01-02**|**VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics**|Ammar A. 
Siddiqui et.al.|[2401.01414v1](http://arxiv.org/abs/2401.01414v1)|null|\n", "2401.00991": "|**2024-01-02**|**A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models**|Daniel Wankit Yip et.al.|[2401.00991v1](http://arxiv.org/abs/2401.00991v1)|null|\n", "2401.00546": "|**2023-12-31**|**AllSpark: a multimodal spatiotemporal general model**|Run Shao et.al.|[2401.00546v1](http://arxiv.org/abs/2401.00546v1)|null|\n", "2401.00426": "|**2023-12-31**|**keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM**|Chaojie Wang et.al.|[2401.00426v1](http://arxiv.org/abs/2401.00426v1)|null|\n", "2401.00280": "|**2024-01-12**|**Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation**|Reza Fayyazi et.al.|[2401.00280v2](http://arxiv.org/abs/2401.00280v2)|null|\n", "2401.00139": "|**2023-12-30**|**Is Knowledge All Large Language Models Needed for Causal Reasoning?**|Hengrui Cai et.al.|[2401.00139v1](http://arxiv.org/abs/2401.00139v1)|**[link](https://github.com/ncsulsj/causal_llm)**|\n", "2401.02814": "|**2024-02-01**|**Object-Centric Instruction Augmentation for Robotic Manipulation**|Junjie Wen et.al.|[2401.02814v2](http://arxiv.org/abs/2401.02814v2)|null|\n", "2401.02695": "|**2024-02-06**|**VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model**|Pengying Wu et.al.|[2401.02695v2](http://arxiv.org/abs/2401.02695v2)|null|\n", "2401.03646": "|**2024-01-08**|**Evaluating Brain-Inspired Modular Training in Automated Circuit Discovery for Mechanistic Interpretability**|Jatin Nainani et.al.|[2401.03646v1](http://arxiv.org/abs/2401.03646v1)|null|\n", "2401.03082": "|**2024-01-05**|**UMIE: Unified Multimodal Information Extraction with Instruction Tuning**|Lin Sun et.al.|[2401.03082v1](http://arxiv.org/abs/2401.03082v1)|**[link](https://github.com/ZUCC-AI/UMIE)**|\n", "2401.05302": "|**2024-01-17**|**Theory of 
Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?**|Mudit Verma et.al.|[2401.05302v2](http://arxiv.org/abs/2401.05302v2)|null|\n", "2401.05072": "|**2024-01-10**|**Aligning Translation-Specific Understanding to General Understanding in Large Language Models**|Yichong Huang et.al.|[2401.05072v1](http://arxiv.org/abs/2401.05072v1)|null|\n", "2401.04898": "|**2024-01-10**|**ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain**|Bingchao Wang et.al.|[2401.04898v1](http://arxiv.org/abs/2401.04898v1)|null|\n", "2401.06102": "|**2024-01-12**|**Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models**|Asma Ghandeharioun et.al.|[2401.06102v2](http://arxiv.org/abs/2401.06102v2)|null|\n", "2401.05761": "|**2024-01-11**|**Large Language Models vs. Search Engines: Evaluating User Preferences Across Varied Information Retrieval Scenarios**|Kevin Matthe Caramancion et.al.|[2401.05761v1](http://arxiv.org/abs/2401.05761v1)|null|\n", "2401.05654": "|**2024-01-11**|**Towards Conversational Diagnostic AI**|Tao Tu et.al.|[2401.05654v1](http://arxiv.org/abs/2401.05654v1)|null|\n", "2401.06373": "|**2024-01-23**|**How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs**|Yi Zeng et.al.|[2401.06373v2](http://arxiv.org/abs/2401.06373v2)|**[link](https://github.com/chats-lab/persuasive_jailbreaker)**|\n", "2401.06431": "|**2024-01-12**|**From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape**|Changrong Xiao et.al.|[2401.06431v1](http://arxiv.org/abs/2401.06431v1)|**[link](https://github.com/xiaochr/llm-aes)**|\n", "2401.08309": "|**2024-01-16**|**Anchor function: a type of benchmark functions for studying language models**|Zhongwang Zhang et.al.|[2401.08309v1](http://arxiv.org/abs/2401.08309v1)|null|\n", "2401.08276": "|**2024-01-16**|**AesBench: An Expert Benchmark for Multimodal Large 
Language Models on Image Aesthetics Perception**|Yipo Huang et.al.|[2401.08276v1](http://arxiv.org/abs/2401.08276v1)|**[link](https://github.com/yipoh/aesbench)**|\n", "2401.08217": "|**2024-01-16**|**LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation**|Zhixuan Chu et.al.|[2401.08217v1](http://arxiv.org/abs/2401.08217v1)|null|\n", "2401.08190": "|**2024-02-16**|**MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline**|Minpeng Liao et.al.|[2401.08190v2](http://arxiv.org/abs/2401.08190v2)|**[link](https://github.com/mario-math-reasoning/mario)**|\n", "2401.07927": "|**2024-02-15**|**Are self-explanations from Large Language Models faithful?**|Andreas Madsen et.al.|[2401.07927v3](http://arxiv.org/abs/2401.07927v3)|**[link](https://github.com/AndreasMadsen/llm-introspection)**|\n", "2401.07544": "|**2024-01-17**|**See the Unseen: Better Context-Consistent Knowledge-Editing by Noises**|Youcheng Huang et.al.|[2401.07544v2](http://arxiv.org/abs/2401.07544v2)|null|\n", "2401.06866": "|**2024-01-12**|**Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data**|Yubin Kim et.al.|[2401.06866v1](http://arxiv.org/abs/2401.06866v1)|null|\n", "2401.06836": "|**2024-01-12**|**Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought**|Zaijing Li et.al.|[2401.06836v1](http://arxiv.org/abs/2401.06836v1)|null|\n", "2401.09083": "|**2024-01-17**|**Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models**|Haonan Guo et.al.|[2401.09083v1](http://arxiv.org/abs/2401.09083v1)|**[link](https://github.com/haonanguo/remote-sensing-chatgpt)**|\n", "2401.09082": "|**2024-01-17**|**What makes for a 'good' social actor? 
Using respect as a lens to evaluate interactions with language agents**|Lize Alberts et.al.|[2401.09082v1](http://arxiv.org/abs/2401.09082v1)|null|\n", "2401.08825": "|**2024-01-16**|**AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media**|Alessandro Gambetti et.al.|[2401.08825v1](http://arxiv.org/abs/2401.08825v1)|null|\n", "2401.08711": "|**2024-01-15**|**Assistant, Parrot, or Colonizing Loudspeaker? ChatGPT Metaphors for Developing Critical AI Literacies**|Anuj Gupta et.al.|[2401.08711v1](http://arxiv.org/abs/2401.08711v1)|null|\n", "2401.10005": "|**2024-01-18**|**Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation**|Kohei Uehara et.al.|[2401.10005v1](http://arxiv.org/abs/2401.10005v1)|null|\n", "2401.09861": "|**2024-01-18**|**Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models**|Li Sun et.al.|[2401.09861v1](http://arxiv.org/abs/2401.09861v1)|null|\n", "2401.10314": "|**2024-01-18**|**LangProp: A code optimization framework using Language Models applied to driving**|Shu Ishida et.al.|[2401.10314v1](http://arxiv.org/abs/2401.10314v1)|**[link](https://github.com/shuishida/langprop)**|\n", "2401.12208": "|**2024-01-22**|**CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation**|Zhihong Chen et.al.|[2401.12208v1](http://arxiv.org/abs/2401.12208v1)|null|\n", "2401.11500": "|**2024-01-21**|**Integration of Large Language Models in Control of EHD Pumps for Precise Color Synthesis**|Yanhong Peng et.al.|[2401.11500v1](http://arxiv.org/abs/2401.11500v1)|null|\n", "2401.12874": "|**2024-01-23**|**From Understanding to Utilization: A Survey on Explainability for Large Language Models**|Haoyan Luo et.al.|[2401.12874v1](http://arxiv.org/abs/2401.12874v1)|null|\n", "2401.12846": "|**2024-01-23**|**How well can large language models explain business processes?**|Dirk Fahland 
et.al.|[2401.12846v1](http://arxiv.org/abs/2401.12846v1)|null|\n", "2401.12586": "|**2024-01-27**|**C2Ideas: Supporting Creative Interior Color Design Ideation with Large Language Model**|Yihan Hou et.al.|[2401.12586v2](http://arxiv.org/abs/2401.12586v2)|null|\n", "2401.12585": "|**2024-01-30**|**SLANG: New Concept Comprehension of Large Language Models**|Lingrui Mei et.al.|[2401.12585v2](http://arxiv.org/abs/2401.12585v2)|null|\n", "2401.12576": "|**2024-01-23**|**LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools**|Qianli Wang et.al.|[2401.12576v1](http://arxiv.org/abs/2401.12576v1)|**[link](https://github.com/dfki-nlp/llmcheckup)**|\n", "2401.12566": "|**2024-01-23**|**Automated Fact-Checking of Climate Change Claims with Large Language Models**|Markus Leippold et.al.|[2401.12566v1](http://arxiv.org/abs/2401.12566v1)|null|\n", "2401.13178": "|**2024-01-24**|**AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents**|Chang Ma et.al.|[2401.13178v1](http://arxiv.org/abs/2401.13178v1)|**[link](https://github.com/hkust-nlp/agentboard)**|\n", "2401.14268": "|**2024-01-25**|**GPTVoiceTasker: LLM-Powered Virtual Assistant for Smartphone**|Minh Duc Vu et.al.|[2401.14268v1](http://arxiv.org/abs/2401.14268v1)|null|\n", "2401.14109": "|**2024-01-25**|**CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks**|Andrei Tomut et.al.|[2401.14109v1](http://arxiv.org/abs/2401.14109v1)|null|\n", "2401.13912": "|**2024-01-25**|**A Survey of Deep Learning and Foundation Models for Time Series Forecasting**|John A. 
Miller et.al.|[2401.13912v1](http://arxiv.org/abs/2401.13912v1)|null|\n", "2401.14589": "|**2024-01-26**|**Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias**|Yu He Ke et.al.|[2401.14589v1](http://arxiv.org/abs/2401.14589v1)|null|\n", "2401.14490": "|**2024-01-25**|**LongHealth: A Question Answering Benchmark with Long Clinical Documents**|Lisa Adams et.al.|[2401.14490v1](http://arxiv.org/abs/2401.14490v1)|**[link](https://github.com/kbressem/longhealth)**|\n", "2401.16024": "|**2024-01-29**|**Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures**|Michael Hersche et.al.|[2401.16024v1](http://arxiv.org/abs/2401.16024v1)|**[link](https://github.com/ibm/learn-vector-symbolic-architectures-rule-formulations)**|\n", "2401.15843": "|**2024-01-29**|**APIGen: Generative API Method Recommendation**|Yujia Chen et.al.|[2401.15843v1](http://arxiv.org/abs/2401.15843v1)|**[link](https://github.com/hitcoderr/apigen)**|\n", "2401.15170": "|**2024-02-12**|**Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks**|Zackary Okun Dunivin et.al.|[2401.15170v2](http://arxiv.org/abs/2401.15170v2)|null|\n", "2312.15915": "|**2024-01-29**|**ChartBench: A Benchmark for Complex Visual Reasoning in Charts**|Zhengzhuo Xu et.al.|[2312.15915v2](http://arxiv.org/abs/2312.15915v2)|null|\n", "2401.16822": "|**2024-02-05**|**EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain**|Wei Zhang et.al.|[2401.16822v2](http://arxiv.org/abs/2401.16822v2)|null|\n", "2401.16765": "|**2024-01-30**|**A Cross-Language Investigation into Jailbreak Attacks in Large Language Models**|Jie Li et.al.|[2401.16765v1](http://arxiv.org/abs/2401.16765v1)|null|\n", "2401.16736": "|**2024-02-03**|**Engineering A Large Language Model From Scratch**|Abiodun Finbarrs Oketunji 
et.al.|[2401.16736v3](http://arxiv.org/abs/2401.16736v3)|null|\n", "2312.12141": "|**2024-01-30**|**Locating Factual Knowledge in Large Language Models: Exploring the Residual Stream and Analyzing Subvalues in Vocabulary Space**|Zeping Yu et.al.|[2312.12141v2](http://arxiv.org/abs/2312.12141v2)|null|\n", "2401.18006": "|**2024-02-03**|**EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation**|Jonathan W. Kim et.al.|[2401.18006v2](http://arxiv.org/abs/2401.18006v2)|null|\n", "2401.17981": "|**2024-01-31**|**Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study**|Qirui Jiao et.al.|[2401.17981v1](http://arxiv.org/abs/2401.17981v1)|null|\n", "2401.17858": "|**2024-01-31**|**Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction**|Philipp Wicke et.al.|[2401.17858v1](http://arxiv.org/abs/2401.17858v1)|null|\n", "2401.17477": "|**2024-01-30**|**Detecting mental disorder on social media: a ChatGPT-augmented explainable approach**|Loris Belcastro et.al.|[2401.17477v1](http://arxiv.org/abs/2401.17477v1)|**[link](https://github.com/scalabunical/bert-xdd)**|\n", "2402.00745": "|**2024-02-01**|**Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement**|Xin Quan et.al.|[2402.00745v1](http://arxiv.org/abs/2402.00745v1)|**[link](https://github.com/neuro-symbolic-ai/explanation_based_ethical_reasoning)**|\n", "2402.00742": "|**2024-02-01**|**Transforming and Combining Rewards for Aligning Large Language Models**|Zihao Wang et.al.|[2402.00742v1](http://arxiv.org/abs/2402.00742v1)|null|\n", "2402.00386": "|**2024-02-01**|**AssertLLM: Generating and Evaluating Hardware Verification Assertions from Design Specifications via Multi-LLMs**|Wenji Fang et.al.|[2402.00386v1](http://arxiv.org/abs/2402.00386v1)|null|\n", "2402.00345": "|**2024-02-01**|**IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with 
Fine-Grained Bias Indicators**|Luyang Lin et.al.|[2402.00345v1](http://arxiv.org/abs/2402.00345v1)|null|\n", "2402.00251": "|**2024-02-01**|**Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning**|Yao-Hung Hubert Tsai et.al.|[2402.00251v1](http://arxiv.org/abs/2402.00251v1)|null|\n", "2402.00137": "|**2024-01-31**|**Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT**|Diego Machado Reyes et.al.|[2402.00137v1](http://arxiv.org/abs/2402.00137v1)|null|\n", "2402.00093": "|**2024-01-31**|**ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation**|Bhabesh Mali et.al.|[2402.00093v1](http://arxiv.org/abs/2402.00093v1)|null|\n", "2402.00045": "|**2024-02-07**|**Detecting Multimedia Generated by Large AI Models: A Survey**|Li Lin et.al.|[2402.00045v3](http://arxiv.org/abs/2402.00045v3)|**[link](https://github.com/purdue-m2/detect-laim-generated-multimedia-survey)**|\n", "2402.00044": "|**2024-01-21**|**Training microrobots to swim by a large language model**|Zhuoqun Xu et.al.|[2402.00044v1](http://arxiv.org/abs/2402.00044v1)|null|\n", "2402.00024": "|**2024-02-05**|**Comparative Analysis of LLaMA and ChatGPT Embeddings for Molecule Embedding**|Shaghayegh Sadeghi et.al.|[2402.00024v2](http://arxiv.org/abs/2402.00024v2)|**[link](https://github.com/sshaghayeghs/llama-vs-chatgpt)**|\n", "2402.01591": "|**2024-02-02**|**BAT: Learning to Reason about Spatial Sounds with Large Language Models**|Zhisheng Zheng et.al.|[2402.01591v1](http://arxiv.org/abs/2402.01591v1)|null|\n", "2402.01439": "|**2024-02-02**|**From Words to Molecules: A Survey of Large Language Models in Chemistry**|Chang Liao et.al.|[2402.01439v1](http://arxiv.org/abs/2402.01439v1)|null|\n", "2402.01386": "|**2024-02-02**|**Can Large Language Models Serve as Data Analysts? 
A Multi-Agent Assisted Approach for Qualitative Data Analysis**|Zeeshan Rasheed et.al.|[2402.01386v1](http://arxiv.org/abs/2402.01386v1)|null|\n", "2402.01108": "|**2024-02-02**|**Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions**|Pouya Pezeshkpour et.al.|[2402.01108v1](http://arxiv.org/abs/2402.01108v1)|null|\n", "2402.01030": "|**2024-02-01**|**Executable Code Actions Elicit Better LLM Agents**|Xingyao Wang et.al.|[2402.01030v1](http://arxiv.org/abs/2402.01030v1)|**[link](https://github.com/xingyaoww/code-act)**|\n", "2402.03223": "|**2024-03-04**|**English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts**|Patrick Barrei\u00df et.al.|[2402.03223v2](http://arxiv.org/abs/2402.03223v2)|null|\n", "2402.02547": "|**2024-02-04**|**Integration of cognitive tasks into artificial general intelligence test for large models**|Youzhi Qu et.al.|[2402.02547v1](http://arxiv.org/abs/2402.02547v1)|null|\n", "2402.02212": "|**2024-02-03**|**A Data Generation Perspective to the Mechanism of In-Context Learning**|Haitao Mao et.al.|[2402.02212v1](http://arxiv.org/abs/2402.02212v1)|null|\n", "2402.02167": "|**2024-02-03**|**Vi(E)va LLM! 
A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations**|Luca Podo et.al.|[2402.02167v1](http://arxiv.org/abs/2402.02167v1)|**[link](https://github.com/lucapodo/evallm)**|\n", "2402.02006": "|**2024-02-13**|**PresAIse, A Prescriptive AI Solution for Enterprises**|Wei Sun et.al.|[2402.02006v2](http://arxiv.org/abs/2402.02006v2)|null|\n", "2402.01889": "|**2024-02-02**|**The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning**|Daniel Cunnington et.al.|[2402.01889v1](http://arxiv.org/abs/2402.01889v1)|null|\n", "2402.01881": "|**2024-02-06**|**Large Language Model Agent for Hyper-Parameter Optimization**|Siyi Liu et.al.|[2402.01881v2](http://arxiv.org/abs/2402.01881v2)|null|\n", "2402.01789": "|**2024-02-02**|**The Political Preferences of LLMs**|David Rozado et.al.|[2402.01789v1](http://arxiv.org/abs/2402.01789v1)|null|\n", "2402.01761": "|**2024-01-30**|**Rethinking Interpretability in the Era of Large Language Models**|Chandan Singh et.al.|[2402.01761v1](http://arxiv.org/abs/2402.01761v1)|**[link](https://github.com/csinva/imodelsX)**|\n", "2402.01740": "|**2024-01-29**|**Compensatory Biases Under Cognitive Load: Reducing Selection Bias in Large Language Models**|J. E. 
Eicher et.al.|[2402.01740v1](http://arxiv.org/abs/2402.01740v1)|null|\n", "2402.01715": "|**2024-01-25**|**ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment Analysis**|Alessio Buscemi et.al.|[2402.01715v1](http://arxiv.org/abs/2402.01715v1)|null|\n", "2402.01693": "|**2024-01-23**|**Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study**|Zhe He et.al.|[2402.01693v1](http://arxiv.org/abs/2402.01693v1)|null|\n", "2402.01681": "|**2024-02-16**|**Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications**|Yuhang Zhou et.al.|[2402.01681v2](http://arxiv.org/abs/2402.01681v2)|null|\n", "2402.04206": "|**2024-02-06**|**Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models**|David Sobr\u00edn-Hidalgo et.al.|[2402.04206v1](http://arxiv.org/abs/2402.04206v1)|null|\n", "2402.04178": "|**2024-02-06**|**SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models**|Yichen Shi et.al.|[2402.04178v1](http://arxiv.org/abs/2402.04178v1)|**[link](https://github.com/laiyingxin2/shield)**|\n", "2402.04119": "|**2024-02-06**|**Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science**|Pengfei Liu et.al.|[2402.04119v1](http://arxiv.org/abs/2402.04119v1)|**[link](https://github.com/ai-hpc-research-team/slm4mol)**|\n", "2402.03962": "|**2024-02-07**|**Position Paper: Against Spurious Sparks $-$ Dovelating Inflated AI Claims**|Patrick Altmeyer et.al.|[2402.03962v2](http://arxiv.org/abs/2402.03962v2)|null|\n", "2402.03710": "|**2024-02-06**|**Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience**|Xilin Jiang et.al.|[2402.03710v1](http://arxiv.org/abs/2402.03710v1)|null|\n", "2402.03563": "|**2024-02-27**|**Distinguishing the Knowable from the Unknowable with Language Models**|Gustaf 
Ahdritz et.al.|[2402.03563v2](http://arxiv.org/abs/2402.03563v2)|**[link](https://github.com/gahdritz/llm_uncertainty)**|\n", "2402.03349": "|**2024-01-25**|**When Geoscience Meets Generative AI and Large Language Models: Foundations, Trends, and Future Challenges**|Abdenour Hadid et.al.|[2402.03349v1](http://arxiv.org/abs/2402.03349v1)|null|\n", "2402.05110": "|**2024-02-07**|**Opening the AI black box: program synthesis via mechanistic interpretability**|Eric J. Michaud et.al.|[2402.05110v1](http://arxiv.org/abs/2402.05110v1)|**[link](https://github.com/ejmichaud/neural-verification)**|\n", "2402.04609": "|**2024-02-07**|**Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach**|Zhuang Li et.al.|[2402.04609v1](http://arxiv.org/abs/2402.04609v1)|null|\n", "2402.04411": "|**2024-02-06**|**Chatbot Meets Pipeline: Augment Large Language Model with Definite Finite Automaton**|Yiyou Sun et.al.|[2402.04411v1](http://arxiv.org/abs/2402.04411v1)|null|\n", "2402.04380": "|**2024-02-06**|**Assured LLM-Based Software Engineering**|Nadia Alshahwan et.al.|[2402.04380v1](http://arxiv.org/abs/2402.04380v1)|null|\n", "2402.05932": "|**2024-02-08**|**Driving Everywhere with Large Language Model Policy Adaptation**|Boyi Li et.al.|[2402.05932v1](http://arxiv.org/abs/2402.05932v1)|null|\n", "2402.05125": "|**2024-02-05**|**Zero-Shot Clinical Trial Patient Matching with LLMs**|Michael Wornow et.al.|[2402.05125v1](http://arxiv.org/abs/2402.05125v1)|null|\n", "2402.06332": "|**2024-02-09**|**InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning**|Huaiyuan Ying et.al.|[2402.06332v1](http://arxiv.org/abs/2402.06332v1)|**[link](https://github.com/internlm/internlm-math)**|\n", "2402.06119": "|**2024-02-09**|**ContPhy: Continuum Physical Concept Learning and Reasoning from Videos**|Zhicheng Zheng et.al.|[2402.06119v1](http://arxiv.org/abs/2402.06119v1)|null|\n", "2402.05941": "|**2024-02-02**|**Character-based 
Outfit Generation with Vision-augmented Style Extraction via LLMs**|Najmeh Forouzandehmehr et.al.|[2402.05941v1](http://arxiv.org/abs/2402.05941v1)|null|\n", "2402.07876": "|**2024-02-25**|**Policy Improvement using Language Feedback Models**|Victor Zhong et.al.|[2402.07876v3](http://arxiv.org/abs/2402.07876v3)|null|\n", "2402.07442": "|**2024-02-12**|**Game Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and Behavior Branch**|Ray Ito et.al.|[2402.07442v1](http://arxiv.org/abs/2402.07442v1)|null|\n", "2402.07157": "|**2024-02-14**|**Natural Language Reinforcement Learning**|Xidong Feng et.al.|[2402.07157v2](http://arxiv.org/abs/2402.07157v2)|null|\n", "2402.08472": "|**2024-02-13**|**Large Language Models for the Automated Analysis of Optimization Algorithms**|Camilo Chac\u00f3n Sartori et.al.|[2402.08472v1](http://arxiv.org/abs/2402.08472v1)|**[link](https://github.com/camilochs/explainability-llm-stnweb)**|\n", "2402.08360": "|**2024-02-13**|**Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks**|Jusung Lee et.al.|[2402.08360v1](http://arxiv.org/abs/2402.08360v1)|null|\n", "2402.08170": "|**2024-02-17**|**LLaGA: Large Language and Graph Assistant**|Runjin Chen et.al.|[2402.08170v2](http://arxiv.org/abs/2402.08170v2)|**[link](https://github.com/vita-group/llaga)**|\n", "2402.09334": "|**2024-02-14**|**AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach**|Maryam Amirizaniani et.al.|[2402.09334v1](http://arxiv.org/abs/2402.09334v1)|null|\n", "2402.09299": "|**2024-02-14**|**Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code**|Vahid Majdinasab et.al.|[2402.09299v1](http://arxiv.org/abs/2402.09299v1)|null|\n", "2402.09259": "|**2024-02-14**|**SyntaxShap: Syntax-aware Explainability Method for Text Generation**|Kenza Amara et.al.|[2402.09259v1](http://arxiv.org/abs/2402.09259v1)|null|\n", "2402.09236": 
"|**2024-02-14**|**Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models**|Goutham Rajendran et.al.|[2402.09236v1](http://arxiv.org/abs/2402.09236v1)|null|\n", "2402.10176": "|**2024-02-15**|**OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset**|Shubham Toshniwal et.al.|[2402.10176v1](http://arxiv.org/abs/2402.10176v1)|**[link](https://github.com/kipok/nemo-skills)**|\n", "2402.09733": "|**2024-02-15**|**Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States**|Hanyu Duan et.al.|[2402.09733v1](http://arxiv.org/abs/2402.09733v1)|null|\n", "2402.09642": "|**2024-02-15**|**Answer is All You Need: Instruction-following Text Embedding via Answering the Question**|Letian Peng et.al.|[2402.09642v1](http://arxiv.org/abs/2402.09642v1)|**[link](https://github.com/zhang-yu-wei/inbedder)**|\n", "2402.09584": "|**2024-02-14**|**Large Language Model-Based Interpretable Machine Learning Control in Building Energy Systems**|Liang Zhang et.al.|[2402.09584v1](http://arxiv.org/abs/2402.09584v1)|null|\n", "2402.10770": "|**2024-02-16**|**How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?**|Ehsan Doostmohammadi et.al.|[2402.10770v1](http://arxiv.org/abs/2402.10770v1)|null|\n", "2402.10767": "|**2024-02-16**|**Inference to the Best Explanation in Large Language Models**|Dhairya Dalal et.al.|[2402.10767v1](http://arxiv.org/abs/2402.10767v1)|null|\n", "2402.10688": "|**2024-02-16**|**Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability**|Haiyan Zhao et.al.|[2402.10688v1](http://arxiv.org/abs/2402.10688v1)|null|\n", "2402.10524": "|**2024-02-16**|**LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models**|Minsuk Kahng et.al.|[2402.10524v1](http://arxiv.org/abs/2402.10524v1)|null|\n", "2402.12219": "|**2024-02-19**|**Reformatted Alignment**|Run-Ze Fan 
et.al.|[2402.12219v1](http://arxiv.org/abs/2402.12219v1)|**[link](https://github.com/gair-nlp/realign)**|\n", "2402.12185": "|**2024-02-19**|**ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning**|Renqiu Xia et.al.|[2402.12185v1](http://arxiv.org/abs/2402.12185v1)|**[link](https://github.com/unimodal4reasoning/chartvlm)**|\n", "2402.12022": "|**2024-02-19**|**Distilling Large Language Models for Text-Attributed Graph Learning**|Bo Pan et.al.|[2402.12022v1](http://arxiv.org/abs/2402.12022v1)|null|\n", "2402.11863": "|**2024-02-25**|**How Interpretable are Reasoning Explanations from Prompting Large Language Models?**|Wei Jie Yeo et.al.|[2402.11863v2](http://arxiv.org/abs/2402.11863v2)|**[link](https://github.com/wj210/cot_interpretability)**|\n", "2402.11753": "|**2024-02-22**|**ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs**|Fengqing Jiang et.al.|[2402.11753v2](http://arxiv.org/abs/2402.11753v2)|null|\n", "2402.11676": "|**2024-02-18**|**A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models**|Jaylen Jones et.al.|[2402.11676v1](http://arxiv.org/abs/2402.11676v1)|**[link](https://github.com/osu-nlp-group/llm-cn-eval)**|\n", "2402.11655": "|**2024-02-18**|**Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals**|Francesco Ortu et.al.|[2402.11655v1](http://arxiv.org/abs/2402.11655v1)|**[link](https://github.com/francescortu/competition_of_mechanisms)**|\n", "2402.11137": "|**2024-02-17**|**TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks**|Benjamin Feuer et.al.|[2402.11137v1](http://arxiv.org/abs/2402.11137v1)|**[link](https://github.com/penfever/tabpfn-pt)**|\n", "2402.10948": "|**2024-02-09**|**Zero-shot Explainable Mental Health Analysis on Social Media by incorporating Mental Scales**|Wenyu Li et.al.|[2402.10948v1](http://arxiv.org/abs/2402.10948v1)|null|\n", "2402.12806": "|**2024-02-20**|**SymBa: Symbolic Backward 
Chaining for Multi-step Natural Language Reasoning**|Jinu Lee et.al.|[2402.12806v1](http://arxiv.org/abs/2402.12806v1)|null|\n", "2402.12713": "|**2024-02-20**|**Are Large Language Models Rational Investors?**|Yuhang Zhou et.al.|[2402.12713v1](http://arxiv.org/abs/2402.12713v1)|null|\n", "2402.12405": "|**2024-02-18**|**scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation**|Cong Li et.al.|[2402.12405v1](http://arxiv.org/abs/2402.12405v1)|null|\n", "2402.13871": "|**2024-02-21**|**An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach**|Mohammad Amaz Uddin et.al.|[2402.13871v1](http://arxiv.org/abs/2402.13871v1)|null|\n", "2402.13840": "|**2024-02-21**|**LLM4SBR: A Lightweight and Effective Framework for Integrating Large Language Models in Session-based Recommendation**|Shutong Qiao et.al.|[2402.13840v1](http://arxiv.org/abs/2402.13840v1)|null|\n", "2402.13607": "|**2024-03-15**|**CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models**|Fuwen Luo et.al.|[2402.13607v2](http://arxiv.org/abs/2402.13607v2)|null|\n", "2402.13561": "|**2024-02-21**|**Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment**|Yunxin Li et.al.|[2402.13561v1](http://arxiv.org/abs/2402.13561v1)|null|\n", "2402.13517": "|**2024-02-21**|**Round Trip Translation Defence against Large Language Model Jailbreaking Attacks**|Canaan Yung et.al.|[2402.13517v1](http://arxiv.org/abs/2402.13517v1)|**[link](https://github.com/cancanxxx/round_trip_translation_defence)**|\n", "2402.14807": "|**2024-02-23**|**A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health**|Nikhil Behari et.al.|[2402.14807v2](http://arxiv.org/abs/2402.14807v2)|null|\n", "2402.14744": "|**2024-02-22**|**Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation**|Jiawei 
Wang et.al.|[2402.14744v1](http://arxiv.org/abs/2402.14744v1)|null|\n", "2402.14701": "|**2024-02-22**|**COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling**|Baihan Lin et.al.|[2402.14701v1](http://arxiv.org/abs/2402.14701v1)|null|\n", "2402.14658": "|**2024-02-28**|**OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement**|Tianyu Zheng et.al.|[2402.14658v2](http://arxiv.org/abs/2402.14658v2)|null|\n", "2402.14522": "|**2024-02-22**|**Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond**|Xinyu Wang et.al.|[2402.14522v1](http://arxiv.org/abs/2402.14522v1)|null|\n", "2402.14474": "|**2024-02-22**|**Data Science with LLMs and Interpretable Models**|Sebastian Bordt et.al.|[2402.14474v1](http://arxiv.org/abs/2402.14474v1)|**[link](https://github.com/interpretml/talktoebm)**|\n", "2402.14154": "|**2024-02-21**|**MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms**|Yiqiao Jin et.al.|[2402.14154v1](http://arxiv.org/abs/2402.14154v1)|null|\n", "2402.14123": "|**2024-02-21**|**DeiSAM: Segment Anything with Deictic Prompting**|Hikaru Shindo et.al.|[2402.14123v1](http://arxiv.org/abs/2402.14123v1)|**[link](https://github.com/ml-research/deictic-segment-anything)**|\n", "2402.02611": "|**2024-02-22**|**PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems?**|Chinmay Mittal et.al.|[2402.02611v2](http://arxiv.org/abs/2402.02611v2)|null|\n", "2402.15390": "|**2024-02-23**|**Explorations of Self-Repair in Language Models**|Cody Rushing et.al.|[2402.15390v1](http://arxiv.org/abs/2402.15390v1)|**[link](https://github.com/starship006/backup_research)**|\n", "2402.15181": "|**2024-02-23**|**Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning**|Joseph D. 
Clark et.al.|[2402.15181v1](http://arxiv.org/abs/2402.15181v1)|null|\n", "2402.15116": "|**2024-02-23**|**Large Multimodal Agents: A Survey**|Junlin Xie et.al.|[2402.15116v1](http://arxiv.org/abs/2402.15116v1)|null|\n", "2402.14891": "|**2024-03-08**|**LLMBind: A Unified Modality-Task Integration Framework**|Bin Zhu et.al.|[2402.14891v3](http://arxiv.org/abs/2402.14891v3)|null|\n", "2402.14879": "|**2024-02-21**|**Driving Generative Agents With Their Personality**|Lawrence J. Klinkert et.al.|[2402.14879v1](http://arxiv.org/abs/2402.14879v1)|null|\n", "2402.14854": "|**2024-02-20**|**A Dual-Prompting for Interpretable Mental Health Language Models**|Hyolim Jeon et.al.|[2402.14854v1](http://arxiv.org/abs/2402.14854v1)|null|\n", "2402.14840": "|**2024-02-19**|**RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning**|Congyun Jin et.al.|[2402.14840v1](http://arxiv.org/abs/2402.14840v1)|null|\n", "2402.16379": "|**2024-03-04**|**Improving LLM-based Machine Translation with Systematic Self-Correction**|Zhaopeng Feng et.al.|[2402.16379v2](http://arxiv.org/abs/2402.16379v2)|**[link](https://github.com/fzp0424/self_correct_mt)**|\n", "2402.16124": "|**2024-02-25**|**AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation**|Yasheng Sun et.al.|[2402.16124v1](http://arxiv.org/abs/2402.16124v1)|null|\n", "2402.16058": "|**2024-02-25**|**Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression**|Xinze Li et.al.|[2402.16058v1](http://arxiv.org/abs/2402.16058v1)|**[link](https://github.com/openmatch/gist-coco)**|\n", "2402.16050": "|**2024-02-25**|**LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding**|Yuxuan Wang et.al.|[2402.16050v1](http://arxiv.org/abs/2402.16050v1)|**[link](https://github.com/bigai-nlco/lstp-chat)**|\n", "2402.15623": "|**2024-02-23**|**Language-Based User Profiles for Recommendation**|Joyce Zhou 
et.al.|[2402.15623v1](http://arxiv.org/abs/2402.15623v1)|null|\n", "2402.15525": "|**2024-02-19**|**Detecting misinformation through Framing Theory: the Frame Element-based Model**|Guan Wang et.al.|[2402.15525v1](http://arxiv.org/abs/2402.15525v1)|null|\n", "2402.17644": "|**2024-02-27**|**Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data**|Xiao Liu et.al.|[2402.17644v1](http://arxiv.org/abs/2402.17644v1)|**[link](https://github.com/xxxiaol/qrdata)**|\n", "2402.17531": "|**2024-02-27**|**Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides**|Kaikai An et.al.|[2402.17531v1](http://arxiv.org/abs/2402.17531v1)|null|\n", "2402.17226": "|**2024-02-27**|**Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models**|Xiaolong Wang et.al.|[2402.17226v1](http://arxiv.org/abs/2402.17226v1)|null|\n", "2402.17128": "|**2024-03-20**|**OSCaR: Object State Captioning and State Change Representation**|Nguyen Nguyen et.al.|[2402.17128v3](http://arxiv.org/abs/2402.17128v3)|**[link](https://github.com/nguyennm1024/oscar)**|\n", "2402.16832": "|**2024-02-26**|**Mysterious Projections: Multimodal LLMs Gain Domain-Specific Visual Capabilities Without Richer Cross-Modal Projections**|Gaurav Verma et.al.|[2402.16832v1](http://arxiv.org/abs/2402.16832v1)|null|\n", "2402.16671": "|**2024-02-28**|**StructLM: Towards Building Generalist Models for Structured Knowledge Grounding**|Alex Zhuang et.al.|[2402.16671v2](http://arxiv.org/abs/2402.16671v2)|null|\n", "2402.16905": "|**2024-02-24**|**Enforcing Temporal Constraints on Generative Agent Behavior with Reactive Synthesis**|Raven Rothkopf et.al.|[2402.16905v1](http://arxiv.org/abs/2402.16905v1)|null|\n", "2402.18344": "|**2024-02-28**|**Focus on Your Question! 
Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning**|Jiachun Li et.al.|[2402.18344v1](http://arxiv.org/abs/2402.18344v1)|null|\n", "2402.18169": "|**2024-02-29**|**MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery**|Feihong Lu et.al.|[2402.18169v2](http://arxiv.org/abs/2402.18169v2)|null|\n", "2402.18139": "|**2024-02-28**|**Cause and Effect: Can Large Language Models Truly Understand Causality?**|Swagata Ashwani et.al.|[2402.18139v1](http://arxiv.org/abs/2402.18139v1)|null|\n", "2402.18093": "|**2024-02-28**|**ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection**|Takashi Koide et.al.|[2402.18093v1](http://arxiv.org/abs/2402.18093v1)|null|\n", "2402.17879": "|**2024-02-27**|**Automated Statistical Model Discovery with Language Models**|Michael Y. Li et.al.|[2402.17879v1](http://arxiv.org/abs/2402.17879v1)|null|\n", "2402.17785": "|**2024-03-07**|**ByteComposer: a Human-like Melody Composition Method based on Language Model Agent**|Xia Liang et.al.|[2402.17785v2](http://arxiv.org/abs/2402.17785v2)|null|\n", "2402.19421": "|**2024-02-29**|**Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines**|Lijia Ma et.al.|[2402.19421v1](http://arxiv.org/abs/2402.19421v1)|null|\n", "2402.18679": "|**2024-03-12**|**Data Interpreter: An LLM Agent For Data Science**|Sirui Hong et.al.|[2402.18679v3](http://arxiv.org/abs/2402.18679v3)|**[link](https://github.com/geekan/metagpt)**|\n", "2403.01061": "|**2024-03-02**|**Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers**|Melanie Subbiah et.al.|[2403.01061v1](http://arxiv.org/abs/2403.01061v1)|**[link](https://github.com/melaniesubbiah/reading-subtext)**|\n", "2403.01002": "|**2024-03-01**|**Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries**|Zelalem Gero 
et.al.|[2403.01002v1](http://arxiv.org/abs/2403.01002v1)|**[link](https://github.com/microsoft/attribute-structuring)**|\n", "2403.00154": "|**2024-03-27**|**LLMs in Political Science: Heralding a New Era of Visual Analysis**|Yu Wang et.al.|[2403.00154v2](http://arxiv.org/abs/2403.00154v2)|null|\n", "2403.00126": "|**2024-02-29**|**FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition**|Xiaoqiang Wang et.al.|[2403.00126v1](http://arxiv.org/abs/2403.00126v1)|null|\n", "2403.00822": "|**2024-02-26**|**InteraRec: Interactive Recommendations Using Multimodal Large Language Models**|Saketh Reddy Karra et.al.|[2403.00822v1](http://arxiv.org/abs/2403.00822v1)|null|\n", "2403.00810": "|**2024-02-25**|**Bootstrapping Cognitive Agents with a Large Language Model**|Feiyu Zhu et.al.|[2403.00810v1](http://arxiv.org/abs/2403.00810v1)|null|\n", "2403.00782": "|**2024-02-18**|**Ploutos: Towards interpretable stock movement prediction with financial large language model**|Hanshuang Tong et.al.|[2403.00782v1](http://arxiv.org/abs/2403.00782v1)|null|\n", "2403.00781": "|**2024-02-18**|**ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework**|Zhongqi Yang et.al.|[2403.00781v1](http://arxiv.org/abs/2403.00781v1)|null|\n", "2403.03188": "|**2024-03-05**|**Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement**|Rafaela Martelo et.al.|[2403.03188v1](http://arxiv.org/abs/2403.03188v1)|**[link](https://github.com/rafaelamartelo/floodgpt-4_prototype)**|\n", "2403.02752": "|**2024-03-05**|**HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents**|Sam Yu-Te Lee et.al.|[2403.02752v1](http://arxiv.org/abs/2403.02752v1)|null|\n", "2403.02727": "|**2024-03-05**|**HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?**|Sijie Ji 
et.al.|[2403.02727v1](http://arxiv.org/abs/2403.02727v1)|null|\n", "2403.02558": "|**2024-03-05**|**Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research**|Brenda Y. Miao et.al.|[2403.02558v1](http://arxiv.org/abs/2403.02558v1)|**[link](https://github.com/bmiao10/mi-claim-2024)**|\n", "2403.02270": "|**2024-03-26**|**FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction**|Alessandro Scir\u00e8 et.al.|[2403.02270v2](http://arxiv.org/abs/2403.02270v2)|null|\n", "2403.02238": "|**2024-03-04**|**Towards Intent-Based Network Management: Large Language Models for Intent Extraction in 5G Core Networks**|Dimitrios Michael Manias et.al.|[2403.02238v1](http://arxiv.org/abs/2403.02238v1)|null|\n", "2403.01981": "|**2024-03-04**|**Evaluating the Explainability of Neural Rankers**|Saran Pandian et.al.|[2403.01981v1](http://arxiv.org/abs/2403.01981v1)|null|\n", "2403.01457": "|**2024-03-03**|**Logic Rules as Explanations for Legal Case Retrieval**|Zhongxiang Sun et.al.|[2403.01457v1](http://arxiv.org/abs/2403.01457v1)|**[link](https://github.com/ke-01/ns-lcr)**|\n", "2403.03864": "|**2024-03-13**|**Are Language Models Puzzle Prodigies? 
Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning**|Deepanway Ghosal et.al.|[2403.03864v3](http://arxiv.org/abs/2403.03864v3)|**[link](https://github.com/declare-lab/puzzle-reasoning)**|\n", "2403.03790": "|**2024-03-06**|**Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery**|Wei Zhang et.al.|[2403.03790v1](http://arxiv.org/abs/2403.03790v1)|null|\n", "2403.03628": "|**2024-03-06**|**GPTopic: Dynamic and Interactive Topic Representations**|Arik Reuter et.al.|[2403.03628v1](http://arxiv.org/abs/2403.03628v1)|null|\n", "2403.03397": "|**2024-03-06**|**Explaining Genetic Programming Trees using Large Language Models**|Paula Maddigan et.al.|[2403.03397v1](http://arxiv.org/abs/2403.03397v1)|null|\n", "2403.04760": "|**2024-03-07**|**iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries**|Adam Coscia et.al.|[2403.04760v1](http://arxiv.org/abs/2403.04760v1)|**[link](https://github.com/adamcoscia/iscore)**|\n", "2403.04758": "|**2024-03-07**|**KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts**|Adam Coscia et.al.|[2403.04758v1](http://arxiv.org/abs/2403.04758v1)|**[link](https://github.com/adamcoscia/knowledgevis)**|\n", "2403.04577": "|**2024-03-07**|**Wiki-TabNER:Advancing Table Interpretation Through Named Entity Recognition**|Aneta Koleva et.al.|[2403.04577v1](http://arxiv.org/abs/2403.04577v1)|**[link](https://github.com/table-interpretation/wiki_table_ner)**|\n", "2403.04481": "|**2024-03-08**|**Do Large Language Model Understand Multi-Intent Spoken Language ?**|Shangjian Yin et.al.|[2403.04481v2](http://arxiv.org/abs/2403.04481v2)|**[link](https://github.com/SJY8460/SLM)**|\n", "2403.04325": "|**2024-03-18**|**Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models**|Changjiang Gao et.al.|[2403.04325v2](http://arxiv.org/abs/2403.04325v2)|null|\n", "2403.05135": "|**2024-03-08**|**ELLA: 
Equip Diffusion Models with LLM for Enhanced Semantic Alignment**|Xiwei Hu et.al.|[2403.05135v1](http://arxiv.org/abs/2403.05135v1)|null|\n", "2403.04974": "|**2024-03-11**|**Embracing Large Language and Multimodal Models for Prosthetic Technologies**|Sharmita Dey et.al.|[2403.04974v2](http://arxiv.org/abs/2403.04974v2)|null|\n", "2403.04957": "|**2024-03-07**|**Automatic and Universal Prompt Injection Attacks against Large Language Models**|Xiaogeng Liu et.al.|[2403.04957v1](http://arxiv.org/abs/2403.04957v1)|**[link](https://github.com/sheltonliu-n/universal-prompt-injection)**|\n", "2403.06728": "|**2024-03-11**|**Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning**|Zijian Zhou et.al.|[2403.06728v1](http://arxiv.org/abs/2403.06728v1)|null|\n", "2403.06660": "|**2024-03-11**|**FashionReGen: LLM-Empowered Fashion Report Generation**|Yujuan Ding et.al.|[2403.06660v1](http://arxiv.org/abs/2403.06660v1)|null|\n", "2403.06201": "|**2024-03-10**|**Are You Being Tracked? 
Discover the Power of Zero-Shot Trajectory Tracing with LLMs!**|Huanqi Yang et.al.|[2403.06201v1](http://arxiv.org/abs/2403.06201v1)|null|\n", "2403.06070": "|**2024-03-10**|**Reframe Anything: LLM Agent for Open World Video Reframing**|Jiawang Cao et.al.|[2403.06070v1](http://arxiv.org/abs/2403.06070v1)|null|\n", "2403.05816": "|**2024-03-09**|**LEVA: Using Large Language Models to Enhance Visual Analytics**|Yuheng Zhao et.al.|[2403.05816v1](http://arxiv.org/abs/2403.05816v1)|null|\n", "2403.05636": "|**2024-03-08**|**Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach**|Zhen Tan et.al.|[2403.05636v1](http://arxiv.org/abs/2403.05636v1)|null|\n", "2403.07376": "|**2024-03-12**|**NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning**|Bingqian Lin et.al.|[2403.07376v1](http://arxiv.org/abs/2403.07376v1)|**[link](https://github.com/expectorlin/navcot)**|\n", "2403.07039": "|**2024-03-11**|**From English to ASIC: Hardware Implementation with Large Language Model**|Emil Goh et.al.|[2403.07039v1](http://arxiv.org/abs/2403.07039v1)|**[link](https://huggingface.co/emilgoh/mistral-verilog)**|\n", "2403.08396": "|**2024-03-13**|**A Picture Is Worth a Thousand Words: Exploring Diagram and Video-Based OOP Exercises to Counter LLM Over-Reliance**|Bruno Pereira Cipriano et.al.|[2403.08396v1](http://arxiv.org/abs/2403.08396v1)|null|\n", "2403.08189": "|**2024-03-13**|**Embedded Translations for Low-resource Automated Glossing**|Changbing Yang et.al.|[2403.08189v1](http://arxiv.org/abs/2403.08189v1)|null|\n", "2403.09410": "|**2024-03-14**|**XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization**|Yequan Bie et.al.|[2403.09410v1](http://arxiv.org/abs/2403.09410v1)|null|\n", "2403.09072": "|**2024-03-14**|**UniCode: Learning a Unified Codebook for Multimodal Large Language Models**|Sipeng Zheng et.al.|[2403.09072v1](http://arxiv.org/abs/2403.09072v1)|null|\n", 
"2403.08820": "|**2024-02-21**|**Diet-ODIN: A Novel Framework for Opioid Misuse Detection with Interpretable Dietary Patterns**|Zheyuan Zhang et.al.|[2403.08820v1](http://arxiv.org/abs/2403.08820v1)|**[link](https://github.com/jasonzhangzy1757/diet-odin)**|\n", "2403.10351": "|**2024-03-15**|**TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale**|Pengcheng Jiang et.al.|[2403.10351v1](http://arxiv.org/abs/2403.10351v1)|null|\n", "2403.09747": "|**2024-03-14**|**Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors**|Guanghua Li et.al.|[2403.09747v1](http://arxiv.org/abs/2403.09747v1)|null|\n", "2403.11401": "|**2024-03-22**|**Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning**|Rao Fu et.al.|[2403.11401v2](http://arxiv.org/abs/2403.11401v2)|null|\n", "2403.11289": "|**2024-03-17**|**ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models**|Siyuan Huang et.al.|[2403.11289v1](http://arxiv.org/abs/2403.11289v1)|**[link](https://github.com/siyuanhuang95/manipvqa)**|\n", "2403.10949": "|**2024-03-26**|**SelfIE: Self-Interpretation of Large Language Model Embeddings**|Haozhe Chen et.al.|[2403.10949v2](http://arxiv.org/abs/2403.10949v2)|**[link](https://github.com/tonychenxyz/selfie)**|\n", "2403.10854": "|**2024-03-16**|**A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment**|Tianhe Wu et.al.|[2403.10854v1](http://arxiv.org/abs/2403.10854v1)|**[link](https://github.com/tianhewu/mllms-for-iqa)**|\n", "2403.10779": "|**2024-03-16**|**LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices**|Jingping Nie et.al.|[2403.10779v1](http://arxiv.org/abs/2403.10779v1)|null|\n", "2403.10762": "|**2024-03-16**|**NARRATE: Versatile Language Architecture for Optimal Control in Robotics**|Seif Ismail 
et.al.|[2403.10762v1](http://arxiv.org/abs/2403.10762v1)|null|\n", "2403.10707": "|**2024-03-15**|**Uncovering Latent Themes of Messaging on Social Media by Integrating LLMs: A Case Study on Climate Campaigns**|Tunazzina Islam et.al.|[2403.10707v1](http://arxiv.org/abs/2403.10707v1)|null|\n", "2403.10581": "|**2024-03-22**|**Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction**|Chen Chen et.al.|[2403.10581v2](http://arxiv.org/abs/2403.10581v2)|null|\n", "2403.12920": "|**2024-03-19**|**Semantic Layering in Room Segmentation via LLMs**|Taehyeon Kim et.al.|[2403.12920v1](http://arxiv.org/abs/2403.12920v1)|null|\n", "2403.12675": "|**2024-03-19**|**Pragmatic Competence Evaluation of Large Language Models for Korean**|Dojun Park et.al.|[2403.12675v1](http://arxiv.org/abs/2403.12675v1)|null|\n", "2403.12627": "|**2024-04-02**|**Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code**|Andreas Florath et.al.|[2403.12627v2](http://arxiv.org/abs/2403.12627v2)|null|\n", "2403.12582": "|**2024-03-19**|**AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework**|Xiang Li et.al.|[2403.12582v1](http://arxiv.org/abs/2403.12582v1)|**[link](https://github.com/alphafin-proj/alphafin)**|\n", "2403.12451": "|**2024-03-19**|**INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations**|Lirui Luo et.al.|[2403.12451v1](http://arxiv.org/abs/2403.12451v1)|null|\n", "2403.12403": "|**2024-03-19**|**Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales**|Ayushi Nirmal et.al.|[2403.12403v1](http://arxiv.org/abs/2403.12403v1)|null|\n", "2403.12388": "|**2024-03-19**|**Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models**|Ying-Chun Lin et.al.|[2403.12388v1](http://arxiv.org/abs/2403.12388v1)|null|\n", "2403.11896": "|**2024-04-02**|**Investigating Markers and Drivers of 
Gender Bias in Machine Translations**|Peter J Barclay et.al.|[2403.11896v2](http://arxiv.org/abs/2403.11896v2)|null|\n", "2403.11810": "|**2024-03-18**|**Metaphor Understanding Challenge Dataset for LLMs**|Xiaoyu Tong et.al.|[2403.11810v1](http://arxiv.org/abs/2403.11810v1)|null|\n", "2403.13593": "|**2024-03-20**|**Encoding the Subsurface in 3D with Seismic**|Ben Lasscock et.al.|[2403.13593v1](http://arxiv.org/abs/2403.13593v1)|null|\n", "2403.13446": "|**2024-03-20**|**IndiTag: An Online Media Bias Analysis and Annotation System Using Fine-Grained Bias Indicators**|Luyang Lin et.al.|[2403.13446v1](http://arxiv.org/abs/2403.13446v1)|**[link](https://github.com/lylin0/inditag)**|\n", "2403.13073": "|**2024-03-19**|**A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training**|Heila Precel et.al.|[2403.13073v1](http://arxiv.org/abs/2403.13073v1)|null|\n", "2403.13002": "|**2024-04-02**|**AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models**|Shuo Jiang et.al.|[2403.13002v2](http://arxiv.org/abs/2403.13002v2)|null|\n", "2403.14624": "|**2024-03-21**|**MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?**|Renrui Zhang et.al.|[2403.14624v1](http://arxiv.org/abs/2403.14624v1)|null|\n", "2403.14243": "|**2024-03-21**|**Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology**|Dimitrios P. 
Panagoulias et.al.|[2403.14243v1](http://arxiv.org/abs/2403.14243v1)|null|\n", "2403.14171": "|**2024-04-08**|**MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation**|Longzheng Wang et.al.|[2403.14171v3](http://arxiv.org/abs/2403.14171v3)|**[link](https://github.com/wishever/mmidr)**|\n", "2403.13900": "|**2024-03-20**|**CoMo: Controllable Motion Generation through Language Guided Pose Code Editing**|Yiming Huang et.al.|[2403.13900v1](http://arxiv.org/abs/2403.13900v1)|null|\n", "2403.15371": "|**2024-03-22**|**Can large language models explore in-context?**|Akshay Krishnamurthy et.al.|[2403.15371v1](http://arxiv.org/abs/2403.15371v1)|null|\n", "2403.15157": "|**2024-04-03**|**AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models**|Chaoyun Zhang et.al.|[2403.15157v2](http://arxiv.org/abs/2403.15157v2)|null|\n", "2403.15076": "|**2024-03-22**|**Comprehensive Lipidomic Automation Workflow using Large Language Models**|Connor Beveridge et.al.|[2403.15076v1](http://arxiv.org/abs/2403.15076v1)|null|\n", "2403.16687": "|**2024-04-22**|**Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography**|Jiayue Zhang et.al.|[2403.16687v3](http://arxiv.org/abs/2403.16687v3)|null|\n", "2403.16097": "|**2024-03-28**|**Can Language Models Pretend Solvers? 
Logic Code Simulation with LLMs**|Minyu Chen et.al.|[2403.16097v2](http://arxiv.org/abs/2403.16097v2)|null|\n", "2403.15822": "|**2024-04-15**|**Computational Sentence-level Metrics Predicting Human Sentence Comprehension**|Kun Sun et.al.|[2403.15822v2](http://arxiv.org/abs/2403.15822v2)|null|\n", "2403.15715": "|**2024-03-23**|**EDDA: A Encoder-Decoder Data Augmentation Framework for Zero-Shot Stance Detection**|Daijun Ding et.al.|[2403.15715v1](http://arxiv.org/abs/2403.15715v1)|**[link](https://github.com/szu-ddj/edda)**|\n", "2403.15528": "|**2024-04-03**|**Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs**|Yiliang Zhou et.al.|[2403.15528v2](http://arxiv.org/abs/2403.15528v2)|null|\n", "2403.15491": "|**2024-03-21**|**Open Source Conversational LLMs do not know most Spanish words**|Javier Conde et.al.|[2403.15491v1](http://arxiv.org/abs/2403.15491v1)|null|\n", "2403.15434": "|**2024-03-15**|**ChatPattern: Layout Pattern Customization via Natural Language**|Zixiao Wang et.al.|[2403.15434v1](http://arxiv.org/abs/2403.15434v1)|null|\n", "2403.17787": "|**2024-03-26**|**Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security Applications**|Fouad Trad et.al.|[2403.17787v1](http://arxiv.org/abs/2403.17787v1)|null|\n", "2403.17209": "|**2024-03-25**|**Generation of Asset Administration Shell with Large Language Model Agents: Interoperability in Digital Twins with Semantic Node**|Yuchen Xia et.al.|[2403.17209v1](http://arxiv.org/abs/2403.17209v1)|null|\n", "2403.17125": "|**2024-03-25**|**The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition**|Georgios Chochlakis et.al.|[2403.17125v1](http://arxiv.org/abs/2403.17125v1)|null|\n", "2403.17124": "|**2024-03-25**|**Grounding Language Plans in Demonstrations Through Counterfactual Perturbations**|Yanwei Wang 
et.al.|[2403.17124v1](http://arxiv.org/abs/2403.17124v1)|null|\n", "2403.16999": "|**2024-03-25**|**Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models**|Hao Shao et.al.|[2403.16999v1](http://arxiv.org/abs/2403.16999v1)|**[link](https://github.com/deepcs233/visual-cot)**|\n", "2403.16921": "|**2024-03-25**|**PropTest: Automatic Property Testing for Improved Visual Programming**|Jaywon Koo et.al.|[2403.16921v1](http://arxiv.org/abs/2403.16921v1)|null|\n", "2403.18771": "|**2024-03-27**|**CheckEval: Robust Evaluation Framework using Large Language Model via Checklist**|Yukyung Lee et.al.|[2403.18771v1](http://arxiv.org/abs/2403.18771v1)|null|\n", "2403.18346": "|**2024-04-03**|**Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective**|Meiqi Chen et.al.|[2403.18346v3](http://arxiv.org/abs/2403.18346v3)|null|\n", "2403.18344": "|**2024-03-27**|**LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models**|Mingxing Peng et.al.|[2403.18344v1](http://arxiv.org/abs/2403.18344v1)|null|\n", "2403.18327": "|**2024-03-27**|**Can LLMs Converse Formally? 
Automatically Assessing LLMs in Translating and Interpreting Formal Specifications**|Rushang Karia et.al.|[2403.18327v1](http://arxiv.org/abs/2403.18327v1)|null|\n", "2403.19646": "|**2024-04-01**|**Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis**|Chenyang Liu et.al.|[2403.19646v2](http://arxiv.org/abs/2403.19646v2)|**[link](https://github.com/chen-yang-liu/change-agent)**|\n", "2403.19103": "|**2024-03-28**|**Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation**|Yutong He et.al.|[2403.19103v1](http://arxiv.org/abs/2403.19103v1)|null|\n", "2403.18969": "|**2024-03-27**|**A Survey on Large Language Models from Concept to Implementation**|Chen Wang et.al.|[2403.18969v1](http://arxiv.org/abs/2403.18969v1)|null|\n", "2403.20134": "|**2024-03-29**|**User Modeling Challenges in Interactive AI Assistant Systems**|Megan Su et.al.|[2403.20134v1](http://arxiv.org/abs/2403.20134v1)|null|\n", "2403.19838": "|**2024-03-28**|**Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving**|Akshay Gopalkrishnan et.al.|[2403.19838v1](http://arxiv.org/abs/2403.19838v1)|**[link](https://github.com/akshaygopalkr/em-vlm4ad)**|\n", "2403.19783": "|**2024-03-28**|**AlloyBERT: Alloy Property Prediction with Large Language Models**|Akshat Chaudhari et.al.|[2403.19783v1](http://arxiv.org/abs/2403.19783v1)|null|\n", "2403.19735": "|**2024-03-28**|**Enhancing Anomaly Detection in Financial Markets with an LLM-based Multi-Agent Framework**|Taejin Park et.al.|[2403.19735v1](http://arxiv.org/abs/2403.19735v1)|null|\n", "2404.01940": "|**2024-04-02**|**Towards Better Understanding of Cybercrime: The Role of Fine-Tuned LLMs in Translation**|Veronica Valeros et.al.|[2404.01940v1](http://arxiv.org/abs/2404.01940v1)|null|\n", "2404.01644": "|**2024-04-02**|**InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data 
Analysis**|Luoxuan Weng et.al.|[2404.01644v1](http://arxiv.org/abs/2404.01644v1)|null|\n", "2404.01063": "|**2024-04-01**|**Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training**|Donggang Jia et.al.|[2404.01063v1](http://arxiv.org/abs/2404.01063v1)|null|\n", "2404.01019": "|**2024-04-11**|**Source-Aware Training Enables Knowledge Attribution in Language Models**|Muhammad Khalifa et.al.|[2404.01019v2](http://arxiv.org/abs/2404.01019v2)|**[link](https://github.com/mukhal/intrinsic-source-citation)**|\n", "2404.01012": "|**2024-04-01**|**Query Performance Prediction using Relevance Judgments Generated by Large Language Models**|Chuan Meng et.al.|[2404.01012v1](http://arxiv.org/abs/2404.01012v1)|**[link](https://github.com/chuanmeng/qpp-genre)**|\n", "2404.00990": "|**2024-04-01**|**Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey**|Weicong Qin et.al.|[2404.00990v1](http://arxiv.org/abs/2404.00990v1)|null|\n", "2404.00589": "|**2024-04-12**|**Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing**|Zhenyu Qian et.al.|[2404.00589v2](http://arxiv.org/abs/2404.00589v2)|**[link](https://github.com/code4paper-2024/code4paper)**|\n", "2404.00489": "|**2024-03-30**|**PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression**|Muhammad Asif Ali et.al.|[2404.00489v1](http://arxiv.org/abs/2404.00489v1)|null|\n", "2404.00419": "|**2024-03-30**|**Do Vision-Language Models Understand Compound Nouns?**|Sonal Kumar et.al.|[2404.00419v1](http://arxiv.org/abs/2404.00419v1)|null|\n", "2404.00209": "|**2024-03-30**|**EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs**|Cheng Jiayang et.al.|[2404.00209v1](http://arxiv.org/abs/2404.00209v1)|**[link](https://github.com/hkust-knowcomp/eventground)**|\n", "2404.01332": "|**2024-03-29**|**Wait, It's All Token Noise? 
Always Has Been: Interpreting LLM Behavior Using Shapley Value**|Behnam Mohammadi et.al.|[2404.01332v1](http://arxiv.org/abs/2404.01332v1)|null|\n", "2404.02323": "|**2024-04-13**|**Toward Informal Language Processing: Knowledge of Slang in Large Language Models**|Zhewei Sun et.al.|[2404.02323v2](http://arxiv.org/abs/2404.02323v2)|null|\n", "2404.02318": "|**2024-04-02**|**ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models**|Vishnunandan L. N. Venkatesh et.al.|[2404.02318v1](http://arxiv.org/abs/2404.02318v1)|null|\n", "2403.11322": "|**2024-04-10**|**StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows**|Yiran Wu et.al.|[2403.11322v3](http://arxiv.org/abs/2403.11322v3)|**[link](https://github.com/kevin666aa/stateflow)**|\n", "2404.03623": "|**2024-04-04**|**Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph**|Marco Bronzini et.al.|[2404.03623v1](http://arxiv.org/abs/2404.03623v1)|null|\n", "2404.03570": "|**2024-04-04**|**Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity**|Jake Varley et.al.|[2404.03570v1](http://arxiv.org/abs/2404.03570v1)|null|\n", "2404.03118": "|**2024-04-03**|**LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models**|Gabriela Ben Melech Stan et.al.|[2404.03118v1](http://arxiv.org/abs/2404.03118v1)|null|\n", "2404.02983": "|**2024-04-03**|**Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding**|Gaia Carenini et.al.|[2404.02983v1](http://arxiv.org/abs/2404.02983v1)|null|\n", "2404.02937": "|**2024-04-13**|**Explainable Traffic Flow Prediction with Large Language Models**|Xusen Guo et.al.|[2404.02937v3](http://arxiv.org/abs/2404.02937v3)|null|\n", "2404.04068": "|**2024-04-05**|**Assessing the quality of information extraction**|Filip Seitl et.al.|[2404.04068v1](http://arxiv.org/abs/2404.04068v1)|null|\n", "2404.05225": "|**2024-04-08**|**LayoutLLM: Layout Instruction Tuning with Large 
Language Models for Document Understanding**|Chuwei Luo et.al.|[2404.05225v1](http://arxiv.org/abs/2404.05225v1)|**[link](https://github.com/alibabaresearch/advancedliteratemachinery)**|\n", "2404.05221": "|**2024-04-08**|**LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models**|Shibo Hao et.al.|[2404.05221v1](http://arxiv.org/abs/2404.05221v1)|null|\n", "2404.05052": "|**2024-04-07**|**Facial Affective Behavior Analysis with Instruction Tuning**|Yifan Li et.al.|[2404.05052v1](http://arxiv.org/abs/2404.05052v1)|null|\n", "2404.04817": "|**2024-04-07**|**FRACTAL: Fine-Grained Scoring from Aggregate Text Labels**|Yukti Makhija et.al.|[2404.04817v1](http://arxiv.org/abs/2404.04817v1)|null|\n", "2404.04689": "|**2024-04-06**|**Multicalibration for Confidence Scoring in LLMs**|Gianluca Detommaso et.al.|[2404.04689v1](http://arxiv.org/abs/2404.04689v1)|null|\n", "2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667v1](http://arxiv.org/abs/2404.04667v1)|null|\n", "2404.04619": "|**2024-04-06**|**Do We Really Need a Complex Agent System? 
Distill Embodied Agent into a Single Model**|Zhonghan Zhao et.al.|[2404.04619v1](http://arxiv.org/abs/2404.04619v1)|null|\n", "2404.04332": "|**2024-04-05**|**Scope Ambiguities in Large Language Models**|Gaurav Kamath et.al.|[2404.04332v1](http://arxiv.org/abs/2404.04332v1)|**[link](https://github.com/mcgill-nlp/scope-ambiguity)**|\n", "2404.06370": "|**2024-04-09**|**Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python**|Valdecy Pereira et.al.|[2404.06370v1](http://arxiv.org/abs/2404.06370v1)|**[link](https://github.com/Valdecy/pyDecision)**|\n", "2404.06345": "|**2024-04-21**|**AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning**|Senkang Hu et.al.|[2404.06345v2](http://arxiv.org/abs/2404.06345v2)|null|\n", "2404.06332": "|**2024-04-07**|**X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model**|Jan Held et.al.|[2404.06332v1](http://arxiv.org/abs/2404.06332v1)|null|\n", "2404.06904": "|**2024-04-10**|**Vision-Language Model-based Physical Reasoning for Robot Liquid Perception**|Wenqiang Lai et.al.|[2404.06904v1](http://arxiv.org/abs/2404.06904v1)|null|\n", "2404.06644": "|**2024-04-09**|**Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?**|Omid Ghahroodi et.al.|[2404.06644v1](http://arxiv.org/abs/2404.06644v1)|null|\n", "2404.06571": "|**2024-04-09**|**Building A Knowledge Graph to Enrich ChatGPT Responses in Manufacturing Service Discovery**|Yunqing Li et.al.|[2404.06571v1](http://arxiv.org/abs/2404.06571v1)|null|\n", "2404.07917": "|**2024-04-11**|**DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation**|Anna C. 
Doris et.al.|[2404.07917v1](http://arxiv.org/abs/2404.07917v1)|**[link](https://github.com/anniedoris/design_qa)**|\n", "2404.07717": "|**2024-04-12**|**Reflectance Estimation for Proximity Sensing by Vision-Language Models: Utilizing Distributional Semantics for Low-Level Cognition in Robotics**|Masashi Osada et.al.|[2404.07717v2](http://arxiv.org/abs/2404.07717v2)|**[link](https://github.com/osada-m/reflectanceestimationbychatgpt)**|\n", "2404.07499": "|**2024-04-11**|**Can Large Language Models Assess Serendipity in Recommender Systems?**|Yu Tokutake et.al.|[2404.07499v1](http://arxiv.org/abs/2404.07499v1)|null|\n", "2404.07960": "|**2024-03-22**|**Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)**|Kaiqi Yang et.al.|[2404.07960v1](http://arxiv.org/abs/2404.07960v1)|null|\n", "2404.08517": "|**2024-04-12**|**Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward**|Xuan Xie et.al.|[2404.08517v1](http://arxiv.org/abs/2404.08517v1)|null|\n", "2404.08424": "|**2024-04-12**|**Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task**|Hassan Ali et.al.|[2404.08424v1](http://arxiv.org/abs/2404.08424v1)|null|\n", "2404.09705": "|**2024-04-15**|**Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction**|David Sobr\u00edn-Hidalgo et.al.|[2404.09705v1](http://arxiv.org/abs/2404.09705v1)|null|\n", "2404.09632": "|**2024-04-15**|**Bridging Vision and Language Spaces with Assignment Prediction**|Jungin Park et.al.|[2404.09632v1](http://arxiv.org/abs/2404.09632v1)|**[link](https://github.com/park-jungin/vlap)**|\n", "2404.09486": "|**2024-04-15**|**MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems**|Kaixin Li et.al.|[2404.09486v1](http://arxiv.org/abs/2404.09486v1)|**[link](https://github.com/happylkx/mmcode)**|\n", 
"2404.09135": "|**2024-04-14**|**Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions**|Taojun Hu et.al.|[2404.09135v1](http://arxiv.org/abs/2404.09135v1)|null|\n", "2404.08978": "|**2024-04-17**|**Incremental Residual Concept Bottleneck Models**|Chenming Shang et.al.|[2404.08978v2](http://arxiv.org/abs/2404.08978v2)|null|\n", "2404.08885": "|**2024-04-13**|**Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension**|Mengnan Qi et.al.|[2404.08885v1](http://arxiv.org/abs/2404.08885v1)|null|\n", "2404.08767": "|**2024-04-12**|**LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning**|Junchi Wang et.al.|[2404.08767v1](http://arxiv.org/abs/2404.08767v1)|**[link](https://github.com/wangjunchi/llmseg)**|\n", "2404.08727": "|**2024-04-12**|**Can LLMs substitute SQL? Comparing Resource Utilization of Querying LLMs versus Traditional Relational Databases**|Xiang Zhang et.al.|[2404.08727v1](http://arxiv.org/abs/2404.08727v1)|null|\n", "2404.08674": "|**2024-04-05**|**Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions**|Zhuochun Li et.al.|[2404.08674v1](http://arxiv.org/abs/2404.08674v1)|null|\n", "2404.08656": "|**2024-03-25**|**Linear Cross-document Event Coreference Resolution with X-AMR**|Shafiuddin Rehan Ahmed et.al.|[2404.08656v1](http://arxiv.org/abs/2404.08656v1)|**[link](https://github.com/ahmeshaf/gpt_coref)**|\n", "2404.10595": "|**2024-04-16**|**Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases**|Yanze Li et.al.|[2404.10595v1](http://arxiv.org/abs/2404.10595v1)|null|\n", "2404.10552": "|**2024-04-16**|**Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning**|Xiao Wang et.al.|[2404.10552v1](http://arxiv.org/abs/2404.10552v1)|null|\n", "2404.09941": "|**2024-04-15**|**Evolving Interpretable Visual Classifiers with Large Language Models**|Mia Chiquier 
et.al.|[2404.09941v1](http://arxiv.org/abs/2404.09941v1)|null|\n", "2404.09866": "|**2024-04-15**|**Reimagining Self-Adaptation in the Age of Large Language Models**|Raghav Donakanti et.al.|[2404.09866v1](http://arxiv.org/abs/2404.09866v1)|null|\n", "2404.09836": "|**2024-04-16**|**How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models**|Xiuwei Shang et.al.|[2404.09836v2](http://arxiv.org/abs/2404.09836v2)|null|\n", "2404.09754": "|**2024-04-15**|**Resilience of Large Language Models for Noisy Instructions**|Bin Wang et.al.|[2404.09754v1](http://arxiv.org/abs/2404.09754v1)|null|\n", "2404.12372": "|**2024-04-18**|**MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale**|Xiaotang Gai et.al.|[2404.12372v1](http://arxiv.org/abs/2404.12372v1)|null|\n", "2404.12317": "|**2024-04-23**|**Large Language Models for Synthetic Participatory Planning of Synergistic Transportation Systems**|Jiangbo Yu et.al.|[2404.12317v3](http://arxiv.org/abs/2404.12317v3)|null|\n", "2404.12299": "|**2024-04-18**|**Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair**|Yusuke Sakai et.al.|[2404.12299v1](http://arxiv.org/abs/2404.12299v1)|null|\n", "2404.12259": "|**2024-04-18**|**Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM**|Michelle S. 
Lam et.al.|[2404.12259v1](http://arxiv.org/abs/2404.12259v1)|**[link](https://github.com/michelle123lam/lloom)**|\n", "2404.11978": "|**2024-04-18**|**EVIT: Event-Oriented Instruction Tuning for Event Reasoning**|Zhengwei Tao et.al.|[2404.11978v1](http://arxiv.org/abs/2404.11978v1)|null|\n", "2404.11972": "|**2024-04-18**|**Aligning Language Models to Explicitly Handle Ambiguity**|Hyuhng Joon Kim et.al.|[2404.11972v1](http://arxiv.org/abs/2404.11972v1)|null|\n", "2404.11875": "|**2024-04-18**|**Concept Induction using LLMs: a user experiment for assessment**|Adrita Barua et.al.|[2404.11875v1](http://arxiv.org/abs/2404.11875v1)|null|\n", "2404.11672": "|**2024-04-17**|**MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory**|Ali Modarressi et.al.|[2404.11672v1](http://arxiv.org/abs/2404.11672v1)|null|\n", "2404.12879": "|**2024-04-19**|**Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation**|Guanhua Chen et.al.|[2404.12879v1](http://arxiv.org/abs/2404.12879v1)|null|\n", "2404.12736": "|**2024-04-19**|**Large Language Model Supply Chain: A Research Agenda**|Shenao Wang et.al.|[2404.12736v1](http://arxiv.org/abs/2404.12736v1)|null|\n", "2404.12558": "|**2024-04-19**|**Just Like Me: The Role of Opinions and Personal Experiences in The Perception of Explanations in Subjective Decision-Making**|Sharon Ferguson et.al.|[2404.12558v1](http://arxiv.org/abs/2404.12558v1)|null|\n", "2404.12494": "|**2024-04-18**|**BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models**|Yu Feng et.al.|[2404.12494v1](http://arxiv.org/abs/2404.12494v1)|null|\n", "2404.13999": "|**2024-04-22**|**CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment**|Kanglei Zhou et.al.|[2404.13999v1](http://arxiv.org/abs/2404.13999v1)|**[link](https://github.com/zhoukanglei/cofinal_aqa)**|\n", "2404.13752": "|**2024-05-23**|**Towards General Conceptual Model Editing via Adversarial Representation Engineering**|Yihao Zhang 
et.al.|[2404.13752v2](http://arxiv.org/abs/2404.13752v2)|**[link](https://github.com/zhang-yihao/adversarial-representation-engineering)**|\n", "2404.13671": "|**2024-04-21**|**FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization**|Zhaopeng Gu et.al.|[2404.13671v1](http://arxiv.org/abs/2404.13671v1)|null|\n", "2404.13660": "|**2024-04-21**|**Trojan Detection in Large Language Models: Insights from The Trojan Detection Challenge**|Narek Maloyan et.al.|[2404.13660v1](http://arxiv.org/abs/2404.13660v1)|null|\n", "2404.13556": "|**2024-04-21**|**ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval**|Kelong Mao et.al.|[2404.13556v1](http://arxiv.org/abs/2404.13556v1)|**[link](https://github.com/kyriemao/chatretriever)**|\n", "2404.13409": "|**2024-04-20**|**\"I Wish There Were an AI\": Challenges and AI Potential in Cancer Patient-Provider Communication**|Ziqi Yang et.al.|[2404.13409v1](http://arxiv.org/abs/2404.13409v1)|null|\n", "2404.13340": "|**2024-04-20**|**Large Language Models as Test Case Generators: Performance Evaluation and Enhancement**|Kefan Li et.al.|[2404.13340v1](http://arxiv.org/abs/2404.13340v1)|null|\n", "2404.13161": "|**2024-04-19**|**CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models**|Manish Bhatt et.al.|[2404.13161v1](http://arxiv.org/abs/2404.13161v1)|**[link](https://github.com/facebookresearch/purplellama)**|\n", "2404.15269": "|**2024-04-23**|**Aligning LLM Agents by Learning Latent Preference from User Edits**|Ge Gao et.al.|[2404.15269v1](http://arxiv.org/abs/2404.15269v1)|**[link](https://github.com/gao-g/prelude)**|\n", "2404.14883": "|**2024-04-23**|**Language in Vivo vs. 
in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans**|Vittoria Dentella et.al.|[2404.14883v1](http://arxiv.org/abs/2404.14883v1)|null|\n", "2404.14705": "|**2024-04-23**|**Think-Program-reCtify: 3D Situated Reasoning with Large Language Models**|Qingrong He et.al.|[2404.14705v1](http://arxiv.org/abs/2404.14705v1)|null|\n", "2404.14604": "|**2024-04-26**|**Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training**|Mengzhao Jia et.al.|[2404.14604v3](http://arxiv.org/abs/2404.14604v3)|null|\n", "2404.14547": "|**2024-04-22**|**Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning**|Mohammed Abugurain et.al.|[2404.14547v1](http://arxiv.org/abs/2404.14547v1)|null|\n", "2404.15166": "|**2024-04-22**|**Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication**|John R. Lawson et.al.|[2404.15166v1](http://arxiv.org/abs/2404.15166v1)|null|\n", "2404.15549": "|**2024-04-27**|**PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models**|Shashi Kant Gupta et.al.|[2404.15549v2](http://arxiv.org/abs/2404.15549v2)|null|\n", "2404.15310": "|**2024-04-01**|**Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT**|Ruikun Hou et.al.|[2404.15310v1](http://arxiv.org/abs/2404.15310v1)|null|\n", "2404.16754": "|**2024-04-25**|**RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis**|Xiaoman Zhang et.al.|[2404.16754v1](http://arxiv.org/abs/2404.16754v1)|null|\n", "2404.16651": "|**2024-04-25**|**Evolutionary Large Language Models for Hardware Security: A Comparative Survey**|Mohammad Akyash et.al.|[2404.16651v1](http://arxiv.org/abs/2404.16651v1)|null|\n", "2404.16262": "|**2024-04-25**|**Interpreting Answers to Yes-No Questions in Dialogues from Multiple 
Domains**|Zijie Wang et.al.|[2404.16262v1](http://arxiv.org/abs/2404.16262v1)|**[link](https://github.com/wang-zijie/yn-question-multi-domains)**|\n", "2404.15650": "|**2024-04-24**|**Return of EM: Entity-driven Answer Set Expansion for QA Evaluation**|Dongryeol Lee et.al.|[2404.15650v1](http://arxiv.org/abs/2404.15650v1)|null|\n", "2404.17524": "|**2024-04-29**|**On the Use of Large Language Models to Generate Capability Ontologies**|Luis Miguel Vieira da Silva et.al.|[2404.17524v2](http://arxiv.org/abs/2404.17524v2)|null|\n", "2404.17136": "|**2024-04-26**|**Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study**|Yang Wu et.al.|[2404.17136v1](http://arxiv.org/abs/2404.17136v1)|**[link](https://github.com/CGCL-codes/naturalcc)**|\n", "2404.17017": "|**2024-04-25**|**AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks**|Jeremy Harper et.al.|[2404.17017v1](http://arxiv.org/abs/2404.17017v1)|null|\n", "2404.16906": "|**2024-04-25**|**Evolve Cost-aware Acquisition Functions Using Large Language Models**|Yiming Yao et.al.|[2404.16906v1](http://arxiv.org/abs/2404.16906v1)|null|\n", "2404.16859": "|**2024-04-11**|**Rumour Evaluation with Very Large Language Models**|Dahlia Shehata et.al.|[2404.16859v1](http://arxiv.org/abs/2404.16859v1)|**[link](https://github.com/dahlia-chehata/rumoureval-with-vllms)**|\n", "2404.18816": "|**2024-04-29**|**AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering**|Wenxiang Zhao et.al.|[2404.18816v1](http://arxiv.org/abs/2404.18816v1)|null|\n", "2404.18766": "|**2024-04-29**|**PECC: Problem Extraction and Coding Challenges**|Patrick Haller et.al.|[2404.18766v1](http://arxiv.org/abs/2404.18766v1)|**[link](https://github.com/hallerpatrick/pecc)**|\n", "2404.18466": "|**2024-04-29**|**HFT: Half Fine-Tuning for Large Language Models**|Tingfeng Hui et.al.|[2404.18466v1](http://arxiv.org/abs/2404.18466v1)|null|\n", "2404.18130": 
"|**2024-04-28**|**Logic Agent: Enhancing Validity with Logic Rule Invocation**|Hanmeng Liu et.al.|[2404.18130v1](http://arxiv.org/abs/2404.18130v1)|null|\n", "2404.17999": "|**2024-04-27**|**MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch**|Nadia Saeed et.al.|[2404.17999v1](http://arxiv.org/abs/2404.17999v1)|**[link](https://github.com/nadiasaeed/medifact-mediqa-corr-2024)**|\n", "2404.17780": "|**2024-04-27**|**Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning**|Dapeng Li et.al.|[2404.17780v1](http://arxiv.org/abs/2404.17780v1)|null|\n", "2404.19744": "|**2024-04-30**|**PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification**|Leon Garza et.al.|[2404.19744v1](http://arxiv.org/abs/2404.19744v1)|null|\n", "2404.19438": "|**2024-05-22**|**Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings**|Guobin Shen et.al.|[2404.19438v3](http://arxiv.org/abs/2404.19438v3)|null|\n", "2404.19221": "|**2024-04-30**|**Transcrib3D: 3D Referring Expression Resolution through Large Language Models**|Jiading Fang et.al.|[2404.19221v1](http://arxiv.org/abs/2404.19221v1)|null|\n", "2404.19063": "|**2024-04-29**|**SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications**|Liang Xu et.al.|[2404.19063v1](http://arxiv.org/abs/2404.19063v1)|null|\n", "2405.00494": "|**2024-05-01**|**GOLD: Geometry Problem Solver with Natural Language Description**|Jiaxin Zhang et.al.|[2405.00494v1](http://arxiv.org/abs/2405.00494v1)|**[link](https://github.com/neurasearch/geometry-diagram-description)**|\n", "2405.00485": "|**2024-05-01**|**The Pyramid of Captions**|Delong Chen et.al.|[2405.00485v1](http://arxiv.org/abs/2405.00485v1)|null|\n", "2405.00435": "|**2024-05-01**|**CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model**|Wei Zhang 
et.al.|[2405.00435v1](http://arxiv.org/abs/2405.00435v1)|null|\n", "2405.02260": "|**2024-05-03**|**Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows**|Jasmine Y. Shih et.al.|[2405.02260v1](http://arxiv.org/abs/2405.02260v1)|null|\n", "2405.02079": "|**2024-05-03**|**Argumentative Large Language Models for Explainable and Contestable Decision-Making**|Gabriel Freedman et.al.|[2405.02079v1](http://arxiv.org/abs/2405.02079v1)|null|\n", "2405.01769": "|**2024-05-02**|**A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law**|Zhiyu Zoey Chen et.al.|[2405.01769v1](http://arxiv.org/abs/2405.01769v1)|null|\n", "2405.01744": "|**2024-05-02**|**ALCM: Autonomous LLM-Augmented Causal Discovery Framework**|Elahe Khatibi et.al.|[2405.01744v1](http://arxiv.org/abs/2405.01744v1)|null|\n", "2405.03688": "|**2024-05-06**|**Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames**|Keith Burghardt et.al.|[2405.03688v1](http://arxiv.org/abs/2405.03688v1)|**[link](https://github.com/KeithBurghardt/LLM_Coordination)**|\n", "2405.03553": "|**2024-05-23**|**AlphaMath Almost Zero: process Supervision without process**|Guoxin Chen et.al.|[2405.03553v2](http://arxiv.org/abs/2405.03553v2)|**[link](https://github.com/MARIO-Math-Reasoning/Super_MARIO)**|\n", "2405.03359": "|**2024-05-06**|**MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline**|Mohamed Yaseen Jabarulla et.al.|[2405.03359v1](http://arxiv.org/abs/2405.03359v1)|**[link](https://github.com/yaseen28/meddoc-bot)**|\n", "2405.03272": "|**2024-05-06**|**WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning**|Yuanhan Zhang et.al.|[2405.03272v1](http://arxiv.org/abs/2405.03272v1)|null|\n", "2405.03207": "|**2024-05-06**|**A Philosophical Introduction to Language Models - Part II: The Way Forward**|Rapha\u00ebl Milli\u00e8re 
et.al.|[2405.03207v1](http://arxiv.org/abs/2405.03207v1)|null|\n", "2405.03205": "|**2024-05-23**|**Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions**|Ruizhe Li et.al.|[2405.03205v2](http://arxiv.org/abs/2405.03205v2)|**[link](https://github.com/ruizheliuoa/anchored_bias_gpt2)**|\n", "2405.03153": "|**2024-05-06**|**Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines**|Md Main Uddin Rony et.al.|[2405.03153v1](http://arxiv.org/abs/2405.03153v1)|null|\n", "2405.03076": "|**2024-05-05**|**Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management**|Bingzhang Wang et.al.|[2405.03076v1](http://arxiv.org/abs/2405.03076v1)|null|\n", "2405.03066": "|**2024-05-22**|**A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)**|Lingyao Li et.al.|[2405.03066v2](http://arxiv.org/abs/2405.03066v2)|null|\n", "2405.02801": "|**2024-05-07**|**Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models**|Tianze Xu et.al.|[2405.02801v2](http://arxiv.org/abs/2405.02801v2)|**[link](https://github.com/wangtoonaive/mozartstouch)**|\n", "2405.02637": "|**2024-05-04**|**TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants**|Mohammad Aliannejadi et.al.|[2405.02637v1](http://arxiv.org/abs/2405.02637v1)|**[link](https://github.com/irlabamsterdam/iKAT)**|\n", "2405.02421": "|**2024-05-03**|**What does the Knowledge Neuron Thesis Have to do with Knowledge?**|Jingcheng Niu et.al.|[2405.02421v1](http://arxiv.org/abs/2405.02421v1)|**[link](https://github.com/frankniujc/kn_thesis)**|\n", "2405.02363": "|**2024-05-03**|**LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model**|Yulin Luo et.al.|[2405.02363v1](http://arxiv.org/abs/2405.02363v1)|null|\n", "2405.02318": 
"|**2024-04-18**|**NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection**|Abhinav Lalwani et.al.|[2405.02318v1](http://arxiv.org/abs/2405.02318v1)|null|\n", "2405.04515": "|**2024-05-13**|**A Transformer with Stack Attention**|Jiaoda Li et.al.|[2405.04515v2](http://arxiv.org/abs/2405.04515v2)|**[link](https://github.com/rycolab/stack-transformer)**|\n", "2405.03806": "|**2024-05-06**|**In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker**|Savvas Petridis et.al.|[2405.03806v1](http://arxiv.org/abs/2405.03806v1)|null|\n", "2405.04777": "|**2024-05-08**|**Empathy Through Multimodality in Conversational Interfaces**|Mahyar Abbasian et.al.|[2405.04777v1](http://arxiv.org/abs/2405.04777v1)|null|\n", "2405.04760": "|**2024-05-09**|**Large Language Models for Cyber Security: A Systematic Literature Review**|HanXiang Xu et.al.|[2405.04760v2](http://arxiv.org/abs/2405.04760v2)|null|\n", "2405.05956": "|**2024-05-09**|**Probing Multimodal LLMs as World Models for Driving**|Shiva Sreeram et.al.|[2405.05956v1](http://arxiv.org/abs/2405.05956v1)|**[link](https://github.com/sreeramsa/drivesim)**|\n", "2405.05581": "|**2024-05-09**|**One vs. 
Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations**|Yoonjoo Lee et.al.|[2405.05581v1](http://arxiv.org/abs/2405.05581v1)|null|\n", "2405.05466": "|**2024-05-11**|**Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals**|Joshua Clymer et.al.|[2405.05466v2](http://arxiv.org/abs/2405.05466v2)|null|\n", "2405.06495": "|**2024-05-13**|**Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling**|Lyumanshan Ye et.al.|[2405.06495v2](http://arxiv.org/abs/2405.06495v2)|null|\n", "2405.06410": "|**2024-05-10**|**Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL**|Ning Cheng et.al.|[2405.06410v1](http://arxiv.org/abs/2405.06410v1)|null|\n", "2405.06064": "|**2024-05-09**|**LLMs for XAI: Future Directions for Explaining Explanations**|Alexandra Zytek et.al.|[2405.06064v1](http://arxiv.org/abs/2405.06064v1)|null|\n", "2405.07988": "|**2024-05-13**|**A Generalist Learner for Multifaceted Medical Image Interpretation**|Hong-Yu Zhou et.al.|[2405.07988v1](http://arxiv.org/abs/2405.07988v1)|null|\n", "2405.07551": "|**2024-05-13**|**MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning**|Shuo Yin et.al.|[2405.07551v1](http://arxiv.org/abs/2405.07551v1)|null|\n", "2405.07474": "|**2024-05-13**|**Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions**|Xinglin Chen et.al.|[2405.07474v1](http://arxiv.org/abs/2405.07474v1)|null|\n", "2405.07278": "|**2024-05-12**|**Human-interpretable clustering of short-text using large language models**|Justin K. 
Miller et.al.|[2405.07278v1](http://arxiv.org/abs/2405.07278v1)|null|\n", "2405.06919": "|**2024-05-11**|**Automating Thematic Analysis: How LLMs Analyse Controversial Topics**|Awais Hameed Khan et.al.|[2405.06919v1](http://arxiv.org/abs/2405.06919v1)|null|\n", "2405.06907": "|**2024-05-21**|**AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents**|Shuyuan Xu et.al.|[2405.06907v2](http://arxiv.org/abs/2405.06907v2)|**[link](https://github.com/agiresearch/openagi)**|\n", "2405.06840": "|**2024-05-10**|**MEIC: Re-thinking RTL Debug Automation using LLMs**|Ke Xu et.al.|[2405.06840v1](http://arxiv.org/abs/2405.06840v1)|null|\n", "2405.06808": "|**2024-05-10**|**Large Language Model in Financial Regulatory Interpretation**|Zhiyu Cao et.al.|[2405.06808v1](http://arxiv.org/abs/2405.06808v1)|null|\n", "2405.06725": "|**2024-05-15**|**On the Shape of Brainscores for Large Language Models (LLMs)**|Jingkai Li et.al.|[2405.06725v3](http://arxiv.org/abs/2405.06725v3)|**[link](https://github.com/GUDHI/TDA-tutorial)**|\n", "2405.06712": "|**2024-05-09**|**Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses**|Gaurav Kumar Gupta et.al.|[2405.06712v1](http://arxiv.org/abs/2405.06712v1)|null|\n", "2405.06703": "|**2024-05-08**|**Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance**|Goran Muric et.al.|[2405.06703v1](http://arxiv.org/abs/2405.06703v1)|null|\n", "2405.08760": "|**2024-05-14**|**Is the Pope Catholic? Yes, the Pope is Catholic. 
Generative Evaluation of Intent Resolution in LLMs**|Akhila Yerukola et.al.|[2405.08760v1](http://arxiv.org/abs/2405.08760v1)|null|\n", "2405.08468": "|**2024-05-14**|**Challenges and Opportunities in Text Generation Explainability**|Kenza Amara et.al.|[2405.08468v1](http://arxiv.org/abs/2405.08468v1)|null|\n", "2405.08246": "|**2024-05-14**|**Compositional Text-to-Image Generation with Dense Blob Representations**|Weili Nie et.al.|[2405.08246v1](http://arxiv.org/abs/2405.08246v1)|null|\n", "2405.08213": "|**2024-05-13**|**Interpreting Latent Student Knowledge Representations in Programming Assignments**|Nigel Fernandez et.al.|[2405.08213v1](http://arxiv.org/abs/2405.08213v1)|null|\n", "2405.08017": "|**2024-05-11**|**Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive Analytics**|Phoebe Jing et.al.|[2405.08017v1](http://arxiv.org/abs/2405.08017v1)|null|\n", "2405.10928": "|**2024-05-20**|**The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks**|Lucius Bushnaq et.al.|[2405.10928v2](http://arxiv.org/abs/2405.10928v2)|**[link](https://github.com/apolloresearch/rib)**|\n", "2405.10893": "|**2024-05-17**|**COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain**|Dimitrios P. 
Panagoulias et.al.|[2405.10893v1](http://arxiv.org/abs/2405.10893v1)|null|\n", "2405.10620": "|**2024-05-17**|**MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains**|Zhaohuan Zhan et.al.|[2405.10620v1](http://arxiv.org/abs/2405.10620v1)|null|\n", "2405.10548": "|**2024-05-20**|**Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks**|Anwoy Chatterjee et.al.|[2405.10548v2](http://arxiv.org/abs/2405.10548v2)|null|\n", "2405.12617": "|**2024-05-21**|**Quantifying Emergence in Large Language Models**|Hang Chen et.al.|[2405.12617v1](http://arxiv.org/abs/2405.12617v1)|**[link](https://github.com/zodiark-ch/emergence-of-llms)**|\n", "2405.12522": "|**2024-05-21**|**Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models**|Charles O'Neill et.al.|[2405.12522v1](http://arxiv.org/abs/2405.12522v1)|null|\n", "2405.12264": "|**2024-05-20**|**Directed Metric Structures arising in Large Language Models**|St\u00e9phane Gaubert et.al.|[2405.12264v1](http://arxiv.org/abs/2405.12264v1)|null|\n", "2405.11928": "|**2024-05-20**|**\"Set It Up!\": Functional Object Arrangement with Compositional Generative Models**|Yiqing Xu et.al.|[2405.11928v1](http://arxiv.org/abs/2405.11928v1)|null|\n", "2405.11891": "|**2024-05-20**|**Unveiling and Manipulating Prompt Influence in Large Language Models**|Zijian Feng et.al.|[2405.11891v1](http://arxiv.org/abs/2405.11891v1)|**[link](https://github.com/zijian678/tdd)**|\n", "2405.11613": "|**2024-05-21**|**Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts**|Baolong Bi et.al.|[2405.11613v2](http://arxiv.org/abs/2405.11613v2)|**[link](https://github.com/byronbbl/deck)**|\n", "2405.11048": "|**2024-05-17**|**Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models**|Paula Akemi Aoyagui et.al.|[2405.11048v1](http://arxiv.org/abs/2405.11048v1)|null|\n", "2405.14612": 
"|**2024-05-28**|**Explaining Multi-modal Large Language Models by Analyzing their Vision Perception**|Loris Giulivi et.al.|[2405.14612v2](http://arxiv.org/abs/2405.14612v2)|**[link](https://github.com/loris2222/ExplainingMLLMs)**|\n", "2405.14170": "|**2024-05-23**|**Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning**|Jiapu Wang et.al.|[2405.14170v1](http://arxiv.org/abs/2405.14170v1)|null|\n", "2405.13967": "|**2024-05-28**|**DeTox: Toxic Subspace Projection for Model Editing**|Rheeya Uppaal et.al.|[2405.13967v3](http://arxiv.org/abs/2405.13967v3)|**[link](https://github.com/uppaal/detox-edit)**|\n", "2405.13816": "|**2024-05-22**|**Large Language Models are Good Spontaneous Multilingual Learners: Is the Multilingual Annotated Data Necessary?**|Shimao Zhang et.al.|[2405.13816v1](http://arxiv.org/abs/2405.13816v1)|**[link](https://github.com/shimao-zhang/llm-multilingual-learner)**|\n", "2405.13622": "|**2024-05-22**|**Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation**|Gauthier Guinet et.al.|[2405.13622v1](http://arxiv.org/abs/2405.13622v1)|null|\n", "2405.13548": "|**2024-05-24**|**ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing**|Wei Zhang et.al.|[2405.13548v2](http://arxiv.org/abs/2405.13548v2)|null|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547v1](http://arxiv.org/abs/2405.13547v1)|null|\n", "2405.13245": "|**2024-05-21**|**A Survey of Robotic Language Grounding: Tradeoffs Between Symbols and Embeddings**|Vanya Cohen et.al.|[2405.13245v1](http://arxiv.org/abs/2405.13245v1)|null|\n", "2405.13077": "|**2024-05-21**|**GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation**|Govind Ramesh et.al.|[2405.13077v1](http://arxiv.org/abs/2405.13077v1)|null|\n", "2405.13050": "|**2024-05-19**|**Human-Centered LLM-Agent User 
Interface: A Position Paper**|Daniel Chin et.al.|[2405.13050v1](http://arxiv.org/abs/2405.13050v1)|null|\n", "2405.13021": "|**2024-05-15**|**IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues**|Diji Yang et.al.|[2405.13021v1](http://arxiv.org/abs/2405.13021v1)|null|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684v1](http://arxiv.org/abs/2405.15684v1)|null|\n", "2405.15604": "|**2024-05-24**|**Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges**|Jonas Becker et.al.|[2405.15604v1](http://arxiv.org/abs/2405.15604v1)|**[link](https://github.com/jonas-becker/text-generation)**|\n", "2405.15512": "|**2024-05-24**|**ChatGPT Code Detection: Techniques for Uncovering the Source of Code**|Marc Oedingen et.al.|[2405.15512v1](http://arxiv.org/abs/2405.15512v1)|**[link](https://github.com/marcoedingen/ai_code_detection)**|\n", "2405.15383": "|**2024-05-24**|**Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search**|Nicola Dainese et.al.|[2405.15383v1](http://arxiv.org/abs/2405.15383v1)|null|\n", "2405.15370": "|**2024-05-24**|**Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection**|Jun Liu et.al.|[2405.15370v1](http://arxiv.org/abs/2405.15370v1)|null|\n", "2405.15341": "|**2024-05-24**|**V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM**|Abdur Rahman et.al.|[2405.15341v1](http://arxiv.org/abs/2405.15341v1)|null|\n", "2405.15329": "|**2024-05-24**|**Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework**|Minzhi Li et.al.|[2405.15329v1](http://arxiv.org/abs/2405.15329v1)|null|\n", "2405.15307": "|**2024-05-24**|**Before Generation, Align it! 
A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation**|Ge Qu et.al.|[2405.15307v1](http://arxiv.org/abs/2405.15307v1)|**[link](https://github.com/quge2023/TA-SQL)**|\n", "2405.14906": "|**2024-05-23**|**AutoCoder: Enhancing Code Large Language Model with \\textsc{AIEV-Instruct}**|Bin Lei et.al.|[2405.14906v1](http://arxiv.org/abs/2405.14906v1)|**[link](https://github.com/bin123apple/autocoder)**|\n", "2405.17104": "|**2024-05-28**|**LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding**|Haoyu Zhao et.al.|[2405.17104v2](http://arxiv.org/abs/2405.17104v2)|null|\n", "2405.16964": "|**2024-05-27**|**Exploring the LLM Journey from Cognition to Expression with Linear Representations**|Yuzi Yan et.al.|[2405.16964v1](http://arxiv.org/abs/2405.16964v1)|null|\n", "2405.16803": "|**2024-05-27**|**TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing**|Xinyu Zhang et.al.|[2405.16803v1](http://arxiv.org/abs/2405.16803v1)|null|\n", "2405.16714": "|**2024-05-26**|**Crafting Interpretable Embeddings by Asking LLMs Questions**|Vinamra Benara et.al.|[2405.16714v1](http://arxiv.org/abs/2405.16714v1)|**[link](https://github.com/csinva/interpretable-embeddings)**|\n", "2405.16588": "|**2024-05-26**|**Attaining Human`s Desirable Outcomes in Human-AI Interaction via Structural Causal Games**|Anjie Liu et.al.|[2405.16588v1](http://arxiv.org/abs/2405.16588v1)|null|\n", "2405.16450": "|**2024-05-26**|**Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search**|Max Liu et.al.|[2405.16450v1](http://arxiv.org/abs/2405.16450v1)|null|\n", "2405.16405": "|**2024-05-26**|**Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level**|Runlin Lei et.al.|[2405.16405v1](http://arxiv.org/abs/2405.16405v1)|null|\n", "2405.16344": "|**2024-05-25**|**Large Language Models Enable Automated Formative Feedback in 
Human-Robot Interaction Tasks**|Emily Jensen et.al.|[2405.16344v1](http://arxiv.org/abs/2405.16344v1)|null|\n", "2405.16277": "|**2024-06-03**|**Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge**|Brendan Park et.al.|[2405.16277v3](http://arxiv.org/abs/2405.16277v3)|**[link](https://github.com/bpark2/winovis)**|\n", "2405.16042": "|**2024-05-25**|**Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention**|Andrew Li et.al.|[2405.16042v1](http://arxiv.org/abs/2405.16042v1)|null|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380v1](http://arxiv.org/abs/2405.18380v1)|**[link](https://github.com/pixeli99/owlore)**|\n", "2405.18218": "|**2024-05-28**|**FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models**|Yang Zhang et.al.|[2405.18218v1](http://arxiv.org/abs/2405.18218v1)|null|\n", "2405.18009": "|**2024-05-28**|**Exploring Context Window of Large Language Models via Decomposed Positional Vectors**|Zican Dong et.al.|[2405.18009v1](http://arxiv.org/abs/2405.18009v1)|null|\n", "2405.18004": "|**2024-05-28**|**SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions**|Juexiao Zhou et.al.|[2405.18004v1](http://arxiv.org/abs/2405.18004v1)|null|\n", "2405.17969": "|**2024-05-28**|**Knowledge Circuits in Pretrained Transformers**|Yunzhi Yao et.al.|[2405.17969v1](http://arxiv.org/abs/2405.17969v1)|**[link](https://github.com/zjunlp/knowledgecircuits)**|\n", "2405.17893": "|**2024-05-28**|**Arithmetic Reasoning with LLM: Prolog Generation & Permutation**|Xiaocheng Yang et.al.|[2405.17893v1](http://arxiv.org/abs/2405.17893v1)|null|\n", "2405.17703": "|**2024-05-27**|**Mechanistic Interpretability of Binary and Ternary Transformers**|Jason Li 
et.al.|[2405.17703v1](http://arxiv.org/abs/2405.17703v1)|**[link](https://github.com/jasonlizhengjian/MI_of_binary_transformers)**|\n", "2405.17670": "|**2024-05-27**|**Deployment of NLP and LLM Techniques to Control Mobile Robots at the Edge: A Case Study Using GPT-4-Turbo and LLaMA 2**|Pascal Sikorski et.al.|[2405.17670v1](http://arxiv.org/abs/2405.17670v1)|null|\n", "2405.17665": "|**2024-05-27**|**Enhanced Robot Arm at the Edge with NLP and Vision Systems**|Pascal Sikorski et.al.|[2405.17665v1](http://arxiv.org/abs/2405.17665v1)|null|\n", "2405.17631": "|**2024-05-27**|**BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments**|Yusuf Roohani et.al.|[2405.17631v1](http://arxiv.org/abs/2405.17631v1)|**[link](https://github.com/snap-stanford/biodiscoveryagent)**|\n", "2405.17490": "|**2024-05-25**|**Revisit, Extend, and Enhance Hessian-Free Influence Functions**|Ziao Yang et.al.|[2405.17490v1](http://arxiv.org/abs/2405.17490v1)|null|\n", "2405.19328": "|**2024-05-29**|**Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation**|Atrisha Sarkar et.al.|[2405.19328v1](http://arxiv.org/abs/2405.19328v1)|null|\n", "2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen et.al.|[2405.19326v1](http://arxiv.org/abs/2405.19326v1)|null|\n", "2405.19164": "|**2024-05-29**|**Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery**|Sounak Lahiri et.al.|[2405.19164v1](http://arxiv.org/abs/2405.19164v1)|null|\n", "2405.19076": "|**2024-06-02**|**Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design**|Markus J. 
Buehler et.al.|[2405.19076v2](http://arxiv.org/abs/2405.19076v2)|**[link](https://github.com/lamm-mit/Cephalo-Phi-3-MoE)**|\n", "2405.18741": "|**2024-06-03**|**Genshin: General Shield for Natural Language Processing with Large Language Models**|Xiao Peng et.al.|[2405.18741v2](http://arxiv.org/abs/2405.18741v2)|null|\n", "2405.18672": "|**2024-06-02**|**LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification**|Renyi Qu et.al.|[2405.18672v2](http://arxiv.org/abs/2405.18672v2)|null|\n", "2405.18632": "|**2024-05-28**|**Large Language Models as Partners in Student Essay Evaluation**|Toru Ishida et.al.|[2405.18632v1](http://arxiv.org/abs/2405.18632v1)|null|\n", "2405.20099": "|**2024-05-30**|**Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks**|Chen Xiong et.al.|[2405.20099v1](http://arxiv.org/abs/2405.20099v1)|null|\n", "2405.19850": "|**2024-05-30**|**Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models**|Yuxiao Luo et.al.|[2405.19850v1](http://arxiv.org/abs/2405.19850v1)|null|\n", "2405.19846": "|**2024-05-30**|**Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model**|Chaochen Gao et.al.|[2405.19846v1](http://arxiv.org/abs/2405.19846v1)|null|\n", "2405.19686": "|**2024-05-30**|**Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback**|Jingwei Sun et.al.|[2405.19686v1](http://arxiv.org/abs/2405.19686v1)|null|\n", "2405.20985": "|**2024-05-31**|**DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models**|Linli Yao et.al.|[2405.20985v1](http://arxiv.org/abs/2405.20985v1)|null|\n", "2405.20850": "|**2024-05-31**|**Improving Reward Models with Synthetic Critiques**|Zihuiwen Ye et.al.|[2405.20850v1](http://arxiv.org/abs/2405.20850v1)|null|\n", "2405.20834": "|**2024-05-31**|**Retrieval Meets Reasoning: Even High-school Textbook 
Knowledge Benefits Multimodal Reasoning**|Cheng Tan et.al.|[2405.20834v1](http://arxiv.org/abs/2405.20834v1)|null|\n", "2405.20612": "|**2024-05-31**|**UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation**|Hanzhang Zhou et.al.|[2405.20612v1](http://arxiv.org/abs/2405.20612v1)|null|\n", "2405.20404": "|**2024-05-30**|**XPrompt:Explaining Large Language Model's Generation via Joint Prompt Attribution**|Yurui Chang et.al.|[2405.20404v1](http://arxiv.org/abs/2405.20404v1)|null|\n", "2406.02550": "|**2024-06-04**|**Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks**|Tianyu He et.al.|[2406.02550v1](http://arxiv.org/abs/2406.02550v1)|**[link](https://github.com/ablghtianyi/ICL_Modular_Arithmetic)**|\n", "2406.02128": "|**2024-06-04**|**Iteration Head: A Mechanistic Study of Chain-of-Thought**|Vivien Cabannes et.al.|[2406.02128v1](http://arxiv.org/abs/2406.02128v1)|null|\n", "2406.02060": "|**2024-06-04**|**I've got the \"Answer\"! 
Interpretation of LLMs Hidden States in Question Answering**|Valeriya Goloviznina et.al.|[2406.02060v1](http://arxiv.org/abs/2406.02060v1)|null|\n", "2406.01943": "|**2024-06-04**|**Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs**|Nik Bear Brown et.al.|[2406.01943v1](http://arxiv.org/abs/2406.01943v1)|null|\n", "2406.01931": "|**2024-06-05**|**Dishonesty in Helpful and Harmless Alignment**|Youcheng Huang et.al.|[2406.01931v2](http://arxiv.org/abs/2406.01931v2)|null|\n", "2406.01893": "|**2024-06-21**|**Large Language Model-Enabled Multi-Agent Manufacturing Systems**|Jonghan Lim et.al.|[2406.01893v2](http://arxiv.org/abs/2406.01893v2)|null|\n", "2406.01587": "|**2024-06-04**|**PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning**|Yupeng Zheng et.al.|[2406.01587v2](http://arxiv.org/abs/2406.01587v2)|null|\n", "2406.01563": "|**2024-06-03**|**LoFiT: Localized Fine-tuning on LLM Representations**|Fangcong Yin et.al.|[2406.01563v1](http://arxiv.org/abs/2406.01563v1)|**[link](https://github.com/fc2869/lo-fit)**|\n", "2406.01538": "|**2024-06-20**|**What Are Large Language Models Mapping to in the Brain? 
A Case Against Over-Reliance on Brain Scores**|Ebrahim Feghhi et.al.|[2406.01538v2](http://arxiv.org/abs/2406.01538v2)|**[link](https://github.com/ebrahimfeghhi/beyond-brainscore)**|\n", "2406.01506": "|**2024-06-03**|**The Geometry of Categorical and Hierarchical Concepts in Large Language Models**|Kiho Park et.al.|[2406.01506v1](http://arxiv.org/abs/2406.01506v1)|**[link](https://github.com/kihopark/llm_categorical_hierarchical_representations)**|\n", "2406.01388": "|**2024-06-11**|**AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation**|Junhao Cheng et.al.|[2406.01388v2](http://arxiv.org/abs/2406.01388v2)|**[link](https://github.com/donahowe/AutoStudio)**|\n", "2406.00974": "|**2024-06-03**|**Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach**|Borui Zhang et.al.|[2406.00974v1](http://arxiv.org/abs/2406.00974v1)|null|\n", "2406.00965": "|**2024-06-04**|**Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic**|Xinglin Chen et.al.|[2406.00965v2](http://arxiv.org/abs/2406.00965v2)|null|\n", "2406.00799": "|**2024-06-10**|**Are you still on track!? 
Catching LLM Task Drift with Activations**|Sahar Abdelnabi et.al.|[2406.00799v2](http://arxiv.org/abs/2406.00799v2)|null|\n", "2406.00667": "|**2024-06-02**|**An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging**|Sulaiman Khan et.al.|[2406.00667v1](http://arxiv.org/abs/2406.00667v1)|null|\n", "2406.00656": "|**2024-06-02**|**Presence or Absence: Are Unknown Word Usages in Dictionaries?**|Xianghe Ma et.al.|[2406.00656v1](http://arxiv.org/abs/2406.00656v1)|**[link](https://github.com/xiaohemaikoo/axolotl24-abdn-nlp)**|\n", "2406.00426": "|**2024-06-11**|**InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation**|Jacob Si et.al.|[2406.00426v3](http://arxiv.org/abs/2406.00426v3)|**[link](https://github.com/jacobyhsi/InterpreTabNet)**|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn et.al.|[2406.00244v1](http://arxiv.org/abs/2406.00244v1)|null|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker et.al.|[2406.03441v1](http://arxiv.org/abs/2406.03441v1)|null|\n", "2406.02962": "|**2024-06-05**|**Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models**|Qiang Sun et.al.|[2406.02962v1](http://arxiv.org/abs/2406.02962v1)|**[link](https://github.com/AI4WA/Docs2KG)**|\n", "2406.02847": "|**2024-06-06**|**Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers**|Brian K Chen et.al.|[2406.02847v2](http://arxiv.org/abs/2406.02847v2)|null|\n", "2406.04344": "|**2024-06-06**|**Verbalized Machine Learning: Revisiting Machine Learning with Language Models**|Tim Z. 
Xiao et.al.|[2406.04344v1](http://arxiv.org/abs/2406.04344v1)|null|\n", "2406.04278": "|**2024-06-06**|**Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People**|Dun-Ming Huang et.al.|[2406.04278v1](http://arxiv.org/abs/2406.04278v1)|**[link](https://github.com/jacobyn/SamplingTonesACL)**|\n", "2406.04136": "|**2024-06-06**|**Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts**|Shubham Kumar Nigam et.al.|[2406.04136v1](http://arxiv.org/abs/2406.04136v1)|**[link](https://github.com/shubhamkumarnigam/predex)**|\n", "2406.03718": "|**2024-06-06**|**Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning**|Xiaohu Du et.al.|[2406.03718v1](http://arxiv.org/abs/2406.03718v1)|**[link](https://github.com/CGCL-codes/VulLLM)**|\n", "2406.03589": "|**2024-06-13**|**Ranking Manipulation for Conversational Search Engines**|Samuel Pfrommer et.al.|[2406.03589v2](http://arxiv.org/abs/2406.03589v2)|**[link](https://github.com/spfrommer/ranking_manipulation_data_pipeline)**|\n", "2406.03505": "|**2024-06-04**|**Dynamic and Adaptive Feature Generation with LLM**|Xinhao Zhang et.al.|[2406.03505v1](http://arxiv.org/abs/2406.03505v1)|null|\n", "2406.05107": "|**2024-06-07**|**LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration**|Tavor Lipman et.al.|[2406.05107v1](http://arxiv.org/abs/2406.05107v1)|null|\n", "2406.04927": "|**2024-06-07**|**LLM-based speaker diarization correction: A generalizable approach**|Georgios Efstathiadis et.al.|[2406.04927v1](http://arxiv.org/abs/2406.04927v1)|**[link](https://github.com/GeorgeEfstathiadis/LLM-Diarize-ASR-Agnostic)**|\n", "2406.04926": "|**2024-06-07**|**Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models**|Micha\u0142 Romaszewski et.al.|[2406.04926v1](http://arxiv.org/abs/2406.04926v1)|null|\n", "2406.04770": "|**2024-06-07**|**WildBench: 
Benchmarking LLMs with Challenging Tasks from Real Users in the Wild**|Bill Yuchen Lin et.al.|[2406.04770v1](http://arxiv.org/abs/2406.04770v1)|**[link](https://github.com/allenai/wildbench)**|\n", "2406.04687": "|**2024-06-07**|**LogiCode: an LLM-Driven Framework for Logical Anomaly Detection**|Yiheng Zhang et.al.|[2406.04687v1](http://arxiv.org/abs/2406.04687v1)|**[link](https://github.com/22strongestme/LOCO-Annotations)**|\n", "2406.04638": "|**2024-06-07**|**Large Language Model-guided Document Selection**|Xiang Kong et.al.|[2406.04638v1](http://arxiv.org/abs/2406.04638v1)|null|\n", "2406.04598": "|**2024-06-07**|**OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework**|Wei Zhou et.al.|[2406.04598v1](http://arxiv.org/abs/2406.04598v1)|null|\n", "2406.04449": "|**2024-06-06**|**MAIRA-2: Grounded Radiology Report Generation**|Shruthi Bannur et.al.|[2406.04449v1](http://arxiv.org/abs/2406.04449v1)|null|\n", "2406.04370": "|**2024-06-01**|**Large Language Model Confidence Estimation via Black-Box Access**|Tejaswini Pedapati et.al.|[2406.04370v1](http://arxiv.org/abs/2406.04370v1)|null|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. 
Merrill et.al.|[2406.06464v2](http://arxiv.org/abs/2406.06464v2)|null|\n", "2406.06382": "|**2024-06-10**|**Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization**|Yi Gu et.al.|[2406.06382v1](http://arxiv.org/abs/2406.06382v1)|**[link](https://github.com/yigu1008/diffusion-rpo)**|\n", "2406.06357": "|**2024-06-10**|**MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows**|Xingjian Zhang et.al.|[2406.06357v1](http://arxiv.org/abs/2406.06357v1)|**[link](https://github.com/xingjian-zhang/massw)**|\n", "2406.06211": "|**2024-06-11**|**iMotion-LLM: Motion Prediction Instruction Tuning**|Abdulwahab Felemban et.al.|[2406.06211v2](http://arxiv.org/abs/2406.06211v2)|null|\n", "2406.05968": "|**2024-06-10**|**Prompting Large Language Models with Audio for General-Purpose Speech Summarization**|Wonjune Kang et.al.|[2406.05968v1](http://arxiv.org/abs/2406.05968v1)|**[link](https://github.com/wonjune-kang/llm-speech-summarization)**|\n", "2406.05543": "|**2024-06-08**|**VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification**|Jianmeng Liu et.al.|[2406.05543v1](http://arxiv.org/abs/2406.05543v1)|null|\n", "2406.05344": "|**2024-06-08**|**MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention**|Prince Jha et.al.|[2406.05344v1](http://arxiv.org/abs/2406.05344v1)|**[link](https://github.com/Jhaprince/MemeGuard)**|\n", "2406.07457": "|**2024-06-11**|**Estimating the Hallucination Rate of Generative AI**|Andrew Jesson et.al.|[2406.07457v1](http://arxiv.org/abs/2406.07457v1)|null|\n", "2406.07353": "|**2024-06-11**|**Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities**|Delfina Sol Martinez Pandiani et.al.|[2406.07353v1](http://arxiv.org/abs/2406.07353v1)|**[link](https://github.com/delfimpandiani/toxic_memes)**|\n", "2406.07296": "|**2024-06-11**|**Instruct Large Language Models to Drive like 
Humans**|Ruijun Zhang et.al.|[2406.07296v1](http://arxiv.org/abs/2406.07296v1)|**[link](https://github.com/bonbon-rj/instructdriver)**|\n", "2406.06657": "|**2024-06-10**|**Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110**|Mark A. Kramer et.al.|[2406.06657v1](http://arxiv.org/abs/2406.06657v1)|null|\n", "2406.06637": "|**2024-06-09**|**Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering**|Saman Pordanesh et.al.|[2406.06637v1](http://arxiv.org/abs/2406.06637v1)|null|\n", "2406.06636": "|**2024-06-09**|**LLM Questionnaire Completion for Automatic Psychiatric Assessment**|Gony Rosenman et.al.|[2406.06636v1](http://arxiv.org/abs/2406.06636v1)|null|\n", "2406.06621": "|**2024-06-07**|**LinkQ: An LLM-Assisted Visual Interface for Knowledge Graph Question-Answering**|Harry Li et.al.|[2406.06621v1](http://arxiv.org/abs/2406.06621v1)|**[link](https://github.com/mit-ll/linkq)**|\n", "2406.06606": "|**2024-06-06**|**Prototypical Reward Network for Data-Efficient RLHF**|Jinghan Zhang et.al.|[2406.06606v1](http://arxiv.org/abs/2406.06606v1)|null|\n", "2406.06579": "|**2024-06-13**|**From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models**|Xiaofeng Zhang et.al.|[2406.06579v2](http://arxiv.org/abs/2406.06579v2)|null|\n", "2406.06576": "|**2024-06-18**|**OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step**|Owen Dugan et.al.|[2406.06576v2](http://arxiv.org/abs/2406.06576v2)|null|\n", "2406.06560": "|**2024-06-02**|**Inverse Constitutional AI: Compressing Preferences into Principles**|Arduin Findeis et.al.|[2406.06560v1](http://arxiv.org/abs/2406.06560v1)|**[link](https://github.com/rdnfn/icai)**|\n", "2406.08246": "|**2024-06-12**|**Leveraging Large Language Models for Web Scraping**|Aman Ahluwalia et.al.|[2406.08246v1](http://arxiv.org/abs/2406.08246v1)|null|\n", "2406.08080": "|**2024-06-12**|**AustroTox: A Dataset for Target-Based 
Austrian German Offensive Language Detection**|Pia Pachinger et.al.|[2406.08080v1](http://arxiv.org/abs/2406.08080v1)|null|\n", "2406.08074": "|**2024-06-12**|**A Concept-Based Explainability Framework for Large Multimodal Models**|Jayneel Parekh et.al.|[2406.08074v1](http://arxiv.org/abs/2406.08074v1)|null|\n", "2406.07962": "|**2024-06-12**|**Toward a Method to Generate Capability Ontologies from Natural Language Descriptions**|Luis Miguel Vieira da Silva et.al.|[2406.07962v1](http://arxiv.org/abs/2406.07962v1)|null|\n", "2406.08572": "|**2024-06-12**|**LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions**|Nhat Hoang-Xuan et.al.|[2406.08572v1](http://arxiv.org/abs/2406.08572v1)|null|\n", "2406.08527": "|**2024-06-12**|**Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning**|Jaehyun Nam et.al.|[2406.08527v1](http://arxiv.org/abs/2406.08527v1)|null|\n", "2406.10101": "|**2024-06-17**|**Requirements are All You Need: From Requirements to Code with LLMs**|Bingyang Wei et.al.|[2406.10101v2](http://arxiv.org/abs/2406.10101v2)|**[link](https://github.com/Washingtonwei/software-engineer-gpt)**|\n", "2406.10091": "|**2024-06-14**|**Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation**|Xiaoman Wang et.al.|[2406.10091v1](http://arxiv.org/abs/2406.10091v1)|null|\n", "2406.09671": "|**2024-06-14**|**Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam**|Nabor C. 
Mendon\u00e7a et.al.|[2406.09671v1](http://arxiv.org/abs/2406.09671v1)|**[link](https://github.com/nabormendonca/gpt-4v-enade-cs-2021)**|\n", "2406.11813": "|**2024-06-17**|**How Do Large Language Models Acquire Factual Knowledge During Pretraining?**|Hoyeon Chang et.al.|[2406.11813v1](http://arxiv.org/abs/2406.11813v1)|null|\n", "2406.11346": "|**2024-06-17**|**WaDec: Decompile WebAssembly Using Large Language Model**|Xinyu She et.al.|[2406.11346v1](http://arxiv.org/abs/2406.11346v1)|null|\n", "2406.11250": "|**2024-06-17**|**Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs**|Muhammad Arslan Manzoor et.al.|[2406.11250v1](http://arxiv.org/abs/2406.11250v1)|null|\n", "2406.11231": "|**2024-06-17**|**Enabling robots to follow abstract instructions and complete complex dynamic tasks**|Ruaridh Mon-Williams et.al.|[2406.11231v1](http://arxiv.org/abs/2406.11231v1)|null|\n", "2406.11227": "|**2024-06-17**|**Compound Schema Registry**|Silvery D. Fu et.al.|[2406.11227v1](http://arxiv.org/abs/2406.11227v1)|null|\n", "2406.11193": "|**2024-06-17**|**MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model**|Jiahao Huo et.al.|[2406.11193v1](http://arxiv.org/abs/2406.11193v1)|null|\n", "2406.11156": "|**2024-06-18**|**DELRec: Distilling Sequential Pattern to Enhance LLM-based Recommendation**|Guohao Sun et.al.|[2406.11156v2](http://arxiv.org/abs/2406.11156v2)|null|\n", "2406.11096": "|**2024-07-01**|**The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models**|Bolei Ma et.al.|[2406.11096v2](http://arxiv.org/abs/2406.11096v2)|null|\n", "2406.10985": "|**2024-06-16**|**Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens**|Weiyao Luo et.al.|[2406.10985v1](http://arxiv.org/abs/2406.10985v1)|null|\n", "2406.10958": "|**2024-06-18**|**City-LEO: Toward Transparent City Management Using LLM with End-to-End 
Optimization**|Zihao Jiao et.al.|[2406.10958v2](http://arxiv.org/abs/2406.10958v2)|null|\n", "2406.10552": "|**2024-06-28**|**Large Language Model Enhanced Clustering for News Event Detection**|Adane Nega Tarekegn et.al.|[2406.10552v3](http://arxiv.org/abs/2406.10552v3)|null|\n", "2406.05794": "|**2024-06-16**|**RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation**|Kiseung Kim et.al.|[2406.05794v2](http://arxiv.org/abs/2406.05794v2)|null|\n", "2406.12845": "|**2024-06-18**|**Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts**|Haoxiang Wang et.al.|[2406.12845v1](http://arxiv.org/abs/2406.12845v1)|**[link](https://github.com/RLHFlow/RLHF-Reward-Modeling)**|\n", "2406.12793": "|**2024-06-18**|**ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools**|Team GLM et.al.|[2406.12793v1](http://arxiv.org/abs/2406.12793v1)|**[link](https://github.com/thudm/chatglm-6b)**|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang et.al.|[2406.12784v1](http://arxiv.org/abs/2406.12784v1)|**[link](https://github.com/Cyno2232/UBENCH)**|\n", "2406.12742": "|**2024-06-18**|**Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning**|Bingchen Zhao et.al.|[2406.12742v1](http://arxiv.org/abs/2406.12742v1)|**[link](https://github.com/dtennant/mirb_eval)**|\n", "2406.12719": "|**2024-06-18**|**On the Robustness of Language Models for Tabular Question Answering**|Kushal Raj Bhandari et.al.|[2406.12719v1](http://arxiv.org/abs/2406.12719v1)|null|\n", "2406.12707": "|**2024-06-18**|**Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction**|Haoqiu Yan et.al.|[2406.12707v1](http://arxiv.org/abs/2406.12707v1)|**[link](https://github.com/haoqiu-yan/perceptiveagent)**|\n", 
"2406.12692": "|**2024-06-18**|**MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL**|Arian Askari et.al.|[2406.12692v1](http://arxiv.org/abs/2406.12692v1)|null|\n", "2406.12673": "|**2024-06-18**|**Estimating Knowledge in Large Language Models Without Generating a Single Token**|Daniela Gottesman et.al.|[2406.12673v1](http://arxiv.org/abs/2406.12673v1)|null|\n", "2406.12651": "|**2024-06-18**|**Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics**|Huan Xu et.al.|[2406.12651v1](http://arxiv.org/abs/2406.12651v1)|null|\n", "2406.12649": "|**2024-06-19**|**Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models**|Hengyi Wang et.al.|[2406.12649v2](http://arxiv.org/abs/2406.12649v2)|null|\n", "2406.12572": "|**2024-06-19**|**Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models**|Eldar Kurtic et.al.|[2406.12572v2](http://arxiv.org/abs/2406.12572v2)|**[link](https://github.com/ist-daslab/mathador-lm)**|\n", "2406.12529": "|**2024-06-18**|**LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation**|Yuhao Wang et.al.|[2406.12529v1](http://arxiv.org/abs/2406.12529v1)|null|\n", "2406.12347": "|**2024-06-18**|**Interpreting Bias in Large Language Models: A Feature-Based Approach**|Nirmalendu Prakash et.al.|[2406.12347v1](http://arxiv.org/abs/2406.12347v1)|null|\n", "2406.12255": "|**2024-06-18**|**A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning**|Lijie Hu et.al.|[2406.12255v1](http://arxiv.org/abs/2406.12255v1)|null|\n", "2406.12235": "|**2024-06-29**|**Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM**|Huaxin Zhang et.al.|[2406.12235v2](http://arxiv.org/abs/2406.12235v2)|**[link](https://github.com/pipixin321/holmesvad)**|\n", "2406.12227": "|**2024-06-24**|**Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector**|Gangwei Jiang 
et.al.|[2406.12227v2](http://arxiv.org/abs/2406.12227v2)|null|\n", "2406.12069": "|**2024-06-17**|**Satyrn: A Platform for Analytics Augmented Generation**|Marko Sterbentz et.al.|[2406.12069v1](http://arxiv.org/abs/2406.12069v1)|null|\n", "2406.12044": "|**2024-06-17**|**ARTIST: Improving the Generation of Text-rich Images by Disentanglement**|Jianyi Zhang et.al.|[2406.12044v1](http://arxiv.org/abs/2406.12044v1)|null|\n", "2406.12034": "|**2024-06-17**|**Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts**|Junmo Kang et.al.|[2406.12034v1](http://arxiv.org/abs/2406.12034v1)|null|\n", "2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556v2](http://arxiv.org/abs/2406.14556v2)|null|\n", "2406.14498": "|**2024-06-20**|**LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors**|Sheikh Asif Imran et.al.|[2406.14498v1](http://arxiv.org/abs/2406.14498v1)|**[link](https://github.com/bashlab/llasa)**|\n", "2406.14335": "|**2024-06-20**|**Self-supervised Interpretable Concept-based Models for Text Classification**|Francesco De Santis et.al.|[2406.14335v1](http://arxiv.org/abs/2406.14335v1)|null|\n", "2406.14307": "|**2024-07-01**|**QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis**|Chao Hui Huang et.al.|[2406.14307v2](http://arxiv.org/abs/2406.14307v2)|**[link](https://github.com/huangch/qust)**|\n", "2406.14167": "|**2024-06-20**|**Definition generation for lexical semantic change detection**|Mariia Fedorova et.al.|[2406.14167v1](http://arxiv.org/abs/2406.14167v1)|**[link](https://github.com/ltgoslo/Definition-generation-for-LSCD)**|\n", "2406.14144": "|**2024-06-20**|**Finding Safety Neurons in Large Language Models**|Jianhui Chen et.al.|[2406.14144v1](http://arxiv.org/abs/2406.14144v1)|null|\n", "2406.13858": "|**2024-06-19**|**Distributional reasoning in LLMs: Parallel reasoning 
processes in multi-hop reasoning**|Yuval Shalev et.al.|[2406.13858v1](http://arxiv.org/abs/2406.13858v1)|null|\n", "2406.13626": "|**2024-06-19**|**Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines**|Kangtong Mo et.al.|[2406.13626v1](http://arxiv.org/abs/2406.13626v1)|null|\n", "2406.13444": "|**2024-06-27**|**VDebugger: Harnessing Execution Feedback for Debugging Visual Programs**|Xueqing Wu et.al.|[2406.13444v2](http://arxiv.org/abs/2406.13444v2)|**[link](https://github.com/shirley-wu/vdebugger)**|\n", "2406.13439": "|**2024-06-19**|**Finding Blind Spots in Evaluator LLMs with Interpretable Checklists**|Sumanth Doddapaneni et.al.|[2406.13439v1](http://arxiv.org/abs/2406.13439v1)|**[link](https://github.com/ai4bharat/fbi)**|\n", "2406.13236": "|**2024-06-19**|**Data Contamination Can Cross Language Barriers**|Feng Yao et.al.|[2406.13236v1](http://arxiv.org/abs/2406.13236v1)|**[link](https://github.com/shangdatalab/deep-contam)**|\n", "2406.13184": "|**2024-06-19**|**Locating and Extracting Relational Concepts in Large Language Models**|Zijian Wang et.al.|[2406.13184v1](http://arxiv.org/abs/2406.13184v1)|**[link](https://github.com/Zijian007/Locate_Extract_Relation)**|\n", "2406.13163": "|**2024-06-19**|**LLMatDesign: Autonomous Materials Discovery with Large Language Models**|Shuyi Jia et.al.|[2406.13163v1](http://arxiv.org/abs/2406.13163v1)|null|\n", "2406.15227": "|**2024-06-21**|**A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation**|Irune Zubiaga et.al.|[2406.15227v1](http://arxiv.org/abs/2406.15227v1)|null|\n", "2406.15214": "|**2024-06-21**|**Unsupervised Extraction of Dialogue Policies from Conversations**|Makesh Narsimhan Sreedhar et.al.|[2406.15214v1](http://arxiv.org/abs/2406.15214v1)|null|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika Marreddy 
et.al.|[2406.16833v1](http://arxiv.org/abs/2406.16833v1)|null|\n", "2406.16748": "|**2024-06-24**|**OCALM: Object-Centric Assessment with Language Models**|Timo Kaufmann et.al.|[2406.16748v1](http://arxiv.org/abs/2406.16748v1)|null|\n", "2406.16442": "|**2024-06-29**|**EmoLLM: Multimodal Emotional Understanding Meets Large Language Models**|Qu Yang et.al.|[2406.16442v2](http://arxiv.org/abs/2406.16442v2)|**[link](https://github.com/yan9qu/emollm)**|\n", "2406.16252": "|**2024-06-25**|**Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis**|Ajan Subramanian et.al.|[2406.16252v2](http://arxiv.org/abs/2406.16252v2)|null|\n", "2406.16235": "|**2024-06-23**|**Preference Tuning For Toxicity Mitigation Generalizes Across Languages**|Xiaochen Li et.al.|[2406.16235v1](http://arxiv.org/abs/2406.16235v1)|**[link](https://github.com/batsresearch/cross-lingual-detox)**|\n", "2406.16093": "|**2024-06-23**|**Towards Natural Language-Driven Assembly Using Foundation Models**|Omkar Joglekar et.al.|[2406.16093v1](http://arxiv.org/abs/2406.16093v1)|null|\n", "2406.16033": "|**2024-06-23**|**Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models**|Tianyi Men et.al.|[2406.16033v1](http://arxiv.org/abs/2406.16033v1)|null|\n", "2406.16020": "|**2024-06-25**|**AudioBench: A Universal Benchmark for Audio Large Language Models**|Bin Wang et.al.|[2406.16020v2](http://arxiv.org/abs/2406.16020v2)|**[link](https://github.com/audiollms/audiobench)**|\n", "2406.15996": "|**2024-06-23**|**Memorizing Documents with Guidance in Large Language Models**|Bumjin Park et.al.|[2406.15996v1](http://arxiv.org/abs/2406.15996v1)|null|\n", "2406.15859": "|**2024-06-30**|**LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning**|Guangsi Shi et.al.|[2406.15859v2](http://arxiv.org/abs/2406.15859v2)|null|\n", "2406.15781": "|**2024-06-22**|**DABL: Detecting Semantic Anomalies in Business Processes Using 
Large Language Models**|Wei Guan et.al.|[2406.15781v1](http://arxiv.org/abs/2406.15781v1)|**[link](https://github.com/guanwei49/DABL)**|\n", "2406.15768": "|**2024-06-22**|**MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception**|Guanqun Wang et.al.|[2406.15768v1](http://arxiv.org/abs/2406.15768v1)|null|\n", "2406.15627": "|**2024-06-21**|**Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph**|Roman Vashurin et.al.|[2406.15627v1](http://arxiv.org/abs/2406.15627v1)|null|\n", "2406.15504": "|**2024-06-19**|**Dr.E Bridges Graphs with Large Language Models through Words**|Zipeng Liu et.al.|[2406.15504v1](http://arxiv.org/abs/2406.15504v1)|null|\n", "2406.17642": "|**2024-06-25**|**Banishing LLM Hallucinations Requires Rethinking Generalization**|Johnny Li et.al.|[2406.17642v1](http://arxiv.org/abs/2406.17642v1)|null|\n", "2406.17224": "|**2024-06-25**|**Large Language Models are Interpretable Learners**|Ruochen Wang et.al.|[2406.17224v1](http://arxiv.org/abs/2406.17224v1)|**[link](https://github.com/ruocwang/llm-symbolic-program)**|\n", "2406.17055": "|**2024-07-01**|**Large Language Models Assume People are More Rational than We Really are**|Ryan Liu et.al.|[2406.17055v2](http://arxiv.org/abs/2406.17055v2)|**[link](https://github.com/theryanl/llm-rationality)**|\n", "2406.16985": "|**2024-06-23**|**Unveiling LLM Mechanisms Through Neural ODEs and Control Theory**|Yukun Zhang et.al.|[2406.16985v1](http://arxiv.org/abs/2406.16985v1)|null|\n", "2406.18365": "|**2024-06-26**|**Themis: Towards Flexible and Interpretable NLG Evaluation**|Xinyu Hu et.al.|[2406.18365v1](http://arxiv.org/abs/2406.18365v1)|**[link](https://github.com/PKU-ONELab/Themis)**|\n", "2406.18346": "|**2024-06-26**|**AI Alignment through Reinforcement Learning from Human Feedback? 
Contradictions and Limitations**|Adam Dahlgren Lindstr\u00f6m et.al.|[2406.18346v1](http://arxiv.org/abs/2406.18346v1)|null|\n", "2406.18075": "|**2024-06-26**|**A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter**|Mohamed Salah Bouafif et.al.|[2406.18075v1](http://arxiv.org/abs/2406.18075v1)|null|\n", "2406.18039": "|**2024-06-26**|**Diagnosis Assistant for Liver Cancer Utilizing a Large Language Model with Three Types of Knowledge**|Xuzhou Wu et.al.|[2406.18039v1](http://arxiv.org/abs/2406.18039v1)|null|\n", "2406.18027": "|**2024-06-26**|**Automated Clinical Data Extraction with Knowledge Conditioned LLMs**|Diya Li et.al.|[2406.18027v1](http://arxiv.org/abs/2406.18027v1)|null|\n", "2406.17969": "|**2024-06-25**|**Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective**|Hanqi Yan et.al.|[2406.17969v1](http://arxiv.org/abs/2406.17969v1)|null|\n", "2406.17873": "|**2024-06-25**|**Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback**|Zhongtao Miao et.al.|[2406.17873v1](http://arxiv.org/abs/2406.17873v1)|**[link](https://github.com/gpgg/art)**|\n", "2406.17840": "|**2024-06-25**|**Human-Object Interaction from Human-Level Instructions**|Zhen Wu et.al.|[2406.17840v1](http://arxiv.org/abs/2406.17840v1)|null|\n", "2406.16801": "|**2024-06-25**|**RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale**|Beck LaBash et.al.|[2406.16801v2](http://arxiv.org/abs/2406.16801v2)|**[link](https://github.com/qurrent-ai/res-q)**|\n", "2406.17806": "|**2024-06-22**|**MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?**|Xirui Li et.al.|[2406.17806v1](http://arxiv.org/abs/2406.17806v1)|null|\n", "2406.19356": "|**2024-06-27**|**DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions**|Nigel Fernandez 
et.al.|[2406.19356v1](http://arxiv.org/abs/2406.19356v1)|null|\n", "2406.19263": "|**2024-06-27**|**Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding**|Yue Fan et.al.|[2406.19263v1](http://arxiv.org/abs/2406.19263v1)|**[link](https://github.com/eric-ai-lab/Screen-Point-and-Read)**|\n", "2406.19121": "|**2024-06-27**|**Towards Learning Abductive Reasoning using VSA Distributed Representations**|Giacomo Camposampiero et.al.|[2406.19121v1](http://arxiv.org/abs/2406.19121v1)|**[link](https://github.com/ibm/abductive-rule-learner-with-context-awareness)**|\n", "2406.18873": "|**2024-06-27**|**LayoutCopilot: An LLM-powered Multi-agent Collaborative Framework for Interactive Analog Layout Design**|Bingyang Liu et.al.|[2406.18873v1](http://arxiv.org/abs/2406.18873v1)|null|\n", "2406.18871": "|**2024-06-27**|**DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment**|Ke-Han Lu et.al.|[2406.18871v1](http://arxiv.org/abs/2406.18871v1)|null|\n", "2406.18825": "|**2024-06-27**|**ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation**|Jizheng Chen et.al.|[2406.18825v1](http://arxiv.org/abs/2406.18825v1)|null|\n", "2406.18762": "|**2024-06-26**|**Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism**|Shi Zong et.al.|[2406.18762v1](http://arxiv.org/abs/2406.18762v1)|null|\n", "2406.18746": "|**2024-07-15**|**Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models**|Georgios Tziafas et.al.|[2406.18746v2](http://arxiv.org/abs/2406.18746v2)|null|\n", "2406.20079": "|**2024-06-28**|**Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification**|Anisha Gunjal et.al.|[2406.20079v1](http://arxiv.org/abs/2406.20079v1)|**[link](https://github.com/anisha2102/molecular_facts)**|\n", "2406.19760": 
"|**2024-06-28**|**Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation**|Chenlong Deng et.al.|[2406.19760v1](http://arxiv.org/abs/2406.19760v1)|**[link](https://github.com/ChenlongDeng/KELLER)**|\n", "2406.19578": "|**2024-06-27**|**PathAlign: A vision-language model for whole slide images in histopathology**|Faruk Ahmed et.al.|[2406.19578v1](http://arxiv.org/abs/2406.19578v1)|null|\n", "2407.03008": "|**2024-07-03**|**Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering**|Zhaohe Liao et.al.|[2407.03008v1](http://arxiv.org/abs/2407.03008v1)|null|\n", "2407.02964": "|**2024-07-03**|**FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering**|Xiaochen Wang et.al.|[2407.02964v1](http://arxiv.org/abs/2407.02964v1)|null|\n", "2407.02791": "|**2024-07-03**|**Model-Enhanced LLM-Driven VUI Testing of VPA Apps**|Suwan Li et.al.|[2407.02791v1](http://arxiv.org/abs/2407.02791v1)|null|\n", "2407.01892": "|**2024-07-02**|**GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning**|Zhisheng Tang et.al.|[2407.01892v1](http://arxiv.org/abs/2407.01892v1)|**[link](https://github.com/jasontangzs0/GRASP)**|\n", "2407.01489": "|**2024-07-01**|**Agentless: Demystifying LLM-based Software Engineering Agents**|Chunqiu Steven Xia et.al.|[2407.01489v1](http://arxiv.org/abs/2407.01489v1)|**[link](https://github.com/OpenAutoCoder/Agentless)**|\n", "2407.01358": "|**2024-07-01**|**Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models**|Xiaolin Xing et.al.|[2407.01358v1](http://arxiv.org/abs/2407.01358v1)|**[link](https://github.com/Xingxl2studious/Cross-lingual-Consistency)**|\n", "2407.01122": "|**2024-07-01**|**Calibrated Large Language Models for Binary Question Answering**|Patrizio Giovannotti et.al.|[2407.01122v1](http://arxiv.org/abs/2407.01122v1)|null|\n", "2407.01067": "|**2024-07-01**|**Human-like object 
concept representations emerge naturally in multimodal large language models**|Changde Du et.al.|[2407.01067v1](http://arxiv.org/abs/2407.01067v1)|null|\n", "2407.00904": "|**2024-07-01**|**Background-aware Multi-source Fusion Financial Trend Forecasting Mechanism**|Fengting Mo et.al.|[2407.00904v1](http://arxiv.org/abs/2407.00904v1)|null|\n", "2407.00365": "|**2024-06-29**|**Financial Knowledge Large Language Model**|Cehao Yang et.al.|[2407.00365v1](http://arxiv.org/abs/2407.00365v1)|null|\n", "2407.01627": "|**2024-06-29**|**Potential Renovation of Information Search Process with the Power of Large Language Model for Healthcare**|Forhan Bin Emdad et.al.|[2407.01627v1](http://arxiv.org/abs/2407.01627v1)|null|\n", "2407.00322": "|**2024-06-29**|**LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods**|Zhenhua Wang et.al.|[2407.00322v1](http://arxiv.org/abs/2407.00322v1)|null|\n", "2407.02524": "|**2024-06-27**|**Meta Large Language Model Compiler: Foundation Models of Compiler Optimization**|Chris Cummins et.al.|[2407.02524v1](http://arxiv.org/abs/2407.02524v1)|null|\n", "2407.00121": "|**2024-06-27**|**Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks**|Ibrahim Abdelaziz et.al.|[2407.00121v1](http://arxiv.org/abs/2407.00121v1)|null|\n", "2407.02518": "|**2024-06-23**|**INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness**|Hung Le et.al.|[2407.02518v1](http://arxiv.org/abs/2407.02518v1)|null|\n", "2407.00065": "|**2024-06-17**|**A Personalised Learning Tool for Physics Undergraduate Students Built On a Large Language Model for Symbolic Regression**|Yufan Zhu et.al.|[2407.00065v1](http://arxiv.org/abs/2407.00065v1)|null|\n", "2407.04346": "|**2024-07-05**|**MobileFlow: A Multimodal LLM For Mobile GUI Agent**|Songqin Nong et.al.|[2407.04346v1](http://arxiv.org/abs/2407.04346v1)|null|\n", "2407.04307": 
"|**2024-07-05**|**Crafting Large Language Models for Enhanced Interpretability**|Chung-En Sun et.al.|[2407.04307v1](http://arxiv.org/abs/2407.04307v1)|null|\n", "2407.04078": "|**2024-07-17**|**DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning**|Chengpeng Li et.al.|[2407.04078v3](http://arxiv.org/abs/2407.04078v3)|**[link](https://github.com/chengpengli1003/dotamath)**|\n", "2407.04069": "|**2024-07-04**|**A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations**|Md Tahmid Rahman Laskar et.al.|[2407.04069v1](http://arxiv.org/abs/2407.04069v1)|null|\n", "2407.04067": "|**2024-07-04**|**Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM**|Peiran Yao et.al.|[2407.04067v1](http://arxiv.org/abs/2407.04067v1)|**[link](https://github.com/U-Alberta/AMRS3)**|\n", "2407.04020": "|**2024-07-15**|**LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking**|Amy Xin et.al.|[2407.04020v2](http://arxiv.org/abs/2407.04020v2)|**[link](https://github.com/THU-KEG/LLMAEL)**|\n", "2407.03640": "|**2024-07-04**|**Generative Technology for Human Emotion Recognition: A Scope Review**|Fei Ma et.al.|[2407.03640v1](http://arxiv.org/abs/2407.03640v1)|null|\n", "2407.03621": "|**2024-07-04**|**The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model**|Brenden Smith et.al.|[2407.03621v1](http://arxiv.org/abs/2407.03621v1)|**[link](https://github.com/dragnlabs/injectable-alignment-model)**|\n", "2407.07810": "|**2024-07-10**|**Transformer Alignment in Large Language Models**|Murdock Aubry et.al.|[2407.07810v1](http://arxiv.org/abs/2407.07810v1)|null|\n", "2407.07330": "|**2024-07-10**|**Interpretable Differential Diagnosis with Dual-Inference Large Language Models**|Shuang Zhou et.al.|[2407.07330v1](http://arxiv.org/abs/2407.07330v1)|null|\n", "2407.07196": 
"|**2024-07-09**|**Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges**|Emilio Ferrara et.al.|[2407.07196v1](http://arxiv.org/abs/2407.07196v1)|null|\n", "2407.06908": "|**2024-07-09**|**Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models**|Flor Miriam Plaza-del-Arco et.al.|[2407.06908v1](http://arxiv.org/abs/2407.06908v1)|null|\n", "2407.06842": "|**2024-07-10**|**Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts**|Shuangkang Fang et.al.|[2407.06842v2](http://arxiv.org/abs/2407.06842v2)|null|\n", "2407.06564": "|**2024-07-09**|**Combining Knowledge Graphs and Large Language Models**|Amanda Kau et.al.|[2407.06564v1](http://arxiv.org/abs/2407.06564v1)|null|\n", "2407.06488": "|**2024-07-09**|**Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons**|Yongqi Leng et.al.|[2407.06488v1](http://arxiv.org/abs/2407.06488v1)|null|\n", "2407.06093": "|**2024-07-08**|**Artificial Intuition: Efficient Classification of Scientific Abstracts**|Harsh Sakhrani et.al.|[2407.06093v1](http://arxiv.org/abs/2407.06093v1)|null|\n", "2407.05611": "|**2024-07-08**|**GenFollower: Enhancing Car-Following Prediction with Large Language Models**|Xianda Chen et.al.|[2407.05611v1](http://arxiv.org/abs/2407.05611v1)|null|\n", "2407.05464": "|**2024-07-07**|**Experiments with truth using Machine Learning: Spectral analysis and explainable classification of synthetic, false, and genuine information**|Vishnu S. 
Pendyala et.al.|[2407.05464v1](http://arxiv.org/abs/2407.05464v1)|null|\n", "2407.05036": "|**2024-07-06**|**Enhance the Robustness of Text-Centric Multimodal Alignments**|Ting-Yu Yen et.al.|[2407.05036v1](http://arxiv.org/abs/2407.05036v1)|null|\n", "2407.08618": "|**2024-07-11**|**Tamil Language Computing: the Present and the Future**|Kengatharaiyer Sarveswaran et.al.|[2407.08618v1](http://arxiv.org/abs/2407.08618v1)|null|\n", "2407.08550": "|**2024-07-11**|**Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility**|Yuchen Xia et.al.|[2407.08550v1](http://arxiv.org/abs/2407.08550v1)|null|\n", "2407.08532": "|**2024-07-11**|**Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models**|Ying Zhang et.al.|[2407.08532v1](http://arxiv.org/abs/2407.08532v1)|null|\n", "2407.08388": "|**2024-07-11**|**On the attribution of confidence to large language models**|Geoff Keeling et.al.|[2407.08388v1](http://arxiv.org/abs/2407.08388v1)|null|\n", "2407.08331": "|**2024-07-11**|**Towards Explainable Evolution Strategies with Large Language Models**|Jill Baumann et.al.|[2407.08331v1](http://arxiv.org/abs/2407.08331v1)|null|\n", "2407.08249": "|**2024-07-11**|**GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration**|Beni Ifland et.al.|[2407.08249v1](http://arxiv.org/abs/2407.08249v1)|null|\n", "2407.08067": "|**2024-07-10**|**On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments**|Jingchao Fang et.al.|[2407.08067v1](http://arxiv.org/abs/2407.08067v1)|null|\n", "2407.08039": "|**2024-07-10**|**Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models**|Yuji Zhang et.al.|[2407.08039v1](http://arxiv.org/abs/2407.08039v1)|null|\n", "2407.09413": "|**2024-07-12**|**SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers**|Shraman Pramanick 
et.al.|[2407.09413v1](http://arxiv.org/abs/2407.09413v1)|**[link](https://github.com/google/spiqa)**|\n", "2407.08836": "|**2024-07-11**|**Fault Diagnosis in Power Grids with Large Language Model**|Liu Jing et.al.|[2407.08836v1](http://arxiv.org/abs/2407.08836v1)|null|\n", "2407.10805": "|**2024-07-15**|**Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval**|Shengjie Ma et.al.|[2407.10805v1](http://arxiv.org/abs/2407.10805v1)|null|\n", "2407.10795": "|**2024-07-15**|**Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping**|Wenhao Zhu et.al.|[2407.10795v1](http://arxiv.org/abs/2407.10795v1)|**[link](https://github.com/njunlp/skiplayercd)**|\n", "2407.10785": "|**2024-07-15**|**Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities**|Nhat Le et.al.|[2407.10785v1](http://arxiv.org/abs/2407.10785v1)|null|\n", "2407.10490": "|**2024-07-15**|**Learning Dynamics of LLM Finetuning**|Yi Ren et.al.|[2407.10490v1](http://arxiv.org/abs/2407.10490v1)|**[link](https://github.com/joshua-ren/learning_dynamics_llm)**|\n", "2407.10362": "|**2024-07-17**|**LAB-Bench: Measuring Capabilities of Language Models for Biology Research**|Jon M. 
Laurent et.al.|[2407.10362v3](http://arxiv.org/abs/2407.10362v3)|null|\n", "2407.10114": "|**2024-07-22**|**TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation**|Roni Goldshmidt et.al.|[2407.10114v2](http://arxiv.org/abs/2407.10114v2)|null|\n", "2407.10091": "|**2024-07-14**|**Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation**|Ge Gao et.al.|[2407.10091v1](http://arxiv.org/abs/2407.10091v1)|null|\n", "2407.09893": "|**2024-07-13**|**Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks**|Shengbin Yue et.al.|[2407.09893v1](http://arxiv.org/abs/2407.09893v1)|**[link](https://github.com/yueshengbin/SMART)**|\n", "2407.09890": "|**2024-07-13**|**Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3)**|Alkesh K. Srivastava et.al.|[2407.09890v1](http://arxiv.org/abs/2407.09890v1)|null|\n", "2407.09540": "|**2024-06-26**|**Prompting Whole Slide Image Based Genetic Biomarker Prediction**|Ling Zhang et.al.|[2407.09540v1](http://arxiv.org/abs/2407.09540v1)|null|\n", "2407.11827": "|**2024-07-16**|**GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text**|Kyle Hamilton et.al.|[2407.11827v1](http://arxiv.org/abs/2407.11827v1)|null|\n", "2407.11215": "|**2024-07-15**|**Mechanistic interpretability of large language models with applications to the financial services industry**|Ashkan Golgoon et.al.|[2407.11215v1](http://arxiv.org/abs/2407.11215v1)|null|\n", "2407.11015": "|**2024-06-27**|**Does ChatGPT Have a Mind?**|Simon Goldstein et.al.|[2407.11015v1](http://arxiv.org/abs/2407.11015v1)|null|\n", "2407.10996": "|**2024-06-24**|**Visualization Literacy of Multimodal Large Language Models: A Comparative Study**|Zhimin Li et.al.|[2407.10996v1](http://arxiv.org/abs/2407.10996v1)|null|\n", "2407.13596": 
"|**2024-07-20**|**EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension**|Wei Zhang et.al.|[2407.13596v2](http://arxiv.org/abs/2407.13596v2)|**[link](https://github.com/wivizhang/earthmarker)**|\n", "2407.13301": "|**2024-07-18**|**CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis**|Junying Chen et.al.|[2407.13301v1](http://arxiv.org/abs/2407.13301v1)|null|\n", "2407.13117": "|**2024-07-18**|**SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models**|Qi Yang et.al.|[2407.13117v1](http://arxiv.org/abs/2407.13117v1)|null|\n", "2407.13115": "|**2024-07-18**|**TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models**|Ling Yue et.al.|[2407.13115v1](http://arxiv.org/abs/2407.13115v1)|null|\n", "2407.12613": "|**2024-07-17**|**AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism**|William Brannon et.al.|[2407.12613v1](http://arxiv.org/abs/2407.12613v1)|**[link](https://github.com/mit-ccc/AudienceView-demo)**|\n", "2407.12366": "|**2024-07-17**|**NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models**|Gengze Zhou et.al.|[2407.12366v1](http://arxiv.org/abs/2407.12366v1)|**[link](https://github.com/gengzezhou/navgpt-2)**|\n", "2407.12858": "|**2024-07-10**|**Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)**|Krishnaram Kenthapadi et.al.|[2407.12858v1](http://arxiv.org/abs/2407.12858v1)|null|\n", "2407.12821": "|**2024-07-01**|**AutoFlow: Automated Workflow Generation for Large Language Model Agents**|Zelong Li et.al.|[2407.12821v1](http://arxiv.org/abs/2407.12821v1)|**[link](https://github.com/agiresearch/autoflow)**|\n", "2407.15549": "|**2024-07-22**|**Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs**|Abhay Sheshadri 
et.al.|[2407.15549v1](http://arxiv.org/abs/2407.15549v1)|null|\n", "2407.15428": "|**2024-07-22**|**Decoding BACnet Packets: A Large Language Model Approach for Packet Interpretation**|Rashi Sharma et.al.|[2407.15428v1](http://arxiv.org/abs/2407.15428v1)|null|\n", "2407.15360": "|**2024-07-22**|**Dissecting Multiplication in Transformers: Insights into LLMs**|Luyu Qiu et.al.|[2407.15360v1](http://arxiv.org/abs/2407.15360v1)|null|\n", "2407.15351": "|**2024-07-23**|**LLMExplainer: Large Language Model based Bayesian Inference for Graph Explanation Generation**|Jiaxing Zhang et.al.|[2407.15351v2](http://arxiv.org/abs/2407.15351v2)|null|\n", "2407.15248": "|**2024-07-21**|**XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models**|Erik Cambria et.al.|[2407.15248v1](http://arxiv.org/abs/2407.15248v1)|null|\n", "2407.14644": "|**2024-07-19**|**Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context**|Nilanjana Das et.al.|[2407.14644v1](http://arxiv.org/abs/2407.14644v1)|null|\n", "2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan et.al.|[2407.14506v1](http://arxiv.org/abs/2407.14506v1)|null|\n", "2407.14467": "|**2024-07-19**|**Check-Eval: A Checklist-based Approach for Evaluating Text Quality**|Jayr Pereira et.al.|[2407.14467v1](http://arxiv.org/abs/2407.14467v1)|null|\n", "2407.14239": "|**2024-07-19**|**KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models**|Kemou Jiang et.al.|[2407.14239v1](http://arxiv.org/abs/2407.14239v1)|null|\n", "2407.14192": "|**2024-07-19**|**LeKUBE: A Legal Knowledge Update BEnchmark**|Changyue Wang et.al.|[2407.14192v1](http://arxiv.org/abs/2407.14192v1)|null|\n", "2407.14044": "|**2024-07-19**|**ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?**|Siddhant Waghjale 
et.al.|[2407.14044v1](http://arxiv.org/abs/2407.14044v1)|**[link](https://github.com/codeeff/ecco)**|\n", "2407.13909": "|**2024-07-18**|**PRAGyan -- Connecting the Dots in Tweets**|Rahul Ravi et.al.|[2407.13909v1](http://arxiv.org/abs/2407.13909v1)|null|\n", "2407.13851": "|**2024-07-18**|**X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs**|Sirnam Swetha et.al.|[2407.13851v1](http://arxiv.org/abs/2407.13851v1)|null|\n", "2407.13787": "|**2024-07-24**|**The Honorific Effect: Exploring the Impact of Japanese Linguistic Formalities on AI-Generated Physics Explanations**|Keisuke Sato et.al.|[2407.13787v2](http://arxiv.org/abs/2407.13787v2)|null|\n", "2407.13781": "|**2024-07-03**|**RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring**|Ali Ghiasvand Mohammadkhani et.al.|[2407.13781v1](http://arxiv.org/abs/2407.13781v1)|null|\n", "2407.14269": "|**2024-07-02**|**Predictive Simultaneous Interpretation: Harnessing Large Language Models for Democratizing Real-Time Multilingual Communication**|Kurando Iida et.al.|[2407.14269v1](http://arxiv.org/abs/2407.14269v1)|null|\n", "2407.16329": "|**2024-07-23**|**PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets**|Jaeyoung Kim et.al.|[2407.16329v1](http://arxiv.org/abs/2407.16329v1)|null|\n", "2407.17291": "|**2024-07-24**|**How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?**|Leo Yu-Ho Lo et.al.|[2407.17291v1](http://arxiv.org/abs/2407.17291v1)|null|\n", "2407.17075": "|**2024-07-24**|**SAFETY-J: Evaluating Safety with Critique**|Yixiu Liu et.al.|[2407.17075v1](http://arxiv.org/abs/2407.17075v1)|null|\n", "2407.17011": "|**2024-07-24**|**Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism**|Anhao Zhao et.al.|[2407.17011v1](http://arxiv.org/abs/2407.17011v1)|null|\n"}, "LLM - Reasoning": {"2311.18836": "|**2023-11-30**|**PoseGPT: Chatting about 3D Human Pose**|Yao Feng 
et.al.|[2311.18836v1](http://arxiv.org/abs/2311.18836v1)|null|\n", "2311.18799": "|**2023-11-30**|**X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning**|Artemis Panagopoulou et.al.|[2311.18799v1](http://arxiv.org/abs/2311.18799v1)|**[link](https://github.com/artemisp/lavis-xinstructblip)**|\n", "2311.18775": "|**2023-11-30**|**CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation**|Zineng Tang et.al.|[2311.18775v1](http://arxiv.org/abs/2311.18775v1)|null|\n", "2311.18658": "|**2023-11-30**|**ArcMMLU: A Library and Information Science Benchmark for Large Language Models**|Shitou Zhang et.al.|[2311.18658v1](http://arxiv.org/abs/2311.18658v1)|**[link](https://github.com/stzhang-patrick/arcmmlu)**|\n", "2311.18445": "|**2023-11-30**|**VTimeLLM: Empower LLM to Grasp Video Moments**|Bin Huang et.al.|[2311.18445v1](http://arxiv.org/abs/2311.18445v1)|**[link](https://github.com/huangb23/vtimellm)**|\n", "2311.18397": "|**2023-11-30**|**IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions**|Zhebin Zhang et.al.|[2311.18397v1](http://arxiv.org/abs/2311.18397v1)|null|\n", "2311.18353": "|**2023-11-30**|**Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension**|Akira Kawabata et.al.|[2311.18353v1](http://arxiv.org/abs/2311.18353v1)|null|\n", "2311.18307": "|**2023-11-30**|**Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent**|Yuxiao Chen et.al.|[2311.18307v1](http://arxiv.org/abs/2311.18307v1)|null|\n", "2311.18062": "|**2023-11-29**|**Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation**|Xijia Zhang et.al.|[2311.18062v1](http://arxiv.org/abs/2311.18062v1)|null|\n", "2311.17842": "|**2023-11-29**|**Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning**|Yingdong Hu 
et.al.|[2311.17842v1](http://arxiv.org/abs/2311.17842v1)|null|\n", "2311.17667": "|**2023-11-29**|**TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models**|Zheng Chu et.al.|[2311.17667v1](http://arxiv.org/abs/2311.17667v1)|**[link](https://github.com/zchuz/timebench)**|\n", "2311.17438": "|**2023-11-30**|**CLOMO: Counterfactual Logical Modification with Large Language Models**|Yinya Huang et.al.|[2311.17438v2](http://arxiv.org/abs/2311.17438v2)|null|\n", "2311.17406": "|**2023-11-29**|**LLM-State: Expandable State Representation for Long-horizon Task Planning in the Open World**|Siwei Chen et.al.|[2311.17406v1](http://arxiv.org/abs/2311.17406v1)|null|\n", "2311.17365": "|**2023-11-29**|**Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning**|Xiaoqian Wu et.al.|[2311.17365v1](http://arxiv.org/abs/2311.17365v1)|null|\n", "2311.17355": "|**2023-11-29**|**Are Large Language Models Good Fact Checkers: A Preliminary Study**|Han Cao et.al.|[2311.17355v1](http://arxiv.org/abs/2311.17355v1)|null|\n", "2311.17331": "|**2023-11-29**|**Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering**|Zeqing Wang et.al.|[2311.17331v1](http://arxiv.org/abs/2311.17331v1)|null|\n", "2311.17311": "|**2023-11-29**|**Universal Self-Consistency for Large Language Model Generation**|Xinyun Chen et.al.|[2311.17311v1](http://arxiv.org/abs/2311.17311v1)|null|\n", "2311.17126": "|**2023-11-28**|**Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis**|Xiaohui Chen et.al.|[2311.17126v1](http://arxiv.org/abs/2311.17126v1)|null|\n", "2311.16542": "|**2023-11-28**|**Agents meet OKR: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation**|Yi Zheng et.al.|[2311.16542v1](http://arxiv.org/abs/2311.16542v1)|null|\n", "2311.16509": "|**2023-11-28**|**StyleCap: Automatic Speaking-Style 
Captioning from Speech Based on Speech and Language Self-supervised Learning Models**|Kazuki Yamauchi et.al.|[2311.16509v1](http://arxiv.org/abs/2311.16509v1)|null|\n", "2311.17076": "|**2023-11-27**|**Compositional Chain-of-Thought Prompting for Large Multimodal Models**|Chancharik Mitra et.al.|[2311.17076v1](http://arxiv.org/abs/2311.17076v1)|null|\n", "2311.16093": "|**2023-11-27**|**Have we built machines that think like people?**|Luca M. Schulze Buschoff et.al.|[2311.16093v1](http://arxiv.org/abs/2311.16093v1)|**[link](https://github.com/lsbuschoff/multimodal)**|\n", "2311.16079": "|**2023-11-27**|**MEDITRON-70B: Scaling Medical Pretraining for Large Language Models**|Zeming Chen et.al.|[2311.16079v1](http://arxiv.org/abs/2311.16079v1)|**[link](https://github.com/epfllm/meditron)**|\n", "2311.15930": "|**2023-11-27**|**WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models**|Youssef Benchekroun et.al.|[2311.15930v1](http://arxiv.org/abs/2311.15930v1)|**[link](https://github.com/facebookresearch/worldsense)**|\n", "2311.16500": "|**2023-11-27**|**LLMGA: Multimodal Large Language Model based Generation Assistant**|Bin Xia et.al.|[2311.16500v1](http://arxiv.org/abs/2311.16500v1)|null|\n", "2311.15766": "|**2023-12-08**|**Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges**|Nianwen Si et.al.|[2311.15766v2](http://arxiv.org/abs/2311.15766v2)|null|\n", "2311.15759": "|**2023-11-27**|**Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs**|Yunxin Li et.al.|[2311.15759v1](http://arxiv.org/abs/2311.15759v1)|null|\n", "2311.15383": "|**2023-11-26**|**Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding**|Zhihao Yuan et.al.|[2311.15383v1](http://arxiv.org/abs/2311.15383v1)|null|\n", "2311.15209": "|**2023-12-03**|**See and Think: Embodied Agent in Virtual Environment**|Zhonghan Zhao et.al.|[2311.15209v2](http://arxiv.org/abs/2311.15209v2)|null|\n", "2311.14786": 
"|**2023-11-24**|**GPT-4V Takes the Wheel: Evaluating Promise and Challenges for Pedestrian Behavior Prediction**|Jia Huang et.al.|[2311.14786v1](http://arxiv.org/abs/2311.14786v1)|null|\n", "2311.14580": "|**2023-11-24**|**Large Language Models as Automated Aligners for benchmarking Vision-Language Models**|Yuanfeng Ji et.al.|[2311.14580v1](http://arxiv.org/abs/2311.14580v1)|null|\n", "2311.14379": "|**2023-11-24**|**Robot Learning in the Era of Foundation Models: A Survey**|Xuan Xiao et.al.|[2311.14379v1](http://arxiv.org/abs/2311.14379v1)|null|\n", "2311.14096": "|**2023-11-23**|**Auditing and Mitigating Cultural Bias in LLMs**|Yan Tao et.al.|[2311.14096v1](http://arxiv.org/abs/2311.14096v1)|null|\n", "2311.13982": "|**2023-11-23**|**Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions**|Shulin Cao et.al.|[2311.13982v1](http://arxiv.org/abs/2311.13982v1)|null|\n", "2311.13743": "|**2023-12-03**|**FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design**|Yangyang Yu et.al.|[2311.13743v2](http://arxiv.org/abs/2311.13743v2)|**[link](https://github.com/pipiku915/finmem-llm-stocktrading)**|\n", "2311.13720": "|**2023-11-22**|**Towards More Likely Models for AI Planning**|Turgay Caglar et.al.|[2311.13720v1](http://arxiv.org/abs/2311.13720v1)|null|\n", "2311.13577": "|**2023-11-22**|**Physical Reasoning and Object Planning for Household Embodied Agents**|Ayush Agrawal et.al.|[2311.13577v1](http://arxiv.org/abs/2311.13577v1)|**[link](https://github.com/com-phy-affordance/coat)**|\n", "2311.13565": "|**2023-11-22**|**Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering**|Inderjeet Nair et.al.|[2311.13565v1](http://arxiv.org/abs/2311.13565v1)|null|\n", "2311.13627": "|**2023-11-22**|**Vamos: Versatile Action Models for Video Understanding**|Shijie Wang et.al.|[2311.13627v1](http://arxiv.org/abs/2311.13627v1)|null|\n", "2311.13538": "|**2023-11-22**|**Speak Like a 
Native: Prompting Large Language Models in a Native Style**|Zhicheng Yang et.al.|[2311.13538v1](http://arxiv.org/abs/2311.13538v1)|**[link](https://github.com/yangzhch6/aligncot)**|\n", "2311.13445": "|**2023-11-22**|**Transfer Attacks and Defenses for Large Language Models on Coding Tasks**|Chi Zhang et.al.|[2311.13445v1](http://arxiv.org/abs/2311.13445v1)|null|\n", "2311.13314": "|**2023-11-22**|**Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting**|Xinyan Guan et.al.|[2311.13314v1](http://arxiv.org/abs/2311.13314v1)|null|\n", "2311.13148": "|**2023-11-28**|**Building the Future of Responsible AI: A Reference Architecture for Designing Large Language Model based Agents**|Qinghua Lu et.al.|[2311.13148v2](http://arxiv.org/abs/2311.13148v2)|null|\n", "2311.16173": "|**2023-12-06**|**Conditions for Length Generalization in Learning Reasoning Skills**|Changnan Xiao et.al.|[2311.16173v2](http://arxiv.org/abs/2311.16173v2)|null|\n", "2311.13095": "|**2023-11-22**|**Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications**|Ha-Thanh Nguyen et.al.|[2311.13095v1](http://arxiv.org/abs/2311.13095v1)|null|\n", "2311.13063": "|**2023-11-25**|**From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models**|Zachary Englhardt et.al.|[2311.13063v2](http://arxiv.org/abs/2311.13063v2)|null|\n", "2311.12699": "|**2023-11-21**|**Can Large Language Models Understand Content and Propagation for Misinformation Detection: An Empirical Study**|Mengyang Chen et.al.|[2311.12699v1](http://arxiv.org/abs/2311.12699v1)|null|\n", "2311.12668": "|**2023-11-21**|**From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design**|Cyril Picard et.al.|[2311.12668v1](http://arxiv.org/abs/2311.12668v1)|null|\n", "2311.12889": "|**2023-11-21**|**Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense 
Knowledge**|Bowen Jiang et.al.|[2311.12889v1](http://arxiv.org/abs/2311.12889v1)|null|\n", "2311.12327": "|**2023-11-21**|**ViLaM: A Vision-Language Model with Enhanced Visual Grounding and Generalization Capability**|Xiaoyu Yang et.al.|[2311.12327v1](http://arxiv.org/abs/2311.12327v1)|**[link](https://github.com/anonymgiant/vilam)**|\n", "2311.12188": "|**2023-12-04**|**ChatGPT and post-test probability**|Samuel J. Weisenthal et.al.|[2311.12188v3](http://arxiv.org/abs/2311.12188v3)|null|\n", "2311.12144": "|**2023-12-04**|**Applications of Large Scale Foundation Models for Autonomous Driving**|Yu Huang et.al.|[2311.12144v6](http://arxiv.org/abs/2311.12144v6)|null|\n", "2311.11865": "|**2023-11-20**|**VLM-Eval: A General Evaluation on Video Large Language Models**|Shuailin Li et.al.|[2311.11865v1](http://arxiv.org/abs/2311.11865v1)|null|\n", "2311.11860": "|**2023-11-26**|**LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge**|Gongwei Chen et.al.|[2311.11860v2](http://arxiv.org/abs/2311.11860v2)|**[link](https://github.com/rshaojimmy/jiutian)**|\n", "2311.11829": "|**2023-11-20**|**System 2 Attention (is something you might need too)**|Jason Weston et.al.|[2311.11829v1](http://arxiv.org/abs/2311.11829v1)|null|\n", "2311.11797": "|**2023-11-20**|**Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents**|Zhuosheng Zhang et.al.|[2311.11797v1](http://arxiv.org/abs/2311.11797v1)|**[link](https://github.com/zoeyyao27/cot-igniting-agent)**|\n", "2311.11689": "|**2023-11-20**|**Causal Structure Learning Supervised by Large Language Model**|Taiyu Ban et.al.|[2311.11689v1](http://arxiv.org/abs/2311.11689v1)|**[link](https://github.com/tymadara/ils-csl)**|\n", "2311.11598": "|**2023-11-20**|**Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions**|Ziyue Wang 
et.al.|[2311.11598v1](http://arxiv.org/abs/2311.11598v1)|**[link](https://github.com/thunlp-mt/fiig)**|\n", "2311.11567": "|**2023-12-04**|**InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models**|Xiaotian Han et.al.|[2311.11567v3](http://arxiv.org/abs/2311.11567v3)|null|\n", "2311.11482": "|**2023-11-20**|**Meta Prompting for AGI Systems**|Yifan Zhang et.al.|[2311.11482v1](http://arxiv.org/abs/2311.11482v1)|**[link](https://github.com/meta-prompting/meta-prompting)**|\n", "2311.14722": "|**2023-11-19**|**Zero-Shot Question Answering over Financial Documents using Large Language Models**|Karmvir Singh Phogat et.al.|[2311.14722v1](http://arxiv.org/abs/2311.14722v1)|null|\n", "2311.11255": "|**2023-11-28**|**M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models**|Atin Sakkeer Hussain et.al.|[2311.11255v2](http://arxiv.org/abs/2311.11255v2)|null|\n", "2311.12065": "|**2023-11-19**|**Few-Shot Classification & Segmentation Using Large Language Models Agent**|Tian Meng et.al.|[2311.12065v1](http://arxiv.org/abs/2311.12065v1)|null|\n", "2311.11135": "|**2023-11-18**|**A Principled Framework for Knowledge-enhanced Large Language Model**|Saizhuo Wang et.al.|[2311.11135v1](http://arxiv.org/abs/2311.11135v1)|null|\n", "2311.10947": "|**2023-11-18**|**RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability**|Yuxuan Lei et.al.|[2311.10947v1](http://arxiv.org/abs/2311.10947v1)|null|\n", "2311.12871": "|**2023-11-18**|**An Embodied Generalist Agent in 3D World**|Jiangyong Huang et.al.|[2311.12871v1](http://arxiv.org/abs/2311.12871v1)|**[link](https://github.com/embodied-generalist/embodied-generalist)**|\n", "2311.10813": "|**2023-11-27**|**A Language Agent for Autonomous Driving**|Jiageng Mao et.al.|[2311.10813v3](http://arxiv.org/abs/2311.10813v3)|**[link](https://github.com/usc-gvl/agent-driver)**|\n", "2311.10614": "|**2023-11-17**|**A Self-enhancement Approach 
for Domain-specific Chatbot Training via Knowledge Mining and Digest**|Ruohong Zhang et.al.|[2311.10614v1](http://arxiv.org/abs/2311.10614v1)|null|\n", "2311.10227": "|**2023-11-16**|**Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities**|Alex Wilf et.al.|[2311.10227v1](http://arxiv.org/abs/2311.10227v1)|null|\n", "2311.10215": "|**2023-11-16**|**Predictive Minds: LLMs As Atypical Active Inference Agents**|Jan Kulveit et.al.|[2311.10215v1](http://arxiv.org/abs/2311.10215v1)|null|\n", "2311.16169": "|**2023-11-16**|**Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities**|Avishree Khare et.al.|[2311.16169v1](http://arxiv.org/abs/2311.16169v1)|null|\n", "2311.09868": "|**2023-11-16**|**INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing**|Hanbin Wang et.al.|[2311.09868v1](http://arxiv.org/abs/2311.09868v1)|**[link](https://github.com/neuir/intervenor)**|\n", "2311.09862": "|**2023-11-16**|**Which Modality should I use -- Text, Motif, or Image? 
: Understanding Graphs with Large Language Models**|Debarati Das et.al.|[2311.09862v1](http://arxiv.org/abs/2311.09862v1)|null|\n", "2311.09832": "|**2023-11-16**|**X-Mark: Towards Lossless Watermarking Through Lexical Redundancy**|Liang Chen et.al.|[2311.09832v1](http://arxiv.org/abs/2311.09832v1)|null|\n", "2311.09829": "|**2023-11-16**|**FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models**|Yimin Jing et.al.|[2311.09829v1](http://arxiv.org/abs/2311.09829v1)|null|\n", "2311.09827": "|**2023-11-16**|**Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking**|Nan Xu et.al.|[2311.09827v1](http://arxiv.org/abs/2311.09827v1)|null|\n", "2311.09821": "|**2023-11-16**|**Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning**|Qingyu Tan et.al.|[2311.09821v1](http://arxiv.org/abs/2311.09821v1)|null|\n", "2311.10537": "|**2023-11-16**|**MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning**|Xiangru Tang et.al.|[2311.10537v1](http://arxiv.org/abs/2311.10537v1)|**[link](https://github.com/gersteinlab/medagents)**|\n", "2311.09816": "|**2023-11-16**|**Performance Trade-offs of Watermarking Large Language Models**|Anirudh Ajith et.al.|[2311.09816v1](http://arxiv.org/abs/2311.09816v1)|null|\n", "2311.09762": "|**2023-11-16**|**Graph-Guided Reasoning for Multi-Hop Question Answering in Large Language Models**|Jinyoung Park et.al.|[2311.09762v1](http://arxiv.org/abs/2311.09762v1)|null|\n", "2311.09724": "|**2023-11-16**|**Outcome-supervised Verifiers for Planning in Mathematical Reasoning**|Fei Yu et.al.|[2311.09724v1](http://arxiv.org/abs/2311.09724v1)|**[link](https://github.com/freedomintelligence/ovm)**|\n", "2311.09721": "|**2023-11-16**|**On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering**|Linyong Nan 
et.al.|[2311.09721v1](http://arxiv.org/abs/2311.09721v1)|null|\n", "2311.09702": "|**2023-11-16**|**Deceiving Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?**|Bangzheng Li et.al.|[2311.09702v1](http://arxiv.org/abs/2311.09702v1)|null|\n", "2311.09665": "|**2023-11-16**|**Evaluating LLM Agent Group Dynamics against Human Group Dynamics: A Case Study on Wisdom of Partisan Crowds**|Yun-Shiuan Chuang et.al.|[2311.09665v1](http://arxiv.org/abs/2311.09665v1)|null|\n", "2311.09656": "|**2023-11-16**|**Structured Chemistry Reasoning with Large Language Models**|Siru Ouyang et.al.|[2311.09656v1](http://arxiv.org/abs/2311.09656v1)|null|\n", "2311.09612": "|**2023-11-16**|**Efficient End-to-End Visual Document Understanding with Rationale Distillation**|Wang Zhu et.al.|[2311.09612v1](http://arxiv.org/abs/2311.09612v1)|null|\n", "2311.09603": "|**2023-11-16**|**SCORE: A framework for Self-Contradictory Reasoning Evaluation**|Ziyi Liu et.al.|[2311.09603v1](http://arxiv.org/abs/2311.09603v1)|null|\n", "2311.09553": "|**2023-11-16**|**Program-Aided Reasoners (better) Know What They Know**|Anubha Kabra et.al.|[2311.09553v1](http://arxiv.org/abs/2311.09553v1)|null|\n", "2311.10775": "|**2023-11-15**|**ToolTalk: Evaluating Tool-Usage in a Conversational Setting**|Nicholas Farn et.al.|[2311.10775v1](http://arxiv.org/abs/2311.10775v1)|null|\n", "2311.10774": "|**2023-11-15**|**MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning**|Fuxiao Liu et.al.|[2311.10774v1](http://arxiv.org/abs/2311.10774v1)|**[link](https://github.com/fuxiaoliu/mmc)**|\n", "2311.09335": "|**2023-11-15**|**Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization**|George Chrysostomou et.al.|[2311.09335v1](http://arxiv.org/abs/2311.09335v1)|null|\n", "2311.09214": "|**2023-11-15**|**Mind's Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language 
Models**|Weize Liu et.al.|[2311.09214v1](http://arxiv.org/abs/2311.09214v1)|null|\n", "2311.09204": "|**2023-11-15**|**Fusion-Eval: Integrating Evaluators with LLMs**|Lei Shu et.al.|[2311.09204v1](http://arxiv.org/abs/2311.09204v1)|null|\n", "2311.09175": "|**2023-11-15**|**Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers**|Minghan Li et.al.|[2311.09175v1](http://arxiv.org/abs/2311.09175v1)|null|\n", "2311.09149": "|**2023-11-15**|**Temporal Knowledge Question Answering via Abstract Reasoning Induction**|Ziyang Chen et.al.|[2311.09149v1](http://arxiv.org/abs/2311.09149v1)|null|\n", "2311.09136": "|**2023-11-15**|**RRescue: Ranking LLM Responses to Enhance Reasoning Over Context**|Yikun Wang et.al.|[2311.09136v1](http://arxiv.org/abs/2311.09136v1)|null|\n", "2311.09114": "|**2023-11-15**|**Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification**|Haoqiang Kang et.al.|[2311.09114v1](http://arxiv.org/abs/2311.09114v1)|null|\n", "2311.09101": "|**2023-11-15**|**Towards A Unified View of Answer Calibration for Multi-Step Reasoning**|Shumin Deng et.al.|[2311.09101v1](http://arxiv.org/abs/2311.09101v1)|null|\n", "2311.09050": "|**2023-11-15**|**Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts**|Yunshi Lan et.al.|[2311.09050v1](http://arxiv.org/abs/2311.09050v1)|**[link](https://github.com/ecnu-dase-nlp/rqp)**|\n", "2311.09033": "|**2023-11-15**|**MELA: Multilingual Evaluation of Linguistic Acceptability**|Ziyin Zhang et.al.|[2311.09033v1](http://arxiv.org/abs/2311.09033v1)|null|\n", "2312.00746": "|**2023-12-01**|**Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games**|Dekun Wu et.al.|[2312.00746v1](http://arxiv.org/abs/2312.00746v1)|null|\n", "2312.00567": "|**2023-12-01**|**Explanatory Argument Extraction of Correct Answers in Resident Medical Exams**|Iakes 
Goenaga et.al.|[2312.00567v1](http://arxiv.org/abs/2312.00567v1)|null|\n", "2312.00554": "|**2023-12-01**|**Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?**|Aniket Deroy et.al.|[2312.00554v1](http://arxiv.org/abs/2312.00554v1)|null|\n", "2312.00353": "|**2023-12-01**|**On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs**|Pei-Chi Lo et.al.|[2312.00353v1](http://arxiv.org/abs/2312.00353v1)|null|\n", "2312.00249": "|**2023-11-30**|**Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities**|Jinhua Liang et.al.|[2312.00249v1](http://arxiv.org/abs/2312.00249v1)|**[link](https://github.com/jinhualiang/apt)**|\n", "2312.00164": "|**2023-11-30**|**Towards Accurate Differential Diagnosis with Large Language Models**|Daniel McDuff et.al.|[2312.00164v1](http://arxiv.org/abs/2312.00164v1)|null|\n", "2312.00589": "|**2023-11-30**|**Merlin:Empowering Multimodal LLMs with Foresight Minds**|En Yu et.al.|[2312.00589v1](http://arxiv.org/abs/2312.00589v1)|null|\n", "2312.02143": "|**2023-12-05**|**Competition-Level Problems are Effective LLM Evaluators**|Yiming Huang et.al.|[2312.02143v2](http://arxiv.org/abs/2312.02143v2)|null|\n", "2312.02119": "|**2023-12-04**|**Tree of Attacks: Jailbreaking Black-Box LLMs Automatically**|Anay Mehrotra et.al.|[2312.02119v1](http://arxiv.org/abs/2312.02119v1)|**[link](https://github.com/ricommunity/tap)**|\n", "2312.02051": "|**2023-12-04**|**TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding**|Shuhuai Ren et.al.|[2312.02051v1](http://arxiv.org/abs/2312.02051v1)|**[link](https://github.com/renshuhuai-andy/timechat)**|\n", "2312.02003": "|**2023-12-04**|**A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly**|Yifan Yao et.al.|[2312.02003v1](http://arxiv.org/abs/2312.02003v1)|null|\n", "2312.01886": "|**2023-12-04**|**InstructTA: Instruction-Tuned Targeted Attack for Large 
Vision-Language Models**|Xunguang Wang et.al.|[2312.01886v1](http://arxiv.org/abs/2312.01886v1)|null|\n", "2312.01823": "|**2023-12-04**|**Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication**|Zhangyue Yin et.al.|[2312.01823v1](http://arxiv.org/abs/2312.01823v1)|**[link](https://github.com/yinzhangyue/eot)**|\n", "2312.01714": "|**2023-12-04**|**Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models**|Bingshuai Liu et.al.|[2312.01714v1](http://arxiv.org/abs/2312.01714v1)|null|\n", "2312.01678": "|**2023-12-05**|**Jellyfish: A Large Language Model for Data Preprocessing**|Haochen Zhang et.al.|[2312.01678v2](http://arxiv.org/abs/2312.01678v2)|null|\n", "2312.01661": "|**2023-12-04**|**ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions**|Phuoc Pham Van Long et.al.|[2312.01661v1](http://arxiv.org/abs/2312.01661v1)|**[link](https://github.com/dxlong2000/chatgpt-as-a-math-questioner)**|\n", "2312.01598": "|**2023-12-09**|**Good Questions Help Zero-Shot Image Reasoning**|Kaiwen Yang et.al.|[2312.01598v2](http://arxiv.org/abs/2312.01598v2)|**[link](https://github.com/kai-wen-yang/qvix)**|\n", "2312.01454": "|**2023-12-06**|**D-Bot: Database Diagnosis System using Large Language Models**|Xuanhe Zhou et.al.|[2312.01454v2](http://arxiv.org/abs/2312.01454v2)|**[link](https://github.com/tsinghuadatabasegroup/db-gpt)**|\n", "2312.01279": "|**2023-12-03**|**TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents**|James Enouen et.al.|[2312.01279v1](http://arxiv.org/abs/2312.01279v1)|null|\n", "2312.01054": "|**2023-12-02**|**Exploring and Improving the Spatial Reasoning Abilities of Large Language Models**|Manasi Sharma et.al.|[2312.01054v1](http://arxiv.org/abs/2312.01054v1)|null|\n", "2312.01044": "|**2023-12-02**|**Large Language Models Are Zero-Shot Text Classifiers**|Zhiqiang Wang 
et.al.|[2312.01044v1](http://arxiv.org/abs/2312.01044v1)|**[link](https://github.com/yeyimilk/llm-zero-shot-classifiers)**|\n", "2312.01040": "|**2023-12-18**|**From Beginner to Expert: Modeling Medical Knowledge into General LLMs**|Qiang Li et.al.|[2312.01040v2](http://arxiv.org/abs/2312.01040v2)|null|\n", "2312.01032": "|**2023-12-02**|**Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models**|Subhankar Maity et.al.|[2312.01032v1](http://arxiv.org/abs/2312.01032v1)|**[link](https://github.com/my625/promptqg)**|\n", "2312.00849": "|**2023-12-01**|**RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback**|Tianyu Yu et.al.|[2312.00849v1](http://arxiv.org/abs/2312.00849v1)|**[link](https://github.com/rlhf-v/rlhf-v)**|\n", "2312.00819": "|**2023-11-30**|**Large Language Models for Travel Behavior Prediction**|Baichuan Mo et.al.|[2312.00819v1](http://arxiv.org/abs/2312.00819v1)|null|\n", "2312.00812": "|**2023-11-28**|**Empowering Autonomous Driving with Large Language Models: A Safety Perspective**|Yixuan Wang et.al.|[2312.00812v1](http://arxiv.org/abs/2312.00812v1)|null|\n", "2312.02783": "|**2023-12-05**|**Large Language Models on Graphs: A Comprehensive Survey**|Bowen Jin et.al.|[2312.02783v1](http://arxiv.org/abs/2312.02783v1)|**[link](https://github.com/petergriffinjin/awesome-language-model-on-graphs)**|\n", "2312.02598": "|**2023-12-05**|**Impact of Tokenization on LLaMa Russian Adaptation**|Mikhail Tikhomirov et.al.|[2312.02598v1](http://arxiv.org/abs/2312.02598v1)|null|\n", "2312.02441": "|**2023-12-05**|**MedDM:LLM-executable clinical guidance tree for clinical decision-making**|Binbin Li et.al.|[2312.02441v1](http://arxiv.org/abs/2312.02441v1)|null|\n", "2312.02439": "|**2023-12-06**|**Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation**|Shanshan Zhong 
et.al.|[2312.02439v2](http://arxiv.org/abs/2312.02439v2)|**[link](https://github.com/sail-sg/clot)**|\n", "2312.02433": "|**2023-12-05**|**Lenna: Language Enhanced Reasoning Detection Assistant**|Fei Wei et.al.|[2312.02433v1](http://arxiv.org/abs/2312.02433v1)|**[link](https://github.com/meituan-automl/lenna)**|\n", "2312.02252": "|**2023-12-13**|**StoryGPT-V: Large Language Models as Consistent Story Visualizers**|Xiaoqian Shen et.al.|[2312.02252v2](http://arxiv.org/abs/2312.02252v2)|**[link](https://github.com/xiaoqian-shen/StoryGPT-V)**|\n", "2312.02179": "|**2023-11-28**|**Training Chain-of-Thought via Latent-Variable Inference**|Du Phan et.al.|[2312.02179v1](http://arxiv.org/abs/2312.02179v1)|null|\n", "2312.03700": "|**2023-12-06**|**OneLLM: One Framework to Align All Modalities with Language**|Jiaming Han et.al.|[2312.03700v1](http://arxiv.org/abs/2312.03700v1)|**[link](https://github.com/csuhan/onellm)**|\n", "2312.03664": "|**2023-12-13**|**Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia**|Alexander Sasha Vezhnevets et.al.|[2312.03664v2](http://arxiv.org/abs/2312.03664v2)|**[link](https://github.com/google-deepmind/concordia)**|\n", "2312.03633": "|**2023-12-06**|**Not All Large Language Models (LLMs) Succumb to the \"Reversal Curse\": A Comparative Study of Deductive Logical Reasoning in BERT and GPT Models**|Jingye Yang et.al.|[2312.03633v1](http://arxiv.org/abs/2312.03633v1)|null|\n", "2312.03632": "|**2023-12-06**|**Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models**|Dominik Wagner et.al.|[2312.03632v1](http://arxiv.org/abs/2312.03632v1)|null|\n", "2312.03360": "|**2023-12-18**|**Teaching Specific Scientific Knowledge into Large Language Models through Additional Training**|Kan Hatakeyama-Sato et.al.|[2312.03360v2](http://arxiv.org/abs/2312.03360v2)|null|\n", "2312.03134": "|**2023-12-05**|**A Hardware Evaluation Framework for Large Language 
Model Inference**|Hengrui Zhang et.al.|[2312.03134v1](http://arxiv.org/abs/2312.03134v1)|null|\n", "2312.03052": "|**2023-12-05**|**Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models**|Yushi Hu et.al.|[2312.03052v1](http://arxiv.org/abs/2312.03052v1)|null|\n", "2312.03042": "|**2023-12-05**|**Inherent limitations of LLMs regarding spatial information**|He Yan et.al.|[2312.03042v1](http://arxiv.org/abs/2312.03042v1)|null|\n", "2312.03003": "|**2023-12-04**|**Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation**|Sunjae Lee et.al.|[2312.03003v1](http://arxiv.org/abs/2312.03003v1)|null|\n", "2312.04511": "|**2023-12-07**|**An LLM Compiler for Parallel Function Calling**|Sehoon Kim et.al.|[2312.04511v1](http://arxiv.org/abs/2312.04511v1)|**[link](https://github.com/squeezeailab/llmcompiler)**|\n", "2312.04350": "|**2023-12-07**|**CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models**|Zhijing Jin et.al.|[2312.04350v1](http://arxiv.org/abs/2312.04350v1)|**[link](https://github.com/causalnlp/cladder)**|\n", "2312.04333": "|**2023-12-14**|**Beyond Surface: Probing LLaMA Across Scales and Layers**|Nuo Chen et.al.|[2312.04333v3](http://arxiv.org/abs/2312.04333v3)|**[link](https://github.com/nuochenpku/llama_analysis)**|\n", "2312.04021": "|**2023-12-11**|**A Study on the Calibration of In-context Learning**|Hanlin Zhang et.al.|[2312.04021v2](http://arxiv.org/abs/2312.04021v2)|null|\n", "2312.03863": "|**2023-12-06**|**Efficient Large Language Models: A Survey**|Zhongwei Wan et.al.|[2312.03863v1](http://arxiv.org/abs/2312.03863v1)|**[link](https://github.com/aiot-mlsys-lab/efficientllms)**|\n", "2312.03748": "|**2023-11-30**|**Applying Large Language Models and Chain-of-Thought for Automatic Scoring**|Gyeong-Geon Lee et.al.|[2312.03748v1](http://arxiv.org/abs/2312.03748v1)|null|\n", "2312.03720": "|**2023-11-26**|**Negotiating with LLMS: Prompt Hacks, 
Skill Gaps, and Reasoning Deficits**|Johannes Schneider et.al.|[2312.03720v1](http://arxiv.org/abs/2312.03720v1)|null|\n", "2312.05230": "|**2023-12-08**|**Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning**|Zhiting Hu et.al.|[2312.05230v1](http://arxiv.org/abs/2312.05230v1)|null|\n", "2312.05209": "|**2023-12-08**|**HALO: An Ontology for Representing Hallucinations in Generative Models**|Navapat Nananukul et.al.|[2312.05209v1](http://arxiv.org/abs/2312.05209v1)|null|\n", "2312.05200": "|**2023-12-08**|**DelucionQA: Detecting Hallucinations in Domain-specific Question Answering**|Mobashir Sadat et.al.|[2312.05200v1](http://arxiv.org/abs/2312.05200v1)|**[link](https://github.com/boschresearch/delucionqa)**|\n", "2312.05180": "|**2023-12-12**|**PathFinder: Guided Search over Multi-Step Reasoning Paths**|Olga Golovneva et.al.|[2312.05180v2](http://arxiv.org/abs/2312.05180v2)|null|\n", "2312.04931": "|**2023-12-08**|**Retrieval-based Video Language Model for Efficient Long Video Question Answering**|Jiaqi Xu et.al.|[2312.04931v1](http://arxiv.org/abs/2312.04931v1)|null|\n", "2312.04837": "|**2023-12-12**|**Localized Symbolic Knowledge Distillation for Visual Commonsense Models**|Jae Sung Park et.al.|[2312.04837v2](http://arxiv.org/abs/2312.04837v2)|**[link](https://github.com/jamespark3922/localized-skd)**|\n", "2312.04746": "|**2023-12-07**|**Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos**|Mehmet Saygin Seyfioglu et.al.|[2312.04746v1](http://arxiv.org/abs/2312.04746v1)|null|\n", "2312.04684": "|**2023-12-07**|**Latent Skill Discovery for Chain-of-Thought Reasoning**|Zifan Xu et.al.|[2312.04684v1](http://arxiv.org/abs/2312.04684v1)|null|\n", "2312.06315": "|**2023-12-11**|**GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models**|Jiaxu Zhao et.al.|[2312.06315v1](http://arxiv.org/abs/2312.06315v1)|null|\n", "2312.06147": 
"|**2023-12-11**|**\"What's important here?\": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces**|Faria Huq et.al.|[2312.06147v1](http://arxiv.org/abs/2312.06147v1)|null|\n", "2312.05834": "|**2023-12-10**|**Evidence-based Interpretable Open-domain Fact-checking with Large Language Models**|Xin Tan et.al.|[2312.05834v1](http://arxiv.org/abs/2312.05834v1)|null|\n", "2312.05821": "|**2023-12-10**|**ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models**|Zhihang Yuan et.al.|[2312.05821v1](http://arxiv.org/abs/2312.05821v1)|**[link](https://github.com/hahnyuan/asvd4llm)**|\n", "2312.05696": "|**2023-12-09**|**GPT-4 and Safety Case Generation: An Exploratory Analysis**|Mithila Sivakumar et.al.|[2312.05696v1](http://arxiv.org/abs/2312.05696v1)|null|\n", "2312.05571": "|**2023-12-19**|**Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning**|Subhabrata Dutta et.al.|[2312.05571v2](http://arxiv.org/abs/2312.05571v2)|**[link](https://github.com/joykirat18/syrelm)**|\n", "2312.05562": "|**2023-12-09**|**Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models**|Guang Yang et.al.|[2312.05562v1](http://arxiv.org/abs/2312.05562v1)|**[link](https://github.com/ntdxyg/cotton)**|\n", "2312.05497": "|**2023-12-14**|**History Matters: Temporal Knowledge Editing in Large Language Model**|Xunjian Yin et.al.|[2312.05497v3](http://arxiv.org/abs/2312.05497v3)|**[link](https://github.com/arvid-pku/atoke)**|\n", "2312.05464": "|**2023-12-09**|**Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation**|Atoosa Chegini et.al.|[2312.05464v1](http://arxiv.org/abs/2312.05464v1)|null|\n", "2312.05434": "|**2023-12-09**|**Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models**|Hongzhan Lin 
et.al.|[2312.05434v1](http://arxiv.org/abs/2312.05434v1)|**[link](https://github.com/hkbunlp/mr.harm-emnlp2023)**|\n", "2312.05356": "|**2023-12-08**|**Neuron Patching: Neuron-level Model Editing on Code Generation and LLMs**|Jian Gu et.al.|[2312.05356v1](http://arxiv.org/abs/2312.05356v1)|null|\n", "2312.05291": "|**2023-12-08**|**GlitchBench: Can large multimodal models detect video game glitches?**|Mohammad Reza Taesiri et.al.|[2312.05291v1](http://arxiv.org/abs/2312.05291v1)|null|\n", "2312.05275": "|**2023-12-08**|**Exploring the Limits of ChatGPT in Software Security Applications**|Fangzhou Wu et.al.|[2312.05275v1](http://arxiv.org/abs/2312.05275v1)|null|\n", "2312.07533": "|**2023-12-14**|**VILA: On Pre-training for Visual Language Models**|Ji Lin et.al.|[2312.07533v2](http://arxiv.org/abs/2312.07533v2)|null|\n", "2312.07488": "|**2023-12-21**|**LMDrive: Closed-Loop End-to-End Driving with Large Language Models**|Hao Shao et.al.|[2312.07488v2](http://arxiv.org/abs/2312.07488v2)|**[link](https://github.com/opendilab/lmdrive)**|\n", "2312.07399": "|**2023-12-12**|**Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales**|Taeyoon Kwon et.al.|[2312.07399v1](http://arxiv.org/abs/2312.07399v1)|null|\n", "2312.07368": "|**2023-12-12**|**Sequential Planning in Large Partially Observable Environments guided by LLMs**|Swarna Kamal Paul et.al.|[2312.07368v1](http://arxiv.org/abs/2312.07368v1)|**[link](https://github.com/swarna-kpaul/neoplanner)**|\n", "2312.07110": "|**2023-12-12**|**LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature**|Maxime W\u00fcrsch et.al.|[2312.07110v1](http://arxiv.org/abs/2312.07110v1)|null|\n", "2312.07062": "|**2023-12-14**|**ThinkBot: Embodied Instruction Following with Thought Chain Reasoning**|Guanxing Lu et.al.|[2312.07062v2](http://arxiv.org/abs/2312.07062v2)|null|\n", "2312.06974": "|**2023-12-12**|**SM70: A Large Language Model for Medical 
Devices**|Anubhav Bhatti et.al.|[2312.06974v1](http://arxiv.org/abs/2312.06974v1)|null|\n", "2312.06876": "|**2023-12-11**|**Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks**|Lingfeng Sun et.al.|[2312.06876v1](http://arxiv.org/abs/2312.06876v1)|null|\n", "2312.06867": "|**2023-12-11**|**Get an A in Math: Progressive Rectification Prompting**|Zhenyu Wu et.al.|[2312.06867v1](http://arxiv.org/abs/2312.06867v1)|**[link](https://github.com/wzy6642/PRP)**|\n", "2312.06820": "|**2023-12-11**|**Extracting Self-Consistent Causal Insights from Users Feedback with LLMs and In-context Learning**|Sara Abdali et.al.|[2312.06820v1](http://arxiv.org/abs/2312.06820v1)|null|\n", "2312.06739": "|**2023-12-11**|**SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models**|Yuzhou Huang et.al.|[2312.06739v1](http://arxiv.org/abs/2312.06739v1)|**[link](https://github.com/TencentARC/SmartEdit)**|\n", "2312.06722": "|**2023-12-11**|**EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models**|Yi Chen et.al.|[2312.06722v1](http://arxiv.org/abs/2312.06722v1)|**[link](https://github.com/chenyi99/egoplan)**|\n", "2312.06720": "|**2023-12-13**|**Audio-Visual LLM for Video Understanding**|Fangxun Shu et.al.|[2312.06720v2](http://arxiv.org/abs/2312.06720v2)|null|\n", "2312.06677": "|**2023-12-04**|**Intelligent Virtual Assistants with LLM-based Process Automation**|Yanchu Guan et.al.|[2312.06677v1](http://arxiv.org/abs/2312.06677v1)|null|\n", "2312.08274": "|**2023-12-15**|**High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models**|Songchi Zhou et.al.|[2312.08274v3](http://arxiv.org/abs/2312.08274v3)|null|\n", "2312.07886": "|**2023-12-13**|**Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI**|Kai Huang 
et.al.|[2312.07886v1](http://arxiv.org/abs/2312.07886v1)|**[link](https://github.com/pittisl/mpnp-llm)**|\n", "2312.07850": "|**2023-12-13**|**Large Language Model Enhanced Multi-Agent Systems for 6G Communications**|Feibo Jiang et.al.|[2312.07850v1](http://arxiv.org/abs/2312.07850v1)|null|\n", "2312.07843": "|**2023-12-13**|**Foundation Models in Robotics: Applications, Challenges, and the Future**|Roya Firoozi et.al.|[2312.07843v1](http://arxiv.org/abs/2312.07843v1)|**[link](https://github.com/robotics-survey/awesome-robotics-foundation-models)**|\n", "2312.07819": "|**2023-12-13**|**Native Language Identification with Large Language Models**|Wei Zhang et.al.|[2312.07819v1](http://arxiv.org/abs/2312.07819v1)|null|\n", "2312.07763": "|**2023-12-12**|**Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization**|Min Zhang et.al.|[2312.07763v1](http://arxiv.org/abs/2312.07763v1)|null|\n", "2312.07552": "|**2023-12-07**|**Large Language Models for Intent-Driven Session Recommendations**|Zhu Sun et.al.|[2312.07552v1](http://arxiv.org/abs/2312.07552v1)|**[link](https://github.com/llm4sr/po4isr)**|\n", "2312.10748": "|**2023-12-17**|**Multi-Label Classification of COVID-Tweets Using Large Language Models**|Aniket Deroy et.al.|[2312.10748v1](http://arxiv.org/abs/2312.10748v1)|**[link](https://github.com/anonmous1981/aisome)**|\n", "2312.10730": "|**2023-12-17**|**Mixed Distillation Helps Smaller Language Model Better Reasoning**|Li Chenglin et.al.|[2312.10730v1](http://arxiv.org/abs/2312.10730v1)|null|\n", "2312.10626": "|**2023-12-17**|**Decoding Concerns: Multi-label Classification of Vaccine Sentiments in Social Media**|Somsubhra De et.al.|[2312.10626v1](http://arxiv.org/abs/2312.10626v1)|**[link](https://github.com/somsubhra04/aisome_2023)**|\n", "2312.10372": "|**2023-12-16**|**When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning**|Qihang Ai 
et.al.|[2312.10372v1](http://arxiv.org/abs/2312.10372v1)|null|\n", "2312.10321": "|**2023-12-16**|**LLM-SQL-Solver: Can LLMs Determine SQL Equivalence?**|Fuheng Zhao et.al.|[2312.10321v1](http://arxiv.org/abs/2312.10321v1)|null|\n", "2312.10003": "|**2023-12-15**|**ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent**|Renat Aksitov et.al.|[2312.10003v1](http://arxiv.org/abs/2312.10003v1)|null|\n", "2312.09947": "|**2023-12-15**|**Prompting Datasets: Data Discovery with Conversational Agents**|Johanna Walker et.al.|[2312.09947v1](http://arxiv.org/abs/2312.09947v1)|null|\n", "2312.09818": "|**2023-12-15**|**SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models**|Lee Hyun et.al.|[2312.09818v1](http://arxiv.org/abs/2312.09818v1)|**[link](https://github.com/smile-data/smile)**|\n", "2312.09785": "|**2023-12-28**|**RJUA-QA: A Comprehensive QA Dataset for Urology**|Shiwei Lyu et.al.|[2312.09785v2](http://arxiv.org/abs/2312.09785v2)|**[link](https://github.com/alipay/rju_ant_qa)**|\n", "2312.09542": "|**2023-12-15**|**Marathon: A Race Through the Realm of Long Context with Large Language Models**|Lei Zhang et.al.|[2312.09542v1](http://arxiv.org/abs/2312.09542v1)|null|\n", "2312.09397": "|**2023-12-14**|**Large Language Models for Autonomous Driving: Real-World Experiments**|Can Cui et.al.|[2312.09397v1](http://arxiv.org/abs/2312.09397v1)|null|\n", "2312.09237": "|**2023-12-14**|**Pixel Aligned Language Models**|Jiarui Xu et.al.|[2312.09237v1](http://arxiv.org/abs/2312.09237v1)|null|\n", "2312.09075": "|**2023-12-14**|**Towards Verifiable Text Generation with Evolving Memory and Self-Reflection**|Hao Sun et.al.|[2312.09075v1](http://arxiv.org/abs/2312.09075v1)|null|\n", "2312.09039": "|**2023-12-14**|**TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning**|Yuan Sui et.al.|[2312.09039v1](http://arxiv.org/abs/2312.09039v1)|null|\n", "2312.08962": 
"|**2023-12-14**|**Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models**|Zhiyuan You et.al.|[2312.08962v1](http://arxiv.org/abs/2312.08962v1)|null|\n", "2312.08935": "|**2023-12-14**|**Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning**|Peiyi Wang et.al.|[2312.08935v1](http://arxiv.org/abs/2312.08935v1)|null|\n", "2312.08926": "|**2023-12-17**|**Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent**|Haoran Liao et.al.|[2312.08926v2](http://arxiv.org/abs/2312.08926v2)|**[link](https://github.com/oashua/mathagent)**|\n", "2312.08901": "|**2023-12-26**|**Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning**|Xijie Huang et.al.|[2312.08901v2](http://arxiv.org/abs/2312.08901v2)|null|\n", "2312.08837": "|**2023-12-14**|**Learning Safety Constraints From Demonstration Using One-Class Decision Trees**|Mattijs Baert et.al.|[2312.08837v1](http://arxiv.org/abs/2312.08837v1)|null|\n", "2312.10059": "|**2023-12-04**|**A collection of principles for guiding and evaluating large language models**|Konstantin Hebenstreit et.al.|[2312.10059v1](http://arxiv.org/abs/2312.10059v1)|null|\n", "2312.12436": "|**2023-12-20**|**A Challenger to GPT-4V? 
Early Explorations of Gemini in Visual Expertise**|Chaoyou Fu et.al.|[2312.12436v2](http://arxiv.org/abs/2312.12436v2)|**[link](https://github.com/bradyfu/awesome-multimodal-large-language-models)**|\n", "2312.12423": "|**2023-12-19**|**Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model**|Shraman Pramanick et.al.|[2312.12423v1](http://arxiv.org/abs/2312.12423v1)|null|\n", "2312.12241": "|**2023-12-19**|**GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning**|Mehran Kazemi et.al.|[2312.12241v1](http://arxiv.org/abs/2312.12241v1)|null|\n", "2312.12009": "|**2023-12-19**|**Active Preference Inference using Language Models and Probabilistic Reasoning**|Top Piriyakulkij et.al.|[2312.12009v1](http://arxiv.org/abs/2312.12009v1)|null|\n", "2312.11865": "|**2023-12-19**|**Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach**|Weiyu Ma et.al.|[2312.11865v1](http://arxiv.org/abs/2312.11865v1)|**[link](https://github.com/histmeisah/large-language-models-play-starcraftii)**|\n", "2312.11370": "|**2023-12-18**|**G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model**|Jiahui Gao et.al.|[2312.11370v1](http://arxiv.org/abs/2312.11370v1)|**[link](https://github.com/pipilurj/g-llava)**|\n", "2312.11364": "|**2023-12-18**|**Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure**|Tristan Bester et.al.|[2312.11364v1](http://arxiv.org/abs/2312.11364v1)|null|\n", "2312.11336": "|**2023-12-18**|**DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation**|Yu Wang et.al.|[2312.11336v1](http://arxiv.org/abs/2312.11336v1)|null|\n", "2312.11282": "|**2023-12-18**|**LLM-ARK: Knowledge Graph Reasoning Using Large Language Models via Deep Reinforcement Learning**|Yuxuan Huang 
et.al.|[2312.11282v1](http://arxiv.org/abs/2312.11282v1)|**[link](https://github.com/Aipura/LLM-ARK)**|\n", "2312.11111": "|**2023-12-19**|**The Good, The Bad, and Why: Unveiling Emotions in Generative AI**|Cheng Li et.al.|[2312.11111v2](http://arxiv.org/abs/2312.11111v2)|null|\n", "2312.10908": "|**2023-12-18**|**CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update**|Zhi Gao et.al.|[2312.10908v1](http://arxiv.org/abs/2312.10908v1)|null|\n", "2312.10904": "|**2023-12-18**|**Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)**|Sabrina Toro et.al.|[2312.10904v1](http://arxiv.org/abs/2312.10904v1)|null|\n", "2312.09979": "|**2023-12-18**|**LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment**|Shihan Dou et.al.|[2312.09979v2](http://arxiv.org/abs/2312.09979v2)|null|\n", "2312.11524": "|**2023-12-13**|**Assessing GPT4-V on Structured Reasoning Tasks**|Mukul Singh et.al.|[2312.11524v1](http://arxiv.org/abs/2312.11524v1)|null|\n", "2312.11521": "|**2023-12-13**|**Large Language Models are Complex Table Parsers**|Bowen Zhao et.al.|[2312.11521v1](http://arxiv.org/abs/2312.11521v1)|null|\n", "2312.11518": "|**2023-12-23**|**User Modeling in the Era of Large Language Models: Current Research and Future Directions**|Zhaoxuan Tan et.al.|[2312.11518v2](http://arxiv.org/abs/2312.11518v2)|**[link](https://github.com/tamsiuhin/llm-um-reading)**|\n", "2312.13264": "|**2023-12-20**|**dIR -- Discrete Information Retrieval: Conversational Search over Unstructured (and Structured) Data with Large Language Models**|Pablo M. 
Rodriguez Bertorello et.al.|[2312.13264v1](http://arxiv.org/abs/2312.13264v1)|null|\n", "2312.13126": "|**2023-12-20**|**Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions**|Deepank Verma et.al.|[2312.13126v1](http://arxiv.org/abs/2312.13126v1)|null|\n", "2312.13108": "|**2024-01-01**|**ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation**|Difei Gao et.al.|[2312.13108v2](http://arxiv.org/abs/2312.13108v2)|null|\n", "2312.12853": "|**2023-12-20**|**CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models**|Dan Shi et.al.|[2312.12853v1](http://arxiv.org/abs/2312.12853v1)|null|\n", "2312.12832": "|**2023-12-20**|**Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data**|Yiwei Li et.al.|[2312.12832v1](http://arxiv.org/abs/2312.12832v1)|**[link](https://github.com/Yiwei98/TDG)**|\n", "2312.12806": "|**2023-12-20**|**MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models**|Yan Cai et.al.|[2312.12806v1](http://arxiv.org/abs/2312.12806v1)|null|\n", "2312.12575": "|**2023-12-19**|**Can Large Language Models Identify And Reason About Security Vulnerabilities? 
Not Yet**|Saad Ullah et.al.|[2312.12575v1](http://arxiv.org/abs/2312.12575v1)|null|\n", "2312.14074": "|**2023-12-21**|**LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding**|Senqiao Yang et.al.|[2312.14074v1](http://arxiv.org/abs/2312.14074v1)|null|\n", "2312.14033": "|**2024-01-04**|**T-Eval: Evaluating the Tool Utilization Capability Step by Step**|Zehui Chen et.al.|[2312.14033v2](http://arxiv.org/abs/2312.14033v2)|**[link](https://github.com/open-compass/t-eval)**|\n", "2312.13881": "|**2023-12-21**|**Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs**|Juraj Vladika et.al.|[2312.13881v1](http://arxiv.org/abs/2312.13881v1)|null|\n", "2312.13876": "|**2023-12-21**|**Capture the Flag: Uncovering Data Insights with Large Language Models**|Issam Laradji et.al.|[2312.13876v1](http://arxiv.org/abs/2312.13876v1)|null|\n", "2312.13558": "|**2023-12-21**|**The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction**|Pratyusha Sharma et.al.|[2312.13558v1](http://arxiv.org/abs/2312.13558v1)|**[link](https://github.com/pratyushasharma/laser)**|\n", "2312.13557": "|**2023-12-21**|**Empowering Few-Shot Recommender Systems with Large Language Models -- Enhanced Representations**|Zhoumeng Wang et.al.|[2312.13557v1](http://arxiv.org/abs/2312.13557v1)|**[link](https://github.com/JNY-Wang/ChatGPT-processed-representations)**|\n", "2312.14890": "|**2024-01-12**|**NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes**|Lizhou Fan et.al.|[2312.14890v3](http://arxiv.org/abs/2312.14890v3)|**[link](https://github.com/casmlab/nphardeval)**|\n", "2312.14878": "|**2023-12-22**|**Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning**|Filippos Christianos et.al.|[2312.14878v1](http://arxiv.org/abs/2312.14878v1)|null|\n", "2312.14870": "|**2023-12-22**|**Numerical Reasoning for Financial 
Reports**|Abhinav Arun et.al.|[2312.14870v1](http://arxiv.org/abs/2312.14870v1)|**[link](https://github.com/abhi23run/cse8803_dlt_project)**|\n", "2312.14856": "|**2023-12-22**|**Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code**|Shahin Honarvar et.al.|[2312.14856v1](http://arxiv.org/abs/2312.14856v1)|**[link](https://github.com/shahinhonarvar/turbulence-benchmark)**|\n", "2312.14591": "|**2023-12-22**|**Reasons to Reject? Aligning Language Models with Judgments**|Weiwen Xu et.al.|[2312.14591v1](http://arxiv.org/abs/2312.14591v1)|**[link](https://github.com/wwxu21/cut)**|\n", "2312.14345": "|**2023-12-22**|**Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs**|Behnam Rahdari et.al.|[2312.14345v1](http://arxiv.org/abs/2312.14345v1)|null|\n", "2312.14233": "|**2023-12-21**|**VCoder: Versatile Vision Encoders for Multimodal Large Language Models**|Jitesh Jain et.al.|[2312.14233v1](http://arxiv.org/abs/2312.14233v1)|**[link](https://github.com/shi-labs/vcoder)**|\n", "2312.14226": "|**2023-12-21**|**Deep de Finetti: Recovering Topic Distributions from Large Language Models**|Liyi Zhang et.al.|[2312.14226v1](http://arxiv.org/abs/2312.14226v1)|null|\n", "2312.14215": "|**2023-12-21**|**SimLM: Can Language Models Infer Parameters of Physical Systems?**|Sean Memery et.al.|[2312.14215v1](http://arxiv.org/abs/2312.14215v1)|null|\n", "2312.14184": "|**2023-12-19**|**Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning**|Xiaodan Zhang et.al.|[2312.14184v1](http://arxiv.org/abs/2312.14184v1)|null|\n", "2312.16132": "|**2023-12-26**|**RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models**|Tianhao Shen et.al.|[2312.16132v1](http://arxiv.org/abs/2312.16132v1)|**[link](https://github.com/magnetic2014/roleeval)**|\n", "2312.16127": "|**2024-01-03**|**LLM-SAP: Large Language Model Situational Awareness 
Based Planning**|Liman Wang et.al.|[2312.16127v3](http://arxiv.org/abs/2312.16127v3)|**[link](https://github.com/hanyangzhong/situational_planning_datasets)**|\n", "2312.16044": "|**2023-12-26**|**Large Language Models as Traffic Signal Control Agents: Capacity and Opportunity**|Siqi Lai et.al.|[2312.16044v1](http://arxiv.org/abs/2312.16044v1)|**[link](https://github.com/usail-hkust/llmtscs)**|\n", "2312.15915": "|**2023-12-26**|**ChartBench: A Benchmark for Complex Visual Reasoning in Charts**|Zhengzhuo Xu et.al.|[2312.15915v1](http://arxiv.org/abs/2312.15915v1)|null|\n", "2312.15880": "|**2023-12-26**|**KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph**|Tiezheng Guo et.al.|[2312.15880v1](http://arxiv.org/abs/2312.15880v1)|null|\n", "2312.15316": "|**2023-12-23**|**Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue**|Guan-Ting Lin et.al.|[2312.15316v1](http://arxiv.org/abs/2312.15316v1)|null|\n", "2312.15310": "|**2023-12-23**|**Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations**|Mohammad Mahmudul Alam et.al.|[2312.15310v1](http://arxiv.org/abs/2312.15310v1)|**[link](https://github.com/mahmudulalam/subitizing)**|\n", "2312.15224": "|**2024-01-09**|**LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination**|Jijia Liu et.al.|[2312.15224v2](http://arxiv.org/abs/2312.15224v2)|**[link](https://github.com/HosnLS/Hierarchical-Language-Agent)**|\n", "2312.15198": "|**2023-12-23**|**Do LLM Agents Exhibit Social Behavior?**|Yan Leng et.al.|[2312.15198v1](http://arxiv.org/abs/2312.15198v1)|null|\n", "2312.15194": "|**2023-12-23**|**PokeMQA: Programmable knowledge editing for Multi-hop Question Answering**|Hengrui Gu et.al.|[2312.15194v1](http://arxiv.org/abs/2312.15194v1)|**[link](https://github.com/hengrui-gu/pokemqa)**|\n", "2312.15099": "|**2023-12-22**|**Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large 
Language Models**|Nishant Vishwamitra et.al.|[2312.15099v1](http://arxiv.org/abs/2312.15099v1)|**[link](https://github.com/cactilab/hateguard)**|\n", "2312.17240": "|**2024-01-03**|**An Improved Baseline for Reasoning Segmentation with Large Language Model**|Senqiao Yang et.al.|[2312.17240v2](http://arxiv.org/abs/2312.17240v2)|null|\n", "2312.17235": "|**2023-12-28**|**A Simple LLM Framework for Long-Range Video Question-Answering**|Ce Zhang et.al.|[2312.17235v1](http://arxiv.org/abs/2312.17235v1)|null|\n", "2312.17122": "|**2023-12-29**|**Large Language Model for Causal Decision Making**|Haitao Jiang et.al.|[2312.17122v2](http://arxiv.org/abs/2312.17122v2)|null|\n", "2312.17080": "|**2023-12-28**|**Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs**|Zhongshen Zeng et.al.|[2312.17080v1](http://arxiv.org/abs/2312.17080v1)|**[link](https://github.com/dvlab-research/diaggsm8k)**|\n", "2312.17055": "|**2023-12-28**|**Improving In-context Learning via Bidirectional Alignment**|Chengwei Qin et.al.|[2312.17055v1](http://arxiv.org/abs/2312.17055v1)|null|\n", "2312.17025": "|**2023-12-29**|**Experiential Co-Learning of Software-Developing Agents**|Chen Qian et.al.|[2312.17025v2](http://arxiv.org/abs/2312.17025v2)|null|\n", "2312.16702": "|**2023-12-27**|**Rethinking Tabular Data Understanding with Large Language Models**|Tianyang Liu et.al.|[2312.16702v1](http://arxiv.org/abs/2312.16702v1)|**[link](https://github.com/Leolty/tablellm)**|\n", "2312.16279": "|**2023-12-26**|**Cloud-Device Collaborative Learning for Multimodal Large Language Models**|Guanqun Wang et.al.|[2312.16279v1](http://arxiv.org/abs/2312.16279v1)|null|\n", "2312.16275": "|**2023-12-26**|**Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models**|Fan Liu et.al.|[2312.16275v1](http://arxiv.org/abs/2312.16275v1)|null|\n", "2312.16262": "|**2023-12-26**|**Dynamic In-Context Learning from Nearest Neighbors for Bundle 
Generation**|Zhu Sun et.al.|[2312.16262v1](http://arxiv.org/abs/2312.16262v1)|null|\n", "2312.16217": "|**2023-12-24**|**ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation**|Xiaoqi Li et.al.|[2312.16217v1](http://arxiv.org/abs/2312.16217v1)|null|\n", "2312.17016": "|**2023-12-23**|**On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications**|Chenjiao Tan et.al.|[2312.17016v1](http://arxiv.org/abs/2312.17016v1)|null|\n", "2312.17661": "|**2023-12-29**|**Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models**|Yuqing Wang et.al.|[2312.17661v1](http://arxiv.org/abs/2312.17661v1)|**[link](https://github.com/eternityyw/gemini-commonsense-evaluation)**|\n", "2312.17532": "|**2023-12-29**|**Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception**|Yuncheng Huang et.al.|[2312.17532v1](http://arxiv.org/abs/2312.17532v1)|null|\n", "2312.17515": "|**2023-12-29**|**Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game**|Zijing Shi et.al.|[2312.17515v1](http://arxiv.org/abs/2312.17515v1)|null|\n", "2312.17432": "|**2024-01-04**|**Video Understanding with Large Language Models: A Survey**|Yunlong Tang et.al.|[2312.17432v2](http://arxiv.org/abs/2312.17432v2)|**[link](https://github.com/yunlong10/awesome-llms-for-video-understanding)**|\n", "2312.17259": "|**2023-12-22**|**Empowering Working Memory for Large Language Model Agents**|Jing Guo et.al.|[2312.17259v1](http://arxiv.org/abs/2312.17259v1)|null|\n", "2312.10997": "|**2024-01-05**|**Retrieval-Augmented Generation for Large Language Models: A Survey**|Yunfan Gao et.al.|[2312.10997v4](http://arxiv.org/abs/2312.10997v4)|**[link](https://github.com/tongji-kgllm/rag-survey)**|\n", "2401.02415": "|**2024-01-04**|**LLaMA Pro: Progressive LLaMA with Block Expansion**|Chengyue Wu 
et.al.|[2401.02415v1](http://arxiv.org/abs/2401.02415v1)|**[link](https://github.com/tencentarc/llama-pro)**|\n", "2401.02132": "|**2024-01-04**|**DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models**|Wendi Cui et.al.|[2401.02132v1](http://arxiv.org/abs/2401.02132v1)|**[link](https://github.com/intuit-ai-research/dcr-consistency)**|\n", "2401.02072": "|**2024-01-04**|**ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers**|Chen Zheng et.al.|[2401.02072v1](http://arxiv.org/abs/2401.02072v1)|null|\n", "2401.02009": "|**2024-01-04**|**Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**|Wenqi Zhang et.al.|[2401.02009v1](http://arxiv.org/abs/2401.02009v1)|null|\n", "2401.01974": "|**2024-01-03**|**Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers**|Aleksandar Stani\u0107 et.al.|[2401.01974v1](http://arxiv.org/abs/2401.01974v1)|null|\n", "2401.01735": "|**2024-01-03**|**Economics Arena for Large Language Models**|Shangmin Guo et.al.|[2401.01735v1](http://arxiv.org/abs/2401.01735v1)|null|\n", "2401.01312": "|**2024-01-02**|**LLM Harmony: Multi-Agent Communication for Problem Solving**|Sumedh Rasal et.al.|[2401.01312v1](http://arxiv.org/abs/2401.01312v1)|**[link](https://github.com/sumedhrasal/simulation)**|\n", "2401.00812": "|**2024-01-08**|**If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents**|Ke Yang et.al.|[2401.00812v2](http://arxiv.org/abs/2401.00812v2)|null|\n", "2401.00757": "|**2024-01-01**|**A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models**|Yuxuan Wan et.al.|[2401.00757v1](http://arxiv.org/abs/2401.00757v1)|null|\n", "2401.00908": "|**2023-12-31**|**DocLLM: A layout-aware generative language model for multimodal document understanding**|Dongsheng Wang 
et.al.|[2401.00908v1](http://arxiv.org/abs/2401.00908v1)|null|\n", "2401.00907": "|**2023-12-31**|**LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models**|Qianxi Li et.al.|[2401.00907v1](http://arxiv.org/abs/2401.00907v1)|null|\n", "2401.00448": "|**2023-12-31**|**Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws**|Nikhil Sardana et.al.|[2401.00448v1](http://arxiv.org/abs/2401.00448v1)|null|\n", "2401.00426": "|**2023-12-31**|**keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM**|Chaojie Wang et.al.|[2401.00426v1](http://arxiv.org/abs/2401.00426v1)|null|\n", "2401.00290": "|**2023-12-30**|**Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks**|Aleksander Buszydlik et.al.|[2401.00290v1](http://arxiv.org/abs/2401.00290v1)|**[link](https://github.com/redteamingforllms/redteamingforllms)**|\n", "2401.00139": "|**2023-12-30**|**Is Knowledge All Large Language Models Needed for Causal Reasoning?**|Hengrui Cai et.al.|[2401.00139v1](http://arxiv.org/abs/2401.00139v1)|**[link](https://github.com/ncsulsj/causal_llm)**|\n", "2401.00125": "|**2023-12-30**|**LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning**|S P Sharan et.al.|[2401.00125v1](http://arxiv.org/abs/2401.00125v1)|null|\n", "2401.02954": "|**2024-01-05**|**DeepSeek LLM: Scaling Open-Source Language Models with Longtermism**|DeepSeek-AI et.al.|[2401.02954v1](http://arxiv.org/abs/2401.02954v1)|**[link](https://github.com/deepseek-ai/deepseek-llm)**|\n", "2401.02777": "|**2024-01-05**|**From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models**|Na Liu et.al.|[2401.02777v1](http://arxiv.org/abs/2401.02777v1)|null|\n", "2401.02695": "|**2024-01-05**|**VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model**|Pengying Wu et.al.|[2401.02695v1](http://arxiv.org/abs/2401.02695v1)|null|\n", 
"2401.02675": "|**2024-01-05**|**LMaaS: Exploring Pricing Strategy of Large Model as a Service for Communication**|Panlong Wu et.al.|[2401.02675v1](http://arxiv.org/abs/2401.02675v1)|null|\n", "2401.02575": "|**2024-01-04**|**Large Language Models for Social Networks: Applications, Challenges, and Solutions**|Jingying Zeng et.al.|[2401.02575v1](http://arxiv.org/abs/2401.02575v1)|null|\n", "2401.03991": "|**2024-01-08**|**Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark**|Fangjun Li et.al.|[2401.03991v1](http://arxiv.org/abs/2401.03991v1)|**[link](https://github.com/Fangjun-Li/SpatialLM-StepGame)**|\n", "2401.03804": "|**2024-01-08**|**TeleChat Technical Report**|Zihan Wang et.al.|[2401.03804v1](http://arxiv.org/abs/2401.03804v1)|null|\n", "2401.03737": "|**2024-01-08**|**Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock Selection**|Georgios Fatouros et.al.|[2401.03737v1](http://arxiv.org/abs/2401.03737v1)|null|\n", "2401.03653": "|**2024-01-10**|**An exploratory study on automatic identification of assumptions in the development of deep learning frameworks**|Chen Yang et.al.|[2401.03653v2](http://arxiv.org/abs/2401.03653v2)|null|\n", "2401.03630": "|**2024-01-08**|**Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet**|Weizhe Chen et.al.|[2401.03630v1](http://arxiv.org/abs/2401.03630v1)|null|\n", "2401.03411": "|**2024-01-07**|**GRAM: Global Reasoning for Multi-Page VQA**|Tsachi Blau et.al.|[2401.03411v1](http://arxiv.org/abs/2401.03411v1)|null|\n", "2401.03408": "|**2024-01-07**|**Escalation Risks from Language Models in Military and Diplomatic Decision-Making**|Juan-Pablo Rivera et.al.|[2401.03408v1](http://arxiv.org/abs/2401.03408v1)|**[link](https://github.com/jprivera44/EscalAItion)**|\n", "2401.03401": "|**2024-01-07**|**Empirical Study of Large Language Models as Automated Essay Scoring Tools in English 
Composition__Taking TOEFL Independent Writing Task for Example**|Wei Xia et.al.|[2401.03401v1](http://arxiv.org/abs/2401.03401v1)|null|\n", "2401.03374": "|**2024-01-07**|**LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward**|Nafis Tanveer Islam et.al.|[2401.03374v1](http://arxiv.org/abs/2401.03374v1)|null|\n", "2401.03346": "|**2024-01-07**|**An Investigation of Large Language Models for Real-World Hate Speech Detection**|Keyan Guo et.al.|[2401.03346v1](http://arxiv.org/abs/2401.03346v1)|null|\n", "2401.03183": "|**2024-01-06**|**\u03b4-CAUSAL: Exploring Defeasibility in Causal Reasoning**|Shaobo Cui et.al.|[2401.03183v1](http://arxiv.org/abs/2401.03183v1)|null|\n", "2401.03158": "|**2024-01-06**|**Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification**|Hui Wu et.al.|[2401.03158v1](http://arxiv.org/abs/2401.03158v1)|null|\n", "2401.02985": "|**2024-01-02**|**Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education**|Vahid Ashrafimoghari et.al.|[2401.02985v1](http://arxiv.org/abs/2401.02985v1)|null|\n", "2401.02982": "|**2024-01-01**|**BIBench: Benchmarking Data Analysis Knowledge of Large Language Models**|Shu Liu et.al.|[2401.02982v1](http://arxiv.org/abs/2401.02982v1)|**[link](https://github.com/cubenlp/BIBench)**|\n", "2401.04518": "|**2024-01-09**|**The Critique of Critique**|Shichao Sun et.al.|[2401.04518v1](http://arxiv.org/abs/2401.04518v1)|**[link](https://github.com/gair-nlp/metacritique)**|\n", "2401.04398": "|**2024-01-19**|**Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding**|Zilong Wang et.al.|[2401.04398v2](http://arxiv.org/abs/2401.04398v2)|null|\n", "2401.04334": "|**2024-01-09**|**Large Language Models for Robotics: Opportunities, Challenges, and Perspectives**|Jiaqi Wang et.al.|[2401.04334v1](http://arxiv.org/abs/2401.04334v1)|null|\n", "2401.04319": "|**2024-01-09**|**Know Your Needs Better: Towards 
Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs**|Junjie Wang et.al.|[2401.04319v1](http://arxiv.org/abs/2401.04319v1)|null|\n", "2401.04218": "|**2024-01-08**|**Distortions in Judged Spatial Relations in Large Language Models: The Dawn of Natural Language Geographic Data?**|Nir Fulman et.al.|[2401.04218v1](http://arxiv.org/abs/2401.04218v1)|null|\n", "2401.04157": "|**2024-01-08**|**RePLan: Robotic Replanning with Perception and Language Models**|Marta Skreta et.al.|[2401.04157v1](http://arxiv.org/abs/2401.04157v1)|null|\n", "2401.05190": "|**2024-01-10**|**Divide and Conquer for Large Language Models Reasoning**|Zijie Meng et.al.|[2401.05190v1](http://arxiv.org/abs/2401.05190v1)|**[link](https://github.com/aimijie/divide-and-conquer)**|\n", "2401.04925": "|**2024-01-20**|**The Impact of Reasoning Step Length on Large Language Models**|Mingyu Jin et.al.|[2401.04925v3](http://arxiv.org/abs/2401.04925v3)|**[link](https://github.com/jmyissb/The-Impact-of-Reasoning-Step-Length-on-Large-Language-Models)**|\n", "2401.06102": "|**2024-01-12**|**Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models**|Asma Ghandeharioun et.al.|[2401.06102v2](http://arxiv.org/abs/2401.06102v2)|null|\n", "2401.06088": "|**2024-01-11**|**Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models**|K M Sajjadul Islam et.al.|[2401.06088v1](http://arxiv.org/abs/2401.06088v1)|null|\n", "2401.06081": "|**2024-01-11**|**Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint**|Zhipeng Chen et.al.|[2401.06081v1](http://arxiv.org/abs/2401.06081v1)|**[link](https://github.com/rucaibox/rlmec)**|\n", "2401.05799": "|**2024-01-11**|**Designing Heterogeneous LLM Agents for Financial Sentiment Analysis**|Frank Xing et.al.|[2401.05799v1](http://arxiv.org/abs/2401.05799v1)|null|\n", "2401.05702": "|**2024-01-11**|**Video Anomaly Detection and 
Explanation via Large Language Models**|Hui Lv et.al.|[2401.05702v1](http://arxiv.org/abs/2401.05702v1)|null|\n", "2401.05654": "|**2024-01-11**|**Towards Conversational Diagnostic AI**|Tao Tu et.al.|[2401.05654v1](http://arxiv.org/abs/2401.05654v1)|null|\n", "2401.05618": "|**2024-01-11**|**The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models**|Matthew Renze et.al.|[2401.05618v1](http://arxiv.org/abs/2401.05618v1)|**[link](https://github.com/matthewrenze/jhu-concise-cot)**|\n", "2401.05605": "|**2024-01-11**|**Scaling Laws for Forgetting When Fine-Tuning Large Language Models**|Damjan Kalajdzievski et.al.|[2401.05605v1](http://arxiv.org/abs/2401.05605v1)|null|\n", "2401.05604": "|**2024-01-11**|**REBUS: A Robust Evaluation Benchmark of Understanding Symbols**|Andrew Gritsevskiy et.al.|[2401.05604v1](http://arxiv.org/abs/2401.05604v1)|**[link](https://github.com/cvndsh/rebus)**|\n", "2401.05566": "|**2024-01-17**|**Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training**|Evan Hubinger et.al.|[2401.05566v3](http://arxiv.org/abs/2401.05566v3)|**[link](https://github.com/anthropics/sleeper-agents-paper)**|\n", "2401.05459": "|**2024-01-10**|**Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security**|Yuanchun Li et.al.|[2401.05459v1](http://arxiv.org/abs/2401.05459v1)|**[link](https://github.com/mobilellm/personal_llm_agents_survey)**|\n", "2401.06603": "|**2024-01-12**|**Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study**|Shangding Gu et.al.|[2401.06603v1](http://arxiv.org/abs/2401.06603v1)|null|\n", "2401.06400": "|**2024-01-16**|**Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model**|Taehee Kim et.al.|[2401.06400v2](http://arxiv.org/abs/2401.06400v2)|null|\n", "2401.06311": "|**2024-01-12**|**MuGI: Enhancing Information Retrieval 
through Multi-Text Generation Intergration with Large Language Models**|Le Zhang et.al.|[2401.06311v1](http://arxiv.org/abs/2401.06311v1)|**[link](https://github.com/lezhang7/retrieval_mugi)**|\n", "2401.06209": "|**2024-01-11**|**Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs**|Shengbang Tong et.al.|[2401.06209v1](http://arxiv.org/abs/2401.06209v1)|**[link](https://github.com/tsb0601/MMVP)**|\n", "2401.08392": "|**2024-01-16**|**DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models**|Zongxin Yang et.al.|[2401.08392v1](http://arxiv.org/abs/2401.08392v1)|**[link](https://github.com/z-x-yang/doraemongpt)**|\n", "2401.08273": "|**2024-01-16**|**Large Language Models are Null-Shot Learners**|Pittawat Taveekitworachai et.al.|[2401.08273v1](http://arxiv.org/abs/2401.08273v1)|null|\n", "2401.08217": "|**2024-01-16**|**LLM-Guided Multi-View Hypergraph Learning for Human-Centric Explainable Recommendation**|Zhixuan Chu et.al.|[2401.08217v1](http://arxiv.org/abs/2401.08217v1)|null|\n", "2401.08190": "|**2024-01-16**|**MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline**|Minpeng Liao et.al.|[2401.08190v1](http://arxiv.org/abs/2401.08190v1)|**[link](https://github.com/mario-math-reasoning/mario)**|\n", "2401.08156": "|**2024-01-16**|**GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching**|Cong Guo et.al.|[2401.08156v1](http://arxiv.org/abs/2401.08156v1)|**[link](https://github.com/intelligent-machine-learning/glake)**|\n", "2401.08138": "|**2024-01-16**|**LLMs for Test Input Generation for Semantic Caches**|Zafaryab Rasool et.al.|[2401.08138v1](http://arxiv.org/abs/2401.08138v1)|null|\n", "2401.08089": "|**2024-01-16**|**A Study on Training and Developing Large Language Models for Behavior Tree Generation**|Fu Li et.al.|[2401.08089v1](http://arxiv.org/abs/2401.08089v1)|null|\n", "2401.07950": "|**2024-01-15**|**SciGLM: Training Scientific 
Language Models with Self-Reflective Instruction Annotation and Tuning**|Dan Zhang et.al.|[2401.07950v1](http://arxiv.org/abs/2401.07950v1)|**[link](https://github.com/thudm/sciglm)**|\n", "2401.07817": "|**2024-01-15**|**Question Translation Training for Better Multilingual Reasoning**|Wenhao Zhu et.al.|[2401.07817v1](http://arxiv.org/abs/2401.07817v1)|**[link](https://github.com/njunlp/qalign)**|\n", "2401.07810": "|**2024-01-15**|**Consolidating Strategies for Countering Hate Speech Using Persuasive Dialogues**|Sougata Saha et.al.|[2401.07810v1](http://arxiv.org/abs/2401.07810v1)|null|\n", "2401.07534": "|**2024-01-15**|**Exploring the Potential of Large Language Models in Self-adaptive Systems**|Jialong Li et.al.|[2401.07534v1](http://arxiv.org/abs/2401.07534v1)|null|\n", "2401.07529": "|**2024-01-15**|**MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception**|Yuhao Wang et.al.|[2401.07529v1](http://arxiv.org/abs/2401.07529v1)|null|\n", "2401.07367": "|**2024-01-14**|**Active Learning for NLP with Large Language Models**|Xuesong Wang et.al.|[2401.07367v1](http://arxiv.org/abs/2401.07367v1)|null|\n", "2401.07286": "|**2024-01-14**|**CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning**|Weiqi Wang et.al.|[2401.07286v1](http://arxiv.org/abs/2401.07286v1)|null|\n", "2401.07128": "|**2024-01-13**|**EHRAgent: Code Empowers Large Language Models for Complex Tabular Reasoning on Electronic Health Records**|Wenqi Shi et.al.|[2401.07128v1](http://arxiv.org/abs/2401.07128v1)|**[link](https://github.com/wshi83/ehragent)**|\n", "2401.07037": "|**2024-01-13**|**xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning**|Linzheng Chai et.al.|[2401.07037v1](http://arxiv.org/abs/2401.07037v1)|null|\n", "2401.06961": "|**2024-01-13**|**CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning 
Capabilities**|Yujun Mao et.al.|[2401.06961v1](http://arxiv.org/abs/2401.06961v1)|null|\n", "2401.06949": "|**2024-01-13**|**ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization**|Kourosh Darvish et.al.|[2401.06949v1](http://arxiv.org/abs/2401.06949v1)|null|\n", "2401.06853": "|**2024-01-12**|**Large Language Models Can Learn Temporal Reasoning**|Siheng Xiong et.al.|[2401.06853v1](http://arxiv.org/abs/2401.06853v1)|null|\n", "2401.06806": "|**2024-01-10**|**AugSumm: towards generalizable speech summarization using synthetic labels from large language model**|Jee-weon Jung et.al.|[2401.06806v1](http://arxiv.org/abs/2401.06806v1)|**[link](https://github.com/jungjee/augsumm)**|\n", "2401.06805": "|**2024-01-18**|**Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning**|Yiqi Wang et.al.|[2401.06805v2](http://arxiv.org/abs/2401.06805v2)|null|\n", "2401.06795": "|**2024-01-08**|**AI and Generative AI for Research Discovery and Summarization**|Mark Glickman et.al.|[2401.06795v1](http://arxiv.org/abs/2401.06795v1)|null|\n", "2401.09395": "|**2024-01-17**|**Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs' Mathematical Competency through Ontology-guided Perturbations**|Pengfei Hong et.al.|[2401.09395v1](http://arxiv.org/abs/2401.09395v1)|null|\n", "2401.09334": "|**2024-01-17**|**Large Language Models Are Neurosymbolic Reasoners**|Meng Fang et.al.|[2401.09334v1](http://arxiv.org/abs/2401.09334v1)|null|\n", "2401.09083": "|**2024-01-17**|**Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models**|Haonan Guo et.al.|[2401.09083v1](http://arxiv.org/abs/2401.09083v1)|**[link](https://github.com/haonanguo/remote-sensing-chatgpt)**|\n", "2401.09051": "|**2024-01-17**|**Canvil: Designerly Adaptation for LLM-Powered User Experiences**|K. J. 
Kevin Feng et.al.|[2401.09051v1](http://arxiv.org/abs/2401.09051v1)|null|\n", "2401.09042": "|**2024-01-17**|**LLMs for Relational Reasoning: How Far are We?**|Zhiming Li et.al.|[2401.09042v1](http://arxiv.org/abs/2401.09042v1)|null|\n", "2401.09003": "|**2024-01-30**|**Augmenting Math Word Problems via Iterative Question Composing**|Haoxiong Liu et.al.|[2401.09003v3](http://arxiv.org/abs/2401.09003v3)|**[link](https://github.com/iiis-ai/iterativequestioncomposing)**|\n", "2401.08967": "|**2024-01-17**|**ReFT: Reasoning with Reinforced Fine-Tuning**|Trung Quoc Luong et.al.|[2401.08967v1](http://arxiv.org/abs/2401.08967v1)|**[link](https://github.com/lqtrung1998/mwp_reft)**|\n", "2401.08743": "|**2024-01-16**|**MMToM-QA: Multimodal Theory of Mind Question Answering**|Chuanyang Jin et.al.|[2401.08743v1](http://arxiv.org/abs/2401.08743v1)|**[link](https://github.com/chuanyangjin/MMToM-QA)**|\n", "2401.08517": "|**2024-01-24**|**Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring**|Hasan Abu-Rasheed et.al.|[2401.08517v3](http://arxiv.org/abs/2401.08517v3)|null|\n", "2401.08508": "|**2024-01-16**|**EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis**|Zhiwei Liu et.al.|[2401.08508v1](http://arxiv.org/abs/2401.08508v1)|**[link](https://github.com/lzw108/emollms)**|\n", "2401.08491": "|**2024-01-24**|**Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models**|Tassilo Klein et.al.|[2401.08491v2](http://arxiv.org/abs/2401.08491v2)|null|\n", "2401.10065": "|**2024-01-18**|**Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs**|Haritz Puerto et.al.|[2401.10065v1](http://arxiv.org/abs/2401.10065v1)|**[link](https://github.com/ukplab/arxiv2024-conditional-reasoning-llms)**|\n", "2401.10005": "|**2024-01-18**|**Advancing Large Multi-modal 
Models with Explicit Chain-of-Reasoning and Visual Question Generation**|Kohei Uehara et.al.|[2401.10005v1](http://arxiv.org/abs/2401.10005v1)|null|\n", "2401.10744": "|**2024-01-19**|**FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models**|Ziqiang Yuan et.al.|[2401.10744v1](http://arxiv.org/abs/2401.10744v1)|null|\n", "2401.10712": "|**2024-01-19**|**Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge**|Haibi Wang et.al.|[2401.10712v1](http://arxiv.org/abs/2401.10712v1)|null|\n", "2401.10529": "|**2024-01-25**|**Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences**|Xiyao Wang et.al.|[2401.10529v2](http://arxiv.org/abs/2401.10529v2)|null|\n", "2401.10491": "|**2024-01-22**|**Knowledge Fusion of Large Language Models**|Fanqi Wan et.al.|[2401.10491v2](http://arxiv.org/abs/2401.10491v2)|**[link](https://github.com/fanqiwan/fusellm)**|\n", "2401.10471": "|**2024-01-19**|**DeepEdit: Knowledge Editing as Decoding with Constraints**|Yiwei Wang et.al.|[2401.10471v1](http://arxiv.org/abs/2401.10471v1)|**[link](https://github.com/wangywust/deepedit)**|\n", "2401.10446": "|**2024-01-19**|**Large Language Models are Efficient Learners of Noise-Robust Speech Recognition**|Yuchen Hu et.al.|[2401.10446v1](http://arxiv.org/abs/2401.10446v1)|**[link](https://github.com/yuchen005/robustger)**|\n", "2401.10279": "|**2024-01-12**|**A systematic review of geospatial location embedding approaches in large language models: A path to spatial AI systems**|Sean Tucker et.al.|[2401.10279v1](http://arxiv.org/abs/2401.10279v1)|null|\n", "2401.12117": "|**2024-01-22**|**The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models**|Kian Ahrabian et.al.|[2401.12117v1](http://arxiv.org/abs/2401.12117v1)|**[link](https://github.com/kahrabian/mllm-nvar)**|\n", "2401.11864": "|**2024-02-01**|**Distilling 
Mathematical Reasoning Capabilities into Small Language Models**|Xunyu Zhu et.al.|[2401.11864v4](http://arxiv.org/abs/2401.11864v4)|null|\n", "2401.11725": "|**2024-01-22**|**Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models**|Yile Wang et.al.|[2401.11725v1](http://arxiv.org/abs/2401.11725v1)|**[link](https://github.com/thunlp-mt/symbol2language)**|\n", "2401.11467": "|**2024-01-21**|**Over-Reasoning and Redundant Calculation of Large Language Models**|Cheng-Han Chiang et.al.|[2401.11467v1](http://arxiv.org/abs/2401.11467v1)|**[link](https://github.com/d223302/over-reasoning-of-llms)**|\n", "2401.11323": "|**2024-01-20**|**Analyzing Task-Encoding Tokens in Large Language Models**|Yu Bai et.al.|[2401.11323v1](http://arxiv.org/abs/2401.11323v1)|null|\n", "2401.11185": "|**2024-01-20**|**How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation**|Yoo Yeon Sung et.al.|[2401.11185v1](http://arxiv.org/abs/2401.11185v1)|null|\n", "2401.11061": "|**2024-01-19**|**PhotoBot: Reference-Guided Interactive Photography via Natural Language**|Oliver Limoyo et.al.|[2401.11061v1](http://arxiv.org/abs/2401.11061v1)|null|\n", "2401.11052": "|**2024-01-19**|**Mining experimental data from Materials Science literature with Large Language Models**|Luca Foppiano et.al.|[2401.11052v1](http://arxiv.org/abs/2401.11052v1)|**[link](https://github.com/lfoppiano/matsci-lumen)**|\n", "2401.10995": "|**2024-01-19**|**The Radiation Oncology NLP Database**|Zhengliang Liu et.al.|[2401.10995v1](http://arxiv.org/abs/2401.10995v1)|**[link](https://github.com/zl-liu/radiation-oncology-nlp-database)**|\n", "2401.12975": "|**2024-01-23**|**HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments**|Qinhong Zhou et.al.|[2401.12975v1](http://arxiv.org/abs/2401.12975v1)|**[link](https://github.com/umass-foundation-model/hazard)**|\n", "2401.12963": "|**2024-01-23**|**AutoRT: 
Embodied Foundation Models for Large Scale Orchestration of Robotic Agents**|Michael Ahn et.al.|[2401.12963v1](http://arxiv.org/abs/2401.12963v1)|null|\n", "2401.12863": "|**2024-01-23**|**KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning**|Debjyoti Mondal et.al.|[2401.12863v1](http://arxiv.org/abs/2401.12863v1)|null|\n", "2401.12846": "|**2024-01-23**|**How well can large language models explain business processes?**|Dirk Fahland et.al.|[2401.12846v1](http://arxiv.org/abs/2401.12846v1)|null|\n", "2401.12586": "|**2024-01-27**|**C2Ideas: Supporting Creative Interior Color Design Ideation with Large Language Model**|Yihan Hou et.al.|[2401.12586v2](http://arxiv.org/abs/2401.12586v2)|null|\n", "2401.12242": "|**2024-01-20**|**BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models**|Zhen Xiang et.al.|[2401.12242v1](http://arxiv.org/abs/2401.12242v1)|null|\n", "2401.13641": "|**2024-01-24**|**How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and Explainability**|Ivan DeAndres-Tame et.al.|[2401.13641v1](http://arxiv.org/abs/2401.13641v1)|**[link](https://github.com/bidalab/chatgpt_facebiometrics)**|\n", "2401.13601": "|**2024-01-25**|**MM-LLMs: Recent Advances in MultiModal Large Language Models**|Duzhen Zhang et.al.|[2401.13601v2](http://arxiv.org/abs/2401.13601v2)|null|\n", "2401.13545": "|**2024-01-24**|**Fine-grained Contract NER using instruction based model**|Hiranmai Sri Adibhatla et.al.|[2401.13545v1](http://arxiv.org/abs/2401.13545v1)|**[link](https://github.com/pavanbaswani/fincausal_sharedtask-2023)**|\n", "2401.13298": "|**2024-01-24**|**Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models**|Hongzhan Lin et.al.|[2401.13298v1](http://arxiv.org/abs/2401.13298v1)|**[link](https://github.com/hkbunlp/explainhm-www2024)**|\n", "2401.13223": "|**2024-01-24**|**TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual 
Data**|Fengbin Zhu et.al.|[2401.13223v1](http://arxiv.org/abs/2401.13223v1)|null|\n", "2401.14295": "|**2024-01-25**|**Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts**|Maciej Besta et.al.|[2401.14295v1](http://arxiv.org/abs/2401.14295v1)|null|\n", "2401.14109": "|**2024-01-25**|**CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks**|Andrei Tomut et.al.|[2401.14109v1](http://arxiv.org/abs/2401.14109v1)|null|\n", "2401.14011": "|**2024-01-26**|**CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning**|Zheqi He et.al.|[2401.14011v2](http://arxiv.org/abs/2401.14011v2)|null|\n", "2401.14003": "|**2024-01-25**|**ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases**|Quyet V. Do et.al.|[2401.14003v1](http://arxiv.org/abs/2401.14003v1)|**[link](https://github.com/hkust-knowcomp/constraintchecker)**|\n", "2401.13870": "|**2024-01-25**|**Integrating Large Language Models into Recommendation via Mutual Augmentation and Adaptive Aggregation**|Sichun Luo et.al.|[2401.13870v1](http://arxiv.org/abs/2401.13870v1)|null|\n", "2401.13849": "|**2024-01-24**|**TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance**|Haorui Wang et.al.|[2401.13849v1](http://arxiv.org/abs/2401.13849v1)|null|\n", "2401.13837": "|**2024-01-24**|**Democratizing Fine-grained Visual Recognition with Large Language Models**|Mingxuan Liu et.al.|[2401.13837v1](http://arxiv.org/abs/2401.13837v1)|null|\n", "2401.15071": "|**2024-01-29**|**From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities**|Chaochao Lu et.al.|[2401.15071v2](http://arxiv.org/abs/2401.15071v2)|null|\n", "2401.15030": "|**2024-01-26**|**On the generalization capacity of neural networks during generic multimodal reasoning**|Takuya Ito 
et.al.|[2401.15030v1](http://arxiv.org/abs/2401.15030v1)|null|\n", "2401.14818": "|**2024-01-26**|**ChemDFM: Dialogue Foundation Model for Chemistry**|Zihan Zhao et.al.|[2401.14818v1](http://arxiv.org/abs/2401.14818v1)|null|\n", "2401.14640": "|**2024-01-26**|**Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs**|Nan Hu et.al.|[2401.14640v1](http://arxiv.org/abs/2401.14640v1)|null|\n", "2401.14624": "|**2024-01-26**|**Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora**|Zhaoye Fei et.al.|[2401.14624v1](http://arxiv.org/abs/2401.14624v1)|null|\n", "2401.16185": "|**2024-01-29**|**LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning**|Yuqiang Sun et.al.|[2401.16185v1](http://arxiv.org/abs/2401.16185v1)|null|\n", "2401.16024": "|**2024-01-29**|**Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures**|Michael Hersche et.al.|[2401.16024v1](http://arxiv.org/abs/2401.16024v1)|**[link](https://github.com/ibm/learn-vector-symbolic-architectures-rule-formulations)**|\n", "2401.15940": "|**2024-02-01**|**Knowledge-Aware Code Generation with Large Language Models**|Tao Huang et.al.|[2401.15940v3](http://arxiv.org/abs/2401.15940v3)|**[link](https://github.com/codegeneration3/karecoder)**|\n", "2401.15843": "|**2024-01-29**|**APIGen: Generative API Method Recommendation**|Yujia Chen et.al.|[2401.15843v1](http://arxiv.org/abs/2401.15843v1)|**[link](https://github.com/hitcoderr/apigen)**|\n", "2401.15810": "|**2024-01-29**|**Green Runner: A tool for efficient deep learning component selection**|Jai Kannan et.al.|[2401.15810v1](http://arxiv.org/abs/2401.15810v1)|null|\n", "2401.15688": "|**2024-01-30**|**Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation**|Zhenyu Wang et.al.|[2401.15688v2](http://arxiv.org/abs/2401.15688v2)|null|\n", "2401.15670": 
"|**2024-01-28**|**YODA: Teacher-Student Progressive Learning for Language Models**|Jianqiao Lu et.al.|[2401.15670v1](http://arxiv.org/abs/2401.15670v1)|null|\n", "2401.15585": "|**2024-01-28**|**Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting**|Masahiro Kaneko et.al.|[2401.15585v1](http://arxiv.org/abs/2401.15585v1)|null|\n", "2401.15391": "|**2024-01-27**|**MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries**|Yixuan Tang et.al.|[2401.15391v1](http://arxiv.org/abs/2401.15391v1)|**[link](https://github.com/yixuantt/MultiHop-RAG)**|\n", "2401.15328": "|**2024-01-30**|**Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance**|Adrian Theuma et.al.|[2401.15328v2](http://arxiv.org/abs/2401.15328v2)|null|\n", "2401.15284": "|**2024-01-27**|**Building ethical guidelines for generative AI in scientific research**|Zhicheng Lin et.al.|[2401.15284v1](http://arxiv.org/abs/2401.15284v1)|null|\n", "2401.15269": "|**2024-01-27**|**Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models**|Minbyul Jeong et.al.|[2401.15269v1](http://arxiv.org/abs/2401.15269v1)|**[link](https://github.com/dmis-lab/self-biorag)**|\n", "2401.15174": "|**2024-01-26**|**Large Language Models for Multi-Modal Human-Robot Interaction**|Chao Wang et.al.|[2401.15174v1](http://arxiv.org/abs/2401.15174v1)|null|\n", "2401.15170": "|**2024-01-26**|**Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks**|Zackary Okun Dunivin et.al.|[2401.15170v1](http://arxiv.org/abs/2401.15170v1)|null|\n", "2401.15098": "|**2024-01-25**|**Hi-Core: Hierarchical Knowledge Transfer for Continual Reinforcement Learning**|Chaofan Pan et.al.|[2401.15098v1](http://arxiv.org/abs/2401.15098v1)|null|\n", "2401.17244": "|**2024-01-30**|**LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and 
Distillation**|Yuan Chiang et.al.|[2401.17244v1](http://arxiv.org/abs/2401.17244v1)|null|\n", "2401.17169": "|**2024-01-30**|**Conditional and Modal Reasoning in Large Language Models**|Wesley H. Holliday et.al.|[2401.17169v1](http://arxiv.org/abs/2401.17169v1)|**[link](https://github.com/wesholliday/llm-logic)**|\n", "2401.17163": "|**2024-01-31**|**Learning Agent-based Modeling with LLM Companions: Experiences of Novices and Experts Using ChatGPT & NetLogo Chat**|John Chen et.al.|[2401.17163v2](http://arxiv.org/abs/2401.17163v2)|null|\n", "2401.16822": "|**2024-02-05**|**EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain**|Wei Zhang et.al.|[2401.16822v2](http://arxiv.org/abs/2401.16822v2)|null|\n", "2401.16797": "|**2024-02-01**|**Enhancing Translation Validation of Compiler Transformations with Large Language Models**|Yanzhao Wang et.al.|[2401.16797v2](http://arxiv.org/abs/2401.16797v2)|null|\n", "2401.16713": "|**2024-01-30**|**Prospects for inconsistency detection using large language models and sheaves**|Steve Huntsman et.al.|[2401.16713v1](http://arxiv.org/abs/2401.16713v1)|**[link](https://github.com/stevehuntsman/prospectsforinconsistencydetection)**|\n", "2401.16578": "|**2024-02-02**|**Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology Reports**|Qingqing Zhu et.al.|[2401.16578v2](http://arxiv.org/abs/2401.16578v2)|null|\n", "2401.16467": "|**2024-01-29**|**ReGAL: Refactoring Programs to Discover Generalizable Abstractions**|Elias Stengel-Eskin et.al.|[2401.16467v1](http://arxiv.org/abs/2401.16467v1)|**[link](https://github.com/esteng/regal_program_learning)**|\n", "2401.18006": "|**2024-02-03**|**EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation**|Jonathan W. 
Kim et.al.|[2401.18006v2](http://arxiv.org/abs/2401.18006v2)|null|\n", "2401.17809": "|**2024-01-31**|**SWEA: Changing Factual Knowledge in Large Language Models via Subject Word Embedding Altering**|Xiaopeng Li et.al.|[2401.17809v1](http://arxiv.org/abs/2401.17809v1)|null|\n", "2401.17749": "|**2024-01-31**|**SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models**|Xiao Shao et.al.|[2401.17749v1](http://arxiv.org/abs/2401.17749v1)|**[link](https://github.com/ramsayxiaoshao/SwarmBrain-Embodied-agent-for-real-time-strategy-game-StarCraft-II-via-large-language-models)**|\n", "2401.17716": "|**2024-01-31**|**Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction**|Jialiang Wu et.al.|[2401.17716v1](http://arxiv.org/abs/2401.17716v1)|null|\n", "2401.17686": "|**2024-02-04**|**Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning**|Tinghui Zhu et.al.|[2401.17686v2](http://arxiv.org/abs/2401.17686v2)|**[link](https://github.com/osu-nlp-group/deductive-beam-search)**|\n", "2401.17602": "|**2024-01-31**|**Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning**|Yuelyu Ji et.al.|[2401.17602v1](http://arxiv.org/abs/2401.17602v1)|null|\n", "2401.17464": "|**2024-01-30**|**Efficient Tool Use with Chain-of-Abstraction Reasoning**|Silin Gao et.al.|[2401.17464v1](http://arxiv.org/abs/2401.17464v1)|null|\n", "2401.17390": "|**2024-01-30**|**Customizing Language Model Responses with Contrastive In-Context Learning**|Xiang Gao et.al.|[2401.17390v1](http://arxiv.org/abs/2401.17390v1)|null|\n", "2402.00854": "|**2024-02-05**|**SymbolicAI: A framework for logic-based approaches combining generative models and solvers**|Marius-Constantin Dinu et.al.|[2402.00854v2](http://arxiv.org/abs/2402.00854v2)|**[link](https://github.com/extensityai/benchmark)**|\n", "2402.00745": "|**2024-02-01**|**Enhancing Ethical Explanations of Large Language Models through Iterative 
Symbolic Refinement**|Xin Quan et.al.|[2402.00745v1](http://arxiv.org/abs/2402.00745v1)|**[link](https://github.com/neuro-symbolic-ai/explanation_based_ethical_reasoning)**|\n", "2402.00658": "|**2024-02-01**|**Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing**|Fangkai Jiao et.al.|[2402.00658v1](http://arxiv.org/abs/2402.00658v1)|**[link](https://github.com/sparkjiao/rl-trajectory-reasoning)**|\n", "2402.00367": "|**2024-02-01**|**Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration**|Shangbin Feng et.al.|[2402.00367v1](http://arxiv.org/abs/2402.00367v1)|null|\n", "2402.00262": "|**2024-02-01**|**Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective**|Qun Ma et.al.|[2402.00262v1](http://arxiv.org/abs/2402.00262v1)|null|\n", "2402.00157": "|**2024-01-31**|**Large Language Models for Mathematical Reasoning: Progresses and Challenges**|Janice Ahn et.al.|[2402.00157v1](http://arxiv.org/abs/2402.00157v1)|null|\n", "2402.00097": "|**2024-01-31**|**Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM**|Gabriel Ryan et.al.|[2402.00097v1](http://arxiv.org/abs/2402.00097v1)|null|\n", "2402.00070": "|**2024-01-30**|**EvoMerge: Neuroevolution for Large Language Models**|Yushu Jiang et.al.|[2402.00070v1](http://arxiv.org/abs/2402.00070v1)|null|\n", "2402.01622": "|**2024-02-05**|**TravelPlanner: A Benchmark for Real-World Planning with Language Agents**|Jian Xie et.al.|[2402.01622v2](http://arxiv.org/abs/2402.01622v2)|null|\n", "2402.01620": "|**2024-02-02**|**MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models**|Justin Chih-Yao Chen et.al.|[2402.01620v1](http://arxiv.org/abs/2402.01620v1)|**[link](https://github.com/dinobby/magdi)**|\n", "2402.01602": "|**2024-02-02**|**Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning**|Debarun 
Bhattacharjya et.al.|[2402.01602v1](http://arxiv.org/abs/2402.01602v1)|null|\n", "2402.01591": "|**2024-02-02**|**BAT: Learning to Reason about Spatial Sounds with Large Language Models**|Zhisheng Zheng et.al.|[2402.01591v1](http://arxiv.org/abs/2402.01591v1)|null|\n", "2402.01521": "|**2024-02-02**|**K-Level Reasoning with Large Language Models**|Yadong Zhang et.al.|[2402.01521v1](http://arxiv.org/abs/2402.01521v1)|null|\n", "2402.01469": "|**2024-02-02**|**AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback**|Jian Guan et.al.|[2402.01469v1](http://arxiv.org/abs/2402.01469v1)|null|\n", "2402.01246": "|**2024-02-02**|**LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving**|Daocheng Fu et.al.|[2402.01246v1](http://arxiv.org/abs/2402.01246v1)|null|\n", "2402.01135": "|**2024-02-02**|**A Multi-Agent Conversational Recommender System**|Jiabao Fang et.al.|[2402.01135v1](http://arxiv.org/abs/2402.01135v1)|null|\n", "2402.01109": "|**2024-02-02**|**Vaccine: Perturbation-aware Alignment for Large Language Model**|Tiansheng Huang et.al.|[2402.01109v1](http://arxiv.org/abs/2402.01109v1)|**[link](https://github.com/git-disl/vaccine)**|\n", "2402.01108": "|**2024-02-02**|**Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions**|Pouya Pezeshkpour et.al.|[2402.01108v1](http://arxiv.org/abs/2402.01108v1)|null|\n", "2402.01105": "|**2024-02-02**|**A Survey for Foundation Models in Autonomous Driving**|Haoxiang Gao et.al.|[2402.01105v1](http://arxiv.org/abs/2402.01105v1)|null|\n", "2402.03173": "|**2024-02-05**|**Multi: Multimodal Understanding Leaderboard with Text and Images**|Zichen Zhu et.al.|[2402.03173v1](http://arxiv.org/abs/2402.03173v1)|null|\n", "2402.02872": "|**2024-02-05**|**How do Large Language Models Learn In-Context? 
Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning**|Zeping Yu et.al.|[2402.02872v1](http://arxiv.org/abs/2402.02872v1)|null|\n", "2402.02805": "|**2024-02-05**|**Graph-enhanced Large Language Models in Asynchronous Plan Reasoning**|Fangru Lin et.al.|[2402.02805v1](http://arxiv.org/abs/2402.02805v1)|null|\n", "2402.02648": "|**2024-02-05**|**Chain-of-Feedback: Mitigating the Effects of Inconsistency in Responses**|Jinwoo Ahn et.al.|[2402.02648v1](http://arxiv.org/abs/2402.02648v1)|null|\n", "2402.02636": "|**2024-02-04**|**Can Large Language Models Learn Independent Causal Mechanisms?**|Ga\u00ebl Gendron et.al.|[2402.02636v1](http://arxiv.org/abs/2402.02636v1)|null|\n", "2402.02563": "|**2024-02-04**|**DefInt: A Default-interventionist Framework for Efficient Reasoning with Hybrid Large Language Models**|Yu Shang et.al.|[2402.02563v1](http://arxiv.org/abs/2402.02563v1)|null|\n", "2402.02558": "|**2024-02-04**|**Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials**|Ata Mustafa et.al.|[2402.02558v1](http://arxiv.org/abs/2402.02558v1)|null|\n", "2402.02549": "|**2024-02-04**|**Are Large Language Models Table-based Fact-Checkers?**|Hangwen Zhang et.al.|[2402.02549v1](http://arxiv.org/abs/2402.02549v1)|null|\n", "2402.02548": "|**2024-02-04**|**\"What's my model inside of?\": Exploring the role of environments for grounded natural language understanding**|Ronen Tamari et.al.|[2402.02548v1](http://arxiv.org/abs/2402.02548v1)|null|\n", "2402.02547": "|**2024-02-04**|**Integration of cognitive tasks into artificial general intelligence test for large models**|Youzhi Qu et.al.|[2402.02547v1](http://arxiv.org/abs/2402.02547v1)|null|\n", "2402.02544": "|**2024-02-07**|**LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model**|Dilxat Muhtar et.al.|[2402.02544v2](http://arxiv.org/abs/2402.02544v2)|**[link](https://github.com/NJU-LHRS/LHRS-Bot)**|\n", "2402.02503": 
"|**2024-02-04**|**GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering**|Ziyu Ma et.al.|[2402.02503v1](http://arxiv.org/abs/2402.02503v1)|null|\n", "2402.02408": "|**2024-02-04**|**GLaPE: Gold Label-agnostic Prompt Evaluation and Optimization for Large Language Model**|Xuanchang Zhang et.al.|[2402.02408v1](http://arxiv.org/abs/2402.02408v1)|**[link](https://github.com/thunderous77/glape)**|\n", "2402.02330": "|**2024-02-04**|**Enhance Reasoning for Large Language Models in the Game Werewolf**|Shuang Wu et.al.|[2402.02330v1](http://arxiv.org/abs/2402.02330v1)|null|\n", "2402.02244": "|**2024-02-03**|**Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models**|Xindi Wang et.al.|[2402.02244v1](http://arxiv.org/abs/2402.02244v1)|null|\n", "2402.02135": "|**2024-02-03**|**Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test**|Aditi Khandelwal et.al.|[2402.02135v1](http://arxiv.org/abs/2402.02135v1)|null|\n", "2402.02130": "|**2024-02-03**|**Rendering Graphs for Graph Reasoning in Multimodal Large Language Models**|Yanbin Wei et.al.|[2402.02130v1](http://arxiv.org/abs/2402.02130v1)|null|\n", "2402.02018": "|**2024-02-07**|**The Landscape and Challenges of HPC Research and LLMs**|Le Chen et.al.|[2402.02018v3](http://arxiv.org/abs/2402.02018v3)|null|\n", "2402.01980": "|**2024-02-03**|**SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks**|Gourab Dey et.al.|[2402.01980v1](http://arxiv.org/abs/2402.01980v1)|null|\n", "2402.01968": "|**2024-02-03**|**A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions**|Hung Du et.al.|[2402.01968v1](http://arxiv.org/abs/2402.01968v1)|null|\n", "2402.01889": "|**2024-02-02**|**The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning**|Daniel Cunnington et.al.|[2402.01889v1](http://arxiv.org/abs/2402.01889v1)|null|\n", 
"2402.01874": "|**2024-02-02**|**The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models**|Moschoula Pternea et.al.|[2402.01874v1](http://arxiv.org/abs/2402.01874v1)|null|\n", "2402.01864": "|**2024-02-02**|**(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice**|Inyoung Cheong et.al.|[2402.01864v1](http://arxiv.org/abs/2402.01864v1)|null|\n", "2402.01821": "|**2024-02-02**|**Ecologically rational meta-learned inference explains human category learning**|Akshay K. Jagadish et.al.|[2402.01821v1](http://arxiv.org/abs/2402.01821v1)|null|\n", "2402.01817": "|**2024-02-06**|**LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks**|Subbarao Kambhampati et.al.|[2402.01817v2](http://arxiv.org/abs/2402.01817v2)|null|\n", "2402.01812": "|**2024-02-02**|**Distilling LLMs' Decomposition Abilities into Compact Language Models**|Denis Tarasov et.al.|[2402.01812v1](http://arxiv.org/abs/2402.01812v1)|**[link](https://github.com/dt6a/gsm8k-ai-subq)**|\n", "2402.01805": "|**2024-02-02**|**Exploring the Limitations of Graph Reasoning in Large Language Models**|Palaash Agrawal et.al.|[2402.01805v1](http://arxiv.org/abs/2402.01805v1)|null|\n", "2402.01758": "|**2024-01-30**|**Aalap: AI Assistant for Legal & Paralegal Functions in India**|Aman Tiwari et.al.|[2402.01758v1](http://arxiv.org/abs/2402.01758v1)|null|\n", "2402.01750": "|**2024-01-30**|**PACE: A Pragmatic Agent for Enhancing Communication Efficiency Using Large Language Models**|Jiaxuan Li et.al.|[2402.01750v1](http://arxiv.org/abs/2402.01750v1)|null|\n", "2402.01748": "|**2024-02-07**|**Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems**|Shengzhe Xu et.al.|[2402.01748v2](http://arxiv.org/abs/2402.01748v2)|null|\n", "2401.15077": "|**2024-02-04**|**EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty**|Yuhui Li 
et.al.|[2401.15077v2](http://arxiv.org/abs/2401.15077v2)|**[link](https://github.com/safeailab/eagle)**|\n", "2402.01698": "|**2024-01-24**|**Large language model empowered participatory urban planning**|Zhilun Zhou et.al.|[2402.01698v1](http://arxiv.org/abs/2402.01698v1)|null|\n", "2402.04178": "|**2024-02-06**|**SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models**|Yichen Shi et.al.|[2402.04178v1](http://arxiv.org/abs/2402.04178v1)|**[link](https://github.com/laiyingxin2/shield)**|\n", "2402.03916": "|**2024-02-08**|**Can Large Language Models Detect Rumors on Social Media?**|Qiang Liu et.al.|[2402.03916v2](http://arxiv.org/abs/2402.03916v2)|null|\n", "2402.03877": "|**2024-02-14**|**Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models**|Spyridon Mouselinos et.al.|[2402.03877v2](http://arxiv.org/abs/2402.03877v2)|null|\n", "2402.03686": "|**2024-02-06**|**Minds versus Machines: Rethinking Entailment Verification with Language Models**|Soumya Sanyal et.al.|[2402.03686v1](http://arxiv.org/abs/2402.03686v1)|null|\n", "2402.03667": "|**2024-02-06**|**Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning**|Yanfang Zhang et.al.|[2402.03667v1](http://arxiv.org/abs/2402.03667v1)|null|\n", "2402.03659": "|**2024-02-07**|**Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models**|Kelvin J. L. 
Koa et.al.|[2402.03659v2](http://arxiv.org/abs/2402.03659v2)|null|\n", "2402.03628": "|**2024-02-06**|**Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies**|Zhixuan Chu et.al.|[2402.03628v1](http://arxiv.org/abs/2402.03628v1)|null|\n", "2402.03620": "|**2024-02-06**|**Self-Discover: Large Language Models Self-Compose Reasoning Structures**|Pei Zhou et.al.|[2402.03620v1](http://arxiv.org/abs/2402.03620v1)|null|\n", "2402.03616": "|**2024-02-06**|**Leveraging Large Language Models for Hybrid Workplace Decision Support**|Yujin Kim et.al.|[2402.03616v1](http://arxiv.org/abs/2402.03616v1)|null|\n", "2402.03597": "|**2024-02-06**|**Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models**|Brenda Y. Miao et.al.|[2402.03597v1](http://arxiv.org/abs/2402.03597v1)|null|\n", "2402.03507": "|**2024-02-05**|**Neural networks for abstraction and reasoning: Towards broad generalization in machines**|Mikel Bober-Irizar et.al.|[2402.03507v1](http://arxiv.org/abs/2402.03507v1)|**[link](https://github.com/mxbi/arckit)**|\n", "2402.03366": "|**2024-01-31**|**Uncertainty-Aware Explainable Recommendation with Large Language Models**|Yicui Peng et.al.|[2402.03366v1](http://arxiv.org/abs/2402.03366v1)|null|\n", "2402.04978": "|**2024-02-07**|**An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration**|Yihao Li et.al.|[2402.04978v1](http://arxiv.org/abs/2402.04978v1)|null|\n", "2402.04918": "|**2024-02-07**|**Prompting Implicit Discourse Relation Annotation**|Frances Yung et.al.|[2402.04918v1](http://arxiv.org/abs/2402.04918v1)|null|\n", "2402.04858": "|**2024-02-07**|**CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay**|Natasha Butt et.al.|[2402.04858v1](http://arxiv.org/abs/2402.04858v1)|null|\n", "2402.04678": "|**2024-02-07**|**Large Language Models As Faithful Explainers**|Yu-Neng Chuang 
et.al.|[2402.04678v1](http://arxiv.org/abs/2402.04678v1)|null|\n", "2402.04636": "|**2024-02-07**|**TransLLaMa: LLM-based Simultaneous Translation System**|Roman Koshkin et.al.|[2402.04636v1](http://arxiv.org/abs/2402.04636v1)|null|\n", "2402.04616": "|**2024-02-07**|**TinyLLM: Learning a Small Student from Multiple Large Language Models**|Yijun Tian et.al.|[2402.04616v1](http://arxiv.org/abs/2402.04616v1)|null|\n", "2402.04614": "|**2024-02-08**|**Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models**|Chirag Agarwal et.al.|[2402.04614v2](http://arxiv.org/abs/2402.04614v2)|null|\n", "2402.04559": "|**2024-02-07**|**Can Large Language Model Agents Simulate Human Trust Behaviors?**|Chengxing Xie et.al.|[2402.04559v1](http://arxiv.org/abs/2402.04559v1)|**[link](https://github.com/camel-ai/agent-trust)**|\n", "2402.04333": "|**2024-02-06**|**LESS: Selecting Influential Data for Targeted Instruction Tuning**|Mengzhou Xia et.al.|[2402.04333v1](http://arxiv.org/abs/2402.04333v1)|**[link](https://github.com/princeton-nlp/less)**|\n", "2402.05863": "|**2024-02-08**|**How Well Can LLMs Negotiate? 
NegotiationArena Platform and Analysis**|Federico Bianchi et.al.|[2402.05863v1](http://arxiv.org/abs/2402.05863v1)|**[link](https://github.com/vinid/negotiationarena)**|\n", "2402.05862": "|**2024-02-08**|**Let Your Graph Do the Talking: Encoding Structured Data for LLMs**|Bryan Perozzi et.al.|[2402.05862v1](http://arxiv.org/abs/2402.05862v1)|null|\n", "2402.05808": "|**2024-02-08**|**Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning**|Zhiheng Xi et.al.|[2402.05808v1](http://arxiv.org/abs/2402.05808v1)|**[link](https://github.com/woooodyy/llm-reverse-curriculum-rl)**|\n", "2402.05706": "|**2024-02-08**|**Unified Speech-Text Pretraining for Spoken Dialog Modeling**|Heeseung Kim et.al.|[2402.05706v1](http://arxiv.org/abs/2402.05706v1)|null|\n", "2402.05602": "|**2024-02-08**|**AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers**|Reduan Achtibat et.al.|[2402.05602v1](http://arxiv.org/abs/2402.05602v1)|null|\n", "2402.05472": "|**2024-02-08**|**Question Aware Vision Transformer for Multimodal Reasoning**|Roy Ganz et.al.|[2402.05472v1](http://arxiv.org/abs/2402.05472v1)|null|\n", "2402.05467": "|**2024-02-08**|**Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia**|Guangyu Shen et.al.|[2402.05467v1](http://arxiv.org/abs/2402.05467v1)|**[link](https://github.com/solidshen/ripple_official)**|\n", "2402.05376": "|**2024-02-08**|**Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models**|Feihu Jin et.al.|[2402.05376v1](http://arxiv.org/abs/2402.05376v1)|**[link](https://github.com/stan-anony/zero-shot-eot-prompting)**|\n", "2402.05200": "|**2024-02-07**|**Are LLMs Ready for Real-World Materials Discovery?**|Santiago Miret et.al.|[2402.05200v1](http://arxiv.org/abs/2402.05200v1)|null|\n", "2402.05138": "|**2024-02-06**|**SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark**|Zhenwen Liang 
et.al.|[2402.05138v1](http://arxiv.org/abs/2402.05138v1)|null|\n", "2402.05128": "|**2024-02-14**|**Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation**|Hessa Abdulrahman Alawwad et.al.|[2402.05128v2](http://arxiv.org/abs/2402.05128v2)|null|\n", "2402.06596": "|**2024-02-09**|**Understanding the Weakness of Large Language Model Agents within a Complex Android Environment**|Mingzhe Xing et.al.|[2402.06596v1](http://arxiv.org/abs/2402.06596v1)|**[link](https://github.com/androidarenaagent/androidarena)**|\n", "2402.06557": "|**2024-02-09**|**The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model**|Gregory Coppola et.al.|[2402.06557v1](http://arxiv.org/abs/2402.06557v1)|null|\n", "2402.06529": "|**2024-02-09**|**Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty**|Kaiqu Liang et.al.|[2402.06529v1](http://arxiv.org/abs/2402.06529v1)|null|\n", "2402.06457": "|**2024-02-09**|**V-STaR: Training Verifiers for Self-Taught Reasoners**|Arian Hosseini et.al.|[2402.06457v1](http://arxiv.org/abs/2402.06457v1)|null|\n", "2402.06332": "|**2024-02-09**|**InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning**|Huaiyuan Ying et.al.|[2402.06332v1](http://arxiv.org/abs/2402.06332v1)|**[link](https://github.com/internlm/internlm-math)**|\n", "2402.06120": "|**2024-02-09**|**Exploring Group and Symmetry Principles in Large Language Models**|Shima Imani et.al.|[2402.06120v1](http://arxiv.org/abs/2402.06120v1)|null|\n", "2402.06119": "|**2024-02-09**|**ContPhy: Continuum Physical Concept Learning and Reasoning from Videos**|Zhicheng Zheng et.al.|[2402.06119v1](http://arxiv.org/abs/2402.06119v1)|null|\n", "2402.06118": "|**2024-02-09**|**ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling**|Siming Yan et.al.|[2402.06118v1](http://arxiv.org/abs/2402.06118v1)|null|\n", "2402.06044": 
"|**2024-02-14**|**OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models**|Hainiu Xu et.al.|[2402.06044v2](http://arxiv.org/abs/2402.06044v2)|**[link](https://github.com/seacowx/opentom)**|\n", "2402.07776": "|**2024-02-12**|**TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection**|Hui Liu et.al.|[2402.07776v1](http://arxiv.org/abs/2402.07776v1)|**[link](https://github.com/less-and-less-bugs/trust_teller)**|\n", "2402.07647": "|**2024-02-12**|**GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants**|Sophie Fischer et.al.|[2402.07647v1](http://arxiv.org/abs/2402.07647v1)|null|\n", "2402.07630": "|**2024-02-12**|**G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering**|Xiaoxin He et.al.|[2402.07630v1](http://arxiv.org/abs/2402.07630v1)|**[link](https://github.com/xiaoxinhe/g-retriever)**|\n", "2402.07536": "|**2024-02-12**|**BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout Detection**|Kang Zhang et.al.|[2402.07536v1](http://arxiv.org/abs/2402.07536v1)|**[link](https://github.com/neviim96/breakgpt)**|\n", "2402.07408": "|**2024-02-12**|**Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape Samples**|Mingrui Ma et.al.|[2402.07408v1](http://arxiv.org/abs/2402.07408v1)|null|\n", "2402.07282": "|**2024-02-13**|**How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?**|Ryan Liu et.al.|[2402.07282v2](http://arxiv.org/abs/2402.07282v2)|null|\n", "2402.07148": "|**2024-02-11**|**X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design**|Eric L. 
Buehler et.al.|[2402.07148v1](http://arxiv.org/abs/2402.07148v1)|**[link](https://github.com/ericlbuehler/xlora)**|\n", "2402.07140": "|**2024-02-11**|**Sequential Ordering in Textual Descriptions: Impact on Spatial Perception Abilities of Large Language Models**|Yuyao Ge et.al.|[2402.07140v1](http://arxiv.org/abs/2402.07140v1)|null|\n", "2402.07081": "|**2024-02-11**|**Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education**|Nischal Ashok Kumar et.al.|[2402.07081v1](http://arxiv.org/abs/2402.07081v1)|**[link](https://github.com/umass-ml4ed/test_case_generation)**|\n", "2402.07023": "|**2024-02-10**|**Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations**|Ankit Pal et.al.|[2402.07023v1](http://arxiv.org/abs/2402.07023v1)|**[link](https://github.com/promptslab/rosettaeval)**|\n", "2402.06918": "|**2024-02-10**|**Generating Chain-of-Thoughts with a Direct Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought**|Zhen-Yu Zhang et.al.|[2402.06918v1](http://arxiv.org/abs/2402.06918v1)|null|\n", "2402.06894": "|**2024-02-10**|**GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators**|Yuchen Hu et.al.|[2402.06894v1](http://arxiv.org/abs/2402.06894v1)|**[link](https://github.com/yuchen005/gentranslate)**|\n", "2402.06798": "|**2024-02-09**|**Reasoning Grasping via Multimodal Large Language Model**|Shiyu Jin et.al.|[2402.06798v1](http://arxiv.org/abs/2402.06798v1)|null|\n", "2402.06764": "|**2024-02-16**|**GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding**|Stefan Dernbach et.al.|[2402.06764v2](http://arxiv.org/abs/2402.06764v2)|null|\n", "2402.08309": "|**2024-02-14**|**Prompted Contextual Vectors for Spear-Phishing Detection**|Daniel Nahmias 
et.al.|[2402.08309v2](http://arxiv.org/abs/2402.08309v2)|**[link](https://github.com/nahmiasd/prompted-contextual-vectors-for-spear-phishing-detection)**|\n", "2402.08259": "|**2024-02-13**|**A Survey of Table Reasoning with Large Language Models**|Xuanliang Zhang et.al.|[2402.08259v1](http://arxiv.org/abs/2402.08259v1)|**[link](https://github.com/zhxlia/awesome-tablereasoning-llm-survey)**|\n", "2402.08115": "|**2024-02-12**|**On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks**|Kaya Stechly et.al.|[2402.08115v1](http://arxiv.org/abs/2402.08115v1)|null|\n", "2402.08064": "|**2024-02-12**|**Beyond LLMs: Advancing the Landscape of Complex Reasoning**|Jennifer Chu-Carroll et.al.|[2402.08064v1](http://arxiv.org/abs/2402.08064v1)|null|\n", "2402.07938": "|**2024-02-07**|**Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs**|Syed Mekael Wasti et.al.|[2402.07938v1](http://arxiv.org/abs/2402.07938v1)|null|\n", "2402.07927": "|**2024-02-05**|**A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications**|Pranab Sahoo et.al.|[2402.07927v1](http://arxiv.org/abs/2402.07927v1)|null|\n", "2402.09404": "|**2024-02-14**|**AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability**|Siwei Yang et.al.|[2402.09404v1](http://arxiv.org/abs/2402.09404v1)|**[link](https://github.com/ucsc-vlaa/aqa-bench)**|\n", "2402.09334": "|**2024-02-14**|**AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach**|Maryam Amirizaniani et.al.|[2402.09334v1](http://arxiv.org/abs/2402.09334v1)|null|\n", "2402.09269": "|**2024-02-14**|**Personalized Large Language Models**|Stanis\u0142aw Wo\u017aniak et.al.|[2402.09269v1](http://arxiv.org/abs/2402.09269v1)|null|\n", "2402.09193": "|**2024-02-15**|**(Ir)rationality and Cognitive Biases in Large Language Models**|Olivia Macmillan-Scott 
et.al.|[2402.09193v2](http://arxiv.org/abs/2402.09193v2)|**[link](https://github.com/oliviams/llm_rationality)**|\n", "2402.09136": "|**2024-02-14**|**DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning**|Yejie Wang et.al.|[2402.09136v1](http://arxiv.org/abs/2402.09136v1)|null|\n", "2402.09052": "|**2024-02-14**|**L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects**|Yutaro Yamada et.al.|[2402.09052v1](http://arxiv.org/abs/2402.09052v1)|null|\n", "2402.08957": "|**2024-02-14**|**MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data**|Yinya Huang et.al.|[2402.08957v1](http://arxiv.org/abs/2402.08957v1)|null|\n", "2402.08955": "|**2024-02-14**|**Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models**|Martha Lewis et.al.|[2402.08955v1](http://arxiv.org/abs/2402.08955v1)|null|\n", "2402.08939": "|**2024-02-14**|**Premise Order Matters in Reasoning with Large Language Models**|Xinyun Chen et.al.|[2402.08939v1](http://arxiv.org/abs/2402.08939v1)|null|\n", "2402.08859": "|**2024-02-14**|**Large Language Model with Graph Convolution for Recommendation**|Yingpeng Du et.al.|[2402.08859v1](http://arxiv.org/abs/2402.08859v1)|null|\n", "2402.08785": "|**2024-02-13**|**InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment**|Jianing Wang et.al.|[2402.08785v1](http://arxiv.org/abs/2402.08785v1)|**[link](https://github.com/wjn1996/instructgraph)**|\n", "2402.08755": "|**2024-02-13**|**LLM-driven Imitation of Subrational Behavior : Illusion or Reality?**|Andrea Coletta et.al.|[2402.08755v1](http://arxiv.org/abs/2402.08755v1)|null|\n", "2402.10200": "|**2024-02-15**|**Chain-of-Thought Reasoning Without Prompting**|Xuezhi Wang et.al.|[2402.10200v1](http://arxiv.org/abs/2402.10200v1)|null|\n", "2402.10176": "|**2024-02-15**|**OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning 
Dataset**|Shubham Toshniwal et.al.|[2402.10176v1](http://arxiv.org/abs/2402.10176v1)|**[link](https://github.com/kipok/nemo-skills)**|\n", "2402.10133": "|**2024-02-15**|**Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem**|Davor Hafnar et.al.|[2402.10133v1](http://arxiv.org/abs/2402.10133v1)|null|\n", "2402.10104": "|**2024-02-15**|**GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving**|Jiaxin Zhang et.al.|[2402.10104v1](http://arxiv.org/abs/2402.10104v1)|**[link](https://github.com/geoeval/geoeval)**|\n", "2402.09967": "|**2024-02-15**|**Case Study: Testing Model Capabilities in Some Reasoning Tasks**|Min Zhang et.al.|[2402.09967v1](http://arxiv.org/abs/2402.09967v1)|null|\n", "2402.09880": "|**2024-02-15**|**Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence**|Timothy R. McIntosh et.al.|[2402.09880v1](http://arxiv.org/abs/2402.09880v1)|null|\n", "2402.09836": "|**2024-02-15**|**Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language Models**|Chenyang Shao et.al.|[2402.09836v1](http://arxiv.org/abs/2402.09836v1)|null|\n", "2402.09756": "|**2024-02-15**|**Mixture of Experts for Network Optimization: A Large Language Model-enabled Approach**|Hongyang Du et.al.|[2402.09756v1](http://arxiv.org/abs/2402.09756v1)|null|\n", "2402.09668": "|**2024-02-15**|**How to Train Data-Efficient LLMs**|Noveen Sachdeva et.al.|[2402.09668v1](http://arxiv.org/abs/2402.09668v1)|null|\n", "2402.09664": "|**2024-02-16**|**CodeMind: A Framework to Challenge Large Language Models for Code Reasoning**|Changshu Liu et.al.|[2402.09664v2](http://arxiv.org/abs/2402.09664v2)|**[link](https://github.com/intelligent-cat-lab/codemind)**|\n", "2402.09614": "|**2024-02-14**|**Probabilistic Reasoning in Generative Large Language Models**|Aliakbar Nafar et.al.|[2402.09614v1](http://arxiv.org/abs/2402.09614v1)|null|\n", "2402.09552": 
"|**2024-02-14**|**Rationality Report Cards: Assessing the Economic Rationality of Large Language Models**|Narun Raman et.al.|[2402.09552v1](http://arxiv.org/abs/2402.09552v1)|null|\n", "2402.09546": "|**2024-02-14**|**How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?**|Congcong Wen et.al.|[2402.09546v1](http://arxiv.org/abs/2402.09546v1)|null|\n", "2402.09469": "|**2024-02-12**|**Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic**|Jiuxiang Gu et.al.|[2402.09469v1](http://arxiv.org/abs/2402.09469v1)|null|\n", "2402.10896": "|**2024-02-16**|**PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter**|Junfei Xiao et.al.|[2402.10896v1](http://arxiv.org/abs/2402.10896v1)|null|\n", "2402.10890": "|**2024-02-16**|**When is Tree Search Useful for LLM Planning? It Depends on the Discriminator**|Ziru Chen et.al.|[2402.10890v1](http://arxiv.org/abs/2402.10890v1)|**[link](https://github.com/osu-nlp-group/llm-planning-eval)**|\n", "2402.10778": "|**2024-02-16**|**AutoGPT+P: Affordance-based Task Planning with Large Language Models**|Timo Birr et.al.|[2402.10778v1](http://arxiv.org/abs/2402.10778v1)|null|\n", "2402.10754": "|**2024-02-16**|**When Dataflow Analysis Meets Large Language Models**|Chengpeng Wang et.al.|[2402.10754v1](http://arxiv.org/abs/2402.10754v1)|null|\n", "2402.10698": "|**2024-02-16**|**Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering**|David Romero et.al.|[2402.10698v1](http://arxiv.org/abs/2402.10698v1)|null|\n", "2402.10688": "|**2024-02-16**|**Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability**|Haiyan Zhao et.al.|[2402.10688v1](http://arxiv.org/abs/2402.10688v1)|null|\n", "2402.10670": "|**2024-02-16**|**OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models**|Yuxuan Kuang 
et.al.|[2402.10670v1](http://arxiv.org/abs/2402.10670v1)|null|\n", "2402.10654": "|**2024-02-16**|**Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning Processes**|Dingzirui Wang et.al.|[2402.10654v1](http://arxiv.org/abs/2402.10654v1)|null|\n", "2402.10645": "|**2024-02-16**|**Can Separators Improve Chain-of-Thought Prompting?**|Yoonjeong Park et.al.|[2402.10645v1](http://arxiv.org/abs/2402.10645v1)|null|\n", "2402.10631": "|**2024-02-16**|**BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation**|Dayou Du et.al.|[2402.10631v1](http://arxiv.org/abs/2402.10631v1)|**[link](https://github.com/dd-duda/bitdistiller)**|\n", "2402.10612": "|**2024-02-16**|**Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models**|Hanxing Ding et.al.|[2402.10612v1](http://arxiv.org/abs/2402.10612v1)|null|\n", "2402.10532": "|**2024-02-16**|**Properties and Challenges of LLM-Generated Explanations**|Jenny Kunz et.al.|[2402.10532v1](http://arxiv.org/abs/2402.10532v1)|null|\n", "2402.10528": "|**2024-02-16**|**Can We Verify Step by Step for Incorrect Answer Detection?**|Xin Xu et.al.|[2402.10528v1](http://arxiv.org/abs/2402.10528v1)|**[link](https://github.com/xinxu-ustc/r2pe)**|\n", "2402.10400": "|**2024-02-16**|**Chain of Logic: Rule-Based Reasoning with Large Language Models**|Sergio Servantez et.al.|[2402.10400v1](http://arxiv.org/abs/2402.10400v1)|null|\n", "2402.12348": "|**2024-02-19**|**GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations**|Jinhao Duan et.al.|[2402.12348v1](http://arxiv.org/abs/2402.12348v1)|**[link](https://github.com/jinhaoduan/gtbench)**|\n", "2402.12219": "|**2024-02-19**|**Reformatted Alignment**|Run-Ze Fan et.al.|[2402.12219v1](http://arxiv.org/abs/2402.12219v1)|**[link](https://github.com/gair-nlp/realign)**|\n", "2402.12212": "|**2024-02-19**|**Polarization of Autonomous Generative AI Agents Under Echo 
Chambers**|Masaya Ohagi et.al.|[2402.12212v1](http://arxiv.org/abs/2402.12212v1)|null|\n", "2402.12195": "|**2024-02-19**|**Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion**|Ziyue Wang et.al.|[2402.12195v1](http://arxiv.org/abs/2402.12195v1)|null|\n", "2402.12185": "|**2024-02-19**|**ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning**|Renqiu Xia et.al.|[2402.12185v1](http://arxiv.org/abs/2402.12185v1)|**[link](https://github.com/unimodal4reasoning/chartvlm)**|\n", "2402.12146": "|**2024-02-19**|**Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement**|Zijun Liu et.al.|[2402.12146v1](http://arxiv.org/abs/2402.12146v1)|**[link](https://github.com/thunlp-mt/metaranking)**|\n", "2402.12091": "|**2024-02-19**|**Do Large Language Models Understand Logic or Just Mimick Context?**|Junbing Yan et.al.|[2402.12091v1](http://arxiv.org/abs/2402.12091v1)|null|\n", "2402.12080": "|**2024-02-19**|**Can LLMs Compute with Reasons?**|Harshit Sandilya et.al.|[2402.12080v1](http://arxiv.org/abs/2402.12080v1)|null|\n", "2402.12071": "|**2024-02-19**|**EmoBench: Evaluating the Emotional Intelligence of Large Language Models**|Sahand Sabour et.al.|[2402.12071v1](http://arxiv.org/abs/2402.12071v1)|**[link](https://github.com/sahandfer/emobench)**|\n", "2402.12052": "|**2024-02-22**|**Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs**|Jiejun Tan et.al.|[2402.12052v2](http://arxiv.org/abs/2402.12052v2)|**[link](https://github.com/plageon/slimplm)**|\n", "2402.12038": "|**2024-02-19**|**Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations**|Milan Bhan et.al.|[2402.12038v1](http://arxiv.org/abs/2402.12038v1)|null|\n", "2402.12022": "|**2024-02-19**|**Distilling Large Language Models for Text-Attributed Graph Learning**|Bo Pan et.al.|[2402.12022v1](http://arxiv.org/abs/2402.12022v1)|null|\n", "2402.11997": 
"|**2024-02-19**|**Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models**|Himanshu Beniwal et.al.|[2402.11997v1](http://arxiv.org/abs/2402.11997v1)|null|\n", "2402.11924": "|**2024-02-19**|**MRKE: The Multi-hop Reasoning Evaluation of LLMs by Knowledge Edition**|Jian Wu et.al.|[2402.11924v1](http://arxiv.org/abs/2402.11924v1)|null|\n", "2402.11903": "|**2024-02-19**|**SoLA: Solver-Layer Adaption of LLM for Better Logic Reasoning**|Yu Zhang et.al.|[2402.11903v1](http://arxiv.org/abs/2402.11903v1)|null|\n", "2402.11900": "|**2024-02-19**|**Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models**|Tianjie Ju et.al.|[2402.11900v1](http://arxiv.org/abs/2402.11900v1)|null|\n", "2402.11896": "|**2024-02-19**|**SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning**|Zhihao Wen et.al.|[2402.11896v1](http://arxiv.org/abs/2402.11896v1)|null|\n", "2402.11863": "|**2024-02-25**|**How Interpretable are Reasoning Explanations from Prompting Large Language Models?**|Wei Jie Yeo et.al.|[2402.11863v2](http://arxiv.org/abs/2402.11863v2)|**[link](https://github.com/wj210/cot_interpretability)**|\n", "2402.11845": "|**2024-02-19**|**Modularized Networks for Few-shot Hateful Meme Detection**|Rui Cao et.al.|[2402.11845v1](http://arxiv.org/abs/2402.11845v1)|**[link](https://github.com/social-ai-studio/mod_hate)**|\n", "2402.11821": "|**2024-02-20**|**Microstructures and Accuracy of Graph Recall by Large Language Models**|Yanbang Wang et.al.|[2402.11821v2](http://arxiv.org/abs/2402.11821v2)|null|\n", "2402.11804": "|**2024-02-19**|**LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs**|Kai Wang et.al.|[2402.11804v1](http://arxiv.org/abs/2402.11804v1)|null|\n", "2402.11724": "|**2024-02-18**|**Large Language Models as Data Augmenters for Cold-Start Item Recommendation**|Jianling Wang et.al.|[2402.11724v1](http://arxiv.org/abs/2402.11724v1)|null|\n", "2402.11651": 
"|**2024-02-18**|**Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents**|Renxi Wang et.al.|[2402.11651v1](http://arxiv.org/abs/2402.11651v1)|**[link](https://github.com/reason-wang/nat)**|\n", "2402.11638": "|**2024-02-18**|**Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks**|Yichen Wang et.al.|[2402.11638v1](http://arxiv.org/abs/2402.11638v1)|null|\n", "2402.11626": "|**2024-02-18**|**Metacognitive Retrieval-Augmented Large Language Models**|Yujia Zhou et.al.|[2402.11626v1](http://arxiv.org/abs/2402.11626v1)|**[link](https://github.com/ignorejjj/metarag)**|\n", "2402.11550": "|**2024-02-18**|**LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration**|Jun Zhao et.al.|[2402.11550v1](http://arxiv.org/abs/2402.11550v1)|null|\n", "2402.11541": "|**2024-02-18**|**Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We Thought**|Xinbang Dai et.al.|[2402.11541v1](http://arxiv.org/abs/2402.11541v1)|null|\n", "2402.11534": "|**2024-02-18**|**PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability**|Dayuan Fu et.al.|[2402.11534v1](http://arxiv.org/abs/2402.11534v1)|**[link](https://github.com/fu-dayuan/preact)**|\n", "2402.11530": "|**2024-02-18**|**Efficient Multimodal Learning from Data-centric Perspective**|Muyang He et.al.|[2402.11530v1](http://arxiv.org/abs/2402.11530v1)|**[link](https://github.com/baai-dcai/bunny)**|\n", "2402.11518": "|**2024-02-18**|**Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network**|Lin Chen et.al.|[2402.11518v1](http://arxiv.org/abs/2402.11518v1)|null|\n", "2402.11512": "|**2024-02-20**|**From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings**|Aishik Rakshit et.al.|[2402.11512v2](http://arxiv.org/abs/2402.11512v2)|null|\n", "2402.11452": "|**2024-02-18**|**AutoPRM: Automating Procedural Supervision 
for Multi-Step Reasoning via Controllable Question Decomposition**|Zhaorun Chen et.al.|[2402.11452v1](http://arxiv.org/abs/2402.11452v1)|null|\n", "2402.11451": "|**2024-02-21**|**SciAgent: Tool-augmented Language Models for Scientific Reasoning**|Yubo Ma et.al.|[2402.11451v2](http://arxiv.org/abs/2402.11451v2)|null|\n", "2402.11442": "|**2024-02-18**|**Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs**|Siyuan Wang et.al.|[2402.11442v1](http://arxiv.org/abs/2402.11442v1)|**[link](https://github.com/siyuanwangw/ulogic)**|\n", "2402.11436": "|**2024-02-18**|**Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models**|Wenda Xu et.al.|[2402.11436v1](http://arxiv.org/abs/2402.11436v1)|null|\n", "2402.11435": "|**2024-02-18**|**Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning**|Long Qian et.al.|[2402.11435v1](http://arxiv.org/abs/2402.11435v1)|null|\n", "2402.11432": "|**2024-02-18**|**Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning**|Kang Chen et.al.|[2402.11432v1](http://arxiv.org/abs/2402.11432v1)|null|\n", "2402.11420": "|**2024-02-18**|**Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction**|Yinghui Li et.al.|[2402.11420v1](http://arxiv.org/abs/2402.11420v1)|null|\n", "2402.11349": "|**2024-02-17**|**Tasks That Language Models Don't Learn**|Bruce W. 
Lee et.al.|[2402.11349v1](http://arxiv.org/abs/2402.11349v1)|**[link](https://github.com/brucewlee/h-test)**|\n", "2402.11291": "|**2024-02-17**|**Puzzle Solving using Reasoning of Large Language Models: A Survey**|Panagiotis Giadikiaroglou et.al.|[2402.11291v1](http://arxiv.org/abs/2402.11291v1)|null|\n", "2402.11254": "|**2024-02-17**|**C-ICL: Contrastive In-context Learning for Information Extraction**|Ying Mo et.al.|[2402.11254v1](http://arxiv.org/abs/2402.11254v1)|null|\n", "2402.11251": "|**2024-02-17**|**LLM can Achieve Self-Regulation via Hyperparameter Aware Generation**|Siyin Wang et.al.|[2402.11251v1](http://arxiv.org/abs/2402.11251v1)|null|\n", "2402.11208": "|**2024-02-17**|**Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents**|Wenkai Yang et.al.|[2402.11208v1](http://arxiv.org/abs/2402.11208v1)|**[link](https://github.com/lancopku/agent-backdoor-attacks)**|\n", "2402.11199": "|**2024-02-17**|**Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs**|Minh-Vuong Nguyen et.al.|[2402.11199v1](http://arxiv.org/abs/2402.11199v1)|null|\n", "2402.11194": "|**2024-02-17**|**Assessing LLMs' Mathematical Reasoning in Financial Document Question Answering**|Pragya Srivastava et.al.|[2402.11194v1](http://arxiv.org/abs/2402.11194v1)|null|\n", "2402.11166": "|**2024-02-17**|**GenDec: A robust generative Question-decomposition method for Multi-hop reasoning**|Jian Wu et.al.|[2402.11166v1](http://arxiv.org/abs/2402.11166v1)|null|\n", "2402.11163": "|**2024-02-17**|**KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph**|Jinhao Jiang et.al.|[2402.11163v1](http://arxiv.org/abs/2402.11163v1)|null|\n", "2402.11140": "|**2024-02-17**|**Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models**|Sijia Chen et.al.|[2402.11140v1](http://arxiv.org/abs/2402.11140v1)|null|\n", "2402.11122": "|**2024-02-16**|**Navigating the Dual Facets: A Comprehensive Evaluation 
of Sequential Memory Editing in Large Language Models**|Zihao Lin et.al.|[2402.11122v1](http://arxiv.org/abs/2402.11122v1)|null|\n", "2402.11100": "|**2024-02-16**|**When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models**|Yinghui Li et.al.|[2402.11100v1](http://arxiv.org/abs/2402.11100v1)|null|\n", "2402.11073": "|**2024-02-16**|**AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators**|Jingwei Ni et.al.|[2402.11073v1](http://arxiv.org/abs/2402.11073v1)|null|\n", "2402.11034": "|**2024-02-16**|**PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering**|Jannat Ara Meem et.al.|[2402.11034v1](http://arxiv.org/abs/2402.11034v1)|null|\n", "2402.10980": "|**2024-02-21**|**ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback**|Henry W. Sprueill et.al.|[2402.10980v2](http://arxiv.org/abs/2402.10980v2)|null|\n", "2402.10979": "|**2024-02-15**|**SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs**|Yebowen Hu et.al.|[2402.10979v1](http://arxiv.org/abs/2402.10979v1)|null|\n", "2402.10965": "|**2024-02-14**|**Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model**|Salman Rahman et.al.|[2402.10965v1](http://arxiv.org/abs/2402.10965v1)|null|\n", "2402.13231": "|**2024-02-20**|**Investigating Cultural Alignment of Large Language Models**|Badr AlKhamissi et.al.|[2402.13231v1](http://arxiv.org/abs/2402.13231v1)|**[link](https://github.com/bkhmsi/cultural-trends)**|\n", "2402.13228": "|**2024-02-20**|**Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive**|Arka Pal et.al.|[2402.13228v1](http://arxiv.org/abs/2402.13228v1)|**[link](https://github.com/abacusai/smaug)**|\n", "2402.13146": "|**2024-02-20**|**OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog**|Adnen Abdessaied 
et.al.|[2402.13146v1](http://arxiv.org/abs/2402.13146v1)|null|\n", "2402.13109": "|**2024-02-20**|**CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models**|Yizhi LI et.al.|[2402.13109v1](http://arxiv.org/abs/2402.13109v1)|null|\n", "2402.13098": "|**2024-02-20**|**ELAD: Explanation-Guided Large Language Models Active Distillation**|Yifei Zhang et.al.|[2402.13098v1](http://arxiv.org/abs/2402.13098v1)|null|\n", "2402.13064": "|**2024-02-20**|**Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models**|Haoran Li et.al.|[2402.13064v1](http://arxiv.org/abs/2402.13064v1)|null|\n", "2402.13035": "|**2024-02-23**|**Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models**|Che Zhang et.al.|[2402.13035v2](http://arxiv.org/abs/2402.13035v2)|**[link](https://github.com/bammt/learn-to-check)**|\n", "2402.12875": "|**2024-02-20**|**Chain of Thought Empowers Transformers to Solve Inherently Serial Problems**|Zhiyuan Li et.al.|[2402.12875v1](http://arxiv.org/abs/2402.12875v1)|null|\n", "2402.12869": "|**2024-02-20**|**Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data**|Dehai Min et.al.|[2402.12869v1](http://arxiv.org/abs/2402.12869v1)|null|\n", "2402.12851": "|**2024-02-20**|**MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models**|Tongxu Luo et.al.|[2402.12851v1](http://arxiv.org/abs/2402.12851v1)|null|\n", "2402.12806": "|**2024-02-20**|**SymBa: Symbolic Backward Chaining for Multi-step Natural Language Reasoning**|Jinu Lee et.al.|[2402.12806v1](http://arxiv.org/abs/2402.12806v1)|null|\n", "2402.12728": "|**2024-02-20**|**Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering**|Junnan Dong et.al.|[2402.12728v1](http://arxiv.org/abs/2402.12728v1)|null|\n", "2402.12659": "|**2024-02-20**|**The 
FinBen: An Holistic Financial Benchmark for Large Language Models**|Qianqian Xie et.al.|[2402.12659v1](http://arxiv.org/abs/2402.12659v1)|**[link](https://github.com/the-finai/pixiu)**|\n", "2402.12620": "|**2024-02-20**|**Are Large Language Models (LLMs) Good Social Predictors?**|Kaiqi Yang et.al.|[2402.12620v1](http://arxiv.org/abs/2402.12620v1)|null|\n", "2402.12451": "|**2024-02-19**|**The (R)Evolution of Multimodal Large Language Models: A Survey**|Davide Caffagni et.al.|[2402.12451v1](http://arxiv.org/abs/2402.12451v1)|null|\n", "2402.14020": "|**2024-02-21**|**Coercing LLMs to do and reveal (almost) anything**|Jonas Geiping et.al.|[2402.14020v1](http://arxiv.org/abs/2402.14020v1)|**[link](https://github.com/jonasgeiping/carving)**|\n", "2402.14008": "|**2024-02-21**|**OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems**|Chaoqun He et.al.|[2402.14008v1](http://arxiv.org/abs/2402.14008v1)|**[link](https://github.com/openbmb/olympiadbench)**|\n", "2402.13950": "|**2024-02-23**|**Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning**|Debjit Paul et.al.|[2402.13950v2](http://arxiv.org/abs/2402.13950v2)|null|\n", "2402.13904": "|**2024-02-21**|**Calibrating Large Language Models with Sample Consistency**|Qing Lyu et.al.|[2402.13904v1](http://arxiv.org/abs/2402.13904v1)|null|\n", "2402.13718": "|**2024-02-24**|**$\\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens**|Xinrong Zhang et.al.|[2402.13718v3](http://arxiv.org/abs/2402.13718v3)|**[link](https://github.com/openbmb/infinitebench)**|\n", "2402.13602": "|**2024-02-21**|**Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving**|Mehdi Azarafza et.al.|[2402.13602v1](http://arxiv.org/abs/2402.13602v1)|null|\n", "2402.13584": "|**2024-02-21**|**WinoViz: Probing Visual Properties of Objects Under Different States**|Woojeong Jin 
et.al.|[2402.13584v1](http://arxiv.org/abs/2402.13584v1)|null|\n", "2402.13524": "|**2024-02-21**|**OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models**|Yang Liu et.al.|[2402.13524v1](http://arxiv.org/abs/2402.13524v1)|null|\n", "2402.13459": "|**2024-02-21**|**Learning to Poison Large Language Models During Instruction Tuning**|Yao Qiang et.al.|[2402.13459v1](http://arxiv.org/abs/2402.13459v1)|**[link](https://github.com/rookiezxy/gbtl)**|\n", "2402.13415": "|**2024-02-20**|**Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text**|Kewei Cheng et.al.|[2402.13415v1](http://arxiv.org/abs/2402.13415v1)|null|\n", "2402.13372": "|**2024-02-22**|**EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries**|Jing Han Sun et.al.|[2402.13372v2](http://arxiv.org/abs/2402.13372v2)|null|\n", "2402.14818": "|**2024-02-22**|**PALO: A Polyglot Large Multimodal Model for 5B People**|Muhammad Maaz et.al.|[2402.14818v1](http://arxiv.org/abs/2402.14818v1)|**[link](https://github.com/mbzuai-oryx/palo)**|\n", "2402.14809": "|**2024-02-22**|**CriticBench: Benchmarking LLMs for Critique-Correct Reasoning**|Zicheng Lin et.al.|[2402.14809v1](http://arxiv.org/abs/2402.14809v1)|**[link](https://github.com/CriticBench/CriticBench)**|\n", "2402.14760": "|**2024-02-22**|**Generalizing Reward Modeling for Out-of-Distribution Preference Learning**|Chen Jia et.al.|[2402.14760v1](http://arxiv.org/abs/2402.14760v1)|null|\n", "2402.14660": "|**2024-02-23**|**ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models**|Yanan Wu et.al.|[2402.14660v2](http://arxiv.org/abs/2402.14660v2)|**[link](https://github.com/conceptmath/conceptmath)**|\n", "2402.14404": "|**2024-02-26**|**On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe**|Ningyu Xu 
et.al.|[2402.14404v2](http://arxiv.org/abs/2402.14404v2)|**[link](https://github.com/ningyuxu/tip_of_tongue)**|\n", "2402.14382": "|**2024-02-22**|**Enhancing Temporal Knowledge Graph Forecasting with Large Language Models via Chain-of-History Reasoning**|Yuwei Xia et.al.|[2402.14382v1](http://arxiv.org/abs/2402.14382v1)|null|\n", "2402.14361": "|**2024-02-22**|**OpenTab: Advancing Large Language Models as Open-domain Table Reasoners**|Kezhi Kong et.al.|[2402.14361v1](http://arxiv.org/abs/2402.14361v1)|null|\n", "2402.14355": "|**2024-02-22**|**Rule or Story, Which is a Better Commonsense Expression for Talking with Large Language Models?**|Ning Bian et.al.|[2402.14355v1](http://arxiv.org/abs/2402.14355v1)|null|\n", "2402.14310": "|**2024-02-22**|**Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge**|Jinlan Fu et.al.|[2402.14310v1](http://arxiv.org/abs/2402.14310v1)|**[link](https://github.com/jinlanfu/hsp)**|\n", "2402.14296": "|**2024-02-22**|**Mitigating Biases of Large Language Models in Stance Detection with Calibration**|Ang Li et.al.|[2402.14296v1](http://arxiv.org/abs/2402.14296v1)|null|\n", "2402.14293": "|**2024-02-22**|**Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education**|Rui Yang et.al.|[2402.14293v1](http://arxiv.org/abs/2402.14293v1)|**[link](https://github.com/irenezihuili/cgprompt)**|\n", "2402.14273": "|**2024-02-22**|**Can Language Models Act as Knowledge Bases at Scale?**|Qiyuan He et.al.|[2402.14273v1](http://arxiv.org/abs/2402.14273v1)|**[link](https://github.com/hyanique/lmkb-at-scale)**|\n", "2402.14258": "|**2024-02-22**|**Eagle: Ethical Dataset Given from Real Interactions**|Masahiro Kaneko et.al.|[2402.14258v1](http://arxiv.org/abs/2402.14258v1)|null|\n", "2402.14195": "|**2024-02-22**|**Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models**|Younghun Lee 
et.al.|[2402.14195v1](http://arxiv.org/abs/2402.14195v1)|null|\n", "2402.14123": "|**2024-02-21**|**DeiSAM: Segment Anything with Deictic Prompting**|Hikaru Shindo et.al.|[2402.14123v1](http://arxiv.org/abs/2402.14123v1)|**[link](https://github.com/ml-research/deictic-segment-anything)**|\n", "2402.14116": "|**2024-02-21**|**FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models**|Andrew Zhu et.al.|[2402.14116v1](http://arxiv.org/abs/2402.14116v1)|**[link](https://github.com/zhudotexe/fanoutqa)**|\n", "2402.15420": "|**2024-02-23**|**PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning**|Simon Holk et.al.|[2402.15420v1](http://arxiv.org/abs/2402.15420v1)|null|\n", "2402.15368": "|**2024-02-23**|**Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction**|Jun Wang et.al.|[2402.15368v1](http://arxiv.org/abs/2402.15368v1)|null|\n", "2402.15264": "|**2024-02-23**|**DEEM: Dynamic Experienced Expert Modeling for Stance Detection**|Xiaolong Wang et.al.|[2402.15264v1](http://arxiv.org/abs/2402.15264v1)|null|\n", "2402.15183": "|**2024-03-05**|**GraphEdit: Large Language Models for Graph Structure Learning**|Zirui Guo et.al.|[2402.15183v4](http://arxiv.org/abs/2402.15183v4)|**[link](https://github.com/hkuds/graphedit)**|\n", "2402.15131": "|**2024-02-23**|**Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models**|Guanming Xiong et.al.|[2402.15131v1](http://arxiv.org/abs/2402.15131v1)|null|\n", "2402.15116": "|**2024-02-23**|**Large Multimodal Agents: A Survey**|Junlin Xie et.al.|[2402.15116v1](http://arxiv.org/abs/2402.15116v1)|null|\n", "2402.15048": "|**2024-02-23**|**Unlocking the Power of Large Language Models for Entity Alignment**|Xuhui Jiang et.al.|[2402.15048v1](http://arxiv.org/abs/2402.15048v1)|null|\n", "2402.15018": "|**2024-02-22**|**Unintended Impacts of LLM Alignment on Global Representation**|Michael J. 
Ryan et.al.|[2402.15018v1](http://arxiv.org/abs/2402.15018v1)|null|\n", "2402.15000": "|**2024-02-22**|**Divide-or-Conquer? Which Part Should You Distill Your LLM?**|Zhuofeng Wu et.al.|[2402.15000v1](http://arxiv.org/abs/2402.15000v1)|null|\n", "2402.14963": "|**2024-02-22**|**Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning**|Hanqi Yan et.al.|[2402.14963v1](http://arxiv.org/abs/2402.14963v1)|null|\n", "2402.14903": "|**2024-02-22**|**Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs**|Aaditya K. Singh et.al.|[2402.14903v1](http://arxiv.org/abs/2402.14903v1)|null|\n", "2402.14874": "|**2024-02-21**|**Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation**|Phuc Phan et.al.|[2402.14874v1](http://arxiv.org/abs/2402.14874v1)|**[link](https://github.com/pphuc25/distil-cd)**|\n", "2402.14856": "|**2024-02-20**|**Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning**|Philipp Mondorf et.al.|[2402.14856v1](http://arxiv.org/abs/2402.14856v1)|null|\n", "2402.14850": "|**2024-02-20**|**CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management**|Sinan Abdulhak et.al.|[2402.14850v1](http://arxiv.org/abs/2402.14850v1)|null|\n", "2402.14848": "|**2024-02-19**|**Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models**|Mosh Levy et.al.|[2402.14848v1](http://arxiv.org/abs/2402.14848v1)|**[link](https://github.com/alonj/Same-Task-More-Tokens)**|\n", "2402.14840": "|**2024-02-19**|**RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning**|Congyun Jin et.al.|[2402.14840v1](http://arxiv.org/abs/2402.14840v1)|null|\n", "2402.14833": "|**2024-02-17**|**CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness**|Jiayi Liu 
et.al.|[2402.14833v1](http://arxiv.org/abs/2402.14833v1)|null|\n", "2402.16499": "|**2024-02-26**|**LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments**|Junzhe Chen et.al.|[2402.16499v1](http://arxiv.org/abs/2402.16499v1)|null|\n", "2402.16406": "|**2024-02-26**|**From RAGs to riches: Using large language models to write documents for clinical trials**|Nigel Markey et.al.|[2402.16406v1](http://arxiv.org/abs/2402.16406v1)|null|\n", "2402.16352": "|**2024-02-26**|**MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs**|Zimu Lu et.al.|[2402.16352v1](http://arxiv.org/abs/2402.16352v1)|null|\n", "2402.16313": "|**2024-02-26**|**Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering**|Mingxu Tao et.al.|[2402.16313v1](http://arxiv.org/abs/2402.16313v1)|null|\n", "2402.16124": "|**2024-02-25**|**AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation**|Yasheng Sun et.al.|[2402.16124v1](http://arxiv.org/abs/2402.16124v1)|null|\n", "2402.16117": "|**2024-02-25**|**RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis**|Yao Mu et.al.|[2402.16117v1](http://arxiv.org/abs/2402.16117v1)|null|\n", "2402.16048": "|**2024-02-25**|**LLMs with Chain-of-Thought Are Non-Causal Reasoners**|Guangsheng Bao et.al.|[2402.16048v1](http://arxiv.org/abs/2402.16048v1)|**[link](https://github.com/stevenzhb/cot_causal_analysis)**|\n", "2402.16029": "|**2024-03-06**|**GraphWiz: An Instruction-Following Language Model for Graph Problems**|Nuo Chen et.al.|[2402.16029v2](http://arxiv.org/abs/2402.16029v2)|null|\n", "2402.16006": "|**2024-02-25**|**From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings**|Hao Wang et.al.|[2402.16006v1](http://arxiv.org/abs/2402.16006v1)|null|\n", "2402.15862": "|**2024-02-24**|**SportQA: A Benchmark for Sports 
Understanding in Large Language Models**|Haotian Xia et.al.|[2402.15862v1](http://arxiv.org/abs/2402.15862v1)|null|\n", "2402.15818": "|**2024-02-24**|**Linguistic Intelligence in Large Language Models for Telecommunications**|Tasnim Ahmed et.al.|[2402.15818v1](http://arxiv.org/abs/2402.15818v1)|null|\n", "2402.15764": "|**2024-02-24**|**Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models**|Haoran Liao et.al.|[2402.15764v1](http://arxiv.org/abs/2402.15764v1)|null|\n", "2402.15729": "|**2024-02-24**|**How Do Humans Write Code? Large Models Do It the Same Way Too**|Long Li et.al.|[2402.15729v1](http://arxiv.org/abs/2402.15729v1)|null|\n", "2402.15663": "|**2024-02-24**|**Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study**|Zhaoyue Sun et.al.|[2402.15663v1](http://arxiv.org/abs/2402.15663v1)|**[link](https://github.com/zhaoyuesun/phee-with-chatgpt)**|\n", "2402.15654": "|**2024-02-24**|**Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics**|Sadaf Ghaffari et.al.|[2402.15654v1](http://arxiv.org/abs/2402.15654v1)|null|\n", "2402.15631": "|**2024-02-23**|**Fine-Grained Self-Endorsement Improves Factuality and Reasoning**|Ante Wang et.al.|[2402.15631v1](http://arxiv.org/abs/2402.15631v1)|null|\n", "2402.15527": "|**2024-02-21**|**PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain**|Liang Chen et.al.|[2402.15527v1](http://arxiv.org/abs/2402.15527v1)|**[link](https://github.com/pkunlp-icler/pca-eval)**|\n", "2402.17709": "|**2024-02-27**|**Case-Based or Rule-Based: How Do Transformers Do the Math?**|Yi Hu et.al.|[2402.17709v1](http://arxiv.org/abs/2402.17709v1)|**[link](https://github.com/graphpku/case_or_rule)**|\n", "2402.17644": "|**2024-02-27**|**Are LLMs Capable of Data-based Statistical and Causal Reasoning? 
Benchmarking Advanced Quantitative Reasoning with Data**|Xiao Liu et.al.|[2402.17644v1](http://arxiv.org/abs/2402.17644v1)|**[link](https://github.com/xxxiaol/qrdata)**|\n", "2402.17453": "|**2024-02-27**|**DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning**|Siyuan Guo et.al.|[2402.17453v1](http://arxiv.org/abs/2402.17453v1)|null|\n", "2402.17231": "|**2024-02-27**|**MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning**|Debrup Das et.al.|[2402.17231v1](http://arxiv.org/abs/2402.17231v1)|**[link](https://github.com/debrup-61/mathsensei)**|\n", "2402.17226": "|**2024-02-27**|**Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models**|Xiaolong Wang et.al.|[2402.17226v1](http://arxiv.org/abs/2402.17226v1)|null|\n", "2402.16837": "|**2024-02-26**|**Do Large Language Models Latently Perform Multi-Hop Reasoning?**|Sohee Yang et.al.|[2402.16837v1](http://arxiv.org/abs/2402.16837v1)|null|\n", "2402.16568": "|**2024-02-26**|**Two-stage Generative Question Answering on Temporal Knowledge Graph Using Large Language Models**|Yifu Gao et.al.|[2402.16568v1](http://arxiv.org/abs/2402.16568v1)|null|\n", "2402.16905": "|**2024-02-24**|**Enforcing Temporal Constraints on Generative Agent Behavior with Reactive Synthesis**|Raven Rothkopf et.al.|[2402.16905v1](http://arxiv.org/abs/2402.16905v1)|null|\n", "2402.16611": "|**2024-02-21**|**Understanding the Dataset Practitioners Behind Large Language Model Development**|Crystal Qian et.al.|[2402.16611v1](http://arxiv.org/abs/2402.16611v1)|null|\n", "2402.18566": "|**2024-02-28**|**A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic**|Gregory Coppola et.al.|[2402.18566v1](http://arxiv.org/abs/2402.18566v1)|null|\n", "2402.18496": "|**2024-02-29**|**Language Models Represent Beliefs of Self and Others**|Wentao Zhu 
et.al.|[2402.18496v2](http://arxiv.org/abs/2402.18496v2)|null|\n", "2402.18439": "|**2024-02-28**|**Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication**|Weize Chen et.al.|[2402.18439v1](http://arxiv.org/abs/2402.18439v1)|**[link](https://github.com/thunlp/autoform)**|\n", "2402.18374": "|**2024-02-28**|**VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models**|Seoyeon Kim et.al.|[2402.18374v1](http://arxiv.org/abs/2402.18374v1)|null|\n", "2402.18344": "|**2024-02-28**|**Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning**|Jiachun Li et.al.|[2402.18344v1](http://arxiv.org/abs/2402.18344v1)|null|\n", "2402.18312": "|**2024-02-28**|**How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning**|Subhabrata Dutta et.al.|[2402.18312v1](http://arxiv.org/abs/2402.18312v1)|**[link](https://github.com/joykirat18/how-to-think-step-by-step)**|\n", "2402.18252": "|**2024-02-28**|**Towards Generalist Prompting for Large Language Models by Mental Models**|Haoxiang Guan et.al.|[2402.18252v1](http://arxiv.org/abs/2402.18252v1)|null|\n", "2402.18225": "|**2024-02-28**|**CogBench: a large language model walks into a psychology lab**|Julian Coda-Forno et.al.|[2402.18225v1](http://arxiv.org/abs/2402.18225v1)|**[link](https://github.com/juliancodaforno/cogbench)**|\n", "2402.18157": "|**2024-02-28**|**From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs**|Yulong Liu et.al.|[2402.18157v1](http://arxiv.org/abs/2402.18157v1)|null|\n", "2402.18150": "|**2024-02-28**|**Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation**|Shicheng Xu et.al.|[2402.18150v1](http://arxiv.org/abs/2402.18150v1)|**[link](https://github.com/xsc1234/info-rag)**|\n", "2402.18139": "|**2024-02-28**|**Cause and Effect: Can Large Language Models Truly Understand 
Causality?**|Swagata Ashwani et.al.|[2402.18139v1](http://arxiv.org/abs/2402.18139v1)|null|\n", "2402.18113": "|**2024-02-28**|**Small But Funny: A Feedback-Driven Approach to Humor Distillation**|Sahithya Ravi et.al.|[2402.18113v1](http://arxiv.org/abs/2402.18113v1)|null|\n", "2402.18093": "|**2024-02-28**|**ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection**|Takashi Koide et.al.|[2402.18093v1](http://arxiv.org/abs/2402.18093v1)|null|\n", "2402.18023": "|**2024-02-28**|**Do Large Language Models Mirror Cognitive Language Processing?**|Yuqi Ren et.al.|[2402.18023v1](http://arxiv.org/abs/2402.18023v1)|null|\n", "2402.17887": "|**2024-03-02**|**JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability**|Junda Wang et.al.|[2402.17887v2](http://arxiv.org/abs/2402.17887v2)|null|\n", "2402.17786": "|**2024-02-24**|**Stepwise Self-Consistent Mathematical Reasoning with Large Language Models**|Zilong Zhao et.al.|[2402.17786v1](http://arxiv.org/abs/2402.17786v1)|**[link](https://github.com/zhao-zilong/ssc-cot)**|\n", "2402.19471": "|**2024-02-29**|**Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling**|Gabriel Grand et.al.|[2402.19471v1](http://arxiv.org/abs/2402.19471v1)|null|\n", "2402.19446": "|**2024-02-29**|**ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL**|Yifei Zhou et.al.|[2402.19446v1](http://arxiv.org/abs/2402.19446v1)|**[link](https://github.com/yifeizhou02/archer)**|\n", "2402.19255": "|**2024-02-29**|**GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers**|Qintong Li et.al.|[2402.19255v1](http://arxiv.org/abs/2402.19255v1)|**[link](https://github.com/qtli/gsm-plus)**|\n", "2402.19173": "|**2024-02-29**|**StarCoder 2 and The Stack v2: The Next Generation**|Anton Lozhkov et.al.|[2402.19173v1](http://arxiv.org/abs/2402.19173v1)|null|\n", 
"2402.19150": "|**2024-02-29**|**Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts**|Hao Cheng et.al.|[2402.19150v1](http://arxiv.org/abs/2402.19150v1)|null|\n", "2402.18807": "|**2024-02-29**|**On the Decision-Making Abilities in Role-Playing using Large Language Models**|Chenglei Shen et.al.|[2402.18807v1](http://arxiv.org/abs/2402.18807v1)|null|\n", "2402.18695": "|**2024-02-28**|**Grounding Language Models for Visual Entity Recognition**|Zilin Xiao et.al.|[2402.18695v1](http://arxiv.org/abs/2402.18695v1)|**[link](https://github.com/mrzilinxiao/autover)**|\n", "2402.18679": "|**2024-03-12**|**Data Interpreter: An LLM Agent For Data Science**|Sirui Hong et.al.|[2402.18679v3](http://arxiv.org/abs/2402.18679v3)|**[link](https://github.com/geekan/metagpt)**|\n", "2402.18060": "|**2024-02-29**|**Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions**|Hanjie Chen et.al.|[2402.18060v2](http://arxiv.org/abs/2402.18060v2)|null|\n", "2403.01165": "|**2024-03-02**|**STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models**|Linhai Zhang et.al.|[2403.01165v1](http://arxiv.org/abs/2403.01165v1)|**[link](https://github.com/callanwu/star)**|\n", "2403.01133": "|**2024-03-02**|**Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data**|Aritra Hota et.al.|[2403.01133v1](http://arxiv.org/abs/2403.01133v1)|null|\n", "2403.01106": "|**2024-03-02**|**Distilling Text Style Transfer With Self-Explanation From LLMs**|Chiyu Zhang et.al.|[2403.01106v1](http://arxiv.org/abs/2403.01106v1)|null|\n", "2403.01031": "|**2024-03-01**|**Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks**|Fakhraddin Alwajih et.al.|[2403.01031v1](http://arxiv.org/abs/2403.01031v1)|**[link](https://github.com/ubc-nlp/peacock)**|\n", "2403.00994": "|**2024-03-01**|**Leveraging Prompt-Based Large Language Models: Predicting 
Pandemic Health Decisions and Outcomes Through Social Media Language**|Xiaohan Ding et.al.|[2403.00994v1](http://arxiv.org/abs/2403.00994v1)|null|\n", "2403.00758": "|**2024-03-07**|**Mitigating Reversal Curse via Semantic-aware Permutation Training**|Qingyan Guo et.al.|[2403.00758v2](http://arxiv.org/abs/2403.00758v2)|null|\n", "2403.00878": "|**2024-03-01**|**Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models**|Jiandong Jin et.al.|[2403.00878v1](http://arxiv.org/abs/2403.00878v1)|null|\n", "2403.00126": "|**2024-02-29**|**FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition**|Xiaoqiang Wang et.al.|[2403.00126v1](http://arxiv.org/abs/2403.00126v1)|null|\n", "2403.00839": "|**2024-02-29**|**ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph**|Xukun Liu et.al.|[2403.00839v1](http://arxiv.org/abs/2403.00839v1)|null|\n", "2403.00816": "|**2024-02-26**|**CFRet-DVQA: Coarse-to-Fine Retrieval and Efficient Tuning for Document Visual Question Answering**|Jinxu Zhang et.al.|[2403.00816v1](http://arxiv.org/abs/2403.00816v1)|null|\n", "2403.00806": "|**2024-02-24**|**Enhanced User Interaction in Operating Systems through Machine Learning Language Models**|Chenwei Zhang et.al.|[2403.00806v1](http://arxiv.org/abs/2403.00806v1)|null|\n", "2403.00800": "|**2024-02-23**|**Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes**|Yezeng Chen et.al.|[2403.00800v1](http://arxiv.org/abs/2403.00800v1)|null|\n", "2403.00799": "|**2024-02-23**|**An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning**|Zui Chen et.al.|[2403.00799v1](http://arxiv.org/abs/2403.00799v1)|**[link](https://github.com/cyzhh/MMOS)**|\n", "2403.03203": "|**2024-03-05**|**CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments**|Savitha Sam Abraham 
et.al.|[2403.03203v1](http://arxiv.org/abs/2403.03203v1)|null|\n", "2403.03170": "|**2024-03-05**|**SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection**|Peng Qi et.al.|[2403.03170v1](http://arxiv.org/abs/2403.03170v1)|null|\n", "2403.03167": "|**2024-03-06**|**PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset**|Arda Uzunoglu et.al.|[2403.03167v2](http://arxiv.org/abs/2403.03167v2)|**[link](https://github.com/gglab-ku/paradise)**|\n", "2403.03154": "|**2024-03-05**|**Quantum Many-Body Physics Calculations with Large Language Models**|Haining Pan et.al.|[2403.03154v1](http://arxiv.org/abs/2403.03154v1)|null|\n", "2403.03101": "|**2024-03-05**|**KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents**|Yuqi Zhu et.al.|[2403.03101v1](http://arxiv.org/abs/2403.03101v1)|**[link](https://github.com/zjunlp/knowagent)**|\n", "2403.03029": "|**2024-03-05**|**Socratic Reasoning Improves Positive Text Rewriting**|Anmol Goel et.al.|[2403.03029v1](http://arxiv.org/abs/2403.03029v1)|null|\n", "2403.02965": "|**2024-03-05**|**ChatGPT and biometrics: an assessment of face recognition, gender detection, and age estimation capabilities**|Ahmad Hassanpour et.al.|[2403.02965v1](http://arxiv.org/abs/2403.02965v1)|null|\n", "2403.02889": "|**2024-03-05**|**In Search of Truth: An Interrogation Approach to Hallucination Detection**|Yakir Yehuda et.al.|[2403.02889v1](http://arxiv.org/abs/2403.02889v1)|null|\n", "2403.02884": "|**2024-03-05**|**MathScale: Scaling Instruction Tuning for Mathematical Reasoning**|Zhengyang Tang et.al.|[2403.02884v1](http://arxiv.org/abs/2403.02884v1)|null|\n", "2403.02760": "|**2024-03-12**|**Emerging Synergies Between Large Language Models and Machine Learning in Ecommerce Recommendations**|Xiaonan Xu et.al.|[2403.02760v2](http://arxiv.org/abs/2403.02760v2)|null|\n", "2403.02698": "|**2024-03-05**|**Causal Walk: Debiasing Multi-Hop Fact Verification with 
Front-Door Adjustment**|Congzhi Zhang et.al.|[2403.02698v1](http://arxiv.org/abs/2403.02698v1)|**[link](https://github.com/zcccccz/causalwalk)**|\n", "2403.02647": "|**2024-03-05**|**FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model**|Xiangyu Li et.al.|[2403.02647v1](http://arxiv.org/abs/2403.02647v1)|**[link](https://github.com/frinkleko/finreport)**|\n", "2403.02628": "|**2024-03-05**|**Interactive Continual Learning: Fast and Slow Thinking**|Biqing Qi et.al.|[2403.02628v1](http://arxiv.org/abs/2403.02628v1)|null|\n", "2403.02615": "|**2024-03-05**|**Exploring the Limitations of Large Language Models in Compositional Relation Reasoning**|Jinman Zhao et.al.|[2403.02615v1](http://arxiv.org/abs/2403.02615v1)|null|\n", "2403.02567": "|**2024-03-05**|**Eliciting Better Multilingual Structured Reasoning from LLMs through Code**|Bryan Li et.al.|[2403.02567v1](http://arxiv.org/abs/2403.02567v1)|null|\n", "2403.02333": "|**2024-03-04**|**Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning**|Yiming Huang et.al.|[2403.02333v1](http://arxiv.org/abs/2403.02333v1)|null|\n", "2403.02330": "|**2024-03-04**|**RegionGPT: Towards Region Understanding Vision Language Model**|Qiushan Guo et.al.|[2403.02330v1](http://arxiv.org/abs/2403.02330v1)|null|\n", "2403.02302": "|**2024-03-04**|**Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation**|Maksim Kuprashevich et.al.|[2403.02302v1](http://arxiv.org/abs/2403.02302v1)|**[link](https://github.com/wildchlamydia/mivolo)**|\n", "2403.02246": "|**2024-03-04**|**PHAnToM: Personality Has An Effect on Theory-of-Mind Reasoning in Large Language Models**|Fiona Anting Tan et.al.|[2403.02246v1](http://arxiv.org/abs/2403.02246v1)|null|\n", "2403.02178": "|**2024-03-04**|**Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models**|Changyu Chen 
et.al.|[2403.02178v1](http://arxiv.org/abs/2403.02178v1)|**[link](https://github.com/changyuchen347/maskedthought)**|\n", "2403.02164": "|**2024-03-05**|**Cognition is All You Need -- The Next Layer of AI Above Large Language Models**|Nova Spivack et.al.|[2403.02164v2](http://arxiv.org/abs/2403.02164v2)|null|\n", "2403.02054": "|**2024-03-04**|**Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism**|Shuvayan Brahmachary et.al.|[2403.02054v1](http://arxiv.org/abs/2403.02054v1)|null|\n", "2403.01972": "|**2024-03-04**|**Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models**|Derong Xu et.al.|[2403.01972v1](http://arxiv.org/abs/2403.01972v1)|**[link](https://github.com/quqxui/mpikgc)**|\n", "2403.01969": "|**2024-03-04**|**AS-ES Learning: Towards Efficient CoT Learning in Small Models**|Nuwa Xi et.al.|[2403.01969v1](http://arxiv.org/abs/2403.01969v1)|null|\n", "2403.01777": "|**2024-03-05**|**NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models**|Lizhou Fan et.al.|[2403.01777v2](http://arxiv.org/abs/2403.01777v2)|**[link](https://github.com/lizhouf/nphardeval4v)**|\n", "2403.01457": "|**2024-03-03**|**Logic Rules as Explanations for Legal Case Retrieval**|Zhongxiang Sun et.al.|[2403.01457v1](http://arxiv.org/abs/2403.01457v1)|**[link](https://github.com/ke-01/ns-lcr)**|\n", "2403.01395": "|**2024-03-03**|**CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge**|Willis Guo et.al.|[2403.01395v1](http://arxiv.org/abs/2403.01395v1)|**[link](https://github.com/d3mlab/cr-lt-kgqa)**|\n", "2403.01390": "|**2024-03-03**|**Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering**|Armin Toroghi et.al.|[2403.01390v1](http://arxiv.org/abs/2403.01390v1)|null|\n", "2403.03870": "|**2024-03-06**|**Learning to Decode Collaboratively with Multiple Language Models**|Shannon Zejiang Shen 
et.al.|[2403.03870v1](http://arxiv.org/abs/2403.03870v1)|**[link](https://github.com/clinicalml/co-llm)**|\n", "2403.03864": "|**2024-03-13**|**Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning**|Deepanway Ghosal et.al.|[2403.03864v3](http://arxiv.org/abs/2403.03864v3)|**[link](https://github.com/declare-lab/llm-puzzletest)**|\n", "2403.03788": "|**2024-03-06**|**PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion**|Zekai Zhang et.al.|[2403.03788v1](http://arxiv.org/abs/2403.03788v1)|**[link](https://github.com/zekaigalaxy/pptcr)**|\n", "2403.03636": "|**2024-03-06**|**SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models**|Yibin Chen et.al.|[2403.03636v1](http://arxiv.org/abs/2403.03636v1)|null|\n", "2403.03627": "|**2024-03-06**|**Multimodal Large Language Models to Support Real-World Fact-Checking**|Jiahui Geng et.al.|[2403.03627v1](http://arxiv.org/abs/2403.03627v1)|null|\n", "2403.03585": "|**2024-03-06**|**RouteExplainer: An Explanation Framework for Vehicle Routing Problem**|Daisuke Kikuta et.al.|[2403.03585v1](http://arxiv.org/abs/2403.03585v1)|**[link](https://github.com/ntt-dkiku/route-explainer)**|\n", "2403.03536": "|**2024-03-06**|**Towards Efficient and Effective Unlearning of Large Language Models for Recommendation**|Hangyu Wang et.al.|[2403.03536v1](http://arxiv.org/abs/2403.03536v1)|**[link](https://github.com/justarter/e2urec)**|\n", "2403.03424": "|**2024-03-06**|**Generative News Recommendation**|Shen Gao et.al.|[2403.03424v1](http://arxiv.org/abs/2403.03424v1)|**[link](https://github.com/morganf33/gnr)**|\n", "2403.03288": "|**2024-03-05**|**Should We Fear Large Language Models? 
A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger's Philosophy**|Jianqiiu Zhang et.al.|[2403.03288v1](http://arxiv.org/abs/2403.03288v1)|null|\n", "2403.04666": "|**2024-03-07**|**Telecom Language Models: Must They Be Large?**|Nicola Piovesan et.al.|[2403.04666v1](http://arxiv.org/abs/2403.04666v1)|null|\n", "2403.04642": "|**2024-03-07**|**Teaching Large Language Models to Reason with Reinforcement Learning**|Alex Havrilla et.al.|[2403.04642v1](http://arxiv.org/abs/2403.04642v1)|null|\n", "2403.04483": "|**2024-03-07**|**GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability**|Zihan Luo et.al.|[2403.04483v1](http://arxiv.org/abs/2403.04483v1)|**[link](https://github.com/cgcl-codes/graphinstruct)**|\n", "2403.04460": "|**2024-03-08**|**Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset**|Minjin Kim et.al.|[2403.04460v2](http://arxiv.org/abs/2403.04460v2)|null|\n", "2403.04382": "|**2024-03-07**|**Acceleron: A Tool to Accelerate Research Ideation**|Harshit Nigam et.al.|[2403.04382v1](http://arxiv.org/abs/2403.04382v1)|null|\n", "2403.04283": "|**2024-03-07**|**Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy**|Yu Zhu et.al.|[2403.04283v1](http://arxiv.org/abs/2403.04283v1)|null|\n", "2403.04260": "|**2024-03-07**|**Can Small Language Models be Good Reasoners for Sequential Recommendation?**|Yuling Wang et.al.|[2403.04260v1](http://arxiv.org/abs/2403.04260v1)|null|\n", "2403.04247": "|**2024-03-07**|**UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities**|Yangning Li et.al.|[2403.04247v1](http://arxiv.org/abs/2403.04247v1)|**[link](https://github.com/thukelab/ultrawiki)**|\n", "2403.04123": "|**2024-03-07**|**Exploring LLM-based Agents for Root Cause Analysis**|Devjeet Roy et.al.|[2403.04123v1](http://arxiv.org/abs/2403.04123v1)|null|\n", "2403.04121": 
"|**2024-03-08**|**Can Large Language Models Reason and Plan?**|Subbarao Kambhampati et.al.|[2403.04121v2](http://arxiv.org/abs/2403.04121v2)|null|\n", "2403.04031": "|**2024-03-06**|**Can Large Language Models do Analytical Reasoning?**|Yebowen Hu et.al.|[2403.04031v1](http://arxiv.org/abs/2403.04031v1)|null|\n", "2403.04008": "|**2024-03-06**|**Human I/O: Towards a Unified Approach to Detecting Situational Impairments**|Xingyu Bruce Liu et.al.|[2403.04008v1](http://arxiv.org/abs/2403.04008v1)|null|\n", "2403.05530": "|**2024-03-08**|**Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context**|Machel Reid et.al.|[2403.05530v1](http://arxiv.org/abs/2403.05530v1)|null|\n", "2403.05523": "|**2024-03-11**|**Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation**|Yijiang Li et.al.|[2403.05523v2](http://arxiv.org/abs/2403.05523v2)|null|\n", "2403.05468": "|**2024-03-08**|**Will GPT-4 Run DOOM?**|Adrian de Wynter et.al.|[2403.05468v1](http://arxiv.org/abs/2403.05468v1)|null|\n", "2403.05326": "|**2024-03-19**|**ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues**|Yiding Liu et.al.|[2403.05326v3](http://arxiv.org/abs/2403.05326v3)|null|\n", "2403.05313": "|**2024-03-08**|**RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation**|Zihao Wang et.al.|[2403.05313v1](http://arxiv.org/abs/2403.05313v1)|null|\n", "2403.05060": "|**2024-03-08**|**Multimodal Infusion Tuning for Large Models**|Hao Sun et.al.|[2403.05060v1](http://arxiv.org/abs/2403.05060v1)|null|\n", "2403.04964": "|**2024-03-11**|**Tell me the truth: A system to measure the trustworthiness of Large Language Models**|Carlo Lipizzi et.al.|[2403.04964v2](http://arxiv.org/abs/2403.04964v2)|null|\n", "2403.04890": "|**2024-03-07**|**Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering**|Ojas Gramopadhye 
et.al.|[2403.04890v1](http://arxiv.org/abs/2403.04890v1)|null|\n", "2403.04780": "|**2024-03-13**|**MuseGraph: Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining**|Yanchao Tan et.al.|[2403.04780v2](http://arxiv.org/abs/2403.04780v2)|null|\n", "2403.06935": "|**2024-03-13**|**Naming, Describing, and Quantifying Visual Objects in Humans and LLMs**|Alberto Testoni et.al.|[2403.06935v2](http://arxiv.org/abs/2403.06935v2)|**[link](https://github.com/albertotestoni/ndq_visual_objects)**|\n", "2403.06932": "|**2024-03-11**|**ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis**|Yanming Liu et.al.|[2403.06932v1](http://arxiv.org/abs/2403.06932v1)|**[link](https://github.com/oceanntwt/era-cot)**|\n", "2403.06840": "|**2024-03-11**|**RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback**|Yanming Liu et.al.|[2403.06840v1](http://arxiv.org/abs/2403.06840v1)|**[link](https://github.com/oceanntwt/ra-isf)**|\n", "2403.06642": "|**2024-03-11**|**KELLMRec: Knowledge-Enhanced Large Language Models for Recommendation**|Weiqing Luo et.al.|[2403.06642v1](http://arxiv.org/abs/2403.06642v1)|null|\n", "2403.06609": "|**2024-03-11**|**Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds**|Jiageng WU et.al.|[2403.06609v1](http://arxiv.org/abs/2403.06609v1)|null|\n", "2403.06591": "|**2024-03-11**|**Academically intelligent LLMs are not necessarily socially intelligent**|Ruoxi Xu et.al.|[2403.06591v1](http://arxiv.org/abs/2403.06591v1)|**[link](https://github.com/rossixu/social_intelligence_of_llms)**|\n", "2403.06574": "|**2024-03-11**|**AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models**|Yuting Wei et.al.|[2403.06574v1](http://arxiv.org/abs/2403.06574v1)|**[link](https://github.com/yuting-wei/ac-eval)**|\n", "2403.06504": "|**2024-03-11**|**Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU**|Changyue Liao 
et.al.|[2403.06504v1](http://arxiv.org/abs/2403.06504v1)|null|\n", "2403.06485": "|**2024-03-11**|**Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach**|Jinxi Kuang et.al.|[2403.06485v1](http://arxiv.org/abs/2403.06485v1)|null|\n", "2403.06447": "|**2024-03-11**|**CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation**|Junda Wu et.al.|[2403.06447v1](http://arxiv.org/abs/2403.06447v1)|null|\n", "2403.06400": "|**2024-03-11**|**DivCon: Divide and Conquer for Progressive Text-to-Image Generation**|Yuhao Jia et.al.|[2403.06400v1](http://arxiv.org/abs/2403.06400v1)|**[link](https://github.com/divcon-gen/divcon)**|\n", "2403.06294": "|**2024-03-10**|**ArgMed-Agents: Explainable Clinical Decision Reasoning with Large Language Models via Argumentation Schemes**|Shengxin Hong et.al.|[2403.06294v1](http://arxiv.org/abs/2403.06294v1)|null|\n", "2403.06199": "|**2024-03-15**|**Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models**|Minjie Zhu et.al.|[2403.06199v3](http://arxiv.org/abs/2403.06199v3)|null|\n", "2403.05854": "|**2024-03-13**|**LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content**|Qihao Zhao et.al.|[2403.05854v3](http://arxiv.org/abs/2403.05854v3)|null|\n", "2403.05632": "|**2024-03-08**|**Can Large Language Models Play Games? 
A Case Study of A Self-Play Approach**|Hongyi Guo et.al.|[2403.05632v1](http://arxiv.org/abs/2403.05632v1)|null|\n", "2403.07832": "|**2024-03-12**|**DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies**|William Xie et.al.|[2403.07832v1](http://arxiv.org/abs/2403.07832v1)|null|\n", "2403.07816": "|**2024-03-12**|**Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM**|Sainbayar Sukhbaatar et.al.|[2403.07816v1](http://arxiv.org/abs/2403.07816v1)|null|\n", "2403.07794": "|**2024-03-12**|**Fine-tuning Large Language Models with Sequential Instructions**|Hanxu Hu et.al.|[2403.07794v1](http://arxiv.org/abs/2403.07794v1)|**[link](https://github.com/hanxuhu/seq_it)**|\n", "2403.07769": "|**2024-03-15**|**Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations**|Carlos Jose Xavier Cruz et.al.|[2403.07769v3](http://arxiv.org/abs/2403.07769v3)|**[link](https://github.com/carlosxcruzcode/compet_colab_sma_llm)**|\n", "2403.07747": "|**2024-03-12**|**FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models**|Yan Liu et.al.|[2403.07747v1](http://arxiv.org/abs/2403.07747v1)|null|\n", "2403.07720": "|**2024-03-12**|**Multi-modal Auto-regressive Modeling via Visual Words**|Tianshuo Peng et.al.|[2403.07720v1](http://arxiv.org/abs/2403.07720v1)|**[link](https://github.com/pengts/vw-lmm)**|\n", "2403.07470": "|**2024-03-12**|**DrPlanner: Diagnosis and Repair of Motion Planners Using Large Language Models**|Yuanfei Lin et.al.|[2403.07470v1](http://arxiv.org/abs/2403.07470v1)|**[link](https://github.com/commonroad/drplanner)**|\n", "2403.07398": "|**2024-03-12**|**Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs**|Tianqing Fang et.al.|[2403.07398v1](http://arxiv.org/abs/2403.07398v1)|null|\n", "2403.07376": "|**2024-03-12**|**NavCoT: Boosting LLM-Based 
Vision-and-Language Navigation via Learning Disentangled Reasoning**|Bingqian Lin et.al.|[2403.07376v1](http://arxiv.org/abs/2403.07376v1)|**[link](https://github.com/expectorlin/navcot)**|\n", "2403.07118": "|**2024-03-11**|**Narrating Causal Graphs with Large Language Models**|Atharva Phatak et.al.|[2403.07118v1](http://arxiv.org/abs/2403.07118v1)|null|\n", "2403.08743": "|**2024-03-13**|**Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework**|Jingling Li et.al.|[2403.08743v1](http://arxiv.org/abs/2403.08743v1)|null|\n", "2403.08739": "|**2024-03-13**|**The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models**|Carlo Nicolini et.al.|[2403.08739v1](http://arxiv.org/abs/2403.08739v1)|null|\n", "2403.08605": "|**2024-03-14**|**Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation**|Daniel Honerkamp et.al.|[2403.08605v2](http://arxiv.org/abs/2403.08605v2)|**[link](https://github.com/robot-learning-freiburg/MoMa-LLM)**|\n", "2403.08593": "|**2024-03-13**|**Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments**|Sitao Cheng et.al.|[2403.08593v1](http://arxiv.org/abs/2403.08593v1)|null|\n", "2403.08350": "|**2024-03-13**|**CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model**|Cheng Chen et.al.|[2403.08350v1](http://arxiv.org/abs/2403.08350v1)|**[link](https://github.com/zackschen/coin)**|\n", "2403.08337": "|**2024-03-13**|**LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments**|Maonan Wang et.al.|[2403.08337v1](http://arxiv.org/abs/2403.08337v1)|**[link](https://github.com/traffic-alpha/llm-assisted-light)**|\n", "2403.08213": "|**2024-03-13**|**Can Large Language Models Identify Authorship?**|Baixiang Huang 
et.al.|[2403.08213v1](http://arxiv.org/abs/2403.08213v1)|**[link](https://github.com/baixianghuang/authorship-llm)**|\n", "2403.08211": "|**2024-03-13**|**Large Language Models are Contrastive Reasoners**|Liang Yao et.al.|[2403.08211v1](http://arxiv.org/abs/2403.08211v1)|**[link](https://github.com/yao8839836/cp)**|\n", "2403.09631": "|**2024-03-14**|**3D-VLA: A 3D Vision-Language-Action Generative World Model**|Haoyu Zhen et.al.|[2403.09631v1](http://arxiv.org/abs/2403.09631v1)|null|\n", "2403.09611": "|**2024-03-22**|**MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training**|Brandon McKinzie et.al.|[2403.09611v3](http://arxiv.org/abs/2403.09611v3)|null|\n", "2403.09606": "|**2024-03-14**|**Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey**|Xiaoyu Liu et.al.|[2403.09606v1](http://arxiv.org/abs/2403.09606v1)|null|\n", "2403.09599": "|**2024-03-14**|**Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis**|Gregory Coppola et.al.|[2403.09599v1](http://arxiv.org/abs/2403.09599v1)|null|\n", "2403.09583": "|**2024-03-15**|**ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models**|Runyu Ma et.al.|[2403.09583v2](http://arxiv.org/abs/2403.09583v2)|null|\n", "2403.09572": "|**2024-03-22**|**Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation**|Yunhao Gou et.al.|[2403.09572v2](http://arxiv.org/abs/2403.09572v2)|null|\n", "2403.09559": "|**2024-03-21**|**Less is More: Data Value Estimation for Visual Instruction Tuning**|Zikang Liu et.al.|[2403.09559v2](http://arxiv.org/abs/2403.09559v2)|null|\n", "2403.09164": "|**2024-03-14**|**Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge**|Li Yizhen et.al.|[2403.09164v1](http://arxiv.org/abs/2403.09164v1)|null|\n", "2403.09163": "|**2024-03-14**|**Caveat Lector: Large Language Models in Legal Practice**|Eliza Mik 
et.al.|[2403.09163v1](http://arxiv.org/abs/2403.09163v1)|null|\n", "2403.09142": "|**2024-03-14**|**USimAgent: Large Language Models for Simulating Search Users**|Erhan Zhang et.al.|[2403.09142v1](http://arxiv.org/abs/2403.09142v1)|null|\n", "2403.09085": "|**2024-03-14**|**Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance**|Kai Xiong et.al.|[2403.09085v1](http://arxiv.org/abs/2403.09085v1)|null|\n", "2403.09060": "|**2024-03-14**|**Query Rewriting via Large Language Models**|Jie Liu et.al.|[2403.09060v1](http://arxiv.org/abs/2403.09060v1)|null|\n", "2403.08946": "|**2024-03-13**|**Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era**|Xuansheng Wu et.al.|[2403.08946v1](http://arxiv.org/abs/2403.08946v1)|**[link](https://github.com/jacksonwuxs/usablexai_llm)**|\n", "2403.08844": "|**2024-03-13**|**AcademiaOS: Automating Grounded Theory Development in Qualitative Research with Large Language Models**|Thomas \u00dcbellacker et.al.|[2403.08844v1](http://arxiv.org/abs/2403.08844v1)|**[link](https://github.com/thomasuebi/academia-os)**|\n", "2403.08833": "|**2024-03-13**|**TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation**|Dingbang Li et.al.|[2403.08833v1](http://arxiv.org/abs/2403.08833v1)|null|\n", "2403.10517": "|**2024-03-15**|**VideoAgent: Long-form Video Understanding with Large Language Model as Agent**|Xiaohan Wang et.al.|[2403.10517v1](http://arxiv.org/abs/2403.10517v1)|null|\n", "2403.10507": "|**2024-03-15**|**Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization**|Ratnadira Widyasari et.al.|[2403.10507v1](http://arxiv.org/abs/2403.10507v1)|null|\n", "2403.10228": "|**2024-03-15**|**HawkEye: Training Video-Text LLMs for Grounding Text in Videos**|Yueqian Wang et.al.|[2403.10228v1](http://arxiv.org/abs/2403.10228v1)|**[link](https://github.com/yellow-binary-tree/hawkeye)**|\n", "2403.10171": 
"|**2024-03-15**|**AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation**|Arkajit Datta et.al.|[2403.10171v1](http://arxiv.org/abs/2403.10171v1)|null|\n", "2403.10131": "|**2024-03-15**|**RAFT: Adapting Language Model to Domain Specific RAG**|Tianjun Zhang et.al.|[2403.10131v1](http://arxiv.org/abs/2403.10131v1)|**[link](https://github.com/ShishirPatil/gorilla)**|\n", "2403.10107": "|**2024-03-15**|**Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning**|Hang Zhang et.al.|[2403.10107v1](http://arxiv.org/abs/2403.10107v1)|null|\n", "2403.10037": "|**2024-03-15**|**Knowledge Condensation and Reasoning for Knowledge-based VQA**|Dongze Hao et.al.|[2403.10037v1](http://arxiv.org/abs/2403.10037v1)|null|\n", "2403.09962": "|**2024-03-15**|**ViTCN: Vision Transformer Contrastive Network For Reasoning**|Bo Song et.al.|[2403.09962v1](http://arxiv.org/abs/2403.09962v1)|null|\n", "2403.09750": "|**2024-03-14**|**Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models**|Zhuoqun Li et.al.|[2403.09750v1](http://arxiv.org/abs/2403.09750v1)|**[link](https://github.com/li-z-q/meta-cognitive-analysis)**|\n", "2403.09747": "|**2024-03-14**|**Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors**|Guanghua Li et.al.|[2403.09747v1](http://arxiv.org/abs/2403.09747v1)|null|\n", "2403.09734": "|**2024-03-13**|**Do Large Language Models Solve ARC Visual Analogies Like People Do?**|Gustaw Opie\u0142ka et.al.|[2403.09734v1](http://arxiv.org/abs/2403.09734v1)|null|\n", "2403.11552": "|**2024-03-20**|**LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning**|Shu Wang et.al.|[2403.11552v2](http://arxiv.org/abs/2403.11552v2)|**[link](https://github.com/assassinws/llm-tamp)**|\n", "2403.11401": "|**2024-03-22**|**Scene-LLM: Extending Language Model for 3D Visual Understanding and 
Reasoning**|Rao Fu et.al.|[2403.11401v2](http://arxiv.org/abs/2403.11401v2)|null|\n", "2403.11289": "|**2024-03-17**|**ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models**|Siyuan Huang et.al.|[2403.11289v1](http://arxiv.org/abs/2403.11289v1)|**[link](https://github.com/siyuanhuang95/manipvqa)**|\n", "2403.11129": "|**2024-03-17**|**Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering**|Baiyan Zhang et.al.|[2403.11129v1](http://arxiv.org/abs/2403.11129v1)|null|\n", "2403.11075": "|**2024-03-17**|**GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment**|Lance Ying et.al.|[2403.11075v1](http://arxiv.org/abs/2403.11075v1)|null|\n", "2403.10949": "|**2024-03-26**|**SelfIE: Self-Interpretation of Large Language Model Embeddings**|Haozhe Chen et.al.|[2403.10949v2](http://arxiv.org/abs/2403.10949v2)|**[link](https://github.com/tonychenxyz/selfie)**|\n", "2403.10900": "|**2024-03-16**|**BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English**|Sheikh Shafayat et.al.|[2403.10900v1](http://arxiv.org/abs/2403.10900v1)|**[link](https://github.com/sheikhshafayat/benqa)**|\n", "2403.10854": "|**2024-03-16**|**A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment**|Tianhe Wu et.al.|[2403.10854v1](http://arxiv.org/abs/2403.10854v1)|**[link](https://github.com/tianhewu/mllms-for-iqa)**|\n", "2403.10762": "|**2024-03-16**|**NARRATE: Versatile Language Architecture for Optimal Control in Robotics**|Seif Ismail et.al.|[2403.10762v1](http://arxiv.org/abs/2403.10762v1)|null|\n", "2403.12958": "|**2024-03-19**|**Dated Data: Tracing Knowledge Cutoffs in Large Language Models**|Jeffrey Cheng et.al.|[2403.12958v1](http://arxiv.org/abs/2403.12958v1)|null|\n", "2403.12936": "|**2024-03-19**|**Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models**|Joana 
Ribeiro de Faria et.al.|[2403.12936v1](http://arxiv.org/abs/2403.12936v1)|null|\n", "2403.12895": "|**2024-03-19**|**mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding**|Anwen Hu et.al.|[2403.12895v1](http://arxiv.org/abs/2403.12895v1)|**[link](https://github.com/x-plug/mplug-docowl)**|\n", "2403.12884": "|**2024-03-19**|**HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning**|Fucai Ke et.al.|[2403.12884v1](http://arxiv.org/abs/2403.12884v1)|null|\n", "2403.12881": "|**2024-03-19**|**Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models**|Zehui Chen et.al.|[2403.12881v1](http://arxiv.org/abs/2403.12881v1)|**[link](https://github.com/internlm/agent-flan)**|\n", "2403.12848": "|**2024-03-19**|**Compositional 3D Scene Synthesis with Scene Graph Guided Layout-Shape Generation**|Yao Wei et.al.|[2403.12848v1](http://arxiv.org/abs/2403.12848v1)|null|\n", "2403.12801": "|**2024-03-19**|**RelationVLM: Making Large Vision-Language Models Understand Visual Relations**|Zhipeng Huang et.al.|[2403.12801v1](http://arxiv.org/abs/2403.12801v1)|null|\n", "2403.12744": "|**2024-03-19**|**Instructing Large Language Models to Identify and Ignore Irrelevant Conditions**|Zhenyu Wu et.al.|[2403.12744v1](http://arxiv.org/abs/2403.12744v1)|**[link](https://github.com/wzy6642/I3C-Select)**|\n", "2403.12596": "|**2024-03-19**|**Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs**|Victor Carbune et.al.|[2403.12596v1](http://arxiv.org/abs/2403.12596v1)|null|\n", "2403.12582": "|**2024-03-19**|**AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework**|Xiang Li et.al.|[2403.12582v1](http://arxiv.org/abs/2403.12582v1)|**[link](https://github.com/alphafin-proj/alphafin)**|\n", "2403.12533": "|**2024-03-19**|**To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions**|Daniel Tanneberg 
et.al.|[2403.12533v1](http://arxiv.org/abs/2403.12533v1)|null|\n", "2403.12482": "|**2024-03-19**|**Embodied LLM Agents Learn to Cooperate in Organized Teams**|Xudong Guo et.al.|[2403.12482v1](http://arxiv.org/abs/2403.12482v1)|null|\n", "2403.12393": "|**2024-03-19**|**Dr3: Ask Large Language Models Not to Give Off-Topic Answers in Open Domain Multi-Hop Question Answering**|Yuan Gao et.al.|[2403.12393v1](http://arxiv.org/abs/2403.12393v1)|null|\n", "2403.12373": "|**2024-03-22**|**RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners**|Chi Hu et.al.|[2403.12373v3](http://arxiv.org/abs/2403.12373v3)|null|\n", "2403.12316": "|**2024-03-18**|**OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety**|Chuang Liu et.al.|[2403.12316v1](http://arxiv.org/abs/2403.12316v1)|null|\n", "2403.12173": "|**2024-03-18**|**TnT-LLM: Text Mining at Scale with Large Language Models**|Mengting Wan et.al.|[2403.12173v1](http://arxiv.org/abs/2403.12173v1)|null|\n", "2403.12014": "|**2024-03-18**|**EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents**|Abhay Zala et.al.|[2403.12014v1](http://arxiv.org/abs/2403.12014v1)|null|\n", "2403.12766": "|**2024-03-18**|**NovelQA: A Benchmark for Long-Range Novel Question Answering**|Cunxiang Wang et.al.|[2403.12766v1](http://arxiv.org/abs/2403.12766v1)|**[link](https://github.com/novelqa/novelqa.github.io)**|\n", "2403.11886": "|**2024-03-18**|**QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction**|Xiang Huang et.al.|[2403.11886v1](http://arxiv.org/abs/2403.11886v1)|null|\n", "2403.11835": "|**2024-03-18**|**Agent3D-Zero: An Agent for Zero-shot 3D Understanding**|Sha Zhang et.al.|[2403.11835v1](http://arxiv.org/abs/2403.11835v1)|null|\n", "2403.11810": "|**2024-03-18**|**Metaphor Understanding Challenge Dataset for LLMs**|Xiaoyu Tong et.al.|[2403.11810v1](http://arxiv.org/abs/2403.11810v1)|null|\n", "2403.11802": 
"|**2024-03-25**|**Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models**|Mingyang Song et.al.|[2403.11802v2](http://arxiv.org/abs/2403.11802v2)|**[link](https://github.com/nick7nlp/counting-stars)**|\n", "2403.11793": "|**2024-03-18**|**Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus**|Seungpil Lee et.al.|[2403.11793v1](http://arxiv.org/abs/2403.11793v1)|null|\n", "2403.13786": "|**2024-03-23**|**Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts**|Guangzeng Han et.al.|[2403.13786v2](http://arxiv.org/abs/2403.13786v2)|null|\n", "2403.13592": "|**2024-03-22**|**Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs**|Ilias Chalkidis et.al.|[2403.13592v2](http://arxiv.org/abs/2403.13592v2)|**[link](https://github.com/coastalcph/eu-politics-llms)**|\n", "2403.13315": "|**2024-03-20**|**PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns**|Yew Ken Chia et.al.|[2403.13315v1](http://arxiv.org/abs/2403.13315v1)|**[link](https://github.com/declare-lab/llm-puzzletest)**|\n", "2403.13312": "|**2024-03-20**|**LeanReasoner: Boosting Complex Logical Reasoning with Lean**|Dongwei Jiang et.al.|[2403.13312v1](http://arxiv.org/abs/2403.13312v1)|**[link](https://github.com/some-random/theorem-proving-reasoning)**|\n", "2403.13271": "|**2024-03-20**|**Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs**|Zhihong Sun et.al.|[2403.13271v1](http://arxiv.org/abs/2403.13271v1)|null|\n", "2403.13164": "|**2024-03-19**|**VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning**|Yongshuo Zong et.al.|[2403.13164v1](http://arxiv.org/abs/2403.13164v1)|**[link](https://github.com/ys-zong/vl-icl)**|\n", "2403.13002": "|**2024-03-13**|**AutoTRIZ: Artificial Ideation 
with TRIZ and Large Language Models**|Shuo Jiang et.al.|[2403.13002v1](http://arxiv.org/abs/2403.13002v1)|null|\n", "2403.12999": "|**2024-03-11**|**Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control**|On Tai Wu et.al.|[2403.12999v1](http://arxiv.org/abs/2403.12999v1)|null|\n", "2403.14624": "|**2024-03-21**|**MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?**|Renrui Zhang et.al.|[2403.14624v1](http://arxiv.org/abs/2403.14624v1)|null|\n", "2403.14565": "|**2024-03-21**|**A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science**|Clayton Cohn et.al.|[2403.14565v1](http://arxiv.org/abs/2403.14565v1)|null|\n", "2403.14312": "|**2024-03-21**|**ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting**|Xiaoxue Cheng et.al.|[2403.14312v1](http://arxiv.org/abs/2403.14312v1)|**[link](https://github.com/rucaibox/chainlm)**|\n", "2403.14255": "|**2024-03-21**|**ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification**|Sehee Lim et.al.|[2403.14255v1](http://arxiv.org/abs/2403.14255v1)|null|\n", "2403.14253": "|**2024-03-23**|**K-Act2Emo: Korean Commonsense Knowledge Graph for Indirect Emotional Expression**|Kyuhee Kim et.al.|[2403.14253v2](http://arxiv.org/abs/2403.14253v2)|**[link](https://github.com/koreankiwi99/k-act2emo)**|\n", "2403.14141": "|**2024-03-21**|**Empowering Segmentation Ability to Multi-modal Large Language Models**|Yuqi Yang et.al.|[2403.14141v1](http://arxiv.org/abs/2403.14141v1)|null|\n", "2403.14112": "|**2024-03-21**|**Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations**|Jiaxing Sun et.al.|[2403.14112v1](http://arxiv.org/abs/2403.14112v1)|**[link](https://github.com/opendatalab/charm)**|\n", "2403.14071": "|**2024-03-21**|**Empowering Personalized Learning through a 
Conversation-based Tutoring System with Student Modeling**|Minju Park et.al.|[2403.14071v1](http://arxiv.org/abs/2403.14071v1)|null|\n", "2403.13838": "|**2024-03-14**|**Circuit Transformer: End-to-end Circuit Design by Predicting the Next Gate**|Xihan Li et.al.|[2403.13838v1](http://arxiv.org/abs/2403.13838v1)|null|\n", "2403.15388": "|**2024-04-01**|**LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models**|Yuzhang Shang et.al.|[2403.15388v3](http://arxiv.org/abs/2403.15388v3)|null|\n", "2403.15371": "|**2024-03-22**|**Can large language models explore in-context?**|Akshay Krishnamurthy et.al.|[2403.15371v1](http://arxiv.org/abs/2403.15371v1)|null|\n", "2403.15362": "|**2024-03-22**|**CoLLEGe: Concept Embedding Generation for Large Language Models**|Ryan Teehan et.al.|[2403.15362v1](http://arxiv.org/abs/2403.15362v1)|null|\n", "2403.15297": "|**2024-03-22**|**Sphere Neural-Networks for Rational Reasoning**|Tiansi Dong et.al.|[2403.15297v1](http://arxiv.org/abs/2403.15297v1)|null|\n", "2403.15209": "|**2024-03-22**|**MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection**|Taeheon Kim et.al.|[2403.15209v1](http://arxiv.org/abs/2403.15209v1)|null|\n", "2403.15137": "|**2024-03-22**|**CACA Agent: Capability Collaboration based AI Agent**|Peng Xu et.al.|[2403.15137v1](http://arxiv.org/abs/2403.15137v1)|null|\n", "2403.14982": "|**2024-04-03**|**MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts**|Md Nishat Raihan et.al.|[2403.14982v2](http://arxiv.org/abs/2403.14982v2)|null|\n", "2403.14932": "|**2024-03-22**|**Attention-Driven Reasoning: Unlocking the Potential of Large Language Models**|Bingli Liao et.al.|[2403.14932v1](http://arxiv.org/abs/2403.14932v1)|null|\n", "2403.14743": "|**2024-03-25**|**VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding**|Ahmad Mahmood et.al.|[2403.14743v2](http://arxiv.org/abs/2403.14743v2)|null|\n", 
"2403.16527": "|**2024-03-25**|**Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art**|Neeloy Chakraborty et.al.|[2403.16527v1](http://arxiv.org/abs/2403.16527v1)|null|\n", "2403.16524": "|**2024-03-25**|**Harnessing the power of LLMs for normative reasoning in MASs**|Bastin Tony Roy Savarimuthu et.al.|[2403.16524v1](http://arxiv.org/abs/2403.16524v1)|null|\n", "2403.16517": "|**2024-03-25**|**Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study**|Shawn He et.al.|[2403.16517v1](http://arxiv.org/abs/2403.16517v1)|null|\n", "2403.16437": "|**2024-03-25**|**Evaluating Large Language Models with Runtime Behavior of Program Execution**|Junkai Chen et.al.|[2403.16437v1](http://arxiv.org/abs/2403.16437v1)|null|\n", "2403.16427": "|**2024-03-27**|**Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation**|Ziyan Wang et.al.|[2403.16427v3](http://arxiv.org/abs/2403.16427v3)|null|\n", "2403.16385": "|**2024-03-28**|**Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA**|Zhuowan Li et.al.|[2403.16385v2](http://arxiv.org/abs/2403.16385v2)|null|\n", "2403.16097": "|**2024-03-28**|**Can Language Models Pretend Solvers? 
Logic Code Simulation with LLMs**|Minyu Chen et.al.|[2403.16097v2](http://arxiv.org/abs/2403.16097v2)|null|\n", "2403.16073": "|**2024-03-24**|**Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications**|Wei Ma et.al.|[2403.16073v1](http://arxiv.org/abs/2403.16073v1)|null|\n", "2403.15737": "|**2024-03-23**|**Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning**|Zhouhang Xie et.al.|[2403.15737v1](http://arxiv.org/abs/2403.15737v1)|null|\n", "2403.15736": "|**2024-03-23**|**LLMs Instruct LLMs:An Extraction and Editing Method**|Xin Zhang et.al.|[2403.15736v1](http://arxiv.org/abs/2403.15736v1)|null|\n", "2403.15491": "|**2024-03-21**|**Open Source Conversational LLMs do not know most Spanish words**|Javier Conde et.al.|[2403.15491v1](http://arxiv.org/abs/2403.15491v1)|null|\n", "2403.15464": "|**2024-03-19**|**LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction**|Hejie Cui et.al.|[2403.15464v1](http://arxiv.org/abs/2403.15464v1)|null|\n", "2403.17927": "|**2024-03-26**|**MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution**|Wei Tao et.al.|[2403.17927v1](http://arxiv.org/abs/2403.17927v1)|null|\n", "2403.17830": "|**2024-03-26**|**Assessment of Multimodal Large Language Models in Alignment with Human Values**|Zhelun Shi et.al.|[2403.17830v1](http://arxiv.org/abs/2403.17830v1)|null|\n", "2403.17760": "|**2024-03-26**|**Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons**|Shijia Zhou et.al.|[2403.17760v1](http://arxiv.org/abs/2403.17760v1)|**[link](https://github.com/shijiazh/constructions-are-so-difficult)**|\n", "2403.17688": "|**2024-03-26**|**Large Language Models Enhanced Collaborative Filtering**|Zhongxiang Sun et.al.|[2403.17688v1](http://arxiv.org/abs/2403.17688v1)|null|\n", "2403.17491": "|**2024-03-26**|**DGoT: Dynamic Graph of 
Thoughts for Scientific Abstract Generation**|Xinyu Ning et.al.|[2403.17491v1](http://arxiv.org/abs/2403.17491v1)|**[link](https://github.com/jaycening/dgot)**|\n", "2403.17368": "|**2024-03-26**|**ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?**|Fan Huang et.al.|[2403.17368v1](http://arxiv.org/abs/2403.17368v1)|**[link](https://github.com/muyuhuatang/chatgptrater)**|\n", "2403.17359": "|**2024-03-26**|**Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models**|Zhenyu Pan et.al.|[2403.17359v1](http://arxiv.org/abs/2403.17359v1)|null|\n", "2403.17246": "|**2024-03-25**|**TwoStep: Multi-agent Task Planning using Classical Planners and Large Language Models**|Ishika Singh et.al.|[2403.17246v1](http://arxiv.org/abs/2403.17246v1)|null|\n", "2403.17218": "|**2024-03-25**|**A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection**|Benjamin Steenhoek et.al.|[2403.17218v1](http://arxiv.org/abs/2403.17218v1)|null|\n", "2403.17124": "|**2024-03-25**|**Grounding Language Plans in Demonstrations Through Counterfactual Perturbations**|Yanwei Wang et.al.|[2403.17124v1](http://arxiv.org/abs/2403.17124v1)|null|\n", "2403.16999": "|**2024-03-25**|**Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models**|Hao Shao et.al.|[2403.16999v1](http://arxiv.org/abs/2403.16999v1)|**[link](https://github.com/deepcs233/visual-cot)**|\n", "2403.16921": "|**2024-03-25**|**PropTest: Automatic Property Testing for Improved Visual Programming**|Jaywon Koo et.al.|[2403.16921v1](http://arxiv.org/abs/2403.16921v1)|null|\n", "2403.18814": "|**2024-03-27**|**Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models**|Yanwei Li et.al.|[2403.18814v1](http://arxiv.org/abs/2403.18814v1)|**[link](https://github.com/dvlab-research/minigemini)**|\n", "2403.18802": "|**2024-04-03**|**Long-form factuality in large language models**|Jerry Wei 
et.al.|[2403.18802v3](http://arxiv.org/abs/2403.18802v3)|**[link](https://github.com/google-deepmind/long-form-factuality)**|\n", "2403.18537": "|**2024-03-27**|**A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks**|Axel Constant et.al.|[2403.18537v1](http://arxiv.org/abs/2403.18537v1)|null|\n", "2403.18426": "|**2024-03-27**|**TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions**|Jamshid Mozafari et.al.|[2403.18426v1](http://arxiv.org/abs/2403.18426v1)|**[link](https://github.com/datascienceuibk/triviahg)**|\n", "2403.18415": "|**2024-03-27**|**The Topos of Transformer Networks**|Mattia Jacopo Villani et.al.|[2403.18415v1](http://arxiv.org/abs/2403.18415v1)|null|\n", "2403.18406": "|**2024-03-27**|**An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM**|Wonkyun Kim et.al.|[2403.18406v1](http://arxiv.org/abs/2403.18406v1)|**[link](https://github.com/imagegridworth/IG-VLM)**|\n", "2403.18405": "|**2024-03-27**|**Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval**|Shengjie Ma et.al.|[2403.18405v1](http://arxiv.org/abs/2403.18405v1)|null|\n", "2403.18365": "|**2024-03-27**|**BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models**|Haitao Li et.al.|[2403.18365v1](http://arxiv.org/abs/2403.18365v1)|null|\n", "2403.18346": "|**2024-04-03**|**Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective**|Meiqi Chen et.al.|[2403.18346v3](http://arxiv.org/abs/2403.18346v3)|null|\n", "2403.18344": "|**2024-03-27**|**LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models**|Mingxing Peng et.al.|[2403.18344v1](http://arxiv.org/abs/2403.18344v1)|null|\n", "2403.18295": "|**2024-03-27**|**Dual Instruction Tuning with Large Language Models 
for Mathematical Reasoning**|Yongwei Zhou et.al.|[2403.18295v1](http://arxiv.org/abs/2403.18295v1)|null|\n", "2403.18252": "|**2024-03-27**|**Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models**|Yiwu Zhong et.al.|[2403.18252v1](http://arxiv.org/abs/2403.18252v1)|**[link](https://github.com/lavi-lab/visual-table)**|\n", "2403.18230": "|**2024-03-27**|**Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation**|Chuwen Wang et.al.|[2403.18230v1](http://arxiv.org/abs/2403.18230v1)|**[link](https://github.com/hakys-a/meow)**|\n", "2403.18159": "|**2024-03-28**|**Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models**|Kartikeya Bhardwaj et.al.|[2403.18159v2](http://arxiv.org/abs/2403.18159v2)|null|\n", "2403.18120": "|**2024-03-26**|**Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization**|Jin Peng Zhou et.al.|[2403.18120v1](http://arxiv.org/abs/2403.18120v1)|**[link](https://github.com/jinpz/dtv)**|\n", "2403.18062": "|**2024-03-26**|**ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition**|Samuel Li et.al.|[2403.18062v1](http://arxiv.org/abs/2403.18062v1)|null|\n", "2403.19631": "|**2024-03-28**|**Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models**|Yucheng Shi et.al.|[2403.19631v1](http://arxiv.org/abs/2403.19631v1)|null|\n", "2403.19414": "|**2024-03-28**|**BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation**|Yuhong He et.al.|[2403.19414v1](http://arxiv.org/abs/2403.19414v1)|null|\n", "2403.19369": "|**2024-03-28**|**RAIL: Robot Affordance Imagination with Large Language Models**|Ceng Zhang et.al.|[2403.19369v1](http://arxiv.org/abs/2403.19369v1)|null|\n", "2403.19336": "|**2024-03-28**|**IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation**|Jiacui 
Huang et.al.|[2403.19336v1](http://arxiv.org/abs/2403.19336v1)|null|\n", "2403.19322": "|**2024-03-28**|**Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models**|Jiaxing Chen et.al.|[2403.19322v1](http://arxiv.org/abs/2403.19322v1)|null|\n", "2403.19318": "|**2024-04-01**|**TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios**|Xiaokang Zhang et.al.|[2403.19318v2](http://arxiv.org/abs/2403.19318v2)|**[link](https://github.com/TableLLM/TableLLM)**|\n", "2403.19167": "|**2024-03-28**|**Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering**|Yexin Wu et.al.|[2403.19167v1](http://arxiv.org/abs/2403.19167v1)|null|\n", "2403.19116": "|**2024-03-28**|**MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering**|Che Guan et.al.|[2403.19116v1](http://arxiv.org/abs/2403.19116v1)|null|\n", "2403.19094": "|**2024-03-28**|**Learning From Correctness Without Prompting Makes LLM Efficient Reasoner**|Yuxuan Yao et.al.|[2403.19094v1](http://arxiv.org/abs/2403.19094v1)|null|\n", "2403.19046": "|**2024-03-27**|**LITA: Language Instructed Temporal-Localization Assistant**|De-An Huang et.al.|[2403.19046v1](http://arxiv.org/abs/2403.19046v1)|**[link](https://github.com/nvlabs/lita)**|\n", "2403.20180": "|**2024-03-29**|**Measuring Taiwanese Mandarin Language Understanding**|Po-Heng Chen et.al.|[2403.20180v1](http://arxiv.org/abs/2403.20180v1)|null|\n", "2403.20097": "|**2024-03-29**|**ITCMA: A Generative Agent Based on a Computational Consciousness Structure**|Hanzhong Zhang et.al.|[2403.20097v1](http://arxiv.org/abs/2403.20097v1)|null|\n", "2403.20009": "|**2024-03-29**|**On Large Language Models' Hallucination with Regard to Known Facts**|Che Jiang et.al.|[2403.20009v1](http://arxiv.org/abs/2403.20009v1)|null|\n", "2403.19962": "|**2024-03-29**|**Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning**|Qinhao Zhou 
et.al.|[2403.19962v1](http://arxiv.org/abs/2403.19962v1)|null|\n", "2403.19857": "|**2024-03-28**|**LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces**|Xiaomin Ouyang et.al.|[2403.19857v1](http://arxiv.org/abs/2403.19857v1)|null|\n", "2403.19838": "|**2024-03-28**|**Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving**|Akshay Gopalkrishnan et.al.|[2403.19838v1](http://arxiv.org/abs/2403.19838v1)|**[link](https://github.com/akshaygopalkr/em-vlm4ad)**|\n", "2404.02078": "|**2024-04-02**|**Advancing LLM Reasoning Generalists with Preference Trees**|Lifan Yuan et.al.|[2404.02078v1](http://arxiv.org/abs/2404.02078v1)|**[link](https://github.com/openbmb/eurus)**|\n", "2404.02060": "|**2024-04-04**|**Long-context LLMs Struggle with Long In-context Learning**|Tianle Li et.al.|[2404.02060v2](http://arxiv.org/abs/2404.02060v2)|**[link](https://github.com/tiger-ai-lab/longiclbench)**|\n", "2404.02018": "|**2024-04-02**|**Large Language Models for Orchestrating Bimanual Robots**|Kun Chu et.al.|[2404.02018v1](http://arxiv.org/abs/2404.02018v1)|null|\n", "2404.01954": "|**2024-04-13**|**HyperCLOVA X Technical Report**|Kang Min Yoo et.al.|[2404.01954v2](http://arxiv.org/abs/2404.01954v2)|null|\n", "2404.01869": "|**2024-04-02**|**Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey**|Philipp Mondorf et.al.|[2404.01869v1](http://arxiv.org/abs/2404.01869v1)|null|\n", "2404.01855": "|**2024-04-02**|**Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation**|Shanshan Feng et.al.|[2404.01855v1](http://arxiv.org/abs/2404.01855v1)|**[link](https://github.com/llmmove/llmmove)**|\n", "2404.01677": "|**2024-04-03**|**Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation**|Zhouhao Sun et.al.|[2404.01677v2](http://arxiv.org/abs/2404.01677v2)|null|\n", "2404.01667": "|**2024-04-02**|**METAL: Towards 
Multilingual Meta-Evaluation**|Rishav Hada et.al.|[2404.01667v1](http://arxiv.org/abs/2404.01667v1)|null|\n", "2404.01644": "|**2024-04-02**|**InsightLens: Discovering and Exploring Insights from Conversational Contexts in Large-Language-Model-Powered Data Analysis**|Luoxuan Weng et.al.|[2404.01644v1](http://arxiv.org/abs/2404.01644v1)|null|\n", "2404.01535": "|**2024-04-01**|**Syntactic Robustness for LLM-based Code Generation**|Laboni Sarker et.al.|[2404.01535v1](http://arxiv.org/abs/2404.01535v1)|null|\n", "2404.01475": "|**2024-04-01**|**Are large language models superhuman chemists?**|Adrian Mirza et.al.|[2404.01475v1](http://arxiv.org/abs/2404.01475v1)|null|\n", "2404.01461": "|**2024-04-01**|**Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs**|Pengda Wang et.al.|[2404.01461v1](http://arxiv.org/abs/2404.01461v1)|null|\n", "2404.01261": "|**2024-04-01**|**FABLES: Evaluating faithfulness and content selection in book-length summarization**|Yekyung Kim et.al.|[2404.01261v1](http://arxiv.org/abs/2404.01261v1)|**[link](https://github.com/mungg/fables)**|\n", "2404.01245": "|**2024-04-01**|**A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules**|Xiang Li et.al.|[2404.01245v1](http://arxiv.org/abs/2404.01245v1)|null|\n", "2404.01230": "|**2024-04-01**|**LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models**|Yadong Zhang et.al.|[2404.01230v1](http://arxiv.org/abs/2404.01230v1)|null|\n", "2404.01135": "|**2024-04-01**|**Enhancing Reasoning Capacity of SLM using Cognitive Enhancement**|Jonathan Pan et.al.|[2404.01135v1](http://arxiv.org/abs/2404.01135v1)|null|\n", "2404.01096": "|**2024-04-01**|**Enabling Memory Safety of C Programs using LLMs**|Nausheen Mohammed et.al.|[2404.01096v1](http://arxiv.org/abs/2404.01096v1)|null|\n", "2404.00909": "|**2024-04-01**|**Learning by Correction: Efficient Tuning Task for Zero-Shot 
Generative Vision-Language Reasoning**|Rongjie Li et.al.|[2404.00909v1](http://arxiv.org/abs/2404.00909v1)|null|\n", "2404.00732": "|**2024-04-02**|**An Abundance of Katherines: The Game Theory of Baby Naming**|Katy Blumer et.al.|[2404.00732v2](http://arxiv.org/abs/2404.00732v2)|null|\n", "2404.01343": "|**2024-03-31**|**CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs**|Jingzhe Shi et.al.|[2404.01343v1](http://arxiv.org/abs/2404.01343v1)|null|\n", "2404.00492": "|**2024-03-30**|**Multi-hop Question Answering under Temporal Knowledge Editing**|Keyuan Cheng et.al.|[2404.00492v1](http://arxiv.org/abs/2404.00492v1)|null|\n", "2404.00450": "|**2024-04-04**|**Planning and Editing What You Retrieve for Enhanced Tool Learning**|Tenghao Huang et.al.|[2404.00450v2](http://arxiv.org/abs/2404.00450v2)|**[link](https://github.com/tenghaohuang/pluto)**|\n", "2404.00376": "|**2024-03-30**|**Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks**|Hyunjae Kim et.al.|[2404.00376v1](http://arxiv.org/abs/2404.00376v1)|null|\n", "2404.00344": "|**2024-03-30**|**Can LLMs Master Math? 
Investigating Large Language Models on Math Stack Exchange**|Ankit Satpute et.al.|[2404.00344v1](http://arxiv.org/abs/2404.00344v1)|**[link](https://github.com/gipplab/llm-investig-mathstackexchange)**|\n", "2404.00246": "|**2024-03-30**|**Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World**|Guande Wu et.al.|[2404.00246v1](http://arxiv.org/abs/2404.00246v1)|**[link](https://github.com/jnzs1836/coblocks)**|\n", "2404.00245": "|**2024-03-30**|**Aligning Large Language Models with Recommendation Knowledge**|Yuwei Cao et.al.|[2404.00245v1](http://arxiv.org/abs/2404.00245v1)|null|\n", "2404.00242": "|**2024-03-30**|**DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference**|Jinwei Yao et.al.|[2404.00242v1](http://arxiv.org/abs/2404.00242v1)|null|\n", "2404.00211": "|**2024-03-30**|**Multi-Conditional Ranking with Large Language Models**|Pouya Pezeshkpour et.al.|[2404.00211v1](http://arxiv.org/abs/2404.00211v1)|**[link](https://github.com/megagonlabs/mcr)**|\n", "2404.00209": "|**2024-03-30**|**EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs**|Cheng Jiayang et.al.|[2404.00209v1](http://arxiv.org/abs/2404.00209v1)|**[link](https://github.com/hkust-knowcomp/eventground)**|\n", "2404.00205": "|**2024-03-30**|**Conceptual and Unbiased Reasoning in Language Models**|Ben Zhou et.al.|[2404.00205v1](http://arxiv.org/abs/2404.00205v1)|null|\n", "2404.00141": "|**2024-03-29**|**Classifying Conspiratorial Narratives At Scale: False Alarms and Erroneous Connections**|Ahmad Diab et.al.|[2404.00141v1](http://arxiv.org/abs/2404.00141v1)|null|\n", "2404.02838": "|**2024-04-03**|**I-Design: Personalized LLM Interior Designer**|Ata \u00c7elen et.al.|[2404.02838v1](http://arxiv.org/abs/2404.02838v1)|null|\n", "2404.02831": "|**2024-04-03**|**Empowering Biomedical Discovery with AI Agents**|Shanghua Gao et.al.|[2404.02831v1](http://arxiv.org/abs/2404.02831v1)|null|\n", 
"2404.02817": "|**2024-04-05**|**A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches**|Zhigen Zhao et.al.|[2404.02817v2](http://arxiv.org/abs/2404.02817v2)|null|\n", "2404.02575": "|**2024-04-03**|**Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models**|Hyungjoo Chae et.al.|[2404.02575v1](http://arxiv.org/abs/2404.02575v1)|null|\n", "2404.02508": "|**2024-04-03**|**VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments**|Bufang Yang et.al.|[2404.02508v1](http://arxiv.org/abs/2404.02508v1)|null|\n", "2404.02403": "|**2024-04-03**|**Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT**|Amirhossein Abaskohi et.al.|[2404.02403v1](http://arxiv.org/abs/2404.02403v1)|**[link](https://github.com/ipouyall/benchmarking_chatgpt_for_persian)**|\n", "2404.02255": "|**2024-04-02**|**$\\texttt{LM}^\\texttt{2}$: A Simple Society of Language Models Solves Complex Reasoning**|Gurusha Juneja et.al.|[2404.02255v1](http://arxiv.org/abs/2404.02255v1)|null|\n", "2404.03647": "|**2024-04-04**|**Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra**|Darioush Kevian et.al.|[2404.03647v1](http://arxiv.org/abs/2404.03647v1)|null|\n", "2404.03623": "|**2024-04-04**|**Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph**|Marco Bronzini et.al.|[2404.03623v1](http://arxiv.org/abs/2404.03623v1)|null|\n", "2404.03622": "|**2024-04-04**|**Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models**|Wenshan Wu et.al.|[2404.03622v1](http://arxiv.org/abs/2404.03622v1)|null|\n", "2404.03608": "|**2024-04-04**|**Sailor: Open Language Models for South-East Asia**|Longxu Dou et.al.|[2404.03608v1](http://arxiv.org/abs/2404.03608v1)|**[link](https://github.com/sail-sg/sailor-llm)**|\n", "2404.03602": 
"|**2024-04-04**|**Evaluating LLMs at Detecting Errors in LLM Responses**|Ryo Kamoi et.al.|[2404.03602v1](http://arxiv.org/abs/2404.03602v1)|**[link](https://github.com/psunlpgroup/realmistake)**|\n", "2404.03577": "|**2024-04-04**|**Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models**|Yantao Liu et.al.|[2404.03577v1](http://arxiv.org/abs/2404.03577v1)|**[link](https://github.com/thu-keg/knot)**|\n", "2404.03428": "|**2024-04-04**|**Edisum: Summarizing and Explaining Wikipedia Edits at Scale**|Marija \u0160akota et.al.|[2404.03428v1](http://arxiv.org/abs/2404.03428v1)|**[link](https://github.com/epfl-dlab/edisum)**|\n", "2404.03414": "|**2024-04-04**|**Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought**|Jooyoung Lee et.al.|[2404.03414v1](http://arxiv.org/abs/2404.03414v1)|null|\n", "2404.03361": "|**2024-04-04**|**nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States**|Nicolay Rusnachenko et.al.|[2404.03361v1](http://arxiv.org/abs/2404.03361v1)|**[link](https://github.com/nicolay-r/thor-ecac)**|\n", "2404.03301": "|**2024-04-04**|**Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics**|Fangru Lin et.al.|[2404.03301v1](http://arxiv.org/abs/2404.03301v1)|**[link](https://github.com/fangru-lin/llm_scalar_adj)**|\n", "2404.03189": "|**2024-04-04**|**The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models**|Noah Y. 
Siegel et.al.|[2404.03189v1](http://arxiv.org/abs/2404.03189v1)|null|\n", "2404.03134": "|**2024-04-04**|**Robust Pronoun Use Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?**|Vagrant Gautam et.al.|[2404.03134v1](http://arxiv.org/abs/2404.03134v1)|**[link](https://github.com/uds-lsv/pronoun-use-fidelity)**|\n", "2404.03028": "|**2024-04-10**|**An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models**|Emmy Liu et.al.|[2404.03028v2](http://arxiv.org/abs/2404.03028v2)|null|\n", "2404.02983": "|**2024-04-03**|**Towards a Fully Interpretable and More Scalable RSA Model for Metaphor Understanding**|Gaia Carenini et.al.|[2404.02983v1](http://arxiv.org/abs/2404.02983v1)|null|\n", "2404.02937": "|**2024-04-03**|**Explainable Traffic Flow Prediction with Large Language Models**|Xusen Guo et.al.|[2404.02937v1](http://arxiv.org/abs/2404.02937v1)|null|\n", "2404.02935": "|**2024-04-03**|**KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking**|Jiawei Zhang et.al.|[2404.02935v1](http://arxiv.org/abs/2404.02935v1)|**[link](https://github.com/javyduck/knowhalu)**|\n", "2404.02934": "|**2024-04-03**|**GreedLlama: Performance of Financial Value-Aligned Large Language Models in Moral Reasoning**|Jeffy Yu et.al.|[2404.02934v1](http://arxiv.org/abs/2404.02934v1)|null|\n", "2404.04242": "|**2024-04-05**|**Physical Property Understanding from Language-Embedded Feature Fields**|Albert J. Zhai et.al.|[2404.04242v1](http://arxiv.org/abs/2404.04242v1)|null|\n", "2404.04237": "|**2024-04-05**|**Cleared for Takeoff? 
Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents**|Harsh Kohli et.al.|[2404.04237v1](http://arxiv.org/abs/2404.04237v1)|null|\n", "2404.04042": "|**2024-04-05**|**Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer**|Hele-Andra Kuulmets et.al.|[2404.04042v1](http://arxiv.org/abs/2404.04042v1)|null|\n", "2404.03891": "|**2024-04-05**|**Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning**|Gawon Choi et.al.|[2404.03891v1](http://arxiv.org/abs/2404.03891v1)|**[link](https://github.com/gawon-choi/small-lms-task-planning)**|\n", "2404.03887": "|**2024-04-08**|**SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models**|Hyeonwoo Kim et.al.|[2404.03887v2](http://arxiv.org/abs/2404.03887v2)|null|\n", "2404.05719": "|**2024-04-08**|**Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs**|Keen You et.al.|[2404.05719v1](http://arxiv.org/abs/2404.05719v1)|null|\n", "2404.05692": "|**2024-04-08**|**Evaluating Mathematical Reasoning Beyond Accuracy**|Shijie Xia et.al.|[2404.05692v1](http://arxiv.org/abs/2404.05692v1)|**[link](https://github.com/gair-nlp/reasoneval)**|\n", "2404.05673": "|**2024-04-18**|**CoReS: Orchestrating the Dance of Reasoning and Segmentation**|Xiaoyi Bao et.al.|[2404.05673v2](http://arxiv.org/abs/2404.05673v2)|null|\n", "2404.05590": "|**2024-04-08**|**MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering**|I\u00f1igo Alonso et.al.|[2404.05590v1](http://arxiv.org/abs/2404.05590v1)|null|\n", "2404.05545": "|**2024-04-08**|**Evaluating Interventional Reasoning Capabilities of Large Language Models**|Tejas Kasetty et.al.|[2404.05545v1](http://arxiv.org/abs/2404.05545v1)|null|\n", "2404.05465": "|**2024-04-08**|**HAMMR: HierArchical MultiModal React agents for generic VQA**|Lluis Castrejon et.al.|[2404.05465v1](http://arxiv.org/abs/2404.05465v1)|null|\n", 
"2404.05449": "|**2024-04-11**|**RoT: Enhancing Large Language Models with Reflection on Search Trees**|Wenyang Hui et.al.|[2404.05449v2](http://arxiv.org/abs/2404.05449v2)|**[link](https://github.com/huiwy/reflection-on-trees)**|\n", "2404.05291": "|**2024-04-08**|**Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models**|Yutao Ouyang et.al.|[2404.05291v1](http://arxiv.org/abs/2404.05291v1)|null|\n", "2404.05221": "|**2024-04-08**|**LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models**|Shibo Hao et.al.|[2404.05221v1](http://arxiv.org/abs/2404.05221v1)|null|\n", "2404.05134": "|**2024-04-08**|**LLM-BT: Performing Robotic Adaptive Tasks based on Large Language Models and Behavior Trees**|Haotian Zhou et.al.|[2404.05134v1](http://arxiv.org/abs/2404.05134v1)|null|\n", "2404.05052": "|**2024-04-07**|**Facial Affective Behavior Analysis with Instruction Tuning**|Yifan Li et.al.|[2404.05052v1](http://arxiv.org/abs/2404.05052v1)|null|\n", "2404.04990": "|**2024-04-07**|**MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models**|Zihao Wei et.al.|[2404.04990v1](http://arxiv.org/abs/2404.04990v1)|**[link](https://github.com/hi-archers/mlake)**|\n", "2404.04963": "|**2024-04-07**|**SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials**|Mael Jullien et.al.|[2404.04963v1](http://arxiv.org/abs/2404.04963v1)|null|\n", "2404.04929": "|**2024-04-07**|**RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models**|Qi Lv et.al.|[2404.04929v1](http://arxiv.org/abs/2404.04929v1)|null|\n", "2404.04834": "|**2024-04-07**|**LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead**|Junda He et.al.|[2404.04834v1](http://arxiv.org/abs/2404.04834v1)|null|\n", "2404.04817": "|**2024-04-07**|**FRACTAL: Fine-Grained Scoring from Aggregate Text Labels**|Yukti Makhija 
et.al.|[2404.04817v1](http://arxiv.org/abs/2404.04817v1)|null|\n", "2404.04763": "|**2024-04-07**|**GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling**|Hritik Bansal et.al.|[2404.04763v1](http://arxiv.org/abs/2404.04763v1)|null|\n", "2404.04752": "|**2024-04-06**|**Challenges Faced by Large Language Models in Solving Multi-Agent Flocking**|Peihan Li et.al.|[2404.04752v1](http://arxiv.org/abs/2404.04752v1)|null|\n", "2404.04728": "|**2024-04-06**|**Navigating the Landscape of Hint Generation Research: From the Past to the Future**|Anubhav Jangra et.al.|[2404.04728v1](http://arxiv.org/abs/2404.04728v1)|null|\n", "2404.04667": "|**2024-04-06**|**Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology**|Dyke Ferber et.al.|[2404.04667v1](http://arxiv.org/abs/2404.04667v1)|null|\n", "2404.04627": "|**2024-04-06**|**Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement**|Zaid Khan et.al.|[2404.04627v1](http://arxiv.org/abs/2404.04627v1)|null|\n", "2404.04510": "|**2024-04-06**|**IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials**|Shreyasi Mandal et.al.|[2404.04510v1](http://arxiv.org/abs/2404.04510v1)|**[link](https://github.com/exploration-lab/iitk-semeval-2024-task-2-clinical-nli)**|\n", "2404.04442": "|**2024-04-05**|**Exploring Autonomous Agents through the Lens of Large Language Models: A Review**|Saikat Barua et.al.|[2404.04442v1](http://arxiv.org/abs/2404.04442v1)|null|\n", "2404.04351": "|**2024-04-05**|**Assisting humans in complex comparisons: automated information comparison at scale**|Truman Yuen et.al.|[2404.04351v1](http://arxiv.org/abs/2404.04351v1)|null|\n", "2404.04346": "|**2024-04-05**|**Koala: Key frame-conditioned long video-LLM**|Reuben Tan et.al.|[2404.04346v1](http://arxiv.org/abs/2404.04346v1)|null|\n", "2404.04302": "|**2024-04-04**|**CBR-RAG: Case-Based 
Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering**|Nirmalie Wiratunga et.al.|[2404.04302v1](http://arxiv.org/abs/2404.04302v1)|**[link](https://github.com/rgu-iit-bt/cbr-for-legal-rag)**|\n", "2404.04293": "|**2024-04-04**|**Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding**|Yanda Li et.al.|[2404.04293v1](http://arxiv.org/abs/2404.04293v1)|null|\n", "2404.06411": "|**2024-04-09**|**AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents**|Luca Gioacchini et.al.|[2404.06411v1](http://arxiv.org/abs/2404.06411v1)|**[link](https://github.com/nec-research/agentquest)**|\n", "2404.06371": "|**2024-04-09**|**Model Generation from Requirements with LLMs: an Exploratory Study**|Alessio Ferrari et.al.|[2404.06371v1](http://arxiv.org/abs/2404.06371v1)|null|\n", "2404.06345": "|**2024-04-21**|**AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning**|Senkang Hu et.al.|[2404.06345v2](http://arxiv.org/abs/2404.06345v2)|null|\n", "2404.06311": "|**2024-04-09**|**DRE: Generating Recommendation Explanations by Aligning Large Language Models at Data-level**|Shen Gao et.al.|[2404.06311v1](http://arxiv.org/abs/2404.06311v1)|null|\n", "2404.06227": "|**2024-04-09**|**Multimodal Road Network Generation Based on Large Language Model**|Jiajing Chen et.al.|[2404.06227v1](http://arxiv.org/abs/2404.06227v1)|null|\n", "2404.05868": "|**2024-04-08**|**Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning**|Ruiqi Zhang et.al.|[2404.05868v1](http://arxiv.org/abs/2404.05868v1)|null|\n", "2404.07103": "|**2024-04-10**|**Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs**|Bowen Jin et.al.|[2404.07103v1](http://arxiv.org/abs/2404.07103v1)|**[link](https://github.com/petergriffinjin/graph-cot)**|\n", "2404.07078": "|**2024-04-10**|**VLLMs Provide Better Context for Emotion 
Understanding Through Common Sense Reasoning**|Alexandros Xenos et.al.|[2404.07078v1](http://arxiv.org/abs/2404.07078v1)|**[link](https://github.com/nickyfot/emocommonsense)**|\n", "2404.06962": "|**2024-04-10**|**Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study**|Hongru Du et.al.|[2404.06962v1](http://arxiv.org/abs/2404.06962v1)|**[link](https://github.com/miemieyanga/pandemicllm)**|\n", "2404.06904": "|**2024-04-10**|**Vision-Language Model-based Physical Reasoning for Robot Liquid Perception**|Wenqiang Lai et.al.|[2404.06904v1](http://arxiv.org/abs/2404.06904v1)|null|\n", "2404.06645": "|**2024-04-09**|**GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks**|Kaylee Burns et.al.|[2404.06645v1](http://arxiv.org/abs/2404.06645v1)|null|\n", "2404.06644": "|**2024-04-09**|**Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?**|Omid Ghahroodi et.al.|[2404.06644v1](http://arxiv.org/abs/2404.06644v1)|null|\n", "2404.07922": "|**2024-04-17**|**LaVy: Vietnamese Multimodal Large Language Model**|Chi Tran et.al.|[2404.07922v4](http://arxiv.org/abs/2404.07922v4)|**[link](https://github.com/baochi0212/lavy)**|\n", "2404.07677": "|**2024-04-11**|**ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs**|Lei Sun et.al.|[2404.07677v1](http://arxiv.org/abs/2404.07677v1)|null|\n", "2404.07456": "|**2024-04-11**|**WESE: Weak Exploration to Strong Exploitation for LLM Agents**|Xu Huang et.al.|[2404.07456v1](http://arxiv.org/abs/2404.07456v1)|null|\n", "2404.07449": "|**2024-04-11**|**Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs**|Kanchana Ranasinghe et.al.|[2404.07449v1](http://arxiv.org/abs/2404.07449v1)|null|\n", "2404.08589": "|**2024-04-12**|**Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts**|\u00d6vg\u00fc \u00d6zdemir 
et.al.|[2404.08589v1](http://arxiv.org/abs/2404.08589v1)|**[link](https://github.com/ovguyo/captions-in-vqa)**|\n", "2404.08506": "|**2024-04-12**|**LaSagnA: Language-based Segmentation Assistant for Complex Queries**|Cong Wei et.al.|[2404.08506v1](http://arxiv.org/abs/2404.08506v1)|**[link](https://github.com/congvvc/lasagna)**|\n", "2404.08492": "|**2024-04-12**|**Strategic Interactions between Large Language Models-based Agents in Beauty Contests**|Siting Lu et.al.|[2404.08492v1](http://arxiv.org/abs/2404.08492v1)|null|\n", "2404.08488": "|**2024-04-12**|**Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian**|Stefano De Paoli et.al.|[2404.08488v1](http://arxiv.org/abs/2404.08488v1)|null|\n", "2404.08148": "|**2024-04-11**|**Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs**|Jierui Li et.al.|[2404.08148v1](http://arxiv.org/abs/2404.08148v1)|null|\n", "2404.08092": "|**2024-04-11**|**Data-Augmentation-Based Dialectal Adaptation for LLMs**|Fahim Faisal et.al.|[2404.08092v1](http://arxiv.org/abs/2404.08092v1)|**[link](https://github.com/ffaisal93/dialect_copa)**|\n", "2404.08008": "|**2024-04-10**|**Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition**|Kehua Feng et.al.|[2404.08008v1](http://arxiv.org/abs/2404.08008v1)|**[link](https://github.com/weiji-feng/mad-eval)**|\n", "2404.09717": "|**2024-04-15**|**Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model**|Hyunsoo Cho et.al.|[2404.09717v1](http://arxiv.org/abs/2404.09717v1)|null|\n", "2404.09699": "|**2024-04-15**|**Generative AI for Game Theory-based Mobile Networking**|Long He et.al.|[2404.09699v1](http://arxiv.org/abs/2404.09699v1)|null|\n", "2404.09632": "|**2024-04-15**|**Bridging Vision and Language Spaces with Assignment Prediction**|Jungin Park 
et.al.|[2404.09632v1](http://arxiv.org/abs/2404.09632v1)|**[link](https://github.com/park-jungin/vlap)**|\n", "2404.09492": "|**2024-04-15**|**Bridging the Gap between Different Vocabularies for LLM Ensemble**|Yangyifan Xu et.al.|[2404.09492v1](http://arxiv.org/abs/2404.09492v1)|**[link](https://github.com/xydaytoy/eva)**|\n", "2404.09491": "|**2024-04-15**|**Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning**|Sungwon Han et.al.|[2404.09491v1](http://arxiv.org/abs/2404.09491v1)|**[link](https://github.com/sungwon-han/featllm)**|\n", "2404.09486": "|**2024-04-15**|**MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems**|Kaixin Li et.al.|[2404.09486v1](http://arxiv.org/abs/2404.09486v1)|**[link](https://github.com/happylkx/mmcode)**|\n", "2404.09228": "|**2024-04-14**|**A Survey on Integration of Large Language Models with Intelligent Robots**|Yeseung Kim et.al.|[2404.09228v1](http://arxiv.org/abs/2404.09228v1)|null|\n", "2404.09170": "|**2024-04-16**|**Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models**|Xiaoshu Chen et.al.|[2404.09170v2](http://arxiv.org/abs/2404.09170v2)|null|\n", "2404.09129": "|**2024-04-14**|**When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models**|Yanhong Li et.al.|[2404.09129v1](http://arxiv.org/abs/2404.09129v1)|null|\n", "2404.09077": "|**2024-04-13**|**CuriousLLM: Elevating Multi-Document QA with Reasoning-Infused Knowledge Graph Prompting**|Zukang Yang et.al.|[2404.09077v1](http://arxiv.org/abs/2404.09077v1)|**[link](https://github.com/zukangy/kgp-curiousllm)**|\n", "2404.08827": "|**2024-04-12**|**\"Don't forget to put the milk back!\" Dataset for Enabling Embodied Agents to Detect Anomalous Situations**|James F. 
Mullen Jr et.al.|[2404.08827v1](http://arxiv.org/abs/2404.08827v1)|null|\n", "2404.08767": "|**2024-04-12**|**LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning**|Junchi Wang et.al.|[2404.08767v1](http://arxiv.org/abs/2404.08767v1)|**[link](https://github.com/wangjunchi/llmseg)**|\n", "2404.08704": "|**2024-04-11**|**MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting**|Avinash Anand et.al.|[2404.08704v1](http://arxiv.org/abs/2404.08704v1)|null|\n", "2404.08692": "|**2024-04-10**|**Apollonion: Profile-centric Dialog Agent**|Shangyu Chen et.al.|[2404.08692v1](http://arxiv.org/abs/2404.08692v1)|null|\n", "2404.08676": "|**2024-04-06**|**ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming**|Simone Tedeschi et.al.|[2404.08676v1](http://arxiv.org/abs/2404.08676v1)|**[link](https://github.com/babelscape/alert)**|\n", "2404.10642": "|**2024-04-16**|**Self-playing Adversarial Language Game Enhances LLM Reasoning**|Pengyu Cheng et.al.|[2404.10642v1](http://arxiv.org/abs/2404.10642v1)|**[link](https://github.com/linear95/spag)**|\n", "2404.10618": "|**2024-04-16**|**Private Attribute Inference from Images with Vision-Language Models**|Batuhan T\u00f6mek\u00e7e et.al.|[2404.10618v1](http://arxiv.org/abs/2404.10618v1)|null|\n", "2404.10595": "|**2024-04-16**|**Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases**|Yanze Li et.al.|[2404.10595v1](http://arxiv.org/abs/2404.10595v1)|null|\n", "2404.10513": "|**2024-04-16**|**CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity**|Moshe Berchansky et.al.|[2404.10513v1](http://arxiv.org/abs/2404.10513v1)|null|\n", "2404.10429": "|**2024-04-16**|**MEEL: Multi-Modal Event Evolution Learning**|Zhengwei Tao et.al.|[2404.10429v1](http://arxiv.org/abs/2404.10429v1)|**[link](https://github.com/tzwwww/meel)**|\n", "2404.10384": "|**2024-04-16**|**Reasoning on Efficient Knowledge Paths:Knowledge Graph 
Guides Large Language Model for Domain Question Answering**|Yuqi Wang et.al.|[2404.10384v1](http://arxiv.org/abs/2404.10384v1)|null|\n", "2404.10346": "|**2024-04-16**|**Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards**|Hyeonbin Hwang et.al.|[2404.10346v1](http://arxiv.org/abs/2404.10346v1)|**[link](https://github.com/hbin0701/Self-Explore)**|\n", "2404.10150": "|**2024-04-15**|**TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition**|Md Mahadi Hasan Nahid et.al.|[2404.10150v1](http://arxiv.org/abs/2404.10150v1)|**[link](https://github.com/mahadi-nahid/tabsqlify)**|\n", "2404.10141": "|**2024-04-15**|**ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis**|Aashish Anantha Ramakrishnan et.al.|[2404.10141v1](http://arxiv.org/abs/2404.10141v1)|**[link](https://github.com/aashish2000/anchor)**|\n", "2404.09939": "|**2024-04-15**|**A Survey on Deep Learning for Theorem Proving**|Zhaoyu Li et.al.|[2404.09939v1](http://arxiv.org/abs/2404.09939v1)|**[link](https://github.com/zhaoyu-li/dl4tp)**|\n", "2404.09937": "|**2024-04-15**|**Compression Represents Intelligence Linearly**|Yuzhen Huang et.al.|[2404.09937v1](http://arxiv.org/abs/2404.09937v1)|**[link](https://github.com/hkust-nlp/llm-compression-intelligence)**|\n", "2404.09868": "|**2024-04-15**|**AI-Driven Statutory Reasoning via Software Engineering Methods**|Rohan Padhye et.al.|[2404.09868v1](http://arxiv.org/abs/2404.09868v1)|null|\n", "2404.09866": "|**2024-04-15**|**Reimagining Self-Adaptation in the Age of Large Language Models**|Raghav Donakanti et.al.|[2404.09866v1](http://arxiv.org/abs/2404.09866v1)|null|\n", "2404.11500": "|**2024-04-17**|**Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models**|Yue Zhou et.al.|[2404.11500v1](http://arxiv.org/abs/2404.11500v1)|**[link](https://github.com/yue-llm-pit/scop)**|\n", "2404.11207": 
"|**2024-04-17**|**Exploring the Transferability of Visual Prompting for Multimodal Large Language Models**|Yichi Zhang et.al.|[2404.11207v1](http://arxiv.org/abs/2404.11207v1)|**[link](https://github.com/zycheiheihei/transferable-visual-prompting)**|\n", "2404.11129": "|**2024-04-17**|**Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales**|Minghe Gao et.al.|[2404.11129v1](http://arxiv.org/abs/2404.11129v1)|null|\n", "2404.11121": "|**2024-04-17**|**TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment**|Qinfeng Li et.al.|[2404.11121v1](http://arxiv.org/abs/2404.11121v1)|null|\n", "2404.11086": "|**2024-04-18**|**ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models**|Trong-Hieu Nguyen et.al.|[2404.11086v2](http://arxiv.org/abs/2404.11086v2)|null|\n", "2404.11041": "|**2024-04-17**|**On the Empirical Complexity of Reasoning and Planning in LLMs**|Liwei Kang et.al.|[2404.11041v1](http://arxiv.org/abs/2404.11041v1)|null|\n", "2404.11027": "|**2024-04-17**|**Empowering Large Language Models on Robotic Manipulation with Affordance Prompting**|Guangran Cheng et.al.|[2404.11027v1](http://arxiv.org/abs/2404.11027v1)|null|\n", "2404.11018": "|**2024-04-17**|**Many-Shot In-Context Learning**|Rishabh Agarwal et.al.|[2404.11018v1](http://arxiv.org/abs/2404.11018v1)|null|\n", "2404.12390": "|**2024-04-25**|**BLINK: Multimodal Large Language Models Can See but Not Perceive**|Xingyu Fu et.al.|[2404.12390v2](http://arxiv.org/abs/2404.12390v2)|null|\n", "2404.12372": "|**2024-04-18**|**MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale**|Xiaotang Gai et.al.|[2404.12372v1](http://arxiv.org/abs/2404.12372v1)|null|\n", "2404.12342": "|**2024-04-18**|**Large Language Models in Targeted Sentiment Analysis**|Nicolay Rusnachenko 
et.al.|[2404.12342v1](http://arxiv.org/abs/2404.12342v1)|**[link](https://github.com/nicolay-r/reasoning-for-sentiment-analysis-framework)**|\n", "2404.12335": "|**2024-04-18**|**Normative Requirements Operationalization with Large Language Models**|Nick Feng et.al.|[2404.12335v1](http://arxiv.org/abs/2404.12335v1)|null|\n", "2404.12253": "|**2024-04-18**|**Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing**|Ye Tian et.al.|[2404.12253v1](http://arxiv.org/abs/2404.12253v1)|null|\n", "2404.12149": "|**2024-04-19**|**AccidentBlip2: Accident Detection With Multi-View MotionBlip2**|Yihua Shao et.al.|[2404.12149v2](http://arxiv.org/abs/2404.12149v2)|**[link](https://github.com/yihuajerry/accidentblip2)**|\n", "2404.12065": "|**2024-04-18**|**RAGAR, Your Falsehood RADAR: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models**|M. Abdul Khaliq et.al.|[2404.12065v1](http://arxiv.org/abs/2404.12065v1)|null|\n", "2404.11978": "|**2024-04-18**|**EVIT: Event-Oriented Instruction Tuning for Event Reasoning**|Zhengwei Tao et.al.|[2404.11978v1](http://arxiv.org/abs/2404.11978v1)|null|\n", "2404.11891": "|**2024-04-18**|**Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools**|Yilun Hao et.al.|[2404.11891v1](http://arxiv.org/abs/2404.11891v1)|null|\n", "2404.11835": "|**2024-04-18**|**CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models**|Minjung Shin et.al.|[2404.11835v1](http://arxiv.org/abs/2404.11835v1)|null|\n", "2404.11792": "|**2024-04-19**|**Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study**|Zooey Nguyen et.al.|[2404.11792v2](http://arxiv.org/abs/2404.11792v2)|null|\n", "2404.11730": "|**2024-04-21**|**Missed Connections: Lateral Thinking Puzzles for Large Language Models**|Graham Todd et.al.|[2404.11730v2](http://arxiv.org/abs/2404.11730v2)|null|\n", "2404.11717": "|**2024-04-17**|**How often 
are errors in natural language reasoning due to paraphrastic variability?**|Neha Srikanth et.al.|[2404.11717v1](http://arxiv.org/abs/2404.11717v1)|null|\n", "2404.13033": "|**2024-04-19**|**Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs**|Biyang Guo et.al.|[2404.13033v1](http://arxiv.org/abs/2404.13033v1)|**[link](https://github.com/beyondguo/llm-tuning)**|\n", "2404.12966": "|**2024-04-24**|**Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models**|Yian Li et.al.|[2404.12966v2](http://arxiv.org/abs/2404.12966v2)|null|\n", "2404.12901": "|**2024-04-29**|**Large Language Models for Networking: Workflow, Advances and Challenges**|Chang Liu et.al.|[2404.12901v2](http://arxiv.org/abs/2404.12901v2)|null|\n", "2404.12843": "|**2024-04-19**|**Towards Logically Consistent Language Models via Probabilistic Reasoning**|Diego Calanzone et.al.|[2404.12843v1](http://arxiv.org/abs/2404.12843v1)|null|\n", "2404.12803": "|**2024-04-19**|**TextSquare: Scaling up Text-Centric Visual Instruction Tuning**|Jingqun Tang et.al.|[2404.12803v1](http://arxiv.org/abs/2404.12803v1)|null|\n", "2404.12728": "|**2024-04-19**|**Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?**|Chengwei Qin et.al.|[2404.12728v1](http://arxiv.org/abs/2404.12728v1)|null|\n", "2404.12715": "|**2024-04-19**|**Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration**|Yichong Huang et.al.|[2404.12715v1](http://arxiv.org/abs/2404.12715v1)|null|\n", "2404.12636": "|**2024-04-22**|**Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs**|Boyang Yang et.al.|[2404.12636v2](http://arxiv.org/abs/2404.12636v2)|null|\n", "2404.12494": "|**2024-04-18**|**BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models**|Yu Feng et.al.|[2404.12494v1](http://arxiv.org/abs/2404.12494v1)|null|\n", "2404.12464": "|**2024-04-18**|**NORMAD: A 
Benchmark for Measuring the Cultural Adaptability of Large Language Models**|Abhinav Rao et.al.|[2404.12464v1](http://arxiv.org/abs/2404.12464v1)|null|\n", "2404.14222": "|**2024-04-22**|**An Artificial Neuron for Enhanced Problem Solving in Large Language Models**|Sumedh Rasal et.al.|[2404.14222v1](http://arxiv.org/abs/2404.14222v1)|null|\n", "2404.14215": "|**2024-04-22**|**Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction**|Zheye Deng et.al.|[2404.14215v1](http://arxiv.org/abs/2404.14215v1)|**[link](https://github.com/hiyouga/llama-factory)**|\n", "2404.13993": "|**2024-04-24**|**Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion**|Yingxuan Li et.al.|[2404.13993v2](http://arxiv.org/abs/2404.13993v2)|null|\n", "2404.13985": "|**2024-04-22**|**Information Re-Organization Improves Reasoning in Large Language Models**|Xiaoxia Cheng et.al.|[2404.13985v1](http://arxiv.org/abs/2404.13985v1)|null|\n", "2404.13925": "|**2024-04-22**|**MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit**|Boning Zhang et.al.|[2404.13925v1](http://arxiv.org/abs/2404.13925v1)|**[link](https://github.com/mario-math-reasoning/math_evaluation)**|\n", "2404.13919": "|**2024-04-22**|**Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models**|Yukyung Lee et.al.|[2404.13919v1](http://arxiv.org/abs/2404.13919v1)|null|\n", "2404.13847": "|**2024-04-22**|**EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning**|Mingjie Ma et.al.|[2404.13847v1](http://arxiv.org/abs/2404.13847v1)|null|\n", "2404.13591": "|**2024-04-24**|**MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning**|Yifan Jiang et.al.|[2404.13591v2](http://arxiv.org/abs/2404.13591v2)|**[link](https://github.com/1171-jpg/marvel_avr)**|\n", "2404.13340": 
"|**2024-04-20**|**Large Language Models as Test Case Generators: Performance Evaluation and Enhancement**|Kefan Li et.al.|[2404.13340v1](http://arxiv.org/abs/2404.13340v1)|null|\n", "2404.13236": "|**2024-05-03**|**LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models**|Mouhamed Amine Bouchiha et.al.|[2404.13236v2](http://arxiv.org/abs/2404.13236v2)|**[link](https://github.com/mohaminemed/llmgooaq)**|\n", "2404.13149": "|**2024-04-19**|**Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging**|Chia-Hsuan Chang et.al.|[2404.13149v1](http://arxiv.org/abs/2404.13149v1)|null|\n", "2404.13082": "|**2024-04-17**|**TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection**|Xuechen Zhang et.al.|[2404.13082v1](http://arxiv.org/abs/2404.13082v1)|null|\n", "2404.13070": "|**2024-04-14**|**Evidence from counterfactual tasks supports emergent analogical reasoning in large language models**|Taylor Webb et.al.|[2404.13070v1](http://arxiv.org/abs/2404.13070v1)|**[link](https://github.com/taylorwwebb/counterfactual_analogies)**|\n", "2404.15228": "|**2024-04-23**|**Re-Thinking Inverse Graphics With Large Language Models**|Peter Kulits et.al.|[2404.15228v1](http://arxiv.org/abs/2404.15228v1)|null|\n", "2404.15156": "|**2024-04-23**|**Regressive Side Effects of Training Language Models to Mimic Student Misconceptions**|Shashank Sonkar et.al.|[2404.15156v1](http://arxiv.org/abs/2404.15156v1)|null|\n", "2404.15146": "|**2024-04-23**|**Rethinking LLM Memorization through the Lens of Adversarial Compression**|Avi Schwarzschild et.al.|[2404.15146v1](http://arxiv.org/abs/2404.15146v1)|null|\n", "2404.14963": "|**2024-04-28**|**Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Reasoners**|Qihuang Zhong et.al.|[2404.14963v2](http://arxiv.org/abs/2404.14963v2)|null|\n", "2404.14928": "|**2024-04-23**|**Graph Machine Learning in the Era of Large Language Models 
(LLMs)**|Wenqi Fan et.al.|[2404.14928v1](http://arxiv.org/abs/2404.14928v1)|null|\n", "2404.14812": "|**2024-04-23**|**Pattern-Aware Chain-of-Thought Prompting in Large Language Models**|Yufeng Zhang et.al.|[2404.14812v1](http://arxiv.org/abs/2404.14812v1)|null|\n", "2404.14809": "|**2024-04-23**|**A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications**|Wenbo Shang et.al.|[2404.14809v1](http://arxiv.org/abs/2404.14809v1)|null|\n", "2404.14779": "|**2024-04-23**|**Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches**|Cl\u00e9ment Christophe et.al.|[2404.14779v1](http://arxiv.org/abs/2404.14779v1)|null|\n", "2404.14777": "|**2024-04-23**|**CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning**|Ling Yue et.al.|[2404.14777v1](http://arxiv.org/abs/2404.14777v1)|null|\n", "2404.14723": "|**2024-04-23**|**Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks**|Amir Saeidi et.al.|[2404.14723v1](http://arxiv.org/abs/2404.14723v1)|null|\n", "2404.14705": "|**2024-04-23**|**Think-Program-reCtify: 3D Situated Reasoning with Large Language Models**|Qingrong He et.al.|[2404.14705v1](http://arxiv.org/abs/2404.14705v1)|null|\n", "2404.14662": "|**2024-04-23**|**NExT: Teaching Large Language Models to Reason about Code Execution**|Ansong Ni et.al.|[2404.14662v1](http://arxiv.org/abs/2404.14662v1)|null|\n", "2404.14604": "|**2024-04-26**|**Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training**|Mengzhao Jia et.al.|[2404.14604v3](http://arxiv.org/abs/2404.14604v3)|null|\n", "2404.14464": "|**2024-04-22**|**Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering**|Li Jiapeng et.al.|[2404.14464v1](http://arxiv.org/abs/2404.14464v1)|null|\n", "2404.14419": "|**2024-04-14**|**Enhancing Fault Detection for Large Language Models via 
Mutation-Based Confidence Smoothing**|Qiang Hu et.al.|[2404.14419v1](http://arxiv.org/abs/2404.14419v1)|null|\n", "2404.15578": "|**2024-04-24**|**Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?**|Hossein Salami et.al.|[2404.15578v1](http://arxiv.org/abs/2404.15578v1)|null|\n", "2404.15522": "|**2024-04-23**|**Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models**|Mihir Parmar et.al.|[2404.15522v1](http://arxiv.org/abs/2404.15522v1)|**[link](https://github.com/mihir3009/logicbench)**|\n", "2404.15515": "|**2024-04-25**|**ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models**|Weizhi Tang et.al.|[2404.15515v2](http://arxiv.org/abs/2404.15515v2)|null|\n", "2404.16811": "|**2024-04-26**|**Make Your LLM Fully Utilize the Context**|Shengnan An et.al.|[2404.16811v2](http://arxiv.org/abs/2404.16811v2)|**[link](https://github.com/microsoft/FILM)**|\n", "2404.16807": "|**2024-04-25**|**Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning**|Tianhui Zhang et.al.|[2404.16807v1](http://arxiv.org/abs/2404.16807v1)|null|\n", "2404.16754": "|**2024-04-25**|**RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis**|Xiaoman Zhang et.al.|[2404.16754v1](http://arxiv.org/abs/2404.16754v1)|null|\n", "2404.16698": "|**2024-04-25**|**Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents**|Giorgio Piatti et.al.|[2404.16698v1](http://arxiv.org/abs/2404.16698v1)|null|\n", "2404.16670": "|**2024-04-25**|**EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning**|Hongxia Xie et.al.|[2404.16670v1](http://arxiv.org/abs/2404.16670v1)|**[link](https://github.com/aimmemotion/emovit)**|\n", "2404.16651": "|**2024-04-25**|**Evolutionary Large Language Models for Hardware Security: A Comparative Survey**|Mohammad Akyash 
et.al.|[2404.16651v1](http://arxiv.org/abs/2404.16651v1)|null|\n", "2404.16478": "|**2024-04-25**|**Evaluating Consistency and Reasoning Capabilities of Large Language Models**|Yash Saxena et.al.|[2404.16478v1](http://arxiv.org/abs/2404.16478v1)|null|\n", "2404.16375": "|**2024-04-25**|**List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs**|An Yan et.al.|[2404.16375v1](http://arxiv.org/abs/2404.16375v1)|**[link](https://github.com/zzxslp/som-llava)**|\n", "2404.16158": "|**2024-04-24**|**The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms**|Yu Gao et.al.|[2404.16158v1](http://arxiv.org/abs/2404.16158v1)|null|\n", "2404.16033": "|**2024-04-24**|**Cantor: Inspiring Multimodal Chain-of-Thought of MLLM**|Timin Gao et.al.|[2404.16033v1](http://arxiv.org/abs/2404.16033v1)|null|\n", "2404.15804": "|**2024-04-24**|**GeckOpt: LLM System Efficiency via Intent-Based Tool Selection**|Michael Fore et.al.|[2404.15804v1](http://arxiv.org/abs/2404.15804v1)|null|\n", "2404.15790": "|**2024-04-24**|**Leveraging Large Language Models for Multimodal Search**|Oriol Barbany et.al.|[2404.15790v1](http://arxiv.org/abs/2404.15790v1)|null|\n", "2404.15676": "|**2024-04-24**|**Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs**|Yu Xia et.al.|[2404.15676v1](http://arxiv.org/abs/2404.15676v1)|null|\n", "2404.17525": "|**2024-05-09**|**Large Language Model Agent as a Mechanical Designer**|Yayati Jadhav et.al.|[2404.17525v2](http://arxiv.org/abs/2404.17525v2)|null|\n", "2404.17524": "|**2024-04-29**|**On the Use of Large Language Models to Generate Capability Ontologies**|Luis Miguel Vieira da Silva et.al.|[2404.17524v2](http://arxiv.org/abs/2404.17524v2)|null|\n", "2404.17522": "|**2024-04-26**|**Enhancing Legal Compliance and Regulation Analysis with Large Language Models**|Shabnam Hassani et.al.|[2404.17522v1](http://arxiv.org/abs/2404.17522v1)|null|\n", "2404.17513": "|**2024-04-26**|**A Comprehensive Evaluation on 
Event Reasoning of Large Language Models**|Zhengwei Tao et.al.|[2404.17513v1](http://arxiv.org/abs/2404.17513v1)|**[link](https://github.com/tzwwww/ev2)**|\n", "2404.17460": "|**2024-04-26**|**Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System**|Robin Schmucker et.al.|[2404.17460v1](http://arxiv.org/abs/2404.17460v1)|null|\n", "2404.17140": "|**2024-04-26**|**Small Language Models Need Strong Verifiers to Self-Correct Reasoning**|Yunxiang Zhang et.al.|[2404.17140v1](http://arxiv.org/abs/2404.17140v1)|null|\n", "2404.18824": "|**2024-04-29**|**Benchmarking Benchmark Leakage in Large Language Models**|Ruijie Xu et.al.|[2404.18824v1](http://arxiv.org/abs/2404.18824v1)|**[link](https://github.com/gair-nlp/benbench)**|\n", "2404.18766": "|**2024-04-29**|**PECC: Problem Extraction and Coding Challenges**|Patrick Haller et.al.|[2404.18766v1](http://arxiv.org/abs/2404.18766v1)|**[link](https://github.com/hallerpatrick/pecc)**|\n", "2404.18564": "|**2024-04-29**|**Injecting Salesperson's Dialogue Strategies in Large Language Models with Chain-of-Thought Reasoning**|Wen-Yu Chang et.al.|[2404.18564v1](http://arxiv.org/abs/2404.18564v1)|null|\n", "2404.18460": "|**2024-04-29**|**Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in**|Utkarsh Agarwal et.al.|[2404.18460v1](http://arxiv.org/abs/2404.18460v1)|null|\n", "2404.18359": "|**2024-04-29**|**FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models**|Wei Li et.al.|[2404.18359v1](http://arxiv.org/abs/2404.18359v1)|null|\n", "2404.18286": "|**2024-04-30**|**Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages**|David Ifeoluwa Adelani et.al.|[2404.18286v2](http://arxiv.org/abs/2404.18286v2)|null|\n", "2404.18130": "|**2024-04-28**|**Logic Agent: Enhancing Validity with Logic Rule Invocation**|Hanmeng Liu 
et.al.|[2404.18130v1](http://arxiv.org/abs/2404.18130v1)|null|\n", "2404.18077": "|**2024-04-28**|**Generative AI for Low-Carbon Artificial Intelligence of Things**|Jinbo Wen et.al.|[2404.18077v1](http://arxiv.org/abs/2404.18077v1)|null|\n", "2404.18021": "|**2024-04-27**|**CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments**|Kaixuan Huang et.al.|[2404.18021v1](http://arxiv.org/abs/2404.18021v1)|null|\n", "2404.17809": "|**2024-04-27**|**Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction**|Guozheng Li et.al.|[2404.17809v1](http://arxiv.org/abs/2404.17809v1)|null|\n", "2404.17729": "|**2024-04-26**|**CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving**|Pei Chen et.al.|[2404.17729v1](http://arxiv.org/abs/2404.17729v1)|**[link](https://github.com/amazon-science/comm-prompt)**|\n", "2404.17662": "|**2024-04-26**|**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**|Qinglin Zhu et.al.|[2404.17662v1](http://arxiv.org/abs/2404.17662v1)|**[link](https://github.com/alickzhu/player)**|\n", "2404.10160": "|**2024-04-28**|**RLRF:Reinforcement Learning from Reflection through Debates as Feedback for Bias Mitigation in LLMs**|Ruoxi Cheng et.al.|[2404.10160v2](http://arxiv.org/abs/2404.10160v2)|null|\n", "2404.19737": "|**2024-04-30**|**Better & Faster Large Language Models via Multi-token Prediction**|Fabian Gloeckle et.al.|[2404.19737v1](http://arxiv.org/abs/2404.19737v1)|null|\n", "2404.19696": "|**2024-04-30**|**Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners**|Chun Feng et.al.|[2404.19696v1](http://arxiv.org/abs/2404.19696v1)|null|\n", "2404.19509": "|**2024-04-30**|**Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom**|Shisen Yue et.al.|[2404.19509v1](http://arxiv.org/abs/2404.19509v1)|**[link](https://github.com/sjtu-compling/llm-pragmatics)**|\n", 
"2404.19438": "|**2024-05-01**|**Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings**|Guobin Shen et.al.|[2404.19438v2](http://arxiv.org/abs/2404.19438v2)|null|\n", "2404.19432": "|**2024-04-30**|**Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships**|D. Panas et.al.|[2404.19432v1](http://arxiv.org/abs/2404.19432v1)|null|\n", "2404.19369": "|**2024-04-30**|**Evaluating Telugu Proficiency in Large Language Models_ A Comparative Analysis of ChatGPT and Gemini**|Katikela Sreeharsha Kishore et.al.|[2404.19369v1](http://arxiv.org/abs/2404.19369v1)|null|\n", "2404.19234": "|**2024-04-30**|**Multi-hop Question Answering over Knowledge Graphs using Large Language Models**|Abir Chakraborty et.al.|[2404.19234v1](http://arxiv.org/abs/2404.19234v1)|null|\n", "2404.19221": "|**2024-04-30**|**Transcrib3D: 3D Referring Expression Resolution through Large Language Models**|Jiading Fang et.al.|[2404.19221v1](http://arxiv.org/abs/2404.19221v1)|null|\n", "2404.19063": "|**2024-04-29**|**SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications**|Liang Xu et.al.|[2404.19063v1](http://arxiv.org/abs/2404.19063v1)|null|\n", "2404.19055": "|**2024-04-29**|**Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models**|Houjun Liu et.al.|[2404.19055v1](http://arxiv.org/abs/2404.19055v1)|null|\n", "2404.18978": "|**2024-04-29**|**Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs**|Bahar Radmehr et.al.|[2404.18978v1](http://arxiv.org/abs/2404.18978v1)|null|\n", "2405.00648": "|**2024-05-01**|**HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models**|Ningke Li et.al.|[2405.00648v1](http://arxiv.org/abs/2405.00648v1)|null|\n", "2405.00451": "|**2024-05-01**|**Monte Carlo Tree Search Boosts 
Reasoning via Iterative Preference Learning**|Yuxi Xie et.al.|[2405.00451v1](http://arxiv.org/abs/2405.00451v1)|null|\n", "2405.00449": "|**2024-05-01**|**RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models**|Mohamed Manzour Hussien et.al.|[2405.00449v1](http://arxiv.org/abs/2405.00449v1)|null|\n", "2405.00402": "|**2024-05-01**|**Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models**|Leonardo Ranaldi et.al.|[2405.00402v1](http://arxiv.org/abs/2405.00402v1)|null|\n", "2405.00361": "|**2024-05-01**|**AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts**|Zefang Liu et.al.|[2405.00361v1](http://arxiv.org/abs/2405.00361v1)|**[link](https://github.com/zefang-liu/adamole)**|\n", "2405.00338": "|**2024-05-03**|**Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model**|Yu Cui et.al.|[2405.00338v2](http://arxiv.org/abs/2405.00338v2)|null|\n", "2405.00332": "|**2024-05-03**|**A Careful Examination of Large Language Model Performance on Grade School Arithmetic**|Hugh Zhang et.al.|[2405.00332v3](http://arxiv.org/abs/2405.00332v3)|null|\n", "2405.00321": "|**2024-05-01**|**DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training**|Bhuvanesh Verma et.al.|[2405.00321v1](http://arxiv.org/abs/2405.00321v1)|null|\n", "2405.00204": "|**2024-04-30**|**General Purpose Verification for Chain of Thought Prompting**|Robert Vacareanu et.al.|[2405.00204v1](http://arxiv.org/abs/2405.00204v1)|null|\n", "2405.01533": "|**2024-05-02**|**OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning**|Shihao Wang et.al.|[2405.01533v1](http://arxiv.org/abs/2405.01533v1)|**[link](https://github.com/nvlabs/omnidrive)**|\n", "2405.01502": "|**2024-05-02**|**Analyzing the Role of Semantic Representations in the Era of Large 
Language Models**|Zhijing Jin et.al.|[2405.01502v1](http://arxiv.org/abs/2405.01502v1)|**[link](https://github.com/causalnlp/amr_llm)**|\n", "2405.01379": "|**2024-05-08**|**Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving**|Xin Quan et.al.|[2405.01379v2](http://arxiv.org/abs/2405.01379v2)|null|\n", "2405.01359": "|**2024-05-02**|**GAIA: A General AI Assistant for Intelligent Accelerator Operations**|Frank Mayet et.al.|[2405.01359v1](http://arxiv.org/abs/2405.01359v1)|null|\n", "2405.01345": "|**2024-05-02**|**The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights**|Wenhao Zhu et.al.|[2405.01345v1](http://arxiv.org/abs/2405.01345v1)|**[link](https://github.com/njunlp/qalign)**|\n", "2405.00981": "|**2024-05-02**|**Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation**|David Eric Austin et.al.|[2405.00981v1](http://arxiv.org/abs/2405.00981v1)|null|\n", "2405.00972": "|**2024-05-02**|**CACTUS: Chemistry Agent Connecting Tool-Usage to Science**|Andrew D. McNaughton et.al.|[2405.00972v1](http://arxiv.org/abs/2405.00972v1)|**[link](https://github.com/pnnl/cactus)**|\n", "2405.00718": "|**2024-04-25**|**Can't say cant? 
Measuring and Reasoning of Dark Jargons in Large Language Models**|Xu Ji et.al.|[2405.00718v1](http://arxiv.org/abs/2405.00718v1)|null|\n", "2405.00716": "|**2024-04-25**|**Large Language Models in Healthcare: A Comprehensive Benchmark**|Andrew Liu et.al.|[2405.00716v1](http://arxiv.org/abs/2405.00716v1)|null|\n", "2405.02228": "|**2024-05-09**|**REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs**|Deepa Tilwani et.al.|[2405.02228v2](http://arxiv.org/abs/2405.02228v2)|null|\n", "2405.02079": "|**2024-05-03**|**Argumentative Large Language Models for Explainable and Contestable Decision-Making**|Gabriel Freedman et.al.|[2405.02079v1](http://arxiv.org/abs/2405.02079v1)|null|\n", "2405.01997": "|**2024-05-03**|**Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo**|Mahmoud Masoud et.al.|[2405.01997v1](http://arxiv.org/abs/2405.01997v1)|null|\n", "2405.01868": "|**2024-05-03**|**Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems**|Chuang Li et.al.|[2405.01868v1](http://arxiv.org/abs/2405.01868v1)|null|\n", "2405.01744": "|**2024-05-02**|**ALCM: Autonomous LLM-Augmented Causal Discovery Framework**|Elahe Khatibi et.al.|[2405.01744v1](http://arxiv.org/abs/2405.01744v1)|null|\n", "2405.01649": "|**2024-05-08**|**Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning**|Tianle Xia et.al.|[2405.01649v3](http://arxiv.org/abs/2405.01649v3)|null|\n", "2405.01593": "|**2024-04-30**|**Large Language Model Agent for Fake News Detection**|Xinyi Li et.al.|[2405.01593v1](http://arxiv.org/abs/2405.01593v1)|null|\n", "2405.01585": "|**2024-04-28**|**Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications**|Sujit Khanna et.al.|[2405.01585v1](http://arxiv.org/abs/2405.01585v1)|null|\n", "2405.03690": "|**2024-05-08**|**How Good is 
my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs**|Muhammad Uzair Khattak et.al.|[2405.03690v2](http://arxiv.org/abs/2405.03690v2)|null|\n", "2405.03685": "|**2024-05-06**|**Language-Image Models with 3D Understanding**|Jang Hyun Cho et.al.|[2405.03685v1](http://arxiv.org/abs/2405.03685v1)|null|\n", "2405.03594": "|**2024-05-06**|**Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment**|Abhinav Agarwalla et.al.|[2405.03594v1](http://arxiv.org/abs/2405.03594v1)|null|\n", "2405.03553": "|**2024-05-23**|**AlphaMath Almost Zero: process Supervision without process**|Guoxin Chen et.al.|[2405.03553v2](http://arxiv.org/abs/2405.03553v2)|**[link](https://github.com/MARIO-Math-Reasoning/Super_MARIO)**|\n", "2405.03548": "|**2024-05-15**|**MAmmoTH2: Scaling Instructions from the Web**|Xiang Yue et.al.|[2405.03548v3](http://arxiv.org/abs/2405.03548v3)|null|\n", "2405.03509": "|**2024-05-06**|**Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning**|Yubo Mai et.al.|[2405.03509v1](http://arxiv.org/abs/2405.03509v1)|null|\n", "2405.03371": "|**2024-05-06**|**Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom**|Bo Wang et.al.|[2405.03371v1](http://arxiv.org/abs/2405.03371v1)|**[link](https://github.com/wangbo9719/L-Defense_EFND)**|\n", "2405.03359": "|**2024-05-06**|**MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline**|Mohamed Yaseen Jabarulla et.al.|[2405.03359v1](http://arxiv.org/abs/2405.03359v1)|**[link](https://github.com/yaseen28/meddoc-bot)**|\n", "2405.03272": "|**2024-05-06**|**WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning**|Yuanhan Zhang et.al.|[2405.03272v1](http://arxiv.org/abs/2405.03272v1)|null|\n", "2405.03138": "|**2024-05-06**|**CRAFT: Extracting and Tuning Cultural Instructions from the Wild**|Bin Wang 
et.al.|[2405.03138v1](http://arxiv.org/abs/2405.03138v1)|**[link](https://github.com/seaeval/craft)**|\n", "2405.03010": "|**2024-05-05**|**High Order Reasoning for Time Critical Recommendation in Evidence-based Medicine**|Manjiang Yu et.al.|[2405.03010v1](http://arxiv.org/abs/2405.03010v1)|null|\n", "2405.03000": "|**2024-05-05**|**MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning**|Wenqi Shi et.al.|[2405.03000v1](http://arxiv.org/abs/2405.03000v1)|null|\n", "2405.02828": "|**2024-05-05**|**Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy**|Aftab Hussain et.al.|[2405.02828v1](http://arxiv.org/abs/2405.02828v1)|null|\n", "2405.02712": "|**2024-05-04**|**CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions**|Hanchong Zhang et.al.|[2405.02712v1](http://arxiv.org/abs/2405.02712v1)|**[link](https://github.com/x-lance/text2sql-multiturn-gpt)**|\n", "2405.02559": "|**2024-05-04**|**A Literature Review and Framework for Human Evaluation of Generative Large Language Models in Healthcare**|Thomas Yu Chow Tam et.al.|[2405.02559v1](http://arxiv.org/abs/2405.02559v1)|null|\n", "2405.02528": "|**2024-05-20**|**GigSense: An LLM-Infused Tool forWorkers' Collective Intelligence**|Kashif Imteyaz et.al.|[2405.02528v2](http://arxiv.org/abs/2405.02528v2)|null|\n", "2405.04533": "|**2024-05-07**|**ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning**|Jing Lin et.al.|[2405.04533v1](http://arxiv.org/abs/2405.04533v1)|null|\n", "2405.04497": "|**2024-05-08**|**Unveiling Disparities in Web Task Handling Between Human and Web Agent**|Kihoon Son et.al.|[2405.04497v2](http://arxiv.org/abs/2405.04497v2)|null|\n", "2405.04382": "|**2024-05-07**|**Large Language Models Cannot Explain Themselves**|Advait Sarkar et.al.|[2405.04382v1](http://arxiv.org/abs/2405.04382v1)|null|\n", "2405.04215": "|**2024-05-07**|**NL2Plan: Robust LLM-Driven Planning 
from Minimal Text Descriptions**|Elliot Gestrin et.al.|[2405.04215v1](http://arxiv.org/abs/2405.04215v1)|null|\n", "2405.04170": "|**2024-05-07**|**D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models**|Duygu Altinok et.al.|[2405.04170v1](http://arxiv.org/abs/2405.04170v1)|**[link](https://github.com/duygua/semeval2024_nli4ct)**|\n", "2405.04086": "|**2024-05-07**|**Optimizing Language Model's Reasoning Abilities with Weak Supervision**|Yongqi Tong et.al.|[2405.04086v1](http://arxiv.org/abs/2405.04086v1)|null|\n", "2405.03709": "|**2024-05-14**|**Generating Probabilistic Scenario Programs from Natural Language**|Karim Elmaaroufi et.al.|[2405.03709v2](http://arxiv.org/abs/2405.03709v2)|null|\n", "2405.05226": "|**2024-05-08**|**SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants**|Masoud Moghani et.al.|[2405.05226v1](http://arxiv.org/abs/2405.05226v1)|null|\n", "2405.05189": "|**2024-05-08**|**MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning**|Inderjeet Nair et.al.|[2405.05189v1](http://arxiv.org/abs/2405.05189v1)|null|\n", "2405.05109": "|**2024-05-08**|**QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs**|Weijia Zhang et.al.|[2405.05109v1](http://arxiv.org/abs/2405.05109v1)|null|\n", "2405.04840": "|**2024-05-08**|**Federated Adaptation for Foundation Model-based Recommendations**|Chunxu Zhang et.al.|[2405.04840v1](http://arxiv.org/abs/2405.04840v1)|**[link](https://github.com/Zhangcx19/IJCAI-24-FedPA)**|\n", "2405.04818": "|**2024-05-08**|**ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation**|Ana Brassard et.al.|[2405.04818v1](http://arxiv.org/abs/2405.04818v1)|**[link](https://github.com/a-brassard/acorn)**|\n", "2405.04776": "|**2024-05-08**|**Chain of Thoughtlessness: An Analysis of CoT in Planning**|Kaya Stechly et.al.|[2405.04776v1](http://arxiv.org/abs/2405.04776v1)|null|\n", "2405.04756": "|**2024-05-08**|**BiasKG: 
Adversarial Knowledge Graphs to Induce Bias in Large Language Models**|Chu Fei Luo et.al.|[2405.04756v1](http://arxiv.org/abs/2405.04756v1)|**[link](https://github.com/VectorInstitute/biaskg)**|\n", "2405.04685": "|**2024-05-07**|**Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking**|Emre Can Acikgoz et.al.|[2405.04685v1](http://arxiv.org/abs/2405.04685v1)|null|\n", "2405.04669": "|**2024-05-07**|**Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics**|Hanlin Zhu et.al.|[2405.04669v1](http://arxiv.org/abs/2405.04669v1)|null|\n", "2405.05957": "|**2024-05-09**|**OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning**|Dan Qiao et.al.|[2405.05957v1](http://arxiv.org/abs/2405.05957v1)|**[link](https://github.com/opennlg/openba-v2)**|\n", "2405.05956": "|**2024-05-09**|**Probing Multimodal LLMs as World Models for Driving**|Shiva Sreeram et.al.|[2405.05956v1](http://arxiv.org/abs/2405.05956v1)|**[link](https://github.com/sreeramsa/drivesim)**|\n", "2405.05885": "|**2024-05-09**|**Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes**|Ziang Guo et.al.|[2405.05885v1](http://arxiv.org/abs/2405.05885v1)|null|\n", "2405.05824": "|**2024-05-09**|**Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning**|Artem Lykov et.al.|[2405.05824v1](http://arxiv.org/abs/2405.05824v1)|**[link](https://github.com/temalykov/robots_can_feel)**|\n", "2405.05508": "|**2024-05-09**|**Redefining Information Retrieval of Structured Database via Large Language Models**|Mingzhu Wang et.al.|[2405.05508v1](http://arxiv.org/abs/2405.05508v1)|null|\n", "2405.06399": "|**2024-05-10**|**Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus**|Filipe Marinho Rocha et.al.|[2405.06399v1](http://arxiv.org/abs/2405.06399v1)|null|\n", "2405.06001": 
"|**2024-05-09**|**LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models**|Ruihao Gong et.al.|[2405.06001v1](http://arxiv.org/abs/2405.06001v1)|**[link](https://github.com/modeltc/llmc)**|\n", "2405.07938": "|**2024-05-13**|**EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning**|Yinzhu Quan et.al.|[2405.07938v1](http://arxiv.org/abs/2405.07938v1)|null|\n", "2405.07784": "|**2024-05-13**|**Generating Human Motion in 3D Scenes from Text Descriptions**|Zhi Cen et.al.|[2405.07784v1](http://arxiv.org/abs/2405.07784v1)|null|\n", "2405.07667": "|**2024-05-13**|**Backdoor Removal for Generative Large Language Models**|Haoran Li et.al.|[2405.07667v1](http://arxiv.org/abs/2405.07667v1)|null|\n", "2405.07551": "|**2024-05-13**|**MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning**|Shuo Yin et.al.|[2405.07551v1](http://arxiv.org/abs/2405.07551v1)|null|\n", "2405.07496": "|**2024-05-13**|**Oedipus: LLM-enchanced Reasoning CAPTCHA Solver**|Gelei Deng et.al.|[2405.07496v1](http://arxiv.org/abs/2405.07496v1)|null|\n", "2405.07348": "|**2024-05-14**|**MedConceptsQA: Open Source Medical Concepts QA Benchmark**|Ofir Ben Shoham et.al.|[2405.07348v2](http://arxiv.org/abs/2405.07348v2)|**[link](https://github.com/nadavlab/MedConceptsQA)**|\n", "2405.07314": "|**2024-05-12**|**Learnable Tokenizer for LLM-based Generative Recommendation**|Wenjie Wang et.al.|[2405.07314v1](http://arxiv.org/abs/2405.07314v1)|null|\n", "2405.07229": "|**2024-05-12**|**MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks**|Xiaocui Yang et.al.|[2405.07229v1](http://arxiv.org/abs/2405.07229v1)|**[link](https://github.com/declare-lab/MM-InstructEval)**|\n", "2405.06919": "|**2024-05-11**|**Automating Thematic Analysis: How LLMs Analyse Controversial Topics**|Awais Hameed Khan 
et.al.|[2405.06919v1](http://arxiv.org/abs/2405.06919v1)|null|\n", "2405.06707": "|**2024-05-09**|**Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models**|Yitian Li et.al.|[2405.06707v1](http://arxiv.org/abs/2405.06707v1)|null|\n", "2405.06705": "|**2024-05-09**|**LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought**|Zhuoxuan Jiang et.al.|[2405.06705v1](http://arxiv.org/abs/2405.06705v1)|null|\n", "2405.06694": "|**2024-05-07**|**SUTRA: Scalable Multilingual Language Model Architecture**|Abhijit Bendale et.al.|[2405.06694v1](http://arxiv.org/abs/2405.06694v1)|null|\n", "2405.06691": "|**2024-05-07**|**Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering**|Akhil Arora et.al.|[2405.06691v1](http://arxiv.org/abs/2405.06691v1)|null|\n", "2405.06680": "|**2024-05-05**|**Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning**|Jun Zhao et.al.|[2405.06680v1](http://arxiv.org/abs/2405.06680v1)|null|\n", "2405.08603": "|**2024-05-14**|**A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine**|Hanguang Xiao et.al.|[2405.08603v1](http://arxiv.org/abs/2405.08603v1)|null|\n", "2405.08502": "|**2024-05-14**|**Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure**|Odysseas S. 
Chlapanis et.al.|[2405.08502v1](http://arxiv.org/abs/2405.08502v1)|**[link](https://github.com/nlpaueb/multiple-choice-mutation)**|\n", "2405.08373": "|**2024-05-14**|**PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles**|Satya Kesav Gundabathula et.al.|[2405.08373v1](http://arxiv.org/abs/2405.08373v1)|null|\n", "2405.08154": "|**2024-05-13**|**LLM Theory of Mind and Alignment: Opportunities and Risks**|Winnie Street et.al.|[2405.08154v1](http://arxiv.org/abs/2405.08154v1)|null|\n", "2405.09395": "|**2024-05-15**|**Matching domain experts by training from scratch on domain knowledge**|Xiaoliang Luo et.al.|[2405.09395v1](http://arxiv.org/abs/2405.09395v1)|null|\n", "2405.09161": "|**2024-05-15**|**Exploring the Potential of Large Language Models for Automation in Technical Customer Service**|Jochen Wulf et.al.|[2405.09161v1](http://arxiv.org/abs/2405.09161v1)|null|\n", "2405.10255": "|**2024-05-16**|**When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models**|Xianzheng Ma et.al.|[2405.10255v1](http://arxiv.org/abs/2405.10255v1)|null|\n", "2405.10251": "|**2024-05-16**|**A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks**|Xuanfan Ni et.al.|[2405.10251v1](http://arxiv.org/abs/2405.10251v1)|null|\n", "2405.10166": "|**2024-05-16**|**LFED: A Literary Fiction Evaluation Dataset for Large Language Models**|Linhao Yu et.al.|[2405.10166v1](http://arxiv.org/abs/2405.10166v1)|**[link](https://github.com/tjunlp-lab/lfed)**|\n", "2405.09822": "|**2024-05-16**|**SEEK: Semantic Reasoning for Object Goal Navigation in Real World Inspection Tasks**|Muhammad Fadhil Ginting et.al.|[2405.09822v1](http://arxiv.org/abs/2405.09822v1)|null|\n", "2405.09783": "|**2024-05-16**|**LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery**|Pingchuan Ma 
et.al.|[2405.09783v1](http://arxiv.org/abs/2405.09783v1)|null|\n", "2405.10825": "|**2024-05-17**|**Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities**|Hao Zhou et.al.|[2405.10825v1](http://arxiv.org/abs/2405.10825v1)|null|\n", "2405.10739": "|**2024-05-17**|**Efficient Multimodal Large Language Models: A Survey**|Yizhang Jin et.al.|[2405.10739v1](http://arxiv.org/abs/2405.10739v1)|**[link](https://github.com/lijiannuist/efficient-multimodal-llms-survey)**|\n", "2405.10620": "|**2024-05-17**|**MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains**|Zhaohuan Zhan et.al.|[2405.10620v1](http://arxiv.org/abs/2405.10620v1)|null|\n", "2405.10587": "|**2024-05-17**|**RDRec: Rationale Distillation for LLM-based Recommendation**|Xinfeng Wang et.al.|[2405.10587v1](http://arxiv.org/abs/2405.10587v1)|**[link](https://github.com/wangxfng/rdrec)**|\n", "2405.10542": "|**2024-05-17**|**Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset**|Jie Zhu et.al.|[2405.10542v1](http://arxiv.org/abs/2405.10542v1)|**[link](https://github.com/aliyun/cflue)**|\n", "2405.10440": "|**2024-05-16**|**Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification**|Jinge Wu et.al.|[2405.10440v1](http://arxiv.org/abs/2405.10440v1)|null|\n", "2405.12939": "|**2024-05-21**|**Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models**|Zhangyue Yin et.al.|[2405.12939v1](http://arxiv.org/abs/2405.12939v1)|**[link](https://github.com/yinzhangyue/AoR)**|\n", "2405.12933": "|**2024-05-21**|**Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs**|Bilgehan Sel et.al.|[2405.12933v1](http://arxiv.org/abs/2405.12933v1)|null|\n", "2405.12541": "|**2024-05-21**|**DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes 
from Sensor Data and Expert Knowledge**|Bufang Yang et.al.|[2405.12541v1](http://arxiv.org/abs/2405.12541v1)|null|\n", "2405.12433": "|**2024-05-21**|**LLM+Reasoning+Planning for supporting incomplete user queries in presence of APIs**|Sudhir Agarwal et.al.|[2405.12433v1](http://arxiv.org/abs/2405.12433v1)|null|\n", "2405.12147": "|**2024-05-20**|**Eliciting Problem Specifications via Large Language Models**|Robert E. Wray et.al.|[2405.12147v1](http://arxiv.org/abs/2405.12147v1)|null|\n", "2405.12130": "|**2024-05-20**|**MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning**|Ting Jiang et.al.|[2405.12130v1](http://arxiv.org/abs/2405.12130v1)|**[link](https://github.com/kongds/mora)**|\n", "2405.12100": "|**2024-05-20**|**DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction**|Hao Chen et.al.|[2405.12100v1](http://arxiv.org/abs/2405.12100v1)|null|\n", "2405.12035": "|**2024-05-20**|**KG-RAG: Bridging the Gap Between Knowledge and Creativity**|Diego Sanmartin et.al.|[2405.12035v1](http://arxiv.org/abs/2405.12035v1)|null|\n", "2405.11880": "|**2024-05-20**|**Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs**|Siyu Lou et.al.|[2405.11880v1](http://arxiv.org/abs/2405.11880v1)|null|\n", "2405.11841": "|**2024-05-20**|**Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities**|Junqi Wang et.al.|[2405.11841v1](http://arxiv.org/abs/2405.11841v1)|**[link](https://github.com/bigai-ai/evaluate-n-model-social-intelligence)**|\n", "2405.11640": "|**2024-05-19**|**Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning**|Zishan Gu et.al.|[2405.11640v1](http://arxiv.org/abs/2405.11640v1)|null|\n", "2405.11430": "|**2024-05-19**|**MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation**|Jianbo Dai 
et.al.|[2405.11430v1](http://arxiv.org/abs/2405.11430v1)|**[link](https://github.com/sparksofagi/mhpp)**|\n", "2405.11100": "|**2024-05-17**|**Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations**|Jos\u00e9 Luiz Nunes et.al.|[2405.11100v1](http://arxiv.org/abs/2405.11100v1)|null|\n", "2405.11040": "|**2024-05-17**|**From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT**|Jace Grandinetti et.al.|[2405.11040v1](http://arxiv.org/abs/2405.11040v1)|null|\n", "2405.14863": "|**2024-05-23**|**A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns**|Asaf Yehudai et.al.|[2405.14863v1](http://arxiv.org/abs/2405.14863v1)|null|\n", "2405.14862": "|**2024-05-23**|**Bitune: Bidirectional Instruction-Tuning**|Dawid J. Kopiczko et.al.|[2405.14862v1](http://arxiv.org/abs/2405.14862v1)|null|\n", "2405.14654": "|**2024-05-23**|**Efficient Medical Question Answering with Knowledge-Augmented Question Generation**|Julien Khlaut et.al.|[2405.14654v1](http://arxiv.org/abs/2405.14654v1)|null|\n", "2405.14619": "|**2024-05-24**|**Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models**|Jiyang Zhang et.al.|[2405.14619v2](http://arxiv.org/abs/2405.14619v2)|null|\n", "2405.14391": "|**2024-05-26**|**Explainable Few-shot Knowledge Tracing**|Haoxuan Li et.al.|[2405.14391v2](http://arxiv.org/abs/2405.14391v2)|**[link](https://github.com/leavesli1015/explainable-few-shot-knowledge-tracing)**|\n", "2405.14379": "|**2024-05-23**|**Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?**|Thomas Greatrix et.al.|[2405.14379v1](http://arxiv.org/abs/2405.14379v1)|null|\n", "2405.14365": "|**2024-05-23**|**JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models**|Kun Zhou et.al.|[2405.14365v1](http://arxiv.org/abs/2405.14365v1)|null|\n", "2405.14333": "|**2024-05-23**|**DeepSeek-Prover: 
Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data**|Huajian Xin et.al.|[2405.14333v1](http://arxiv.org/abs/2405.14333v1)|null|\n", "2405.14314": "|**2024-05-26**|**Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration**|Yang Zhang et.al.|[2405.14314v2](http://arxiv.org/abs/2405.14314v2)|null|\n", "2405.14170": "|**2024-05-23**|**Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning**|Jiapu Wang et.al.|[2405.14170v1](http://arxiv.org/abs/2405.14170v1)|null|\n", "2405.14169": "|**2024-05-23**|**Towards Transferable Attacks Against Vision-LLMs in Autonomous Driving with Typography**|Nhat Chung et.al.|[2405.14169v1](http://arxiv.org/abs/2405.14169v1)|null|\n", "2405.14092": "|**2024-05-23**|**Large Language Models Can Self-Correct with Minimal Effort**|Zhenyu Wu et.al.|[2405.14092v1](http://arxiv.org/abs/2405.14092v1)|null|\n", "2405.14075": "|**2024-05-23**|**$T^2$ of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models**|Chengkun Cai et.al.|[2405.14075v1](http://arxiv.org/abs/2405.14075v1)|null|\n", "2405.13966": "|**2024-05-22**|**On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models**|Mudit Verma et.al.|[2405.13966v1](http://arxiv.org/abs/2405.13966v1)|null|\n", "2405.13949": "|**2024-05-22**|**PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery**|Runlong He et.al.|[2405.13949v1](http://arxiv.org/abs/2405.13949v1)|**[link](https://github.com/mobarakol/pitvqa)**|\n", "2405.13873": "|**2024-05-22**|**FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering**|Yuan Sui et.al.|[2405.13873v1](http://arxiv.org/abs/2405.13873v1)|null|\n", "2405.13872": "|**2024-05-29**|**Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models**|Qiji Zhou et.al.|[2405.13872v2](http://arxiv.org/abs/2405.13872v2)|null|\n", "2405.13769": "|**2024-05-22**|**Do Language 
Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation**|Cyril Chhun et.al.|[2405.13769v1](http://arxiv.org/abs/2405.13769v1)|**[link](https://github.com/dig-team/hanna-benchmark-asg)**|\n", "2405.13547": "|**2024-05-22**|**HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model**|Mustafa Yildirim et.al.|[2405.13547v1](http://arxiv.org/abs/2405.13547v1)|null|\n", "2405.13516": "|**2024-05-22**|**LIRE: listwise reward enhancement for preference alignment**|Mingye Zhu et.al.|[2405.13516v1](http://arxiv.org/abs/2405.13516v1)|null|\n", "2405.13448": "|**2024-05-22**|**Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning**|Yuanhao Yue et.al.|[2405.13448v1](http://arxiv.org/abs/2405.13448v1)|null|\n", "2405.13432": "|**2024-05-22**|**Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction**|Tingchen Fu et.al.|[2405.13432v1](http://arxiv.org/abs/2405.13432v1)|null|\n", "2405.13209": "|**2024-05-21**|**Investigating Symbolic Capabilities of Large Language Models**|Neisarg Dave et.al.|[2405.13209v1](http://arxiv.org/abs/2405.13209v1)|null|\n", "2405.13206": "|**2024-05-21**|**Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding**|Rong Gao et.al.|[2405.13206v1](http://arxiv.org/abs/2405.13206v1)|null|\n", "2405.13057": "|**2024-05-20**|**Can Github issues be solved with Tree Of Thoughts?**|Ricardo La Rosa et.al.|[2405.13057v1](http://arxiv.org/abs/2405.13057v1)|**[link](https://github.com/ricardo-larosa/tree-of-thought-llm)**|\n", "2405.13039": "|**2024-05-17**|**Surgical Feature-Space Decomposition of LLMs: Why, When and How?**|Arnav Chavan et.al.|[2405.13039v1](http://arxiv.org/abs/2405.13039v1)|null|\n", "2405.13036": "|**2024-05-16**|**Can formal argumentative reasoning enhance LLMs performances?**|Federico Castagna et.al.|[2405.13036v1](http://arxiv.org/abs/2405.13036v1)|null|\n", 
"2405.13021": "|**2024-05-15**|**IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues**|Diji Yang et.al.|[2405.13021v1](http://arxiv.org/abs/2405.13021v1)|null|\n", "2405.13014": "|**2024-05-14**|**QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models**|Wei Wang et.al.|[2405.13014v1](http://arxiv.org/abs/2405.13014v1)|null|\n", "2405.13004": "|**2024-05-12**|**MathDivide: Improved mathematical reasoning by large language models**|Saksham Sahai Srivastava et.al.|[2405.13004v1](http://arxiv.org/abs/2405.13004v1)|null|\n", "2405.15684": "|**2024-05-24**|**Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models**|Yue Zhang et.al.|[2405.15684v1](http://arxiv.org/abs/2405.15684v1)|null|\n", "2405.15638": "|**2024-05-24**|**M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models**|Hongyu Wang et.al.|[2405.15638v1](http://arxiv.org/abs/2405.15638v1)|**[link](https://github.com/m4u-benchmark/m4u)**|\n", "2405.15604": "|**2024-05-24**|**Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges**|Jonas Becker et.al.|[2405.15604v1](http://arxiv.org/abs/2405.15604v1)|**[link](https://github.com/jonas-becker/text-generation)**|\n", "2405.15307": "|**2024-05-24**|**Before Generation, Align it! 
A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation**|Ge Qu et.al.|[2405.15307v1](http://arxiv.org/abs/2405.15307v1)|**[link](https://github.com/quge2023/TA-SQL)**|\n", "2405.15302": "|**2024-05-24**|**Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation**|Zhiwei Wang et.al.|[2405.15302v1](http://arxiv.org/abs/2405.15302v1)|null|\n", "2405.15250": "|**2024-05-24**|**Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth**|Riku Arakawa et.al.|[2405.15250v1](http://arxiv.org/abs/2405.15250v1)|null|\n", "2405.15165": "|**2024-05-24**|**A Solution-based LLM API-using Methodology for Academic Information Seeking**|Yuanchun Wang et.al.|[2405.15165v1](http://arxiv.org/abs/2405.15165v1)|**[link](https://github.com/ruckbreasoning/soay)**|\n", "2405.15164": "|**2024-05-24**|**From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks**|Jacob Russin et.al.|[2405.15164v1](http://arxiv.org/abs/2405.15164v1)|null|\n", "2405.15130": "|**2024-05-24**|**OptLLM: Optimal Assignment of Queries to Large Language Models**|Yueyue Liu et.al.|[2405.15130v1](http://arxiv.org/abs/2405.15130v1)|**[link](https://github.com/superyue72/OptLLM)**|\n", "2405.15114": "|**2024-05-24**|**Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning**|Yuyue Zhao et.al.|[2405.15114v1](http://arxiv.org/abs/2405.15114v1)|null|\n", "2405.15092": "|**2024-05-23**|**Dissociation of Faithful and Unfaithful Reasoning in LLMs**|Evelyn Yee et.al.|[2405.15092v1](http://arxiv.org/abs/2405.15092v1)|**[link](https://github.com/coterrorrecovery/coterrorrecovery)**|\n", "2405.15025": "|**2024-05-23**|**OAC: Output-adaptive Calibration for Accurate Post-training Quantization**|Ali Edalati et.al.|[2405.15025v1](http://arxiv.org/abs/2405.15025v1)|null|\n", "2405.15019": "|**2024-05-23**|**Agentic Skill Discovery**|Xufeng 
Zhao et.al.|[2405.15019v1](http://arxiv.org/abs/2405.15019v1)|null|\n", "2405.17430": "|**2024-05-27**|**Matryoshka Multimodal Models**|Mu Cai et.al.|[2405.17430v1](http://arxiv.org/abs/2405.17430v1)|null|\n", "2405.17427": "|**2024-05-27**|**Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model**|Kuan-Chih Huang et.al.|[2405.17427v1](http://arxiv.org/abs/2405.17427v1)|**[link](https://github.com/kuanchihhuang/reason3d)**|\n", "2405.17418": "|**2024-05-27**|**Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation**|Jiaming Liu et.al.|[2405.17418v1](http://arxiv.org/abs/2405.17418v1)|null|\n", "2405.17386": "|**2024-05-27**|**MindMerger: Efficient Boosting LLM Reasoning in non-English Languages**|Zixian Huang et.al.|[2405.17386v1](http://arxiv.org/abs/2405.17386v1)|**[link](https://github.com/cone-mt/mindmerger)**|\n", "2405.17249": "|**2024-05-27**|**Assessing LLMs Suitability for Knowledge Graph Completion**|Vasile Ionut Remus Iga et.al.|[2405.17249v1](http://arxiv.org/abs/2405.17249v1)|**[link](https://github.com/ionutiga/llms-for-kgc)**|\n", "2405.17238": "|**2024-05-27**|**LLM-Assisted Static Analysis for Detecting Security Vulnerabilities**|Ziyang Li et.al.|[2405.17238v1](http://arxiv.org/abs/2405.17238v1)|null|\n", "2405.17009": "|**2024-05-29**|**Position: Foundation Agents as the Paradigm Shift for Decision Making**|Xiaoqian Liu et.al.|[2405.17009v3](http://arxiv.org/abs/2405.17009v3)|**[link](https://github.com/microsoft/smart)**|\n", "2405.16806": "|**2024-05-28**|**Entity Alignment with Noisy Annotations from Large Language Models**|Shengyuan Chen et.al.|[2405.16806v2](http://arxiv.org/abs/2405.16806v2)|**[link](https://github.com/chensycn/llm4ea_official)**|\n", "2405.16803": "|**2024-05-27**|**TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing**|Xinyu Zhang et.al.|[2405.16803v1](http://arxiv.org/abs/2405.16803v1)|null|\n", "2405.16802": 
"|**2024-05-29**|**AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation**|Jianqiao Lu et.al.|[2405.16802v3](http://arxiv.org/abs/2405.16802v3)|**[link](https://github.com/rookie-joe/autocv)**|\n", "2405.16720": "|**2024-05-28**|**Large Scale Knowledge Washing**|Yu Wang et.al.|[2405.16720v2](http://arxiv.org/abs/2405.16720v2)|null|\n", "2405.16661": "|**2024-05-26**|**RLSF: Reinforcement Learning via Symbolic Feedback**|Piyush Jha et.al.|[2405.16661v1](http://arxiv.org/abs/2405.16661v1)|null|\n", "2405.16510": "|**2024-05-30**|**Meta-Task Planning for Language Agents**|Cong Zhang et.al.|[2405.16510v3](http://arxiv.org/abs/2405.16510v3)|null|\n", "2405.16473": "|**2024-05-26**|**M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought**|Qiguang Chen et.al.|[2405.16473v1](http://arxiv.org/abs/2405.16473v1)|**[link](https://github.com/LightChen233/M3CoT)**|\n", "2405.16450": "|**2024-05-26**|**Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search**|Max Liu et.al.|[2405.16450v1](http://arxiv.org/abs/2405.16450v1)|null|\n", "2405.16413": "|**2024-05-26**|**Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models**|Jiankun Wang et.al.|[2405.16413v1](http://arxiv.org/abs/2405.16413v1)|null|\n", "2405.16406": "|**2024-05-28**|**SpinQuant: LLM quantization with learned rotations**|Zechun Liu et.al.|[2405.16406v2](http://arxiv.org/abs/2405.16406v2)|null|\n", "2405.16376": "|**2024-05-28**|**STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making**|Chuanhao Li et.al.|[2405.16376v2](http://arxiv.org/abs/2405.16376v2)|**[link](https://github.com/cyrilli/stride)**|\n", "2405.16277": "|**2024-06-03**|**Picturing Ambiguity: A Visual Twist on the Winograd Schema Challenge**|Brendan Park 
et.al.|[2405.16277v3](http://arxiv.org/abs/2405.16277v3)|**[link](https://github.com/bpark2/winovis)**|\n", "2405.16265": "|**2024-05-25**|**MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time**|Jikun Kang et.al.|[2405.16265v1](http://arxiv.org/abs/2405.16265v1)|null|\n", "2405.16127": "|**2024-05-25**|**Finetuning Large Language Model for Personalized Ranking**|Zhuoxi Bai et.al.|[2405.16127v1](http://arxiv.org/abs/2405.16127v1)|null|\n", "2405.16064": "|**2024-05-25**|**Keypoint-based Progressive Chain-of-Thought Distillation for LLMs**|Kaituo Feng et.al.|[2405.16064v1](http://arxiv.org/abs/2405.16064v1)|null|\n", "2405.16009": "|**2024-05-25**|**Streaming Long Video Understanding with Large Language Models**|Rui Qian et.al.|[2405.16009v1](http://arxiv.org/abs/2405.16009v1)|null|\n", "2405.15924": "|**2024-05-30**|**SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation**|Kun Zhao et.al.|[2405.15924v3](http://arxiv.org/abs/2405.15924v3)|**[link](https://github.com/hegehongcha/slide-acl2024)**|\n", "2405.15880": "|**2024-05-24**|**HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis**|Shraddha Barke et.al.|[2405.15880v1](http://arxiv.org/abs/2405.15880v1)|null|\n", "2405.15877": "|**2024-05-24**|**Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications**|Yang Li et.al.|[2405.15877v1](http://arxiv.org/abs/2405.15877v1)|null|\n", "2405.18414": "|**2024-05-28**|**Don't Forget to Connect! 
Improving RAG with Graph-based Reranking**|Jialin Dong et.al.|[2405.18414v1](http://arxiv.org/abs/2405.18414v1)|null|\n", "2405.18380": "|**2024-05-28**|**OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning**|Pengxiang Li et.al.|[2405.18380v1](http://arxiv.org/abs/2405.18380v1)|**[link](https://github.com/pixeli99/owlore)**|\n", "2405.18377": "|**2024-05-28**|**LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models**|Anthony Sarah et.al.|[2405.18377v1](http://arxiv.org/abs/2405.18377v1)|null|\n", "2405.18375": "|**2024-05-28**|**Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning**|Phakphum Artkaew et.al.|[2405.18375v1](http://arxiv.org/abs/2405.18375v1)|**[link](https://github.com/PhakphumAdev/Thai-Winograd)**|\n", "2405.18369": "|**2024-05-28**|**PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework**|Eshaan Agarwal et.al.|[2405.18369v1](http://arxiv.org/abs/2405.18369v1)|null|\n", "2405.18361": "|**2024-05-28**|**Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?**|Yifan Bai et.al.|[2405.18361v1](http://arxiv.org/abs/2405.18361v1)|null|\n", "2405.18358": "|**2024-05-28**|**MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning**|Somnath Kumar et.al.|[2405.18358v1](http://arxiv.org/abs/2405.18358v1)|null|\n", "2405.18357": "|**2024-05-28**|**Faithful Logical Reasoning via Symbolic Chain-of-Thought**|Jundong Xu et.al.|[2405.18357v1](http://arxiv.org/abs/2405.18357v1)|**[link](https://github.com/aiden0526/symbcot)**|\n", "2405.18292": "|**2024-05-28**|**Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning**|Renzhi Wang et.al.|[2405.18292v1](http://arxiv.org/abs/2405.18292v1)|null|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie 
et.al.|[2405.18208v1](http://arxiv.org/abs/2405.18208v1)|null|\n", "2405.18092": "|**2024-05-28**|**LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins**|Yuchen Xia et.al.|[2405.18092v1](http://arxiv.org/abs/2405.18092v1)|**[link](https://github.com/yuchenxia/llmdrivensimulation)**|\n", "2405.18073": "|**2024-05-28**|**Towards Dialogues for Joint Human-AI Reasoning and Value Alignment**|Elfia Bezou-Vrakatseli et.al.|[2405.18073v1](http://arxiv.org/abs/2405.18073v1)|null|\n", "2405.18027": "|**2024-05-28**|**TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models**|Jaewoo Ahn et.al.|[2405.18027v1](http://arxiv.org/abs/2405.18027v1)|null|\n", "2405.17969": "|**2024-05-28**|**Knowledge Circuits in Pretrained Transformers**|Yunzhi Yao et.al.|[2405.17969v1](http://arxiv.org/abs/2405.17969v1)|**[link](https://github.com/zjunlp/knowledgecircuits)**|\n", "2405.17950": "|**2024-05-28**|**Self-Guiding Exploration for Combinatorial Problems**|Zangir Iklassov et.al.|[2405.17950v1](http://arxiv.org/abs/2405.17950v1)|**[link](https://github.com/zangir/llm-for-cp)**|\n", "2405.17893": "|**2024-05-28**|**Arithmetic Reasoning with LLM: Prolog Generation & Permutation**|Xiaocheng Yang et.al.|[2405.17893v1](http://arxiv.org/abs/2405.17893v1)|null|\n", "2405.17822": "|**2024-05-28**|**Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action**|Zhenyu Pan et.al.|[2405.17822v1](http://arxiv.org/abs/2405.17822v1)|null|\n", "2405.17755": "|**2024-05-28**|**XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference**|Shengnan Wang et.al.|[2405.17755v1](http://arxiv.org/abs/2405.17755v1)|null|\n", "2405.17712": "|**2024-05-28**|**CLAIM Your Data: Enhancing Imputation Accuracy with Contextual Large Language Models**|Ahatsham Hayat 
et.al.|[2405.17712v1](http://arxiv.org/abs/2405.17712v1)|null|\n", "2405.17706": "|**2024-05-27**|**Video Enriched Retrieval Augmented Generation Using Aligned Video Captions**|Kevin Dela Rosa et.al.|[2405.17706v1](http://arxiv.org/abs/2405.17706v1)|**[link](https://github.com/kdr/videorag-mrr2024)**|\n", "2405.17631": "|**2024-05-27**|**BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments**|Yusuf Roohani et.al.|[2405.17631v1](http://arxiv.org/abs/2405.17631v1)|**[link](https://github.com/snap-stanford/biodiscoveryagent)**|\n", "2405.17503": "|**2024-05-30**|**Code Repair with LLMs gives an Exploration-Exploitation Tradeoff**|Hao Tang et.al.|[2405.17503v2](http://arxiv.org/abs/2405.17503v2)|null|\n", "2405.19335": "|**2024-05-29**|**X-VILA: Cross-Modality Alignment for Large Language Model**|Hanrong Ye et.al.|[2405.19335v1](http://arxiv.org/abs/2405.19335v1)|null|\n", "2405.19327": "|**2024-06-02**|**MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series**|Ge Zhang et.al.|[2405.19327v3](http://arxiv.org/abs/2405.19327v3)|**[link](https://github.com/multimodal-art-projection/map-neo)**|\n", "2405.19326": "|**2024-05-29**|**Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models**|Tianrun Chen et.al.|[2405.19326v1](http://arxiv.org/abs/2405.19326v1)|null|\n", "2405.19255": "|**2024-05-29**|**Towards Next-Generation Urban Decision Support Systems through AI-Powered Generation of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation**|Jose Tupayachi et.al.|[2405.19255v1](http://arxiv.org/abs/2405.19255v1)|null|\n", "2405.19209": "|**2024-05-29**|**VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos**|Ziyang Wang et.al.|[2405.19209v1](http://arxiv.org/abs/2405.19209v1)|**[link](https://github.com/Ziyang412/VideoTree)**|\n", "2405.19164": 
"|**2024-05-29**|**Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery**|Sounak Lahiri et.al.|[2405.19164v1](http://arxiv.org/abs/2405.19164v1)|null|\n", "2405.19109": "|**2024-05-29**|**PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering**|Fangzhi Xu et.al.|[2405.19109v1](http://arxiv.org/abs/2405.19109v1)|null|\n", "2405.19076": "|**2024-06-02**|**Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design**|Markus J. Buehler et.al.|[2405.19076v2](http://arxiv.org/abs/2405.19076v2)|**[link](https://github.com/lamm-mit/Cephalo)**|\n", "2405.18915": "|**2024-05-29**|**Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners**|Jiachun Li et.al.|[2405.18915v1](http://arxiv.org/abs/2405.18915v1)|null|\n", "2405.18870": "|**2024-05-31**|**LLMs achieve adult human performance on higher-order theory of mind tasks**|Winnie Street et.al.|[2405.18870v2](http://arxiv.org/abs/2405.18870v2)|null|\n", "2405.18732": "|**2024-06-02**|**Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts**|S. 
Mostafa Mousavi et.al.|[2405.18732v2](http://arxiv.org/abs/2405.18732v2)|null|\n", "2405.18718": "|**2024-05-29**|**Efficient Model-agnostic Alignment via Bayesian Persuasion**|Fengshuo Bai et.al.|[2405.18718v1](http://arxiv.org/abs/2405.18718v1)|null|\n", "2405.18711": "|**2024-05-29**|**Calibrating Reasoning in Language Models with Internal Consistency**|Zhihui Xie et.al.|[2405.18711v1](http://arxiv.org/abs/2405.18711v1)|null|\n", "2405.18641": "|**2024-05-30**|**Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning**|Tiansheng Huang et.al.|[2405.18641v2](http://arxiv.org/abs/2405.18641v2)|**[link](https://github.com/git-disl/lisa)**|\n", "2405.20340": "|**2024-05-30**|**MotionLLM: Understanding Human Behaviors from Human Motions and Videos**|Ling-Hao Chen et.al.|[2405.20340v1](http://arxiv.org/abs/2405.20340v1)|null|\n", "2405.20192": "|**2024-05-30**|**TAIA: Large Language Models are Out-of-Distribution Data Learners**|Shuyang Jiang et.al.|[2405.20192v1](http://arxiv.org/abs/2405.20192v1)|**[link](https://github.com/pixas/TAIA_LLM)**|\n", "2405.20189": "|**2024-05-30**|**Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory**|Hangyeol Kang et.al.|[2405.20189v1](http://arxiv.org/abs/2405.20189v1)|null|\n", "2405.20163": "|**2024-05-30**|**Reasoning about concepts with LLMs: Inconsistencies abound**|Rosario Uceda-Sosa et.al.|[2405.20163v1](http://arxiv.org/abs/2405.20163v1)|null|\n", "2405.20139": "|**2024-05-30**|**GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning**|Costas Mavromatis et.al.|[2405.20139v1](http://arxiv.org/abs/2405.20139v1)|**[link](https://github.com/cmavro/gnn-rag)**|\n", "2405.19842": "|**2024-05-30**|**Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation**|Chengwei Dai et.al.|[2405.19842v1](http://arxiv.org/abs/2405.19842v1)|**[link](https://github.com/c-w-d/cascod)**|\n", "2405.19773": "|**2024-05-30**|**VQA Training Sets 
are Self-play Environments for Generating Few-shot Pools**|Tautvydas Misiunas et.al.|[2405.19773v1](http://arxiv.org/abs/2405.19773v1)|null|\n", "2405.19737": "|**2024-05-30**|**Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation**|Chengwei Dai et.al.|[2405.19737v1](http://arxiv.org/abs/2405.19737v1)|**[link](https://github.com/c-w-d/edit)**|\n", "2405.19716": "|**2024-05-30**|**Enhancing Large Vision Language Models with Self-Training on Image Comprehension**|Yihe Deng et.al.|[2405.19716v1](http://arxiv.org/abs/2405.19716v1)|null|\n", "2405.19668": "|**2024-05-30**|**AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization**|Jiawei Chen et.al.|[2405.19668v1](http://arxiv.org/abs/2405.19668v1)|null|\n", "2405.19616": "|**2024-06-01**|**Easy Problems That LLMs Get Wrong**|Sean Williams et.al.|[2405.19616v2](http://arxiv.org/abs/2405.19616v2)|**[link](https://github.com/autogenai/easy-problems-that-llms-get-wrong)**|\n", "2405.19578": "|**2024-05-30**|**The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models**|Denish Omondi Otieno et.al.|[2405.19578v1](http://arxiv.org/abs/2405.19578v1)|null|\n", "2405.19561": "|**2024-05-29**|**Quo Vadis ChatGPT? 
From Large Language Models to Large Knowledge Models**|Venkat Venkatasubramanian et.al.|[2405.19561v1](http://arxiv.org/abs/2405.19561v1)|null|\n", "2405.19444": "|**2024-05-29**|**MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions**|Zhenwen Liang et.al.|[2405.19444v1](http://arxiv.org/abs/2405.19444v1)|**[link](https://github.com/zhenwen-nlp/mathchat)**|\n", "2405.20978": "|**2024-05-31**|**Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training**|Feiteng Fang et.al.|[2405.20978v1](http://arxiv.org/abs/2405.20978v1)|null|\n", "2405.20974": "|**2024-06-05**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu et.al.|[2405.20974v2](http://arxiv.org/abs/2405.20974v2)|**[link](https://github.com/xu1868/sayself)**|\n", "2405.20962": "|**2024-06-03**|**Large Language Models are Zero-Shot Next Location Predictors**|Ciro Beneduce et.al.|[2405.20962v2](http://arxiv.org/abs/2405.20962v2)|**[link](https://github.com/ssai-trento/llm-zero-shot-nl)**|\n", "2405.20902": "|**2024-05-31**|**Preemptive Answer \"Attacks\" on Chain-of-Thought Reasoning**|Rongwu Xu et.al.|[2405.20902v1](http://arxiv.org/abs/2405.20902v1)|null|\n", "2405.20834": "|**2024-05-31**|**Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning**|Cheng Tan et.al.|[2405.20834v1](http://arxiv.org/abs/2405.20834v1)|null|\n", "2405.20625": "|**2024-05-31**|**Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning**|Atharva Gundawar et.al.|[2405.20625v1](http://arxiv.org/abs/2405.20625v1)|null|\n", "2405.20535": "|**2024-05-30**|**Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning**|Xinlu Zhang et.al.|[2405.20535v1](http://arxiv.org/abs/2405.20535v1)|null|\n", "2405.20441": "|**2024-05-30**|**SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory**|Dipkamal Bhusal 
et.al.|[2405.20441v1](http://arxiv.org/abs/2405.20441v1)|null|\n", "2405.20774": "|**2024-05-27**|**Exploring Backdoor Attacks against Large Language Model-based Decision Making**|Ruochen Jiao et.al.|[2405.20774v1](http://arxiv.org/abs/2405.20774v1)|null|\n", "2406.02394": "|**2024-06-04**|**Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data**|Maxime Griot et.al.|[2406.02394v1](http://arxiv.org/abs/2406.02394v1)|**[link](https://github.com/maximegmd/glianorex-gen)**|\n", "2406.02356": "|**2024-06-04**|**Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks**|Andrew Gambardella et.al.|[2406.02356v1](http://arxiv.org/abs/2406.02356v1)|null|\n", "2406.02301": "|**2024-06-04**|**mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models**|Huiyuan Lai et.al.|[2406.02301v1](http://arxiv.org/abs/2406.02301v1)|**[link](https://github.com/laihuiyuan/mcot)**|\n", "2406.02128": "|**2024-06-04**|**Iteration Head: A Mechanistic Study of Chain-of-Thought**|Vivien Cabannes et.al.|[2406.02128v1](http://arxiv.org/abs/2406.02128v1)|null|\n", "2406.02106": "|**2024-06-04**|**MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset**|Weiqi Wang et.al.|[2406.02106v1](http://arxiv.org/abs/2406.02106v1)|**[link](https://github.com/hkust-knowcomp/mars)**|\n", "2406.02100": "|**2024-06-04**|**Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data**|Haolong Li et.al.|[2406.02100v1](http://arxiv.org/abs/2406.02100v1)|null|\n", "2406.02061": "|**2024-06-05**|**Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models**|Marianna Nezhurina et.al.|[2406.02061v2](http://arxiv.org/abs/2406.02061v2)|**[link](https://github.com/laion-ai/aiw)**|\n", "2406.02030": "|**2024-06-05**|**Multimodal Reasoning with Multimodal Knowledge Graph**|Junlin Lee 
et.al.|[2406.02030v2](http://arxiv.org/abs/2406.02030v2)|null|\n", "2406.02018": "|**2024-06-04**|**Why Would You Suggest That? Human Trust in Language Model Responses**|Manasi Sharma et.al.|[2406.02018v1](http://arxiv.org/abs/2406.02018v1)|null|\n", "2406.01940": "|**2024-06-04**|**Process-Driven Autoformalization in Lean 4**|Jianqiao Lu et.al.|[2406.01940v1](http://arxiv.org/abs/2406.01940v1)|**[link](https://github.com/rookie-joe/pda)**|\n", "2406.01587": "|**2024-06-04**|**PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning**|Yupeng Zheng et.al.|[2406.01587v2](http://arxiv.org/abs/2406.01587v2)|null|\n", "2406.01563": "|**2024-06-03**|**LoFiT: Localized Fine-tuning on LLM Representations**|Fangcong Yin et.al.|[2406.01563v1](http://arxiv.org/abs/2406.01563v1)|**[link](https://github.com/fc2869/lo-fit)**|\n", "2406.01311": "|**2024-06-03**|**FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs**|Sushant Gautam et.al.|[2406.01311v1](http://arxiv.org/abs/2406.01311v1)|null|\n", "2406.01238": "|**2024-06-03**|**EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs**|Zixuan Dong et.al.|[2406.01238v1](http://arxiv.org/abs/2406.01238v1)|null|\n", "2406.01145": "|**2024-06-03**|**Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph**|Guangyi Liu et.al.|[2406.01145v1](http://arxiv.org/abs/2406.01145v1)|null|\n", "2406.01006": "|**2024-06-03**|**SemCoder: Training Code Language Models with Comprehensive Semantics**|Yangruibo Ding et.al.|[2406.01006v1](http://arxiv.org/abs/2406.01006v1)|null|\n", "2406.00965": "|**2024-06-04**|**Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic**|Xinglin Chen et.al.|[2406.00965v2](http://arxiv.org/abs/2406.00965v2)|null|\n", "2406.00922": "|**2024-06-04**|**MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning**|Shuyue 
Stella Li et.al.|[2406.00922v2](http://arxiv.org/abs/2406.00922v2)|**[link](https://github.com/stellali7/mediq)**|\n", "2406.00894": "|**2024-06-02**|**Pretrained Hybrids with MAD Skills**|Nicholas Roberts et.al.|[2406.00894v1](http://arxiv.org/abs/2406.00894v1)|null|\n", "2406.00872": "|**2024-06-02**|**OLIVE: Object Level In-Context Visual Embeddings**|Timothy Ossowski et.al.|[2406.00872v1](http://arxiv.org/abs/2406.00872v1)|**[link](https://github.com/tossowski/OLIVE)**|\n", "2406.00806": "|**2024-06-02**|**Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection**|Chentao Cao et.al.|[2406.00806v1](http://arxiv.org/abs/2406.00806v1)|null|\n", "2406.00755": "|**2024-06-02**|**Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction**|Xiaoyuan Li et.al.|[2406.00755v1](http://arxiv.org/abs/2406.00755v1)|**[link](https://github.com/littlecirc1e/eic)**|\n", "2406.00451": "|**2024-06-01**|**Task Planning for Object Rearrangement in Multi-room Environments**|Karan Mirakhor et.al.|[2406.00451v1](http://arxiv.org/abs/2406.00451v1)|null|\n", "2406.00430": "|**2024-06-01**|**Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners**|Zhi Zheng et.al.|[2406.00430v1](http://arxiv.org/abs/2406.00430v1)|null|\n", "2406.00284": "|**2024-06-01**|**A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters**|Long Hei Matthew Lam et.al.|[2406.00284v1](http://arxiv.org/abs/2406.00284v1)|**[link](https://github.com/mattylam/logic_symbolic_solvers_experiment)**|\n", "2406.00257": "|**2024-06-01**|**Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? 
An Extensive Investigation into the Capabilities and Limitations of LVLMs**|Mohammed Saidul Islam et.al.|[2406.00257v1](http://arxiv.org/abs/2406.00257v1)|null|\n", "2406.00252": "|**2024-06-05**|**Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey**|Bowen Jiang et.al.|[2406.00252v2](http://arxiv.org/abs/2406.00252v2)|**[link](https://github.com/bowen-upenn/mmma_rationality)**|\n", "2406.00222": "|**2024-05-31**|**Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training**|Maximillian Chen et.al.|[2406.00222v1](http://arxiv.org/abs/2406.00222v1)|null|\n", "2406.00179": "|**2024-05-31**|**Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation**|Bernd Bohnet et.al.|[2406.00179v1](http://arxiv.org/abs/2406.00179v1)|null|\n", "2406.00132": "|**2024-05-31**|**QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation**|Zhuo Chen et.al.|[2406.00132v1](http://arxiv.org/abs/2406.00132v1)|null|\n", "2406.00115": "|**2024-05-31**|**Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction**|Hanxian Huang et.al.|[2406.00115v1](http://arxiv.org/abs/2406.00115v1)|null|\n", "2406.03474": "|**2024-06-05**|**AD-H: Autonomous Driving with Hierarchical Agents**|Zaibin Zhang et.al.|[2406.03474v1](http://arxiv.org/abs/2406.03474v1)|null|\n", "2406.03445": "|**2024-06-05**|**Pre-trained Large Language Models Use Fourier Features to Compute Addition**|Tianyi Zhou et.al.|[2406.03445v1](http://arxiv.org/abs/2406.03445v1)|null|\n", "2406.03368": "|**2024-06-05**|**IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models**|David Ifeoluwa Adelani et.al.|[2406.03368v1](http://arxiv.org/abs/2406.03368v1)|null|\n", "2406.03367": "|**2024-06-05**|**CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning**|Xinrui Lin et.al.|[2406.03367v1](http://arxiv.org/abs/2406.03367v1)|null|\n", 
"2406.03248": "|**2024-06-06**|**Large Language Models as Evaluators for Recommendation Explanations**|Xiaoyu Zhang et.al.|[2406.03248v2](http://arxiv.org/abs/2406.03248v2)|**[link](https://github.com/xiaoyu-sz/llmasevaluator)**|\n", "2406.03181": "|**2024-06-05**|**Missci: Reconstructing Fallacies in Misrepresented Science**|Max Glockner et.al.|[2406.03181v1](http://arxiv.org/abs/2406.03181v1)|**[link](https://github.com/UKPLab/acl2024-missci)**|\n", "2406.03085": "|**2024-06-05**|**Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation**|Tingjia Shen et.al.|[2406.03085v1](http://arxiv.org/abs/2406.03085v1)|null|\n", "2406.03068": "|**2024-06-05**|**How Truncating Weights Improves Reasoning in Language Models**|Lei Chen et.al.|[2406.03068v1](http://arxiv.org/abs/2406.03068v1)|null|\n", "2406.03003": "|**2024-06-05**|**Verified Code Transpilation with LLMs**|Sahil Bhatia et.al.|[2406.03003v1](http://arxiv.org/abs/2406.03003v1)|null|\n", "2406.02864": "|**2024-06-05**|**NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models**|Ancheng Xu et.al.|[2406.02864v1](http://arxiv.org/abs/2406.02864v1)|**[link](https://github.com/cas-siat-consistencyai/numcot)**|\n", "2406.02863": "|**2024-06-05**|**LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation**|Yi-Pei Chen et.al.|[2406.02863v1](http://arxiv.org/abs/2406.02863v1)|null|\n", "2406.02844": "|**2024-06-05**|**Item-Language Model for Conversational Recommendation**|Li Yang et.al.|[2406.02844v1](http://arxiv.org/abs/2406.02844v1)|null|\n", "2406.02818": "|**2024-06-04**|**Chain of Agents: Large Language Models Collaborating on Long-Context Tasks**|Yusen Zhang et.al.|[2406.02818v1](http://arxiv.org/abs/2406.02818v1)|null|\n", "2406.02804": "|**2024-06-04**|**$\\texttt{ACCORD}$: Closing the Commonsense Measurability Gap**|Fran\u00e7ois Roewer-Despr\u00e9s 
et.al.|[2406.02804v1](http://arxiv.org/abs/2406.02804v1)|**[link](https://github.com/francois-rd/accord)**|\n", "2406.02787": "|**2024-06-04**|**Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities**|Wenyue Hua et.al.|[2406.02787v1](http://arxiv.org/abs/2406.02787v1)|null|\n", "2406.02764": "|**2024-06-04**|**Adaptive Preference Scaling for Reinforcement Learning with Human Feedback**|Ilgee Hong et.al.|[2406.02764v1](http://arxiv.org/abs/2406.02764v1)|null|\n", "2406.02746": "|**2024-06-09**|**RATT: A Thought Structure for Coherent and Correct LLM Reasoning**|Jinghan Zhang et.al.|[2406.02746v2](http://arxiv.org/abs/2406.02746v2)|null|\n", "2406.02721": "|**2024-06-04**|**Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller**|Min Cai et.al.|[2406.02721v1](http://arxiv.org/abs/2406.02721v1)|**[link](https://github.com/henrycai11/llm-self-control)**|\n", "2406.04339": "|**2024-06-06**|**RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**|Jiaming Liu et.al.|[2406.04339v1](http://arxiv.org/abs/2406.04339v1)|null|\n", "2406.04300": "|**2024-06-06**|**Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models**|Phat Nguyen et.al.|[2406.04300v1](http://arxiv.org/abs/2406.04300v1)|null|\n", "2406.04276": "|**2024-06-06**|**Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks**|Han Zhang et.al.|[2406.04276v1](http://arxiv.org/abs/2406.04276v1)|null|\n", "2406.04271": "|**2024-06-06**|**Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models**|Ling Yang et.al.|[2406.04271v1](http://arxiv.org/abs/2406.04271v1)|**[link](https://github.com/yangling0818/buffer-of-thought-llm)**|\n", "2406.04197": "|**2024-06-06**|**DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning**|Shangqing Tu 
et.al.|[2406.04197v1](http://arxiv.org/abs/2406.04197v1)|**[link](https://github.com/thu-keg/dice)**|\n", "2406.04046": "|**2024-06-06**|**ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints**|Divij Handa et.al.|[2406.04046v1](http://arxiv.org/abs/2406.04046v1)|null|\n", "2406.04031": "|**2024-06-06**|**Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt**|Zonghao Ying et.al.|[2406.04031v1](http://arxiv.org/abs/2406.04031v1)|**[link](https://github.com/NY1024/BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt)**|\n", "2406.03843": "|**2024-06-14**|**POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models**|Jianben He et.al.|[2406.03843v2](http://arxiv.org/abs/2406.03843v2)|null|\n", "2406.03807": "|**2024-06-06**|**Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering**|Yanming Liu et.al.|[2406.03807v1](http://arxiv.org/abs/2406.03807v1)|**[link](https://github.com/OceannTwT/Tool-Planner)**|\n", "2406.03768": "|**2024-06-06**|**Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective**|Xinhao Yao et.al.|[2406.03768v1](http://arxiv.org/abs/2406.03768v1)|**[link](https://github.com/chen123ctrls/enhancingicl_svdpruning)**|\n", "2406.03753": "|**2024-06-06**|**VisLTR: Visualization-in-the-Loop Table Reasoning**|Jianing Hao et.al.|[2406.03753v1](http://arxiv.org/abs/2406.03753v1)|null|\n", "2406.03712": "|**2024-06-06**|**A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions**|Lei Liu et.al.|[2406.03712v1](http://arxiv.org/abs/2406.03712v1)|null|\n", "2406.03689": "|**2024-06-06**|**Evaluating the World Model Implicit in a Generative Model**|Keyon Vafa et.al.|[2406.03689v1](http://arxiv.org/abs/2406.03689v1)|**[link](https://github.com/keyonvafa/world-model-evaluation)**|\n", "2406.03618": "|**2024-06-05**|**TACT: Advancing 
Complex Aggregative Reasoning with Information Extraction Tools**|Avi Caciularu et.al.|[2406.03618v1](http://arxiv.org/abs/2406.03618v1)|null|\n", "2406.05055": "|**2024-06-07**|**Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions**|Shi-Yu Tian et.al.|[2406.05055v1](http://arxiv.org/abs/2406.05055v1)|null|\n", "2406.04952": "|**2024-06-07**|**Quantifying Geospatial in the Common Crawl Corpus**|Ilya Ilyankou et.al.|[2406.04952v1](http://arxiv.org/abs/2406.04952v1)|null|\n", "2406.04926": "|**2024-06-07**|**Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models**|Micha\u0142 Romaszewski et.al.|[2406.04926v1](http://arxiv.org/abs/2406.04926v1)|null|\n", "2406.04866": "|**2024-06-07**|**ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering**|Raphael Gruber et.al.|[2406.04866v1](http://arxiv.org/abs/2406.04866v1)|**[link](https://github.com/datascienceuibk/complextempqa)**|\n", "2406.04817": "|**2024-06-07**|**Experiences from Integrating Large Language Model Chatbots into the Classroom**|Arto Hellas et.al.|[2406.04817v1](http://arxiv.org/abs/2406.04817v1)|null|\n", "2406.04800": "|**2024-06-07**|**Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in Large Language Models**|Weizhi Tang et.al.|[2406.04800v1](http://arxiv.org/abs/2406.04800v1)|null|\n", "2406.04758": "|**2024-06-07**|**Think out Loud: Emotion Deducing Explanation in Dialogues**|Jiangnan Li et.al.|[2406.04758v1](http://arxiv.org/abs/2406.04758v1)|null|\n", "2406.04687": "|**2024-06-07**|**LogiCode: an LLM-Driven Framework for Logical Anomaly Detection**|Yiheng Zhang et.al.|[2406.04687v1](http://arxiv.org/abs/2406.04687v1)|**[link](https://github.com/22strongestme/LOCO-Annotations)**|\n", "2406.04659": "|**2024-06-07**|**LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model**|Dongkai Wang 
et.al.|[2406.04659v1](http://arxiv.org/abs/2406.04659v1)|**[link](https://github.com/kennethwdk/LocLLM)**|\n", "2406.04640": "|**2024-06-07**|**LinkGPT: Teaching Large Language Models To Predict Missing Links**|Zhongmou He et.al.|[2406.04640v1](http://arxiv.org/abs/2406.04640v1)|null|\n", "2406.04615": "|**2024-06-07**|**What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models**|Enis Berk \u00c7oban et.al.|[2406.04615v1](http://arxiv.org/abs/2406.04615v1)|null|\n", "2406.04568": "|**2024-06-07**|**StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation**|Weike Fang et.al.|[2406.04568v1](http://arxiv.org/abs/2406.04568v1)|null|\n", "2406.04566": "|**2024-06-07**|**SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models**|Md Imbesat Hassan Rizvi et.al.|[2406.04566v1](http://arxiv.org/abs/2406.04566v1)|**[link](https://github.com/ukplab/acl2024-sparc-and-sparp)**|\n", "2406.04501": "|**2024-06-06**|**FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models**|Max Zhu et.al.|[2406.04501v1](http://arxiv.org/abs/2406.04501v1)|null|\n", "2406.04496": "|**2024-06-06**|**Time Sensitive Knowledge Editing through Efficient Finetuning**|Xiou Ge et.al.|[2406.04496v1](http://arxiv.org/abs/2406.04496v1)|null|\n", "2406.04464": "|**2024-06-06**|**On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing**|Alexander Kovrigin et.al.|[2406.04464v1](http://arxiv.org/abs/2406.04464v1)|**[link](https://github.com/jetbrains-research/ai-agents-code-editing)**|\n", "2406.04449": "|**2024-06-06**|**MAIRA-2: Grounded Radiology Report Generation**|Shruthi Bannur et.al.|[2406.04449v1](http://arxiv.org/abs/2406.04449v1)|null|\n", "2406.04428": "|**2024-06-06**|**MoralBench: Moral Evaluation of LLMs**|Jianchao Ji 
et.al.|[2406.04428v1](http://arxiv.org/abs/2406.04428v1)|**[link](https://github.com/agiresearch/moralbench)**|\n", "2406.06474": "|**2024-06-10**|**Towards a Personal Health Large Language Model**|Justin Cosentino et.al.|[2406.06474v1](http://arxiv.org/abs/2406.06474v1)|null|\n", "2406.06464": "|**2024-06-11**|**Transforming Wearable Data into Health Insights using Large Language Model Agents**|Mike A. Merrill et.al.|[2406.06464v2](http://arxiv.org/abs/2406.06464v2)|null|\n", "2406.06461": "|**2024-06-15**|**Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies**|Junlin Wang et.al.|[2406.06461v3](http://arxiv.org/abs/2406.06461v3)|null|\n", "2406.06326": "|**2024-06-15**|**Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching**|Xiaoying Zhang et.al.|[2406.06326v3](http://arxiv.org/abs/2406.06326v3)|null|\n", "2406.06196": "|**2024-06-11**|**LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages**|Andrew M. 
Bean et.al.|[2406.06196v2](http://arxiv.org/abs/2406.06196v2)|**[link](https://github.com/am-bean/lingOly)**|\n", "2406.06124": "|**2024-06-10**|**Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation**|Aadharsh Aadhithya A et.al.|[2406.06124v1](http://arxiv.org/abs/2406.06124v1)|null|\n", "2406.05968": "|**2024-06-10**|**Prompting Large Language Models with Audio for General-Purpose Speech Summarization**|Wonjune Kang et.al.|[2406.05968v1](http://arxiv.org/abs/2406.05968v1)|**[link](https://github.com/wonjune-kang/llm-speech-summarization)**|\n", "2406.05967": "|**2024-06-10**|**CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark**|David Romero et.al.|[2406.05967v1](http://arxiv.org/abs/2406.05967v1)|null|\n", "2406.05948": "|**2024-06-10**|**Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models**|Xi Li et.al.|[2406.05948v1](http://arxiv.org/abs/2406.05948v1)|null|\n", "2406.05925": "|**2024-06-09**|**Hello Again! 
LLM-powered Personalized Agent for Long-term Dialogue**|Hao Li et.al.|[2406.05925v1](http://arxiv.org/abs/2406.05925v1)|**[link](https://github.com/leolee99/ld-agent)**|\n", "2406.05918": "|**2024-06-09**|**Why Don't Prompt-Based Fairness Metrics Correlate?**|Abdelrahman Zayed et.al.|[2406.05918v1](http://arxiv.org/abs/2406.05918v1)|null|\n", "2406.05881": "|**2024-06-09**|**LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning**|Utsav Singh et.al.|[2406.05881v1](http://arxiv.org/abs/2406.05881v1)|null|\n", "2406.05804": "|**2024-06-09**|**A Survey on LLM-Based Agentic Workflows and LLM-Profiled Components**|Xinzhe Li et.al.|[2406.05804v1](http://arxiv.org/abs/2406.05804v1)|null|\n", "2406.05673": "|**2024-06-09**|**Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking**|Fangxu Yu et.al.|[2406.05673v1](http://arxiv.org/abs/2406.05673v1)|**[link](https://github.com/yu-fangxu/for)**|\n", "2406.05659": "|**2024-06-09**|**Do LLMs Exhibit Human-Like Reasoning? 
Evaluating Theory of Mind in LLMs for Open-Ended Responses**|Maryam Amirizaniani et.al.|[2406.05659v1](http://arxiv.org/abs/2406.05659v1)|null|\n", "2406.05516": "|**2024-06-08**|**Verbalized Probabilistic Graphical Modeling with Large Language Models**|Hengguan Huang et.al.|[2406.05516v1](http://arxiv.org/abs/2406.05516v1)|null|\n", "2406.05506": "|**2024-06-08**|**Towards a Benchmark for Causal Business Process Reasoning with LLMs**|Fabiana Fournier et.al.|[2406.05506v1](http://arxiv.org/abs/2406.05506v1)|null|\n", "2406.05494": "|**2024-06-08**|**Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation**|Neeraj Varshney et.al.|[2406.05494v1](http://arxiv.org/abs/2406.05494v1)|null|\n", "2406.05322": "|**2024-06-08**|**Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios**|Yuhang Zhou et.al.|[2406.05322v1](http://arxiv.org/abs/2406.05322v1)|null|\n", "2406.05194": "|**2024-06-07**|**LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs**|Arash Gholami Davoodi et.al.|[2406.05194v1](http://arxiv.org/abs/2406.05194v1)|**[link](https://github.com/arashgholami/MaTT)**|\n", "2406.07528": "|**2024-06-11**|**QuickLLaMA: Query-aware Inference Acceleration for Large Language Models**|Jingyao Li et.al.|[2406.07528v1](http://arxiv.org/abs/2406.07528v1)|**[link](https://github.com/dvlab-research/q-llm)**|\n", "2406.07496": "|**2024-06-11**|**TextGrad: Automatic \"Differentiation\" via Text**|Mert Yuksekgonul et.al.|[2406.07496v1](http://arxiv.org/abs/2406.07496v1)|**[link](https://github.com/zou-group/textgrad)**|\n", "2406.07476": "|**2024-06-17**|**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**|Zesen Cheng et.al.|[2406.07476v2](http://arxiv.org/abs/2406.07476v2)|**[link](https://github.com/damo-nlp-sg/videollama2)**|\n", "2406.07444": "|**2024-06-11**|**On the Robustness of 
Document-Level Relation Extraction Models to Entity Name Variations**|Shiao Meng et.al.|[2406.07444v1](http://arxiv.org/abs/2406.07444v1)|**[link](https://github.com/THU-BPM/Env-DocRE)**|\n", "2406.07394": "|**2024-06-13**|**Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B**|Di Zhang et.al.|[2406.07394v2](http://arxiv.org/abs/2406.07394v2)|**[link](https://github.com/trotsky1997/mathblackbox)**|\n", "2406.07393": "|**2024-06-11**|**Limited Out-of-Context Knowledge Reasoning in Large Language Models**|Peng Hu et.al.|[2406.07393v1](http://arxiv.org/abs/2406.07393v1)|null|\n", "2406.07378": "|**2024-06-11**|**Large Language Models for Constrained-Based Causal Discovery**|Kai-Hendrik Cohrs et.al.|[2406.07378v1](http://arxiv.org/abs/2406.07378v1)|**[link](https://github.com/ipl-uv/causal_gpt)**|\n", "2406.07353": "|**2024-06-11**|**Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities**|Delfina Sol Martinez Pandiani et.al.|[2406.07353v1](http://arxiv.org/abs/2406.07353v1)|null|\n", "2406.07296": "|**2024-06-11**|**Instruct Large Language Models to Drive like Humans**|Ruijun Zhang et.al.|[2406.07296v1](http://arxiv.org/abs/2406.07296v1)|**[link](https://github.com/bonbon-rj/instructdriver)**|\n", "2406.07230": "|**2024-06-11**|**Needle In A Multimodal Haystack**|Weiyun Wang et.al.|[2406.07230v1](http://arxiv.org/abs/2406.07230v1)|**[link](https://github.com/opengvlab/mm-niah)**|\n", "2406.07155": "|**2024-06-11**|**Scaling Large-Language-Model-based Multi-Agent Collaboration**|Chen Qian et.al.|[2406.07155v1](http://arxiv.org/abs/2406.07155v1)|**[link](https://github.com/openbmb/chatdev)**|\n", "2406.07115": "|**2024-06-11**|**Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees**|Sijia Chen et.al.|[2406.07115v1](http://arxiv.org/abs/2406.07115v1)|null|\n", "2406.07113": "|**2024-06-17**|**Beyond Bare Queries: 
Open-Vocabulary Object Retrieval with 3D Scene Graph**|Sergey Linok et.al.|[2406.07113v2](http://arxiv.org/abs/2406.07113v2)|null|\n", "2406.07080": "|**2024-06-11**|**DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs**|Haishuo Fang et.al.|[2406.07080v1](http://arxiv.org/abs/2406.07080v1)|**[link](https://github.com/UKPLab/acl2024-DARA)**|\n", "2406.06947": "|**2024-06-11**|**CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only**|Junhee Cho et.al.|[2406.06947v1](http://arxiv.org/abs/2406.06947v1)|**[link](https://github.com/caap-agent/caap-agent)**|\n", "2406.06870": "|**2024-06-15**|**What's in an embedding? Would a rose by any embedding smell as sweet?**|Venkat Venkatasubramanian et.al.|[2406.06870v3](http://arxiv.org/abs/2406.06870v3)|null|\n", "2406.06865": "|**2024-06-11**|**Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems**|Mohammed Elhenawy et.al.|[2406.06865v1](http://arxiv.org/abs/2406.06865v1)|null|\n", "2406.06863": "|**2024-06-11**|**Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity**|Tam n. Nguyen et.al.|[2406.06863v1](http://arxiv.org/abs/2406.06863v1)|**[link](https://github.com/cybonto/ollabench)**|\n", "2406.06613": "|**2024-06-07**|**GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents**|Anthony Costarelli et.al.|[2406.06613v1](http://arxiv.org/abs/2406.06613v1)|**[link](https://github.com/Joshuaclymer/GameBench)**|\n", "2406.06610": "|**2024-06-06**|**Reinterpreting 'the Company a Word Keeps': Towards Explainable and Ontologically Grounded Language Models**|Walid S. 
Saba et.al.|[2406.06610v1](http://arxiv.org/abs/2406.06610v1)|null|\n", "2406.06592": "|**2024-06-05**|**Improve Mathematical Reasoning in Language Models by Automated Process Supervision**|Liangchen Luo et.al.|[2406.06592v1](http://arxiv.org/abs/2406.06592v1)|null|\n", "2406.06588": "|**2024-06-05**|**Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models**|Flavio Petruzzellis et.al.|[2406.06588v1](http://arxiv.org/abs/2406.06588v1)|null|\n", "2406.06586": "|**2024-06-05**|**Bi-Chainer: Automated Large Language Models Reasoning with Bidirectional Chaining**|Shuqi Liu et.al.|[2406.06586v1](http://arxiv.org/abs/2406.06586v1)|null|\n", "2406.06580": "|**2024-06-04**|**Break the Chain: Large Language Models Can be Shortcut Reasoners**|Mengru Ding et.al.|[2406.06580v1](http://arxiv.org/abs/2406.06580v1)|null|\n", "2406.06579": "|**2024-06-04**|**From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models**|Xiaofeng Zhang et.al.|[2406.06579v1](http://arxiv.org/abs/2406.06579v1)|null|\n", "2406.08223": "|**2024-06-12**|**Research Trends for the Interplay between Large Language Models and Knowledge Graphs**|Hanieh Khorashadizadeh et.al.|[2406.08223v1](http://arxiv.org/abs/2406.08223v1)|null|\n", "2406.08164": "|**2024-06-12**|**ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs**|Irene Huang et.al.|[2406.08164v1](http://arxiv.org/abs/2406.08164v1)|**[link](https://github.com/jmiemirza/conme)**|\n", "2406.07685": "|**2024-06-11**|**Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions**|Leonardo Cotta et.al.|[2406.07685v1](http://arxiv.org/abs/2406.07685v1)|null|\n", "2406.09418": "|**2024-06-13**|**VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding**|Muhammad Maaz et.al.|[2406.09418v1](http://arxiv.org/abs/2406.09418v1)|**[link](https://github.com/mbzuai-oryx/videogpt-plus)**|\n", "2406.09397": "|**2024-06-13**|**Aligning 
Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms**|Miaosen Zhang et.al.|[2406.09397v1](http://arxiv.org/abs/2406.09397v1)|null|\n", "2406.09187": "|**2024-06-13**|**GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning**|Zhen Xiang et.al.|[2406.09187v1](http://arxiv.org/abs/2406.09187v1)|null|\n", "2406.09175": "|**2024-06-13**|**ReMI: A Dataset for Reasoning with Multiple Images**|Mehran Kazemi et.al.|[2406.09175v1](http://arxiv.org/abs/2406.09175v1)|null|\n", "2406.09170": "|**2024-06-13**|**Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning**|Bahare Fatemi et.al.|[2406.09170v1](http://arxiv.org/abs/2406.09170v1)|null|\n", "2406.09136": "|**2024-06-13**|**Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs**|Xuan Zhang et.al.|[2406.09136v1](http://arxiv.org/abs/2406.09136v1)|**[link](https://github.com/sail-sg/cpo)**|\n", "2406.09121": "|**2024-06-13**|**MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era**|Jiahao Nie et.al.|[2406.09121v1](http://arxiv.org/abs/2406.09121v1)|**[link](https://github.com/niejiahao1998/mmrel)**|\n", "2406.09103": "|**2024-06-13**|**Chain-of-Though (CoT) prompting strategies for medical error detection and correction**|Zhaolong Wu et.al.|[2406.09103v1](http://arxiv.org/abs/2406.09103v1)|null|\n", "2406.09098": "|**2024-06-13**|**SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models**|Kehua Feng et.al.|[2406.09098v1](http://arxiv.org/abs/2406.09098v1)|**[link](https://github.com/hicai-zju/sciknoweval)**|\n", "2406.09072": "|**2024-06-13**|**Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?**|Zhaochen Su et.al.|[2406.09072v1](http://arxiv.org/abs/2406.09072v1)|**[link](https://github.com/zhaochen0110/cotempqa)**|\n", "2406.09044": "|**2024-06-13**|**MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning**|Hanqing Wang 
et.al.|[2406.09044v1](http://arxiv.org/abs/2406.09044v1)|null|\n", "2406.09043": "|**2024-06-14**|**Language Models are Crossword Solvers**|Soumadeep Saha et.al.|[2406.09043v2](http://arxiv.org/abs/2406.09043v2)|null|\n", "2406.09041": "|**2024-06-13**|**ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models**|Jing Liu et.al.|[2406.09041v1](http://arxiv.org/abs/2406.09041v1)|null|\n", "2406.08862": "|**2024-06-13**|**Cognitively Inspired Energy-Based World Models**|Alexi Gladstone et.al.|[2406.08862v1](http://arxiv.org/abs/2406.08862v1)|null|\n", "2406.08824": "|**2024-06-13**|**LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions**|Rumaisa Azeem et.al.|[2406.08824v1](http://arxiv.org/abs/2406.08824v1)|null|\n", "2406.08811": "|**2024-06-13**|**Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models**|Minghao Wu et.al.|[2406.08811v1](http://arxiv.org/abs/2406.08811v1)|null|\n", "2406.08787": "|**2024-06-13**|**A Survey on Compositional Learning of AI Models: Theoretical and Experimetnal Practices**|Sania Sinha et.al.|[2406.08787v1](http://arxiv.org/abs/2406.08787v1)|null|\n", "2406.08657": "|**2024-06-12**|**Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs**|Chen Zheng et.al.|[2406.08657v1](http://arxiv.org/abs/2406.08657v1)|null|\n", "2406.08648": "|**2024-06-12**|**LLM-Craft: Robotic Crafting of Elasto-Plastic Objects with Large Language Models**|Alison Bartsch et.al.|[2406.08648v1](http://arxiv.org/abs/2406.08648v1)|null|\n", "2406.08587": "|**2024-06-12**|**CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery**|Xiaoshuai Song et.al.|[2406.08587v1](http://arxiv.org/abs/2406.08587v1)|**[link](https://github.com/csbench/csbench)**|\n", "2406.08527": "|**2024-06-12**|**Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning**|Jaehyun Nam 
et.al.|[2406.08527v1](http://arxiv.org/abs/2406.08527v1)|null|\n", "2406.10149": "|**2024-06-14**|**BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack**|Yuri Kuratov et.al.|[2406.10149v1](http://arxiv.org/abs/2406.10149v1)|null|\n", "2406.10099": "|**2024-06-14**|**Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning**|Jiaqi Li et.al.|[2406.10099v1](http://arxiv.org/abs/2406.10099v1)|null|\n", "2406.10057": "|**2024-06-18**|**First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models**|Enming Zhang et.al.|[2406.10057v2](http://arxiv.org/abs/2406.10057v2)|**[link](https://github.com/360ailab-nlp/flowce)**|\n", "2406.09994": "|**2024-06-14**|**Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models**|Manas Jhalani et.al.|[2406.09994v1](http://arxiv.org/abs/2406.09994v1)|null|\n", "2406.09972": "|**2024-06-14**|**A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization**|KuanChao Chu et.al.|[2406.09972v1](http://arxiv.org/abs/2406.09972v1)|null|\n", "2406.09671": "|**2024-06-14**|**Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam**|Nabor C. 
Mendon\u00e7a et.al.|[2406.09671v1](http://arxiv.org/abs/2406.09671v1)|**[link](https://github.com/nabormendonca/gpt-4v-enade-cs-2021)**|\n", "2406.09613": "|**2024-06-13**|**ImageNet3D: Towards General-Purpose Object-Level 3D Understanding**|Wufei Ma et.al.|[2406.09613v1](http://arxiv.org/abs/2406.09613v1)|**[link](https://github.com/wufeim/imagenet3d)**|\n", "2406.09455": "|**2024-06-12**|**Pandora: Towards General World Model with Natural Language Actions and Video States**|Jiannan Xiang et.al.|[2406.09455v1](http://arxiv.org/abs/2406.09455v1)|null|\n", "2406.11776": "|**2024-06-17**|**Improving Multi-Agent Debate with Sparse Communication Topology**|Yunxuan Li et.al.|[2406.11776v1](http://arxiv.org/abs/2406.11776v1)|null|\n", "2406.11698": "|**2024-06-17**|**Meta Reasoning for Large Language Models**|Peizhong Gao et.al.|[2406.11698v1](http://arxiv.org/abs/2406.11698v1)|null|\n", "2406.11678": "|**2024-06-17**|**TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy**|Yiqun Chen et.al.|[2406.11678v1](http://arxiv.org/abs/2406.11678v1)|**[link](https://github.com/chenyiqun/TourRank)**|\n", "2406.11651": "|**2024-06-17**|**A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4**|Ming Gu et.al.|[2406.11651v1](http://arxiv.org/abs/2406.11651v1)|**[link](https://github.com/SLEEPWALKERG/LLM-DST-EVAL)**|\n", "2406.11568": "|**2024-06-17**|**Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models**|Sheng Feng et.al.|[2406.11568v1](http://arxiv.org/abs/2406.11568v1)|**[link](https://github.com/fsfrancis15/brainllm)**|\n", "2406.11566": "|**2024-06-17**|**MEMLA: Enhancing Multilingual Knowledge Editing with Neuron-Masked Low-Rank Adaptation**|Jiakuan Xie et.al.|[2406.11566v1](http://arxiv.org/abs/2406.11566v1)|null|\n", "2406.11548": "|**2024-06-17**|**AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation**|Chuyan Xiong 
et.al.|[2406.11548v1](http://arxiv.org/abs/2406.11548v1)|null|\n", "2406.11514": "|**2024-06-17**|**Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs**|Yi Fang et.al.|[2406.11514v1](http://arxiv.org/abs/2406.11514v1)|null|\n", "2406.11426": "|**2024-06-17**|**Can AI with High Reasoning Ability Replicate Human-like Decision Making in Economic Experiments?**|Ayato Kitadai et.al.|[2406.11426v1](http://arxiv.org/abs/2406.11426v1)|null|\n", "2406.11391": "|**2024-06-17**|**P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models**|Shuo Yang et.al.|[2406.11391v1](http://arxiv.org/abs/2406.11391v1)|null|\n", "2406.11341": "|**2024-06-17**|**A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences**|Leonardo Bertolazzi et.al.|[2406.11341v1](http://arxiv.org/abs/2406.11341v1)|null|\n", "2406.11327": "|**2024-06-17**|**ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding**|Tianren Ma et.al.|[2406.11327v1](http://arxiv.org/abs/2406.11327v1)|null|\n", "2406.11258": "|**2024-06-17**|**Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization**|Minda Hu et.al.|[2406.11258v1](http://arxiv.org/abs/2406.11258v1)|null|\n", "2406.11200": "|**2024-06-18**|**AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval**|Shirley Wu et.al.|[2406.11200v2](http://arxiv.org/abs/2406.11200v2)|**[link](https://github.com/zou-group/avatar)**|\n", "2406.11161": "|**2024-06-17**|**Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning**|Zebang Cheng et.al.|[2406.11161v1](http://arxiv.org/abs/2406.11161v1)|**[link](https://github.com/zebangcheng/emotion-llama)**|\n", "2406.11160": "|**2024-06-21**|**Contextual Knowledge Graph**|Chengjin Xu et.al.|[2406.11160v2](http://arxiv.org/abs/2406.11160v2)|null|\n", "2406.11147": "|**2024-06-19**|**Vul-RAG: 
Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG**|Xueying Du et.al.|[2406.11147v2](http://arxiv.org/abs/2406.11147v2)|null|\n", "2406.11132": "|**2024-06-17**|**RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents**|Weizhe Chen et.al.|[2406.11132v1](http://arxiv.org/abs/2406.11132v1)|null|\n", "2406.11107": "|**2024-06-17**|**Exploring Safety-Utility Trade-Offs in Personalized Language Models**|Anvesh Rao Vijjini et.al.|[2406.11107v1](http://arxiv.org/abs/2406.11107v1)|null|\n", "2406.11050": "|**2024-06-16**|**A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners**|Bowen Jiang et.al.|[2406.11050v1](http://arxiv.org/abs/2406.11050v1)|null|\n", "2406.11020": "|**2024-06-16**|**RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models**|Yuqing Wang et.al.|[2406.11020v1](http://arxiv.org/abs/2406.11020v1)|null|\n", "2406.11012": "|**2024-06-18**|**Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game**|Prisha Samadarshi et.al.|[2406.11012v2](http://arxiv.org/abs/2406.11012v2)|**[link](https://github.com/mustafamariam/llm-connections-solver)**|\n", "2406.10999": "|**2024-06-16**|**Not All Bias is Bad: Balancing Rational Deviations and Cognitive Biases in Large Language Model Reasoning**|Liman Wang et.al.|[2406.10999v1](http://arxiv.org/abs/2406.10999v1)|null|\n", "2406.10958": "|**2024-06-18**|**City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization**|Zihao Jiao et.al.|[2406.10958v2](http://arxiv.org/abs/2406.10958v2)|null|\n", "2406.10950": "|**2024-06-16**|**E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models**|Zhenyu Zhang et.al.|[2406.10950v1](http://arxiv.org/abs/2406.10950v1)|null|\n", "2406.10942": "|**2024-06-16**|**Effective Generative AI: The Human-Algorithm Centaur**|Soroush Saghafian 
et.al.|[2406.10942v1](http://arxiv.org/abs/2406.10942v1)|null|\n", "2406.10923": "|**2024-06-16**|**Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies**|Hung-Ting Su et.al.|[2406.10923v1](http://arxiv.org/abs/2406.10923v1)|null|\n", "2406.10890": "|**2024-06-16**|**RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models**|Zhuoran Jin et.al.|[2406.10890v1](http://arxiv.org/abs/2406.10890v1)|**[link](https://github.com/jinzhuoran/rwku)**|\n", "2406.10878": "|**2024-06-16**|**Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions**|Yiming Tang et.al.|[2406.10878v1](http://arxiv.org/abs/2406.10878v1)|null|\n", "2406.10858": "|**2024-06-16**|**Step-level Value Preference Optimization for Mathematical Reasoning**|Guoxin Chen et.al.|[2406.10858v1](http://arxiv.org/abs/2406.10858v1)|null|\n", "2406.10834": "|**2024-06-16**|**Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning**|Joykirat Singh et.al.|[2406.10834v1](http://arxiv.org/abs/2406.10834v1)|null|\n", "2406.10789": "|**2024-06-16**|**Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses**|Zhiwen Fan et.al.|[2406.10789v1](http://arxiv.org/abs/2406.10789v1)|null|\n", "2406.10740": "|**2024-06-15**|**FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models**|Zhikai Zhang et.al.|[2406.10740v1](http://arxiv.org/abs/2406.10740v1)|null|\n", "2406.10638": "|**2024-06-15**|**Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions**|Yexin Liu et.al.|[2406.10638v1](http://arxiv.org/abs/2406.10638v1)|**[link](https://github.com/baai-dcai/multimodal-robustness-benchmark)**|\n", "2406.10625": "|**2024-06-15**|**On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models**|Sree Harsha Tanneru 
et.al.|[2406.10625v1](http://arxiv.org/abs/2406.10625v1)|null|\n", "2406.10515": "|**2024-06-15**|**Reactor Mk.1 performances: MMLU, HumanEval and BBH test results**|TJ Dunham et.al.|[2406.10515v1](http://arxiv.org/abs/2406.10515v1)|null|\n", "2406.10424": "|**2024-06-14**|**What is the Visual Cognition Gap between Humans and Multimodal LLMs?**|Xu Cao et.al.|[2406.10424v1](http://arxiv.org/abs/2406.10424v1)|**[link](https://github.com/IrohXu/VCog-Bench)**|\n", "2406.10400": "|**2024-06-14**|**Self-Reflection Outcome is Sensitive to Prompt Construction**|Fengyuan Liu et.al.|[2406.10400v1](http://arxiv.org/abs/2406.10400v1)|**[link](https://github.com/michael98liu/mixture-of-prompts)**|\n", "2406.10382": "|**2024-06-18**|**Efficient Prompting for LLM-based Generative Internet of Things**|Bin Xiao et.al.|[2406.10382v2](http://arxiv.org/abs/2406.10382v2)|null|\n", "2406.10305": "|**2024-06-14**|**Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models**|Jie Chen et.al.|[2406.10305v1](http://arxiv.org/abs/2406.10305v1)|null|\n", "2406.10288": "|**2024-06-12**|**Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models**|Francisco Eiras et.al.|[2406.10288v1](http://arxiv.org/abs/2406.10288v1)|null|\n", "2406.07794": "|**2024-06-16**|**Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests**|Amogh Mannekote et.al.|[2406.07794v2](http://arxiv.org/abs/2406.07794v2)|null|\n", "2406.10261": "|**2024-06-11**|**FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination**|Pengfei Zhou et.al.|[2406.10261v1](http://arxiv.org/abs/2406.10261v1)|null|\n", "2406.10251": "|**2024-06-10**|**The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs**|Mert Yazan et.al.|[2406.10251v1](http://arxiv.org/abs/2406.10251v1)|null|\n", "2406.12846": "|**2024-06-18**|**DrVideo: Document Retrieval 
Based Long Video Understanding**|Ziyu Ma et.al.|[2406.12846v1](http://arxiv.org/abs/2406.12846v1)|null|\n", "2406.12832": "|**2024-06-18**|**LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation**|Seyedarmin Azizi et.al.|[2406.12832v1](http://arxiv.org/abs/2406.12832v1)|**[link](https://github.com/arminazizi98/lamda)**|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang et.al.|[2406.12784v1](http://arxiv.org/abs/2406.12784v1)|**[link](https://github.com/Cyno2232/UBENCH)**|\n", "2406.12775": "|**2024-06-18**|**Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries**|Eden Biran et.al.|[2406.12775v1](http://arxiv.org/abs/2406.12775v1)|**[link](https://github.com/edenbiran/HoppingTooLate)**|\n", "2406.12753": "|**2024-06-18**|**OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI**|Zhen Huang et.al.|[2406.12753v1](http://arxiv.org/abs/2406.12753v1)|**[link](https://github.com/gair-nlp/olympicarena)**|\n", "2406.12742": "|**2024-06-18**|**Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning**|Bingchen Zhao et.al.|[2406.12742v1](http://arxiv.org/abs/2406.12742v1)|**[link](https://github.com/dtennant/mirb_eval)**|\n", "2406.12692": "|**2024-06-18**|**MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL**|Arian Askari et.al.|[2406.12692v1](http://arxiv.org/abs/2406.12692v1)|null|\n", "2406.12641": "|**2024-06-18**|**DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?**|Zhouhong Gu et.al.|[2406.12641v1](http://arxiv.org/abs/2406.12641v1)|**[link](https://github.com/MikeGu721/DetectBench)**|\n", "2406.12639": "|**2024-06-18**|**Ask-before-Plan: Proactive Language Agents for Real-World Planning**|Xuan Zhang 
et.al.|[2406.12639v1](http://arxiv.org/abs/2406.12639v1)|**[link](https://github.com/magicgh/ask-before-plan)**|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui et.al.|[2406.12628v1](http://arxiv.org/abs/2406.12628v1)|null|\n", "2406.12624": "|**2024-06-18**|**Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges**|Aman Singh Thakur et.al.|[2406.12624v1](http://arxiv.org/abs/2406.12624v1)|null|\n", "2406.12585": "|**2024-06-18**|**Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling**|Yao-Ching Yu et.al.|[2406.12585v1](http://arxiv.org/abs/2406.12585v1)|**[link](https://github.com/yaoching0/gac)**|\n", "2406.12572": "|**2024-06-19**|**Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models**|Eldar Kurtic et.al.|[2406.12572v2](http://arxiv.org/abs/2406.12572v2)|**[link](https://github.com/ist-daslab/mathador-lm)**|\n", "2406.12546": "|**2024-06-18**|**Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models**|Philipp Mondorf et.al.|[2406.12546v1](http://arxiv.org/abs/2406.12546v1)|null|\n", "2406.12529": "|**2024-06-18**|**LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation**|Yuhao Wang et.al.|[2406.12529v1](http://arxiv.org/abs/2406.12529v1)|null|\n", "2406.12494": "|**2024-06-18**|**LightPAL: Lightweight Passage Retrieval for Open Domain Multi-Document Summarization**|Masafumi Enomoto et.al.|[2406.12494v1](http://arxiv.org/abs/2406.12494v1)|null|\n", "2406.12479": "|**2024-06-18**|**RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding**|Linrui Xu et.al.|[2406.12479v1](http://arxiv.org/abs/2406.12479v1)|**[link](https://github.com/geox-lab/rs-gpt4v)**|\n", "2406.12386": "|**2024-06-18**|**IPEval: A Bilingual Intellectual Property Agency Consultation 
Evaluation Benchmark for Large Language Models**|Qiyao Wang et.al.|[2406.12386v1](http://arxiv.org/abs/2406.12386v1)|**[link](https://github.com/Mathsion2/IPEval)**|\n", "2406.12374": "|**2024-06-18**|**Problem-Solving in Language Model Networks**|Ciaran Regan et.al.|[2406.12374v1](http://arxiv.org/abs/2406.12374v1)|**[link](https://github.com/tsukuba-websci/psilmn)**|\n", "2406.12331": "|**2024-06-18**|**Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding**|Weizhi Fei et.al.|[2406.12331v1](http://arxiv.org/abs/2406.12331v1)|null|\n", "2406.12319": "|**2024-06-18**|**PRePair: Pointwise Reasoning Enhance Pairwise Evaluating for Robust Instruction-Following Assessments**|Hawon Jeong et.al.|[2406.12319v1](http://arxiv.org/abs/2406.12319v1)|null|\n", "2406.12288": "|**2024-06-18**|**An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs**|Daking Rai et.al.|[2406.12288v1](http://arxiv.org/abs/2406.12288v1)|null|\n", "2406.12269": "|**2024-06-18**|**Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization**|Kwangwook Seo et.al.|[2406.12269v1](http://arxiv.org/abs/2406.12269v1)|null|\n", "2406.12255": "|**2024-06-18**|**A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning**|Lijie Hu et.al.|[2406.12255v1](http://arxiv.org/abs/2406.12255v1)|null|\n", "2406.12227": "|**2024-06-24**|**Interpretable Catastrophic Forgetting of Large Language Model Fine-tuning via Instruction Vector**|Gangwei Jiang et.al.|[2406.12227v2](http://arxiv.org/abs/2406.12227v2)|null|\n", "2406.12224": "|**2024-06-18**|**Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration**|Xinzhu Liu et.al.|[2406.12224v1](http://arxiv.org/abs/2406.12224v1)|null|\n", "2406.12172": "|**2024-06-18**|**Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems**|Nasim Borazjanizadeh 
et.al.|[2406.12172v1](http://arxiv.org/abs/2406.12172v1)|null|\n", "2406.12091": "|**2024-06-19**|**Is poisoning a real threat to LLM alignment? Maybe more so than you think**|Pankayaraj Pathmanathan et.al.|[2406.12091v2](http://arxiv.org/abs/2406.12091v2)|**[link](https://github.com/pankayaraj/RLHFPoisoning)**|\n", "2406.12053": "|**2024-06-17**|**InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States**|Mohammad Beigi et.al.|[2406.12053v1](http://arxiv.org/abs/2406.12053v1)|null|\n", "2406.12036": "|**2024-06-17**|**MedCalc-Bench: Evaluating Large Language Models for Medical Calculations**|Nikhil Khandekar et.al.|[2406.12036v1](http://arxiv.org/abs/2406.12036v1)|**[link](https://github.com/ncbi-nlp/medcalc-bench)**|\n", "2406.12034": "|**2024-06-17**|**Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts**|Junmo Kang et.al.|[2406.12034v1](http://arxiv.org/abs/2406.12034v1)|null|\n", "2406.11945": "|**2024-06-17**|**GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models**|Yi Fang et.al.|[2406.11945v1](http://arxiv.org/abs/2406.11945v1)|**[link](https://github.com/nyushcs/gaugllm)**|\n", "2406.11911": "|**2024-06-16**|**A Notion of Complexity for Theory of Mind via Discrete World Models**|X. 
Angelo Huang et.al.|[2406.11911v1](http://arxiv.org/abs/2406.11911v1)|**[link](https://github.com/flecart/complexity-tom-dwm)**|\n", "2406.11903": "|**2024-06-15**|**A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges**|Yuqi Nie et.al.|[2406.11903v1](http://arxiv.org/abs/2406.11903v1)|null|\n", "2406.14562": "|**2024-06-20**|**Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**|Sachit Menon et.al.|[2406.14562v1](http://arxiv.org/abs/2406.14562v1)|null|\n", "2406.14556": "|**2024-06-21**|**Asynchronous Large Language Model Enhanced Planner for Autonomous Driving**|Yuan Chen et.al.|[2406.14556v2](http://arxiv.org/abs/2406.14556v2)|null|\n", "2406.14546": "|**2024-06-20**|**Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data**|Johannes Treutlein et.al.|[2406.14546v1](http://arxiv.org/abs/2406.14546v1)|**[link](https://github.com/choidami/inductive-oocr)**|\n", "2406.14544": "|**2024-06-20**|**Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs**|Yuxuan Qiao et.al.|[2406.14544v1](http://arxiv.org/abs/2406.14544v1)|**[link](https://github.com/sparksjoe/prism)**|\n", "2406.14425": "|**2024-06-25**|**SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages**|Gayane Ghazaryan et.al.|[2406.14425v2](http://arxiv.org/abs/2406.14425v2)|null|\n", "2406.14358": "|**2024-06-20**|**The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing**|Yuannan Li et.al.|[2406.14358v1](http://arxiv.org/abs/2406.14358v1)|null|\n", "2406.14326": "|**2024-06-20**|**medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs**|Mingyi Jia et.al.|[2406.14326v1](http://arxiv.org/abs/2406.14326v1)|null|\n", "2406.14283": "|**2024-06-27**|**Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning**|Chaojie Wang 
et.al.|[2406.14283v3](http://arxiv.org/abs/2406.14283v3)|null|\n", "2406.14208": "|**2024-06-20**|**SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots**|Weixing Wang et.al.|[2406.14208v1](http://arxiv.org/abs/2406.14208v1)|null|\n", "2406.14192": "|**2024-06-20**|**Timo: Towards Better Temporal Reasoning for Language Models**|Zhaochen Su et.al.|[2406.14192v1](http://arxiv.org/abs/2406.14192v1)|**[link](https://github.com/zhaochen0110/timo)**|\n", "2406.14167": "|**2024-06-20**|**Definition generation for lexical semantic change detection**|Mariia Fedorova et.al.|[2406.14167v1](http://arxiv.org/abs/2406.14167v1)|**[link](https://github.com/ltgoslo/Definition-generation-for-LSCD)**|\n", "2406.14097": "|**2024-07-01**|**Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration**|Haokun Liu et.al.|[2406.14097v2](http://arxiv.org/abs/2406.14097v2)|null|\n", "2406.13975": "|**2024-06-20**|**MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models**|Zhongshen Zeng et.al.|[2406.13975v1](http://arxiv.org/abs/2406.13975v1)|null|\n", "2406.13966": "|**2024-06-20**|**Causal Inference with Latent Variables: Recent Advances and Future Prospectives**|Yaochen Zhu et.al.|[2406.13966v1](http://arxiv.org/abs/2406.13966v1)|null|\n", "2406.13948": "|**2024-06-20**|**CityGPT: Empowering Urban Spatial Cognition of Large Language Models**|Jie Feng et.al.|[2406.13948v1](http://arxiv.org/abs/2406.13948v1)|null|\n", "2406.13947": "|**2024-06-20**|**AspirinSum: an Aspect-based utility-preserved de-identification Summarization framework**|Ya-Lun Li et.al.|[2406.13947v1](http://arxiv.org/abs/2406.13947v1)|null|\n", "2406.13894": "|**2024-06-19**|**Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events**|Mohammad Abu Tami et.al.|[2406.13894v1](http://arxiv.org/abs/2406.13894v1)|null|\n", "2406.13892": "|**2024-06-19**|**Adaptable Logical Control for Large Language Models**|Honghua Zhang 
et.al.|[2406.13892v1](http://arxiv.org/abs/2406.13892v1)|**[link](https://github.com/joshuacnf/Ctrl-G)**|\n", "2406.13858": "|**2024-06-19**|**Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning**|Yuval Shalev et.al.|[2406.13858v1](http://arxiv.org/abs/2406.13858v1)|null|\n", "2406.13808": "|**2024-06-27**|**Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?**|Nirjhor Rouf et.al.|[2406.13808v3](http://arxiv.org/abs/2406.13808v3)|null|\n", "2406.13805": "|**2024-06-19**|**WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia**|Yufang Hou et.al.|[2406.13805v1](http://arxiv.org/abs/2406.13805v1)|null|\n", "2406.13803": "|**2024-06-19**|**Semantic Structure-Mapping in LLM and Human Analogical Reasoning**|Sam Musker et.al.|[2406.13803v1](http://arxiv.org/abs/2406.13803v1)|**[link](https://github.com/AnonymousReview123/Semantic_Structure_Mapping_Anon)**|\n", "2406.13764": "|**2024-06-19**|**Can LLMs Reason in the Wild with Programs?**|Yuan Yang et.al.|[2406.13764v1](http://arxiv.org/abs/2406.13764v1)|**[link](https://github.com/gblackout/reason-in-the-wild)**|\n", "2406.13763": "|**2024-06-19**|**Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models**|Zhawnen Chen et.al.|[2406.13763v1](http://arxiv.org/abs/2406.13763v1)|null|\n", "2406.13621": "|**2024-06-19**|**Improving Visual Commonsense in Language Models via Multiple Image Generation**|Guy Yariv et.al.|[2406.13621v1](http://arxiv.org/abs/2406.13621v1)|**[link](https://github.com/guyyariv/vlmig)**|\n", "2406.13444": "|**2024-06-27**|**VDebugger: Harnessing Execution Feedback for Debugging Visual Programs**|Xueqing Wu et.al.|[2406.13444v2](http://arxiv.org/abs/2406.13444v2)|**[link](https://github.com/shirley-wu/vdebugger)**|\n", "2406.13439": "|**2024-06-19**|**Finding Blind Spots in Evaluator LLMs with Interpretable Checklists**|Sumanth Doddapaneni 
et.al.|[2406.13439v1](http://arxiv.org/abs/2406.13439v1)|**[link](https://github.com/ai4bharat/fbi)**|\n", "2406.13397": "|**2024-06-19**|**MoreHopQA: More Than Multi-hop Reasoning**|Julian Schnitzler et.al.|[2406.13397v1](http://arxiv.org/abs/2406.13397v1)|**[link](https://github.com/alab-nii/morehopqa)**|\n", "2406.13375": "|**2024-06-19**|**ALiiCE: Evaluating Positional Fine-grained Citation Generation**|Yilong Xu et.al.|[2406.13375v1](http://arxiv.org/abs/2406.13375v1)|null|\n", "2406.13269": "|**2024-06-19**|**Investigating Low-Cost LLM Annotation for~Spoken Dialogue Understanding Datasets**|Lucas Druart et.al.|[2406.13269v1](http://arxiv.org/abs/2406.13269v1)|null|\n", "2406.13217": "|**2024-06-19**|**Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology**|Xiaoxi Kang et.al.|[2406.13217v1](http://arxiv.org/abs/2406.13217v1)|null|\n", "2406.13213": "|**2024-06-19**|**Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata**|Mykhailo Poliakov et.al.|[2406.13213v1](http://arxiv.org/abs/2406.13213v1)|**[link](https://github.com/mxpoliakov/multi-meta-rag)**|\n", "2406.13144": "|**2024-06-19**|**DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents**|Jiho Kim et.al.|[2406.13144v1](http://arxiv.org/abs/2406.13144v1)|**[link](https://github.com/jiho283/simulator)**|\n", "2406.13114": "|**2024-06-19**|**Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation**|Yuhang Zhou et.al.|[2406.13114v1](http://arxiv.org/abs/2406.13114v1)|null|\n", "2406.13049": "|**2024-06-18**|**Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method**|Jerson Francia et.al.|[2406.13049v1](http://arxiv.org/abs/2406.13049v1)|null|\n", "2406.12950": "|**2024-06-18**|**MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction**|Yuyan 
Liu et.al.|[2406.12950v1](http://arxiv.org/abs/2406.12950v1)|**[link](https://github.com/nyushcs/moleculargpt)**|\n", "2406.15109": "|**2024-06-21**|**Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network**|Badr AlKhamissi et.al.|[2406.15109v1](http://arxiv.org/abs/2406.15109v1)|**[link](https://github.com/bkhmsi/brain-language-suma)**|\n", "2406.15019": "|**2024-06-21**|**MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens**|Yongqi Fan et.al.|[2406.15019v1](http://arxiv.org/abs/2406.15019v1)|**[link](https://github.com/johnny-fans/medodyssey)**|\n", "2406.14986": "|**2024-06-21**|**Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers**|Manuel Mondal et.al.|[2406.14986v1](http://arxiv.org/abs/2406.14986v1)|null|\n", "2406.14955": "|**2024-06-21**|**ICLEval: Evaluating In-Context Learning Ability of Large Language Models**|Wentong Chen et.al.|[2406.14955v1](http://arxiv.org/abs/2406.14955v1)|**[link](https://github.com/yiye3/icleval)**|\n", "2406.14928": "|**2024-06-21**|**Autonomous Agents for Collaborative Task under Information Asymmetry**|Wei Liu et.al.|[2406.14928v1](http://arxiv.org/abs/2406.14928v1)|**[link](https://github.com/thinkwee/iAgents)**|\n", "2406.14877": "|**2024-06-21**|**Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video**|Zhengbang Yang et.al.|[2406.14877v1](http://arxiv.org/abs/2406.14877v1)|null|\n", "2406.14867": "|**2024-06-21**|**DistiLRR: Transferring Code Repair for Low-Resource Programming Languages**|Kyle Wong et.al.|[2406.14867v1](http://arxiv.org/abs/2406.14867v1)|**[link](https://github.com/kylewong288/distilrr)**|\n", "2406.14852": "|**2024-06-21**|**Is A Picture Worth A Thousand Words? 
Delving Into Spatial Reasoning for Vision Language Models**|Jiayu Wang et.al.|[2406.14852v1](http://arxiv.org/abs/2406.14852v1)|null|\n", "2406.14780": "|**2024-06-20**|**ACR: A Benchmark for Automatic Cohort Retrieval**|Dung Ngoc Thai et.al.|[2406.14780v1](http://arxiv.org/abs/2406.14780v1)|null|\n", "2406.14763": "|**2024-06-20**|**A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering**|Lingxi Zhang et.al.|[2406.14763v1](http://arxiv.org/abs/2406.14763v1)|null|\n", "2406.14737": "|**2024-06-20**|**Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?**|Zhiqiang Pi et.al.|[2406.14737v1](http://arxiv.org/abs/2406.14737v1)|null|\n", "2406.14673": "|**2024-06-20**|**Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell**|Taiming Lu et.al.|[2406.14673v1](http://arxiv.org/abs/2406.14673v1)|**[link](https://github.com/TaiMingLu/know-dont-tell)**|\n", "2406.14655": "|**2024-06-20**|**HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation**|Jin Wang et.al.|[2406.14655v1](http://arxiv.org/abs/2406.14655v1)|null|\n", "2406.16833": "|**2024-06-24**|**USDC: A Dataset of $\\underline{U}$ser $\\underline{S}$tance and $\\underline{D}$ogmatism in Long $\\underline{C}$onversations**|Mounika Marreddy et.al.|[2406.16833v1](http://arxiv.org/abs/2406.16833v1)|null|\n", "2406.16797": "|**2024-06-25**|**Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs**|Ashwinee Panda et.al.|[2406.16797v2](http://arxiv.org/abs/2406.16797v2)|**[link](https://github.com/kiddyboots216/lottery-ticket-adaptation)**|\n", "2406.16690": "|**2024-06-24**|**Scaling Laws for Linear Complexity Language Models**|Xuyang Shen et.al.|[2406.16690v1](http://arxiv.org/abs/2406.16690v1)|**[link](https://github.com/opennlplab/scalinglaws)**|\n", "2406.16655": "|**2024-06-24**|**Large Language Models Are Cross-Lingual Knowledge-Free Reasoners**|Peng Hu 
et.al.|[2406.16655v1](http://arxiv.org/abs/2406.16655v1)|**[link](https://github.com/NJUNLP/Knowledge-Free-Reasoning)**|\n", "2406.16620": "|**2024-06-25**|**OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer**|Lu Zhang et.al.|[2406.16620v2](http://arxiv.org/abs/2406.16620v2)|null|\n", "2406.16528": "|**2024-06-24**|**Evaluating the Ability of Large Language Models to Reason about Cardinal Directions**|Anthony G Cohn et.al.|[2406.16528v1](http://arxiv.org/abs/2406.16528v1)|null|\n", "2406.16490": "|**2024-06-24**|**eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure**|Hoorieh Sabzevari et.al.|[2406.16490v1](http://arxiv.org/abs/2406.16490v1)|**[link](https://github.com/lhoorie/SemEval2024-Task5)**|\n", "2406.16449": "|**2024-06-24**|**Evaluating and Analyzing Relationship Hallucinations in LVLMs**|Mingrui Wu et.al.|[2406.16449v1](http://arxiv.org/abs/2406.16449v1)|**[link](https://github.com/mrwu-mac/R-Bench)**|\n", "2406.16442": "|**2024-06-29**|**EmoLLM: Multimodal Emotional Understanding Meets Large Language Models**|Qu Yang et.al.|[2406.16442v2](http://arxiv.org/abs/2406.16442v2)|**[link](https://github.com/yan9qu/emollm)**|\n", "2406.16441": "|**2024-06-24**|**UniCoder: Scaling Code Large Language Model via Universal Code**|Tao Sun et.al.|[2406.16441v1](http://arxiv.org/abs/2406.16441v1)|null|\n", "2406.16308": "|**2024-06-24**|**Anomaly Detection of Tabular Data Using LLMs**|Aodong Li et.al.|[2406.16308v1](http://arxiv.org/abs/2406.16308v1)|null|\n", "2406.16176": "|**2024-06-23**|**GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets**|Qiming Wu et.al.|[2406.16176v1](http://arxiv.org/abs/2406.16176v1)|null|\n", "2406.16144": "|**2024-06-23**|**Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step**|Zezhong Wang et.al.|[2406.16144v1](http://arxiv.org/abs/2406.16144v1)|null|\n", "2406.16061": "|**2024-06-23**|**PORT: Preference 
Optimization on Reasoning Traces**|Salem Lahlou et.al.|[2406.16061v1](http://arxiv.org/abs/2406.16061v1)|null|\n", "2406.15992": "|**2024-06-23**|**Can LLM Graph Reasoning Generalize beyond Pattern Memorization?**|Yizhuo Zhang et.al.|[2406.15992v1](http://arxiv.org/abs/2406.15992v1)|null|\n", "2406.15877": "|**2024-06-26**|**BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions**|Terry Yue Zhuo et.al.|[2406.15877v2](http://arxiv.org/abs/2406.15877v2)|**[link](https://github.com/bigcode-project/bigcodebench)**|\n", "2406.15859": "|**2024-06-30**|**LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning**|Guangsi Shi et.al.|[2406.15859v2](http://arxiv.org/abs/2406.15859v2)|null|\n", "2406.15768": "|**2024-06-22**|**MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception**|Guanqun Wang et.al.|[2406.15768v1](http://arxiv.org/abs/2406.15768v1)|null|\n", "2406.15704": "|**2024-06-22**|**video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models**|Guangzhi Sun et.al.|[2406.15704v1](http://arxiv.org/abs/2406.15704v1)|**[link](https://github.com/bytedance/salmonn)**|\n", "2406.15568": "|**2024-06-21**|**Robust Reinforcement Learning from Corrupted Human Feedback**|Alexander Bukharin et.al.|[2406.15568v1](http://arxiv.org/abs/2406.15568v1)|null|\n", "2406.15492": "|**2024-06-18**|**On the Principles behind Opinion Dynamics in Multi-Agent Systems of Large Language Models**|Pedro Cisneros-Velarde et.al.|[2406.15492v1](http://arxiv.org/abs/2406.15492v1)|null|\n", "2406.17663": "|**2024-06-25**|**LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic**|Aditya Kalyanpur et.al.|[2406.17663v1](http://arxiv.org/abs/2406.17663v1)|null|\n", "2406.17642": "|**2024-06-25**|**Banishing LLM Hallucinations Requires Rethinking Generalization**|Johnny Li et.al.|[2406.17642v1](http://arxiv.org/abs/2406.17642v1)|null|\n", "2406.17600": "|**2024-06-25**|**\"Seeing the Big through the Small\": 
Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?**|Beiduo Chen et.al.|[2406.17600v1](http://arxiv.org/abs/2406.17600v1)|null|\n", "2406.17588": "|**2024-06-26**|**LongIns: A Challenging Long-context Instruction-based Exam for LLMs**|Shawn Gavin et.al.|[2406.17588v2](http://arxiv.org/abs/2406.17588v2)|null|\n", "2406.17574": "|**2024-06-25**|**Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats**|Ryan Pavlich et.al.|[2406.17574v1](http://arxiv.org/abs/2406.17574v1)|null|\n", "2406.17557": "|**2024-06-25**|**The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale**|Guilherme Penedo et.al.|[2406.17557v1](http://arxiv.org/abs/2406.17557v1)|null|\n", "2406.17520": "|**2024-06-25**|**Tell Me Where You Are: Multimodal LLMs Meet Place Recognition**|Zonglin Lyu et.al.|[2406.17520v1](http://arxiv.org/abs/2406.17520v1)|null|\n", "2406.17419": "|**2024-06-25**|**Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA**|Minzheng Wang et.al.|[2406.17419v1](http://arxiv.org/abs/2406.17419v1)|**[link](https://github.com/mozerwang/loong)**|\n", "2406.17304": "|**2024-06-25**|**Leveraging LLMs for Dialogue Quality Measurement**|Jinghan Jia et.al.|[2406.17304v1](http://arxiv.org/abs/2406.17304v1)|null|\n", "2406.17294": "|**2024-06-26**|**Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models**|Wenhao Shi et.al.|[2406.17294v2](http://arxiv.org/abs/2406.17294v2)|**[link](https://github.com/hzq950419/math-llava)**|\n", "2406.17271": "|**2024-06-25**|**DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph**|Zhehao Zhang et.al.|[2406.17271v1](http://arxiv.org/abs/2406.17271v1)|**[link](https://github.com/salt-nlp/darg)**|\n", "2406.17180": "|**2024-06-24**|**CogExplore: Contextual Exploration with Language-Encoded Environment Representations**|Harel Biggie 
et.al.|[2406.17180v1](http://arxiv.org/abs/2406.17180v1)|null|\n", "2406.17169": "|**2024-06-24**|**Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models**|Nisarg Patel et.al.|[2406.17169v1](http://arxiv.org/abs/2406.17169v1)|**[link](https://github.com/mihir3009/multi-logieval)**|\n", "2406.18521": "|**2024-06-26**|**CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs**|Zirui Wang et.al.|[2406.18521v1](http://arxiv.org/abs/2406.18521v1)|**[link](https://github.com/princeton-nlp/CharXiv)**|\n", "2406.18505": "|**2024-06-26**|**Mental Modeling of Reinforcement Learning Agents by Language Models**|Wenhao Lu et.al.|[2406.18505v1](http://arxiv.org/abs/2406.18505v1)|null|\n", "2406.18321": "|**2024-06-26**|**MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data**|Meng Fang et.al.|[2406.18321v1](http://arxiv.org/abs/2406.18321v1)|null|\n", "2406.18312": "|**2024-06-26**|**AI-native Memory: A Pathway from LLMs Towards AGI**|Jingbo Shang et.al.|[2406.18312v1](http://arxiv.org/abs/2406.18312v1)|null|\n", "2406.18200": "|**2024-06-26**|**SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding**|Zhenglin Wang et.al.|[2406.18200v1](http://arxiv.org/abs/2406.18200v1)|null|\n", "2406.18114": "|**2024-06-26**|**Knowledge Graph Enhanced Retrieval-Augmented Generation for Failure Mode and Effects Analysis**|Lukas Bahr et.al.|[2406.18114v1](http://arxiv.org/abs/2406.18114v1)|**[link](https://github.com/lukasbahr/kg-rag-fmea)**|\n", "2406.17987": "|**2024-06-26**|**Multi-step Knowledge Retrieval and Inference over Unstructured Data**|Aditya Kalyanpur et.al.|[2406.17987v1](http://arxiv.org/abs/2406.17987v1)|null|\n", "2406.17961": "|**2024-06-25**|**NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization**|Md Mahadi Hasan Nahid et.al.|[2406.17961v1](http://arxiv.org/abs/2406.17961v1)|null|\n", "2406.17873": 
"|**2024-06-25**|**Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback**|Zhongtao Miao et.al.|[2406.17873v1](http://arxiv.org/abs/2406.17873v1)|**[link](https://github.com/gpgg/art)**|\n", "2406.17806": "|**2024-06-22**|**MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?**|Xirui Li et.al.|[2406.17806v1](http://arxiv.org/abs/2406.17806v1)|null|\n", "2406.19392": "|**2024-07-02**|**ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos**|Jr-Jen Chen et.al.|[2406.19392v2](http://arxiv.org/abs/2406.19392v2)|**[link](https://github.com/rextime/rextime)**|\n", "2406.19292": "|**2024-06-27**|**From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data**|Zheyang Xiong et.al.|[2406.19292v1](http://arxiv.org/abs/2406.19292v1)|null|\n", "2406.19227": "|**2024-06-27**|**Aligning Teacher with Student Preferences for Tailored Training Data Generation**|Yantao Liu et.al.|[2406.19227v1](http://arxiv.org/abs/2406.19227v1)|null|\n", "2406.19121": "|**2024-06-27**|**Towards Learning Abductive Reasoning using VSA Distributed Representations**|Giacomo Camposampiero et.al.|[2406.19121v1](http://arxiv.org/abs/2406.19121v1)|**[link](https://github.com/ibm/abductive-rule-learner-with-context-awareness)**|\n", "2406.19065": "|**2024-06-27**|**STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis**|Wenbin Li et.al.|[2406.19065v1](http://arxiv.org/abs/2406.19065v1)|**[link](https://github.com/lwbxc/stbench)**|\n", "2406.18966": "|**2024-06-28**|**UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models**|Siyuan Wu et.al.|[2406.18966v2](http://arxiv.org/abs/2406.18966v2)|**[link](https://github.com/howiehwong/unigen)**|\n", "2406.18839": "|**2024-06-27**|**Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA**|Elham J. 
Barezi et.al.|[2406.18839v1](http://arxiv.org/abs/2406.18839v1)|null|\n", "2406.18762": "|**2024-06-26**|**Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism**|Shi Zong et.al.|[2406.18762v1](http://arxiv.org/abs/2406.18762v1)|null|\n", "2406.18746": "|**2024-06-26**|**Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models**|Georgios Tziafas et.al.|[2406.18746v1](http://arxiv.org/abs/2406.18746v1)|null|\n", "2406.18722": "|**2024-07-01**|**Towards Open-World Grasping with Large Vision-Language Models**|Georgios Tziafas et.al.|[2406.18722v2](http://arxiv.org/abs/2406.18722v2)|null|\n", "2406.18695": "|**2024-06-26**|**Learning to Correct for QA Reasoning with Black-box LLMs**|Jaehyung Kim et.al.|[2406.18695v1](http://arxiv.org/abs/2406.18695v1)|**[link](https://github.com/bbuing9/cobb)**|\n", "2406.18676": "|**2024-06-26**|**Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation**|Guanting Dong et.al.|[2406.18676v1](http://arxiv.org/abs/2406.18676v1)|**[link](https://github.com/dongguanting/dpa-rag)**|\n", "2406.18629": "|**2024-06-26**|**Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs**|Xin Lai et.al.|[2406.18629v1](http://arxiv.org/abs/2406.18629v1)|**[link](https://github.com/dvlab-research/step-dpo)**|\n", "2406.18626": "|**2024-06-26**|**An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery**|Oskar Wysocki et.al.|[2406.18626v1](http://arxiv.org/abs/2406.18626v1)|null|\n", "2406.20095": "|**2024-06-28**|**LLaRA: Supercharging Robot Learning Data for Vision-Language Policy**|Xiang Li et.al.|[2406.20095v1](http://arxiv.org/abs/2406.20095v1)|**[link](https://github.com/lostxine/llara)**|\n", "2406.20094": "|**2024-06-28**|**Scaling Synthetic Data Creation with 1,000,000,000 Personas**|Xin Chan 
et.al.|[2406.20094v1](http://arxiv.org/abs/2406.20094v1)|**[link](https://github.com/tencent-ailab/persona-hub)**|\n", "2406.20085": "|**2024-06-28**|**Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language**|Yicheng Chen et.al.|[2406.20085v1](http://arxiv.org/abs/2406.20085v1)|null|\n", "2406.20041": "|**2024-07-02**|**BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration**|Noel Crawford et.al.|[2406.20041v3](http://arxiv.org/abs/2406.20041v3)|null|\n", "2406.20015": "|**2024-06-28**|**ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models**|Yuxiang Zhang et.al.|[2406.20015v1](http://arxiv.org/abs/2406.20015v1)|**[link](https://github.com/toolbehonest/toolbehonest)**|\n", "2406.19967": "|**2024-06-28**|**Into the Unknown: Generating Geospatial Descriptions for New Environments**|Tzuf Paz-Argaman et.al.|[2406.19967v1](http://arxiv.org/abs/2406.19967v1)|null|\n", "2406.19820": "|**2024-06-28**|**BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering**|Zheng Chu et.al.|[2406.19820v1](http://arxiv.org/abs/2406.19820v1)|null|\n", "2406.19764": "|**2024-06-28**|**Belief Revision: The Adaptability of Large Language Models Reasoning**|Bryan Wilie et.al.|[2406.19764v1](http://arxiv.org/abs/2406.19764v1)|null|\n", "2406.19741": "|**2024-07-02**|**ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning**|Christopher E. 
Mower et.al.|[2406.19741v2](http://arxiv.org/abs/2406.19741v2)|**[link](https://github.com/huawei-noah/hebo)**|\n", "2406.19693": "|**2024-06-28**|**MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?**|Jinming Li et.al.|[2406.19693v1](http://arxiv.org/abs/2406.19693v1)|null|\n", "2406.19552": "|**2024-06-27**|**Rethinking harmless refusals when fine-tuning foundation models**|Florin Pop et.al.|[2406.19552v1](http://arxiv.org/abs/2406.19552v1)|null|\n", "2406.19545": "|**2024-06-27**|**Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations**|Ritam Dutt et.al.|[2406.19545v1](http://arxiv.org/abs/2406.19545v1)|**[link](https://github.com/shorit/ratdial)**|\n", "2406.19538": "|**2024-06-27**|**Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems**|Dan Schumacher et.al.|[2406.19538v1](http://arxiv.org/abs/2406.19538v1)|null|\n", "2406.19528": "|**2024-07-04**|**Using Large Language Models to Assist Video Content Analysis: An Exploratory Study of Short Videos on Depression**|Jiaying Liu et.al.|[2406.19528v2](http://arxiv.org/abs/2406.19528v2)|null|\n", "2406.19502": "|**2024-06-27**|**Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning**|Miyoung Ko et.al.|[2406.19502v1](http://arxiv.org/abs/2406.19502v1)|**[link](https://github.com/kaistai/knowledge-reasoning)**|\n", "2407.03211": "|**2024-07-03**|**How Does Quantization Affect Multilingual LLMs?**|Kelly Marchisio et.al.|[2407.03211v1](http://arxiv.org/abs/2407.03211v1)|null|\n", "2407.03203": "|**2024-07-03**|**TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts**|Ruida Wang et.al.|[2407.03203v1](http://arxiv.org/abs/2407.03203v1)|**[link](https://github.com/RickySkywalker/TheoremLlama)**|\n", "2407.03181": "|**2024-07-03**|**Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models**|Haritz 
Puerto et.al.|[2407.03181v1](http://arxiv.org/abs/2407.03181v1)|**[link](https://github.com/ukplab/arxiv2024-divergent-cot)**|\n", "2407.03169": "|**2024-07-03**|**Investigating Decoder-only Large Language Models for Speech-to-text Translation**|Chao-Wei Huang et.al.|[2407.03169v1](http://arxiv.org/abs/2407.03169v1)|null|\n", "2407.03129": "|**2024-07-03**|**Social Bias Evaluation for Large Language Models Requires Prompt Variations**|Rem Hida et.al.|[2407.03129v1](http://arxiv.org/abs/2407.03129v1)|**[link](https://github.com/rem-h4/llm_socialbias_prompts)**|\n", "2407.03061": "|**2024-07-03**|**ALTER: Augmentation for Large-Table-Based Reasoning**|Han Zhang et.al.|[2407.03061v1](http://arxiv.org/abs/2407.03061v1)|**[link](https://github.com/Hanzhang-lang/ALTER)**|\n", "2407.03008": "|**2024-07-03**|**Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering**|Zhaohe Liao et.al.|[2407.03008v1](http://arxiv.org/abs/2407.03008v1)|null|\n", "2407.03004": "|**2024-07-03**|**SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research**|Meghal Dani et.al.|[2407.03004v1](http://arxiv.org/abs/2407.03004v1)|null|\n", "2407.02977": "|**2024-07-03**|**Large Language Models as Evaluators for Scientific Synthesis**|Julia Evans et.al.|[2407.02977v1](http://arxiv.org/abs/2407.02977v1)|null|\n", "2407.02964": "|**2024-07-03**|**FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering**|Xiaochen Wang et.al.|[2407.02964v1](http://arxiv.org/abs/2407.02964v1)|null|\n", "2407.02936": "|**2024-07-03**|**GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models**|Zike Yuan et.al.|[2407.02936v1](http://arxiv.org/abs/2407.02936v1)|**[link](https://github.com/zikeyuan/gracore)**|\n", "2407.02833": "|**2024-07-03**|**LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason 
Generation**|Hongke Zhao et.al.|[2407.02833v1](http://arxiv.org/abs/2407.02833v1)|null|\n", "2407.02678": "|**2024-07-02**|**Reasoning in Large Language Models: A Geometric Perspective**|Romain Cosentino et.al.|[2407.02678v1](http://arxiv.org/abs/2407.02678v1)|null|\n", "2407.02606": "|**2024-07-02**|**An AI-Based System Utilizing IoT-Enabled Ambient Sensors and LLMs for Complex Activity Tracking**|Yuan Sun et.al.|[2407.02606v1](http://arxiv.org/abs/2407.02606v1)|null|\n", "2407.02473": "|**2024-07-02**|**Open Scene Graphs for Open World Object-Goal Navigation**|Joel Loo et.al.|[2407.02473v1](http://arxiv.org/abs/2407.02473v1)|null|\n", "2407.02392": "|**2024-07-02**|**TokenPacker: Efficient Visual Projector for Multimodal LLM**|Wentong Li et.al.|[2407.02392v1](http://arxiv.org/abs/2407.02392v1)|**[link](https://github.com/circleradon/tokenpacker)**|\n", "2407.02351": "|**2024-07-02**|**Generative Large Language Models in Automated Fact-Checking: A Survey**|Ivan Vykopal et.al.|[2407.02351v1](http://arxiv.org/abs/2407.02351v1)|null|\n", "2407.02340": "|**2024-07-02**|**RVISA: Reasoning and Verification for Implicit Sentiment Analysis**|Wenna Lai et.al.|[2407.02340v1](http://arxiv.org/abs/2407.02340v1)|null|\n", "2407.02310": "|**2024-07-02**|**Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks**|Adrian Rebmann et.al.|[2407.02310v1](http://arxiv.org/abs/2407.02310v1)|**[link](https://github.com/a-rebmann/llms4pm)**|\n", "2407.02273": "|**2024-07-02**|**Multilingual Trolley Problems for Language Models**|Zhijing Jin et.al.|[2407.02273v1](http://arxiv.org/abs/2407.02273v1)|**[link](https://github.com/causalNLP/moralmachine)**|\n", "2407.02220": "|**2024-07-04**|**Embodied AI in Mobile Robots: Coverage Path Planning with Large Language Models**|Xiangrui Kong et.al.|[2407.02220v2](http://arxiv.org/abs/2407.02220v2)|null|\n", "2407.02203": "|**2024-07-02**|**Automatic Adaptation Rule Optimization via Large Language Models**|Yusei Ishimizu 
et.al.|[2407.02203v1](http://arxiv.org/abs/2407.02203v1)|null|\n", "2407.01992": "|**2024-07-02**|**Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?**|Nishant Balepur et.al.|[2407.01992v1](http://arxiv.org/abs/2407.01992v1)|null|\n", "2407.01964": "|**2024-07-04**|**Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction**|Chenlong Deng et.al.|[2407.01964v3](http://arxiv.org/abs/2407.01964v3)|**[link](https://github.com/chenlongdeng/adapt)**|\n", "2407.01942": "|**2024-07-02**|**Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness**|Khyathi Raghavi Chandu et.al.|[2407.01942v1](http://arxiv.org/abs/2407.01942v1)|null|\n", "2407.01892": "|**2024-07-02**|**GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning**|Zhisheng Tang et.al.|[2407.01892v1](http://arxiv.org/abs/2407.01892v1)|**[link](https://github.com/jasontangzs0/GRASP)**|\n", "2407.01725": "|**2024-07-01**|**DiscoveryBench: Towards Data-Driven Discovery with Large Language Models**|Bodhisattwa Prasad Majumder et.al.|[2407.01725v1](http://arxiv.org/abs/2407.01725v1)|**[link](https://github.com/allenai/discoverybench)**|\n", "2407.01687": "|**2024-07-01**|**Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning**|Akshara Prabhakar et.al.|[2407.01687v1](http://arxiv.org/abs/2407.01687v1)|**[link](https://github.com/aksh555/deciphering_cot)**|\n", "2407.01527": "|**2024-07-01**|**KV Cache Compression, But What Must We Give in Return? 
A Comprehensive Benchmark of Long Context Capable Approaches**|Jiayi Yuan et.al.|[2407.01527v1](http://arxiv.org/abs/2407.01527v1)|null|\n", "2407.01525": "|**2024-07-02**|**Empowering 3D Visual Grounding with Reasoning Capabilities**|Chenming Zhu et.al.|[2407.01525v2](http://arxiv.org/abs/2407.01525v2)|null|\n", "2407.01455": "|**2024-07-01**|**TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind**|Guiyang Hou et.al.|[2407.01455v1](http://arxiv.org/abs/2407.01455v1)|null|\n", "2407.01231": "|**2024-07-01**|**MIRAI: Evaluating LLM Agents for Event Forecasting**|Chenchen Ye et.al.|[2407.01231v1](http://arxiv.org/abs/2407.01231v1)|null|\n", "2407.01212": "|**2024-07-01**|**EconNLI: Evaluating Large Language Models on Economics Reasoning**|Yue Guo et.al.|[2407.01212v1](http://arxiv.org/abs/2407.01212v1)|**[link](https://github.com/irenehere/econnli)**|\n", "2407.01093": "|**2024-07-01**|**IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation**|Senyu Han et.al.|[2407.01093v1](http://arxiv.org/abs/2407.01093v1)|**[link](https://github.com/OpenDFM/ibsen)**|\n", "2407.01046": "|**2024-07-03**|**FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models**|Yiyuan Li et.al.|[2407.01046v2](http://arxiv.org/abs/2407.01046v2)|**[link](https://github.com/nativeatom/frog)**|\n", "2407.01009": "|**2024-07-01**|**DynaThink: Fast or Slow? 
A Dynamic Decision-Making Framework for Large Language Models**|Jiabao Pan et.al.|[2407.01009v1](http://arxiv.org/abs/2407.01009v1)|null|\n", "2407.00995": "|**2024-07-01**|**Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense**|Yi Yu et.al.|[2407.00995v1](http://arxiv.org/abs/2407.00995v1)|null|\n", "2407.00993": "|**2024-07-01**|**Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents**|Shihan Deng et.al.|[2407.00993v1](http://arxiv.org/abs/2407.00993v1)|null|\n", "2407.00959": "|**2024-07-01**|**Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving**|Ran Tian et.al.|[2407.00959v1](http://arxiv.org/abs/2407.00959v1)|null|\n", "2407.00938": "|**2024-07-01**|**MalAlgoQA: A Pedagogical Approach for Evaluating Counterfactual Reasoning Abilities**|Naiming Liu et.al.|[2407.00938v1](http://arxiv.org/abs/2407.00938v1)|null|\n", "2407.00900": "|**2024-07-01**|**MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula**|Shubhra Mishra et.al.|[2407.00900v1](http://arxiv.org/abs/2407.00900v1)|**[link](https://github.com/gpoesia/mathcamps)**|\n", "2407.00869": "|**2024-07-01**|**Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks**|Yue Zhou et.al.|[2407.00869v1](http://arxiv.org/abs/2407.00869v1)|null|\n", "2407.00782": "|**2024-07-02**|**Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning**|Zimu Lu et.al.|[2407.00782v2](http://arxiv.org/abs/2407.00782v2)|**[link](https://github.com/mathllm/Step-Controlled_DPO)**|\n", "2407.00653": "|**2024-06-30**|**Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs**|Yifei Zhang et.al.|[2407.00653v1](http://arxiv.org/abs/2407.00653v1)|null|\n", "2407.00497": "|**2024-06-29**|**LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement**|Jiahao Ying 
et.al.|[2407.00497v1](http://arxiv.org/abs/2407.00497v1)|null|\n", "2407.00468": "|**2024-06-29**|**MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation**|Jinsheng Huang et.al.|[2407.00468v1](http://arxiv.org/abs/2407.00468v1)|**[link](https://github.com/chenllliang/mmevalpro)**|\n", "2407.00416": "|**2024-06-29**|**Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs**|Tamzeed Mahfuz et.al.|[2407.00416v1](http://arxiv.org/abs/2407.00416v1)|null|\n", "2407.00390": "|**2024-06-29**|**Advancing Process Verification for Large Language Models via Tree-Based Preference Learning**|Mingqian He et.al.|[2407.00390v1](http://arxiv.org/abs/2407.00390v1)|null|\n", "2407.00219": "|**2024-06-28**|**Evaluating Human Alignment and Model Faithfulness of LLM Rationale**|Mohsen Fayyaz et.al.|[2407.00219v1](http://arxiv.org/abs/2407.00219v1)|null|\n", "2407.00118": "|**2024-06-27**|**From Efficient Multimodal Models to World Models: A Survey**|Xinji Mai et.al.|[2407.00118v1](http://arxiv.org/abs/2407.00118v1)|null|\n", "2407.00092": "|**2024-06-26**|**Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges**|Mohammed Elhenawy et.al.|[2407.00092v1](http://arxiv.org/abs/2407.00092v1)|null|\n", "2407.04622": "|**2024-07-12**|**On scalable oversight with weak LLMs judging strong LLMs**|Zachary Kenton et.al.|[2407.04622v2](http://arxiv.org/abs/2407.04622v2)|null|\n", "2407.04489": "|**2024-07-05**|**Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model**|Duy M. H. 
Nguyen et.al.|[2407.04489v1](http://arxiv.org/abs/2407.04489v1)|null|\n", "2407.04420": "|**2024-07-05**|**cosmosage: A Natural-Language Assistant for Cosmologists**|Tijmen de Haan et.al.|[2407.04420v1](http://arxiv.org/abs/2407.04420v1)|**[link](https://github.com/tijmen/cosmosage)**|\n", "2407.04363": "|**2024-07-05**|**AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents**|Petr Anokhin et.al.|[2407.04363v1](http://arxiv.org/abs/2407.04363v1)|**[link](https://github.com/airi-institute/arigraph)**|\n", "2407.04362": "|**2024-07-05**|**Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR**|Shogo Morita et.al.|[2407.04362v1](http://arxiv.org/abs/2407.04362v1)|null|\n", "2407.04281": "|**2024-07-05**|**WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning**|Yiheng Li et.al.|[2407.04281v1](http://arxiv.org/abs/2407.04281v1)|null|\n", "2407.04078": "|**2024-07-09**|**DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning**|Chengpeng Li et.al.|[2407.04078v2](http://arxiv.org/abs/2407.04078v2)|**[link](https://github.com/chengpengli1003/dotamath)**|\n", "2407.04067": "|**2024-07-04**|**Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM**|Peiran Yao et.al.|[2407.04067v1](http://arxiv.org/abs/2407.04067v1)|**[link](https://github.com/U-Alberta/AMRS3)**|\n", "2407.03993": "|**2024-07-04**|**A Survey on Natural Language Counterfactual Generation**|Yongjie Wang et.al.|[2407.03993v1](http://arxiv.org/abs/2407.03993v1)|null|\n", "2407.03913": "|**2024-07-04**|**MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices**|Jiayi Zhang et.al.|[2407.03913v1](http://arxiv.org/abs/2407.03913v1)|null|\n", "2407.03778": "|**2024-07-04**|**From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI**|Stefanie Krause 
et.al.|[2407.03778v1](http://arxiv.org/abs/2407.03778v1)|null|\n", "2407.03687": "|**2024-07-04**|**STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering**|Zhenyu Bi et.al.|[2407.03687v1](http://arxiv.org/abs/2407.03687v1)|null|\n", "2407.03678": "|**2024-07-04**|**Improving Self Consistency in LLMs through Probabilistic Tokenization**|Ashutosh Sathe et.al.|[2407.03678v1](http://arxiv.org/abs/2407.03678v1)|null|\n", "2407.03651": "|**2024-07-14**|**Evaluating Language Model Context Windows: A \"Working Memory\" Test and Inference-time Correction**|Amanda Dsouza et.al.|[2407.03651v2](http://arxiv.org/abs/2407.03651v2)|**[link](https://github.com/snorkel-ai/long-context-eval)**|\n", "2407.03615": "|**2024-07-04**|**Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models**|Chang-Sheng Kao et.al.|[2407.03615v1](http://arxiv.org/abs/2407.03615v1)|**[link](https://github.com/MiuLab/VisualDialog)**|\n", "2407.03525": "|**2024-07-03**|**UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization**|Md Nayem Uddin et.al.|[2407.03525v1](http://arxiv.org/abs/2407.03525v1)|null|\n", "2407.03453": "|**2024-07-03**|**On Large Language Models in National Security Applications**|William N. 
Caballero et.al.|[2407.03453v1](http://arxiv.org/abs/2407.03453v1)|null|\n", "2407.07403": "|**2024-07-12**|**A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends**|Daizong Liu et.al.|[2407.07403v2](http://arxiv.org/abs/2407.07403v2)|**[link](https://github.com/liudaizong/awesome-lvlm-attack)**|\n", "2407.07370": "|**2024-07-10**|**LokiLM: Technical Report**|Justin Kiefel et.al.|[2407.07370v1](http://arxiv.org/abs/2407.07370v1)|null|\n", "2407.07330": "|**2024-07-10**|**Interpretable Differential Diagnosis with Dual-Inference Large Language Models**|Shuang Zhou et.al.|[2407.07330v1](http://arxiv.org/abs/2407.07330v1)|null|\n", "2407.07053": "|**2024-07-10**|**Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model**|Wenqi Zhang et.al.|[2407.07053v2](http://arxiv.org/abs/2407.07053v2)|**[link](https://github.com/zwq2018/multi-modal-self-instruct)**|\n", "2407.06902": "|**2024-07-09**|**Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective**|Shahana Ibrahim et.al.|[2407.06902v1](http://arxiv.org/abs/2407.06902v1)|null|\n", "2407.06438": "|**2024-07-08**|**A Single Transformer for Scalable Vision-Language Modeling**|Yangyi Chen et.al.|[2407.06438v1](http://arxiv.org/abs/2407.06438v1)|**[link](https://github.com/yangyi-chen/solo)**|\n", "2407.06309": "|**2024-07-08**|**Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps**|Chuanbo Hu et.al.|[2407.06309v1](http://arxiv.org/abs/2407.06309v1)|null|\n", "2407.06189": "|**2024-07-08**|**Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision**|Orr Zohar et.al.|[2407.06189v1](http://arxiv.org/abs/2407.06189v1)|**[link](https://github.com/orrzohar/Video-STaR)**|\n", "2407.06249": "|**2024-07-08**|**CodeUpdateArena: Benchmarking Knowledge Editing on API Updates**|Zeyu Leo Liu et.al.|[2407.06249v1](http://arxiv.org/abs/2407.06249v1)|null|\n", 
"2407.06025": "|**2024-07-08**|**iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement**|Aoyu Pang et.al.|[2407.06025v1](http://arxiv.org/abs/2407.06025v1)|**[link](https://github.com/traffic-alpha/illm-tsc)**|\n", "2407.06023": "|**2024-07-09**|**Distilling System 2 into System 1**|Ping Yu et.al.|[2407.06023v2](http://arxiv.org/abs/2407.06023v2)|null|\n", "2407.05925": "|**2024-07-08**|**Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop**|Anum Afzal et.al.|[2407.05925v1](http://arxiv.org/abs/2407.05925v1)|null|\n", "2407.05778": "|**2024-07-08**|**When is the consistent prediction likely to be a correct prediction?**|Alex Nguyen et.al.|[2407.05778v1](http://arxiv.org/abs/2407.05778v1)|null|\n", "2407.05750": "|**2024-07-08**|**Large Language Models Understand Layouts**|Weiming Li et.al.|[2407.05750v1](http://arxiv.org/abs/2407.05750v1)|null|\n", "2407.05734": "|**2024-07-08**|**Empirical Study of Symmetrical Reasoning in Conversational Chatbots**|Daniela N. 
Rim et.al.|[2407.05734v1](http://arxiv.org/abs/2407.05734v1)|null|\n", "2407.05682": "|**2024-07-08**|**Retrieved In-Context Principles from Previous Mistakes**|Hao Sun et.al.|[2407.05682v1](http://arxiv.org/abs/2407.05682v1)|null|\n", "2407.06241": "|**2024-07-08**|**SimPal: Towards a Meta-Conversational Framework to Understand Teacher's Instructional Goals for K-12 Physics**|Effat Farhana et.al.|[2407.06241v1](http://arxiv.org/abs/2407.06241v1)|null|\n", "2407.05463": "|**2024-07-07**|**Training Task Experts through Retrieval Based Distillation**|Jiaxin Ge et.al.|[2407.05463v1](http://arxiv.org/abs/2407.05463v1)|null|\n", "2407.05434": "|**2024-07-07**|**LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models**|Weizhi Tang et.al.|[2407.05434v1](http://arxiv.org/abs/2407.05434v1)|**[link](https://github.com/rutatang/ltlbench)**|\n", "2407.05413": "|**2024-07-10**|**SBoRA: Low-Rank Adaptation with Regional Weight Updates**|Lai-Man Po et.al.|[2407.05413v2](http://arxiv.org/abs/2407.05413v2)|**[link](https://github.com/cityuhkai/sbora)**|\n", "2407.05365": "|**2024-07-07**|**ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models**|Xiyuan Zhou et.al.|[2407.05365v1](http://arxiv.org/abs/2407.05365v1)|**[link](https://github.com/xiyuan-zhou/elecbench-a-power-dispatch-evaluation-benchmark-for-large-language-models)**|\n", "2407.05355": "|**2024-07-07**|**VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool**|Yan Wang et.al.|[2407.05355v1](http://arxiv.org/abs/2407.05355v1)|null|\n", "2407.05291": "|**2024-07-07**|**WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks**|L\u00e9o Boisvert et.al.|[2407.05291v1](http://arxiv.org/abs/2407.05291v1)|**[link](https://github.com/servicenow/workarena)**|\n", "2407.05271": "|**2024-07-07**|**Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions**|Zhiwen You 
et.al.|[2407.05271v1](http://arxiv.org/abs/2407.05271v1)|**[link](https://github.com/zhiwenyou103/Beyond-Binary-Gender-Labels)**|\n", "2407.05153": "|**2024-07-06**|**Lucy: Think and Reason to Solve Text-to-SQL**|Nina Narodytska et.al.|[2407.05153v1](http://arxiv.org/abs/2407.05153v1)|null|\n", "2407.05134": "|**2024-07-06**|**Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?**|Kuei-Chun Kao et.al.|[2407.05134v1](http://arxiv.org/abs/2407.05134v1)|null|\n", "2407.05013": "|**2024-07-06**|**Progress or Regress? Self-Improvement Reversal in Post-training**|Ting Wu et.al.|[2407.05013v1](http://arxiv.org/abs/2407.05013v1)|null|\n", "2407.04973": "|**2024-07-06**|**LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts**|Yijia Xiao et.al.|[2407.04973v1](http://arxiv.org/abs/2407.04973v1)|**[link](https://github.com/yijia-xiao/logicvista)**|\n", "2407.04960": "|**2024-07-06**|**MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models**|Yunjia Xi et.al.|[2407.04960v1](http://arxiv.org/abs/2407.04960v1)|**[link](https://github.com/mindspore-lab/models)**|\n", "2407.04915": "|**2024-07-06**|**Safe Generative Chats in a WhatsApp Intelligent Tutoring System**|Zachary Levonian et.al.|[2407.04915v1](http://arxiv.org/abs/2407.04915v1)|null|\n", "2407.04899": "|**2024-07-06**|**Algorithmic Language Models with Neurally Compiled Libraries**|Lucas Saldyt et.al.|[2407.04899v1](http://arxiv.org/abs/2407.04899v1)|null|\n", "2407.08739": "|**2024-07-11**|**MAVIS: Mathematical Visual Instruction Tuning**|Renrui Zhang et.al.|[2407.08739v1](http://arxiv.org/abs/2407.08739v1)|**[link](https://github.com/zrrskywalker/mavis)**|\n", "2407.08735": "|**2024-07-11**|**Real-Time Anomaly Detection and Reactive Planning with Large Language Models**|Rohan Sinha et.al.|[2407.08735v1](http://arxiv.org/abs/2407.08735v1)|null|\n", "2407.08733": "|**2024-07-11**|**Is Your Model Really A 
Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist**|Zihao Zhou et.al.|[2407.08733v1](http://arxiv.org/abs/2407.08733v1)|null|\n", "2407.08713": "|**2024-07-11**|**GTA: A Benchmark for General Tool Agents**|Jize Wang et.al.|[2407.08713v1](http://arxiv.org/abs/2407.08713v1)|**[link](https://github.com/open-compass/GTA)**|\n", "2407.08694": "|**2024-07-11**|**Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight**|Zhiqiang Xie et.al.|[2407.08694v1](http://arxiv.org/abs/2407.08694v1)|null|\n", "2407.08521": "|**2024-07-15**|**Emergent Visual-Semantic Hierarchies in Image-Text Representations**|Morris Alper et.al.|[2407.08521v2](http://arxiv.org/abs/2407.08521v2)|null|\n", "2407.08516": "|**2024-07-16**|**Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents**|Haoyi Xiong et.al.|[2407.08516v2](http://arxiv.org/abs/2407.08516v2)|null|\n", "2407.08495": "|**2024-07-11**|**Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024**|Ilias Chalkidis et.al.|[2407.08495v1](http://arxiv.org/abs/2407.08495v1)|null|\n", "2407.08488": "|**2024-07-11**|**Lynx: An Open Source Hallucination Evaluation Model**|Selvan Sunitha Ravi et.al.|[2407.08488v1](http://arxiv.org/abs/2407.08488v1)|null|\n", "2407.08348": "|**2024-07-17**|**Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On**|Liang Zeng et.al.|[2407.08348v2](http://arxiv.org/abs/2407.08348v2)|null|\n", "2407.08273": "|**2024-07-12**|**RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL**|Zhenhe Wu et.al.|[2407.08273v2](http://arxiv.org/abs/2407.08273v2)|null|\n", "2407.08150": "|**2024-07-16**|**Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding**|Minghui Wu 
et.al.|[2407.08150v2](http://arxiv.org/abs/2407.08150v2)|null|\n", "2407.08044": "|**2024-07-10**|**RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization**|Xijie Huang et.al.|[2407.08044v1](http://arxiv.org/abs/2407.08044v1)|**[link](https://github.com/huangowen/rolora)**|\n", "2407.08029": "|**2024-07-10**|**A Critical Review of Causal Reasoning Benchmarks for Large Language Models**|Linying Yang et.al.|[2407.08029v1](http://arxiv.org/abs/2407.08029v1)|null|\n", "2407.07913": "|**2024-07-04**|**CaseGPT: a case reasoning framework based on language models and retrieval-augmented generation**|Rui Yang et.al.|[2407.07913v1](http://arxiv.org/abs/2407.07913v1)|null|\n", "2407.09295": "|**2024-07-17**|**Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study**|Yulong Yang et.al.|[2407.09295v2](http://arxiv.org/abs/2407.09295v2)|null|\n", "2407.09292": "|**2024-07-17**|**Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models**|Dong Shu et.al.|[2407.09292v2](http://arxiv.org/abs/2407.09292v2)|null|\n", "2407.09281": "|**2024-07-12**|**Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning**|Thuy Ngoc Nguyen et.al.|[2407.09281v1](http://arxiv.org/abs/2407.09281v1)|null|\n", "2407.09136": "|**2024-07-12**|**Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors**|Nico Daheim et.al.|[2407.09136v1](http://arxiv.org/abs/2407.09136v1)|**[link](https://github.com/eth-lre/verify-then-generate)**|\n", "2407.09096": "|**2024-07-12**|**STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs**|Yiheng Huang et.al.|[2407.09096v1](http://arxiv.org/abs/2407.09096v1)|null|\n", "2407.09025": "|**2024-07-12**|**SpreadsheetLLM: Encoding Spreadsheets for Large Language Models**|Yuzhang Tian 
et.al.|[2407.09025v1](http://arxiv.org/abs/2407.09025v1)|null|\n", "2407.08922": "|**2024-07-12**|**Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?**|Yingming Pu et.al.|[2407.08922v1](http://arxiv.org/abs/2407.08922v1)|**[link](https://github.com/Dandelionym/llm_for_mechanisms)**|\n", "2407.08842": "|**2024-07-11**|**Evaluating Nuanced Bias in Large Language Model Free Response Answers**|Jennifer Healey et.al.|[2407.08842v1](http://arxiv.org/abs/2407.08842v1)|null|\n", "2407.10947": "|**2024-07-15**|**Can Textual Semantics Mitigate Sounding Object Segmentation Preference?**|Yaoting Wang et.al.|[2407.10947v1](http://arxiv.org/abs/2407.10947v1)|**[link](https://github.com/gewu-lab/sounding-object-segmentation-preference)**|\n", "2407.10805": "|**2024-07-15**|**Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval**|Shengjie Ma et.al.|[2407.10805v1](http://arxiv.org/abs/2407.10805v1)|null|\n", "2407.10795": "|**2024-07-15**|**Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping**|Wenhao Zhu et.al.|[2407.10795v1](http://arxiv.org/abs/2407.10795v1)|**[link](https://github.com/njunlp/skiplayercd)**|\n", "2407.10794": "|**2024-07-15**|**Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education**|Rui Yang et.al.|[2407.10794v1](http://arxiv.org/abs/2407.10794v1)|**[link](https://github.com/irenezihuili/cgprompt)**|\n", "2407.10718": "|**2024-07-16**|**Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning**|Yulong Wang et.al.|[2407.10718v2](http://arxiv.org/abs/2407.10718v2)|**[link](https://github.com/ag2s1/sibyl-system)**|\n", "2407.10671": "|**2024-07-18**|**Qwen2 Technical Report**|An Yang et.al.|[2407.10671v3](http://arxiv.org/abs/2407.10671v3)|**[link](https://github.com/qwenlm/qwen2)**|\n", "2407.10362": "|**2024-07-17**|**LAB-Bench: Measuring 
Capabilities of Language Models for Biology Research**|Jon M. Laurent et.al.|[2407.10362v3](http://arxiv.org/abs/2407.10362v3)|null|\n", "2407.10299": "|**2024-07-20**|**Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models**|Yuchen Yang et.al.|[2407.10299v2](http://arxiv.org/abs/2407.10299v2)|**[link](https://github.com/Yuchen413/AnomalyRuler)**|\n", "2407.10245": "|**2024-07-14**|**GenSco: Can Question Decomposition based Passage Alignment improve Question Answering?**|Barah Fazili et.al.|[2407.10245v1](http://arxiv.org/abs/2407.10245v1)|null|\n", "2407.10241": "|**2024-07-20**|**BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs**|Zhiting Fan et.al.|[2407.10241v2](http://arxiv.org/abs/2407.10241v2)|null|\n", "2407.10167": "|**2024-07-22**|**Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model**|Xunyu Zhu et.al.|[2407.10167v2](http://arxiv.org/abs/2407.10167v2)|null|\n", "2407.10162": "|**2024-07-14**|**ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning**|Zhongsheng Wang et.al.|[2407.10162v1](http://arxiv.org/abs/2407.10162v1)|**[link](https://github.com/strong-ai-lab/chatlogic)**|\n", "2407.10086": "|**2024-07-19**|**Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine**|Omid Rohanian et.al.|[2407.10086v2](http://arxiv.org/abs/2407.10086v2)|null|\n", "2407.10081": "|**2024-07-14**|**All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era**|Bo Chen et.al.|[2407.10081v1](http://arxiv.org/abs/2407.10081v1)|null|\n", "2407.09887": "|**2024-07-13**|**Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis**|Zhicheng Yang et.al.|[2407.09887v1](http://arxiv.org/abs/2407.09887v1)|**[link](https://github.com/yangzhch6/ReSocratic)**|\n", "2407.09801": "|**2024-07-13**|**IoT-LM: Large Multisensory Language Models for the Internet of Things**|Shentong 
Mo et.al.|[2407.09801v1](http://arxiv.org/abs/2407.09801v1)|**[link](https://github.com/multi-iot/multiiot)**|\n", "2407.11963": "|**2024-07-16**|**NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?**|Mo Li et.al.|[2407.11963v1](http://arxiv.org/abs/2407.11963v1)|**[link](https://github.com/open-compass/opencompass)**|\n", "2407.11712": "|**2024-07-17**|**Harnessing Large Language Models for Multimodal Product Bundling**|Xiaohao Liu et.al.|[2407.11712v2](http://arxiv.org/abs/2407.11712v2)|null|\n", "2407.11638": "|**2024-07-16**|**A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting**|He Chang et.al.|[2407.11638v1](http://arxiv.org/abs/2407.11638v1)|null|\n", "2407.11511": "|**2024-07-16**|**Reasoning with Large Language Models, a Survey**|Aske Plaat et.al.|[2407.11511v1](http://arxiv.org/abs/2407.11511v1)|null|\n", "2407.11417": "|**2024-07-16**|**SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions**|Shicheng Liu et.al.|[2407.11417v1](http://arxiv.org/abs/2407.11417v1)|null|\n", "2407.11373": "|**2024-07-19**|**Reliable Reasoning Beyond Natural Language**|Nasim Borazjanizadeh et.al.|[2407.11373v2](http://arxiv.org/abs/2407.11373v2)|null|\n", "2407.11325": "|**2024-07-16**|**VISA: Reasoning Video Object Segmentation via Large Language Models**|Cilin Yan et.al.|[2407.11325v1](http://arxiv.org/abs/2407.11325v1)|**[link](https://github.com/cilinyan/revos-api)**|\n", "2407.11240": "|**2024-07-15**|**Making New Connections: LLMs as Puzzle Generators for The New York Times' Connections Word Game**|Tim Merino et.al.|[2407.11240v1](http://arxiv.org/abs/2407.11240v1)|null|\n", "2407.11068": "|**2024-07-17**|**Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay**|Gon\u00e7alo Hora de Carvalho et.al.|[2407.11068v2](http://arxiv.org/abs/2407.11068v2)|**[link](https://github.com/child-play-neurips/child-play)**|\n", "2407.13761": 
"|**2024-07-18**|**SegPoint: Segment Any Point Cloud via Large Language Model**|Shuting He et.al.|[2407.13761v1](http://arxiv.org/abs/2407.13761v1)|null|\n", "2407.13692": "|**2024-07-18**|**Prover-Verifier Games improve legibility of LLM outputs**|Jan Hendrik Kirchner et.al.|[2407.13692v1](http://arxiv.org/abs/2407.13692v1)|null|\n", "2407.13647": "|**2024-07-18**|**Weak-to-Strong Reasoning**|Yuqing Yang et.al.|[2407.13647v1](http://arxiv.org/abs/2407.13647v1)|**[link](https://github.com/gair-nlp/weak-to-strong-reasoning)**|\n", "2407.13598": "|**2024-07-18**|**KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration**|Youfu Yan et.al.|[2407.13598v1](http://arxiv.org/abs/2407.13598v1)|null|\n", "2407.13505": "|**2024-07-18**|**Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation**|Hassan Ali et.al.|[2407.13505v1](http://arxiv.org/abs/2407.13505v1)|null|\n", "2407.13490": "|**2024-07-18**|**Combining Constraint Programming Reasoning with Large Language Model Predictions**|Florian R\u00e9gin et.al.|[2407.13490v1](http://arxiv.org/abs/2407.13490v1)|null|\n", "2407.13442": "|**2024-07-18**|**BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models**|Moon Ye-Bin et.al.|[2407.13442v1](http://arxiv.org/abs/2407.13442v1)|null|\n", "2407.13331": "|**2024-07-18**|**Reconstruct the Pruned Model without Any Retraining**|Pingjie Wang et.al.|[2407.13331v1](http://arxiv.org/abs/2407.13331v1)|null|\n", "2407.13301": "|**2024-07-18**|**CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis**|Junying Chen et.al.|[2407.13301v1](http://arxiv.org/abs/2407.13301v1)|null|\n", "2407.13248": "|**2024-07-18**|**Are Large Language Models Capable of Generating Human-Level Narratives?**|Yufei Tian et.al.|[2407.13248v1](http://arxiv.org/abs/2407.13248v1)|null|\n", "2407.13094": "|**2024-07-18**|**Rethinking Video-Text Understanding: Retrieval from 
Counterfactually Augmented Data**|Wufei Ma et.al.|[2407.13094v1](http://arxiv.org/abs/2407.13094v1)|null|\n", "2407.12979": "|**2024-07-17**|**Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models**|Sadegh Mahdavi et.al.|[2407.12979v1](http://arxiv.org/abs/2407.12979v1)|null|\n", "2407.12725": "|**2024-07-17**|**Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?**|Ben Yao et.al.|[2407.12725v1](http://arxiv.org/abs/2407.12725v1)|null|\n", "2407.12532": "|**2024-07-17**|**Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models**|Xihe Qiu et.al.|[2407.12532v1](http://arxiv.org/abs/2407.12532v1)|null|\n", "2407.12522": "|**2024-07-17**|**Struct-X: Enhancing Large Language Models Reasoning with Structured Data**|Xiaoyu Tan et.al.|[2407.12522v1](http://arxiv.org/abs/2407.12522v1)|null|\n", "2407.12504": "|**2024-07-17**|**Case2Code: Learning Inductive Reasoning with Synthetic Data**|Yunfan Shao et.al.|[2407.12504v1](http://arxiv.org/abs/2407.12504v1)|**[link](https://github.com/choosewhatulike/case2code)**|\n", "2407.12498": "|**2024-07-17**|**Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning**|Mustafa Dogan et.al.|[2407.12498v1](http://arxiv.org/abs/2407.12498v1)|null|\n", "2407.12435": "|**2024-07-17**|**F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions**|Jie Yang et.al.|[2407.12435v1](http://arxiv.org/abs/2407.12435v1)|null|\n", "2407.12402": "|**2024-07-17**|**TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish**|Arda Y\u00fcksel et.al.|[2407.12402v1](http://arxiv.org/abs/2407.12402v1)|null|\n", "2407.12397": "|**2024-07-17**|**Mamba-PTQ: Outlier Channels in Recurrent Large Language Models**|Alessandro Pierro et.al.|[2407.12397v1](http://arxiv.org/abs/2407.12397v1)|null|\n", "2407.12366": "|**2024-07-17**|**NavGPT-2: Unleashing 
Navigational Reasoning Capability for Large Vision-Language Models**|Gengze Zhou et.al.|[2407.12366v1](http://arxiv.org/abs/2407.12366v1)|**[link](https://github.com/gengzezhou/navgpt-2)**|\n", "2407.12341": "|**2024-07-17**|**LLM-based query paraphrasing for video search**|Jiaxin Wu et.al.|[2407.12341v1](http://arxiv.org/abs/2407.12341v1)|null|\n", "2407.12108": "|**2024-07-16**|**Private prediction for large-scale synthetic text generation**|Kareem Amin et.al.|[2407.12108v1](http://arxiv.org/abs/2407.12108v1)|null|\n", "2407.12101": "|**2024-07-16**|**Better RAG using Relevant Information Gain**|Marc Pickett et.al.|[2407.12101v1](http://arxiv.org/abs/2407.12101v1)|**[link](https://github.com/EmergenceAI/dartboard)**|\n", "2407.12883": "|**2024-07-16**|**BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval**|Hongjin Su et.al.|[2407.12883v1](http://arxiv.org/abs/2407.12883v1)|null|\n", "2407.12879": "|**2024-07-16**|**Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection**|Ye Jiang et.al.|[2407.12879v1](http://arxiv.org/abs/2407.12879v1)|null|\n", "2407.12877": "|**2024-07-16**|**Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning**|Yaswanth Narsupalli et.al.|[2407.12877v1](http://arxiv.org/abs/2407.12877v1)|null|\n", "2407.12863": "|**2024-07-12**|**Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models**|Jung Hyun Lee et.al.|[2407.12863v1](http://arxiv.org/abs/2407.12863v1)|null|\n", "2407.12862": "|**2024-07-10**|**Analyzing Large language models chatbots: An experimental approach using a probability test**|Melise Peruchini et.al.|[2407.12862v1](http://arxiv.org/abs/2407.12862v1)|null|\n", "2407.15720": "|**2024-07-22**|**Do Large Language Models Have Compositional Ability? 
An Investigation into Limitations and Scalability**|Zhuoyan Xu et.al.|[2407.15720v1](http://arxiv.org/abs/2407.15720v1)|**[link](https://github.com/oliverxuzy/llm_compose)**|\n", "2407.15716": "|**2024-07-22**|**CrashEventLLM: Predicting System Crashes with Large Language Models**|Priyanka Mudgal et.al.|[2407.15716v1](http://arxiv.org/abs/2407.15716v1)|null|\n", "2407.15680": "|**2024-07-22**|**HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning**|Zhecan Wang et.al.|[2407.15680v1](http://arxiv.org/abs/2407.15680v1)|null|\n", "2407.15017": "|**2024-07-22**|**Knowledge Mechanisms in Large Language Models: A Survey and Perspective**|Mengru Wang et.al.|[2407.15017v1](http://arxiv.org/abs/2407.15017v1)|null|\n", "2407.15360": "|**2024-07-22**|**Dissecting Multiplication in Transformers: Insights into LLMs**|Luyu Qiu et.al.|[2407.15360v1](http://arxiv.org/abs/2407.15360v1)|null|\n", "2407.15291": "|**2024-07-21**|**Evidence-Based Temporal Fact Verification**|Anab Maulana Barik et.al.|[2407.15291v1](http://arxiv.org/abs/2407.15291v1)|null|\n", "2407.15272": "|**2024-07-21**|**MIBench: Evaluating Multimodal Large Language Models over Multiple Images**|Haowei Liu et.al.|[2407.15272v1](http://arxiv.org/abs/2407.15272v1)|null|\n", "2407.15073": "|**2024-07-21**|**Multi-Agent Causal Discovery Using Large Language Models**|Hao Duong Le et.al.|[2407.15073v1](http://arxiv.org/abs/2407.15073v1)|null|\n", "2407.14985": "|**2024-07-20**|**Generalization v.s. 
Memorization: Tracing Language Models' Capabilities Back to Pretraining Data**|Antonis Antoniades et.al.|[2407.14985v1](http://arxiv.org/abs/2407.14985v1)|null|\n", "2407.14926": "|**2024-07-20**|**TraveLLM: Could you plan my new public transit route in face of a network disruption?**|Bowen Fang et.al.|[2407.14926v1](http://arxiv.org/abs/2407.14926v1)|null|\n", "2407.14845": "|**2024-07-20**|**Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models**|Ze Yu Zhang et.al.|[2407.14845v1](http://arxiv.org/abs/2407.14845v1)|null|\n", "2407.14834": "|**2024-07-20**|**Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators**|Harsh Lunia et.al.|[2407.14834v1](http://arxiv.org/abs/2407.14834v1)|null|\n", "2407.14788": "|**2024-07-20**|**On the Design and Analysis of LLM-Based Algorithms**|Yanxi Chen et.al.|[2407.14788v1](http://arxiv.org/abs/2407.14788v1)|**[link](https://github.com/modelscope/agentscope)**|\n", "2407.14609": "|**2024-07-19**|**Adversarial Databases Improve Success in Retrieval-based Large Language Models**|Sean Wu et.al.|[2407.14609v1](http://arxiv.org/abs/2407.14609v1)|null|\n", "2407.14507": "|**2024-07-19**|**Internal Consistency and Self-Feedback in Large Language Models: A Survey**|Xun Liang et.al.|[2407.14507v1](http://arxiv.org/abs/2407.14507v1)|**[link](https://github.com/iaar-shanghai/icsfsurvey)**|\n", "2407.14506": "|**2024-07-19**|**On Pre-training of Multimodal Language Models Customized for Chart Understanding**|Wan-Cyuan Fan et.al.|[2407.14506v1](http://arxiv.org/abs/2407.14506v1)|null|\n", "2407.14487": "|**2024-07-19**|**Evaluating the Reliability of Self-Explanations in Large Language Models**|Korbinian Randl et.al.|[2407.14487v1](http://arxiv.org/abs/2407.14487v1)|**[link](https://github.com/k-randl/self-explaining_llms)**|\n", "2407.14279": "|**2024-07-19**|**OpenSU3D: Open World 3D Scene Understanding using Foundation Models**|Rafay Mohiuddin 
et.al.|[2407.14279v1](http://arxiv.org/abs/2407.14279v1)|null|\n", "2407.14192": "|**2024-07-19**|**LeKUBE: A Legal Knowledge Update BEnchmark**|Changyue Wang et.al.|[2407.14192v1](http://arxiv.org/abs/2407.14192v1)|null|\n", "2407.14138": "|**2024-07-19**|**Visual Text Generation in the Wild**|Yuanzhi Zhu et.al.|[2407.14138v1](http://arxiv.org/abs/2407.14138v1)|**[link](https://github.com/alibabaresearch/advancedliteratemachinery)**|\n", "2407.13989": "|**2024-07-19**|**Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models**|Quan Li et.al.|[2407.13989v1](http://arxiv.org/abs/2407.13989v1)|null|\n", "2407.13943": "|**2024-07-18**|**Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction**|Suma Bailis et.al.|[2407.13943v1](http://arxiv.org/abs/2407.13943v1)|null|\n", "2407.13909": "|**2024-07-18**|**PRAGyan -- Connecting the Dots in Tweets**|Rahul Ravi et.al.|[2407.13909v1](http://arxiv.org/abs/2407.13909v1)|null|\n", "2407.14562": "|**2024-07-18**|**Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Though**|Xiaoyu Tan et.al.|[2407.14562v1](http://arxiv.org/abs/2407.14562v1)|null|\n", "2407.13851": "|**2024-07-18**|**X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs**|Sirnam Swetha et.al.|[2407.13851v1](http://arxiv.org/abs/2407.13851v1)|null|\n", "2407.14500": "|**2024-07-18**|**ViLLa: Video Reasoning Segmentation with Large Language Model**|Rongkun Zheng et.al.|[2407.14500v1](http://arxiv.org/abs/2407.14500v1)|**[link](https://github.com/rkzheng99/villa)**|\n", "2407.13811": "|**2024-07-18**|**Which objects help me to act effectively? 
Reasoning about physically-grounded affordances**|Anne Kemmeren et.al.|[2407.13811v1](http://arxiv.org/abs/2407.13811v1)|null|\n", "2407.16222": "|**2024-07-23**|**PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment**|Jiahuan Li et.al.|[2407.16222v1](http://arxiv.org/abs/2407.16222v1)|null|\n", "2407.16205": "|**2024-07-23**|**Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models**|Shi Lin et.al.|[2407.16205v1](http://arxiv.org/abs/2407.16205v1)|null|\n", "2407.16160": "|**2024-07-23**|**UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models**|Liu Qi et.al.|[2407.16160v1](http://arxiv.org/abs/2407.16160v1)|null|\n", "2407.16030": "|**2024-07-22**|**Enhancing Temporal Understanding in LLMs for Semi-structured Tables**|Irwin Deng et.al.|[2407.16030v1](http://arxiv.org/abs/2407.16030v1)|null|\n", "2407.17404": "|**2024-07-24**|**Grammar-based Game Description Generation using Large Language Models**|Tsunehiko Tanaka et.al.|[2407.17404v1](http://arxiv.org/abs/2407.17404v1)|null|\n", "2407.17349": "|**2024-07-24**|**Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching**|Yuyang Ding et.al.|[2407.17349v1](http://arxiv.org/abs/2407.17349v1)|null|\n", "2407.17227": "|**2024-07-24**|**LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover**|Zijian Wu et.al.|[2407.17227v1](http://arxiv.org/abs/2407.17227v1)|null|\n", "2407.17190": "|**2024-07-24**|**Fusing LLMs and KGs for Formal Causal Reasoning behind Financial Risk Contagion**|Guanyuan Yu et.al.|[2407.17190v1](http://arxiv.org/abs/2407.17190v1)|null|\n", "2407.17115": "|**2024-07-24**|**Reinforced Prompt Personalization for Recommendation with Large Language Models**|Wenyu Mao et.al.|[2407.17115v1](http://arxiv.org/abs/2407.17115v1)|**[link](https://github.com/maowenyu-11/rpp)**|\n", "2407.16994": "|**2024-07-24**|**A Voter-Based Stochastic Rejection-Method Framework for 
Asymptotically Safe Language Model Outputs**|Jake R. Watts et.al.|[2407.16994v1](http://arxiv.org/abs/2407.16994v1)|null|\n", "2407.16931": "|**2024-07-24**|**ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering**|Xiuying Chen et.al.|[2407.16931v1](http://arxiv.org/abs/2407.16931v1)|null|\n", "2407.16837": "|**2024-07-23**|**CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs**|Jihyung Kil et.al.|[2407.16837v1](http://arxiv.org/abs/2407.16837v1)|**[link](https://github.com/raptormai/compbench)**|\n"}, "LLM - Uncertainty": {"2311.15451": "|**2023-11-26**|**Uncertainty-aware Language Modeling for Selective Question Answering**|Qi Yang et.al.|[2311.15451v1](http://arxiv.org/abs/2311.15451v1)|null|\n", "2311.15180": "|**2023-11-26**|**Benchmarking Large Language Model Volatility**|Boyang Yu et.al.|[2311.15180v1](http://arxiv.org/abs/2311.15180v1)|null|\n", "2311.13230": "|**2023-11-22**|**Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus**|Tianhang Zhang et.al.|[2311.13230v1](http://arxiv.org/abs/2311.13230v1)|**[link](https://github.com/zthang/focus)**|\n", "2311.09731": "|**2024-02-16**|**Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge**|Genglin Liu et.al.|[2311.09731v2](http://arxiv.org/abs/2311.09731v2)|**[link](https://github.com/genglinliu/unknownbench)**|\n", "2311.09677": "|**2023-11-16**|**R-Tuning: Teaching Large Language Models to Refuse Unknown Questions**|Hanning Zhang et.al.|[2311.09677v1](http://arxiv.org/abs/2311.09677v1)|**[link](https://github.com/shizhediao/r-tuning)**|\n", "2311.09358": "|**2023-11-15**|**Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science**|Sridevi Wagle et.al.|[2311.09358v1](http://arxiv.org/abs/2311.09358v1)|**[link](https://github.com/pnnl/expert2)**|\n", "2311.09336": "|**2023-11-15**|**Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained 
Actionable Feedback**|Wenda Xu et.al.|[2311.09336v1](http://arxiv.org/abs/2311.09336v1)|null|\n", "2311.08718": "|**2023-11-15**|**Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling**|Bairu Hou et.al.|[2311.08718v1](http://arxiv.org/abs/2311.08718v1)|**[link](https://github.com/ucsb-nlp-chang/llm_uncertainty)**|\n", "2311.08692": "|**2023-11-15**|**Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models**|Keming Lu et.al.|[2311.08692v1](http://arxiv.org/abs/2311.08692v1)|null|\n", "2311.07383": "|**2023-11-13**|**LM-Polygraph: Uncertainty Estimation for Language Models**|Ekaterina Fadeeva et.al.|[2311.07383v1](http://arxiv.org/abs/2311.07383v1)|null|\n", "2311.06697": "|**2023-11-12**|**Trusted Source Alignment in Large Language Models**|Vasilisa Bashlovkina et.al.|[2311.06697v1](http://arxiv.org/abs/2311.06697v1)|null|\n", "2311.05965": "|**2023-11-10**|**Large Language Models are Zero Shot Hypothesis Proposers**|Biqing Qi et.al.|[2311.05965v1](http://arxiv.org/abs/2311.05965v1)|null|\n", "2311.03783": "|**2023-11-07**|**Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI**|Song Yaoxian et.al.|[2311.03783v1](http://arxiv.org/abs/2311.03783v1)|null|\n", "2311.03533": "|**2023-11-06**|**Quantifying Uncertainty in Natural Language Explanations of Large Language Models**|Sree Harsha Tanneru et.al.|[2311.03533v1](http://arxiv.org/abs/2311.03533v1)|**[link](https://github.com/harsha070/uncertainty-quantification-nle)**|\n", "2311.00288": "|**2023-11-01**|**Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks**|Po-Nien Kung et.al.|[2311.00288v1](http://arxiv.org/abs/2311.00288v1)|**[link](https://github.com/pluslabnlp/active-it)**|\n", "2310.20624": "|**2023-10-31**|**LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B**|Simon Lermen et.al.|[2310.20624v1](http://arxiv.org/abs/2310.20624v1)|null|\n", "2310.20046": 
"|**2023-10-30**|**Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection**|Costas Mavromatis et.al.|[2310.20046v1](http://arxiv.org/abs/2310.20046v1)|**[link](https://github.com/amazon-science/adaptive-in-context-learning)**|\n", "2310.18365": "|**2023-11-18**|**Using GPT-4 to Augment Unbalanced Data for Automatic Scoring**|Luyang Fang et.al.|[2310.18365v2](http://arxiv.org/abs/2310.18365v2)|null|\n", "2310.15638": "|**2023-10-24**|**CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation**|Minzhi Li et.al.|[2310.15638v1](http://arxiv.org/abs/2310.15638v1)|**[link](https://github.com/salt-nlp/coannotating)**|\n", "2310.13976": "|**2023-11-01**|**Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs**|Chetan Arora et.al.|[2310.13976v2](http://arxiv.org/abs/2310.13976v2)|null|\n", "2310.12808": "|**2023-10-19**|**Model Merging by Uncertainty-Based Gradient Matching**|Nico Daheim et.al.|[2310.12808v1](http://arxiv.org/abs/2310.12808v1)|null|\n", "2310.12663": "|**2023-10-19**|**Knowledge from Uncertainty in Evidential Deep Learning**|Cai Davies et.al.|[2310.12663v1](http://arxiv.org/abs/2310.12663v1)|null|\n", "2310.12523": "|**2023-10-19**|**Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework**|Imdad Ullah et.al.|[2310.12523v1](http://arxiv.org/abs/2310.12523v1)|null|\n", "2310.10544": "|**2023-11-25**|**Use of probabilistic phrases in a coordination game: human versus GPT-4**|Laurence T Maloney et.al.|[2310.10544v3](http://arxiv.org/abs/2310.10544v3)|null|\n", "2310.10317": "|**2023-10-16**|**Stochastic spin-orbit-torque synapse and its application in uncertainty quantification**|Cen Wang et.al.|[2310.10317v1](http://arxiv.org/abs/2310.10317v1)|null|\n", "2310.08027": "|**2023-10-12**|**Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection**|Yi Dai 
et.al.|[2310.08027v1](http://arxiv.org/abs/2310.08027v1)|null|\n", "2310.07820": "|**2023-10-11**|**Large Language Models Are Zero-Shot Time Series Forecasters**|Nate Gruver et.al.|[2310.07820v1](http://arxiv.org/abs/2310.07820v1)|**[link](https://github.com/ngruver/llmtime)**|\n", "2310.05833": "|**2023-10-09**|**A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models**|Sebastian G. Gruber et.al.|[2310.05833v1](http://arxiv.org/abs/2310.05833v1)|null|\n", "2310.05553": "|**2023-10-09**|**Regulation and NLP (RegNLP): Taming Large Language Models**|Catalina Goanta et.al.|[2310.05553v1](http://arxiv.org/abs/2310.05553v1)|null|\n", "2310.04782": "|**2023-10-07**|**Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning**|Yuchen Yang et.al.|[2310.04782v1](http://arxiv.org/abs/2310.04782v1)|null|\n", "2310.04230": "|**2023-10-06**|**Lending Interaction Wings to Recommender Systems with Conversational Agents**|Jiarui Jin et.al.|[2310.04230v1](http://arxiv.org/abs/2310.04230v1)|null|\n", "2310.03192": "|**2023-11-10**|**Generative AI in the Classroom: Can Students Remain Active Learners?**|Rania Abdelghani et.al.|[2310.03192v2](http://arxiv.org/abs/2310.03192v2)|null|\n", "2310.02743": "|**2023-10-04**|**Reward Model Ensembles Help Mitigate Overoptimization**|Thomas Coste et.al.|[2310.02743v1](http://arxiv.org/abs/2310.02743v1)|null|\n", "2310.01290": "|**2023-10-02**|**Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models**|Wenxuan Ding et.al.|[2310.01290v1](http://arxiv.org/abs/2310.01290v1)|**[link](https://github.com/wenwen-d/knowledgecrosswords)**|\n", "2310.00867": "|**2023-10-14**|**(Dynamic) Prompting might be all you need to repair Compressed LLMs**|Duc N. 
M Hoang et.al.|[2310.00867v2](http://arxiv.org/abs/2310.00867v2)|null|\n", "2310.00035": "|**2023-10-04**|**LoRA ensembles for large language model fine-tuning**|Xi Wang et.al.|[2310.00035v2](http://arxiv.org/abs/2310.00035v2)|null|\n", "2309.17382": "|**2023-10-11**|**Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency**|Zhihan Liu et.al.|[2309.17382v2](http://arxiv.org/abs/2309.17382v2)|**[link](https://github.com/agentification/RAFA_code)**|\n", "2309.16347": "|**2023-09-28**|**Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks**|Eleftherios Triantafyllidis et.al.|[2309.16347v1](http://arxiv.org/abs/2309.16347v1)|null|\n", "2309.16052": "|**2023-09-27**|**OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language**|Ruochu Yang et.al.|[2309.16052v1](http://arxiv.org/abs/2309.16052v1)|null|\n", "2309.13007": "|**2023-09-22**|**ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs**|Justin Chih-Yao Chen et.al.|[2309.13007v1](http://arxiv.org/abs/2309.13007v1)|**[link](https://github.com/dinobby/reconcile)**|\n", "2309.12731": "|**2023-09-22**|**Defeasible Reasoning with Knowledge Graphs**|Dave Raggett et.al.|[2309.12731v1](http://arxiv.org/abs/2309.12731v1)|null|\n", "2309.09181": "|**2023-09-17**|**From Cooking Recipes to Robot Task Trees -- Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network**|Md Sadman Sakib et.al.|[2309.09181v1](http://arxiv.org/abs/2309.09181v1)|null|\n", "2309.07694": "|**2023-09-14**|**Tree of Uncertain Thoughts Reasoning for Large Language Models**|Shentong Mo et.al.|[2309.07694v1](http://arxiv.org/abs/2309.07694v1)|null|\n", "2309.05520": "|**2023-09-14**|**When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We?**|Chong Chen et.al.|[2309.05520v3](http://arxiv.org/abs/2309.05520v3)|null|\n", "2309.05077": "|**2023-10-14**|**Generalization 
error bounds for iterative learning algorithms with bounded updates**|Jingwen Fu et.al.|[2309.05077v3](http://arxiv.org/abs/2309.05077v3)|null|\n", "2309.05076": "|**2023-09-10**|**An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents**|Maximilian Croissant et.al.|[2309.05076v1](http://arxiv.org/abs/2309.05076v1)|null|\n", "2309.04842": "|**2023-09-12**|**Leveraging Large Language Models for Exploiting ASR Uncertainty**|Pranay Dighe et.al.|[2309.04842v2](http://arxiv.org/abs/2309.04842v2)|null|\n", "2309.03433": "|**2023-09-07**|**Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty**|Chen Ling et.al.|[2309.03433v1](http://arxiv.org/abs/2309.03433v1)|null|\n", "2308.16175": "|**2023-10-04**|**Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness**|Jiuhai Chen et.al.|[2308.16175v2](http://arxiv.org/abs/2308.16175v2)|null|\n", "2308.15684": "|**2023-10-18**|**Interactively Robot Action Planning with Uncertainty Analysis and Active Questioning by Large Language Model**|Kazuki Hori et.al.|[2308.15684v2](http://arxiv.org/abs/2308.15684v2)|null|\n", "2308.13111": "|**2024-02-05**|**Bayesian Low-rank Adaptation for Large Language Models**|Adam X. 
Yang et.al.|[2308.13111v5](http://arxiv.org/abs/2308.13111v5)|**[link](https://github.com/adamxyang/laplace-lora)**|\n", "2308.06391": "|**2023-08-11**|**Dynamic Planning with a LLM**|Gautier Dagan et.al.|[2308.06391v1](http://arxiv.org/abs/2308.06391v1)|**[link](https://github.com/itl-ed/llm-dp)**|\n", "2308.03740": "|**2023-08-07**|**A Cost Analysis of Generative Language Models and Influence Operations**|Micah Musser et.al.|[2308.03740v1](http://arxiv.org/abs/2308.03740v1)|**[link](https://github.com/georgetown-cset/disinfo-costs)**|\n", "2308.01222": "|**2024-02-05**|**Calibration in Deep Learning: A Survey of the State-of-the-Art**|Cheng Wang et.al.|[2308.01222v2](http://arxiv.org/abs/2308.01222v2)|null|\n", "2308.00389": "|**2023-08-01**|**Autonomous data extraction from peer reviewed literature for training machine learning models of oxidation potentials**|Siwoo Lee et.al.|[2308.00389v1](http://arxiv.org/abs/2308.00389v1)|null|\n", "2307.14324": "|**2023-07-26**|**Evaluating the Moral Beliefs Encoded in LLMs**|Nino Scherrer et.al.|[2307.14324v1](http://arxiv.org/abs/2307.14324v1)|**[link](https://github.com/ninodimontalcino/moralchoice)**|\n", "2307.08423": "|**2023-11-15**|**Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems**|Xuan Zhang et.al.|[2307.08423v2](http://arxiv.org/abs/2307.08423v2)|**[link](https://github.com/divelab/AIRS)**|\n", "2307.10236": "|**2023-10-17**|**Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models**|Yuheng Huang et.al.|[2307.10236v3](http://arxiv.org/abs/2307.10236v3)|null|\n", "2307.01928": "|**2023-09-04**|**Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners**|Allen Z. 
Ren et.al.|[2307.01928v2](http://arxiv.org/abs/2307.01928v2)|null|\n", "2307.01379": "|**2023-10-09**|**Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models**|Jinhao Duan et.al.|[2307.01379v2](http://arxiv.org/abs/2307.01379v2)|**[link](https://github.com/jinhaoduan/shifting-attention-to-relevance)**|\n", "2306.15766": "|**2023-06-27**|**Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost**|Parikshit Bansal et.al.|[2306.15766v1](http://arxiv.org/abs/2306.15766v1)|null|\n", "2306.13063": "|**2023-06-22**|**Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs**|Miao Xiong et.al.|[2306.13063v1](http://arxiv.org/abs/2306.13063v1)|null|\n", "2306.10376": "|**2023-10-23**|**CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents**|Jeongeun Park et.al.|[2306.10376v5](http://arxiv.org/abs/2306.10376v5)|**[link](https://github.com/jeongeun980906/CLARA-SaGC-Code)**|\n", "2306.04746": "|**2024-01-14**|**Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models**|Naoki Egami et.al.|[2306.04746v3](http://arxiv.org/abs/2306.04746v3)|null|\n", "2306.02224": "|**2023-06-04**|**Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions**|Hui Yang et.al.|[2306.02224v1](http://arxiv.org/abs/2306.02224v1)|**[link](https://github.com/younghuman/llmagent)**|\n", "2306.01941": "|**2023-08-08**|**AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap**|Q. Vera Liao et.al.|[2306.01941v2](http://arxiv.org/abs/2306.01941v2)|null|\n", "2306.01694": "|**2023-11-05**|**Evaluating Language Models for Mathematics through Interactions**|Katherine M. 
Collins et.al.|[2306.01694v2](http://arxiv.org/abs/2306.01694v2)|**[link](https://github.com/collinskatie/checkmate)**|\n", "2305.19187": "|**2023-10-09**|**Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models**|Zhen Lin et.al.|[2305.19187v2](http://arxiv.org/abs/2305.19187v2)|**[link](https://github.com/zlin7/uq-nlg)**|\n", "2305.18262": "|**2023-10-30**|**Beyond Confidence: Reliable Models Should Also Consider Atypicality**|Mert Yuksekgonul et.al.|[2305.18262v2](http://arxiv.org/abs/2305.18262v2)|null|\n", "2305.18153": "|**2023-05-30**|**Do Large Language Models Know What They Don't Know?**|Zhangyue Yin et.al.|[2305.18153v2](http://arxiv.org/abs/2305.18153v2)|**[link](https://github.com/yinzhangyue/selfaware)**|\n", "2305.18404": "|**2023-07-08**|**Conformal Prediction with Large Language Models for Multi-Choice Question Answering**|Bhawesh Kumar et.al.|[2305.18404v3](http://arxiv.org/abs/2305.18404v3)|**[link](https://github.com/bhaweshiitk/conformalllm)**|\n", "2305.16617": "|**2023-05-26**|**Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model**|Zhijie Deng et.al.|[2305.16617v1](http://arxiv.org/abs/2305.16617v1)|null|\n", "2305.17144": "|**2023-06-01**|**Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory**|Xizhou Zhu et.al.|[2305.17144v2](http://arxiv.org/abs/2305.17144v2)|**[link](https://github.com/opengvlab/gitm)**|\n", "2305.14928": "|**2023-10-31**|**Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4**|Kellin Pelrine et.al.|[2305.14928v3](http://arxiv.org/abs/2305.14928v3)|**[link](https://github.com/complexdata-mila/mitigatemisinfo)**|\n", "2305.14802": "|**2023-10-26**|**Estimating Large Language Model Capabilities without Labeled Test Data**|Harvey Yiyun Fu et.al.|[2305.14802v2](http://arxiv.org/abs/2305.14802v2)|**[link](https://github.com/harvey-fin/icl-estimate)**|\n", 
"2305.14264": "|**2023-11-22**|**Active Learning Principles for In-Context Learning with Large Language Models**|Katerina Margatina et.al.|[2305.14264v2](http://arxiv.org/abs/2305.14264v2)|null|\n", "2305.14072": "|**2023-05-23**|**When the Music Stops: Tip-of-the-Tongue Retrieval for Music**|Samarth Bhargav et.al.|[2305.14072v1](http://arxiv.org/abs/2305.14072v1)|**[link](https://github.com/spotify-research/tot)**|\n", "2305.13712": "|**2023-05-23**|**Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models**|Alfonso Amayuelas et.al.|[2305.13712v1](http://arxiv.org/abs/2305.13712v1)|null|\n", "2305.07504": "|**2023-05-12**|**Calibration-Aware Bayesian Learning**|Jiayi Huang et.al.|[2305.07504v1](http://arxiv.org/abs/2305.07504v1)|null|\n", "2305.02897": "|**2023-08-03**|**An automatically discovered chain-of-thought prompt generalizes to novel models and datasets**|Konstantin Hebenstreit et.al.|[2305.02897v2](http://arxiv.org/abs/2305.02897v2)|null|\n", "2305.00633": "|**2023-10-26**|**Self-Evaluation Guided Beam Search for Reasoning**|Yuxi Xie et.al.|[2305.00633v3](http://arxiv.org/abs/2305.00633v3)|null|\n", "2304.05341": "|**2023-04-11**|**Bayesian Optimization of Catalysts With In-context Learning**|Mayk Caldas Ramos et.al.|[2304.05341v1](http://arxiv.org/abs/2304.05341v1)|**[link](https://github.com/ur-whitelab/bo-lift)**|\n", "2303.15621": "|**2023-04-13**|**ChatGPT as a Factual Inconsistency Evaluator for Text Summarization**|Zheheng Luo et.al.|[2303.15621v2](http://arxiv.org/abs/2303.15621v2)|null|\n", "2303.10464": "|**2023-07-29**|**SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models**|Vithursan Thangarasa et.al.|[2303.10464v2](http://arxiv.org/abs/2303.10464v2)|null|\n", "2303.05352": "|**2023-06-27**|**Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering**|Maciej P. 
Polak et.al.|[2303.05352v2](http://arxiv.org/abs/2303.05352v2)|null|\n", "2303.00732": "|**2023-04-28**|**R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents**|Daniel D. Johnson et.al.|[2303.00732v2](http://arxiv.org/abs/2303.00732v2)|**[link](https://github.com/google-research/r_u_sure)**|\n", "2302.12246": "|**2023-05-23**|**Active Prompting with Chain-of-Thought for Large Language Models**|Shizhe Diao et.al.|[2302.12246v3](http://arxiv.org/abs/2302.12246v3)|**[link](https://github.com/shizhediao/active-cot)**|\n", "2302.09664": "|**2023-04-15**|**Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation**|Lorenz Kuhn et.al.|[2302.09664v3](http://arxiv.org/abs/2302.09664v3)|**[link](https://github.com/lorenzkuhn/semantic_uncertainty)**|\n", "2302.08703": "|**2023-06-21**|**PAC Prediction Sets for Large Language Models of Code**|Adam Khakhar et.al.|[2302.08703v2](http://arxiv.org/abs/2302.08703v2)|**[link](https://github.com/adamkhakhar/python-pac-code-prediction-set)**|\n", "2302.03686": "|**2023-09-29**|**Long Horizon Temperature Scaling**|Andy Shih et.al.|[2302.03686v2](http://arxiv.org/abs/2302.03686v2)|**[link](https://github.com/andyshih12/longhorizontemperaturescaling)**|\n", "2212.13371": "|**2022-12-27**|**Measuring an artificial intelligence agent's trust in humans using machine incentives**|Tim Johnson et.al.|[2212.13371v1](http://arxiv.org/abs/2212.13371v1)|null|\n", "2211.13196": "|**2022-11-23**|**SeedBERT: Recovering Annotator Rating Distributions from an Aggregated Label**|Aneesha Sampath et.al.|[2211.13196v1](http://arxiv.org/abs/2211.13196v1)|null|\n", "2209.06995": "|**2023-05-08**|**Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach**|Yue Yu et.al.|[2209.06995v2](http://arxiv.org/abs/2209.06995v2)|**[link](https://github.com/yueyu1030/patron)**|\n", "2208.10063": "|**2022-09-13**|**Selection 
Collider Bias in Large Language Models**|Emily McMilin et.al.|[2208.10063v2](http://arxiv.org/abs/2208.10063v2)|**[link](https://github.com/2dot71mily/selection_collider_bias_uai_clr_2022)**|\n", "2312.01619": "|**2023-12-06**|**How Many Validation Labels Do You Need? Exploring the Design Space of Label-Efficient Model Ranking**|Zhengyu Hu et.al.|[2312.01619v2](http://arxiv.org/abs/2312.01619v2)|**[link](https://github.com/ppsmk388/morabench)**|\n", "2312.03733": "|**2023-12-08**|**Methods to Estimate Large Language Model Confidence**|Maia Kotelanski et.al.|[2312.03733v2](http://arxiv.org/abs/2312.03733v2)|null|\n", "2312.05488": "|**2023-12-12**|**Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis**|Caoyun Fan et.al.|[2312.05488v2](http://arxiv.org/abs/2312.05488v2)|null|\n", "2312.06876": "|**2023-12-11**|**Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks**|Lingfeng Sun et.al.|[2312.06876v1](http://arxiv.org/abs/2312.06876v1)|null|\n", "2312.08027": "|**2023-12-13**|**Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning**|Jinta Weng et.al.|[2312.08027v1](http://arxiv.org/abs/2312.08027v1)|null|\n", "2312.07843": "|**2023-12-13**|**Foundation Models in Robotics: Applications, Challenges, and the Future**|Roya Firoozi et.al.|[2312.07843v1](http://arxiv.org/abs/2312.07843v1)|**[link](https://github.com/robotics-survey/awesome-robotics-foundation-models)**|\n", "2312.09300": "|**2023-12-14**|**Self-Evaluation Improves Selective Generation in Large Language Models**|Jie Ren et.al.|[2312.09300v1](http://arxiv.org/abs/2312.09300v1)|null|\n", "2312.10057": "|**2023-12-04**|**Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work**|Rishab Jain et.al.|[2312.10057v1](http://arxiv.org/abs/2312.10057v1)|null|\n", "2312.12112": "|**2024-02-07**|**Curated LLM: Synergy of LLMs and Data Curation for tabular 
augmentation in ultra low-data regimes**|Nabeel Seedat et.al.|[2312.12112v2](http://arxiv.org/abs/2312.12112v2)|null|\n", "2312.15576": "|**2023-12-25**|**Reducing LLM Hallucinations using Epistemic Neural Networks**|Shreyas Verma et.al.|[2312.15576v1](http://arxiv.org/abs/2312.15576v1)|null|\n", "2312.15184": "|**2023-12-23**|**ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization**|Shuoran Jiang et.al.|[2312.15184v1](http://arxiv.org/abs/2312.15184v1)|**[link](https://github.com/mathisall/zo-adamu)**|\n", "2312.16279": "|**2023-12-26**|**Cloud-Device Collaborative Learning for Multimodal Large Language Models**|Guanqun Wang et.al.|[2312.16279v1](http://arxiv.org/abs/2312.16279v1)|null|\n", "2401.02009": "|**2024-03-27**|**Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**|Wenqi Zhang et.al.|[2401.02009v2](http://arxiv.org/abs/2401.02009v2)|null|\n", "2401.01780": "|**2024-01-03**|**Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering**|Pierre Erbacher et.al.|[2401.01780v1](http://arxiv.org/abs/2401.01780v1)|null|\n", "2401.01197": "|**2024-01-02**|**Uncertainty Resolution in Misinformation Detection**|Yury Orlovskiy et.al.|[2401.01197v1](http://arxiv.org/abs/2401.01197v1)|null|\n", "2401.00243": "|**2023-12-30**|**Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles**|Yuanzhao Zhai et.al.|[2401.00243v1](http://arxiv.org/abs/2401.00243v1)|null|\n", "2401.02843": "|**2024-04-30**|**Thousands of AI Authors on the Future of AI**|Katja Grace et.al.|[2401.02843v2](http://arxiv.org/abs/2401.02843v2)|null|\n", "2401.03426": "|**2024-01-07**|**On Leveraging Large Language Models for Enhancing Entity Resolution**|Huahang Li et.al.|[2401.03426v1](http://arxiv.org/abs/2401.03426v1)|null|\n", "2401.03238": "|**2024-01-06**|**Using Large Language Models to Assess Tutors' Performance in Reacting to 
Students Making Math Errors**|Sanjit Kakarla et.al.|[2401.03238v1](http://arxiv.org/abs/2401.03238v1)|null|\n", "2401.04695": "|**2024-01-09**|**Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers**|Gal Yona et.al.|[2401.04695v1](http://arxiv.org/abs/2401.04695v1)|null|\n", "2401.06692": "|**2024-05-06**|**An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models**|Gantavya Bhatt et.al.|[2401.06692v2](http://arxiv.org/abs/2401.06692v2)|null|\n", "2401.07441": "|**2024-01-15**|**Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance**|Tinghui Ouyang et.al.|[2401.07441v1](http://arxiv.org/abs/2401.07441v1)|null|\n", "2401.08694": "|**2024-01-30**|**Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation**|Mauricio Rivera et.al.|[2401.08694v2](http://arxiv.org/abs/2401.08694v2)|null|\n", "2401.11506": "|**2024-01-21**|**Enhancing Recommendation Diversity by Re-ranking with Large Language Models**|Diego Carraro et.al.|[2401.11506v1](http://arxiv.org/abs/2401.11506v1)|null|\n", "2401.12794": "|**2024-04-25**|**Benchmarking LLMs via Uncertainty Quantification**|Fanghua Ye et.al.|[2401.12794v2](http://arxiv.org/abs/2401.12794v2)|**[link](https://github.com/smartyfh/llm-uncertainty-bench)**|\n", "2401.14016": "|**2024-05-30**|**Towards Uncertainty-Aware Language Agent**|Jiuzhou Han et.al.|[2401.14016v3](http://arxiv.org/abs/2401.14016v3)|null|\n", "2401.15077": "|**2024-02-04**|**EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty**|Yuhui Li et.al.|[2401.15077v2](http://arxiv.org/abs/2401.15077v2)|**[link](https://github.com/safeailab/eagle)**|\n", "2401.16458": "|**2024-01-29**|**Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending**|Mario Sanz-Guerrero et.al.|[2401.16458v1](http://arxiv.org/abs/2401.16458v1)|null|\n", 
"2402.00396": "|**2024-02-01**|**Efficient Exploration for LLMs**|Vikranth Dwaracherla et.al.|[2402.00396v1](http://arxiv.org/abs/2402.00396v1)|null|\n", "2402.00251": "|**2024-02-01**|**Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning**|Yao-Hung Hubert Tsai et.al.|[2402.00251v1](http://arxiv.org/abs/2402.00251v1)|null|\n", "2402.03284": "|**2024-02-05**|**Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models**|Anthony Sicilia et.al.|[2402.03284v1](http://arxiv.org/abs/2402.03284v1)|null|\n", "2402.03271": "|**2024-05-30**|**Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models**|Zhiyuan Hu et.al.|[2402.03271v2](http://arxiv.org/abs/2402.03271v2)|**[link](https://github.com/zhiyuanhubj/uot)**|\n", "2402.02392": "|**2024-02-04**|**DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models**|Ollie Liu et.al.|[2402.02392v1](http://arxiv.org/abs/2402.02392v1)|null|\n", "2402.01968": "|**2024-02-03**|**A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions**|Hung Du et.al.|[2402.01968v1](http://arxiv.org/abs/2402.01968v1)|null|\n", "2402.03563": "|**2024-02-27**|**Distinguishing the Knowable from the Unknowable with Language Models**|Gustaf Ahdritz et.al.|[2402.03563v2](http://arxiv.org/abs/2402.03563v2)|**[link](https://github.com/gahdritz/llm_uncertainty)**|\n", "2402.03494": "|**2024-02-05**|**Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues**|Xingpeng Sun et.al.|[2402.03494v1](http://arxiv.org/abs/2402.03494v1)|null|\n", "2402.03366": "|**2024-01-31**|**Uncertainty-Aware Explainable Recommendation with Large Language Models**|Yicui Peng et.al.|[2402.03366v1](http://arxiv.org/abs/2402.03366v1)|null|\n", "2402.03349": "|**2024-01-25**|**When Geoscience Meets Generative AI and Large Language Models: Foundations, 
Trends, and Future Challenges**|Abdenour Hadid et.al.|[2402.03349v1](http://arxiv.org/abs/2402.03349v1)|null|\n", "2402.05015": "|**2024-05-28**|**A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?**|Agustinus Kristiadi et.al.|[2402.05015v2](http://arxiv.org/abs/2402.05015v2)|**[link](https://github.com/wiseodd/lapeft-bayesopt)**|\n", "2402.04957": "|**2024-02-07**|**Reconfidencing LLMs from the Grouping Loss Perspective**|Lihu Chen et.al.|[2402.04957v1](http://arxiv.org/abs/2402.04957v1)|null|\n", "2402.05457": "|**2024-02-08**|**It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition**|Chen Chen et.al.|[2402.05457v1](http://arxiv.org/abs/2402.05457v1)|null|\n", "2402.06529": "|**2024-06-04**|**Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity**|Kaiqu Liang et.al.|[2402.06529v3](http://arxiv.org/abs/2402.06529v3)|**[link](https://github.com/kevinliang888/IntroPlan)**|\n", "2402.05939": "|**2024-01-12**|**Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study**|Yufei Li et.al.|[2402.05939v1](http://arxiv.org/abs/2402.05939v1)|**[link](https://github.com/yul091/llmuncertainty)**|\n", "2402.07470": "|**2024-02-16**|**Pushing The Limit of LLM Capacity for Text Classification**|Yazhou Zhang et.al.|[2402.07470v2](http://arxiv.org/abs/2402.07470v2)|null|\n", "2402.10189": "|**2024-03-28**|**Uncertainty Quantification for In-Context Learning of Large Language Models**|Chen Ling et.al.|[2402.10189v2](http://arxiv.org/abs/2402.10189v2)|**[link](https://github.com/lingchen0331/uq_icl)**|\n", "2402.09614": "|**2024-02-14**|**Probabilistic Reasoning in Generative Large Language Models**|Aliakbar Nafar et.al.|[2402.09614v1](http://arxiv.org/abs/2402.09614v1)|**[link](https://github.com/hlr/blind)**|\n", "2402.10767": "|**2024-02-16**|**Inference to the Best Explanation in Large Language 
Models**|Dhairya Dalal et.al.|[2402.10767v1](http://arxiv.org/abs/2402.10767v1)|null|\n", "2402.10573": "|**2024-02-27**|**LinkNER: Linking Local Named Entity Recognition Models to Large Language Models using Uncertainty**|Zhen Zhang et.al.|[2402.10573v2](http://arxiv.org/abs/2402.10573v2)|null|\n", "2402.12276": "|**2024-02-19**|**Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models**|Puxuan Yu et.al.|[2402.12276v1](http://arxiv.org/abs/2402.12276v1)|**[link](https://github.com/pxyu/llm-nle-for-calibration)**|\n", "2402.12264": "|**2024-02-19**|**Uncertainty quantification in fine-tuned LLMs using LoRA ensembles**|Oleksandr Balabanov et.al.|[2402.12264v1](http://arxiv.org/abs/2402.12264v1)|null|\n", "2402.11997": "|**2024-02-19**|**Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models**|Himanshu Beniwal et.al.|[2402.11997v1](http://arxiv.org/abs/2402.11997v1)|null|\n", "2402.11756": "|**2024-06-08**|**MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs**|Yavuz Faruk Bakman et.al.|[2402.11756v3](http://arxiv.org/abs/2402.11756v3)|**[link](https://github.com/ybakman/llm_uncertainity)**|\n", "2402.11406": "|**2024-02-26**|**Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection**|Min Zhang et.al.|[2402.11406v2](http://arxiv.org/abs/2402.11406v2)|null|\n", "2402.11324": "|**2024-02-17**|**EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries**|Jiateng Liu et.al.|[2402.11324v1](http://arxiv.org/abs/2402.11324v1)|null|\n", "2402.11051": "|**2024-02-16**|**Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives**|Runcong Zhao et.al.|[2402.11051v1](http://arxiv.org/abs/2402.11051v1)|null|\n", "2402.11035": "|**2024-04-16**|**Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving?**|Benjamin 
Reichman et.al.|[2402.11035v2](http://arxiv.org/abs/2402.11035v2)|null|\n", "2402.13210": "|**2024-02-20**|**Bayesian Reward Models for LLM Alignment**|Adam X. Yang et.al.|[2402.13210v1](http://arxiv.org/abs/2402.13210v1)|null|\n", "2402.13098": "|**2024-02-20**|**ELAD: Explanation-Guided Large Language Models Active Distillation**|Yifei Zhang et.al.|[2402.13098v1](http://arxiv.org/abs/2402.13098v1)|null|\n", "2401.16553": "|**2024-04-18**|**SelectLLM: Can LLMs Select Important Instructions to Annotate?**|Ritik Sachin Parkar et.al.|[2401.16553v5](http://arxiv.org/abs/2401.16553v5)|**[link](https://github.com/minnesotanlp/select-llm)**|\n", "2402.13606": "|**2024-02-21**|**A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models**|Boyang Xue et.al.|[2402.13606v1](http://arxiv.org/abs/2402.13606v1)|null|\n", "2402.14568": "|**2024-02-22**|**LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition**|Junjie Ye et.al.|[2402.14568v1](http://arxiv.org/abs/2402.14568v1)|null|\n", "2402.14259": "|**2024-02-22**|**Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond**|Zhiyuan Wang et.al.|[2402.14259v1](http://arxiv.org/abs/2402.14259v1)|null|\n", "2402.15368": "|**2024-02-23**|**Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction**|Jun Wang et.al.|[2402.15368v1](http://arxiv.org/abs/2402.15368v1)|null|\n", "2402.17641": "|**2024-06-06**|**Variational Learning is Effective for Large Deep Networks**|Yuesong Shen et.al.|[2402.17641v2](http://arxiv.org/abs/2402.17641v2)|**[link](https://github.com/team-approx-bayes/ivon)**|\n", "2402.16705": "|**2024-02-26**|**SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection**|Liangxin Liu et.al.|[2402.16705v1](http://arxiv.org/abs/2402.16705v1)|**[link](https://github.com/blue-raincoat/selectit)**|\n", "2402.18048": 
"|**2024-02-28**|**Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension**|Fan Yin et.al.|[2402.18048v1](http://arxiv.org/abs/2402.18048v1)|null|\n", "2402.17826": "|**2024-05-23**|**Prediction-Powered Ranking of Large Language Models**|Ivi Chatzi et.al.|[2402.17826v2](http://arxiv.org/abs/2402.17826v2)|**[link](https://github.com/networks-learning/prediction-powered-ranking)**|\n", "2402.19471": "|**2024-05-01**|**Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling**|Gabriel Grand et.al.|[2402.19471v2](http://arxiv.org/abs/2402.19471v2)|null|\n", "2403.01216": "|**2024-04-04**|**API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access**|Jiayuan Su et.al.|[2403.01216v2](http://arxiv.org/abs/2403.01216v2)|null|\n", "2403.01165": "|**2024-06-06**|**STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models**|Linhai Zhang et.al.|[2403.01165v2](http://arxiv.org/abs/2403.01165v2)|**[link](https://github.com/callanwu/star)**|\n", "2403.02509": "|**2024-03-04**|**SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models**|Xiang Gao et.al.|[2403.02509v1](http://arxiv.org/abs/2403.02509v1)|null|\n", "2403.01755": "|**2024-03-04**|**AI Language Models Could Both Help and Harm Equity in Marine Policymaking: The Case Study of the BBNJ Question-Answering Bot**|Matt Ziegler et.al.|[2403.01755v1](http://arxiv.org/abs/2403.01755v1)|null|\n", "2403.04696": "|**2024-06-06**|**Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification**|Ekaterina Fadeeva et.al.|[2403.04696v2](http://arxiv.org/abs/2403.04696v2)|**[link](https://github.com/iinemo/lm-polygraph)**|\n", "2403.04427": "|**2024-03-07**|**Sentiment-driven prediction of financial returns: a Bayesian-enhanced FinBERT approach**|Raffaele Giuseppe Cestari et.al.|[2403.04427v1](http://arxiv.org/abs/2403.04427v1)|null|\n", 
"2403.04024": "|**2024-03-06**|**Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification**|Ricardo Bigolin Lanfredi et.al.|[2403.04024v1](http://arxiv.org/abs/2403.04024v1)|**[link](https://github.com/rsummers11/CADLab)**|\n", "2403.05171": "|**2024-03-08**|**Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation**|Xiaoying Zhang et.al.|[2403.05171v1](http://arxiv.org/abs/2403.05171v1)|null|\n", "2403.07708": "|**2024-03-14**|**Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards**|Wei Shen et.al.|[2403.07708v2](http://arxiv.org/abs/2403.07708v2)|null|\n", "2403.08229": "|**2024-03-13**|**Boosting Disfluency Detection with Large Language Model as Disfluency Generator**|Zhenrong Cheng et.al.|[2403.08229v1](http://arxiv.org/abs/2403.08229v1)|null|\n", "2403.09599": "|**2024-03-14**|**Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis**|Gregory Coppola et.al.|[2403.09599v1](http://arxiv.org/abs/2403.09599v1)|null|\n", "2403.10854": "|**2024-03-16**|**A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment**|Tianhe Wu et.al.|[2403.10854v1](http://arxiv.org/abs/2403.10854v1)|**[link](https://github.com/tianhewu/mllms-for-iqa)**|\n", "2403.13198": "|**2024-03-19**|**Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners**|James F. Mullen Jr. 
et.al.|[2403.13198v1](http://arxiv.org/abs/2403.13198v1)|null|\n", "2403.16950": "|**2024-03-26**|**Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators**|Yinhong Liu et.al.|[2403.16950v2](http://arxiv.org/abs/2403.16950v2)|**[link](https://github.com/cambridgeltl/pairs)**|\n", "2403.19305": "|**2024-04-15**|**MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation**|Yu Li et.al.|[2403.19305v2](http://arxiv.org/abs/2403.19305v2)|**[link](https://github.com/kse-eleven/mateval)**|\n", "2403.20279": "|**2024-03-29**|**LUQ: Long-text Uncertainty Quantification for LLMs**|Caiqi Zhang et.al.|[2403.20279v1](http://arxiv.org/abs/2403.20279v1)|null|\n", "2404.01869": "|**2024-04-02**|**Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey**|Philipp Mondorf et.al.|[2404.01869v1](http://arxiv.org/abs/2404.01869v1)|null|\n", "2404.00589": "|**2024-04-12**|**Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing**|Zhenyu Qian et.al.|[2404.00589v2](http://arxiv.org/abs/2404.00589v2)|**[link](https://github.com/code4paper-2024/code4paper)**|\n", "2404.02650": "|**2024-04-03**|**Towards detecting unanticipated bias in Large Language Models**|Anna Kruspe et.al.|[2404.02650v1](http://arxiv.org/abs/2404.02650v1)|null|\n", "2404.02649": "|**2024-04-03**|**On the Importance of Uncertainty in Decision-Making with Large Language Models**|Nicol\u00f2 Felicioni et.al.|[2404.02649v1](http://arxiv.org/abs/2404.02649v1)|null|\n", "2404.04102": "|**2024-05-28**|**ROPO: Robust Preference Optimization for Large Language Models**|Xize Liang et.al.|[2404.04102v2](http://arxiv.org/abs/2404.04102v2)|null|\n", "2404.04287": "|**2024-04-04**|**CONFLARE: CONFormal LArge language model REtrieval**|Pouria Rouzrokh et.al.|[2404.04287v1](http://arxiv.org/abs/2404.04287v1)|**[link](https://github.com/mayo-radiology-informatics-lab/conflare)**|\n", "2404.06948": 
"|**2024-04-11**|**MetaCheckGPT -- A Multi-task Hallucination Detector Using LLM Uncertainty and Meta-models**|Rahul Mehta et.al.|[2404.06948v2](http://arxiv.org/abs/2404.06948v2)|null|\n", "2404.08517": "|**2024-04-12**|**Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward**|Xuan Xie et.al.|[2404.08517v1](http://arxiv.org/abs/2404.08517v1)|null|\n", "2404.09127": "|**2024-05-10**|**Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation**|Ruixin Yang et.al.|[2404.09127v3](http://arxiv.org/abs/2404.09127v3)|**[link](https://github.com/minnesotanlp/collaborative-calibration)**|\n", "2404.08846": "|**2024-05-31**|**Experimental Design for Active Transductive Inference in Large Language Models**|Subhojyoti Mukherjee et.al.|[2404.08846v2](http://arxiv.org/abs/2404.08846v2)|null|\n", "2404.10776": "|**2024-04-16**|**Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback**|Qiwei Di et.al.|[2404.10776v1](http://arxiv.org/abs/2404.10776v1)|null|\n", "2404.10315": "|**2024-04-16**|**Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience**|Haixia Han et.al.|[2404.10315v1](http://arxiv.org/abs/2404.10315v1)|null|\n", "2404.09866": "|**2024-04-15**|**Reimagining Self-Adaptation in the Age of Large Language Models**|Raghav Donakanti et.al.|[2404.09866v1](http://arxiv.org/abs/2404.09866v1)|null|\n", "2404.10960": "|**2024-04-16**|**Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations**|Christian Tomani et.al.|[2404.10960v1](http://arxiv.org/abs/2404.10960v1)|null|\n", "2404.12273": "|**2024-04-18**|**FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom**|Yuanqin He et.al.|[2404.12273v1](http://arxiv.org/abs/2404.12273v1)|null|\n", "2404.11835": "|**2024-05-19**|**CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models**|Minjung Shin 
et.al.|[2404.11835v2](http://arxiv.org/abs/2404.11835v2)|**[link](https://github.com/lbaa2022/CAUS_v1)**|\n", "2404.13409": "|**2024-04-20**|**\"I Wish There Were an AI\": Challenges and AI Potential in Cancer Patient-Provider Communication**|Ziqi Yang et.al.|[2404.13409v1](http://arxiv.org/abs/2404.13409v1)|null|\n", "2404.14547": "|**2024-04-22**|**Integrating Disambiguation and User Preferences into Large Language Models for Robot Motion Planning**|Mohammed Abugurain et.al.|[2404.14547v1](http://arxiv.org/abs/2404.14547v1)|null|\n", "2404.16557": "|**2024-04-25**|**Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples**|Kuofeng Gao et.al.|[2404.16557v1](http://arxiv.org/abs/2404.16557v1)|null|\n", "2404.15993": "|**2024-04-24**|**Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach**|Linyu Liu et.al.|[2404.15993v1](http://arxiv.org/abs/2404.15993v1)|null|\n", "2405.00623": "|**2024-05-15**|**\"I'm Not Sure, But...\": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust**|Sunnie S. Y. Kim et.al.|[2405.00623v2](http://arxiv.org/abs/2405.00623v2)|null|\n", "2405.00981": "|**2024-05-02**|**Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation**|David Eric Austin et.al.|[2405.00981v1](http://arxiv.org/abs/2405.00981v1)|null|\n", "2405.02134": "|**2024-05-03**|**Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection**|Guillem Ram\u00edrez et.al.|[2405.02134v1](http://arxiv.org/abs/2405.02134v1)|null|\n", "2405.01976": "|**2024-05-03**|**Conformal Prediction for Natural Language Processing: A Survey**|Margarida M. 
Campos et.al.|[2405.01976v1](http://arxiv.org/abs/2405.01976v1)|null|\n", "2405.01563": "|**2024-04-04**|**Mitigating LLM Hallucinations via Conformal Abstention**|Yasin Abbasi Yadkori et.al.|[2405.01563v1](http://arxiv.org/abs/2405.01563v1)|null|\n", "2405.03709": "|**2024-05-14**|**Generating Probabilistic Scenario Programs from Natural Language**|Karim Elmaaroufi et.al.|[2405.03709v2](http://arxiv.org/abs/2405.03709v2)|null|\n", "2405.06999": "|**2024-05-11**|**Large Language Model-aided Edge Learning in Distribution System State Estimation**|Renyou Xie et.al.|[2405.06999v1](http://arxiv.org/abs/2405.06999v1)|null|\n", "2405.06840": "|**2024-05-10**|**MEIC: Re-thinking RTL Debug Automation using LLMs**|Ke Xu et.al.|[2405.06840v1](http://arxiv.org/abs/2405.06840v1)|null|\n", "2405.12486": "|**2024-05-21**|**Time Matters: Enhancing Pre-trained News Recommendation Models with Robust User Dwell Time Injection**|Hao Jiang et.al.|[2405.12486v1](http://arxiv.org/abs/2405.12486v1)|null|\n", "2405.13907": "|**2024-06-16**|**Just rephrase it! 
Uncertainty estimation in closed-source language models via multiple rephrased queries**|Adam Yang et.al.|[2405.13907v2](http://arxiv.org/abs/2405.13907v2)|null|\n", "2405.13845": "|**2024-05-25**|**Semantic Density: Uncertainty Quantification in Semantic Space for Large Language Models**|Xin Qiu et.al.|[2405.13845v2](http://arxiv.org/abs/2405.13845v2)|null|\n", "2405.13022": "|**2024-07-03**|**LLMs can learn self-restraint through iterative self-reflection**|Alexandre Pich\u00e9 et.al.|[2405.13022v2](http://arxiv.org/abs/2405.13022v2)|null|\n", "2405.15185": "|**2024-05-24**|**An Evaluation of Estimative Uncertainty in Large Language Models**|Zhisheng Tang et.al.|[2405.15185v1](http://arxiv.org/abs/2405.15185v1)|null|\n", "2405.15130": "|**2024-05-24**|**OptLLM: Optimal Assignment of Queries to Large Language Models**|Yueyue Liu et.al.|[2405.15130v1](http://arxiv.org/abs/2405.15130v1)|**[link](https://github.com/superyue72/OptLLM)**|\n", "2405.16908": "|**2024-05-27**|**Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?**|Gal Yona et.al.|[2405.16908v1](http://arxiv.org/abs/2405.16908v1)|null|\n", "2405.16436": "|**2024-05-26**|**Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer**|Zhihan Liu et.al.|[2405.16436v1](http://arxiv.org/abs/2405.16436v1)|null|\n", "2405.15784": "|**2024-04-28**|**CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval**|Yizhou Chi et.al.|[2405.15784v1](http://arxiv.org/abs/2405.15784v1)|null|\n", "2405.18208": "|**2024-05-28**|**A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models**|Chengxing Xie et.al.|[2405.18208v1](http://arxiv.org/abs/2405.18208v1)|null|\n", "2405.19320": "|**2024-07-05**|**Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF**|Shicong Cen et.al.|[2405.19320v3](http://arxiv.org/abs/2405.19320v3)|null|\n", "2405.18741": 
"|**2024-06-03**|**Genshin: General Shield for Natural Language Processing with Large Language Models**|Xiao Peng et.al.|[2405.18741v2](http://arxiv.org/abs/2405.18741v2)|null|\n", "2405.18638": "|**2024-05-28**|**ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models**|Aparna Elangovan et.al.|[2405.18638v1](http://arxiv.org/abs/2405.18638v1)|null|\n", "2405.20003": "|**2024-05-30**|**Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities**|Alexander Nikitin et.al.|[2405.20003v1](http://arxiv.org/abs/2405.20003v1)|null|\n", "2405.19946": "|**2024-05-30**|**Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf**|Xuanfa Jin et.al.|[2405.19946v1](http://arxiv.org/abs/2405.19946v1)|null|\n", "2405.19740": "|**2024-05-30**|**PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations**|Jiatong Li et.al.|[2405.19740v1](http://arxiv.org/abs/2405.19740v1)|null|\n", "2405.20974": "|**2024-06-05**|**SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales**|Tianyang Xu et.al.|[2405.20974v2](http://arxiv.org/abs/2405.20974v2)|**[link](https://github.com/xu1868/sayself)**|\n", "2405.20657": "|**2024-06-07**|**DORY: Deliberative Prompt Recovery for LLM**|Lirong Gao et.al.|[2405.20657v2](http://arxiv.org/abs/2405.20657v2)|null|\n", "2401.17244": "|**2024-06-02**|**LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation**|Yuan Chiang et.al.|[2401.17244v2](http://arxiv.org/abs/2401.17244v2)|**[link](https://github.com/chiang-yuan/llamp)**|\n", "2406.02543": "|**2024-07-17**|**To Believe or Not to Believe Your LLM**|Yasin Abbasi Yadkori et.al.|[2406.02543v2](http://arxiv.org/abs/2406.02543v2)|null|\n", "2406.02378": "|**2024-06-04**|**On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept**|Guangliang Liu 
et.al.|[2406.02378v1](http://arxiv.org/abs/2406.02378v1)|null|\n", "2406.01587": "|**2024-06-04**|**PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning**|Yupeng Zheng et.al.|[2406.01587v2](http://arxiv.org/abs/2406.01587v2)|null|\n", "2406.00974": "|**2024-06-03**|**Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach**|Borui Zhang et.al.|[2406.00974v1](http://arxiv.org/abs/2406.00974v1)|null|\n", "2406.00793": "|**2024-06-02**|**Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective**|Fabian Falck et.al.|[2406.00793v1](http://arxiv.org/abs/2406.00793v1)|**[link](https://github.com/meta-inf/bayes_icl)**|\n", "2406.00430": "|**2024-06-01**|**Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners**|Zhi Zheng et.al.|[2406.00430v1](http://arxiv.org/abs/2406.00430v1)|null|\n", "2406.00380": "|**2024-06-01**|**The Best of Both Worlds: Toward an Honest and Helpful Large Language Model**|Chujie Gao et.al.|[2406.00380v1](http://arxiv.org/abs/2406.00380v1)|**[link](https://github.com/Flossiee/HonestyLLM)**|\n", "2406.00244": "|**2024-06-01**|**Controlling Large Language Model Agents with Entropic Activation Steering**|Nate Rahn et.al.|[2406.00244v1](http://arxiv.org/abs/2406.00244v1)|null|\n", "2406.03441": "|**2024-06-05**|**Cycles of Thought: Measuring LLM Confidence through Stable Explanations**|Evan Becker et.al.|[2406.03441v1](http://arxiv.org/abs/2406.03441v1)|null|\n", "2406.03158": "|**2024-06-05**|**CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs**|Shuang Ao et.al.|[2406.03158v1](http://arxiv.org/abs/2406.03158v1)|**[link](https://github.com/aoshuang92/css_uq_llms)**|\n", "2406.02764": "|**2024-06-04**|**Adaptive Preference Scaling for Reinforcement Learning with Human Feedback**|Ilgee Hong et.al.|[2406.02764v1](http://arxiv.org/abs/2406.02764v1)|null|\n", "2402.10500": "|**2024-06-05**|**Active Preference 
Optimization for Sample Efficient RLHF**|Nirjhar Das et.al.|[2402.10500v2](http://arxiv.org/abs/2402.10500v2)|**[link](https://github.com/nirjhar-das/active-preference-optimization)**|\n", "2406.04306": "|**2024-06-06**|**Semantically Diverse Language Generation for Uncertainty Estimation in Language Models**|Lukas Aichberger et.al.|[2406.04306v1](http://arxiv.org/abs/2406.04306v1)|**[link](https://github.com/ml-jku/SDLG)**|\n", "2405.00301": "|**2024-06-07**|**Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression**|Farima Fatahi Bayat et.al.|[2405.00301v3](http://arxiv.org/abs/2405.00301v3)|**[link](https://github.com/launchnlp/lito)**|\n", "2406.04854": "|**2024-06-07**|**Uncertainty Aware Learning for Language Model Alignment**|Yikun Wang et.al.|[2406.04854v1](http://arxiv.org/abs/2406.04854v1)|null|\n", "2406.04370": "|**2024-06-01**|**Large Language Model Confidence Estimation via Black-Box Access**|Tejaswini Pedapati et.al.|[2406.04370v1](http://arxiv.org/abs/2406.04370v1)|null|\n", "2406.05972": "|**2024-06-10**|**Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context**|Jingru Jia et.al.|[2406.05972v1](http://arxiv.org/abs/2406.05972v1)|null|\n", "2406.05588": "|**2024-06-08**|**CERET: Cost-Effective Extrinsic Refinement for Text Generation**|Jason Cai et.al.|[2406.05588v1](http://arxiv.org/abs/2406.05588v1)|**[link](https://github.com/amazon-science/ceret-llm-refine)**|\n", "2406.05516": "|**2024-06-08**|**Verbalized Probabilistic Graphical Modeling with Large Language Models**|Hengguan Huang et.al.|[2406.05516v1](http://arxiv.org/abs/2406.05516v1)|null|\n", "2406.05322": "|**2024-06-08**|**Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios**|Yuhang Zhou et.al.|[2406.05322v1](http://arxiv.org/abs/2406.05322v1)|null|\n", "2406.05213": "|**2024-06-07**|**On Subjective Uncertainty Quantification and Calibration in Natural Language 
Generation**|Ziyu Wang et.al.|[2406.05213v1](http://arxiv.org/abs/2406.05213v1)|null|\n", "2406.07212": "|**2024-07-03**|**Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models**|Joshua Strong et.al.|[2406.07212v2](http://arxiv.org/abs/2406.07212v2)|null|\n", "2406.08391": "|**2024-06-12**|**Large Language Models Must Be Taught to Know What They Don't Know**|Sanyam Kapoor et.al.|[2406.08391v1](http://arxiv.org/abs/2406.08391v1)|**[link](https://github.com/activatedgeek/calibration-tuning)**|\n", "2406.07735": "|**2024-06-11**|**REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy**|Haw-Shiuan Chang et.al.|[2406.07735v1](http://arxiv.org/abs/2406.07735v1)|null|\n", "2406.10099": "|**2024-06-14**|**Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning**|Jiaqi Li et.al.|[2406.10099v1](http://arxiv.org/abs/2406.10099v1)|null|\n", "2406.10023": "|**2024-06-14**|**Deep Bayesian Active Learning for Preference Modeling in Large Language Models**|Luckeciano C. 
Melo et.al.|[2406.10023v1](http://arxiv.org/abs/2406.10023v1)|**[link](https://github.com/luckeciano/bal-pm)**|\n", "2406.09864": "|**2024-06-14**|**LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data**|Grigor Bezirganyan et.al.|[2406.09864v1](http://arxiv.org/abs/2406.09864v1)|**[link](https://github.com/bezirganyan/luma)**|\n", "2406.11675": "|**2024-06-18**|**BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models**|Yibin Wang et.al.|[2406.11675v2](http://arxiv.org/abs/2406.11675v2)|null|\n", "2406.11657": "|**2024-06-17**|**Can LLM be a Personalized Judge?**|Yijiang River Dong et.al.|[2406.11657v1](http://arxiv.org/abs/2406.11657v1)|**[link](https://github.com/dong-river/personalized-judge)**|\n", "2406.11345": "|**2024-06-17**|**Full-ECE: A Metric For Token-level Calibration on Large Language Models**|Han Liu et.al.|[2406.11345v1](http://arxiv.org/abs/2406.11345v1)|null|\n", "2406.11278": "|**2024-06-17**|**Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs**|Duygu Nur Yaldiz et.al.|[2406.11278v1](http://arxiv.org/abs/2406.11278v1)|null|\n", "2406.11231": "|**2024-06-17**|**Enabling robots to follow abstract instructions and complete complex dynamic tasks**|Ruaridh Mon-Williams et.al.|[2406.11231v1](http://arxiv.org/abs/2406.11231v1)|null|\n", "2406.10958": "|**2024-06-18**|**City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization**|Zihao Jiao et.al.|[2406.10958v2](http://arxiv.org/abs/2406.10958v2)|null|\n", "2406.12784": "|**2024-06-18**|**UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions**|Xunzhi Wang et.al.|[2406.12784v1](http://arxiv.org/abs/2406.12784v1)|**[link](https://github.com/Cyno2232/UBENCH)**|\n", "2406.12628": "|**2024-06-18**|**Large Language Models based Multi-Agent Framework for Objective Oriented Control Design in Power Electronics**|Chenggang Cui 
et.al.|[2406.12628v1](http://arxiv.org/abs/2406.12628v1)|null|\n", "2406.12569": "|**2024-06-28**|**MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs**|Chi Ma et.al.|[2406.12569v2](http://arxiv.org/abs/2406.12569v2)|null|\n", "2406.12295": "|**2024-06-18**|**Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding**|Kaiyan Zhang et.al.|[2406.12295v1](http://arxiv.org/abs/2406.12295v1)|**[link](https://github.com/tsinghuac3i/fs-gen)**|\n", "2406.12114": "|**2024-06-17**|**Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation**|Hamidreza Rouzegar et.al.|[2406.12114v1](http://arxiv.org/abs/2406.12114v1)|**[link](https://github.com/hrouzegar/enhancing-text-classification-through-llm-driven-active-learning-and-human-annotation)**|\n", "2406.14986": "|**2024-07-02**|**Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers**|Manuel Mondal et.al.|[2406.14986v2](http://arxiv.org/abs/2406.14986v2)|null|\n", "2406.14979": "|**2024-06-21**|**Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation**|Yuanjie Lyu et.al.|[2406.14979v1](http://arxiv.org/abs/2406.14979v1)|**[link](https://github.com/haruhi-sudo/RPG)**|\n", "2406.16306": "|**2024-06-24**|**Cascade Reward Sampling for Efficient Decoding-Time Alignment**|Bolian Li et.al.|[2406.16306v1](http://arxiv.org/abs/2406.16306v1)|**[link](https://github.com/lblaoke/CARDS)**|\n", "2406.16254": "|**2024-06-24**|**Confidence Regulation Neurons in Language Models**|Alessandro Stolfo et.al.|[2406.16254v1](http://arxiv.org/abs/2406.16254v1)|null|\n", "2406.15927": "|**2024-06-22**|**Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs**|Jannik Kossen et.al.|[2406.15927v1](http://arxiv.org/abs/2406.15927v1)|null|\n", "2406.15627": "|**2024-06-21**|**Benchmarking Uncertainty Quantification 
Methods for Large Language Models with LM-Polygraph**|Roman Vashurin et.al.|[2406.15627v1](http://arxiv.org/abs/2406.15627v1)|null|\n", "2406.17274": "|**2024-06-25**|**Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?**|Jianfeng He et.al.|[2406.17274v1](http://arxiv.org/abs/2406.17274v1)|null|\n", "2406.19712": "|**2024-06-28**|**Uncertainty Quantification in Large Language Models Through Convex Hull Analysis**|Ferhat Ozgur Catak et.al.|[2406.19712v1](http://arxiv.org/abs/2406.19712v1)|null|\n", "2407.03282": "|**2024-07-03**|**LLM Internal States Reveal Hallucination Risk Faced With a Query**|Ziwei Ji et.al.|[2407.03282v1](http://arxiv.org/abs/2407.03282v1)|null|\n", "2407.02089": "|**2024-07-02**|**GPTCast: a weather language model for precipitation nowcasting**|Gabriele Franch et.al.|[2407.02089v1](http://arxiv.org/abs/2407.02089v1)|null|\n", "2407.01942": "|**2024-07-02**|**Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness**|Khyathi Raghavi Chandu et.al.|[2407.01942v1](http://arxiv.org/abs/2407.01942v1)|null|\n", "2407.01122": "|**2024-07-01**|**Calibrated Large Language Models for Binary Question Answering**|Patrizio Giovannotti et.al.|[2407.01122v1](http://arxiv.org/abs/2407.01122v1)|null|\n", "2407.00994": "|**2024-07-08**|**LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation**|Longchao Da et.al.|[2407.00994v2](http://arxiv.org/abs/2407.00994v2)|null|\n", "2407.00499": "|**2024-06-29**|**ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees**|Zhiyuan Wang et.al.|[2407.00499v1](http://arxiv.org/abs/2407.00499v1)|null|\n", "2407.03951": "|**2024-07-04**|**Uncertainty-Guided Optimization on Large Language Model Search Trees**|Julia Grosse et.al.|[2407.03951v1](http://arxiv.org/abs/2407.03951v1)|null|\n", "2407.06426": "|**2024-07-08**|**DebUnc: Mitigating Hallucinations in Large 
Language Model Agent Communication with Uncertainty Estimations**|Luke Yoffe et.al.|[2407.06426v1](http://arxiv.org/abs/2407.06426v1)|**[link](https://github.com/lukeyoffe/debunc)**|\n", "2407.06349": "|**2024-07-08**|**Large Language Model Recall Uncertainty is Modulated by the Fan Effect**|Jesse Roberts et.al.|[2407.06349v1](http://arxiv.org/abs/2407.06349v1)|null|\n", "2407.06129": "|**2024-07-09**|**Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization**|Hannah K. Bako et.al.|[2407.06129v2](http://arxiv.org/abs/2407.06129v2)|**[link](https://github.com/hdi-umd/semantic_profiling_llm_evaluation)**|\n", "2407.06071": "|**2024-07-08**|**From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty**|Maor Ivgi et.al.|[2407.06071v1](http://arxiv.org/abs/2407.06071v1)|**[link](https://github.com/mivg/fallbacks)**|\n", "2407.08662": "|**2024-07-11**|**Uncertainty Estimation of Large Language Models in Medical Question Answering**|Jiaxin Wu et.al.|[2407.08662v1](http://arxiv.org/abs/2407.08662v1)|null|\n", "2407.08642": "|**2024-07-11**|**Towards Building Specialized Generalist AI with System 1 and System 2 Fusion**|Kaiyan Zhang et.al.|[2407.08642v1](http://arxiv.org/abs/2407.08642v1)|null|\n", "2407.08940": "|**2024-07-15**|**Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation**|Biqing Qi et.al.|[2407.08940v2](http://arxiv.org/abs/2407.08940v2)|**[link](https://github.com/tsinghuac3i/llm4biohypogen)**|\n", "2407.10834": "|**2024-07-24**|**MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs**|Quang H. 
Nguyen et.al.|[2407.10834v2](http://arxiv.org/abs/2407.10834v2)|null|\n", "2407.11282": "|**2024-07-19**|**Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models**|Qingcheng Zeng et.al.|[2407.11282v3](http://arxiv.org/abs/2407.11282v3)|**[link](https://github.com/qcznlp/uncertainty_attack)**|\n", "2407.12927": "|**2024-07-17**|**Text- and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild**|Nicolas Richet et.al.|[2407.12927v1](http://arxiv.org/abs/2407.12927v1)|**[link](https://github.com/nicolas-richet/feature-vs-text-compound-emotion)**|\n", "2407.12850": "|**2024-07-08**|**Limits to Predicting Online Speech Using Large Language Models**|Mina Remeli et.al.|[2407.12850v1](http://arxiv.org/abs/2407.12850v1)|null|\n", "2407.12812": "|**2024-06-27**|**Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI**|Katherine A. Rosenfeld et.al.|[2407.12812v1](http://arxiv.org/abs/2407.12812v1)|null|\n", "2407.14573": "|**2024-07-21**|**Trading Devil Final: Backdoor attack via Stock market and Bayesian Optimization**|Orson Mengara et.al.|[2407.14573v1](http://arxiv.org/abs/2407.14573v1)|null|\n", "2407.14845": "|**2024-07-20**|**Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models**|Ze Yu Zhang et.al.|[2407.14845v1](http://arxiv.org/abs/2407.14845v1)|null|\n", "2407.14614": "|**2024-07-19**|**Evaluating language models as risk scores**|Andr\u00e9 F. 
Cruz et.al.|[2407.14614v1](http://arxiv.org/abs/2407.14614v1)|null|\n"}, "LLM - Perplexity": {"2311.13133": "|**2023-11-22**|**LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms**|Aditi Jha et.al.|[2311.13133v1](http://arxiv.org/abs/2311.13133v1)|null|\n", "2311.11509": "|**2024-02-18**|**Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information**|Zhengmian Hu et.al.|[2311.11509v3](http://arxiv.org/abs/2311.11509v3)|null|\n", "2311.10054": "|**2023-11-16**|**Is \"A Helpful Assistant\" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts**|Mingqian Zheng et.al.|[2311.10054v1](http://arxiv.org/abs/2311.10054v1)|**[link](https://github.com/jiaxin-pei/prompting-with-social-roles)**|\n", "2311.09090": "|**2024-02-19**|**Social Bias Probing: Fairness Benchmarking for Language Models**|Marta Marchiori Manerba et.al.|[2311.09090v2](http://arxiv.org/abs/2311.09090v2)|null|\n", "2311.08349": "|**2024-04-02**|**AI-generated text boundary detection with RoFT**|Laida Kushnareva et.al.|[2311.08349v2](http://arxiv.org/abs/2311.08349v2)|null|\n", "2311.07484": "|**2023-11-13**|**Psychometric Predictive Power of Large Language Models**|Tatsuki Kuribayashi et.al.|[2311.07484v1](http://arxiv.org/abs/2311.07484v1)|null|\n", "2311.04902": "|**2023-11-08**|**Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models**|Rocktim Jyoti Das et.al.|[2311.04902v1](http://arxiv.org/abs/2311.04902v1)|**[link](https://github.com/rocktimjyotidas/gblm-pruner)**|\n", "2311.04879": "|**2023-11-09**|**LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models**|Jianxin Yang et.al.|[2311.04879v2](http://arxiv.org/abs/2311.04879v2)|**[link](https://github.com/yangjianxin1/longqlora)**|\n", "2311.03084": "|**2023-11-08**|**A Simple yet Efficient Ensemble Approach for AI-generated Text Detection**|Harika Abburi 
et.al.|[2311.03084v2](http://arxiv.org/abs/2311.03084v2)|null|\n", "2311.01544": "|**2024-04-03**|**Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization**|Bj\u00f6rn Deiseroth et.al.|[2311.01544v3](http://arxiv.org/abs/2311.01544v3)|null|\n", "2310.17630": "|**2023-10-26**|**InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators**|Heng Yang et.al.|[2310.17630v1](http://arxiv.org/abs/2310.17630v1)|**[link](https://github.com/yangheng95/instoptima)**|\n", "2310.15393": "|**2024-02-05**|**DoGE: Domain Reweighting with Generalization Estimation**|Simin Fan et.al.|[2310.15393v2](http://arxiv.org/abs/2310.15393v2)|null|\n", "2310.15389": "|**2023-10-23**|**Irreducible Curriculum for Language Model Pretraining**|Simin Fan et.al.|[2310.15389v1](http://arxiv.org/abs/2310.15389v1)|null|\n", "2310.15140": "|**2023-12-14**|**AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models**|Sicheng Zhu et.al.|[2310.15140v2](http://arxiv.org/abs/2310.15140v2)|null|\n", "2310.09930": "|**2023-10-15**|**FiLM: Fill-in Language Models for Any-Order Generation**|Tianxiao Shen et.al.|[2310.09930v1](http://arxiv.org/abs/2310.09930v1)|**[link](https://github.com/shentianxiao/film)**|\n", "2310.08920": "|**2023-10-13**|**Embarrassingly Simple Text Watermarks**|Ryoma Sato et.al.|[2310.08920v1](http://arxiv.org/abs/2310.08920v1)|null|\n", "2310.08915": "|**2024-02-26**|**Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs**|Yuxin Zhang et.al.|[2310.08915v3](http://arxiv.org/abs/2310.08915v3)|**[link](https://github.com/zyxxmu/dsnot)**|\n", "2310.07713": "|**2024-05-29**|**InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining**|Boxin Wang et.al.|[2310.07713v3](http://arxiv.org/abs/2310.07713v3)|**[link](https://github.com/NVIDIA/Megatron-LM)**|\n", "2310.05869": "|**2023-12-01**|**HyperAttention: Long-context Attention in 
Near-Linear Time**|Insu Han et.al.|[2310.05869v3](http://arxiv.org/abs/2310.05869v3)|**[link](https://github.com/insuhan/hyper-attn)**|\n", "2310.05175": "|**2024-05-06**|**Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity**|Lu Yin et.al.|[2310.05175v3](http://arxiv.org/abs/2310.05175v3)|**[link](https://github.com/luuyin/owl)**|\n", "2310.02842": "|**2023-10-05**|**Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation**|Chen Dun et.al.|[2310.02842v2](http://arxiv.org/abs/2310.02842v2)|null|\n", "2310.04451": "|**2024-03-20**|**AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models**|Xiaogeng Liu et.al.|[2310.04451v2](http://arxiv.org/abs/2310.04451v2)|**[link](https://github.com/sheltonliu-n/autodan)**|\n", "2310.01382": "|**2024-03-17**|**Compressing LLMs: The Truth is Rarely Pure and Never Simple**|Ajay Jaiswal et.al.|[2310.01382v2](http://arxiv.org/abs/2310.01382v2)|**[link](https://github.com/vita-group/llm-kick)**|\n", "2310.00867": "|**2023-10-14**|**(Dynamic) Prompting might be all you need to repair Compressed LLMs**|Duc N. 
M Hoang et.al.|[2310.00867v2](http://arxiv.org/abs/2310.00867v2)|null|\n", "2309.14021": "|**2023-09-25**|**LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression**|Ayush Kaushal et.al.|[2309.14021v1](http://arxiv.org/abs/2309.14021v1)|null|\n", "2309.10677": "|**2023-09-27**|**Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation**|Yucheng Li et.al.|[2309.10677v2](http://arxiv.org/abs/2309.10677v2)|**[link](https://github.com/liyucheng09/contamination_detector)**|\n", "2309.09507": "|**2023-10-10**|**Pruning Large Language Models via Accuracy Predictor**|Yupeng Ji et.al.|[2309.09507v2](http://arxiv.org/abs/2309.09507v2)|null|\n", "2309.06126": "|**2023-09-12**|**AstroLLaMA: Towards Specialized Foundation Models in Astronomy**|Tuan Dung Nguyen et.al.|[2309.06126v1](http://arxiv.org/abs/2309.06126v1)|null|\n", "2309.04564": "|**2023-09-08**|**When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale**|Max Marion et.al.|[2309.04564v1](http://arxiv.org/abs/2309.04564v1)|null|\n", "2309.01885": "|**2023-12-01**|**QuantEase: Optimization-based Quantization for Language Models**|Kayhan Behdin et.al.|[2309.01885v2](http://arxiv.org/abs/2309.01885v2)|null|\n", "2309.00614": "|**2023-09-04**|**Baseline Defenses for Adversarial Attacks Against Aligned Language Models**|Neel Jain et.al.|[2309.00614v2](http://arxiv.org/abs/2309.00614v2)|null|\n", "2308.14132": "|**2023-11-07**|**Detecting Language Model Attacks with Perplexity**|Gabriel Alon et.al.|[2308.14132v3](http://arxiv.org/abs/2308.14132v3)|null|\n", "2309.00638": "|**2023-08-23**|**Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network**|Peer Nagy et.al.|[2309.00638v1](http://arxiv.org/abs/2309.00638v1)|null|\n", "2308.10882": "|**2023-08-21**|**Giraffe: Adventures in Expanding Context Lengths in LLMs**|Arka Pal 
et.al.|[2308.10882v1](http://arxiv.org/abs/2308.10882v1)|**[link](https://github.com/abacusai/long-context)**|\n", "2308.04014": "|**2023-09-06**|**Continual Pre-Training of Large Language Models: How to (re)warm your model?**|Kshitij Gupta et.al.|[2308.04014v2](http://arxiv.org/abs/2308.04014v2)|**[link](https://github.com/eleutherai/gpt-neox)**|\n", "2307.15504": "|**2024-01-08**|**Exploring Format Consistency for Instruction Tuning**|Shihao Liang et.al.|[2307.15504v2](http://arxiv.org/abs/2307.15504v2)|**[link](https://github.com/thunlp/unifiedinstructiontuning)**|\n", "2307.11991": "|**2023-09-01**|**Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models**|Tin Lai et.al.|[2307.11991v2](http://arxiv.org/abs/2307.11991v2)|null|\n", "2306.17439": "|**2023-10-13**|**Provable Robust Watermarking for AI-Generated Text**|Xuandong Zhao et.al.|[2306.17439v2](http://arxiv.org/abs/2306.17439v2)|**[link](https://github.com/xuandongzhao/unigram-watermark)**|\n", "2306.07629": "|**2024-02-05**|**SqueezeLLM: Dense-and-Sparse Quantization**|Sehoon Kim et.al.|[2306.07629v3](http://arxiv.org/abs/2306.07629v3)|**[link](https://github.com/squeezeailab/squeezellm)**|\n", "2306.07486": "|**2023-06-13**|**Knowledge-Prompted Estimator: A Novel Approach to Explainable Machine Translation Assessment**|Hao Yang et.al.|[2306.07486v1](http://arxiv.org/abs/2306.07486v1)|null|\n", "2306.03078": "|**2023-06-05**|**SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression**|Tim Dettmers et.al.|[2306.03078v1](http://arxiv.org/abs/2306.03078v1)|**[link](https://github.com/vahe1994/spqr)**|\n", "2305.18226": "|**2023-06-07**|**HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis**|Christoforos Vasilatos et.al.|[2305.18226v2](http://arxiv.org/abs/2305.18226v2)|null|\n", "2305.15004": "|**2023-11-03**|**LLMDet: A Third Party Large Language Models Generated 
Text Detection Tool**|Kangxi Wu et.al.|[2305.15004v3](http://arxiv.org/abs/2305.15004v3)|**[link](https://github.com/trustedllm/llmdet)**|\n", "2305.14864": "|**2023-11-19**|**How To Train Your (Compressed) Large Language Model**|Ananya Harsh Jha et.al.|[2305.14864v2](http://arxiv.org/abs/2305.14864v2)|null|\n", "2305.14726": "|**2023-11-27**|**In-Context Demonstration Selection with Cross Entropy Difference**|Dan Iter et.al.|[2305.14726v2](http://arxiv.org/abs/2305.14726v2)|**[link](https://github.com/microsoft/lmops)**|\n", "2305.13999": "|**2023-10-24**|**Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model**|Zeyu Leo Liu et.al.|[2305.13999v3](http://arxiv.org/abs/2305.13999v3)|null|\n", "2305.13862": "|**2023-08-29**|**A Trip Towards Fairness: Bias and De-Biasing in Large Language Models**|Leonardo Ranaldi et.al.|[2305.13862v2](http://arxiv.org/abs/2305.13862v2)|null|\n", "2305.11759": "|**2023-05-19**|**Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning**|Mustafa Safa Ozdayi et.al.|[2305.11759v1](http://arxiv.org/abs/2305.11759v1)|**[link](https://github.com/amazon-science/controlling-llm-memorization)**|\n", "2304.11567": "|**2023-04-23**|**Differentiate ChatGPT-generated and Human-written Medical Texts**|Wenxiong Liao et.al.|[2304.11567v1](http://arxiv.org/abs/2304.11567v1)|null|\n", "2302.10879": "|**2023-02-21**|**$k$NN-Adapter: Efficient Domain Adaptation for Black-Box Language Models**|Yangsibo Huang et.al.|[2302.10879v1](http://arxiv.org/abs/2302.10879v1)|null|\n", "2212.10440": "|**2022-12-20**|**Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data**|Tim Jansen et.al.|[2212.10440v1](http://arxiv.org/abs/2212.10440v1)|null|\n", "2212.10378": "|**2023-05-24**|**Data Curation Alone Can Stabilize In-context Learning**|Ting-Yun Chang 
et.al.|[2212.10378v2](http://arxiv.org/abs/2212.10378v2)|**[link](https://github.com/terarachang/dataicl)**|\n", "2210.11689": "|**2022-10-21**|**SLING: Sino Linguistic Evaluation of Large Language Models**|Yixiao Song et.al.|[2210.11689v1](http://arxiv.org/abs/2210.11689v1)|**[link](https://github.com/yixiao-song/sling_data_code)**|\n", "2208.03306": "|**2022-08-05**|**Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models**|Margaret Li et.al.|[2208.03306v1](http://arxiv.org/abs/2208.03306v1)|**[link](https://github.com/hadasah/btm)**|\n", "2207.06814": "|**2022-07-14**|**BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling**|Javier de la Rosa et.al.|[2207.06814v1](http://arxiv.org/abs/2207.06814v1)|null|\n", "2205.01863": "|**2022-06-23**|**Provably Confidential Language Modelling**|Xuandong Zhao et.al.|[2205.01863v2](http://arxiv.org/abs/2205.01863v2)|**[link](https://github.com/xuandongzhao/crt)**|\n", "2106.08181": "|**2021-08-03**|**Direction is what you need: Improving Word Embedding Compression in Large Language Models**|Klaudia Ba\u0142azy et.al.|[2106.08181v2](http://arxiv.org/abs/2106.08181v2)|**[link](https://github.com/MohammadrezaBanaei/orientation_based_embedding_compression)**|\n", "2106.04279": "|**2021-06-08**|**Staircase Attention for Recurrent Processing of Sequences**|Da Ju et.al.|[2106.04279v1](http://arxiv.org/abs/2106.04279v1)|null|\n", "2102.12459": "|**2021-09-15**|**When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute**|Tao Lei et.al.|[2102.12459v3](http://arxiv.org/abs/2102.12459v3)|**[link](https://github.com/asappresearch/sru)**|\n", "2004.11714": "|**2020-04-22**|**Residual Energy-Based Models for Text Generation**|Yuntian Deng et.al.|[2004.11714v1](http://arxiv.org/abs/2004.11714v1)|null|\n", "2001.08896": "|**2020-11-17**|**Compressing Language Models using Doped Kronecker Products**|Urmish Thakker 
et.al.|[2001.08896v5](http://arxiv.org/abs/2001.08896v5)|null|\n", "1909.08053": "|**2020-03-13**|**Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism**|Mohammad Shoeybi et.al.|[1909.08053v4](http://arxiv.org/abs/1909.08053v4)|**[link](https://github.com/NVIDIA/Megatron-LM)**|\n", "2312.00960": "|**2023-12-01**|**The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**|Satya Sai Srinath Namburi et.al.|[2312.00960v1](http://arxiv.org/abs/2312.00960v1)|**[link](https://github.com/namburisrinath/llmcompression)**|\n", "2312.02406": "|**2023-12-09**|**Efficient Online Data Mixing For Language Model Pre-Training**|Alon Albalak et.al.|[2312.02406v2](http://arxiv.org/abs/2312.02406v2)|null|\n", "2312.02382": "|**2023-12-04**|**New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking**|Karanpartap Singh et.al.|[2312.02382v1](http://arxiv.org/abs/2312.02382v1)|null|\n", "2312.10302": "|**2024-06-03**|**One-Shot Learning as Instruction Data Prospector for Large Language Models**|Yunshui Li et.al.|[2312.10302v4](http://arxiv.org/abs/2312.10302v4)|**[link](https://github.com/pldlgb/nuggets)**|\n", "2312.09300": "|**2023-12-14**|**Self-Evaluation Improves Selective Generation in Large Language Models**|Jie Ren et.al.|[2312.09300v1](http://arxiv.org/abs/2312.09300v1)|null|\n", "2312.12006": "|**2023-12-19**|**Can ChatGPT be Your Personal Medical Assistant?**|Md. 
Rafiul Biswas et.al.|[2312.12006v1](http://arxiv.org/abs/2312.12006v1)|null|\n", "2201.03327": "|**2023-12-20**|**Latency Adjustable Transformer Encoder for Language Understanding**|Sajjad Kachuee et.al.|[2201.03327v7](http://arxiv.org/abs/2201.03327v7)|null|\n", "2312.17296": "|**2024-04-29**|**Structured Packing in LLM Training Improves Long Context Utilization**|Konrad Staniszewski et.al.|[2312.17296v6](http://arxiv.org/abs/2312.17296v6)|null|\n", "2401.06118": "|**2024-02-06**|**Extreme Compression of Large Language Models via Additive Quantization**|Vage Egiazarian et.al.|[2401.06118v2](http://arxiv.org/abs/2401.06118v2)|**[link](https://github.com/vahe1994/aqlm)**|\n", "2401.06088": "|**2024-01-11**|**Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models**|K M Sajjadul Islam et.al.|[2401.06088v1](http://arxiv.org/abs/2401.06088v1)|null|\n", "2401.08491": "|**2024-01-24**|**Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models**|Tassilo Klein et.al.|[2401.08491v2](http://arxiv.org/abs/2401.08491v2)|null|\n", "2401.13927": "|**2024-06-09**|**Adaptive Text Watermark for Large Language Models**|Yepeng Liu et.al.|[2401.13927v2](http://arxiv.org/abs/2401.13927v2)|**[link](https://github.com/yepengliu/adaptive-text-watermark)**|\n", "2401.16380": "|**2024-01-29**|**Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling**|Pratyush Maini et.al.|[2401.16380v1](http://arxiv.org/abs/2401.16380v1)|null|\n", "2401.17505": "|**2024-07-24**|**Arrows of Time for Large Language Models**|Vassilis Papadopoulos et.al.|[2401.17505v4](http://arxiv.org/abs/2401.17505v4)|**[link](https://github.com/frotaur/icmlbackperp)**|\n", "2401.17377": "|**2024-04-04**|**Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens**|Jiacheng Liu et.al.|[2401.17377v3](http://arxiv.org/abs/2401.17377v3)|null|\n", "2402.01093": "|**2024-02-02**|**Specialized Language Models 
with Cheap Inference from Limited Domain Data**|David Grangier et.al.|[2402.01093v1](http://arxiv.org/abs/2402.01093v1)|null|\n", "2402.03303": "|**2024-02-05**|**Nevermind: Instruction Override and Moderation in Large Language Models**|Edward Kim et.al.|[2402.03303v1](http://arxiv.org/abs/2402.03303v1)|null|\n", "2402.03009": "|**2024-02-05**|**UniMem: Towards a Unified View of Long-Context Large Language Models**|Junjie Fang et.al.|[2402.03009v1](http://arxiv.org/abs/2402.03009v1)|null|\n", "2402.04347": "|**2024-02-06**|**The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry**|Michael Zhang et.al.|[2402.04347v1](http://arxiv.org/abs/2402.04347v1)|null|\n", "2402.04291": "|**2024-05-15**|**BiLLM: Pushing the Limit of Post-Training Quantization for LLMs**|Wei Huang et.al.|[2402.04291v2](http://arxiv.org/abs/2402.04291v2)|**[link](https://github.com/aaronhuang-778/billm)**|\n", "2402.09363": "|**2024-06-04**|**Copyright Traps for Large Language Models**|Matthieu Meeus et.al.|[2402.09363v2](http://arxiv.org/abs/2402.09363v2)|**[link](https://github.com/computationalprivacy/copyright-traps)**|\n", "2402.09759": "|**2024-02-15**|**Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish**|Szymon Ruci\u0144ski et.al.|[2402.09759v1](http://arxiv.org/abs/2402.09759v1)|null|\n", "2402.09656": "|**2024-06-05**|**The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse**|Wanli Yang et.al.|[2402.09656v4](http://arxiv.org/abs/2402.09656v4)|**[link](https://github.com/wanliyoung/collapse-in-model-editing)**|\n", "2402.12261": "|**2024-06-05**|**NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms**|Jonathan Zheng et.al.|[2402.12261v3](http://arxiv.org/abs/2402.12261v3)|null|\n", "2402.11960": "|**2024-02-19**|**DB-LLM: Accurate Dual-Binarization for Efficient LLMs**|Hong Chen et.al.|[2402.11960v1](http://arxiv.org/abs/2402.11960v1)|null|\n", 
"2402.11218": "|**2024-05-24**|**Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs**|Xun Liang et.al.|[2402.11218v2](http://arxiv.org/abs/2402.11218v2)|**[link](https://github.com/iaar-shanghai/datg)**|\n", "2402.12847": "|**2024-05-26**|**Instruction-tuned Language Models are Better Knowledge Learners**|Zhengbao Jiang et.al.|[2402.12847v2](http://arxiv.org/abs/2402.12847v2)|**[link](https://github.com/edward-sun/pit)**|\n", "2402.13449": "|**2024-02-21**|**CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory**|Zexue He et.al.|[2402.13449v1](http://arxiv.org/abs/2402.13449v1)|null|\n", "2402.14866": "|**2024-04-16**|**APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models**|Ziyi Guan et.al.|[2402.14866v2](http://arxiv.org/abs/2402.14866v2)|null|\n", "2402.14848": "|**2024-02-19**|**Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models**|Mosh Levy et.al.|[2402.14848v1](http://arxiv.org/abs/2402.14848v1)|**[link](https://github.com/alonj/Same-Task-More-Tokens)**|\n", "2402.16006": "|**2024-06-04**|**ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings**|Hao Wang et.al.|[2402.16006v2](http://arxiv.org/abs/2402.16006v2)|null|\n", "2402.17764": "|**2024-02-27**|**The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits**|Shuming Ma et.al.|[2402.17764v1](http://arxiv.org/abs/2402.17764v1)|null|\n", "2402.16775": "|**2024-06-06**|**A Comprehensive Evaluation of Quantization Strategies for Large Language Models**|Renren Jin et.al.|[2402.16775v2](http://arxiv.org/abs/2402.16775v2)|null|\n", "2402.19421": "|**2024-02-29**|**Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines**|Lijia Ma et.al.|[2402.19421v1](http://arxiv.org/abs/2402.19421v1)|null|\n", "2308.16137": "|**2024-03-09**|**LM-Infinite: Zero-Shot Extreme Length Generalization for Large 
Language Models**|Chi Han et.al.|[2308.16137v6](http://arxiv.org/abs/2308.16137v6)|null|\n", "2310.07240": "|**2024-04-30**|**CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving**|Yuhan Liu et.al.|[2310.07240v5](http://arxiv.org/abs/2310.07240v5)|**[link](https://github.com/uchi-jcl/cachegen)**|\n", "2403.12544": "|**2024-03-19**|**AffineQuant: Affine Transformation Quantization for Large Language Models**|Yuexiao Ma et.al.|[2403.12544v1](http://arxiv.org/abs/2403.12544v1)|**[link](https://github.com/bytedance/affinequant)**|\n", "2403.13027": "|**2024-03-19**|**Towards Better Statistical Understanding of Watermarking LLMs**|Zhongze Cai et.al.|[2403.13027v1](http://arxiv.org/abs/2403.13027v1)|**[link](https://github.com/zhongzecai/dualga)**|\n", "2403.16038": "|**2024-04-18**|**Monotonic Paraphrasing Improves Generalization of Language Model Prompting**|Qin Liu et.al.|[2403.16038v2](http://arxiv.org/abs/2403.16038v2)|null|\n", "2403.15747": "|**2024-03-23**|**CodeShell Technical Report**|Rui Xie et.al.|[2403.15747v1](http://arxiv.org/abs/2403.15747v1)|null|\n", "2404.02060": "|**2024-06-12**|**Long-context LLMs Struggle with Long In-context Learning**|Tianle Li et.al.|[2404.02060v3](http://arxiv.org/abs/2404.02060v3)|**[link](https://github.com/tiger-ai-lab/longiclbench)**|\n", "2404.01892": "|**2024-04-02**|**Minimize Quantization Output Error with Bias Compensation**|Cheng Gong et.al.|[2404.01892v1](http://arxiv.org/abs/2404.01892v1)|**[link](https://github.com/gongcheng1919/bias-compensation)**|\n", "2404.01147": "|**2024-04-01**|**Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? 
A Case Study on Reddit**|Parker Seegmiller et.al.|[2404.01147v1](http://arxiv.org/abs/2404.01147v1)|null|\n", "2404.02837": "|**2024-04-03**|**Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models**|Wanyun Cui et.al.|[2404.02837v1](http://arxiv.org/abs/2404.02837v1)|null|\n", "2404.03626": "|**2024-04-04**|**Training LLMs over Neurally Compressed Text**|Brian Lester et.al.|[2404.03626v1](http://arxiv.org/abs/2404.03626v1)|null|\n", "2404.06634": "|**2024-04-09**|**Perplexed: Understanding When Large Language Models are Confused**|Nathan Cooper et.al.|[2404.06634v1](http://arxiv.org/abs/2404.06634v1)|null|\n", "2404.09695": "|**2024-04-15**|**LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models**|Guangyan Li et.al.|[2404.09695v1](http://arxiv.org/abs/2404.09695v1)|null|\n", "2404.11531": "|**2024-04-17**|**Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization**|Costas Mavromatis et.al.|[2404.11531v1](http://arxiv.org/abs/2404.11531v1)|**[link](https://github.com/cmavro/packllm)**|\n", "2404.13033": "|**2024-04-19**|**Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs**|Biyang Guo et.al.|[2404.13033v1](http://arxiv.org/abs/2404.13033v1)|**[link](https://github.com/beyondguo/llm-tuning)**|\n", "2404.15196": "|**2024-07-12**|**Setting up the Data Printer with Improved English to Ukrainian Machine Translation**|Yurii Paniv et.al.|[2404.15196v2](http://arxiv.org/abs/2404.15196v2)|**[link](https://github.com/lang-uk/dragoman)**|\n", "2404.17120": "|**2024-04-29**|**Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs**|Valeriia Cherepanova et.al.|[2404.17120v2](http://arxiv.org/abs/2404.17120v2)|null|\n", "2404.16873": "|**2024-04-21**|**AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs**|Anselm Paulus 
et.al.|[2404.16873v1](http://arxiv.org/abs/2404.16873v1)|**[link](https://github.com/facebookresearch/advprompter)**|\n", "2404.18824": "|**2024-04-29**|**Benchmarking Benchmark Leakage in Large Language Models**|Ruijie Xu et.al.|[2404.18824v1](http://arxiv.org/abs/2404.18824v1)|**[link](https://github.com/gair-nlp/benbench)**|\n", "2401.10862": "|**2024-04-29**|**Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning**|Adib Hasan et.al.|[2401.10862v2](http://arxiv.org/abs/2401.10862v2)|**[link](https://github.com/crystaleye42/eval-safety)**|\n", "2405.00204": "|**2024-04-30**|**General Purpose Verification for Chain of Thought Prompting**|Robert Vacareanu et.al.|[2405.00204v1](http://arxiv.org/abs/2405.00204v1)|null|\n", "2405.01724": "|**2024-05-02**|**Large Language Models are Inconsistent and Biased Evaluators**|Rickard Stureborg et.al.|[2405.01724v1](http://arxiv.org/abs/2405.01724v1)|null|\n", "2405.05647": "|**2024-05-09**|**Letter to the Editor: What are the legal and ethical considerations of submitting radiology reports to ChatGPT?**|Siddharth Agarwal et.al.|[2405.05647v1](http://arxiv.org/abs/2405.05647v1)|null|\n", "2405.06105": "|**2024-05-09**|**Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?**|Yutong Hu et.al.|[2405.06105v1](http://arxiv.org/abs/2405.06105v1)|null|\n", "2405.07745": "|**2024-05-13**|**LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language**|Cagri Toraman et.al.|[2405.07745v1](http://arxiv.org/abs/2405.07745v1)|**[link](https://github.com/metunlp/llamaturk)**|\n", "2405.07194": "|**2024-05-12**|**Differentiable Model Scaling using Differentiable Topk**|Kai Liu et.al.|[2405.07194v1](http://arxiv.org/abs/2405.07194v1)|**[link](https://github.com/LKJacky/Differentiable-Model-Scaling)**|\n", "2405.07135": "|**2024-05-12**|**Combining multiple post-training techniques to achieve most efficient quantized LLMs**|Sayeh Sharify 
et.al.|[2405.07135v1](http://arxiv.org/abs/2405.07135v1)|null|\n", "2405.04065": "|**2024-05-16**|**FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference**|Runheng Liu et.al.|[2405.04065v3](http://arxiv.org/abs/2405.04065v3)|null|\n", "2405.11029": "|**2024-05-17**|**Generative Artificial Intelligence: A Systematic Review and Applications**|Sandeep Singh Sengar et.al.|[2405.11029v1](http://arxiv.org/abs/2405.11029v1)|null|\n", "2402.12170": "|**2024-05-23**|**Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction**|Kuniaki Saito et.al.|[2402.12170v2](http://arxiv.org/abs/2402.12170v2)|null|\n", "2405.15346": "|**2024-05-24**|**BiSup: Bidirectional Quantization Error Suppression for Large Language Models**|Minghui Zou et.al.|[2405.15346v1](http://arxiv.org/abs/2405.15346v1)|null|\n", "2405.14917": "|**2024-05-23**|**SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models**|Wei Huang et.al.|[2405.14917v1](http://arxiv.org/abs/2405.14917v1)|**[link](https://github.com/Aaronhuang-778/SliM-LLM)**|\n", "2405.17264": "|**2024-05-27**|**On the Noise Robustness of In-Context Learning for Text Generation**|Hongfu Gao et.al.|[2405.17264v1](http://arxiv.org/abs/2405.17264v1)|null|\n", "2405.17915": "|**2024-05-28**|**Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models**|Longze Chen et.al.|[2405.17915v1](http://arxiv.org/abs/2405.17915v1)|**[link](https://github.com/October2001/ProLong)**|\n", "2405.18719": "|**2024-05-30**|**Contextual Position Encoding: Learning to Count What's Important**|Olga Golovneva et.al.|[2405.18719v2](http://arxiv.org/abs/2405.18719v2)|null|\n", "2402.09025": "|**2024-07-19**|**SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks**|Jiwon Song et.al.|[2402.09025v5](http://arxiv.org/abs/2402.09025v5)|**[link](https://github.com/jiwonsong-dev/sleb)**|\n", "2405.19358": 
"|**2024-05-31**|**Robustifying Safety-Aligned Large Language Models through Clean Data Curation**|Xiaoqun Liu et.al.|[2405.19358v2](http://arxiv.org/abs/2405.19358v2)|null|\n", "2402.11192": "|**2024-06-01**|**I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses**|Xuan Ren et.al.|[2402.11192v2](http://arxiv.org/abs/2402.11192v2)|null|\n", "2406.01943": "|**2024-06-04**|**Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs**|Nik Bear Brown et.al.|[2406.01943v1](http://arxiv.org/abs/2406.01943v1)|null|\n", "2406.01931": "|**2024-06-05**|**Dishonesty in Helpful and Harmless Alignment**|Youcheng Huang et.al.|[2406.01931v2](http://arxiv.org/abs/2406.01931v2)|null|\n", "2406.01333": "|**2024-06-03**|**Probing Language Models for Pre-training Data Detection**|Zhenhua Liu et.al.|[2406.01333v1](http://arxiv.org/abs/2406.01333v1)|**[link](https://github.com/zhliu0106/probing-lm-data)**|\n", "2406.05981": "|**2024-06-11**|**ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization**|Haoran You et.al.|[2406.05981v2](http://arxiv.org/abs/2406.05981v2)|**[link](https://github.com/gatech-eic/shiftaddllm)**|\n", "2406.05678": "|**2024-06-09**|**SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models**|Hengyu Zhang et.al.|[2406.05678v1](http://arxiv.org/abs/2406.05678v1)|**[link](https://github.com/dexter-gt-86/sinklora)**|\n", "2406.07368": "|**2024-06-11**|**When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models**|Haoran You et.al.|[2406.07368v1](http://arxiv.org/abs/2406.07368v1)|**[link](https://github.com/gatech-eic/linearized-llm)**|\n", "2406.07177": "|**2024-06-11**|**TernaryLLM: Ternarized Large Language Model**|Tianqi Chen et.al.|[2406.07177v1](http://arxiv.org/abs/2406.07177v1)|null|\n", "2406.07831": 
"|**2024-06-12**|**ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models**|Xiang Meng et.al.|[2406.07831v1](http://arxiv.org/abs/2406.07831v1)|null|\n", "2406.09008": "|**2024-06-13**|**LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models**|Xiaohao Yang et.al.|[2406.09008v1](http://arxiv.org/abs/2406.09008v1)|null|\n", "2406.11473": "|**2024-07-10**|**Promises, Outlooks and Challenges of Diffusion Language Modeling**|Justin Deschenaux et.al.|[2406.11473v2](http://arxiv.org/abs/2406.11473v2)|null|\n", "2406.11190": "|**2024-06-17**|**Aligning Large Language Models from Self-Reference AI Feedback with one General Principle**|Rong Bao et.al.|[2406.11190v1](http://arxiv.org/abs/2406.11190v1)|**[link](https://github.com/rbao2018/self_ref_feedback)**|\n", "2406.11162": "|**2024-06-26**|**How Good are LLMs at Relation Extraction under Low-Resource Scenario? Comprehensive Evaluation**|Dawulie Jinensibieke et.al.|[2406.11162v2](http://arxiv.org/abs/2406.11162v2)|**[link](https://github.com/victor812-hub/entity_datasets)**|\n", "2406.10594": "|**2024-06-20**|**BlockPruner: Fine-grained Pruning for Large Language Models**|Longguang Zhong et.al.|[2406.10594v2](http://arxiv.org/abs/2406.10594v2)|**[link](https://github.com/MrGGLS/BlockPruner)**|\n", "2406.10576": "|**2024-06-15**|**Optimization-based Structural Pruning for Large Language Models without Back-Propagation**|Yuan Gao et.al.|[2406.10576v1](http://arxiv.org/abs/2406.10576v1)|null|\n", "2406.10269": "|**2024-06-11**|**Markov Constraint as Large Language Model Surrogate**|Alexandre Bonlarron et.al.|[2406.10269v1](http://arxiv.org/abs/2406.10269v1)|null|\n", "2402.10738": "|**2024-06-16**|**Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning**|Yinpeng Liu et.al.|[2402.10738v2](http://arxiv.org/abs/2402.10738v2)|**[link](https://github.com/61peng/curri_learning)**|\n", "2406.12018": "|**2024-06-17**|**CItruS: 
Chunked Instruction-aware State Eviction for Long Sequence Modeling**|Yu Bai et.al.|[2406.12018v1](http://arxiv.org/abs/2406.12018v1)|null|\n", "2406.16450": "|**2024-06-24**|**Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers**|Xiuying Wei et.al.|[2406.16450v1](http://arxiv.org/abs/2406.16450v1)|**[link](https://github.com/claire-labo/structuredffn)**|\n", "2406.15524": "|**2024-06-21**|**Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization**|Sungbin Shin et.al.|[2406.15524v1](http://arxiv.org/abs/2406.15524v1)|null|\n", "2406.15473": "|**2024-06-15**|**Intertwining CP and NLP: The Generation of Unreasonably Constrained Sentences**|Alexandre Bonlarron et.al.|[2406.15473v1](http://arxiv.org/abs/2406.15473v1)|null|\n", "2406.17542": "|**2024-06-26**|**CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent**|Pranav Ajit Nair et.al.|[2406.17542v2](http://arxiv.org/abs/2406.17542v2)|null|\n", "2406.17296": "|**2024-06-25**|**BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks**|Amrutha Varshini Ramesh et.al.|[2406.17296v1](http://arxiv.org/abs/2406.17296v1)|null|\n", "2406.17253": "|**2024-06-25**|**How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?**|Huaizhi Ge et.al.|[2406.17253v1](http://arxiv.org/abs/2406.17253v1)|null|\n", "2406.18382": "|**2024-07-02**|**Adversarial Search Engine Optimization for Large Language Models**|Fredrik Nestaas et.al.|[2406.18382v2](http://arxiv.org/abs/2406.18382v2)|null|\n", "2406.17808": "|**2024-06-24**|**Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache**|Jeffrey Willette et.al.|[2406.17808v1](http://arxiv.org/abs/2406.17808v1)|null|\n", "2406.19234": "|**2024-06-27**|**Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation**|Yuying Li 
et.al.|[2406.19234v1](http://arxiv.org/abs/2406.19234v1)|null|\n", "2407.02891": "|**2024-07-03**|**GPTQT: Quantize Large Language Models Twice to Push the Efficiency**|Yipin Guo et.al.|[2407.02891v1](http://arxiv.org/abs/2407.02891v1)|null|\n", "2407.02659": "|**2024-07-02**|**Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison**|Devam Mondal et.al.|[2407.02659v1](http://arxiv.org/abs/2407.02659v1)|null|\n", "2407.00102": "|**2024-06-27**|**Curriculum Learning with Quality-Driven Data Selection**|Biao Wu et.al.|[2407.00102v1](http://arxiv.org/abs/2407.00102v1)|null|\n", "2311.09325": "|**2024-07-03**|**Temperature-scaling surprisal estimates improve fit to human reading times -- but does it do so for the \"right reasons\"?**|Tong Liu et.al.|[2311.09325v2](http://arxiv.org/abs/2311.09325v2)|**[link](https://github.com/TongLiu-github/TemperatureSaling4RTs)**|\n", "2407.07093": "|**2024-07-09**|**FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation**|Liqun Ma et.al.|[2407.07093v1](http://arxiv.org/abs/2407.07093v1)|**[link](https://github.com/liqunma/fbi-llm)**|\n", "2407.06917": "|**2024-07-09**|**Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models**|Zara Siddique et.al.|[2407.06917v1](http://arxiv.org/abs/2407.06917v1)|**[link](https://github.com/groovychoons/GlobalBias)**|\n", "2407.06654": "|**2024-07-09**|**SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training**|Nan He et.al.|[2407.06654v1](http://arxiv.org/abs/2407.06654v1)|null|\n", "2407.06411": "|**2024-07-08**|**If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers**|Adriano Hernandez et.al.|[2407.06411v1](http://arxiv.org/abs/2407.06411v1)|null|\n", "2407.05734": "|**2024-07-08**|**Empirical Study of Symmetrical Reasoning in Conversational Chatbots**|Daniela N. 
Rim et.al.|[2407.05734v1](http://arxiv.org/abs/2407.05734v1)|null|\n", "2407.05483": "|**2024-07-07**|**Just read twice: closing the recall gap for recurrent language models**|Simran Arora et.al.|[2407.05483v1](http://arxiv.org/abs/2407.05483v1)|**[link](https://github.com/HazyResearch/prefix-linear-attention)**|\n", "2407.04965": "|**2024-07-10**|**Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression**|Zhichao Xu et.al.|[2407.04965v2](http://arxiv.org/abs/2407.04965v2)|**[link](https://github.com/zhichaoxu-shufe/beyond-perplexity-compression-safety-eval)**|\n", "2407.04752": "|**2024-07-05**|**SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking**|Xingrun Xing et.al.|[2407.04752v1](http://arxiv.org/abs/2407.04752v1)|null|\n", "2407.08152": "|**2024-07-11**|**Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models**|Aydin Abadi et.al.|[2407.08152v1](http://arxiv.org/abs/2407.08152v1)|**[link](https://github.com/vdasu/deduplication)**|\n", "2407.09447": "|**2024-07-12**|**ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts**|Amelia F. 
Hardy et.al.|[2407.09447v1](http://arxiv.org/abs/2407.09447v1)|**[link](https://github.com/sisl/astprompter)**|\n", "2407.09722": "|**2024-07-12**|**Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference**|Zongyue Qin et.al.|[2407.09722v1](http://arxiv.org/abs/2407.09722v1)|null|\n", "2407.12824": "|**2024-07-02**|**Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models**|Xavier Suau et.al.|[2407.12824v1](http://arxiv.org/abs/2407.12824v1)|null|\n", "2407.16166": "|**2024-07-23**|**Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks**|Yao-Shun Chuang et.al.|[2407.16166v1](http://arxiv.org/abs/2407.16166v1)|null|\n", "2407.15857": "|**2024-07-08**|**BoRA: Bayesian Hierarchical Low-Rank Adaption for Multi-task Large Language Models**|Simen Eide et.al.|[2407.15857v1](http://arxiv.org/abs/2407.15857v1)|null|\n"}}