[Advances in Logic and Artificial Intelligence] 18th September, 2025:
Towards Logical and Causal Reasoning of Large Language Models
Speaker: Haoxuan Li (Peking University)
Time: 16:00-17:30, 18 September 2025
Abstract:
Large language models (LLMs) have achieved remarkable success on a wide range of natural language tasks, but their logical and causal reasoning abilities remain significantly limited. In this talk, we first give a comprehensive introduction to the most recent LLM logical reasoning approaches, organized under a newly proposed taxonomy: to answer complex logical questions accurately, previous methods can be categorized by their reliance on external solvers, prompting, or fine-tuning. To avoid logical contradictions, we discuss the concepts of, and solutions for, various logical consistencies, including implication, negation, transitivity, and factuality consistency, as well as their composites. Second, we discuss the benefits of introducing causality into LLM reasoning, where the key insight is that correlation does not necessarily imply causation. For example, ice cream sales and crime rates are both high in summer, but this does not indicate that ice cream sales causally influence crime rates. We conclude by arguing that logical rules can be regarded as causal invariances of LLM reasoning over natural language examples.
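To make the confounding point concrete, here is a minimal simulation sketch. It assumes a synthetic "temperature" variable as the hypothetical common cause; all coefficients and numbers are invented for illustration, not taken from the talk.

```python
# Ice cream vs. crime: a hypothetical confounder (temperature) drives both
# variables, producing a strong correlation with no causal arrow between them.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

temperature = rng.normal(25, 8, n)                     # common cause
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, n)
crime_rate = 0.5 * temperature + rng.normal(0, 5, n)   # no arrow from sales

# Marginal correlation is high...
print(np.corrcoef(ice_cream_sales, crime_rate)[0, 1])  # ~0.6

# ...but after removing the confounder's contribution from both variables
# (residualising on temperature), the correlation vanishes, as a genuine
# causal effect would not.
r_sales = ice_cream_sales - 2.0 * temperature
r_crime = crime_rate - 0.5 * temperature
print(np.corrcoef(r_sales, r_crime)[0, 1])             # ~0
```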
=====
Speaker Bio: Haoxuan Li is an assistant researcher at Peking University and a research fellow at the Tsinghua-UvA Joint Research Center for Logic and at the University of Oxford. He graduated from the experimental class for gifted children at Beijing No. 8 Middle School, which enabled him to begin his PhD at the age of 19. His research interests include causal inference and logical reasoning of large language models; he has more than 50 publications as first or corresponding author in top-tier CCF-A conferences, and his work has been reported by MIT Technology Review and CAAI. He is supported by the Young Scientists Fund of the National Natural Science Foundation of China (¥300,000) and the Young Elite Scientists Sponsorship Program by CAST (Doctoral Student Special Plan, via CCF). He was selected as a 2024 Peking University Person of the Year and as a National Scholarship representative featured by People’s Daily.
[TALK] 18th May, 2025:
Developing And Assessing Language Models For Logical Reasoning Over Natural Language
Speaker: Qiming Bao (University of Auckland)
Time: 10:00 AM, 18 May 2025
Abstract: Recent advancements in AI have highlighted the importance of integrating deep learning with symbolic logical reasoning. Language models such as RoBERTa, DeBERTa, LLaMA, Alpaca, Vicuna, GPT-3.5, and GPT-4 have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, the generalization of language models in logical reasoning remains underexplored, largely because of the lack of extensive, balanced, real-world datasets for logical reasoning. This presentation pursues three research objectives that address this gap:
To improve the models’ out-of-distribution performance on multi-step logical reasoning tasks through logic-driven data augmentation.
To enhance the models’ performance on real-world logical reasoning datasets by constructing an Abstract Meaning Representation-based logic-driven data augmentation method.
To investigate whether large language models, despite their impressive performance on current logical reasoning leaderboards, truly possess strong logical reasoning capabilities.
The first part of the presentation focuses on improving language models’ ability in multi-step logical reasoning, particularly when faced with unbalanced distributions of reasoning depths. Inspired by DeepLogic, we present IMA-GloVe-GA, an RNN-based model with a gate attention mechanism developed to accommodate varying reasoning depths, trained and evaluated with our PARARULE-Plus dataset, which was created for deeper reasoning tasks. Our results show notable performance gains under both standard and out-of-distribution conditions.
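As a rough illustration of what a gate attention read-out on top of an RNN can look like, here is a generic PyTorch sketch. This is not the authors’ IMA-GloVe-GA; the layer sizes, the GRU backbone, and the exact gating form are assumptions made for illustration.

```python
# Generic gated-attention RNN classifier: attention pools the hidden states,
# and a learned gate mixes the pooled context with the final hidden state.
import torch
import torch.nn as nn

class GateAttentionRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # GloVe vectors could be loaded here
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.att = nn.Linear(hid_dim, 1)               # one attention score per step
        self.gate = nn.Linear(hid_dim * 2, hid_dim)    # gate over context vs. last state
        self.out = nn.Linear(hid_dim, num_classes)

    def forward(self, tokens):                          # tokens: (batch, seq)
        h, _ = self.rnn(self.emb(tokens))               # (batch, seq, hid)
        a = torch.softmax(self.att(h).squeeze(-1), dim=-1)
        ctx = (a.unsqueeze(-1) * h).sum(dim=1)          # attention-weighted context
        g = torch.sigmoid(self.gate(torch.cat([ctx, h[:, -1]], dim=-1)))
        return self.out(g * ctx + (1 - g) * h[:, -1])   # gated combination

logits = GateAttentionRNN(vocab_size=5000)(torch.randint(0, 5000, (4, 32)))
print(logits.shape)  # torch.Size([4, 2])
```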
The second part of the presentation focuses on generating diverse training data to address the scarcity of real-world logical reasoning datasets and to enhance large language models (LLMs) on logical reasoning tasks. We introduce AMR-LDA, a data augmentation method that converts text into Abstract Meaning Representation (AMR) graphs and modifies them to produce logically augmented training data. This approach benefits various models, including GPT-3.5 and GPT-4, and improves performance, notably achieving the top rank on the ReClor leaderboard.
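As a toy illustration of the kind of logically equivalent variant such a pipeline can add to a training set: AMR-LDA operates on AMR graphs, not raw strings, so the string-level sketch below, with a hypothetical contrapositive() helper, only shows the underlying logical equivalence being exploited.

```python
# Toy string-level stand-in for logic-driven augmentation via contraposition:
# "If A, then B" is logically equivalent to "If not B, then not A", so the
# augmented sentence keeps the original's label. Pattern and negation rule
# are gross simplifications of graph-level manipulation.
import re

def contrapositive(sentence: str) -> str | None:
    m = re.match(r"If (.+?), then (.+?)\.", sentence)
    if m is None:
        return None  # sentence does not match the simple conditional pattern
    a, b = m.groups()
    return (f"If it is not the case that {b}, "
            f"then it is not the case that {a}.")

original = "If the animal is a dog, then the animal barks."
print(contrapositive(original))
# If it is not the case that the animal barks,
# then it is not the case that the animal is a dog.
```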
The third part of the presentation examines how LLMs such as GPT-3.5 and GPT-4 respond to trivial changes in logical reasoning datasets. We created ReClor-plus, LogiQA-plus, and LogiQAv2-plus, which shuffle the answer options and modify the correct choices to test LLMs’ logical reasoning, as illustrated in the sketch below. Although LLMs excel on the standard datasets, their performance degrades on these modified versions. Our findings reveal that incorporating task variations and perturbations into training sets, together with logic-driven data augmentation, significantly enhances LLMs’ generalisation and robustness in logical reasoning scenarios.
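The option-shuffling perturbation can be sketched in a few lines. The field names below are assumptions for illustration, not the released ReClor-plus schema.

```python
# Permute a multiple-choice item's options and remap the gold label, leaving
# the question's substance unchanged; a reasoner's accuracy should not drop.
import random

def shuffle_options(item: dict, seed: int = 0) -> dict:
    """item = {"context": ..., "question": ..., "options": [...], "label": int}"""
    rng = random.Random(seed)
    order = list(range(len(item["options"])))
    rng.shuffle(order)
    return {
        **item,
        "options": [item["options"][i] for i in order],
        "label": order.index(item["label"]),  # new position of the gold answer
    }

item = {"context": "All dogs bark. Rex is a dog.",
        "question": "What follows?",
        "options": ["Rex barks.", "Rex meows.", "Rex flies.", "Rex swims."],
        "label": 0}
print(shuffle_options(item))
```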
Overall, this presentation explores several approaches to building a more robust QA system that helps computers think and reason over natural language texts through logical reasoning. Our methods have been evaluated on, and currently lead, the public logical reasoning leaderboard ReClor; we are the first group in the world to score above 90% on its hidden test set.
About the speaker: Qiming Bao is a PhD graduate of the Strong AI Lab, NAOInstitute, University of Auckland, New Zealand, supervised by Professor Michael Witbrock and Associate Professor Jiamou Liu. His research interests include natural language processing and reasoning. He has over five years of research and development experience and has published several papers at top conferences in AI/NLP/reasoning, including ACL, AAAI, IJCAI, ICLR, EACL, LLM@IJCAI, AGI@ICLR, and IJCLR-NeSy. His method AMR-LDA (GPT-4 + AMR-LDA Prompt Augmentation) achieved the #1 ranking on ReClor, one of the most challenging logical reasoning reading comprehension leaderboards, and his group was the first in the world to score above 90% on its hidden test set. Two of his logical reasoning datasets, PARARULE-Plus and AbductionRules, have been included in LogiTorch, ReasoningNLP, Prompt4ReasoningPapers, OpenAI/Evals, “A Survey on Evaluation of Large Language Models”, and “Reasoning Language Models: A Blueprint”. Qiming has given public guest talks and made academic visits to Microsoft Research Asia, Samsung AI Center Cambridge UK, the IEEE Vehicular Technology Society, the ZJU-NLP Group at Zhejiang University, The University of Melbourne, the Institute of Automation, Chinese Academy of Sciences, Shenzhen MSU-BIT University, the University of Massachusetts Amherst, and Penn State University on his main research topic, “Natural Language Processing and Reasoning”.