Tsinghua University – University of Amsterdam Joint Research Centre for Logic

Events

[Advances in Logic and Artificial Intelligence] 26th March, 2026:

Resolution Chain-of-Thought for LLM Symbolic Reasoning

Speaker: Yixiang Chen (East China Normal University)

Time: 16:00-17:30, 26 March 2026

Abstract:

Large language models still struggle with complex logical reasoning. Numerous studies have explored ways to strengthen their inference skills, broadly grouped into solver-based, prompt-based, and fine-tuning approaches. Among these, prompting techniques improve LLMs by explicitly modeling reasoning chains, as in Chain-of-Thought (CoT) and Tree-of-Thought (ToT), by eliciting symbolic expressions, as in SymbCoT, and by adaptively selecting a Symbolic Language (SL).
Building on this line of work, we introduce bidirectional reasoning into the prompting approach and, through careful prompt design, implement an automated reasoning process driven by the generations of large language models. Technically, Bi-Resolution first converts the natural language problem into a first-order logic formulation and selects the appropriate variant of the resolution algorithm. During resolution, bidirectional reasoning guides constraint instantiation to prune redundant clauses and reduce complexity. Bi-Resolution also enables the model to judge statements that are “neither fully true nor fully false” more accurately. Experimental results show that our method improves the logical inference accuracy of large language models.
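The resolution step the abstract refers to can be illustrated with a minimal propositional resolution refutation in Python. This is only a sketch of classical resolution, not the speaker's Bi-Resolution system: the first-order translation and the bidirectional pruning are omitted, and the clause encoding (sets of string literals, with a '-' prefix marking negation) is our own assumption.

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        neg = lit[1:] if lit.startswith('-') else '-' + lit
        if neg in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {neg})))
    return resolvents

def resolution_refutes(clauses):
    """Saturate the clause set under resolution.

    Returns True iff the empty clause is derivable, i.e. the set is
    unsatisfiable; terminates because only finitely many clauses exist
    over a finite set of literals.
    """
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True   # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False          # fixpoint reached, no contradiction
        clauses |= new
```

For example, the set {p}, {¬p ∨ q}, {¬q} resolves to the empty clause (unsatisfiable), while {p}, {q} does not.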

=====

Speaker Bio: Yixiang Chen is a professor at the School of Software Engineering, East China Normal University. He currently serves as the first Chair of the Artificial Intelligence Logic Committee of the Chinese Association for Artificial Intelligence and the first Chair of the Trusted Intelligent Systems Committee of the Shanghai Association for Artificial Intelligence. He is engaged in foundational and engineering research on the trustworthiness of artificial intelligence. He established the spatio-temporally consistent intelligent-system specification language STeC and its hybrid clock logic system, designed technical methods for the optimized hardware and software design of intelligent systems, and developed multidimensional attribute-based software trustworthiness measurement and evaluation methods as well as enhancement specifications.

[Advances in Logic and Artificial Intelligence, lectures] 27th February, 6th March, 13th March, 2026:

Probabilistic Causal Models

Speaker: Hanti Lin

Time: 9:50-12:15, 27 February 2026

Algorithms for Causal Learning

Speaker: Hanti Lin

Time: 9:50-12:15, 6 March 2026

Probabilistic Causal Models

Speaker: Hanti Lin

Time: 9:50-12:15, 13 March 2026

=====

Speaker Bio: Hanti Lin is a philosopher of science and formal epistemologist, with papers published in philosophy as well as in theoretical computer science. Before joining UC Davis, he was a postdoc at the Australian National University.

[Advances in Logic and Artificial Intelligence] 18th September, 2025:

Towards Logical and Causal Reasoning of Large Language Models

Speaker: Haoxuan Li (Peking University)

Time: 16:00-17:30, 18 September 2025

Abstract:

Large language models (LLMs) have achieved remarkable success in various natural language tasks, but their logical and causal reasoning abilities remain significantly limited. In this talk, we first give a comprehensive introduction to the most cutting-edge LLM logical reasoning approaches under a proposed new taxonomy. Specifically, to accurately answer complex logic questions, previous methods can be categorized by their reliance on external solvers, prompts, or fine-tuning. To avoid logical contradictions, we discuss concepts of and solutions for various logical consistencies, including implication, negation, transitivity, and factuality consistency, as well as their composites. Secondly, we discuss the benefits of introducing causality into LLM reasoning, where the key insight is that correlation does not necessarily imply causation. For example, ice cream sales and crime rates are both high in summer, but this does not indicate that ice cream sales have a causal influence on crime rates. We conclude that logical rules can be regarded as the causal invariances of LLM reasoning over natural language examples.
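As a toy illustration of two of the consistency notions mentioned in the abstract, one can check negation consistency (a model's confidence in a statement and in its negation should sum to roughly one) and transitivity (accepted entailments should compose). The probability values, tolerance, and entailment-pair encoding below are our own assumptions for illustration, not part of the talk.

```python
def negation_consistent(p_stmt, p_neg, tol=0.05):
    """Negation consistency: confidence in A plus confidence in not-A
    should sum to approximately 1 (within a chosen tolerance)."""
    return abs((p_stmt + p_neg) - 1.0) <= tol

def transitivity_violations(entails):
    """Given a set of (premise, conclusion) pairs a model accepts,
    return the pairs (a, c) implied by transitivity (a->b and b->c)
    that the model does not accept."""
    violations = set()
    for (a, b) in entails:
        for (b2, c) in entails:
            if b == b2 and a != c and (a, c) not in entails:
                violations.add((a, c))
    return violations
```

For instance, a model that accepts A→B and B→C but not A→C exhibits a transitivity violation on (A, C).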

=====

Speaker Bio: Haoxuan Li is an assistant researcher at Peking University and a research fellow at the Tsinghua–UvA Joint Research Centre for Logic and the University of Oxford. He graduated from the experimental class for gifted children at Beijing No. 8 Middle School, which enabled him to begin his PhD at the age of 19. His research interests include causal inference and logical reasoning with large language models. He has more than 50 publications as first or corresponding author in top-tier CCF-A conferences, and his work has been reported by MIT Technology Review and CAAI. He is supported by the Young Scientists Fund of the National Natural Science Foundation of China (¥300,000) and the Young Elite Scientists Sponsorship Program by CAST – Doctoral Student Special Plan (via CCF). He was selected as a 2024 Peking University Person of the Year and as a National Scholarship representative featured in People’s Daily.

[TALK] 18th May, 2025:

Developing And Assessing Language Models For Logical Reasoning Over Natural Language

Speaker: Qiming Bao (University of Auckland)

Time: 10:00 AM, 18 May 2025

Abstract: Recent advancements in AI have highlighted the importance of integrating deep learning with symbolic logic reasoning. Language models such as RoBERTa, DeBERTa, LLaMA, Alpaca, Vicuna, GPT-3.5, and GPT-4 have advanced the performance of AI systems in various natural language processing tasks to human-like levels. However, the generalization of language models in logical reasoning remains underexplored, largely because extensive, balanced, and real-world datasets for logical reasoning are lacking. This presentation addresses this gap through three research objectives:
To improve models’ out-of-distribution performance on multi-step logical reasoning tasks through logic-driven data augmentation.
To enhance models’ performance on real-world logical reasoning datasets by constructing an Abstract Meaning Representation (AMR) based logic-driven data augmentation method.
To assess whether large language models, despite their impressive performance on current logical reasoning leaderboards, truly possess strong logical reasoning capabilities.
The first part of the presentation focuses on improving language models’ ability in multi-step logical reasoning, particularly when faced with unbalanced reasoning steps. Inspired by DeepLogic, we present IMA-GloVe-GA, an RNN-based model with a gate attention mechanism, developed to accommodate varying reasoning depths. This is facilitated by our PARARULE-Plus dataset, created for deeper reasoning tasks. Our results show notable enhancements in model performance under both standard and out-of-distribution conditions.
The second part of the presentation focuses on generating diverse training data to address the scarcity of real-world logical reasoning datasets and enhance large language models (LLMs) for logical reasoning tasks. We introduce AMR-LDA, a data augmentation method that converts text into Abstract Meaning Representation (AMR) graphs, improving reasoning datasets. This approach benefits various models, including GPT-3.5 and GPT-4, and improves performance, notably achieving the top rank on the ReClor leaderboard.
The third part of the presentation examines how Large Language Models (LLMs) like GPT-3.5 and GPT-4 respond to trivial changes in logical reasoning datasets. We created ReClor-plus, LogiQA-plus, and LogiQAv2-plus, which include shuffled options and modified correct choices to test LLMs’ logical reasoning. Although LLMs excel on standard datasets, they exhibit degraded performance with these modified versions. Our findings reveal that incorporating task variations, perturbations in training sets, and logic-driven data augmentation significantly enhances LLMs’ generalisation and robustness in logical reasoning scenarios.
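The option-shuffling perturbation described above can be sketched as follows. The function name and the (options, answer index) representation are our own assumptions for illustration, not the actual construction code of ReClor-plus and its siblings; the point is only that the gold label must be remapped after the options are permuted.

```python
import random

def shuffle_options(options, answer_idx, seed=0):
    """Permute a question's answer options and remap the gold label,
    as in option-shuffled variants of multiple-choice reasoning datasets.

    Returns (new_options, new_answer_idx) such that
    new_options[new_answer_idx] is the original correct option.
    """
    rng = random.Random(seed)          # seeded for reproducible perturbations
    order = list(range(len(options)))
    rng.shuffle(order)                 # order[j] = original index now at slot j
    new_options = [options[i] for i in order]
    new_answer = order.index(answer_idx)
    return new_options, new_answer
```

A model that relies on positional cues rather than the option content will degrade on the shuffled version even though the question is logically unchanged.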
This presentation explores several approaches to building a more robust QA system that helps computers think and reason over natural language texts through logical reasoning. Our methods have been evaluated on, and now lead, the public logical reasoning leaderboard ReClor; we are the first group in the world to have scored above 90% on the ReClor hidden test set.

About the speaker: Qiming Bao is a Ph.D. graduate of the Strong AI Lab, NAOInstitute, University of Auckland, New Zealand, supervised by Professor Michael Witbrock and Associate Professor Jiamou Liu. His research interests include natural language processing and reasoning. He has over five years of research and development experience and has published several papers at top conferences in AI/NLP/reasoning, including ACL, AAAI, IJCAI, ICLR, EACL, LLM@IJCAI, AGI@ICLR and IJCLR-NeSy. His method AMR-LDA (GPT-4 + AMR-LDA Prompt Augmentation) achieved the #1 ranking on one of the most challenging logical reasoning reading comprehension leaderboards (ReClor), and his group was the first in the world to score above 90% on the hidden test set. Two of his logical reasoning datasets, PARARULE-Plus and AbductionRules, have been included in LogiTorch, ReasoningNLP, Prompt4ReasoningPapers, OpenAI/Evals, A Survey on Evaluation of Large Language Models, and Reasoning Language Models: A Blueprint. Qiming has given public guest talks and made academic visits to Microsoft Research Asia, Samsung AI Center Cambridge UK, the IEEE Vehicular Technology Society, the ZJU-NLP Group at Zhejiang University, The University of Melbourne, the Institute of Automation of the Chinese Academy of Sciences, Shenzhen MSU-BIT University, University of Massachusetts Amherst and Penn State University on his main research topic, “Natural Language Processing and Reasoning”.