Lecturer
Phokion G.Kolaitis （University of California Santa Cruz & IBM Research）
Time
Beijing Time: 9:50 am ~12:15 am
Venue
on Tsinghua campus: to be announced
Zoom: to be announced
Course Description:
During the past fifty years, first-order logic and its variants have been successfully used as a database query language. In traditional applications, the data at hand are assumed to be unambiguous and consistent, which implies that queries (specified in some logical formalism) are posed on a well-defined complete and consistent database. In more recent applications, however, the data may be incomplete, inconsistent, or uncertain. This state of affairs necessitates the introduction and study of the certain answers of database queries, which is an alternative semantics that takes the incompleteness, inconsistency, or uncertainty of the data into account. Intuitively, the certain answers to a query are obtained by first evaluating the query on each “completion” of the given incomplete database and then aggregating the answers obtained over all these “completions”. The aim of this course is to examine the semantics and the algorithmic aspects of the certain answers to queries in four different such contexts, namely, in the contexts of data exchange and integration, inconsistent databases, probabilistic databases, and voting with partial preferences.
Course Outline
Lecture 1: Overview of Database Query Languages and Computational Complexity
The relational data model; first-order logic as a database query language; conjunctive queries and unions of conjunctive queries; the query evaluation problem; data complexity and combined complexity; algorithmic aspects of conjunctive query evaluation.
Lecture 2: Schema Mappings, Data Exchange and Integration, Certain Answers
Database dependencies and schema mappings; solutions and universal solutions in data exchange and data integration; certain answers and possible answers of queries in data exchange and data integration; algorithmic aspects of the certain answers of conjunctive queries in data exchange and data integration.
Lecture 3: Inconsistent Databases and Consistent Answers
Managing inconsistency in databases using logic; integrity constraints and database repairs; consistent answers of queries; trichotomy theorems for the computational complexity of the consistent answers of conjunctive queries.
Lecture 4: Probabilistic Databases and Query Answering
Modeling uncertainty in data using probabilistic databases; the tuple-independence model for probabilistic databases; semantics of queries on probabilistic databases; dichotomy theorems for the computational complexity of conjunctive-query evaluation on probabilistic databases.
Lecture 5: Voting with Partial Preferences, Necessary and Possible Winners
Social choice and voting rules; modeling incomplete information about voting preferences using partial orders; necessary winners and possible winners; enriching voting with relational database context; necessary answers and possible answers to queries on election databases, and their computational complexity.
Background Knowledge
The material covered in this course is in the interface between mathematical logic, algorithms, computational complexity, and relational databases. The goal is to make this material accessible to a broad audience, but some familiarity with at least some of the following topics is desirable:
Mathematical Logic: Familiarity with the syntax and the semantics of first-order logic. For example, Chapter 2 of the textbook A Mathematical Introduction to Logic by H.B. Enderton is a very good source.
Algorithms and Computational Complexity: Familiarity with the notions of the running time of algorithms, worst-case complexity, and a basic knowledge of NP-completeness. For example, Chapters 1 and 2 of the monograph Computers and Intractability: A Guide to the Theory of NP-Completeness by M.R. Carey and D.S. Johnson is a very good source.
Relational Databases: Familiarity with the notion of the relational data model, database queries, and database query languages. For example, Chapter 3 of the textbook Database Systems Concepts by A. Silberschatz, H.F. Korth, and S. Sudarshan is a very good source (but so is any other standard textbook on relational databases).
Slides
Home Assignments
(Submit to haoxuan_yin@foxmail.com)
- The first assignment (due: before the class on Wednesday, June 29th) solution
- The second assignment (due: before the class on Thursday, June 30th)
- The third assignment (due: before the class on Friday, July 1st)
- Take-home exam (due: before 12:00 pm of Sunday, July 3rd)
General References
- Foundations of Databases by S. Abiteboul, R. Hull, V. Vianu, Addison-Wesley
The standard reference for database theory. Free copy available at http://webdam.inria.fr/Alice/
- Computers and Intractability: A Guide to the Theory of NP-Completeness by M.R. Garey and D.S Johnson, W. H. Freeman 1979.
The by-now classical introduction to NP-completeness.
- Computational Complexity: A Modern Approach by S. Arora and B. Barak, Cambridge University Press
A comprehensive textbook on modern computational complexity. Free copy of an almost final draft available at https://theory.cs.princeton.edu/complexity/book.pdf
- Foundations of Data Exchange by M. Arenas, P. Barceló, L. Libkin, F. Murlak, Cambridge University Press
A comprehensive overview of data exchange and query answering in the context of data exchange.
- Data Exchange, Integration, and Streams, edited by Ph.G. Kolaitis, M. Lenzerini, and N.Schweikardt, Dagstuhl Follow-Ups 5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2013.
An open-access collection of chapters with topics on data exchange, data integration, and data streams.
- Database Repairing and Consistent Query Answering by L. Bertossi, Synthesis Lectures on Data Management, Morgan & Claypool Publishers
A comprehensive survey of query answering over inconsistent databases.
- Probabilistic Databases, by D. Suciu, D. Olteanu, C. Ré, C. Koch, Synthesis Lectures on Data Management, Morgan & Claypool Publishers
A comprehensive survey of query answering over probabilistic databases.
- Handbook of Computational Social Choice by F. Brandt, V. Conitzer, U. Endriss, J. Lang, A.D. Procaccia, Cambridge University Press.
The standard reference for comprehensive surveys of topics in computational social choice; Chapter 1 and Chapter 10 are the most relevant for this course.