Prof. Dan Suciu

University of Washington

"Querying Probabilistic Data"

(Vortrag im Rahmen der "Distinguished Lecture Series" des "Max Planck Instituts für Software-Systeme")


A major challenge in data management is how to manage uncertain data. Many reasons for the uncertainty exists: the data may be extracted automatically from text, it may be derived from the physical world such as RFID data, it may be integrated using fuzzy matches, or may be the result of complex stochastic models. Whatever the reason for the uncertainty, a data management system needs to offer predictable performance to queries over such data.

In this talk I will address a fundamental problem in probabilistic databases: given a query, what is the computational complexity of evaluating it over probabilistic databases? Probabilistic inference is known to be hard in general, but once we fix a query, it becomes a specialized problem. I will show that Unions of Conjunctive Queries (also known as non-recursive datalog rules) admit a dichotomy: every query is either provably #P hard, or can be evaluated in PTIME. For practicaly purposes, the most interesting part of this dichotomy is the PTIME algorithm. It uses in a fundamental way the Mobius' inversion formula on finite lattices (which is the inclusion-exclusion formula plus term cancellation), and, because of that, it can perform probabilistic inference in PTIME on classes of Boolean expressions where other established methods fail, including OBDDs, FBDDs, inference based on bounded tree widths, or d-DNNF's.

Bio: Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, then was a principal member of the technical staff at AT&T Labs until he joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, data security, and managing data with uncertainties. He is a co-author of the book Data on the Web: from Relations to Semistructured Data and XML, holds twelve US patents, received the 2000 ACM SIGMOD Best Paper Award, the 2010 PODS Ten Year Test of Time Award, is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship.



Zeit: Dienstag, 23. März 2011, 11:00 Uhr
Ort: Saarbrücken, Gebäude MPI-SWS Wartburg 5. Stock
Hinweis: Der Vortrag wird live an die TU Kaiserslautern MPI-SWS Gebäude, Raum 206 übertragen.