A little while ago, I delivered my master thesis and finished my Master of Technology in Computer Science degree at the Norwegian University of Science and Technology.
The thesis has the title iAD: Query Optimization in MARS
, and is about building a query optimizer for Fast
's (now a subsidiary of Microsoft
, they bought them) new Enterprise Search solution.
The report itself can be found here: MasterReport.pdf
, and the abstract is included below:
This document is the report for the authors' joint effort in researching and
designing a query optimizer for Fast's next-generation search
platform, known as MARS. The work was done during our
master thesis at the Department of Computer and Information Science at the
Norwegian University of Science and Technology, spring 2009.
MARS does not
currently employ any form of query optimizer, but does have a parser and
a runtime system. The report therefore focuses on the core query optimizing
aspects, like plan generation and optimizer design. First, we give an introduction to
query optimizers and selected problems. Then, we describe previous and ongoing efforts
regarding query optimizers, before shifting focus to our own design and results.
MARS supports DAG-structured query plans for more efficient execution, which means that the
optimizer must do so too. This turned out to be a greater task than what it
might seem like -- since we must use algorithms that greatly differ
from the optimizers we were familiar with. The optimizer also needed to be extensible, including the
ability to deal with future query operators, as well as supporting arbitrary cost models.
During the course of the master's thesis, we have laid out the design of an
optimizer that satisfies these goals. The optimizer is able to recognize common subexpressions and construct
DAGs from non-DAG inputs. Extensibility is solved by loose coupling between optimizer components. Rules are used to model operators,
and the cost model is a separate, customizable component. We have also implemented a prototype that demonstrates that the design actually works.
The optimizer itself is designed as a separate component, not tied up to MARS. We have been able tin inject it into the MARS
query pipeline and run queries end-to-end with optimization enabled, improving query evaluation time. For now, the project depends on MARS assemblies, but reusing it for another engine and algebra is entirely feasible.