Hans Olav's Repeatable Read - Friday, 30 January 2009

Friday, 30 January 2009

Master of Technology in Computer Science Project

I'm in my last semester here at the Norwegian University of Science and Technology, which means that I'm starting on my master's thesis these days. Actually, I'm not so much starting as continuing the "pre-project" that my friend Alex Brasetvik and I worked on from August to December last year.

Our master's thesis is about building a query optimizer for the new Enterprise Search solution from Fast (now a subsidiary of Microsoft, which acquired the company).

The report itself can be found here: MasterProjectReport.pdf, and the abstract is included below:

This document is the report for the authors’ joint effort in researching and designing a query optimizer for Fast’s next-generation search platform, known as MARS. This work was done during the pre-project to the master thesis at the Department of Computer and Information Science at the Norwegian University of Science and Technology, autumn 2008.

MARS does not currently employ any form of query optimizer, but does have a parser and a runtime system. The report therefore focuses on the core query optimizing aspects, like plan generation and optimizer design. First, we give an introduction to query optimizers and selected problems. Then, we describe previous and ongoing efforts regarding query optimizers, before shifting focus to our own design and results.

MARS supports DAG-structured query plans, which means that the optimizer must do so too. This turned out to be a greater task than what it might seem like. The optimizer also needed to be extensible, including the ability to deal with query operators it does not know, as well as supporting arbitrary cost models.

During the course of the project, we have laid out the design of an optimizer we believe satisfies these goals. DAGs are currently not fully supported, but the design can be extended to do so. Extensibility is solved by loose coupling between optimizer components. Rules are used to model operators, and the cost model is a separate, customizable component. We have also implemented a prototype that demonstrates that the design actually works.

Coding | Computer Science | Microsoft | Studying

posted on Friday, 30 January 2009 03:08:08 (W. Europe Standard Time, UTC+01:00)

Comments [1]

Monday, 01 September 2008

Slides and Code From Thursday's NNUG Session

Here are the slides and T-SQL code I used during Thursday's Norwegian .NET User Group presentation.
I presented basic transaction processing with emphasis on concurrency and isolation. I hope everyone had a good time. I definitely had a good time presenting.

I also got some really good questions, one of them being what would happen if SQL Server were to lose the log file for one of its databases during operation. Since I didn't give the full explanation at the presentation, I've written a blog post about it. It can be found here.

Also, I mentioned that SQL Server 2008 RTM'ed (it's done!) on or sometime before the 6th of August with build number 10.0.1600.22. I didn't blog about it here since I was OOF at the time :-)

Transaksjoner, isolasjonsnivåer og låsing i SQL Server.pptx (151.12 KB)
NNUGDemos-2008-08-28-HON.zip (2.92 KB)

Presentations | SQL Server

posted on Monday, 01 September 2008 21:32:15 (W. Europe Standard Time, UTC+01:00)

Comments [3]

What Happens If You Lose the Data File (MDF), Log File (LDF) or Both?

I got a question at a .NET Community Event a few days ago about what would happen if SQL Server were to lose the log (LDF) or data (MDF/NDF) files for a database while in operation (e.g. the disk holding the data or log file crashes). If I've got my SQL Server disaster recovery right, this should be what would happen:

First, if both data and log are lost, it's simple - SQL Server will stop servicing requests for that DB and we'll need to restore everything from our last backup (possibly some minutes/hours/days old, depending on your backup scheme).

Second, if the data file is lost while the log is good, SQL Server will probably stop servicing requests pretty quickly here too, but we shouldn't lose any data. This assumes we're running under the full recovery model, have taken at least one full backup and have the log chain intact - that is, we haven't truncated the transaction log and we have every log backup since the last full or differential backup ready for restore. We can then just restore the last full backup, then the last differential one and then all log backups in sequence, up to and including the tail of the log that is still good.
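The restore order described above can be sketched as a small Python function. This is only an illustration, not something SQL Server exposes: the `Backup` class, its `taken_at` field and `restore_sequence` are names I made up for the example, and real log chains are tracked by log sequence numbers rather than simple timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "diff" or "log" (hypothetical labels)
    taken_at: int  # hypothetical timestamp; a larger value means later


def restore_sequence(backups):
    """Return the backups to restore, in order: last full, last diff, then logs."""
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        raise ValueError("at least one full backup is required")
    last_full = max(fulls, key=lambda b: b.taken_at)

    # The most recent differential taken after that full backup, if any.
    diffs = [b for b in backups
             if b.kind == "diff" and b.taken_at > last_full.taken_at]
    last_diff = max(diffs, key=lambda b: b.taken_at) if diffs else None

    # Every log backup taken after the base we restore from, oldest first.
    # The tail of the log is simply the newest entry in this list.
    base = last_diff if last_diff is not None else last_full
    logs = sorted((b for b in backups
                   if b.kind == "log" and b.taken_at > base.taken_at),
                  key=lambda b: b.taken_at)

    plan = [last_full]
    if last_diff is not None:
        plan.append(last_diff)
    return plan + logs
```

Given a full backup at time 1, a log backup at 2, a differential at 3 and log backups at 4 and 5, the function returns the full backup, the differential and the two later log backups; the log backup at time 2 is skipped because the differential already covers it.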

Third, if the log file is lost, while the data file is good, we may have bigger problems. SQL Server will at least stop servicing any requests involving writing to the database, and we now have the potential to lose data.
But wait - we have the complete data file - why would we lose data? The reason for this is the way SQL Server handles buffering and recovery, using the ARIES algorithm. ARIES uses a so-called STEAL/NO-FORCE approach to optimize performance for the buffer pool (SQL Server's in-memory data cache). STEAL means that data from uncommitted transactions can be written to the MDF/NDF files on disk, and NO-FORCE means that data from committed transactions may still reside only in memory.

This means that if, at the time of the crash, there are open transactions or any transactions have written data to the database since the last checkpoint (and possibly in more scenarios), the data file is potentially in an inconsistent state. Losing the log file in such a situation can cause database corruption, broken constraints, half-finished transactions, lost data and all sorts of crap, since SQL Server will not be able to roll back uncommitted transactions or roll forward committed ones.
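To make the STEAL/NO-FORCE point concrete, here is a toy Python model of a buffer pool, a data file and a log. It is a heavily simplified sketch and not how SQL Server is implemented: `ToyDatabase`, `run_scenario` and the record layout are all invented for the example, there are no checkpoints or LSNs, and the log is assumed to reach disk before any page it describes.

```python
class ToyDatabase:
    """Toy model: a data file, a buffer pool and a write-ahead log."""

    def __init__(self, initial):
        self.disk = dict(initial)  # the "data file"
        self.buffer = {}           # dirty pages held only in memory
        self.log = []              # assumed durable unless the log is lost

    def write(self, txn, key, value):
        old = self.buffer.get(key, self.disk.get(key))
        self.log.append(("update", txn, key, old, value))  # log first
        self.buffer[key] = value

    def commit(self, txn):
        # NO-FORCE: committing writes a log record but flushes no pages.
        self.log.append(("commit", txn))

    def flush(self, key):
        # STEAL: a page may reach disk while its transaction is still open.
        self.disk[key] = self.buffer[key]

    def crash(self, lose_log):
        self.buffer = {}  # memory is gone
        if lose_log:
            self.log = None

    def recover(self):
        if self.log is None:
            return  # nothing to replay; the data file stays as it is
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo: replay every logged update in order.
        for rec in self.log:
            if rec[0] == "update":
                self.disk[rec[2]] = rec[4]
        # Undo: roll back updates of uncommitted transactions, newest first.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                self.disk[rec[2]] = rec[3]


def run_scenario(lose_log):
    db = ToyDatabase({"a": 100, "b": 100})
    db.write("T1", "a", 150)
    db.commit("T1")          # committed, but the page is never flushed
    db.write("T2", "b", 0)
    db.flush("b")            # uncommitted, but already in the data file
    db.crash(lose_log)
    db.recover()
    return db.disk
```

With the log intact, `run_scenario(False)` ends with `{"a": 150, "b": 100}`: T1's committed change is rolled forward and T2's uncommitted change is rolled back. With the log lost, `run_scenario(True)` leaves `{"a": 100, "b": 0}` in the data file, so the committed update is gone and the uncommitted one is still there.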

If the log is lost, it can be rebuilt using Emergency Mode Repair, but as Paul S. Randal (formerly of the SQL Server team) describes here, this is something that shouldn't be done unless you're out of other options.

So, the only way to ensure you don't lose data is, once again, a plan for backup and disaster recovery. Murphy states that if you don't have one, you WILL find yourself in deep shit at some time in the future.

And while we're on the topic of losing the log - I've seen some pretty ridiculous ways of reducing the size of your log file on various forums. I've seen posts advising people to just delete or rebuild the log file whenever it gets too big. That is a pretty bad piece of advice (unless you know what you're doing and are checkpointing or detaching the database first). Rebuilding the log is, for the reasons above, a pretty quick and handy way of inducing corruption into your database. To reduce the size of your transaction log, back it up using the BACKUP LOG statement, optionally shrinking the log files afterward.

So, do you agree with me? Feel free to post comments if I've got something wrong.

SQL Server

posted on Monday, 01 September 2008 18:37:09 (W. Europe Standard Time, UTC+01:00)

Comments [1]

Saturday, 07 June 2008

SQL Server 2008 RC0

A few months ago I wrote that a SQL Server 2008 RC (Release Candidate) was scheduled for Q2 this year.

Looks like Microsoft is staying on their schedule - RC0 was just released to MSDN and TechNet subscribers!

EDIT: RC0 is now available here for the public as well.

SQL Server

posted on Saturday, 07 June 2008 04:58:05 (W. Europe Standard Time, UTC+01:00)

Comments [1]

Wednesday, 04 June 2008

New SQL Server Logo

Microsoft has published SQL Server's new logo:

I think it looks good :-)

Courtesy of Wesley from http://blogs.msdn.com/wesleyb/archive/2008/06/03/sql-server-logo.aspx

Microsoft | SQL Server

posted on Wednesday, 04 June 2008 02:58:52 (W. Europe Standard Time, UTC+01:00)

Comments [1]

Friday, 09 May 2008

Flash Memory and the DBMS World

I recently wrote a paper at school about how flash memory impacts the database world. Those who are interested can read it here: How Flash Memory Changes the DBMS World - An Introduction

Computer Science | Studying

posted on Friday, 09 May 2008 15:30:30 (W. Europe Standard Time, UTC+01:00)

Comments [4]

Thursday, 06 March 2008

Resources from SQL Server 2008 Presentation