I have to confess that there’s a little trick I didn’t mention in my first post about summarizing time range data using SQL. Specifically, my solution relies on the data set being sorted in a certain way, in this case by server_id and DateAndTime, and then inserting a sequential key on the table using an identity column. So my solution isn’t very general.
This came back to bite me in the butt when I wanted to take the next step and summarize the output from that script further by eliminating the TestID and summarizing by server_id and time range. I couldn’t get my original algorithm to work. At all. Frustrated, I did more research and found that this problem, which is formally known as coalescing temporal data, is really hard—so hard that there are people like database guru Rick Snodgrass who devote their whole careers to figuring out how best to summarize temporal data using SQL.
The difficulty is that SQL is a set based language, but to properly summarize temporal data, SQL needs to understand time spans bounded by a start and end date and be able to compare them to see if one partially or wholly contains another. Fortunately for me, Snodgrass wrote an article a few years back in Database Programming and Design, called “Temporal Coalescing,” that lays out several options for solving this problem, including a mostly procedural option, a cursor-based option, an option all in one SQL query that’s even hairier than the one I proposed, and an option that uses a view and a HAVING COUNT statement, which is what I’m using now. It’s not fast, but it is correct. Here’s Snodgrass’s sample code, translated into the terms of my original solution:
CREATE VIEW V1 (server_id, DateStart1, DateEnd2)
AS SELECT F.server_id, F.DateStart1, L.DateEnd2
FROM ServerHistory AS F, ServerHistory AS L, ServerHistory AS E
WHERE F.DateEnd2 <= L.DateEnd2
AND F.server_id = L.server_id AND F.server_id = E.server_id
GROUP BY F.server_id, F.DateStart1, L.DateEnd2
HAVING COUNT(CASE
WHEN (E.DateStart1 < F.DateStart1
AND F.DateStart1 <= E.DateEnd2)
OR (E.DateStart1 <= L.DateEnd2
AND L.DateEnd2 < E.DateEnd2)
THEN 1 END) = 0
CREATE TABLE Temp(server_id int,
DateStart1 DATETIME, DateEnd2 DATETIME)
INSERT INTO Temp
SELECT server_id, DateStart1, MIN(DateEnd2)
FROM V1
GROUP BY server_id, DateStart1
Mandatory disclaimer: This posting is provided “AS IS” with no warranties, and confers no rights.