Using an index view or materialized view (Oracle) allows for the denormalization of data, by storing joins, aggregates, and other runtime queries in permanent storage. The benefit of this is that rather then performing these operations on demand, they’re already stored in a single flat data structure for retrieval. The result is much faster return times. There are plenty of articles out there for example http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx on the benefits of indexed views. This blog will talk to determining if this approach is right for you by explaining how to perform quantitative and qualitative analysis of the costs.
First off there’s no free lunch, it takes work to build this flat structure, as well as work to maintain it. Every time a DML statement is run against an underlying table, data is pushed to the indexed view. This process not only involves the IO of the write but CPU costs of any joins or computations. There’s no better way to examine these costs then using the “showplan_all” command. The following example uses the Northwind database, consisting of a two table join with an aggregation. The purpose is to display the last three orders and the associated costs totals. Lets start with the original query:
select top 3
o.orderDate, sum(od.UnitPrice * od.Quantity) as OrderCost
from Orders o inner Join [Order Details] OD on
o.OrderID = Od.OrderID
where customerid = 'vinet'
group by orderdate
order by orderdate desc
Now lets get the cost of this by running a showplan_all.
set showplan_all on
go
select top 3
o.orderDate, sum(od.UnitPrice * od.Quantity) as OrderCost
from Orders o
inner Join
[Order Details] OD on
o.OrderID = Od.OrderID
where customerid = 'vinet'
group by orderdate
order by orderdate desc
Explaining this command (showplan_all) is out of scope, so I’ll stay focused on the primary result field we’re concerned with “TotalSubtreeCost”. Every statement is made up of different operators to perform it’s task. The total subtreeCost is the estimation SQL Server makes on how expensive the operator will be. This number is derived as a time value in seconds each operation is estimated to take. An important note here is that it doesn’t relate to the time on your machine, but time run on standard equipment at Microsoft. It’s simply an unchanging baseline to work against. Scroll through the result sets to find this field. In the following output the sum of all the subtree’s is a cost of 0.29155515 seconds.
TotalSubtreeCost
0.04338021
0.04338021
0.04337971
0.04336942
0.04336812
0.02707165
0.01569226
0.0032875
0.01238386
0.01624221
0.29155515 - Total
With a normalized baseline in place the following statement applies an index view. Notice no predicate is given, since this will normally be applied as a parameter.
Create view order_summary_view
with schemabinding
as
select
count_big(*) as Cnt, o.customerid, o.orderDate, sum(od.UnitPrice * od.Quantity) as OrderCost
from dbo.Orders o
inner Join
dbo.[Order Details] OD on
o.OrderID = Od.OrderID
group by orderdate, customerid
go
create unique clustered index order_summary_view_index
on order_summary_view(customerid asc, orderDate desc)
Now execute a query to return the same results as the previous select statement, using the showplan_all.
select top 3 OrderDate, OrderCost from order_summary_view with (noexpand)
where customerid = 'vinet'
You’ll notice only a few operations are used with a total on the “TotalSubtreeCost” results, of 0.009864. Compared to the normalized result it is only 3% of the cost. It’s easy to see the advantages of an indexed view for retrieving data. However the point of this blog is to determine the overall cost, not just on retrieving.
We need to determine the costs of data modifications so let’s first use the index view by examining the cost of an insert into an underlying table.
set showplan_all on
go
insert into [order details]
(orderid, productid, unitprice, quantity, discount)
values(10248, 17, 19.24, 10, 0)
The plan looks much more inline with the first select statement, and so do the total cost results coming in at 0.746337 which is over double the cost of the original select statement. Now I’ll drop the index view and rerun the above query.
set showplan_all off
go
Drop view order_summary_view
go
set showplan_all on
go
insert into [order details]
(orderid, productid, unitprice, quantity, discount)
values(10248, 18, 19.24, 10, 0)
select top 1 * from [order details]
The cost of inserting without an indexed view is 0.339465 more then 50% less then an insert with a clustered index. With these numbers in hand we can now make a quantitative analysis of the cost of having an indexed view. The way to do this is to take a sampling of your current activity and gather how many calls are made to retrieve the data and how many calls are made for data modification. Once armed with this information use the following formula on each set, then compare the results:
(Select_Calls * Select_totalcost) + (DML_Calls * DML_totalcost) = TrueCost
In our case lets assume in a hour period we see 1000 select calls and 100 DML calls the comparison formula should look like so
No Indexed View
(1000 * .29155515) + (100 * 0.339465) = 325.50165
Indexed View
(1000 * 0.009864) + (100 * 0.746337) = 84.4977
In this situation the quantitative analysis shows better overall performance by using an indexed view. This exercise was fairly vanilla, one insert statement and one select statement. Unfortunately the world doesn’t work this way, often you have several dml statements that will affect the underlining tables and several queries that will benefit from an indexed view. In this case the formula would just incorporate all the calls, by adding on. For example:
(Select_Calls1 * Select_totalcost) + (Select_Calls’n’ * Select_totalcost) + (DML1_Calls * DML_totalcost) + (DML1_Calls’n’ * DML_totalcost) = TrueCost
Before moving on to the qualitative analysis I must call out few items to be fair. The above is for demonstration only. For example the first query without an indexed view would benefit significantly from a covered index, and a 10 to 1 ratio might not be realistic. This last point is an excellent segway into qualitative analysis.
Quantifying costs is a great measure but should not be your final determinant on which direction to go. In the end run it’s all about user experience. If you can slash 100 ms on a select statement with an index view, but means timeouts when a user inserts data, it provides a poor user experience. Along those lines submitting data to a web interface is expected to take longer, users have just been trained that way. So penalizing your writes for read speed might be acceptable.
When deciding on which direction to go, you need to run in a stress environment, and analyze the numbers for a balance. In addition the previous example only analyzes a single query and not the entire environment. You need to look at what the user experience is with 100 clients hitting the server. With an indexed view your writing to two objects, and doubling your locking on both objects (Writes are in a transaction). So you need to answer, how this effects the other 99 users?