June 2007 - Posts
I have the best job here at SQLServerCentral.com. I work from home, I get to be creative, writing about what I want to write about and work on my own schedule. I've gotten free admissions to the PASS Summit, TechEd, and the MS BI Conference and been offered others.
However it can be aq pain at times. Like today where I've got people banging on me for running an article in today's newsletter. I put the title as "
Writing Faster T-SQL" and the description as follows:
All of us would like to code faster and with fewer mistakes. While a
third party tool usualy won't help with the latter, there are a few to
help you increase the speed at which you build readable code. New
author Tom Fisher brings us a short tutorial on using a couple of Red
Gate's tools.
Now I thought it made sense that this was a vendor piece, admittedly, but my vendor of employment. Instead in the discussion I had some complaints that this is becoming a "Red Gate" site and we are running too many vendor things.
I can't quite understand that. We have run 3 reviews of Red Gate products, including today's, this year, but we've run 7 or 8 other vendor's products. We rarely run reviews, and so I can't understand what the extrapolation here is for other users. Are these the "squeeky wheels" or are they representative of 1000 other users.
It's a fine line to walk with Red Gate. They paid a good price for this brand/site and they want to be able to promote their products. To be fair, other than asking me to remove advertising, re-do the newsletter look, and to display their logo, they haven't pushed any content on me. I tend to just do the same things I've always done and this site reflects my feelings, interests, and ideas, not really anyone else's. I guess Brian and Andy's from over the years, but for the most part I decide how the site works and what content appears. I didn't think anything of running the article as it was a nice "how to" tutorial showing off their products. I've actually asked other vendors to do this, but haven't gotten any responses.
So I'm not sure what to do here, but I'd love feedback (sjones at sqlservercentral.com) on what you think.
I wanted to get working on Katmai a little, but not being brave and seeing the release notes that it can't run side by side with SQL Server 2000 (who decided that?), I wanted to put it in a VM.
So I fired up Virtual PC, created a new VM, and...
Nothing. It started the boot and froze. I'd selected Windows 2003 as the base image and for some weird reason thought I'd get a W2K3 install. It was then that I realized I don't have a blank Windows 2003 image. So I'm installing W2K3 now on a VM and when I'm done, I'll shut it down and copy the file off to another place to hold as a template.
I haven't done much VM work lately, but I used to keep spare images around as templates, which allowed me to quickly and easily test installs.
So a tip: Keep at least a Win XP and a Win 2K3 image handy. Or Vista if your hardware supports it.
My job here at SQLServerCentral.com is mainly as editor. This means that for the most part I edit articles, schedule them for release, work with authors, build questions of the day, correct mistakes in questions of the day, and of course write editorials.
Last week I was on vacation, so no work got done. As a matter of fact, I was completely unwired, and so I didn't really do anything. I did make some notes on vacation and start to write an editorial on Friday (as I was ramping up out of vacation mode), so I was really behind when the week started. Only about 600 emails to go through from the week, of which 400 or so were nonsense, vendors, newsletters, SPAM, etc. that I keep an eye on.
So today was going to be a good day to edit some stuff and do some writing. I was behind getting to the Database Weekly links this week, and they usually provide me with some good things to write about. And in fact I jotted down 5 or 6 things I need to editorialize on or write articles on at some point. I keep a living document of stuff going and today was going to be a productive day.
But my daughter claimed illness this morning. With my wife suffering
Montezuma's Revenge for the last 2 days, I couldn't take a chance, and let her stay home from summer day camp. But after a couple hours, she was "miraculously cured" and wants to play, talk, ask for things. I finally snapped when she walked in the office for the 10th time in an hour with some request. Not making for a good work day. And I was ready to work.
The database manages itself. Not totally, but close to it. It was an
analogy made by Dan Jones with cars. Auto shop isn't really offered in
high school anymore, people don't "tinker" with their cars much anymore.
If this is true, we have more time for data design, data architecture.
I'm not sure I agree and as much as I hear, the more I think a DBA
needs to be around because when something goes south, it goes way down.
Some trends affecting managability.
Trend 1 - New Features in the product
A huge chart of new features that were in SQL Server 2005. Many more
features were added in SQL Server 2005 than will be added in 2008.
However the surface area is still growing in many ways.
One of the manageability things is that no one can understand so completely the product anymore because it's too large.
Trend 2 - Data center and server consolidation
Companies are trying to have fewer centers and less servers. And the
amount of staff will shrink, but the work grows. In fact it's expected
that more databases per DBA is the trend.
Not too many customers are virtualizing SQL Server. Bigger servers and
multiple instances, but not really virtual servers in production.
Trend 3 - Remote and embedded DBs
More and more of these are remote, kiosk, small form factor databases
will grow. The data will upload from these remote and embedded systems,
but the DBAs aren't on-site. One retail chain, close to 1000 locations,
has 3 DBAs managing 5,000 databases. Most are small, but they still
need to be managed.
Laptops and mobile workers will often have databases on their laptops that need to be managed.
Katmai Manageability
To combat these trends, Katmai is trying to scale the DBA with better managability. Here are some features:
- resource governmer - I've been asking for this for years. The goal is
to differentiate workloads, monitor resources per workload group, and
you can limit resources. The goal here is to prevent runaway queries.
You can set applications up into groups and set min/max for memory and
CPU usage. You can set priorities between groups as well, so power
users, sales, etc. can be placed above regular report runners. You can
also put applications into groups as well.
Programmed via T-SQL, SSMS, or SMO
All Actions Audited
A frameowrk to audit the data environment. A new object (audit
configuration) will be included to specify a location to store audit
information. Can audit to the Windows logs or a file. This object will
have a specification that defines which actions to audit. Can define
auditing activity on DML, events, on particular objects. You can lump
multiple specications together. It's a securable just like any other
object. It will have a T-SQL programmability API, SSMS, or SMO access.
Performance Studio
Data collectors will be introduced to collect data in memory. They can
be configured via T-sQL or Win32. They run as configured to collect
information and post it to a warehouse somewhere. you can configure
what and how often they collect data as well as how often they post to
a warehouse.
This allows you to analyze performance. Some canned reports will be
provided or SSRS can write custom ones. It can be moved to SSAS as
well. After Ss2K8, they expect to provide a real-time dashboard
as well as a wizard to watch trends and offer suggestions.
There will be the ability to apply policy, like system must run within
parametesr. So like if it goes above 80%CPU for more than 5 minutes,
send an alert.
Data Compression
Not backup compression, but compression on tables and indexes.
works on partitioned tables (or non partitioned). You can compress one partition and not another.
It has page or row compression, works on clustered or non-clustered index.
T-SQL extenions available for table and index DDL. You'll have an SP to estimate the savings. Also available via SSMS and SMO
There is a CPU hit. Not every data type is compressable.
I'm not sure if this is a great idea for many people, but for less used
data, this can be a big deal. Don't forget that a savings in the
database can mean a savings in backup size and time, a tape savings. It
can really add up. When I bought litespeed years ago for JD Edwards,
when it was $400 a server, we had about a 1 year payback based on tape
cost.
Database Mirroring Page Repair
Pages can become corrupt due to physcial data problems, i.e. disk going down.
If you failed to a mirror with a bad page, you could be down. This
system watches both the principal and the mirror for bad pages. If
found, it can call it from the other side to restore that page. New DMV
shows the last 100 page repairs, new trace event, data collector can be
run on this, etc.
Powershell Integration - not committed.
Intellisense - WILL BE IN 2008!!!! About time that this will appear.
Declarative Management
also called policy based management. The focus for SS2K8 is on the db
engine to allow you to set policies for server conifiguration. This can
work across multiple servers, like the master/target by pushing
policies to particular groups of servers.
The goal is to lower the TCO and reduce the cost of managing. Should
eliminate some scripts to tweak servers. Policy should handle this. You
can enforce policy or just monitor.
This will replace the surface area configuration configuration tool.
Why these changes?
All of these changes are designed to combat the trends above.
To handle the additional features, they want to give you less knobs to
manage. There aren't really less knobs, but with grouping and policies,
you can physically have to work with less knobs as you deploy these to
other servers. The first server takes time, but changes or alterations
are more easily pushed to other boxes.
These same features help with consolidations because by grouping things
together, one DBA can manage more servers. It makes the DBA more
scalable
.
Once again with policies, this allows you to manage all databases that
are on cell phones, or on laptops, or in any other remote situations.
You can control and monitor those devices in an easier fashion.
The conceptual model for management
Policy - The desired state of facets, when to check the policy (on
change, on a schedule, etc), actions on check (log, rollback, etc),
what to check (name, setting, etc.), and the categories. Group the
policies together, like all security policies.
Facet - projection on top of a target type. A groupding of properties,
like the security properties of a target type. Facets can have logic,
so it can have new properties.
Target Type - object, table, view, sproc, physical objects with their properties.
Kind of confusing, but each of the items above was a box in a stack
with the target type at the bottom. I have the feeling we'll see these
in BOL, so start to get used to them.
Back up MSDB
I don't know anyone that isn't doing this, but it was mentioned to be
sure you back up msdb. As with everything else internal it seems, the
policies will be stored in msdb.
They'll also have an internal SQL Agent job to handle policies. This
will also use SQLCLR, evne if you have it turned off. Very interesting
that they have a back door, but more importantly, it's good to know
that it's a potential problem. Not likely, but you never know. I'm
assuming they backdoor SQLAgent as well, probably using the subssytem
even if the service is off.
To prevent changes, they use the same DDL events that we use.
Demo
policy to prevent creation of tables in the dbo schema. Fails with a level 16 error message.
There's a new Policy Management folder under Management in SSMS. There are policies, conditions, and facets.
If there's something out of compliance, the icon on a database changes and you can find the issues.
Showed how to create policies, categories instances, and set groups up.
Once again with policies, this allows you to manage all databases that
are on cell phones, or on laptops, or in any other remote situations.
You can control and monitor those devices in an easier fashion.
It's a similar theme, off by default if it makes sense, minimize surface area.
Recommendations: for new installs, leave stuff off unless you need it.
Minimize connectivity. for upgrades, turn things off if you do not
truly use them.
SQLSAC - Surface Area tool, launch from command line or setup. There's
a SAC command line utility to script out the configuration that you can
apply to other servers.
Service Accounts
BOL says don't use Network Service? Why not? Has more rights than
generally needed. Also a shared account. If penetrated somehow from
another place, control of SQL Server is gained.
Local or domain account is recommended. You can change the password without a shutdown.
Least desirable: Local system
OK: Network Service or Local Service.
Changing the account: Previous advice was go to SQL Service Manager or
Enterprise Manager to change the account. Reason is because large
number of files and folders are ACL'd. Same advice for SQL Server 2005.
Use the tools, not Services applet.
Authentication Mode
Windows is recommended and is the default.
Exceptions are because an application requires its own login. Also DBAs
do not want to have Windows administrators having control over who can
access SQL Server. This is why there are additional features in SQL
Server 2005 for SQL logins. One thing is that the channel from client
to server needs to be encrypted to cover passwords. SQL Server 2005
uses a self-signed certificate to encrypt the password packet to keep
it safe.
Network Connectivity
It's recommended to enable the fewest protocols you need, probably only TCP/IP.
Also default ports should be blocked and changed (1433, 1434)
Endpoints are new entries into SQL Server, but you can secure these to
limited users/roles. You can limit the types of access as well (no ad
hoc, etc.).
DO NOT EXPOSE YOUR SERVER to the Internet. This should be common
knowledge for anyone and it's something you want to do if at all
possible.
System Procedures:
REVOKE execute from permission from XPs and system stored procedures.
Even though public had rights to run a system proc, the procedures had
a security check. In 2005 all were re-examined to be sure rights were
needed, for example sp_help had a security check added. Many XPs
(XP_dirtree, xp_regread, etc) have security checks added.
They are more secure, but removing XPs results in an unsupported configuration.
Password Policy
In previous versions, no validation of SQL auth passwords.
New features added based on OS capabilities. Recommend that
CHECK_POLICY is left on, CHECK_EXPIRATION is left on and MUST_CHANGE
for new logins. Note, policy is only checked when passwords are
changed, not during an upgrade.
If you set MUST_CHANGE, this only works for 2005 clients (.NET 2.0,
Native Client). The reason is they can send a two password packet that
old clients cannot when a change password is required.
Admin privileges
Use when needed. Don't use SA because it's convenient. Also minimize
the number of administrators and provision them explicitly. Don't use
the Windows Administrators group.
Also be sure each administrator has their own account to be sure you can audit and determine who change what.
VistaUser Account Control issues.
Protected Admin - Even if you are an admin, your token does not have
the admin token. A check is made to be sure that you have rights and
want to perform the action.
DB Ownership and Trust
SQL2K SP3 turned off database cross chaining. This ensures that a cross
database query has a security check. If you don't need it, turn it off.
If you do need this, you can trust the dbo of the other database or
sign a stored procedure with a certificate and have the dbo trust the
certificate.
Recommendations are to have distinct owners for each database, not "sa", and only enable trust as needed.
Schemas
It's a namespeace in the container hierarchy
server -> database -> schema -> object
Can be owned by anyone and can grant permissions at this level. This
was done to isolate applications or groups of objects, separate
administrative grouping from application usage. You can easily change
the user as well.
OK, I'm done. It's too basic a session. I guess if you haven't really
worked with SQL Server, this is a good list of stuff, but I think too
much of this is common knowledge for DBAs. Giving some of the reasons
for changes is good, but for a many of these explanations it sounds
like an excuse or a "ok, we give in". Having a somewhat boring speaker
doesn't help.
Last night I went out with Tony Davis of Simple Talk and Alan White, a
longtime author for both here and Simple Talk. After satisfying their
need for microbrewery beer, we ended up at a Southern BBQ. They enjoyed
it, despite having to join me in choking down American beers.
We got to talk about a bunch of stuff, mostly non-technical, but it was
interesting to hear their perspectives on the conferences. They
definitely are enjoying the conference and getting some good
development knowledge here at TechEd. There's definitely some, but I
think this is more an infrastructure conference overall. Still it was
interesting to hear them. They still like the PASS conference for DBAs
with lots of networking opportunities and real world presentations.
They did think that Powershell is going to be here a long time and take
off. I think it's pretty cool, but I've seen MS push lots of
technologies, so I'm kind of adopting a wait-and-see attitude about
Powershell. If you get the chance, check it out and see what you think
There were definitely lots of Powershell sessions here covering all
sorts of admin, Windows, SQL , Exchange, etc.
CHECKDB
- needs consistent view of the db. Needs to do this without locking. In
SQL Server 2000 log analysis is used. Almost everything is logged and
once the database is read, the log is read and inside checkdb, recovery
is performed to reconcile the differences. If there was a lot of
activity, this could take a long time. While the data portion is
multi-threaded, the t-log analysis is single threaded. There are
problems because not everything is logged. So it guesses in some cases.
Could cause false positives.
In SQL Server 2005, this was changed and they use COWs (copy on write).
An internal snapshot is generated and used for a consistent view. Still
with a high load, there are problems (just like a snapshot). Tempdb is
not used (nor can you create a snapshot on tempdb).
- no online checks on tempdb. recovery cannot be run and a snapshot cannot be created.
- only works if you have databases on NTFS.
- internal snapshot is an alternate file stream on each db file, but it will not show up in explorer.
- any space used in the alternate stream is returned to the drive after checkdb.
- If you run checkdb on a snapshot (if you're low on space), no alternate stream/snapshot is created.
- What does it do?
- primitive checks on critical tables (5 tables)
- works by getting to the leaf level of each of
these tables. Read every page, no cycles in the linkages, page audit of
every page.
- if there's a problem it's done.
- allocation checks (= checkalloc) - checks consistency of
allocation maps. Ensures that the maps are accurately stored. IAM, GAM,
etc. ensure no two tables have the same map set, ensure that the GAM
and the IAM match up correctly. Check PFS pages - is page allocated, is
allocation page, etc.
- so far this is fast
- If running repair, it's run on allocation problems.
- logical system table chceks and repairs
- eq of checktable on these. Works in parallel
- check indexes are correct, b-trees are correct, page audit, etc.
- If page checksums are turned on, this is checked.
- logical checks of user tables (and repairs)
- this runs in parallel as well
- service broker checks
- metadata checks (check catalog, wasn't included in ss2k)
- indexed views and XML index checks (and repairs). Determines
if the content of the indexed view matches the definition of the view.
Can determine if there are extra views or missing rows. Makes new index
to check. Similar thing with XML.
You can check percent complete column to figure out which stage and progress in the DMV (dm_exec_requests).
Physical_only?
- only first two checks
- then reads every page to do checksum checks and page audits.
Leaf Index Checks
- Page audit (buffer pool does torn page or checksum checks). Can eat a checksum failure and continue on. Reports the issues.
- look at page header, check that data is what is expected for that page.
- Record audit - gross audit of structure of all records. Are offsets correctly set.
- B-tree check
- for index and leaf pages.
- check linkages
- are key ranges correct?
- partitioning - is the record correct for the partitioning function.
- key ordering - records on the page are not stored in order.
They are stored where they fit, so there's a key ordering store for the
records on a page.
- per record checks
- complex columns - checks the pointer to the values. ensures that what's stored in the row is what's stored elsewhere.
- computed columns - make sure persisted values are correct according to the formula.
- Data purity - not all types have range checking. Datetime has
minutes after midnight. Must be < 1440, but the stored value could
be higher in theory. So checks are performed to be sure the stored
value is within the range.
- 1 to 1 mapping - ensures that a nonclustered index maps to one base table row.
- Page counts
- counts in header are correct.
No blocking of the table on SS2k and beyond.
How often do you run this? It depends.
checks could approach n<sup>2</sup> complexity. Fastest way
is N * log(n). Rather than following links, as pages are read, various
facts are generated about each page.
- A (actual) - we read A
- A (sinbling) b - who does A point to
- P (parent) A - when A is read.
Where it's an end page, nulls are generated, For parent pages, a
psuedo-parent is generated. A, B, O, are really the page IDs, index
IDs, etc. these are sorted and then checked. Every node must have 3
facts (can be null) or there's a missing link. If there are too many
items, like 2 parents, then there's a linkage issue. This is done so
pages can be read in allocation order, which is the fastest way to read
them. Other checks use a similar method with different facts.
Index matching. It's getting complex, but the idea is that a
bitmap is generated. As the table (heap or clustered index), is read, a
hash is done and a bit is flipped. As the index is read, it's hashsed
and it should map up to the same value and the bit is flipped back. So
as long as a bit is flipped twice, it's good. so it digs through and
can reuser the bitmap for multiple tables. As long as everything flips
twice, it's ok. If at the end they do find a set bit a "deep
drive" is performed, which is very time consuming to find the
corruption. This first check is 30% of the resources of the deep dive
and is fast. So if checkdb starts to take a long time, it's bad.
Before SP2, you couldn't tell a deep-dive was being run. Post SP2, an error message is now logged.
Repairs
- purpose is to make the db consistent. NOT protect data. Data is not
necessarily deleted, but the repairs are efficient, fast repairs and
data can be lost.
- mor eintrusive repairs ar edone first. Like allocation map for an
index repaired before a missing row. The more drastic ones often fix
less drastic ones.
- Not everything is repaired, or known. You have to check.
- Not feasible to make it online. Too complex.
CHECKDB doesn't run faster after index rebuilds.
If you run CHECKDB and find errors, does it run again with the repair
option? Yes, it runs twice. Because something could have changed.
Space required ro run checkdb.
- facts require more space, so tempdb space is needed to store
facts. Run with_estimate_only, to get an idea of the space for facts.
can internally do single index row inserts and deletes.
If a value in a row is corrupt, the row can be deleted.
If a pointer is corrupt, the entire page can be deleted.
to be safe, you can start a transaction, run repair, and if data is deleted, you can rollback if needed.
checkdb does not respect key relationships (PK, FK). It is
structurally complete, but you could have orphaned rows. Need to make
checks.
Some things are unrepairable
- system table clustered index problems
- PFS pages
- Data purity errors.
Beware of third party kernel filters not working with CHECKDB that doesn't handle alternate streams.
Not recommended to have anti-virus on SQL Server because it can interfere with IOs
I missed this yesterday, but I saw it on the schedule again, and I knew
I had to make this one. It's titled "Beyond Relational", which I expect
to mean it's focused on enhancements not necessarily related to rows
and columns. But I didn't see the abstract, so I'm not 100% sure what
it will be. I'm hoping it's not too much time on GIS stuff.
Jnue CTP is out and is the first public release. They are definitely
looking for feedback on the features that are already baked in.
4 pillars of SQL Server
- enterprise platform - all the "-abilities"
- Beyond relational
- Dynamic development - moving towards model based development.
- Pervuasive Insight - BI everywhere, all data can be used in BI.
Beyond Relational - The explosion of data which means more and more types of data will need to be hosted.
Survey of the audience: how many have more than 1TB at home? 75-80% did.
Applications work with different types of data to be rich.
- relational (rows/colummns), structured data
- XML
- documents and multimedia data.
- spatial data
An example is an insurance claims application. In the past the
application was forms mapped to tables in rows. Today we have pictures
and videos, relating to other documents like witness reports, moving
data in XML, and even mapping those claims to geographical locations.
Idea on new types of data is to ensure that the functionality available
for traditional types of data is also available for new data types.
Goals
- Reduce cost of managing all types of data
- simply development of apps using both relational and non-relational data
- Extend services for relational data to non-relational data.
Additions in Katmai
- XML upgrades
- Remote BLOB Store API, FILESTREAM, Integrated FTS
- support geometry and geography types and functions
- Large UDTs, flexible columns, wide tables, filtered indexes, hierarchyID
WinFS? - most of team was folded into SQL Server team. A lot of
the work went into SQL Server. The FILESTREAM type and large UDTs came
out of this project.
XML Upgrades
- Full support for storing and validating Office 12 document formats.
- Support for lax validation
- Full xs:dateTime support
- support for no timezone values and timezone preservation
- support for lists and union types
- There is XQUERY support for the let-clause.
- Added support for insert sql:variable("@xml") into /a/b
Doucments and Multimedia
Why store in the db?
- integrated management and data level consistency
- problems with poor data streaming, size limitations, and high cost/GB
- right now you can use VARBINARY(MAX)
Alternatives
- File server storage or dedicated BLOB store server (RMC Centera, Fujitsu Nearline).
None of these is the right way. Each works in different situations.
SS2K8 should deliver a few different platforms.
- FILESTREAM feature. Addresses limitations of SQL BLOBs ni making them work better.
- REMOTE BLOB storage - allows blobs to live in file server or dedicated store with reference in db.
- SQL BLOBs - more support for internal storage.
FILESTREAMS
storage attribute on VARVINARY(MAX)
- unstrcutured data is stored in the file systems (NTFS)
- dual programming model.
- TSQL (same as blob)
- WIN32 streaming APIs with T-SQL transactional semantics.
- Gives data consistency and integrated manageability (backup/restore, administraton)
- Size limit is the file system volume size
- uses SQL Server security stack
- demo - Kevin Farlee
- create db with filegroups, one of which has an attribute as
containing filestreams. the location is a directory that will be
created. Different tables can have filestreams in different locations.
Just like tables.
- table is a "varbinary(max) filestream" type.
- insert a row and we see
- to work with the data, need to get a handle and logical path to get to the data in the filesystem
"select get_filestream_transaction_context(), field.pathname from MyTable"
- Use a Win32 call to read and write this data from the file system (fascreat??)
- Showed transaction rollback as well as trigger fire from update to the file system data.
- backup, drop db, restore db, and data is there, including filestream data.
- First release in 2008 should have these limitations
- remote storage of filestream not supported
- DB snapshot and mirroring are not supported (log shipping will work)
- encryption and table value parameters are not supported.
My question is what does this do to the log?
REMOTE STORE BLOB API
- applications can link via a provider. So each application provider needs to deliver their own provider for their own system.
- simple api for fetch, create, enumerate, Garbage collection, delete
- returns reference that is stored in the db. By going through the db,
there is some maintenance of the references. This should ensure some
link consistency and allows the application to be loosely linked to the
BLOB store.
Full-Text Indexing
Challenges
- indexes outside SS have manageability problems
- mixed query performance suffers from pull over complete full-text set
- scaling issues on big boxes
SS2K8
- FTI fully integrated into the engine.
- Mixed queries should scale better. (query on "contains() and xx = i)
Spatial Data
- New types (geometry, geography)
- New methods (intersects, buffer, more)
- new indexes for similar performance to other types.
- Full set of OGC/SQL MM, ISO 19125 components
- Integration with Virtual Earth
- Demo - michael rys
- intersection of map with zip code query.
- added spatial index, performance is improved.
- showed integration with Virtual Earth as well
Relational Data
- HierarchyID - store arbitrary hierarchies of data and efficiently query them. New UDT to implement hierarchies.
- Large UDTs - no more 8k limit
- Sparse columns - optimized storage for sparsely populated columns.
- Wide tables - support for hundreds of thousands of sparse columns.
- Filtered indexes - define indexes over subsets of data. Index on rows in a table with a particular value.
Andrew Kelley from Solid Quality Learning did this one, which was good.
I was worried it would be about maintenance plans, but it wasn't.
Instead it was on good real life maintenance for your server and how to
handle it. He did start out talking at the Maintenance Plans abstract
you away from what's happening and can be limiting, so be sure you do
understand what's going on, don't just blindly assume the plan wizard
knows best.
Hardware
Since the maintenance can stress SQL, it's a good idea to look at
hardware. First, memory. It's better to have too much than too little.
Most SQL Servers are supposed to be memory bound, and memory is an easy
and cheap way to boost performance.
For CPUs, more is better and multi-cores are definitely recommended. As
your data size grows and load grows, even in maintenance, having more
CPUs or cores helps. One caveat, it's recommended to turn off
hyperthreading unless you really test. Since it's one core/CPU, you
could end up with threads performing different activities, and slow
things down. I still haven't see great metrics on this, but the
consensus seems to be leave this off. So if you have an older CPU, be
sure that's the case.
MAXDOP - You can set this at the server or statement level. Which can
be handy if you are working on a system that is being used. You can
reduce the use of CPUs by your maintenance statement by setting this to
something other than 0 (use app procs).
Disks
Know your config. RAID 5 has been the standard, but it's gone out of
favor for databases. It's write intensive and people seem to be more
worried about failures. Andrew recommended R0+1 for most things, data,
logs, tempdb. R1 was for the OS, logs, and tempdb. Not sure why not for
data, perhaps because large files. R0 is not the ideal.
Beware of SAN or other storage people. Be sure your separate drives are separate physically.
DBCC CheckDB
Checks all rows, text offsets, etc. to be sure that they are correctly linked. 2005 added many more options and checks.
- data purity - values within range for data type.
- Row overflow pointers
- Service Broker checks
- XML and indexed view checks
- torn page and checksum - set this for each database for the page level.
Be sure that when you upgrade, you run checks. It is possible for data to exceed type limits and be invalid.
In 2000, you could run CHECKDB and got CHECKALLOC and CHECKTABLE. 2005
adds CHECKCATALOG. SS2K also used a table level schema lock and could
block log truncations. In 2005, it uses an internal snapshot to get a
transactionally consistent view and doesn't block the log.
Run this on master, model, and msdb. They can get corrupt as well. Not as likely, but it can happen. Run on ALL databases.
When to run? Every night it best, but you have to weigh the risks v the
time and resources you have. Run as often as you can.
How can you speed it up?
- Run physical_only to skip some checks. Does a good job of
checking integrity, so use this if you cannot run the entire check.
- Run it with TABLOCK. It runs faster, but it blocks other connections.
- Run it against a recently restored backup. This allows you to run this on another backup.
- Use no_infomsgs
File-level Fragmentation
- Normal fragmentation you might see on your workstation.
- Create your files as large as needed for needs at once. This prevents growth, which often fragments files.
- Don't let auto-grow kick in. Proactively add space when needed.
Auto growths can be expensive and pause your server for the connections
doing work. Instant file initialization helps, but requires Enterprise
Edition, permissions for the SQL Server service account, etc. NOT TRUE
FOR LOG FILES
- Dedicate the drives to SQL Server, not other apps.
- Use an OS level defragmenting tool.
Shrinking a Database
- This should almost never happen. This removes free space, which fragments files, causes a need for growth, etc.
- costly operation and FULLY LOGGED
- Keep the free space you need for maintenance activities.
- If you need to do this, use SHRINKFILE, not SHRINKDB. More
control and lets you stop after each file. It is online and it's
stoppable.
Optimizations
Why? to deal with logical fragmentation and page fill.
Logical Fragmentation - The physical order does not match the
logical order. So index on names, alphabetically, might be stored as
Allen, George, Henry, Rhonda, Ken, Tom, Billy. So the server jumps
around through the pages back and forth to follow the index. Page
splits cause this.
Page fullness - Also known as fillfactor. How much free space on
a page. Caused by data changes. Reindexing regularly it to get this to
your ideal value, not deal with corruption.
In 2000 we used
- DBCC REINDEX
- DBCC INDEXDEFRAG
- DBCC SHOWCONTIG
Now SS2K5 uses
- ALTER INDEX REBUILD
- ALTER INDEX REORGANIZE
- sys.dm_db_index_physical_Stats
The DBCCs are being deprecated and will not be in Katmai, so change your code now!
INDEX REBUILD
- does all or one index and deals with existing RI and can work online
- Adheres to fill factors and padding, updates stats, and can uses parallelism.
- Best chance of getting contiguous index.
However, there are cons.
- Locks the entire table for the entire operation if it's offline.
- One large transaction, so log space can be an issue
- Can take awhile and uses lots of resouces. Requires 1.2x size of the index.
REORGANIZE
- minimal locks, fills pages up to the fill factor
- can be stopped and restarted
- log can be backed up while this occurs
- doesn't require extra free space.
HOWEVER, the cons
- Can log several times the size of the index over time because pages can move more than once.
- Doesn't update statistics and is single threaded. Rebuilds the leaf level only and works only on one file at a time.
Which is better? It depends. Use the one that works best in each
situation. You may have servers that require REORGANIZE, but others
that can use REBUILD. Need to consider the size of tables, maintenance
window, size of tables, hardware, online requirements, etc. You don't
necessarily need to do every index on every table every night. Might
have different schedules for different databases. Check under
SHOWCONTIG (SS2K) or under the DMV (SS2K5) is an example for reindexing
based on thresholds.
Statistics
- Do not need to do this if doing a REBUILD. You don't necessarily need
to do this unless your data changes dramatically. If you have
auto-update turned on, that probably works.
Backups
What type of backups should you use? It depends.
What are your requirements? The recovery time is a big factor here. How
fast you need to recover helps determine what types of backups to take.
Full backup every night is a standard practice. Data loss is the other
factor. Less tolerance for data loss means more frequent backups. And
of course, the database size is a factor (data size, not allocated
size). The larger it is, the less choice you may have.
Do we need to verify backups? Verify does not detect corruption (unless
you use the checksum option), but verify makes sure that you can read
back what you wrote.
Need to test your backups! Ensure that you can restore them or they could be useless.
Disk or Tape? Disk first is preferred, not the same disk as the data
resides on, but it's quicker, cleaner, and available if you need a
quick restore. Move to tape for long term storage. If you use remote
drives, use UNC pathing, not mapped drives. Third party tools are a
good idea to compress the backups and save space and time.
Don't use INIT or the same file every day. It loses flexibilty and
increases the chances of problems. The maintenance plan does this for
you, but lots of code samples here at SQLServerCentral.com or elsewhere
on the web.
Watch our for backup history. Every backup gets recorded in msdb. The
maintenance plans can clean up the history, but if you don't use them,
be sure that you clean up with sp_deletebackuphistory periodically.
Scheduling - Know when things are happening. I agree with this and in
most of my environments my servers have been small enough I could
arrange things well. But I do see where something like sqlSentry is
needed if you have lots of jobs. Since maintenance is resource
intensive, be sure that you don't schedule things a the same time, like
everything at midnight. You'll make everything take longer.
Things not built in, but need to be done
- check for free disk space
- check for free space in files
- scripting jobs
- scripting dbs
- delete old backup files (or log files
- chcek for log errors
- check for job failures.
Everyone wants 5 9's of uptime, 24x7 support, and instant fixes for
issues with their database server. Never mind that I'd argue that most
DBAs don't really need the "
Always On" technology, but it's something that sells. So I thought Kim Tripps "Demo Fest" would be a cool seminar to see for a DBA.
Kim is a great speaker, entertaining and very comfortable up there, so
if you get the chance to see her, I think you'll enjoy it. This was a
demo session, really a couple slides and then demos. She was looking to
cover:
- Mirroring
- Snapshot no mirror
- Online Indexing
- Peer to peer replication
Using VPC images, we got to see the various demo items. One of
the intersting things is she showed some tips on how to setup your lab
or test environment. Kim has 5 instances on her image, but she also has
them set to not start by default. Instead there are batch files to
start and stop services for specific labs.
Mirroring was the first step, which is one of the hot new
features of SS2K5. An interesting fact that I knew, but you might not,
is that the witness server, the third (optional) server in mirroring,
can run on SQL Server Express. So it's a very lightweight requirement
for this server instance. Plus the cool thing about a witness is it
doesn't cost anything with Express.
I don't think mirroring is that hard to understand as it's simlar
to what I've been doing with custom log shipping scripts, but it's
something many new people to SQL Server have to dig into a little to
understand. A great, simple explanation was given on what mirroring is
and the limitations and restrictions. Like driving home the point the
primary can only be in full recovery mode, no simple or bulk-logged,
and that the mirror cannot have its recovery model changed. So
reporting cannot be directly done off the mirror.
Scripting mirroring isn't something that's easy to do. No wizard
buttons for it and while Profiler can be used, it's a hassle. Kim Trip
has a script that should go up on
SQLSkills.com, using SQLCMD, after TechEd.
If you haven't looked at SQLCMD, you should. As Kim mentioned,
it's got a lot of advantages in its scripts: variables, error handling,
loops, labels, etc. She also gives good reasons to be sure that your
directory structures are the same on the principal and mirror. Some
good tips and tricks to be sure that you don't get yourself into
trouble with this technology. Another big one is to be sure that you
don't get into trouble with the setup because of the timing of the
restore to allow your mirror to "catch up."
There's also a DMV for mirroring, sys.database_mirroring for those of you that don't want to work with the Mirroring Monitor.
In the Object Explorer, your database will show as eitherthe principal
or mirror and whether it's synchronized. The mirror also shows it's in
the "Restoring" state.
For monitoring, one of the cool things was seeing a small applet
that watches a table on both the principal and mirror and counts rows.
It actually connects to the principal, but the attempt to connect to
the secondary, it gets redirected to the mirror. Kind of an interesting
way to show the implicit redirection. If you put the mirror in your
string, then you'll move directly to the principal. If you want to
connect to the principal with a failover, then you need both servers in
your connection string. One issue with the connecting to the mirror
instance, is that it must be available. If it's down, you'll get an
error.
Failover can be via SSMS or T-SQl, as well as a real failure of your
principal of course :). The connections that exist, when something
fails, connections break and transactions rollover. So there isn't a
transaction movement from one instance to the other.
Be aware that if your mirror is down for a long time, you might have a
"Synchronizing" state for some time to get back in synch. That means
that you aren't really protected while things are catching up. Which
means you need to get your mirror back up as quickly as possible.
The witness checks both the principal and mirror for "aliveness",
but it's forming a quorum with one of the other two servers. This means
that it can die and it doesn't affect the log on the principal, or even
the transactions moving to the mirror. There is a problem if the
witness goes down in that if the principal loses connectivity to the
mirror, it thinks it might be down, and the mirror and witness can see
each other. So it closes the database down, which means losing the
witness and the mirror will bring down the principal. Not a great
situation.
A Reporting Mirror You can't use the mirror to report, but
you can create a snapshot on the mirror database. I had someone ask the
question at the Colorado Code Camp that we can create a snapshot on the
mirror, we can read the snapshot, which goes back to the mirror, why
can't we read the mirror? I asked and was told it's coming. Not sure
about Katmai, but should be out by at least the release after that.
Same for snapshot backups.
My answer at the Code Camp, when no one else had one, was "it's a feature" :)
Some interesting questions. What if you have a clsutered principal that
has a mirrored database. If you have a failover on the cluster, you
might have a mirror failover as well because of the time to restart the
cluster. You can change the mirror timeout to fix this.
Someone also asked if the principal/mirror transfer is a two-phase
commit. It's not two-phase commit, but a new algorithm that writes the
transaction to the Principal's log, then writes it to the Mirror log,
then commits on the Mirror, then commits it on the Principal. If the
Mirror is down, it still commits on the principal's log. If the witness
is down.
It's not complicated, but there are some moving parts and it can get confusing in some scenarios.
No license required for mirror being used a mirror only. If you report off it, you got to get a new license for that server.
Snapshot files look big, but check the space on disk. They're "sparse files" and are really small.
Online indexing, some blocking that could occur at the beginning and end of the process, but not that much.
A quick look at Peer to Peer
Because of time, this was just a 5 minute session. Different than
mirroring. Mirroring seems simpler, and more protective in that it
happens quickly.
Repl is more scalable because you can do partial replication, only move
certain data, move to multiple places. This means that you can have
multiple peers and have data move among them all with bi-directional
transactional replication between them all. Requires some app work, but
it's pretty cool.
A good talk on the tools from the guy that's responsible for SSMS,
Configuration Manager, Maintenance plans, Surface Area Configuration
Wizard, and a few more.
Initially I thought this might be a dud, with him showing some of the
basic features of Configuration Manager and Surface Area tools,but then
he showed some interesting tricks. Like how to connect to different
version or filter the types of connections in the registered servers
pane (hint, check the icon buttons). He also looked at filtering and
some of the scripting options available. Changing defaults is nice
because I've been frustrated by them in the past, with my scripts
requiring lots of search and replace.
He covered maintenance plans as well, showing some of the new options
in SP2. Sub plans, target servers, etc. No mention of the bugs that SP2
introduced, but it wasn't mentioned at all. Another great place for an
apology for the hassles, but it wasn't shown.
The summary page (RTM, SP1), became the Object Explorer Details. Paul
detailed a few other changes, which I'm not sure why they were made as
they confuse people, but it was good to see them listed.
One of the things I've worried about is configuration changes over
time. Since this has required lots of work to track, it was great to
see the standard report of "Configuration Changes History" report. Now
it's coming from the default trace that is very lightweight, but it
only keeps 5 files, each 20MB, so you could lose a bunch of information.
These reports show up in Object Explorer with SP2, Before that, you'd
have to dig through the tool. You can make your own reports in BIDS
with a Report Project. Create a query and then format the report.
Ignore the report server and deployment folder. Load these into SSMS
and you have lots of data available. There are tons of reports, so
search around for the dashboard performance reports.
Lots of work went into the Database Turning Advisor, making it its own product.
Profiler allows SSAS traces in SS2K5. One of the cool things is integrating the System Monitor with a trace to correlate events.
The Best Practices Analyzer, gives a good check on your instances. It's
come out about 18 months after RTM, which is better than SS2K, when it
took 4 years before it was released. Look for this to be updated with
the Katmai RTM. Results are linked to BOL, so this is a nice
integration of information. They also include some rational for the
recommendations.
We're not partners anymore in SQLServerCentral.com, but Andy, Brian,
and myself got together last night for dinner. We haven't seen each
other since the PASS Summit and this was a good time to get together.
We had a nice dinner, reminiscing about the site a bit, them giving me
some feedback on how they think I'm doing and a little strategy talk
about our training business at End to End Training. Our first mentoring
class is this week, which should be interesting.
I ended up driving around Orlando this afternoon to get books. We'd
printed the next version of our Two Minute SQL Server Stumpers and I'd
had it shipped down to a friend's house. He didn't make it down to
TechEd, so I drove up and picked up the books. So I spent the afternoon
running around Orlando and with some dinner plans. I cut out early from
the conference.
I did spend some time talking with some Microsoft developers talking
about Katmai and their plans and lined up a few more interviews, so
look for those soon. I met up with Tony Davis of Simple Talk and we
strategized about some new features that we'd like to bring you
soon.
Tomorrow will be a long day.
I dropped into Brian Knight's "Clustering in 60 Minutes" to say hi. If
you've never seen this, it's pretty cool and worth the watch. Brian
makes clustering look pretty easy, and he really sets up a cluster in
the session from scratch, so check it out. It's updated for 2005, but
I've seen it, so heading to the SQL IO Demystified session.
And I can't report on it. There are at least 15 people standing outside
trying to listen in to this "informal talk" with the room packed and
people sitting all over the floors. Can't really hear, so I talk to the
people from PASS and say to to a few Microsoft developers I've met over
the years.
These will get pretty long as I'm kind of wordy, but these are impressions as things are occurring.
Registration is a little crazy. I'm not sure who desgined this
conference, but it's split among the two sides of the conference center
in Orlando. I've attended 3 or 4 Tech Eds, a couple PDCs, and half a
dozen other conferences. In all of the cases, the entire conference is
in one building. A big one, but one building. Here the keynote is in
the west center and the show/expo, and the backpack you get loaded with
marketing material and swag, is in the South center. If you've never
seen the Orlando convention center, that's a big deal. There's a 1/2
mile walk outside in the Florida heat between them. Not a big deal to
me, since I like the heat, but I sweat, so it's not so nice for my
neighbors.
Keynote
I arrived late, and it was packed. If you've attended others,
you know that it's usually the case you have 4 or so large screens. It
seems most conferences have moved towards dual screens, showing the
speaker on one and a computer on the other. The recent BI conference
had 2 pairs of screens. TechEd has 6 pairs. It's a big show.
Talking about models for optimization of your IT processes. It's
a good idea and I think we need more reference models and also most
explanations on how to map our internal IT processes to the model.
Energizer - The first video was from Envergizer, which was also
a featured company with the BI conference. They must be a heavy
Microsoft user. Their initial plan was to be 2 years behind any release
of software. Sharepoint is a featured, which shows just how much focus
is being put on thie particular piece of software.
The focus on IT this year seems to be working on an agile,
rapidly changeably infrastructure, being able to move in real-time.
Gartner
- Connections more pervasive
- reponse time expectations are shrinking.
- technology can really enable differentiation.
- relationships are online and short-lived (is this true? Or has
business pushed us this way? In some ways it has, in others it hasn't.)
- Agility is hard. It seems that the gap between agile and
not-so-agile companies is growing. Agility is sensing a change and
efficiently and rapidly responding.
- 70% of IT budgets maintaining. But successful companies have moved that to 50-50.
- Focus on costs, cost based on usage, not more efficiency. A utiltiy model
- Quality of service with management based on
policies. Matching up what IT can do with the business needs. Not
everything needs 5 9s or subsecond response.
- be agile. infrastructure is more dynamic.
- Need to build other metrics besides cost to allow you to measure this and improve it.
- Ask business what they need and translate those needs into measures and begin to work with those measures.
- Gartnew has an IT maturity model
- Focus on process, technology, and culture. All three are important.
- Don't use long-termprojects. They usually
fail. Work on smaller pieces instead with ROI measured in each stage.
Not just cost, but also quality, and agility. Usually it is cost
recovery in early stages with later stages needing investment to gain
on quality and agility.
XEN open source software is including Microsoft interoperability
in their distrobutions. Licensing agreements are occuring, so it seems
thatt Microsoft is trying to work with open source software.
There was a good docus on the interoperability and the efforts
their making to work with other software and embrace stadnards. I'm not
sure how much I believe it, but they do seem to be working a bit better
with ther systems. Is it Microsoft or the toher companies?
The Back to the Future car was on stage and Christopher Lloyd
showed up on stage a few times. Microsoft does go all out at
TechEd and they usually have some godo guests.
Windows Server 2008 Demo
- Server Core - Minimal installation option for
specialized installation. This means that a file server canjust have
the file server services. Good for virtualization. IIS, file server, AD
server can be roles. Only a command line interface for server admin.
- Virtual Server lets you use multicore cpus.
- System Center has a virtual server manager. VMWare
machines can be converted to MS virtual machines. Built on powershell.
- Physical servers can be converted to virtual machines.
- move VMs from one host to another. Quick Migration is the option that you can do today with Virtual Server.
- Demo was a bit fast, but well delivered.
- New model driven management tools
- System Center, Operations Manager, modeling
applications for a company. Operations Manager can see errors, like web
erors. There is a "Problem Path" section that can highlight all the
components showing errors. A lot of design and documentation work to
get things set, but it certainly could assist your production team in
managing servers. Not sure about security, rights, etc., but it's a
good idea in general.
- Katmai - Policies in Management Studio, and
we can set policies for servers. This allows us to set settings for
standards and I can now see if a database is not set according to
policy. Things like db options, etc. These can rollup to System Center.
SML is a theme. This seems like the next wave of XML use in Microsoft.
We had the XML for visual displays in Vista, and now we're seeing SML
for system modeling.
Services are important. One of the four pillars that Microsoft is
pushing is the need for software services. SOA is definitely something
to watch and learn a bit more about. BizTalk has been working with
"services in the cloud" where disparate companies can easily set up
services that they can use to talk to each other. Not sure how well
this will take off, but it's interesting.
Kind of a cool demo showing the Dundas technology Microsoft has
licensed for Reporting Services 2008. An easy to build report with a
map control in it. At the BI conference, I also saw the Excel and Word
integration technologies for building reports. The rest of the demo was
a bit drab. Not a great presented, and it seemed that they were trying
to push Biztalk, but I didn't really see it as a great technology to
use. Maybe I don't get it, but I think if I don't, then most people
won't.
It's nice to see that Microsoft hasn't forgotten the user experience.
The fourth pillar was absed on the user interface, and featured
:Expression, silverlight, and Office with Visual studio. VS2008 was
used with the office tools included. An extension to Outlook was built.
I can see people getting a little crazy with this, but it's a good
idea. Getting more, rich Outlook messages is a good step to ensuring
people stick with the platform. And it makes for a better experience as
well. But it does tie you tighter to Microsoft. I'm ok with it, but
some people may be concerned. I think this is mostly for internal mail,
so I'm not sure what would happen when it transitions outside your
domain.
Losing my battery, more later.