Thursday, May 9, 2013

Dumping IO Stats to Powershell... Get em Girl!

Hello All,

A 2 part post this month, i guess i finally removed head from arse and got around to posting updates. First a super useful query for IO stats of log and data files. Please note these are at file level. This may exist already and im sure there are better ways to do this, but since i have been going through a ton of logs and data files at work recently, i figured it was worth a post.

[code]

SELECT DB_NAME(tblFileStat.database_id) AS databaseName ,tblFileStat.file_id ,tblMasterFile.physical_name ,tblIOPending.io_pending ,tblFileStat.io_stall_read_ms ,tblFileStat.io_stall_write_ms ,tblFileStat.file_handle ,tblIOPending.NumberofIOPendingRowsFROM sys.dm_io_virtual_file_stats(NULL, NULL) tblFileStatINNER JOIN ( SELECT io_handle , MAX(io_pending) AS io_pending ,COUNT(*) AS NumberofIOPendingRows FROM sys.dm_io_pending_io_requests GROUP BY io_handle) tblIOPendingON tblFileStat.file_handle = tblIOPending.io_handleINNER JOIN sys.master_files tblMasterFileON tblFileStat.database_id = tblMasterFile.database_idAND tblFileStat.file_id = tblMasterFile.file_idORDER BY tblFileStat.io_stall_read_ms DESC

[/code]

Second for a little bit of comic relief a list of DBA jargon i am entitling WTF did he just say. Note I have never in 15 years heard any of these phrases used but I still think they are pretty funny. http://www.brentozar.com/archive/2012/10/learn-speak-dba-slang-terms/

What the hell was that constraint called?

Over the years I have been guilty so many times of writing a rollback script that rolls column additions to a table. Which is essentially a script that will drop the added columns. When i create a new column with a default value i ALWAYS use a constraint name which i created. In the event you inherit someone else database who does not (not mine of course) :) forgetting to drop the default value constraint (which is something only an inexperienced DBA would do NOT ME of course) :)

As luck would have it if you forget to add your own name SQL Server will add one for you. with a very easy to remember name. If DF__Beta_foo__bar__5629CD9C is easy to remember that is. So all of that being said, here is a quick script to get a list of all the constraints and the tables / columns they are on so you can easily add them to your rollback script.

 SELECT  
   b.name AS TABLE_NAME, d.name AS COLUMN_NAME, a.name AS CONSTRAINT_NAME,  
   c.text AS DEFAULT_VALUE  
 FROM  
   sys.sysobjects a  
 INNER JOIN (  
        SELECT name, id FROM sys.sysobjects WHERE xtype = 'U'  
       ) b  
 ON ( a.parent_obj = b.id )  
 INNER JOIN sys.syscomments c  
 ON ( a.id = c.id )  
 INNER JOIN sys.syscolumns d  
 ON ( d.cdefault = a.id )  
 WHERE  
   a.xtype = 'D'  
 ORDER BY  
   b.name, a.name  

Toyota and Database nothing in common, until NOW!

Turn the key, hear the engine catch go to put it in drive and... shit Darn it golly gee a stall. This is NOT just for cars anymore. One of the biggest performance issue indicators I have ever used in practice to check for DB issues has been an IO Stall test metric. In days of old, you know when 16 bit gaming consoles made you the man if you had one, and Mortal Kombat was THE GAME to play.. (still is now by the way if anyone is on PSN and wants to take a beating ProphetDE come find me) I had no monitoring software. So I used the script below and dumped the results to a text file. Now that tools have come so far (As have video game consoles) I am using Red Gate's SQL Monitor. The new version gives you the ability to add custom metrics. Below I will explain IO Stalls, what they mean, and how to track them.

What causes an IO stall?
In our case there are 3 main causes of IO stalls.

1. A poorly performing disk - Now a days this is almost ALWAYS caused by a misconfiguration in the SAN. I now understand why SAN Admins get paid as much as they do.

2. Poorly defined / written queries. I don't just mean the obvious Oh Johnny used a cursor to update 22.7 million rows one at a time because he thinks set based operations are the antichrist. I mean all issues some silly some not so much.

3. Overloaded disks. You may or may not hear the term "data bursts" often. I will not go into the explanation. What I will say is there is a wonderful all inclusive article on disks, how they work, how they don't work and data bursts, which can be found here: http://www.symantec.com/connect/articles/getting-hang-iops

Needless to say just like you do not want your Toyota to stall, you do not want your database to stall either. Troubleshooting database stalls, and disk contention issues can be for lack of a better term a real pain in the proverbial arse. The script will essentially check the average time of reads and writes in MS / number of total reads and writes. Ideally you never want to exceed 100MS. This is just a general number with larger systems you may and that could be fine. The 100MS baseline is simply the guideline i use.

[code]
SELECT CAST(SUM(io_stall_read_ms + io_stall_write_ms) / SUM(1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [avg_stall]
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL)
WHERE FILE_ID = 2;
[/code]

Now the all important question: "John that metric just returned 12,948,102MS what the hell do I do"
Well grasshopper its not that simple... Let me rephrase it COULD BE that simple, but may not be. If you are using a virtual environment, the first thing I would recommend trying would be to allocate a few more processors and memory to the cause. A good start would be to double what you currently have. This may also be practical for your physical environment and if it is bravo!!

Secondly (or if adding hardware is a no go) I would recommend getting very familiar with SQL profiler. Go through and run those overly complex stored procedures you inherited from Bob who is now working at Kinko's because he decided it would be a good idea to write them. "What do you mean writing dynamic SQL using a cursor to update 22.7M rows one at a time for only one field is a bad idea?!" Try taking just one of those and re-writing er... re-factoring it. Then run again, you will be surprised how often I hear "Wow all i did was re-write one sproc and I am getting a 22% performance boost.

Finally I would recommend a SAN Admin. If you do NOT have one, and your company does not want to staff a proper one fine. I would then say get one to re-configure the SAN and document the new subsystem for you. If your infrastructure is to large and contracting someone, or hiring someone perm is a no go, I would make a strong case to management why they need to fork over the 4 - 6K and get someone trained in the area.

Trust me I didn't think it was a necessary evil... it IS. Use that script and keep the above in mind, and I can assure you your Toyota will be purring like the day you got her off the show room floor and took her for that first test drive.

Django? ill Pass.

Disclaimer - I use Microsoft Technologies. A few jobs and hacks aside I always have. I am the first person to admit that I know NOTHING about Python, or how the Django framework operates aside from my limited experiences with it. That said...