SQL Server – Breakup A Monolithic Data File (Database) Into Multiple Small Files And Breakaway From Shrinking!

If the SQL Server databases being manged are a few GB in size each, then having a single .mdf file for data is fine. However, if the size is anywhere in the hundreds of GB or more, then a single .mdf file is certainly not the way to go.

Related to this, please checkout my post on how to relocate datafiles to another drive/folder if that is your need.

If you do no already know, it is possible to have multiple files called secondary files for storing data besides the .mdf file. These files take a .ndf extension (recommended) and can help breakup a monolithic .mdf file into more manageable smaller files possibly across different drives. A good read on the subject is here.

A database can also have multiple filegroups (besides PRIMARY). In fact, my recommendation is to have separate filegroups for each logical business unit by which the data can also be physically separated.

CAUTION: This post only describes how to distribute/breakup within the same filegroup. It would be ideal to distribute to multiple filegroups which involves more work than what is outlined here.

Please read on…

Files and filegroups:

Do not confuse secondary files with filegroups. Filegroup is a logical structure. Every database as the PRIMARY filegroup by default. The data is stored in the .mdf file if you did not choose to create secondary files. Additional filegroups can be created. Each filegroup can have one or more physical datafiles. So, I could create something like this

  • MyDatabase
    • PRIMARY filegroup (logical structure)
      • D:\Data\Datafile_01.mdf
      • D:\Data\Datafile_02.ndf (note the .ndf extension)
      • D:\Data\Datafile_03.ndf
        • Table1 in PRIMARY filegroup
        • Table2 in PRIMARY filegroup
        • Table3 in PRIMARY filegroup
        • more tables and indexes in PRIMARY filegroup
    • FINANCE filegroup (logical structure)
      • D:\Data\Finance_Data01.ndf
      • D:\Data\Finance_Data02.ndf
      • D:\Data\Finance_Data02.ndf
        • TableSubLedger (dont use the “table” prefix/suffix. For illustration only)
        • TableAccounting
        • TableSalary
    • HR filegroup

You get the idea from looking at the hierarchy above. The data of tables in a filegroup may get distributed within one or more files in the filegroup. It does not HAVE to be distributed (just MAY).

Below is an image I made a long time ago to illustrate the hierarchy. Note that  both physical and logical structures are mixed, in the image.

SQLServerBasicStructure

Advantages of secondary data files

Ask the DBA’s who manage huge single-MDF databases and they will tell you all the reasons why that is bad

  1. Better I/O layout and performance – I/O load on the host gets distributed (depending on how the volumes/drives are setup) resulting in better performance and parallelism
  2. BackupsFilegroups can be backed up selectively instead of the whole database
  3. Relocate files easilyRelocating files to another drive is easier with smaller files than a multi-TB file!
  4. DBCC selective filegroups – DBCC can ignore read-only filegroups if you have set-up separate filegroups for archived read-only data vs active read/write data
  5. Copy files fast – It is easier to copy multiple files in parallel for any reason (attach/detach)
  6. Control over partition and filegroup alignment – This takes discipline and work at the beginning but rewards are enormous in terms of performance. This is an old whitepaper by Kimberly Tripp but still the concepts still apply.
  7. Corruption: Any physical corruption would possibly be isolated to a single file and if the filegroup to file ratio is good and if you are lucky, only a portion of the database will be affected.
  8. …plus a lot more that I cannot think of at the moment but will add as I remember. You are welcome to add comments too!

How to breakup a monolithic database or datafile?

You are probably reading this post because you have experienced the pain yourself and I dont want to waste anymore of your time and get right to the steps involved in breaking up a huge database or a datafile

  1. Check the space situation on your host
  2. Get the space usage by files for the big database/datafile in question
  3. Decide on number of files to add/location
  4. Add multiple secondary datafiles
  5. Distribute data from big datafile into the new datafiles using EMPTYFILE option
  6. Shrink the big datafile and set a maximum size
  7. Change the default filegroup

1. Check the space situation on your host

I use the Get-DBADiskSpace PowerShell function that is part of DBATools. It gives me a nice output of all the drives/volumes with free-space information (something like below)

PS C:\Windows\system32> Get-DbaDiskSpace -ComputerName MYSQLHOST | ft -AutoSize

Server      Name         Label      SizeInGB FreeInGB PercentFree BlockSize
------      ----         -----      -------- -------- ----------- ---------
MYSQLHOST D:\          APP             500   160.79       32.16      4096
MYSQLHOST B:\          DATA            500   181.8        36.36      4096
MYSQLHOST E:\          DATA             50    49.89       99.79      4096
MYSQLHOST L:\          LOG             100    99.88       99.89      4096
MYSQLHOST T:\          TEMP             10     9.94       99.44      4096

2. Get the space usage by files for the big database/datafile in question

Before you could create secondary datafiles, you need to understand the current picture of your database when it comes to files, their sizes, the free-space within them etc. If your database is 500 GB but only 5 GB of it is used, you still have a 500 GB file but almost all of it is empty. Your strategy is suddenly not to breakup a 500 GB file!

I use this query that I found somewhere on StackOverflow (I think)

SELECT fg.data_space_id AS FGID,
   (f.file_id) AS File_Id,
   -- As provided by OP, size on disk in bytes.
   CAST(f.size AS FLOAT) * 8.00 * 1024 AS Size_On_Disk_Bytes,
   ROUND((CAST(f.size AS FLOAT) * 8.00/1024)/1024,3) AS Actual_File_Size,
   ROUND(CAST((f.size) AS FLOAT)/128,2) AS Reserved_MB,
   ROUND(CAST((FILEPROPERTY(f.name,'SpaceUsed')) AS FLOAT)/128,2) AS Used_MB,
   ROUND((CAST((f.size) AS FLOAT)/128)-(CAST((FILEPROPERTY(f.name,'SpaceUsed'))AS FLOAT)/128),2) AS Free_MB,
   f.name,
   f.physical_name
FROM sys.database_files f
    LEFT JOIN sys.filegroups fg
    ON f.data_space_id = fg.data_space_id

This gives you something like this (numbers are made-up):

FGID        File_Id     Size_On_Disk_Bytes  Actual_File_Size  Reserved_MB    Used_MB     Free_MB     name            physical_name
----------- ----------- ------------------- ----------------- -------------- ----------- ----------- --------------- -----------------------------------------------------
1           1           1094479652864       529.979           530699         521639       9060       ABC_DWH         E:\data02\MSSQL12.ABC02\MSSQL\DATA\ABC_DWH.mdf
NULL        2           6945505280          6.469             6623.75        1849.51   4774.24       ABC_DWH_log     L:\log02\MSSQL12.ABC02\MSSQL\DATA\ABC_DWH_log.ldf

3. Decide on number of files to add/location

Using the information from above, I know exactly where the space is and how to distribute my files. As in my case, you may have standards on which drive can hold what type of file (data/log/temp/backups etc). Plan how may files you need to add. In my case above, I have a 500 GB+ database. I may want to have 17 files each with 30 GB maximum size. If I had to go with the made-up numbers of Step #1 above I don’t have adequate space in any single drive for all the files. I have to distribute and/or add more space to certain drives but you get the point of doing these steps.

4. Add multiple secondary datafiles

Below I am adding secondary datafiles to the PRIMARY filegroup each with 5GB initial size, 5GB autogrow increment with a 30GB maximum size, all on E:\Data folder.

Adding files with TSQL:

USE master
GO

ALTER DATABASE [ABC_DWH]
ADD FILE ( NAME = N'ABC_DWH_01',
FILENAME = N'E:\Data\ABC_DWH_01.ndf' ,
SIZE = 5GB, MAXSIZE = 30GB, FILEGROWTH = 5GB )
TO FILEGROUP [PRIMARY]
GO
ALTER DATABASE [ABC_DWH]
ADD FILE ( NAME = N'ABC_DWH_02',
FILENAME = N'E:\Data\ABC_DWH_02.ndf' ,
SIZE = 5GB, MAXSIZE = 30GB, FILEGROWTH = 5GB)
TO FILEGROUP [PRIMARY]
GO
ALTER DATABASE [ABC_DWH]
ADD FILE ( NAME = N'ABC_DWH_03',
FILENAME = N'E:\Data\ABC_DWH_03.ndf' ,
SIZE = 5GB, MAXSIZE = 30GB, FILEGROWTH = 5GB)
TO FILEGROUP [PRIMARY]
GO

…and more files

Adding files via the interface:

Adding with TSQL makes it really easy since you just have to keep incrementing the “_01” in the first file and copy/paste but if you like doing it from the UI, it is painful but it works just as fine. Just make sure that you choose non-default values that are appropriate for each of the columns including the file location, initial size, growth and maximum size (use the scroll bar to scroll right to see/set all the columns).

DB_Properties_Files

5. Distribute data from big datafile into the new datafiles using the EMPTYFILE option

At this point we want to move the data over from the big file into the newly created secondary files and empty the big file so that it can later be shrunk to free-up physical space.

Distributing data and emptying files with TSQL:

Using the “EMPTYFILE” option, distribute the database (first example) or a existing large datafile (second example) into the newly created empty datafiles.

--Distribute the primary database .mdf file and empty it
USE [ABC_DWH]
GO
DBCC SHRINKFILE (N'ABC_DWH' , EMPTYFILE)
GO

--Distribute an existing large secondary .ndf file and empty it
USE [ABC_DWH]
GO
DBCC SHRINKFILE (N'ABC_DWH_04' , EMPTYFILE)
GO

Distributing data and emptying files with UI:

The option to do the same in the UI is shown here..

RightClickShrinkFile
Right-click on the database and..
ShrinkFile
Empty file option of shrink

Watch progress!

This operation can take quite a while depending on the size of the file you are trying to distribute. No so surprisingly, you can watch the progress of the move using the query from step #2 (Get the space usage by files for the big database/datafile in question). Watch the Used_MB and Free_MB columns of the large file chosen above for data redistribution.

6. Shrink the big datafile and set a maximum size

The data from the big file has been and the contents emptied out but still the physical file size would be the original size when we started at step 1. We need to manually shrink the file and reset some of its properties to make sure that it does not grow back to be the monster it was.

USE [ABC_DWH]
GO
DBCC SHRINKFILE (N'ABC_DWH', 5000)
GO

At this point, we have physically reclaimed the space with DBCC SHRINKFILE! The size of the big file is now 5 GB.

Set limits on the big file!

You should use the opportunity to set bounds on the file now. You would use the same dialog as the one used in Step 4 to add a datafile but only this time, we are modifying the existing file settings. Here I am setting the maximum size for ABC_DWH_04 file to be 25 GB. You can set your own size based on your need.

USE [master]
GO
--File should grow to maximum of 25 GB only
ALTER DATABASE [ABC_DWH] MODIFY FILE ( NAME = N'ABC_DWH_04', MAXSIZE = 25GB )
GO

If you get the below error, the file has not been shrunk down yet.

Msg 5040, Level 16, State 1, Line 13
MODIFY FILE failed. Size is greater than MAXSIZE.

Verify and make sure that the file has really shrunk as there are circumstances where it may say that it finished successfully but the file might not have shrunk. You just have to play with the options for file shrinking until the choice you pick works.

Note: If it is too stubborn and it does not shrink, try increasing the initial file size and then shrinking.  There seems to be a bug with shrinking completely empty files!

7. Change the default filegroup

If you did not many any effort, all tables are created in a database will be in the PRIMARY filegroup by default. This can and should be changed if you want to actively manage where tables should go. You can read more about this by Buck Woody here.

ALTER DATABASE ABC_DWH
  MODIFY FILEGROUP ABC_DWH_Default_FG
  DEFAULT;
GO

Now, you have some safeguards to prevent the PRIMARY filegroup from being the dumping ground for all new tables. I have heard from DBA’s (I still do that for some Oracle databases) who have set the PRIMARY filegroup so small and already full that no new tables/indexes will fit there forcing the developers to place them in the filegroup that is appropriate.

Conclusion

Hopefully, this post is useful enough for you to reorganize your database (like spring cleaning your home). We still did some shrinking above but without the negative aspects of shrinking. Once you are used to reasonably sized files, you will never go back to monolithic files! Please keep an eye out for my future post on how to breakup monolithic databases into separate filegroups too instead of just files.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s