Saturday, May 16, 2015

Data File and Transaction Log File Architecture

Data File and Transaction Log File Architecture


1) Explain the SQL Server Database Data File structure?


SQL server divides the data file into 8 KB pages and page is the smallest unit of any IO operation. SQL server refer the first page in the data file as page number 0.In all the data file first 9  pages (till page number 8) are in same order as shown below. In the primary data file, the 10th page (Page number 9) will be the boot page which stores the metadata about the database.

2) What is a Data Page?
Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data.

3) What is an Index Page?
Contains information related to Indexes. This holds index records in the upper levels of a clustered index and all levels of non-clustered indexes.

4) What is a Page Header?
Page number 0- the file header (page type 15). It holds the file header information. There will be only one header page for each data file and that reside in the 0th position. The header page store the file level information like file size, minimum size, max size and file growth.

5) What is PFS?
PFS page is the second page (Page Number 1) in the data file followed by file header (Page Number 0). PFS pages are used to track page level allocation. PFS page repeat after every 8088 pages

6) What is a GAM Page?
GAM pages records what extents have been allocated for any use. GAM has bit for every extent. If the bit is 1, the corresponding extent is free, if the bit is 0, the corresponding extent is in use as uniform or mixed extent.

7) How many GAM pages will be there in a 7GB data file?
A GAM page can hold information of around 64000 extents. That is, a GAM page can hold information of (64000X8X8)/1024 = 4000 MB approximately. In short, a data file of size 7 GB will have two GAM pages.

8) What is SGAM page?
SGAM (Shared Global Allocation Map) page (Page type 8). SGAM pages record what extents are currently being used as mixed extent and also have at least one unused page. SGAM has bit for every extent. If the bit is 1, the corresponding extent is used as a mixed extent and has at least one page free to allocate. If the bit is 0, the extent is either not used as a mixed extent or it is mixed extent and with all its pages being used.

9) How many SGAM pages will be there in a 7GB data file?
A SGAM page can hold information of 64000 extents. That is, a SGAM page can hold information of (64000X8X8)/1024 = 4000 MB. In short, a data file of size 7 GB will have two SGAM page.

10) What is a DCM and its significance?
SQL Server uses DCM pages to track extent modified after the last full backup operation. DCM pages track the extents modified after the full backup.DCM has a bit for every extent it tracks. If the bit is 1, the corresponding extent is modified after the last full backup, if the bit is 0, there is no change in the corresponding extent after the last full backup

11) How SQL Server engine will decide which extents need to be added into a differential backup?
A differential backup process scans through DCP page to identify the extents which are modified after the last full backup and add those extents in the Differential backup.

12) What is a BCM and its significance?
This tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement. If the bit for an extent is 1, the extent has been modified by a bulk logged operation after the last BACKUP LOG statement. If the bit is 0, the extent has not been modified by bulk logged operations.
Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model. In this recovery model, when a BACKUP LOG is performed, the backup process scans the BCMs for extents that have been modified. It then includes those extents in the log backup.

13) What is Boot Page?
There is one special data page that exists only once per database. It is the database boot page. The database boot page is always stored in page 9 of file 1, the first file in the primary file group.
The database boot page contains a broad variety of data about the database itself. Here you can find the internal database version, the compatibility level, the database name and the date of the last log backup.

14) How many types of Allocation units are there?
There are three types of allocation units.
IN_ROW_DATA (default): If a table is relatively simple in desin (meaning record size is less than 8000 and no LOB data types are present), all records are stored in a pages refered as IN_ROW_DATA pages.
ROW_OVERFLOW_DATA: assume that a table is created with record size 12000 bytes having 4 varchar data types of size 4000 bytes. Whenever user inserts a record with size greater than 8000 (page size is 8K), then the excess data is moved to ROW_OVERFLOW_DATA pages. In simple terms, ROW_OVERFLOW_DATA pages will come in  to picture only when the row size exceed page maximum limit.
LOB_DATA: LOB data like text not stored along with data pages. LOB data is stored in pecial pages called LOB_DATA pages. 16 byte pointer in data page will be used to refer LOB_DATA page.

15) What is an IAM page?
IAM stands for Index Allocation Map: To catalog all the pages that belong to a single allocation unit, SQL Server utilizes a special type of pages, the Index Allocation Map or IAM pages. Each allocation unit has at least one IAM Page; depending on the size of the table there might be many. The page type of an IAM page is 10.

16) What is a ROW OFFSET and its importance?


Page header stored system information about the page. After the page header data row is stored serially. Row offset information is stored at the end of the page, after the data rows. Every data row has a row offset. Row offset stores information about how far the row is from the start of the page.

17) What are LOB pages?
SQL Server stores all data in 8192-byte sized blocks called pages. Several types of pages are in use within a typical database. One particularly interesting group is formed by the type-3 pages or Large Object Pages.

LOB_DATA Allocation Units
Most data types in SQL Server take up no more than 8000 bytes of storage. However, there are a few data types, which allow for larger pieces of information to be stored. Examples include the VARCHAR(MAX), VARBINARY(MAX) or XML data types.
Normal data pages that belong to a table are grouped in IN_ROW_DATA allocation units. However, if a value that is larger than 8000 bytes needs to be stored, SQL Server does not attempt to store it in those data pages anymore. It does not even store those values in the same allocation unit. Instead, Large Object data or LOB data is stored in special LOB_DATA allocation units.

18) What is a PAGE SPLIT?
Page splits are performed when there is no space in a data page to accommodate more inserts or updates. In a page spilt, SQL server removes some data from the current data page and moves it to another data page. This is done in order to make space for the new rows.

19) How Page Splits impacts the database performance?
Too many page splits can decrease the performance of the SQL server because of the large number of I/O operations.

Following remedies can be taken to avoid too many page splits:-
  • Rebuild indexes frequently to empty the fill factor space for more data.
  • Increased the Fill factor value after observing the page split behavior
20) How can we monitor or detect Page Splits?
Number of page splits can be observed by using the Performance Monitor and watch the SQLServer:Access Methods:Page Splits/sec counter

21) What is a Fill Factor?
The fill-factor option is provided for fine-tuning index data storage and performance. When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth. For example, specifying a fill-factor value of 80 means that 20 percent of each leaf-level page will be left empty, providing space for index expansion as data is added to the underlying table. The empty space is reserved between the index rows rather than at the end of the index.

22) What is a Logical Read?
Logical Reads:
Logical read indicates total number of data pages that are found in the data cache and read from Data cache without reading anything from Disk.

23) What is a Physical read?
Physical Reads
Physical read indicates total number of data pages that are read from disk. In case no data in data cache, the physical read will be equal to number of logical read. And usually it happens for first query request. And for subsequent same query request the number will be substantially decreased because the data pages have been in data cache.

24) How can we view the content of a page?
To see the row data stored in that page , we have to use the DBCC PAGE command. The syntax of DBCC PAGE
dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ]);
Print opt:
0 – print just the page header
1 – page header plus per-row hex dumps and a dump of the page slot array
2 – page header plus whole page hex dump
3 – page header plus detailed per-row interpretation
DBCC TRACEON(3604)
 GO
 DBCC page('test',1,114,3)

25) What is a Transaction log file?
A transaction log is a physical file in which SQL server stores the details of all transactions and data modifications performed on the database.  In the event of disaster, that causes SQL server to shutdown unexpectedly (Power failure/hardware failure), the transaction log is used to bring the database in a consistent state while restarting the server. On restarting the server, database goes through the recovery process.
During this recovery process, the transaction log is used to make sure that all committed transactions are written to respective data file (rolled forward) and rollback the uncommitted transaction.

Logically transaction log is a set of log records. Each record is identified by a log sequence number (LSN). The new log records are always written at the logical end of log file with a LSN which is greater than the previous one


26) What are Virtual Log files? How these are created?
SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file. The Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files. The Database Engine tries to maintain a small number of virtual files. The size of the virtual files after a log file has been extended is the sum of the size of the existing log and the size of the new file increment. The size or number of virtual log files cannot be configured or set by administrators.

27) How VLFs are created SQL Server in a Transaction log file?
SQL server decides the size and number of VLF in a log file based on the size of the log file growth as given below.
Growth upto 64 MB        = 4 VLF
From 64 MB to 1 GB       = 8 VLF
Larger than 1 GB             = 16 VLF

28) What is the way to check Virtual Log file details?
DBCC loginfo(‘mydb’)

29) What is Log Sequence No?
Every record in the SQL Server transaction log is uniquely identified by a log sequence number (LSN). LSNs are ordered such that if LSN2 is greater than LSN1, the change described by the log record referred to by LSN2 occurred after the change described by the log record LSN.

30) What is a Log Truncation?
Log truncation is essential to keep the log from filling. Log truncation deletes inactive virtual log files from the logical transaction log of a SQL Server database, freeing space in the logical log for reuse by the physical transaction log. If a transaction log were never truncated, it would eventually fill all the disk space that is allocated to its physical log files. However, before the log can be truncated, a checkpoint operation must occur. A checkpoint writes the current in-memory modified pages (known as dirty pages) and transaction log information from memory to disk. When the checkpoint is performed, the inactive portion of the transaction log is marked as reusable. Thereafter, the inactive portion can be freed by log truncation.

31) How Does Log Truncation occurs?
Log truncation occurs automatically after the following events, except when delayed for some reason:
  • Under the simple recovery model, after a checkpoint.
  • Under the full recovery model or bulk-logged recovery model, after a log backup, if a checkpoint has occurred since the previous backup.



No comments:

Post a Comment