Home

Optimizing data model: Configuring databases

 

The best way to solve performance problems - just do not let them ...
Optimizing SQL Server database begins with selecting the correct configuration database and data model. You can improve performance, expanding the database indexes of various types and more powerful hardware, but completely eliminate the disadvantages of the data model is still not succeed. The consequence of unsuccessful configuration database or data model may be too great response time, locked or hung transaction, incorrect or inaccurate results in the preparation of business reports, data sync, data inconsistencies and the inability to make query to retrieve the desired data. But the failure data model - not the only cause of such problems. For example, the slow response of the system can be the result of overloading the server. Unsuccessful combination transaction updates from the conflicting application may hang or lock. You should always carefully examine the causes of problems. If you can not find the overloaded processor or a conflict between two transactions that trying to acquire a monopoly of information resource, you must carefully consider the configuration database and data model, they can cause of trouble.

To build a database, whose behavior is completely consistent with expectations, is necessary to first examine its foundations. Start at the with the optimization of the operating system. Then you must configure the database to support the correct data model and preserve their integrity. Advance in considering the configuration of basic elements, you can prevent many problems of productivity and create conditions for improving the overall performance database.

Preparing the medium

Optimizing SQL Server starts with the operating system Windows. SQL Server can operate only in an environment Microsoft, so only important condition for success - understanding the system administrator Windows. Two important parameters of the database server - a file system and the paging file. For SQL Server use the file system NTFS - it is more stable and more secure than FAT, although it is believed that write operations are performed in a little faster than FAT. When you configure the paging file rule of thumb for virtual memory - set a static size of 1.5 times the size of physical memory. In addition, if any server component, such as a network card or hard drive, pass standby mode after a period of inactivity, it should be banned from crossing into standby (or instruct the system administrator to do so). Better not to risk the need to "cold-boot to activate the computer components. In the multiprotocol environment, make sure that TCP / IP - The first in a set of protocols. If the network connections have low bandwidth, you need to reference value of time-out during registration exceeded the time of registration applications using the database.

In addition to the optimization of the operating system to work with SQL Server should increase fault tolerance environment. For reliability and performance, I recommend used for SQL Server array RAID. RAID solution can be costly, but the funds allocated for its purchase, will be well spent if deliberately choose the best type of RAID for a given environment, identifying parts of the database that you want to protect in the first place. Explanations different types of RAID systems and their benefits, see the sidebar «SQL Server on the array RAID».

Optimizing SQL Server performance is impossible without the proper database configuration. To improve the speed of the database should include main types of data to be stored in a database, for selected groups of files. It is necessary to separate the system tables from user tables, data the indices, tabulated data from the images, text, and n-text (ntext used for storing strings of characters Unicode). Applying this scheme to separate data on several groups of files, you can build a highly scalable database. Recall that scalability - it is possible to increase number of processed transactions without compromising performance. In small systems, you can collect all the file group on a single disc (with the exception of the journal transaction, which should always be stored on another disk, separate from other data). With the expansion of the system, increasing the number of users and volume data, you can move the different groups of files on separate disks, thereby distributing the workload between multiple drives. By separating database into several groups: the control file backup simpler. You can use a group of files to backup very large databases (very large database, VLDB) in a time window reserved for copying a database. More information on the use of groups of files backup, see the article Kimberly Tripp "Not A Sound of Thunder" on the site www.windowsitpro.ru at http://www.osp.ru/win2000/sql/200309sq477.htm. Groups of files can be used for horizontal separation, which is described in the article "Return to Life" at http://www.osp.ru/win2000/sql/admsecrets/401_1.htm (www.windowsitpro.ru). When designing high-performance database file group - a useful tool that will help avoid problems before starting work.

The source code in Listing 1 shows my usual method of constructing the database. Each group of files has three names: the name of a group of files, the logical file name and physical file name. These names can be seen by opening the Properties window, any database, and then selecting the tab Data Files. On this tab, the elements column correspond Filegroup PRIMARY and names of groups of files that are in the first part of Listing 1. Elements in the column Location - is the names physical files, which include the full path to the storage of the physical file on the hard disk. SQL Server system tables should be placed in group PRIMARY, and user tables and indexes - to the appropriate group of files, separate from the system tables, SQL Server. Images and text data stored in its own group of files, as shown in the snippet of source code with the label A (Listing 2).

Configuration parameters that follow the command CREATE DATABASE, established in accordance with standard ANSI SQL-92. Perhaps one or more parameters need to be changed in accordance with the requirements of a particular company. Make sure that the specified configuration parameters are compatible to the environment.

Integrity - a prerequisite

The next step in optimizing database performance - configure SQL Server to enforce referential integrity. For example, between the tables Store and Sale in Listing 2, adapted from the database of pubs, there is a relation of dependence. Sale (Sale) can not be exercised without regard to the warehouse (Store). Referential integrity means that the business relationship is realized by one of two ways. You can assign this task application outside SQL Server or SQL Server to provide an opportunity to establish this rule. In my opinion, the maintenance of static rules, such as referential integrity between the Store and Sale, to provide a better database. Source text for the rule is compiled once (fragment labeled B in Figure 2). Then, SQL Server uses rule for all users of the database. If we introduce a rule with the application, it may disappear from future versions of this program or would not recognized by other applications that access the same data. This can lead to violations of referential integrity, and possibly distort the data.

In addition to ensuring the integrity of links within SQL Server, I recommend to use the DRI (declarative referential integrity - declarative referential integrity), instead of triggers or stored procedures. DRI - the restriction and limitation shall be executed more efficiently than the triggers or stored procedure, especially if the data base used in the massive operation to load the data.

Listing 2 shows how to organize DRI between tables Store and Sale. As noted above, these two tables are adapted from the database pubs. I changed columns and put in the parameters needed in a well-organized work environment and also added some columns to illustrate influence the data model for performance. CREATE TABLE statement for table dbo.Store adds a new column, StorePhoto, which contains the image. It should be noted that in the last line of the CREATE TABLE statement data are sent to the storage group files MyDatabase_data, but related images will remain in the group of files MyDatabase_image.

Foreign keys. I always ask for restrictions on primary and foreign keys separately from the statement CREATE TABLE, to be able to manage the names restrictions. You can assign a column as primary key or foreign key when creating a table, but I do not do that. I prefer separate ALTER TABLE statements for the constraints of the primary and foreign keys. When designing the data model has to constantly make changes. Therefore necessary be able to quickly identify various constraints.

Obviously, the standard names restrictions appointed SQL Server, are significant. For example, FK__sales__stor_id0AD2A005, the name of the original constraints foreign key between Stores and Sales in my copy database pubs, clearly refers to the foreign key in the table Sales, which uses a column stor_id. However, the constraint name can not easily understand how a table is referenced. My naming convention, FK_Sale2Store, shorter, and, although my name does not contain information about the column, it says that depends on Sale Store. Besides, I do not need to query to get full name restrictions. Suffice it to follow the standard set forth for the names of foreign key constraints. The name of any foreign key constraints will begin with FK_, followed by the name of the dependent table, Number 2, and finally, the name of an independent table.

Indexing. SQL Server does not automatically builds an index for the foreign key columns (in contrast to the primary key in table). In a production environment can use columns in the foreign and primary keys for mergers. Therefore, when creating a new database should build an index for each column foreign key in each dependent table. Typically, these indexes are not clustered. In a future article I will talk about clustering but for now suffice it to mention that the clustering is performed on a column or columns to be the most heavily used for data extraction. You can even wait until the database will approach or reach the production stage, before making a decision about clustering.

Illustration of this approach to clustering - a clustered index in the fragment C Listing 2, which was built for the Sale of the table using StoreCode and SaleID. Sales reports will be issued daily or hourly and sorted first by shopping (submitted StoreCode), then by SaleID. Values SaleID increase during the day (SaleID - ID), so they actually represent the serial numbers of records; both they and the values SaleID increase with each new sale. Most reports are summary reports with summary data for individual shopping. StoreCode with SaleID - an excellent candidate for clustering. With a clustered index faster retrieval of data, because data already ordered - in this case, first on the Store Code, and then, within each value of the Store Code, on SaleID. Actually, StoreCode links table table Sale Store. The presence of two indices - one cluster, one non-clustered, each of which begins with StoreCode, possibly redundant, but the operation of the database you will see how SQL Server uses (or not use) these indexes. If SQL Server does not trigger nonclustered index, then it can be safely dropped. However, at the design stage is recommended to build a separate index for each column foreign key, even if initially it will lead to the redundancy indices.

If possible, you must define default values and control limits (check constraint specifies the range of values of the elements of columns) in database data, rather than at the application level for the same reason, on which is preferable to entrust the management of database referential integrity. Constraints -- database objects, so they are faster and more efficiently than the code in an external application. If the rules, which can be expressed by default, and the control constraints are static - do not change every week or every month, then these rules can be defined as restrictions on columns or tables. In the statement CREATE TABLE dbo.Sale Listing 2 column SaleDate has a default value CURRENT_TIMESTAMP. This rule is unlikely to change all the operation of the database, so it is convenient to define a restriction column. The next rule, SalePayTerms, expressed as a control limit. SalePayTerms can not be NULL, and initially SalePayTerms defined values Net 30, Net 60, and On Invoice. The application can not change This set of values. If you want to add value in a set, you should change this restriction column. If it is assumed that a set of values will change frequently, more convenient way to introduce restriction - look-up table to draw up terms of payment for purchases, assign to each row of the table unique value as an identifier and to compare the look-up table with dbo.Sale in the ratio of one-to-many (1: M). Attitude between look-up table and dbo.Sale be dependent relationship, in the same way as the ratio between dbo.Store and dbo.Sale.

Optimizing database - a huge and important topic that requires understanding of the database environment and a great experience. Begin optimization is convenient to study the data model and business requirements. Only then can configure the data model for high performance.

Rambler's Top100