Friday, 1 August 2014

Database Views


A view is a virtual table that can be created over a single table or over multiple tables.
Eg: CREATE VIEW view_name1 AS (SELECT col1, col2 FROM table_name WHERE col2 > 25)
OR
CREATE VIEW view_name2 AS (SELECT t1.col1, t1.col3, t2.col1, t2.col4 FROM table_name1 t1 LEFT OUTER JOIN table_name2 t2 ON t1.col2 = t2.col2)

Internals:

  1. A view has no physical existence of its own.
  2. Only the SELECT query used in creating the view is saved under the view name (“view_name1” in the above example).
  3. A view is a separate database object like the others (table, procedure, function, login, trigger, cursor, sequence, etc.).
  4. So, a view can be queried like a table.
    Eg: SELECT colx, coly FROM view_name1
  5. The above query is effectively treated as the one below:
    Eg: SELECT colx, coly FROM (SELECT col1, col2 FROM table_name WHERE col2 > 25) -> the inner query is taken from the creation statement of the view 'view_name1'.

Are Views Updatable?

  1. As we already know, a view has no physical existence, and data is retrieved from the original underlying table when the view is queried.
  2. So, if we update a view, the underlying table should be updated, and the update operation should execute without any problems.
  3. Hold on..!
  4. Yes, views are updatable, but only when they are created on a single table, as in 'view_name1' (see the sketch below).
  5. Hmm. Why so?
  6. Look at the creation statement of 'view_name2'. This view is created by joining two tables. In this particular example, the column values of 'table_name2' appear as NULLs in the result set except for the rows that satisfy the join condition (ON t1.col2 = t2.col2). But the original table 'table_name2' does not have NULL values in those positions.
  7. For this reason, views created from multiple tables cannot be updated.
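As a quick illustration, here is a minimal sketch of updating through a single-table view; it assumes the hypothetical table_name and view_name1 from the creation statements above, and that col1 is a character column:

UPDATE view_name1
SET col1 = 'new value'
WHERE col2 = 30;
-- The change is applied to the underlying table_name; querying either the view or the table shows it.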

Usage:


  1. Views are created to grant users limited access to the database tables: a security measure (see the sketch below).
  2. It is more efficient to retrieve only the actually desired data rather than the entire table. This can be done by creating a view over the desired data and then querying that view: a performance measure.
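A minimal sketch of the security use above, assuming a hypothetical database user report_user who should see only the columns exposed by view_name1 and not the base table:

GRANT SELECT ON view_name1 TO report_user;
DENY SELECT ON table_name TO report_user;
-- report_user can now read through the view but cannot query the base table directly.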

Sunday, 6 July 2014

Rules of GroupBy Clause


Group functions are the built-in SQL functions that act on groups of rows and return one value for the entire group.

Rules of GroupBy clause

The name may be a column reference. Such a column is called a grouping column: its values are grouped to form the final result.

* The GROUP BY clause should contain all the columns in the select list except those used inside group (aggregate) functions.

Example:
Let us take the below query,
SELECT location, dept, SUM (salary) 
FROM employee 
GROUP BY location, dept;

In the above query, if the columns 'location' and 'dept' are not specified after the GROUP BY clause, the query results in an error.


* It is up to us whether to include aggregated columns in the GROUP BY list. In the above example, it is optional whether to include the column 'salary' after the GROUP BY clause.

* The reverse is not required: a column can appear in the GROUP BY clause even if it is not in the select list, although the result is then harder to interpret (see the examples below).
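To make the first rule concrete, here is a small sketch against the employee table used above; the first query fails because 'dept' is selected without being grouped or aggregated, while the second one is valid:

-- Error: 'dept' is in the select list but is neither grouped nor aggregated
SELECT location, dept, SUM(salary)
FROM employee
GROUP BY location;

-- Valid: every non-aggregated column in the select list appears in GROUP BY
SELECT location, dept, SUM(salary)
FROM employee
GROUP BY location, dept;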

Monday, 19 May 2014

Handling the completely filled Transaction Log problem


Is your transaction log full? Are you unable to execute transactions because of an error saying "Transaction Log is full"?

Problem Explanation:

In SQL Server, we have three recovery models.
  1. Simple Recovery Model
  2. Full Recovery Model
  3. Bulk-Logged Recovery Model.
A SQL Server database has two files by default, namely:
  • Data file “.mdf”
  • Log file “.ldf”
Every database is managed under one of the recovery models mentioned above.

NOTE: Taking a transaction log backup is the only way to truncate (empty) the log file.
In the Simple Recovery model,
SQL Server automatically truncates the log file at regular intervals.
That is why we are not allowed to take a transaction log backup in this recovery model (and there is no need to take one either).
So, coming to the main point, the log file will not normally get full in this recovery model.
In the Full Recovery model,
the burden of maintaining and controlling the transaction log is left entirely to the DBA.
So, if we don't take regular log backups of the database, the log file keeps filling up until it reaches its limit (refer to the NOTE above).
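A quick way to check which recovery model each database uses, and why its log space cannot currently be reused, is the catalog view sys.databases (a minimal sketch; these columns exist in SQL Server 2005 and later):

SELECT name,
       recovery_model_desc,   -- SIMPLE, FULL or BULK_LOGGED
       log_reuse_wait_desc    -- reason log space cannot be reused (e.g. LOG_BACKUP, ACTIVE_TRANSACTION)
FROM sys.databases;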

Listed below are some other common scenarios in which the transaction log file of a SQL Server database grows:
  • If there are no scheduled log backups or log backups are not working.
  • If there are active transactions in the log file
  • If there are long running queries
  • In database mirroring if mirror server fails
  • In replication if distributor fails
  • If there is no disk space to grow the Transaction Log file
  • If bulk data was imported without dividing into multiple batches or transactions.

Problem Solution:

When the transaction log becomes full, the SQL Server Database Engine issues error 9002.

Follow the steps below to take a transaction log backup (an equivalent T-SQL sketch follows the list).
  1. After connecting to the appropriate instance of the SQL Server Database Engine, in Object Explorer, click the server name to expand the server tree.
  2. Expand Databases, and select the desired database.
  3. Right-click the database, point to Tasks, and then click Back Up. The Back Up Database dialog box appears.
  4. In the Database list box, verify the database name. You can optionally select a different database from the list.
  5. Verify that the recovery model is either FULL or BULK_LOGGED.
  6. In the Backup type list box, select Transaction Log.
  7. Optionally, select Copy Only Backup to create a copy-only backup; for a routine log backup, leave it unchecked.
  8. Either accept the default backup set name suggested in the Name text box, or enter a different name for the backup set.
  9. Optionally, in the Description text box, enter a description of the backup set.
  10. Choose the type of backup destination by clicking Disk or Tape. To select the paths of up to 64 disk or tape drives containing a single media set, click Add. The selected paths are displayed in the Backup to list box.
  11. To remove a backup destination, select it and click Remove. To view the contents of a backup destination, select it and click Contents.
  12. To view or select the advanced options, click Options in the Select a page pane.
  13. Select an Overwrite Media option by clicking one of the following:
  • Back up to the existing media set
    • For this option, click either Append to the existing backup set or Overwrite all existing backup sets.
    • Optionally, select Check media set name and backup set expiration to cause the backup operation to verify the date and time at which the media set and backup set expire.
    • Optionally, enter a name in the Media set name text box. If no name is specified, a media set with a blank name is created. If you specify a media set name, the media (tape or disk) is checked to see whether the actual name matches the name you enter here.
    • If you leave the media name blank and select the check box, the operation succeeds only if the media name on the media is also blank.
  • Back up to a new media set, and erase all existing backup sets
    • For this option, enter a name in the New media set name text box, and, optionally, describe the media set in the New media set description text box.
  14. In the Reliability section, optionally, check:
  • Verify backup when finished.
  • Perform checksum before writing to media, and, optionally, Continue on checksum error.
  15. In the Transaction log section:
  • For routine log backups, keep the default selection, Truncate the transaction log by removing inactive entries.
  • To back up the tail of the log (that is, the active log), check Back up the tail of the log, and leave database in the restoring state.
  16. If you are backing up to a tape drive (as specified in the Destination section of the General page), the Unload the tape after backup option is active. Clicking this option activates the Rewind the tape before unloading option.
  17. SQL Server 2008 Enterprise and later supports backup compression. By default, whether a backup is compressed depends on the value of the backup-compression default server configuration option. However, regardless of the current server-level default, you can compress a backup by checking Compress backup, and you can prevent compression by checking Do not compress backup.
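For completeness, the same routine log backup can be taken with a single T-SQL statement instead of the dialog. A minimal sketch, assuming a hypothetical database named MyDb and a local backup folder C:\Backups:

BACKUP LOG MyDb
TO DISK = N'C:\Backups\MyDb_log.trn'
WITH NAME = N'MyDb transaction log backup';
-- A successful log backup marks the inactive portion of the log as reusable.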

After taking a transaction log backup, the inactive portion of the log is truncated and free space becomes available for upcoming transactions, which resolves the problem.

To prevent this problem from recurring, log backups should be scheduled at regular intervals, sized to the server traffic, whenever the Full Recovery model is used.

Note: The process detailed above only empties the log file; it does not reduce the log file size. The size of database files is reduced by shrinking them, but shrinking is not a recommended operation. It is always better to maintain an adequate log size by taking regular backups.

Friday, 4 April 2014

Database Snapshots


A snapshot is a read-only copy of the source database as it existed at the moment the snapshot was created.
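Snapshots are created with CREATE DATABASE ... AS SNAPSHOT OF. A minimal sketch, assuming a hypothetical source database MyDb whose data file has the logical name MyDb_Data:

CREATE DATABASE MyDb_Snapshot
ON ( NAME = MyDb_Data,                        -- logical name of the source data file
     FILENAME = 'C:\Snapshots\MyDb_Data.ss' ) -- sparse file that will back the snapshot
AS SNAPSHOT OF MyDb;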

The following are some properties of database snapshots:
  • A snapshot database is completely dependent on its source database.
  • If the source database becomes corrupted, the snapshot becomes unusable as well.
  • We can even revert the entire source database to its state at the time of snapshot creation with the syntax below:
RESTORE DATABASE db_name
FROM DATABASE_SNAPSHOT = 'snapshotdb_name'

We can run the query below to list all databases (including snapshots) in a particular instance.
SELECT * FROM sys.databases 

  • In the result of the above query, the 'source_database_id' column is NULL for ordinary databases, because they are original databases and not snapshots.
  • So, if the 'source_database_id' column has a value other than NULL, that row represents a database snapshot.
We can directly list the database snapshots with the below query.
SELECT * FROM sys.databases 
WHERE 
source_database_id IS NOT NULL 

Concept of Database Snapshots

  • When a database snapshot is created, an empty 'sparse file' is created with exactly as many pages as the source database contains.
  • So, essentially, a snapshot starts out empty.
  • The database snapshot preserves each page of the source database as it was at the time the snapshot was created.
  • When pages are modified in the source database, those pages (just before the modifications are written to disk) are copied into the snapshot, into the corresponding empty slots of the previously created sparse file.
[Figures and animation omitted: they illustrate the copy-on-write of modified source pages into the snapshot's sparse file.]

  • The snapshot's size on disk grows as updates or modifications are made to the source database.
  • The source database and the snapshot database access the same data files for unchanged pages.
  • Because of this, read-only workload can be directed at the snapshot (the source database must remain up and available), so the source and the snapshot share the load as far as read-only operations are concerned.
  • A database snapshot not only gives a point-in-time view of the source database, it gives a transactionally consistent point-in-time view.
i.e., any transactions that are active at the time of snapshot creation are rolled back in the snapshot.
If a page is copied into the snapshot because of a transaction, that page is not removed from the snapshot even if the transaction later rolls back (it is a one-way operation).
This page copying is done synchronously, which affects performance.

  • The source database and the snapshot database must be located in the same instance.
When a query is submitted against the snapshot database and the required pages are not present in it, those pages are read from the source database to serve the query.
No locks are taken on the source database when pages are read from it for the snapshot.

Monday, 24 February 2014

"SELECT" Statement


I will take you through the two stages in which a SELECT statement gets processed...

Optimizing a Select


1. A SELECT statement is non-procedural.
2. A database server needs to optimize a SELECT statement precisely because it is non-procedural (we cannot know in advance the exact steps the server will take to execute the query).
3. The query optimizer performs this optimization.
4. Table and view definitions, indexes, and statistics are the inputs to the query optimizer; the execution plan is its output.

A query execution plan is a definition of the following:

1. Sequence in which the source tables are accessed.
  • The tables can usually be accessed in whatever order the optimizer chooses; the order in which they appear in the query does not matter.
2. The methods used to extract data from each table.
  • Table scan
  • Index scan
  • Index seek

SQL Server Query optimizer:

  1. The SQL Server query optimizer is cost-based.
  2. It selects a plan that has a reasonable cost and returns results quickly.
  3. The optimizer relies on statistics, which help in estimating the cost of the resources required to process a query.

Processing a SELECT statement


1. The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators and identifiers.
2. A query tree (or sequence tree) is built, describing the logical steps needed to transform the source data into the result-set format.
3. The optimizer analyzes the different ways the source tables can be accessed.
4. The query tree is then updated with the exact series of steps that returns results fastest while consuming the fewest resources.
5. The final, optimized version of the query tree is the execution plan (a sketch of how to view this plan follows the list).
6. The relational engine starts executing the execution plan, requesting the required data from the storage engine as the plan steps need it.
7. The relational engine processes the data returned from the storage engine into the result-set format and sends it to the client.
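If you want to see the execution plan the optimizer produced for a statement without actually running it, SQL Server can return the plan as XML. A minimal sketch, assuming the hypothetical EMP table used later in this blog:

SET SHOWPLAN_XML ON;   -- must be the only statement in its batch
GO
SELECT emp_name FROM EMP WHERE emp_id = 2210;  -- plan is returned, query is not executed
GO
SET SHOWPLAN_XML OFF;
GO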

Thanks for reading...!
Comments and issues are accepted with great respect.

Friday, 21 February 2014

Stored Procedure Vs User Defined Function


I think many of us are not aware of the exact differences between a user-defined function and a stored procedure, so I would like to share some of the basic differences between a function and a stored procedure as I understand them.

Stored Procedures:
1. Procedures can return more than one value.
2. Can have both input and output parameters.
3. Allow all types of SQL statements, whether reading (SELECT statements) or writing (DML statements).
4. A procedure can call a function and a procedure can call another procedure.
5. Exception handling can be done in procedures.
6. Transaction management is allowed.
7. Procedures cannot be used in SELECT or HAVING or WHERE clauses.
8. Used to execute Business Logic.
9. Has a pre-compiled execution plan.
10. Can be written to do one or more tasks.
11. A procedure can return maximum of 1024 values.

User-defined Functions:
1. Functions must return one value; returning a value is mandatory.
2. Can only have input parameters.
3. Allow only SELECT  queries or reading statements to be written in them.
4. A function cannot call a procedure.
5. Exception handling cannot be done in functions.
6. Transaction management is not allowed.
7. Functions can be used in SELECT, HAVING or WHERE clauses (see the sketch after this list).
8. Functions that return tables can be treated as another row-set. This can be used in join with other tables.
9. Inline functions can be thought of as views that take parameters and can be used in joining it with another row-set.
10. Used for Computations.
11. Are not compiled beforehand (no pre-compiled execution plan).
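The usage difference in point 7 of both lists is easiest to see side by side. A minimal sketch, assuming a hypothetical EMP table with emp_id, emp_name and salary columns; the function and procedure names are also hypothetical:

-- A scalar function can be used inside a query...
CREATE FUNCTION dbo.GetBonus (@salary MONEY) RETURNS MONEY
AS
BEGIN
    RETURN @salary * 0.10;   -- pure computation, no side effects
END;
GO
SELECT emp_name, dbo.GetBonus(salary) AS bonus FROM EMP;
GO

-- ...while a procedure is executed on its own and may modify data.
CREATE PROCEDURE dbo.GiveRaise @emp_id INT, @amount MONEY
AS
BEGIN
    UPDATE EMP SET salary = salary + @amount WHERE emp_id = @emp_id;
END;
GO
EXEC dbo.GiveRaise @emp_id = 2210, @amount = 500;
GO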

Thursday, 13 February 2014

SQL Server Statistics

Readers of this post should have a basic knowledge of:
1. Indexes in SQL Server
2. Index internals such as representation, contents, storage and usage
3. Clustered, non-clustered, and covering indexes
4. SQL Server architecture (the query processing part)

Let's begin by observing the importance of statistics..

Indexes alone are not enough to get the best performance from the queries submitted to the database engine.
In addition to indexes, up-to-date and accurate statistics are also needed to get the best performance.

The purpose of the optimizer is not only to find a good plan, but to find a good plan fast. The optimizer uses statistics to find the execution plan best suited to the query as quickly as possible.

Selectivity is a measure of the number of rows that satisfy the condition in the WHERE clause.

In a WHERE clause, if the
1- condition is selective enough (i.e., very few rows satisfy it), the index is effective and an index-based access path is chosen.
2- condition is not selective enough (many rows satisfy it), the index is not fully used and the access path chosen will be an index scan or a table scan.
3- If we cannot tell how many rows satisfy the condition, statistics come into play.

My last blog post was on index scan , index seek and table scan.

In that post, I explained a simple case in which each of them can occur. But in many cases, we cannot determine the selectivity that easily.

There, we took the following query as an example:
SELECT emp_name FROM EMP
WHERE emp_id=2210

The selectivity (the number of rows that satisfy the condition in a particular query) can be determined easily for the above query.

Think of a query like below,
SELECT emp_name FROM EMP
WHERE emp_id LIKE '%21%'

In this case, we cannot determine the selectivity that easily, so statistics are used. This is the situation for most real-world queries.

- How can we know the number of rows without performing a table scan?
This, too, is taken care of by statistics.

Statistics
1. Statistics are information kept by SQL Server.
2. They are accessed through the sys.indexes and sys.stats catalog views (see the sketch after this list).
3. They aid the optimizer in estimating how many rows a query will return.
4. They are automatically created for indexes to help assess the usefulness of an index.
5. They can even be created on un-indexed columns.
6. They can also be created on columns that are not the high-order elements of an index.
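To look at the statistics that exist on a table, and at the histogram behind one of them, something like the following can be used. A minimal sketch, assuming the hypothetical EMP table; the statistics name IX_EMP_emp_id is also hypothetical:

-- List the statistics objects on EMP and the columns they cover
SELECT s.name AS stats_name,
       c.name AS column_name,
       s.auto_created
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('dbo.EMP');

-- Show the header, density vector and histogram of one statistics object
DBCC SHOW_STATISTICS ('dbo.EMP', 'IX_EMP_emp_id');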

SQL Server Version    Number of indexes per table    Number of stats per table
2005                  249                            2000
2008                  999                            10000
2008 R2               999                            30000


   

Note on Index scan, Index seek, Table scan

Data required for executing a query can be accessed in three different ways.
1. Index Scan
2. Index Seek
3. Table Scan

Let us assume a query like below,
SELECT emp_name FROM EMP 
WHERE emp_id = 2210 ;

Also assume that all of the explanation I am giving below is valid only for a query of the above kind.

If there is no index defined on any columns of the table which are involved in the where clause, then the method chosen will be a Table Scan.

If the column "emp_id" has an index on it, then the data required for the above query is accessed by the method of Index Scan. 

One should be aware of the fact that, if there is an index defined on a column "x", then all the values of that column are stored in that index.

So, in the case of the above query, the index was defined on "emp_id", but we want to fetch the values of the column "emp_name". With the help of the index on "emp_id", the storage engine can identify the address of the particular page (or set of pages) containing the record that satisfies the condition "emp_id = 2210".
But in order to fetch the values of the column emp_name, it needs to do a bookmark lookup (i.e., go to that particular page of the original table).

Then what's an index seek?

If suppose the query is in the below way,
SELECT emp_id FROM EMP 
WHERE emp_id=2210;

In this case the method chosen is an Index Seek.
That is, if the column on which the index is defined is the one being selected (fetched), there is no need to perform any bookmark lookup, because all the values of an indexed column are stored in the index itself.
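One common way to get this lookup-free behaviour even when extra columns are selected is a covering index with included columns. A minimal sketch, assuming the hypothetical EMP table (the index name is also hypothetical):

-- The key column supports the seek; the included column avoids the bookmark lookup
CREATE NONCLUSTERED INDEX IX_EMP_emp_id_incl_name
ON EMP (emp_id)
INCLUDE (emp_name);
GO
SELECT emp_name FROM EMP WHERE emp_id = 2210;  -- can now be served entirely from the index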

Thank you for reading!

Any issues, queries and comments are encouraged with respect.

Tuesday, 4 February 2014

Configuring Log Shipping in SQL Server 2008

What is Log Shipping?
Log Shipping is a process that provides an alternative to replication. It starts with a source database (primary database), from which a full backup is taken. This backup is then restored as a new database, which is used as the copy database (secondary database). The source database then provides regular transaction log backups that contain all the changes made to it. These backups are used to restore the changes to the copy database. The process of taking transaction log backups, copying the backup files and restoring the backups is automated via SQL Server Agent. Multiple copy databases can be set up if required.

Log Shipping Features
■ Simple to set up and maintain
■ Reliable and robust
■ Source and copy databases can be completely remote from one another
■ Unlimited number of copy databases can be set up
■ Does not affect the performance of the source database
■ Available to SQL Server 2000*, 2005 and 2008.

Log Shipping Limitations
■ The copy databases can only be full copies of the whole source database; therefore, Log Shipping cannot be used to merge multiple sources into a single central copy database.
■ All copy databases have Read-Only attributes.
■ Automation requires SQL Server Agent; so realistically, Log Shipping is not available with SQL Server Express editions.
■ The backup strategy applied to the source database needs to take Log Shipping into consideration. (*SQL Server 2000 may be problematic in this area.)

Log Shipping Process
This process can be described by the following simple steps:
1. SQL Server Agent job fires on the source server and takes a transaction log backup of the source database, placing the backup file in a shared directory.
2. SQL Server Agent job fires on the copy server and copies the backup file from the shared directory to a location on its own hard drive.
3. Another SQL Server Agent job fires on the copy server and uses the copied backup file to restore the transactions from the source database to the copy database.



How is it done?

Generally log shipping is performed with two different server instances.
But here I am creating two instances on a single server and configuring log shipping for the "credit" database.

You can observe the two instances namely SQLSERVER2008 and INSTANCE1.



I am using SQLSERVER2008 as my primary server. So expand it, then right-click the credit database and take a full database backup (a T-SQL sketch of this step follows below).
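The same full backup can be taken with T-SQL instead of the wizard. A minimal sketch, assuming a hypothetical local backup folder:

BACKUP DATABASE credit
TO DISK = N'C:\Backups\credit_full.bak'
WITH NAME = N'credit full backup for log shipping';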





In order to configure a database for log shipping, we first need to enable log shipping for that database. So, first right-click the credit database -> select Properties -> Transaction Log Shipping in the left menu.




Now enable it to see the below screen.


Click on backup settings button, to see the below screen.



We need to give a network path for the backup share folder even if the backup folder is on the local machine. Use the backup compression option according to your preference, and finally click "OK" when you are done.


Upon clicking "ok",we get the below screen. Click on "Add.." button to add a secondary server.


Then click on "connect" button in the below screen and select the secondary instance in my case ( select any secondary server if  your secondary instance is not on your primary). In order to add more secondary servers, we can add any number of secondary server's with the help of before screen.

And then select the secondary database in the very next drop down list.


In this step you can specify how to create the data on the secondary server. You have three options: create a backup and restore it, use an existing backup and restore it, or do nothing because you have manually restored the database and put it into the correct state to receive additional backups.

Since we have already taken a full backup of the database at the beginning, we go with the second option. We then need to click the "Restore Options.." button. A new screen appears, in which we need to give the paths for the ".mdf" and ".ldf" files of the database on the primary server's instance.



Upon clicking "ok" button, add the path for backup file in the primary server as below.


In the "Copy Files" tab, mention the folder path for storing the copy of the backup (the result of the copy job) on the secondary server.

In "Restore Transaction Log" tab, select the option the you desired as per your requirement. I am selecting the "standby" option. Click on "ok" upon that.


Now it is up to you whether to have a monitor server or not; log shipping can be performed without one.
If desired, select the check box in the screen below.


You get the screen below upon selecting that check box and then clicking the "Settings.." button.

Click on "Connect" button to add a monitor server. I am adding it as my secondary server.



Now click on "ok" in the above screen and "ok" in the following screen. You can the see the progress of log shipping configuration.


Click on "Close" upon completion.


Now go back to the secondary server instance and expand it, then expand the Databases folder. You can see the "credit" database in standby mode (the result of our log shipping configuration).


Now, to bring the standby database online, execute the command below in Management Studio.


You get an error if the database you are executing the above command from is the same database you are trying to bring online, as below.


Change the current database to master as below (see the sketch that follows).
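The commands shown in these screenshots are not reproduced in the text; a minimal sketch of the standard way to bring the standby 'credit' database online is:

USE master;
GO
-- Complete recovery on the standby database so it becomes writable and online
RESTORE DATABASE credit WITH RECOVERY;
GO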



And then click the "Execute" button. You will get the screen below.


Now you can refresh the Databases folder to see our "credit" database online.


This is my first blog post. So, please post some comments upon reading it, so that I can improve and correct myself where needed in my next posts.

Thank you everyone for reading!
