Monday, 24 February 2014

"SELECT" Statement


I will take you through two steps in which an SELECT statement will get processed...

Optimizing a Select


1. A SELECT statement is non-procedural.
2. A database server needs to optimize a SELECT statement because it is non-procedural (we cannot know   the exact number of steps that the server would undergo to execute the query).
3. Query optimizer does this optimizing query.
4. Tables, views definitions, indexes and statistics are input to the query optimizer and execution plan is its      output.

A query execution plan is a definition of following:

1. Sequence in which the source tables are accessed.
  • Usually accessed in any way (top to bottom or bottom to top) and it does not matter.
2. The methods used to extract data from each table.
  • Table scan
  • Index scan
  • Index seek

SQL Server Query optimizer:

  1. SQL server query optimizer is cost-based.
  2. It selects a plan which has reasonable cost and returns results faster.
  3. Optimizer relies on statistics which helps in estimating the costs of the resources required to process a query.

Processing a SELECT statement


1. Parser scans the SELECT statement and breaks it into logical units such as keywords, expressions,    operators and identifiers.
2. A query tree or a sequence tree is built describing the logical steps needed to transform the source data  into result set format.
3. The optimizer analyzes different ways the source tables can be accessed.
4. Then the query tree is updated with the exact series of the steps which gives fast results while consuming  less resources.
5. The final optimized version of query tree is the execution plan.
6. The relational engine starts executing the execution plan upon getting the data from the storage engine.
7. The relational engine itself processes the data returned from the storage engine into the result set format and  sends to client.

Thanks for reading...!
Comments and issues are accepted with great respect.

Friday, 21 February 2014

Stored Procedure Vs User Defined Function


I think many of us are not aware of the exact differences between a user defined function and a procedure. So, i would like to share some of the basic differences between a function and a stored procedure from my understanding.

Stored Procedures:
1. Procedures can return more than one value.
2. Can have both input and output parameters.
3. Allow all types of sql statements irrespective of reading(SELECT statement) or writing( DML statements)     statements.
4. A procedure can call a function and a procedure can call another procedure.
5. Exception handling can be done in procedures.
6. Transaction management is allowed.
7. Procedures cannot be used in SELECT or HAVING or WHERE clauses.
8. Used to execute Business Logic.
9. Has a pre-compiled execution plan.
10. Can be written to do one or more tasks.
11. A procedure can return maximum of 1024 values.

User-defined Functions:
1. Functions can return one value which is mandatory.
2. Can only have input parameters.
3. Allow only SELECT  queries or reading statements to be written in them.
4. A function cannot call a procedure.
5. Exception handling cannot be done in functions.
6. Transaction management is not allowed.
7. Functions can be used in SELECT or HAVING or WHERE clauses.
8. Functions that return tables can be treated as another row-set. This can be used in join with other tables.
9. Inline functions can be thought of as views that take parameters and can be used in joining it with another row-set.
10. Used for Computations.
11. Not compiled before hand (pre-compiled).

Thursday, 13 February 2014

SQL Server Statistics

The people who are reading this blog should have a basic knowledge of 
1. Indexes in sql server
2. Index internals like representation, contents, storage and usage.
3. Clustered, Non-clustered, and covering indexes.
4. SQL Server Architecture (Query processing part)

Lets begin to observe the importance of statistics..

Indexes are not enough to gain the best performance from the queries submitted to the database engine.
In addition to indexes, additional, updated and accurate statistics should also be used to gain best performance.

The purpose of optimizer is not only finding a best plan, its purpose is to find a best plan fast. The optimizer uses the statistics and finds the best execution plan suited for the query as fast as possible.

Selectivity - Is the measure of the number of rows which satisfies the condition in the where clause.

In a where clause if the
1- condition is selective enough (i.e, very less rows are present which satisfies that condition), the index is      effective and the process chosen is Index Scan.
2- condition is not selective enough (more number of rows are present which satisfies that condition ), the index is not fully used and the process chosen will an Index Scan or a Table Scan.
3- If we cannot tell the number of rows that satisfies the condition, then the statistics come to play.

My last blog post was on index scan , index seek and table scan.

And in that blog, i explained about a simple case where they can take place. But in many cases, we cannot determine the selectivity that easily.

There we took a query as example,
SELECT emp_name FROM EMP
WHERE emp_id=2210

The selectivity(the number of rows that satisfies the condition in a particular query) can be easily determined from the above query.

Think of a query like below,
SELECT emp_name FROM EMP
WHERE emp_id like "%21%"

In this case, we cannot determine the selectivity that easily. Statistics are used here. And this is the situation for most of the real time queries.

- How can we know the number of rows without performing a table scan??
Even this issue will be taken care by the statistics.

Statistics
1. Statistics are the information kept by the sql server. 
2. They are accessed through the sys.indexes and sys.stats catalog views.
3. Aids the optimizer in estimating how many rows a query will return.
4. Automatically created for indexes to help and assess the usefulness of an index.
5. Can even be created on an un-indexed columns.
6. Can also be created on columns that are not the high-order elements of an index.

SQL Server Version         Number of indexes per table            Number of stats per table
         2005                                         249                                                  2000
         2008                                         999                                                 10000
         2008 R2                                    999                                                 30000


   

Note on Index scan, Index seek, Table scan

Data required for executing a query can be accessed in three different ways.
1. Index Scan
2. Index Seek
3. Table Scan

Let us assume a query like below,
SELECT emp_name FROM EMP 
WHERE emp_id = 2210 ;

And also assume that, all the explaination which i am giving below is valid for a query of above kind only.

If there is no index defined on any columns of the table which are involved in the where clause, then the method chosen will be a Table Scan.

If the column "emp_id" has an index on it, then the data required for the above query is accessed by the method of Index Scan. 

One should aware of the fact that, if there is an index defined on a column "x", then all the values of that column are stored in that index.

So, in the case of above query, index was defined on "emp_id". But we want to fetch the values of column "emp_name". So with the help of index on "emp_id", the storage engine can identify the address of a particular set of or single page in which the record related to condition "emp_id=2210" satisfies. 
But in-order to fetch the values of the column emp_name, it needs to do the bookmark look-up (i.e, going to that particular page of the original table).

Then what's and index seek?

If suppose the query is in the below way,
SELECT emp_id FROM EMP 
WHERE emp_id=2210;

In this case the method chosen is Index Seek.
Meaning, if the column on which the index is defined is selected (needs to be fetched), then there is no need to perform any bookmark look-up because if there is an index on a column, then all the values of that column will be stored in that index.

Thank you for reading!

Any issues, queries and comments are encouraged with respect.

Tuesday, 4 February 2014

Configuring Log Shipping in SQL Server 2008

What is Log Shipping?
Log Shipping is a process that provides an alternative to replication. It starts with a source database (primary database), from which a full backup is taken. This backup is then restored as a new database, which is used as the copy database (secondary database). The source database then provides regular transaction log backups that contain all the changes made to it. These backups are used to restore the changes to the copy database. The process of taking transaction log backups, copying the backup files and restoring the backups is automated via SQL Server Agent. Multiple copy databases can be set up if required.

Log Shipping Features
■ Simple to set up and maintain
■ Reliable and robust
■ Source and copy databases can be completely remote from one another
■ Unlimited number of copy databases can be set up
■ Does not affect the performance of the source database
■ Available to SQL Server 2000*, 2005 and 2008.

Log Shipping Limitations
■ The copy databases can only be full copies of the whole source database; therefore, Log Shipping cannot be used to merge multiple sources into a single central copy database.
■ All copy databases have Read-Only attributes.
■ Automation requires SQL Server Agent; so realistically, Log Shipping is not available with SQL Server Express editions.
■ The backup strategy applied to the source database needs to take Log Shipping into consideration. (*SQL Server 2000 may be problematic in this area.)

Log Shipping Process
This process can be described by the following simple steps:
1. SQL Server Agent job fires on the source server and takes a transaction log backup of the source database, placing the backup file in a shared directory.
2. SQL Server Agent job fires on the copy server and copies the backup file from the shared directory to a
location on its own hard drive.
3. Another SQL Server Agent job fires on the copy server and uses the copied backup file to restore the transactions from the source database to the copy database.



How to do?

Generally log shipping is performed with two different server instances.
But here i am creating two instances in a single server and configuring the log shipping for "credit" database.

You can observe the two instances namely SQLSERVER2008 and INSTANCE1.



I am assuming SQLSERVER2008 as my primary server. So expand it and then right click on the credit database and take a full database backup.





In order to configure a database for log shipping, first we need to enable log shipping for that database. So, First right click on credit database -> select properties -> Transaction log shipping in left menu.




Now enable it to see the below screen.


Click on backup settings button, to see the below screen.



We need to give a network path for a backup share folder even if the backup folder is present in the local machine. Use backup compression option according to your knowledge towards that. And click on "ok" finally when you are done.


Upon clicking "ok",we get the below screen. Click on "Add.." button to add a secondary server.


Then click on "connect" button in the below screen and select the secondary instance in my case ( select any secondary server if  your secondary instance is not on your primary). In order to add more secondary servers, we can add any number of secondary server's with the help of before screen.

And then select the secondary database in the very next drop down list.


In this step you can specify how to create the data on the secondary server. You have three options: create a backup and restore it, use an existing backup and restore or do nothing because you have manually restored the database and have put it into the correct state to receive additional backups.

Since we already taken a full backup of the database in the beginning, we go for the second option. And we need to click on the "Restore Options.." button. Then appear's a new screen in which we need to give the path for the ".mdf" and ".ldf" files of the database in the primary server's instance.



Upon clicking "ok" button, add the path for backup file in the primary server as below.


Mention the folder path in the "copy files" tab for storing the copy of full backup ( which is a result of copy job ) in the secondary server.

In "Restore Transaction Log" tab, select the option the you desired as per your requirement. I am selecting the "standby" option. Click on "ok" upon that.


Now it is upto you to have a monitor server or not. We can perform log shipping without monitor server.
If desired, select the check box in the below screen.


You get the below screen upon clicking that check box followed by "Settings.." button.

Click on "Connect" button to add a monitor server. I am adding it as my secondary server.



Now click on "ok" in the above screen and "ok" in the following screen. You can the see the progress of log shipping configuration.


Click on "Close" upon completion.


Now go back to the secondary server instance and expand it. And then expand the databases folder. You can see the "credit" database in a standby mode (Result of our log shipping configuration).


Now to get the standby database into online, execute the below command in the management studio.


You get an error if you are using the same database which you are trying to get online for executing the above command as below.


Change the using database to master as below.



And the click on "execute" button. You will get the below screen.


Now you can refresh the databases folder for seeing our "credit" database online.


This in my first blog. So, please post some comments upon reading this so that i will improve and correct myself if i need to in my next blog posts.

Thank you everyone for reading!

GROUPBY VS PARTITIONBY