Friday, 17 November 2017

GROUPBY VS PARTITIONBY

They're used in different places. PARTITION BY is analytic, while GROUP BY is aggregate. In order to use PARTITION BY, you have to contain it with an OVER clause.

GROUP BY

1. It normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row.
2. It actually groups the result set returning one row per group, which results therefore in SQL Server only allowing in the SELECT list aggregate functions or columns that are part of the GROUP BY clause (in which case SQL Server can guarantee that there are unique results for each group).
GROUP BY modifies the entire query, like:
select customerId, count(*) as orderCount
from Orders
group by customerId

PARTITION BY

1. It divides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.
2. It does not affect the number of rows returned, but it changes how a window function's result is calculated.
3. It doesn't actually roll up the data. It allows you to reset something on a per group basis.
Partition by just works on a window function, like row_number:
select row_number() over (partition by customerId order by orderId)
    as OrderNumberForThisCustomer
from Orders

Thursday, 7 September 2017

Troubleshooting SQL Server Connectivity Issues

When you failed to connect to your SQL Server, the issue could be:
  •  Network issue
  • SQL Server configuration issue
  • Firewall issue
  • Client driver issue
  • Application configuration issue
  • Authentication and logon issue

Please go throught the below link on how to resolve these issues,

Best Practices

Server Side:
- Check that TCP/IP protocol is enabled in SQL Server Network Configuration section in SQL Server Configuration manager
- Check that SQL Server Service is Running.
- SQL Browser service status is “Running”. (if you have got an instance that is not by default with port 1433)
- Check that Firewall is configured to allow acces to SQL Server by opening the desired port.
- Verify whether Remote Connections are accepted by the server or not.
Client Side:
Verify that the Connection string is correct and pointing to the right server. Be sure on the instance name.
- Check whether the Server is reachable from client machine or not.(use Ping)
- Also try connecting to Server machine out of your application.(use SQLCMD)

Page Life Expectancy


Page Life Expectancy is the number of seconds the average page of data has been in the buffer pool. Keeping the data in memory gives SQL Server quicker access to it instead of making it long by accessing the same from disk.

You can find the value of the PLE by running the following query.

SELECT [object_name],
[counter_name],
[cntr_value] FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'


The recommended value of the PLE counter is (updated: minimum of) 300 seconds. I have seen on busy system this value to be as low as even 45 seconds and on unused system as high as 1250 seconds. Page Life Expectancy is the number of seconds a page will stay in the buffer pool without references. In simple words, if your page stays longer in the buffer pool (area of the memory cache) your PLE is higher, leading to higher performance as every time request comes there are chances it may find its data in the cache itself instead of going to the hard drive to read the data.

Also Refer

Friday, 1 September 2017

SQL Server Resource Governor

Difference between Inner Join and Left Semi Join

Semi joins will not have any syntax in SQL.
The following are the logical join operators that SQL Server supports:
  • Inner join
  • Outer join
  • Cross join
  • Cross apply
  • Semi-join
  • Anti-semi-join

An INNER JOIN returns the columns from both tables. A LEFT SEMI JOIN only returns the records from the left-hand table. It's equivalent to (in standard SQL):

SELECT name 
FROM table_1 a 
WHERE EXISTS(
SELECT * FROM table_2 b WHERE (a.name=b.name))

If there are multiple matching rows in the column of right-side table, an INNER JOIN will return one row for each matching column, while a LEFT SEMI JOIN only returns the matching rows from the left table. That's why you're seeing a different number of rows in your result.

Example:

Suppose there are 2 tables TableA and TableB with only 2 columns (Id, Data) and following data:

TableA:
Id Data
1 DataA11
1 DataA12
1 DataA13
2 DataA21
3 DataA31

TableB:
Id Data
1 DataB11
2 DataB21
2 DataB22
2 DataB23
4 DataB41

Inner Join on column Id will return columns from both the tables and only the matching records:
Id Data Id Data
1 DataA11  1 DataB11
1 DataA12  1 DataB11
1 DataA13  1 DataB11
2 DataA21  2 DataB21
2 DataA21  2 DataB22
2 DataA21  2 DataB23

Left Join (or Left outer join) on column Id will return columns from both the tables and matching records with records from left table (Null values from right table):
Id Data Id Data
1 DataA11  1 DataB11
1 DataA12  1 DataB11
1 DataA13  1 DataB11
2 DataA21  2 DataB21
2 DataA21  2 DataB22
2 DataA21  2 DataB23
3 DataA31

Right Join (or Right outer join) on column Id will return columns from both the tables and matching records with records from right table (Null values from left table):
Id Data Id Data
1 DataA11 1 DataB11
1 DataA12 1 DataB11
1 DataA13 1 DataB11
2 DataA21 2 DataB21
2 DataA21 2 DataB22
2 DataA21 2 DataB23
4 DataB41
    
Outer Join (or Full outer join) on column Id will return columns from both the tables and matching records with records from left table (Null values from right table) and records from right table (Null values from left table):
Id Data Id Data
1 DataA11 1 DataB11
1 DataA12 1 DataB11
1 DataA13 1 DataB11
2 DataA21 2 DataB21
2 DataA21 2 DataB22
2 DataA21 2 DataB23
3 DataA31
4 DataB41
    
Left Semi Join on column Id will return columns only from left table and matching records only from left table:
Id Data
1 DataA11
1 DataA12
1 DataA13
2 DataA21

Please refer the below link for other types of semi joins.
https://blog.jooq.org/2015/10/13/semi-join-and-anti-join-should-have-its-own-syntax-in-sql/
https://blogs.msdn.microsoft.com/craigfr/2006/07/19/introduction-to-joins/
https://technet.microsoft.com/en-us/library/ms191255(v=sql.105).aspx

Thursday, 31 August 2017

Difference between Logical and Physical database joins

Most of us know about JOINS in SQL Server and their types. But do we really know how they are interpreted in SQL Server internally. Today I found lot of informative, interesting and important sources regarding Logical and Physical joins in SQL Server (links below).

JOINS are mainly classified into 2 types:

1. Logical Joins: These joins are simple joins that we apply in our SQL queries, like INNER JOIN, RIGHT/LEFT OUTER JOIN, CROSS JOIN, OUTER APPLY, etc.

2. Physical Joins: These are the joins that users don’t use/write in their SQL queries. Instead these are implemented inside SQL Server engine as operators or algorithms to implement the Logical Joins. Their types are Nested Loop, Merge and Hash.
For a particular SQL query when you try to view an Estimated Execution Plan or execute a query by selecting Actual Execution Plan, you can clearly see these Physical Joins under the Execution Plan tab in SSMS.


References for Physical Joins:
Nested Loop Joins: https://blogs.msdn.microsoft.com/craigfr/2006/07/26/nested-loops-join
Merge Joins: https://blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join/
Hash Joins: https://blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join/

Tuesday, 29 August 2017

To Start SQL Server In Single User Mode

There are certain situations where user wants to start SQL Server Engine in “single user” mode from the start up.

Starting SQL Server in single user mode is very simple process shown below:

Go to SQL Server Configuration Manager and click on SQL Server 2012(Any version that you use) Services. Click on desired SQL Server instance and right click go to properties. On the Advance table enter parameter ‘-m;‘ before existing params in Startup Parameters box.



Make sure that you entered semi-comma after -m. Once that is completed, restart SQL Server service. Once this is done, now you will be only able to connect SQL Server using sqlcmd.


Maximum number of rows that can be stored in Temp table or a Table variable.

Temp Table:

No, there is no records limit for temporary table (the limit is the disk space). But be careful, because temporary tables are physically created in tempdb database, and this database must be placed on the disk with appropriate size.
The differences between tempdb and any other database are minimal, especially when it comes to limits. If you can store it in a user table, you can store it in a temporary table as well. It does not have to fit into RAM as the tempdb is stored on disk just like any other database (only with more aggressive caching and less logging).

Table Variable:

As such the official MSDN site where the Maximum Capacity Specifications for SQL Server there is no such upper limit defined for table variables because it depends on the database size and the free memory available for the storage. You can also refer the MSDN forum discussing the same; Maximum Capicity of Table Variable

Temp table VS Table variable in SQL Server



There are a few differences between Temporary Tables (#tmp) and Table Variables (@tmp) listed below:

  1. As a rule of thumb, for small to medium volumes of data and simple usage scenarios you should use table variables.
  2. Temporary Tables are real tables so you can do things like CREATE INDEXes, etc. If you have large amounts of data for which accessing by index will be faster then temporary tables are a good option.
  3. Table variables can have indexes by using PRIMARY KEY or UNIQUE constraints. 
  4. Table variables don't participate in transactions and SELECTs are implicitly with NOLOCK. The transaction behaviour can be very helpful, for instance if you want to ROLLBACK midway through a procedure then table variables populated during that transaction will still be populated!
  5. When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
  6. If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE.
  7. If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
  8. If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
  9. If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
  10. If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
  11. Temp tables might result in stored procedures being recompiled, perhaps often. Table variables will not.
  12. Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. This can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
  13. You can create a temp table using SELECT INTO, which can be quicker to write (good for ad-hoc querying) and may allow you to deal with changing datatypes over time, since you don't need to define your temp table structure upfront.
  14. You can pass table variables back from functions, enabling you to encapsulate and reuse logic much easier (eg make a function to split a string into a table of values on some arbitrary delimiter).
  15.  If you're writing a function you should use table variables over temp tables unless there's a compelling need otherwise.
  16. Both table variables and temp tables are stored in tempdb. But table variables (since 2005) default to the collation of the current database versus temp tables which take the default collation of tempdb (ref). This means you should be aware of collation issues if using temp tables and your db collation is different to tempdb's, causing problems if you want to compare data in the temp table with data in your database.
  17. Global Temp Tables (##tmp) are another type of temp table available to all sessions and users.

GROUPBY VS PARTITIONBY