Friday 17 November 2017

GROUPBY VS PARTITIONBY

They're used in different places. PARTITION BY is analytic, while GROUP BY is aggregate. In order to use PARTITION BY, you have to contain it with an OVER clause.

GROUP BY

1. It normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row.
2. It actually groups the result set returning one row per group, which results therefore in SQL Server only allowing in the SELECT list aggregate functions or columns that are part of the GROUP BY clause (in which case SQL Server can guarantee that there are unique results for each group).
GROUP BY modifies the entire query, like:
select customerId, count(*) as orderCount
from Orders
group by customerId

PARTITION BY

1. It divides the result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.
2. It does not affect the number of rows returned, but it changes how a window function's result is calculated.
3. It doesn't actually roll up the data. It allows you to reset something on a per group basis.
Partition by just works on a window function, like row_number:
select row_number() over (partition by customerId order by orderId)
    as OrderNumberForThisCustomer
from Orders

GROUPBY VS PARTITIONBY