Skip to main content

SQL Window functions (OVER, PARTITION_BY, ...)

Introduction


When you run an SQL Query you select rows, but what if you want to have a summary per multiple rows, for example you want to get the top basketball for each country, in this case we don't only group by country, but we want also to get the top player for each of the country.  This means we want to group by country and then select the first player.  In standard SQL we do this with joining with same table, but we could also use partition by and windowing functions.
For each row the window function is computed across the rows that fall into the same partition as the current row. 
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query
They are forbidden elsewhere, such as in GROUP BYHAVING and WHERE clauses. This is because they logically execute after the processing of those clauses

Over, Partition By

So in order to do a window we need this input:

- How do we want to group the data which windows do we want to have?

so 

def createWindowOnRows(ByWhat): DataWithWindows


In order to specify the ByWhat part you use the OVER and also the PARTITION BY by simply specif

OVER (PARTITION BY depname)
Now that we know that we can partition the rows that we get back by department name, all we need to do is to combine it with a simple standard SQL query. 

The SQL query that we are going to combine that windowing function is a standard SQL query that contains aggregation functions but do not contain in our simple case a group by because we already do the group by with the partition by. 

So, our basic structure of simple query that would use over and partition by in order to window the data to get aggregated data as output is: 
SELECT x,y,count(z), OVER (PARTITION BY x) FROM MyTable
Note the following:

  1. We have a standard SQL SELECT statement
  2. One of the fields in our case in SELECT was in window we said OVER (PARTITION BY x)
  3. One of the fields in our case had an aggregation we had count(z)
  4. One of the fields (we had a total of 3) in select did not have any aggregation nor any window therefore we get an actual row for each such value in our case that was employee number, so we did get a row for each such employee number it was not aggregated but for each such employee number we did get the average salary for his department.
So what we see is that the window function 

performed a calculation across a set of rows that are somehow related to the current row.  The rows retain their separate identity.

It's as if the current row has looked around into multiple rows in the window and added response from the window into the current row.

You can treat the OVER as a function just as any other function like COUNT only in this case the function is a window function where in the input of this function you give it the way you want to partition your data your windowing.

Rank()


The window function will always contain an OVER clause, this is the main function, the main that we call. The partition by is how we ask SQL to divide the rows to window them.  

With rank you control the order of the rows that fall into the window add a new column with name rank according to the order by
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
Note that it would produce a new value, the new value in the select is a new column named rank,  this new column would have the order that we specify in the ORDER BY salary DESC that resides inside the OVER function.

You can have the same values in the rank column you would have actually the same values across departments because the rank is computed per each window.

Comments

Popular posts from this blog

Dev OnCall Patterns

Introduction Being On-Call is not easy. So does writing software. Being On-Call is not just a magic solution, anyone who has been On-Call can tell you that, it's a stressful, you could be woken up at the middle of the night, and be undress stress, there are way's to mitigate that. White having software developers as On-Calls has its benefits, in order to preserve the benefits you should take special measurements in order to mitigate the stress and lack of sleep missing work-life balance that comes along with it. Many software developers can tell you that even if they were not being contacted the thought of being available 24/7 had its toll on them. But on the contrary a software developer who is an On-Call's gains many insights into troubleshooting, responsibility and deeper understanding of the code that he and his peers wrote. Being an On-Call all has become a natural part of software development. Please note I do not call software development software engineering b

Recursion Trees Primer

Recursion trees. Controlling the fundamentals stands at the cornerstone of controlling a topic.  In our case in order to be a good developer its not enough or even not at all important to control the latest Java/JavaScript/big data technology but what's really important is the basics.  And the basics in computer science are maths, stats, algorithms and computer structure. Steve wosniak the co-founder of apple said the same, what gave him his relative advantage was his deep understanding of programming and computer structure, this is what gave him the ability to create computer's which are less costly than the competitors (not that there were many) and by the way there were 3 founders to apple company one responsible for the technical side, one for the product and sales (Steve Jobs) and the third responsible for the company structure and growth, each of the three extremely important, it was not only the two Steve's but that's a topic for another episode. And with t

Building Secure and Reliable Systems

A recent book was published this year by Google about site reliability and security engineering, I would like to provide you a brief overview of it and incorporate my own analysis and thoughts about this subject while saving you some time from reading, at least part of it. Take a few of your customers and ask them, what are the top 5 features on my product that you like.  The answer that you are likely to get is, I really like how polished the UI is, or the daily report I get by mail is just fantastic, or since I started using your product I was able to save one hour a day my productivity got up and the share /chat button on document that you added recently is doing a great job. Your customers are very unlikely to answer the question of what top 5 features of my product do you like with I really like its security or I really like that we lost no chat messages since I started using it.  No real customer will even think of it, moreover, assuming you did a very good job, they won&#