Thursday, March 8, 2012

Aggregate multiple columns with different SELECT criteria

Let me start with saying thanks to all of you who have helped me (I'm a SQL newbee after doing OO for the past 12+ years)

I need to do several aggregates on multiple columns, with each column having different SELECT Criteria.

Sample Data:

Dept Project Cost CostFlag Schedule ScheduleFlag
D1 D1P1 495 1 135 3
D1 D1P2 960 2 70 2
D1 D1P3 1375 3 105 2
D1 D1P4 1050 2 160 3
D1 D1P5 1890 3 40 1

D2 D2P1 650 1 155 3
D2 D2P2 890 2 125 2
D2 D2P3 1235 3 85 1
D2 D2P4 430 1 140 3

D3 D3P1 1960 3 45 1
D3 D3P2 1490 3 85 1
D3 D3P3 1025 2 135 3
D3 D3P4 615 1 100 2
D3 D3P5 270 1 70 1
D3 D3P6 815 2 155 3

I need to calculate MEAN (average), Standard Deviation, Variance, Range, Span & Median for each data column (Cost, Schedule in the test data), where each data column has different selection criteria. I have the calculations working for each column individually (e.g. funcCalcCost, funcCalcSchedule), but I need to return the calculated values as a single data set:

SELECT Dept, Project, AVG(Cost) as Cost_Mean, MAX(Cost) - MIN(Cost) as Cost_Range, .......

WHERE CostFlag = @.InputParameter

GROUP BY Dept, Project

The code above works great - but only for a single column. I need to return a dataset like this:
Dept Project Cost_Mean Cost_Range
D1 D1P1 495 135
D1 D1P2 960 70
D1 D1P3 1375 105

I need to return a dataset like this:

Dept Project Cost_Mean Cost_Range Schedule_Mean Schedule_Range
D1 D1P1 495 135 100 28
D1 D1P2 960 70 42 12
D1 D1P3 1375 105 91 38

I also have working code calculate the MEDIAN (what a pain that was, thank god I found a code example to get me going on the MEDIAN)

Thanks!

Did you try

SELECT Dept, Project, AVG(Cost) as Cost_Mean, MAX(Cost) - MIN(Cost) as Cost_Range,AVG(Schedule) as Schedule_Mean,MAX(Schedule) - MIN(Schedule) as Schedule_Range
WHERE CostFlag = @.InputParameter

GROUP BY Dept, Project

?

|||The query above will not produce correct results.

Each data column being aggregating has its own unique "Flag" column (Cost - CostFlag, Schedule - ScheduleFlag)- the value of the "Flag" column determines if that record should be included in the dataset to be used during the aggregation.

I do have seperate individual queries that produce the correct aggregated values, but only for 1 specific data column.

There is a different SELECT condition for each column of data I am trying to process:

Below are 2 queries and their resulting datasets:

Aggregate Cost:

SELECT Dept, Project, AVG(Cost) as Cost_Mean, MAX(Cost) - MIN(Cost) as Cost_Range, .......

WHERE CostFlag = 2

GROUP BY Dept, Project


Dataset returned:
Dept Project Cost_Mean Cost_Range
D1 D1P1 495 135
D1 D1P2 960 70
D1 D1P3 1375 105

Aggregate Schedule:

SELECT Dept, Project, AVG(Schedule) as Schedule_Mean, MAX(Schedule) - MIN(Schedule) as Schedule_Range, .......

WHERE ScheduleFlag = 3

GROUP BY Dept, Project


Dataset returned:
Dept Project Schedule_Mean Schedule_Range
D1 D1P1 100 28
D1 D1P2 42 12
D1 D1P3 91 38

I need to return a single dataset with all of the calculated columns - like this:

Dept Project Cost_Mean Cost_Range Schedule_Mean Schedule_Range
D1 D1P1 495 135 100 28
D1 D1P2 960 70 42 12
D1 D1P3 1375 105 91 38

Could I somehow use a JOIN to combine the 2 datasets produced by the 2 different queries?

Thanks everyone for your help

|||

Now, if you explained more clearly, the things are very easy to do :

1.build with your last 2 selects 2 views :

viewCost and viewSchedule

2. you can use the following query :

select a.Dept,a.Project,Cost_Mean,Cost_Range,Schedule_Mean, Schedule_Range

from viewCost a innner join viewSchedule b on a.Dept=b.Dept and a.Project=b.Project

|||I'm lost on using 2 views & the query (I am a total T-SQL greanbean, learning as I go on this project).

I have no idea where to code the WHERE conditions to select based upon each of the "Flag" values?

This is how I am thinking about this (and I could be way way way off base here):

I need to have 2 different datasets - one for Cost and one for Schedule. The Where clause in the Query for the 2 datasets is different: using different columns in the condition.

for example:
for the "cost" query : WHERE CostFlag = 2 for the "schedule" query: WHERE ScheduleFlag = 3
|||

You can just derive them.

e.g.

Code Snippet

select *

from(

SELECT Dept, Project, AVG(Cost) as Cost_Mean, MAX(Cost) - MIN(Cost) as Cost_Range, ...


WHERE CostFlag = 2
GROUP BY Dept, Project

) tb1

join (
SELECT Dept, Project, AVG(Schedule) as Schedule_Mean, MAX(Schedule) - MIN(Schedule) as Schedule_Range, .......


WHERE ScheduleFlag = 3
GROUP BY Dept, Project
) tb2

on tb1.Dept=tb2.Dept and tb1.Project=tb2.Project

|||Thanks

Using derived tables did the trick!

No comments:

Post a Comment