Okay, so I want to set particular levels to be aggregated and not others, because I think I'm smarter than the 'random optimizer' and have a good idea of which dims are used more than others. And I want to deploy good aggs before the customer starts using the cube. We can't know exactly what they are going to do, so usage-based aggs are out for a month or two, until the customer gets onto the system.
So I can see there is AllMemberAggregationUsage, which I want applied to all dims, and then there is an AggregationUsage option for each level. I am setting various levels to Full and others to None.
It is my understanding that the number of possible aggregations is equal to (number of levels + the All level) for each dim, multiplied together. So let's say you have 3 dims with 1 level, 2 levels, and 2 levels. That would be (1+1) * (2+1) * (2+1). So, if I set one level in each of the 2-level dims to None rather than Full, this should be 2*2*2 = 8, and if I set them all to Full it should be 2*3*3 = 18.
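That calculation is simple enough to sketch. Here's a small Python illustration of the counting rule described above (the function name is mine, just for illustration):

```python
# Candidate-aggregation count: each dimension contributes one choice per
# level, plus one for the All level, and the choices multiply across dims.
from math import prod

def candidate_aggregations(levels_per_dim):
    """Product of (levels + 1) over all dimensions."""
    return prod(n + 1 for n in levels_per_dim)

# Three dims with 1, 2, and 2 levels, all levels set to Full:
print(candidate_aggregations([1, 2, 2]))  # (1+1)*(2+1)*(2+1) = 18

# One level in each 2-level dim set to None drops it from consideration:
print(candidate_aggregations([1, 1, 1]))  # 2*2*2 = 8
```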
In my actual cube I have 10 or so dims with varying levels, and that calculation comes out to about 1,500. In MSAS 2000 I did this all the time and the number of aggs would roughly equal this calc (although never exactly, and I never quite knew why not). But when I do this in SSAS 2005 it comes out to '11 aggregations'. What is up with that?
Maybe it's just a bug in the display of the total agg number, because when running the aggs on a small cube (fact table limited to 10 million rows, full dimensions present, largest 4 million members) it has been running for several hours, which would make sense for the number of aggs I gave it. Also note that during the window where it 'prepares' aggs (with the graph on the right), the count would go from 1 to 5 to 10 to 15, back to 7, up to 10, down to 8, and then end at 11. Maybe it just isn't updating that number right and I should assume the number of aggs is in fact the number from the calc above? Not sure...
Setting AggregationUsage only tells the aggregation designer whether it should consider including that particular level in aggregations. It doesn't guarantee that the wizard will actually create an aggregation with that column included. For good reason, sometimes.
Take your 4-million-member dimension: if, for the sake of argument, the other 9 dimensions have only 10 members each, then an aggregation over all those columns would have 4E6 * 10^9 = 4E15 possible combinations, or aggregation rows. But the maximum number of combinations that can actually exist in your dataset is the number of fact rows it has. Such an aggregation is completely useless - you could just as well read the base table - and it will never be designed.
Of course the actual number of groupings depends on the data distribution. If some columns are strongly dependent on each other there will be far fewer actual combinations than the theoretical number. However, in our example above and many others that won't really matter - it will still be bigger than the number of fact rows and thus a useless waste of space.
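The cap described above is easy to state as a formula. A minimal sketch (function names are mine) of why the all-columns aggregation in the example is pointless:

```python
# An aggregation at a given granularity has at most one row per combination
# of member keys, but it also can never have more rows than the fact table.

def max_aggregation_rows(member_counts, fact_rows):
    cells = 1
    for m in member_counts:
        cells *= m           # theoretical combinations at this granularity
    return min(cells, fact_rows)

# 4-million-member dim plus nine 10-member dims, 10 million fact rows:
# 4e6 * 10**9 = 4e15 theoretical cells, capped at the fact row count.
print(max_aggregation_rows([4_000_000] + [10] * 9, 10_000_000))  # 10000000
```

An aggregation whose row count hits that cap reads no faster than the fact table itself, which is why the designer throws it out.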
To account for the more close-run cases, one can assume that data is spread uniformly over the possibilities and use simple probability to estimate how big an aggregation will become. On top of this, AS cuts the allowed size of an aggregation to a fair bit below the number of fact rows, and no doubt takes other performance considerations, as well as your AggregationUsage settings and size limits, into account.
But the bottom line is that most of your 1,500 possible aggregations above would be roughly the same size as the fact table and thus will not be designed. Eleven was evidently the number of aggs that would actually be useful within the limits you put on it.