Monday, March 19, 2012

Aggregation Question

Let's say we have 2 attributes: a and b. Each consists of the defaulthierarchy of an all-level and a leaflevel consisting of the leaves 1 and 2. If I add an aggregation on (a,b) then questions such as

SELECT [SomeMeasure] ON 0

FROM [Cube]

WHERE (b.b.2)

should yield an answer that is preaggregated, right? I.e. the coordinates (a.all,b.1),(a.all,b.2),(a.all,b.all),(a.1,b.1),(a.1,b.2),(a.1,b.all),(a.2,b.1),(a.2,b.2),(a.2,b.all) should contain preaggregated values for all measures. Except my cube has 5 times worse queryperformance with aggregates enabled...

So have I misunderstood something here? Do the levels not mix here so only (a.all,b.all),(a.1,b.1),(a.1,b.2),(a.2,b.1),(a.2,b.2) becomes preaggregated?

Looks like the query you are trying to optimize is hitting some other attributes.

Try running SQL Profiler against Analysis Server.

QuerySubcubeVerbose event should show you the set of attributes that were needed for you query.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Thanks for the help (yet again ) - didn't know about those profiler options - also found the "Get Data From Aggregation" event to see when and which aggregation is used. The problem still remains though as it seems it picks the correct attributes and the best aggregation corresponding to it just fine. However using a single aggregation of all attributes does significantly worse then using none at all.

Using smaller partial aggregations such as those storage designer chooses is of course even better then no aggregations and perfectly fine for the queries where they're applicable but the performance for the queries they don't cover isn't good enough.

I can imagine a large enough aggregation would take some time to navigate (as it seems to spend almost all it's effort reading from disk) but it seems very strange that it wouldn't take even more time to read all the underlying data. Furthermore this aggregation is only a few hundred mb and there's no others - not exactly huge in my book.

Anyway thanks - a lot easier to test further with those profileroptions. Any further thoughts are much appreciated though.

|||

To make your aggregations more efficient you need to try and define relationships between attributes.

Take a look for instance at what you get after you migrate FoodMart 2000 sample cube. You will see that almost all attributes in all dimensions are in hierarchies and there are relationships defined between them. Having right set of relationships will help single aggregation to answer more queries.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

All the attributes in the model need to be fully filterable on (simultaneously) and thus I've made a separate dimension for each attribute with just that attribute in it. As each attribute has the default all-member it (if I understand it correctly) has a default attribute relationship from the leaf-level of an attribute to the all-level of the same attribute but I haven't defined any in addition to that. Is there any other way I could structure this and still keep the independence between the attributes that allows full filtering?

I want to add hierarchies later to allow faster multiselection but right now I have these problems even for very simple queries of the type:

SELECT ([attr1]) ON 0,

[Some Measure] ON 1

FROM [cube]

WHERE ([att2].[some leaf member],[att3].[all], [att4].[all], etc...)

Having experimented a little further with the profiler (with a cube with 4x dimensions with 16384 members, one with 11 and one with 5) I noticed that almost the entire performancedrop comes when adding 2 of the big-member attributes to the same aggregation. Any additional attributes after that does degrade performance as well but only slightly. I'm assuming this has something to do with much more empty cells showing up in the aggregation in the latter cases as I (for testingpurposes) use only 1 million factrows. Luckily the wizard also ignores any aggregations that large but as I need filtering on all attributes that doesn't help me.

As I've changed the XMLA for the aggregations directly and recreated the cube there's no change in the relationsships here so it seems that it's the pure size of the aggregation that makes the difference when it comes to queryperformance. 16384x11x5 is 4-5 times faster then unaggregated but 16384x16384x11x5 takes 5-10x longer then the unaggregated one.

|||

Besides trying to desing "good" set of aggregations for your cube you should look into partitioning it.

If you create say monthly parittions and your query is only asking for the last month of data, you will benefit greatly. Analysis Server is only going to scan last partition of data to retreive answer for you query.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

No comments:

Post a Comment