Showing posts with label ddl. Show all posts
Showing posts with label ddl. Show all posts

Thursday, March 8, 2012

aggregate query reporting different results

forgive the absense of DDL and data to reproduce this problem, but it
is 8 tables and hundreds of thousands of rows of data. instead, let me
just post the query and the symptom and we'll see how far that can get
me.
select r.description, sum(t.amount), count(*)
from salesregion as r
join metroarea as w on r.salesregionid = w.salesregionid
join saleitems as a on a.website = w.id
join transactions as t on t.adid = a.adid
join orders as o on t.orderid = o.orderid
join products as p on t.productid = p.productid
where o.orderpaid = 1
and datepart(mm, o.datecreated) = 2
and datepart(year, o.datecreated) = 2006
and p.type in (1, 3)
group by r.description
order by r.description asc
when this query is run in our production support environment, which has
almost no load, it produces completely predictable results every time.
however, when this same query is run in our production environment,
which has identical data, but a much higher load, the aggregate
functions fluctuate very slightly (one to three rows).
i have read on this issue and found two possible remedies:
1) "option (maxdop 1)" (a response i've seen in many threads on this
ng) - i applied this, it had no effect at all on the stability of the
results
2) apply sql service pack 4 (known parallelism bug described at
http://support.microsoft.com/kb/814509) - my question here is: does
load affect parallelism? because production and production support
environments both have service pack 3 applied, but it's happening in
one and not the other. i will apply the service pack if necessary, but
i want to make sure that the parallelism bug is known to occur only
under high load.
thanks in advance for any help,
jasonoh, an additional note:
when i further narrow the search results to just calculate the
aggregates for a single salesregion, like so:
select r.description, sum(t.amount), count(*)
from salesregion as r
join metroarea as w on r.salesregionid = w.salesregionid
join saleitems as a on a.website = w.id
join transactions as t on t.adid = a.adid
join orders as o on t.orderid = o.orderid
join products as p on t.productid = p.productid
where o.orderpaid = 1
and datepart(mm, o.datecreated) = 2
and datepart(year, o.datecreated) = 2006
and p.type in (1, 3)
AND r.description = 'Arizona' -- NEW LINE
group by r.description
order by r.description asc
then it becomes deterministic again!

Thursday, February 9, 2012

Advice Requested on Primary Key: Is char(20) better than binary(20)

I am not posting the DDL because it is not relevant to my question.
So far I have been unable to find a decent "natural key" for a table I am
designing. The true "natural key" is varbinary(MAX), which is unusable, and
so I have to consider a surrogate key, which is using SHA1 agorithm to
generate a 20 byte value of the natural key.
My Question:
Which would be the best choice for storing
20 bytes from SHA1 algorithm?
char(20)
binary(20)
I would like to choose the best datatype for Joins/Performance, etc. I am
using SQL Server 2005.
Thanks
Russell Mangel
Las Vegas, NVBINARY. The HashBytes function returns a VARBINARY value.
"Russell Mangel" <russell@.tymer.net> wrote in message
news:eUnwdJAkGHA.2004@.TK2MSFTNGP04.phx.gbl...
>I am not posting the DDL because it is not relevant to my question.
> So far I have been unable to find a decent "natural key" for a table I am
> designing. The true "natural key" is varbinary(MAX), which is unusable,
> and so I have to consider a surrogate key, which is using SHA1 agorithm to
> generate a 20 byte value of the natural key.
> My Question:
> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
> char(20)
> binary(20)
> I would like to choose the best datatype for Joins/Performance, etc. I am
> using SQL Server 2005.
> Thanks
> Russell Mangel
> Las Vegas, NV
>
>|||Russell Mangel wrote:
> I am not posting the DDL because it is not relevant to my question.
> So far I have been unable to find a decent "natural key" for a table I am
> designing. The true "natural key" is varbinary(MAX), which is unusable, an
d
> so I have to consider a surrogate key, which is using SHA1 agorithm to
> generate a 20 byte value of the natural key.
> My Question:
> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
> char(20)
> binary(20)
> I would like to choose the best datatype for Joins/Performance, etc. I am
> using SQL Server 2005.
> Thanks
> Russell Mangel
> Las Vegas, NV
Your hash wouldn't usually be called a surrogate key because it's
derived from real, meaningful data. I'd prefer to call it a logical key
or business key but I think it's fair enough to call it a natural key
if you like.
Even without the hash, your problem wasn't that you couldn't find the
key but that you had no easy way to enforce it. It's an annoyance that
SQL Server won't permit unique constraints on "large" keys. I suspect
the reason is due to the fact that only B-tree indexes are supported
and that makes large keys impractical in Microsoft's way of thinking.

> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
Use BINARY to avoid the overhead and complications of collations.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||>> My Question: Which would be the best choice for storing 20 bytes from SHA
1 algorithm?
char(20)
binary(20) <<
If you use CHAR(20), then you can display it easier, but that is the
only advantage I can think of. It would be nice to have a Teradata
style hashing routine that would give you a true surrogate key, hidden
from the users and with hash clash managed by the system. Sigh!|||"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1150329186.339767.74520@.g10g2000cwb.googlegroups.com...
> char(20)
> binary(20) <<
> If you use CHAR(20), then you can display it easier, but that is the
> only advantage I can think of. It would be nice to have a Teradata
> style hashing routine that would give you a true surrogate key, hidden
> from the users and with hash clash managed by the system. Sigh!
Unless you convert the SHA1 hash to some character format, like Base64,
you'll run into problems displaying some hash codes that contain a 0x00 and
other special control characters in them. Base64, ASCII hexadecimal
representation, etc., of a 20-byte hash will require more than a CHAR(20) to
store.
-- Demonstration of CHAR(10) with an embedded 0x00 char
DECLARE @.x CHAR(10)
SELECT @.x = 'ABC' + CHAR(0) + 'DEF'
SELECT @.x
PRINT @.x