Thursday, February 9, 2012
Advice Requested on Primary Key: Is char(20) better than binary(20)
So far I have been unable to find a decent "natural key" for a table I am
designing. The true "natural key" is varbinary(MAX), which is unusable, and
so I have to consider a surrogate key, which uses the SHA1 algorithm to
generate a 20-byte value from the natural key.
My Question:
Which would be the best choice for storing
20 bytes from SHA1 algorithm?
char(20)
binary(20)
I would like to choose the best datatype for Joins/Performance, etc. I am
using SQL Server 2005.
Thanks
Russell Mangel
Las Vegas, NV
BINARY. The HashBytes function returns a VARBINARY value.
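As a quick illustration of that point (a hypothetical snippet, not from the original thread):
-- SHA-1 output from HashBytes is a 20-byte VARBINARY value
DECLARE @h varbinary(20)
SET @h = HASHBYTES('SHA1', 'some natural key value')
SELECT DATALENGTH(@h) AS HashLengthInBytes, @h AS HashValue   -- DATALENGTH returns 20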
"Russell Mangel" <russell@.tymer.net> wrote in message
news:eUnwdJAkGHA.2004@.TK2MSFTNGP04.phx.gbl...
>I am not posting the DDL because it is not relevant to my question.
> So far I have been unable to find a decent "natural key" for a table I am
> designing. The true "natural key" is varbinary(MAX), which is unusable,
> and so I have to consider a surrogate key, which is using SHA1 agorithm to
> generate a 20 byte value of the natural key.
> My Question:
> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
> char(20)
> binary(20)
> I would like to choose the best datatype for Joins/Performance, etc. I am
> using SQL Server 2005.
> Thanks
> Russell Mangel
> Las Vegas, NV
>
>|||Russell Mangel wrote:
> I am not posting the DDL because it is not relevant to my question.
> So far I have been unable to find a decent "natural key" for a table I am
> designing. The true "natural key" is varbinary(MAX), which is unusable, and
> so I have to consider a surrogate key, which is using SHA1 agorithm to
> generate a 20 byte value of the natural key.
> My Question:
> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
> char(20)
> binary(20)
> I would like to choose the best datatype for Joins/Performance, etc. I am
> using SQL Server 2005.
> Thanks
> Russell Mangel
> Las Vegas, NV
Your hash wouldn't usually be called a surrogate key because it's
derived from real, meaningful data. I'd prefer to call it a logical key
or business key but I think it's fair enough to call it a natural key
if you like.
Even without the hash, your problem wasn't that you couldn't find the
key but that you had no easy way to enforce it. It's an annoyance that
SQL Server won't permit unique constraints on "large" keys. I suspect
the reason is due to the fact that only B-tree indexes are supported
and that makes large keys impractical in Microsoft's way of thinking.
> Which would be the best choice for storing
> 20 bytes from SHA1 algorithm?
Use BINARY to avoid the overhead and complications of collations.
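A minimal sketch of what that might look like (the table and column names are invented for illustration):
-- Store the SHA-1 of the natural key as fixed-length binary and key on it
CREATE TABLE dbo.MessageArchive
(
NaturalKeyHash binary(20) NOT NULL PRIMARY KEY,   -- HASHBYTES('SHA1', NaturalKey)
NaturalKey varbinary(MAX) NOT NULL
)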
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||>> My Question: Which would be the best choice for storing 20 bytes from SHA1 algorithm?
char(20)
binary(20) <<
If you use CHAR(20), then you can display it more easily, but that is the
only advantage I can think of. It would be nice to have a Teradata
style hashing routine that would give you a true surrogate key, hidden
from the users and with hash clash managed by the system. Sigh!|||"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1150329186.339767.74520@.g10g2000cwb.googlegroups.com...
> char(20)
> binary(20) <<
> If you use CHAR(20), then you can display it easier, but that is the
> only advantage I can think of. It would be nice to have a Teradata
> style hashing routine that would give you a true surrogate key, hidden
> from the users and with hash clash managed by the system. Sigh!
Unless you convert the SHA1 hash to some character format, like Base64,
you'll run into problems displaying some hash codes that contain a 0x00 and
other special control characters in them. Base64, ASCII hexadecimal
representation, etc., of a 20-byte hash will require more than a CHAR(20) to
store.
-- Demonstration of CHAR(10) with an embedded 0x00 char
DECLARE @x CHAR(10)
SELECT @x = 'ABC' + CHAR(0) + 'DEF'
SELECT @x
PRINT @x
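A hypothetical follow-up to the same point: rendering a 20-byte hash as readable hex needs 40 characters plus the '0x' prefix, so CHAR(20) cannot hold it. The fn_varbintohexstr function used here is undocumented, so treat this purely as an illustration.
DECLARE @h binary(20)
SET @h = HASHBYTES('SHA1', 'example')
SELECT DATALENGTH(@h) AS BytesStored,             -- 20
master.dbo.fn_varbintohexstr(@h) AS HexDisplay    -- '0x' followed by 40 hex digits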
Advice Requested : Trying to write portable SQL
I have a table called Companies that has a column (Active), which would be
either 0 or 1 for either "Active or Not-Active".
I realize that there is a BIT column type in SQL 2005, but I don't want to
use it. Instead I have chosen to use an 'int' column data type, and then I
applied a check constraint to ensure that only 0 or 1 is inserted.
I have included some DDL; please let me know if there is a better way to
accomplish my goal.
Thanks
Russell Mangel
Las Vegas, NV
CREATE TABLE [dbo].[Companies]
(
[CompanyCode] [varchar](5) NOT NULL,
[CompanyName] [varchar](35) NOT NULL,
[Active] [int] NOT NULL CONSTRAINT [DF_Companies_Active] DEFAULT ((1)),
CONSTRAINT [PK_Companies] PRIMARY KEY CLUSTERED
(
[CompanyCode] ASC
) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[Companies] WITH CHECK ADD CONSTRAINT [CK_Companies]
CHECK (([Active]=(0) OR [Active]=(1)))
GO
ALTER TABLE [dbo].[Companies] CHECK CONSTRAINT [CK_Companies]
Russell Mangel wrote:
> I am running on MS SQL2005.
> I have a table called Companies that has a column (Active), which would be
> either 0 or 1 for either "Active or Not-Active".
> I realize that there is a BIT column type in SQL 2005, but I don't want to
> use it. Instead I have chosen to use an 'int' column data type, and then I
> applied a check constraint to ensure that only 0 or 1 is inserted.
> I have included some DML, please let me know if there is a better way to
> accomplish my goal.
> Thanks
> Russell Mangel
> Las Vegas, NV
> CREATE TABLE [dbo].[Companies]
> (
> [CompanyCode] [varchar](5) NOT NULL,
> [CompanyName] [varchar](35) NOT NULL,
> [Active] [int] NOT NULL CONSTRAINT [DF_Companies_Active] DEFAULT ((1)),
> CONSTRAINT [PK_Companies] PRIMARY KEY CLUSTERED
> (
> [CompanyCode] ASC
> ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY]
> ALTER TABLE [dbo].[Companies] WITH CHECK ADD CONSTRAINT [CK_Companies]
> CHECK (([Active]=(0) OR [Active]=(1)))
> GO
> ALTER TABLE [dbo].[Companies] CHECK CONSTRAINT [CK_Companies]
It's very hard to write totally portable DDL because virtually all SQL
DBMSs (SQL Server, Sybase, Oracle, MySQL for example) have extended the
SQL DDL to define the physical implementation of tables as well as the
logical model. Crazy but true!
In your case, I'd use CHAR(1) and use a more friendly, readable code
instead of a potentially mysterious 0/1.
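For example, one possible shape of that suggestion (the 'A'/'I' codes and constraint names are only illustrative):
CREATE TABLE dbo.Companies2
(
CompanyCode varchar(5) NOT NULL PRIMARY KEY,
CompanyName varchar(35) NOT NULL,
ActiveStatus char(1) NOT NULL
CONSTRAINT DF_Companies2_ActiveStatus DEFAULT ('A')                        -- 'A' = active
CONSTRAINT CK_Companies2_ActiveStatus CHECK (ActiveStatus IN ('A', 'I'))   -- 'I' = inactive
)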
You can use the Mimer SQL Validator to check your code against the
ANSI/ISO standard. Standard not necessarily = Portable of course. :-(
http://developer.mimer.com/validator/
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
> In your case, I'd use CHAR(1) and use a more friendly, readable code
> instead of a potenitally mysterious 0/1.
My bets are on 0 and 1. That's clearcut. char(1) comes with localisation
issues.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Erland Sommarskog wrote:
> David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
> My bets are on 0 and 1. That's clearcut. char(1) comes with localisation
> issues.
>
Perfectly true that it will be localised, but developing a database
schema without localisation sounds like a pretty tough call to me. What
do you do for table and column names? If you need to support developers
and DBAs with multiple languages then you can put translations in a
data dictionary. On the other hand if you rely on a good data
dictionary then I suppose that negates my argument about the potential
ambiguity of 1/0 anyway...
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:1149628866.388604.195270@.y43g2000cwc.googlegroups.com...
> Russell Mangel wrote:
>
> It's very hard to write totally portable DDL because virtually all SQL
> DBMSs (SQL Server, Sybase, Oracle, MySQL for example) have extended the
> SQL DDL to define the physical implementation of tables as well as the
> logical model. Crazy but true!
>
I think the fault lies in the design of SQL for not cleanly accommodating the
physical design of databases. It's one thing to assume that the DBMS will
provide a suitable runtime implementation of a SELECT, INSERT, UPDATE or
DELETE. Quite another to assume that an implementation won't need
additional details for CREATE TABLE, etc.
In fact, CREATE TABLE is just broken, as it badly conflates the type, the
instance and the implementation. A TABLE ought to be declared as an
instance of a relation "type", with a prescribed mechanism for specifying
the physical implementation. SQL shouldn't try to model the physical design,
but at least should provide a standard way for it to be specified.
David|||David Portas (REMOVE_BEFORE_REPLYING_dportas@.acm.org) writes:
> Erland Sommarskog wrote:
> Perfectly true that it will be localised, but developing a database
> schema without localisation sounds like a pretty tough call to me. What
> do you do for table and column names? If you need to support developers
> and DBAs with multiple languages then you can put translations in a
> data dictionary. On the other hand if you rely on a good data
> dictionary then I suppose that negates my argument about the potential
> ambiguity of 1/0 anyway...
The problem with char(1) is that there are no obvious values for the
constant. If a programmer writes
WHERE binarycol = 'J'
when the possible values are Y and N, they have introduced a bug. That is,
there are no universal values for a char(1) column. There are for bit, on the
other hand.
And if you think
WHERE bitcol = 1
is cumbersome for some reason, you can in SQL 2005 write:
WHERE bitcol = 'true'
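A quick demonstration of both forms (the table variable here is hypothetical):
DECLARE @t TABLE (Active bit NOT NULL)
INSERT INTO @t (Active) VALUES (1)
INSERT INTO @t (Active) VALUES (0)
SELECT COUNT(*) AS ActiveRows FROM @t WHERE Active = 'true'   -- same result as Active = 1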
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||David Browne wrote:
> I think the fault lies in the design of SQL for not cleanly accomodating the
> physical design of databases. It's one thing to assume that the DBMS will
> provide a suitable runtime implementation of a SELECT, INSERT, UPDATE or
> DELETE. Quite another to assume that the an implementation won't need
> additional details for CREATE TABLE, etc.
> In fact, CREATE TABLE is just broken, as it badly conflates the type, the
> instance and the implementation. A TABLE ought to be decalred as an
> instance of a relation "type", with a prescribed mechanism for specifying
> the physical implementation. SQL shouldn't try model the physical design,
> but at least should provide a standard way for it to be specified.
>
Agree absolutely. A Data Implementation sub-language ought to be
included as a hook for translating types and views into their
product-specific structures. It's remarkable to think that if the SQL
standard had specified even a rudimentary "DIL" and if vendors had used
it we could have been spared truckloads of denormalization agony and
the worst of the Kimball dimensional muddle. That simple omission has
perhaps cost $billions for database customers.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Thanks for the reply...
Did I do the check constraint right? I can't think of anything else to do
here, besides choosing an int or char(1).
Russell Mangel
Las Vegas, NV
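For comparison, the posted rule can also be written with IN; a sketch of the equivalent constraint:
-- Same rule as CK_Companies, expressed with IN instead of OR
ALTER TABLE [dbo].[Companies] DROP CONSTRAINT [CK_Companies]
ALTER TABLE [dbo].[Companies] WITH CHECK ADD CONSTRAINT [CK_Companies]
CHECK ([Active] IN (0, 1))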
Advice requested : Choosing the best Key for a table
I would appreciate good advice about the following Schema for a Table called
Messages. The table will hold archived messages from Microsoft Outlook. The
SQL 2005 schema that I have posted, is simply what the data looks like in
its *raw* form. I intend to modify the schema in the most efficient way
possible. This table was created using MS SQL Server 2005 and uses two of
the new (MAX) data types.
You can help me the most by focusing on the best Key for the Messages table.
The "natural key" for the table (in raw form), would be MessageID, so lets
discuss it's problems. The first problem is that varbinary(MAX) (new in
SQL2005), does not allow Primary Key constraints, and so it should be!
Obviously we will need to use a smaller DataType than varbinary(MAX); once
you pick something reasonable in size you are allowed to create a PrimaryKey
constraint. MessageID's do not have a documented or published limit; they
can be *any* length. From my observations MessageID will typically be 70-340
bytes. So I suppose we could set varbinary(500). This would allow us to
handle most situations. But this is still way too long for a Primary Key in
my opinion. The other columns that are set as xxx(MAX) have similar
problems, ignore these columns for now until we find a decent Key for the
table.
In my opinion this Table does not have a "Natural Key" we can use; it's
simply too long, and this table will have "hundreds of millions of rows", so
I am worried about JOIN performance. Once a message is archived (then
deleted) from its original location and placed in SQL Server, the
MessageID has no value and is never displayed to the user. There are more
complications because this table is replicated using (Merge replication).
The database is a distributed database, so using auto identity columns
complicates replication issues, not to mention irritating --Celko--. So I
really want to find a suitable surrogate key.
So what to do?
1. Use RSA to reduce MessageID to 20 bytes, and deal with the collisions?
2. Use a rowGuid (Ouch!)
At this point I like option #1 the best because it works nicely with
"replication" as the original MessageID is a Globally Unique Identifier, so
no two MessageID's (PR_ENTRY_ID long term) will ever exist. But I think
there are issues with generating these shorter keys using RSA; collisions
will happen. Additionally it is quite simple to encode the MessageID using
RSA. The MessageID would now be 20 bytes or [varbinary](20).
Does anyone have any good suggestions?
Thanks for your time.
Russell Mangel
Las Vegas, NV
-- Note: I commented out the Constraint, as varbinary(MAX) is not allowed.
-- This schema is just what the data looks like in the real world.
CREATE TABLE
[dbo].[Messages]
(
[MessageID] [varbinary](MAX) NOT NULL,
[ParentID] [varbinary](MAX) NOT NULL, -- FK for Folders Table
[SenderID] [int] NOT NULL, -- FK for Sender of message
[MessageType] [int] NOT NULL, -- FK MessageType
[Length] [bigint] NOT NULL,
[To] [nvarchar](MAX) NOT NULL,
[CC] [nvarchar](MAX) NOT NULL,
[Subject] [nvarchar](MAX) NOT NULL,
[Body] [ntext] NOT NULL,
[HasAttachment] [bit] NOT NULL,
-- CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
-- (
-- [MessageID]
-- )
)ON [PRIMARY]
-- End Schema
Hi, Russell
Well, why do you want the MessageId column to be varbinary datatype? Any
Reason?
Sure, I don't know your business requirements so my guess is (if you do not
want to use an IDENTITY)
BTW an IDENTITY in most cases is a good candidate for surrogate keys
CREATE TABLE Messages
(
MessageId INT NOT NULL PRIMARY KEY
)
Then you will need to generate the next MessageId yourself, like
DECLARE @MaxId INT
BEGIN TRAN
SELECT @MaxId = COALESCE(MAX(MessageId), 0) + 1 FROM Messages WITH
(UPDLOCK, HOLDLOCK)
INSERT INTO Messages SELECT @MaxId, ....... FROM Table
COMMIT TRAN
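Filled in as a self-contained sketch against the one-column table above (illustrative only; an IDENTITY column avoids this bookkeeping entirely):
BEGIN TRAN
DECLARE @NextId INT
-- lock the range so two concurrent callers cannot compute the same value
SELECT @NextId = COALESCE(MAX(MessageId), 0) + 1
FROM Messages WITH (UPDLOCK, HOLDLOCK)
INSERT INTO Messages (MessageId) VALUES (@NextId)
COMMIT TRAN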
"Russell Mangel" <russell@.tymer.net> wrote in message
news:%23JD8adcaGHA.3304@.TK2MSFTNGP04.phx.gbl...
>I would appreciate good advice about the following Schema for a Table
>called Messages. The table will hold archived messages from Microsoft
>Outlook. The SQL 2005 schema that I have posted, is simply what the data
>looks like in it's *raw* form. I intend to modify the schema in the most
>efficient way possible. This table was created using MS SQL server 2005 and
>uses two if the new (MAX) data types.
> You can help me the most by focusing on the best Key for the Messages
> table.
> The "natural key" for the table (in raw form), would be MessageID, so lets
> discuss it's problems. The first problem is that varbinary(MAX) (new in
> SQL2005), does not allow Primary Key constraints, and so it should be!
> Obviously we will need to use a smaller DataType than varbinary(MAX), once
> you pick something reasonable in size you are allowed to create PrimaryKey
> constraint. MessageID's do not have a documented or published limit, they
> can be *any* length. From my observations MessageID will typically be
> 70-340 bytes. So I suppose we could set varbinary(500). This would allow
> us to handle most situations. But this is still way too long for a Primary
> Key in my opinion. The other columns that are set as xxx(MAX) have similar
> problems, ignore these columns for now until we find a decent Key for the
> table.
> In my opinion this Table does not have a "Natural Key" we can use it's
> simply to long, and this table will have "hundreds of millions of rows",
> so I am worried about JOIN performance. Once a message is archived (then
> deleted), from it's original location and placed in SQL server, the
> MessageID has no value and is never displayed to the user. There are more
> complications because this table is replicated using (Merge replication).
> The database is a distributed database, so using auto identity columns
> complicate replication issues, not too mention irritating --Celko--. So I
> really want to find a suitable surrogate key.
> So what to do?
> 1. Use RSA to reduce MessageID to 20 bytes, and deal with the collisions?
> 2. Use a rowGuid (Outch!)
> At this point I like option #1 the best because it works nicely with
> "replication" as the original MessageID is a Globally Unique Identifier,
> so no two MessageID's (PR_ENTRY_ID long term) will ever exist. But I think
> there is are issues with generating these shorter keys using RSA,
> collisions will happen. Additionally it is quite simple to encode the
> MessageID using RSA. The MessageID would now be 20 bytes or
> [varbinary](20).
> Does anyone have any good suggestions?
> Thanks for your time.
> Russell Mangel
> Las Vegas, NV
>
> -- Note: I commented out the Constraint, as varbinary(MAX) is not
> allowed.
> -- This schema is just what the data looks like in the real world.
> CREATE TABLE
> [dbo].[Messages]
> (
> [MessageID] [varbinary](MAX) NOT NULL,
> [ParentID] [varbinary](MAX) NOT NULL, -- FK for Folders Table
> [SenderID] [int] NOT NULL, -- FK for Sender of message
> [MessageType] [int] NOT NULL, -- FK MessageType
> [Length] [bigint] NOT NULL,
> [To] [nvarchar](MAX) NOT NULL,
> [CC] [nvarchar](MAX) NOT NULL,
> [Subject] [nvarchar](MAX) NOT NULL,
> [Body] [ntext] NOT NULL,
> [HasAttachment] [bit] NOT NULL,
> -- CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
> -- (
> -- [MessageID]
> -- )
> )ON [PRIMARY]
> -- End Schema
>|||Can MessageID ever change for a given message? If not, then use it as the
natural key (as silly as this name sounds in this case). Put a unique
constraint on it, but create a surrogate key for joins.
Although everybody else will say that the GUID is a poor choice for a
primary key, I'd say use it but NEVER as a clustered key. Perhaps you'd have
more use for a clustered index on a datetime column (although I can't see one
in your proposed schema). Maybe you could also use an IDENTITY column - it
makes a great candidate for a clustered index.
Of course a better option still would be to do more research and find a
better natural key (maybe even composite) candidate, then use a single column
surrogate key for joins and references.
ML
http://milambda.blogspot.com/|||There is one other column in the Messages table, "Created"; it's a DateTime
that might help us find a better key.
Not sure what to call this Key as it has two columns, must be
Composite/Natural?
Is it possible for a person (SenderID), to send a message at the exact same
time?
Hmmmmmmm. Got to think about this for a bit, and maybe I can find another
column that I can use for Composite key.
-- Here is the new Schema,
CREATE TABLE
[dbo].[Messages]
(
[MessageID] [varbinary](max) NOT NULL,
[ParentID] [varbinary](max) NOT NULL,
[SenderID] [int] NOT NULL,
[MessageType] [int] NOT NULL,
[Length] [bigint] NOT NULL,
[To] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[CC] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Subject] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Body] [ntext] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[HasAttachment] [bit] NOT NULL,
[Created] [datetime] NOT NULL, -- New Column
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
(
[SenderID] ASC,
[Created] ASC
)
)ON [PRIMARY]|||Russell Mangel wrote:
> I would appreciate good advice about the following Schema for a Table called
> Messages. The table will hold archived messages from Microsoft Outlook. The
> SQL 2005 schema that I have posted, is simply what the data looks like in
> it's *raw* form. I intend to modify the schema in the most efficient way
> possible. This table was created using MS SQL server 2005 and uses two if
> the new (MAX) data types.
> You can help me the most by focusing on the best Key for the Messages table.
> The "natural key" for the table (in raw form), would be MessageID, so lets
> discuss it's problems. The first problem is that varbinary(MAX) (new in
> SQL2005), does not allow Primary Key constraints, and so it should be!
> Obviously we will need to use a smaller DataType than varbinary(MAX), once
> you pick something reasonable in size you are allowed to create PrimaryKey
> constraint. MessageID's do not have a documented or published limit, they
> can be *any* length. From my observations MessageID will typically be 70-340
> bytes. So I suppose we could set varbinary(500). This would allow us to
> handle most situations. But this is still way too long for a Primary Key in
> my opinion. The other columns that are set as xxx(MAX) have similar
> problems, ignore these columns for now until we find a decent Key for the
> table.
> In my opinion this Table does not have a "Natural Key" we can use it's
> simply to long, and this table will have "hundreds of millions of rows", so
> I am worried about JOIN performance. Once a message is archived (then
> deleted), from it's original location and placed in SQL server, the
> MessageID has no value and is never displayed to the user. There are more
> complications because this table is replicated using (Merge replication).
> The database is a distributed database, so using auto identity columns
> complicate replication issues, not too mention irritating --Celko--. So I
> really want to find a suitable surrogate key.
> So what to do?
> 1. Use RSA to reduce MessageID to 20 bytes, and deal with the collisions?
> 2. Use a rowGuid (Outch!)
> At this point I like option #1 the best because it works nicely with
> "replication" as the original MessageID is a Globally Unique Identifier, s
o
> no two MessageID's (PR_ENTRY_ID long term) will ever exist. But I think
> there is are issues with generating these shorter keys using RSA, collisions
> will happen. Additionally it is quite simple to encode the MessageID using
> RSA. The MessageID would now be 20 bytes or [varbinary](20).
> Does anyone have any good suggestions?
> Thanks for your time.
> Russell Mangel
> Las Vegas, NV
>
> -- Note: I commented out the Constraint, as varbinary(MAX) is not allowed.
> -- This schema is just what the data looks like in the real world.
> CREATE TABLE
> [dbo].[Messages]
> (
> [MessageID] [varbinary](MAX) NOT NULL,
> [ParentID] [varbinary](MAX) NOT NULL, -- FK for Folders Table
> [SenderID] [int] NOT NULL, -- FK for Sender of message
> [MessageType] [int] NOT NULL, -- FK MessageType
> [Length] [bigint] NOT NULL,
> [To] [nvarchar](MAX) NOT NULL,
> [CC] [nvarchar](MAX) NOT NULL,
> [Subject] [nvarchar](MAX) NOT NULL,
> [Body] [ntext] NOT NULL,
> [HasAttachment] [bit] NOT NULL,
> -- CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
> -- (
> -- [MessageID]
> -- )
> )ON [PRIMARY]
> -- End Schema
You can add a constraint on MessageID like this:
CREATE FUNCTION dbo.ufn_sha_hash
(@msg VARBINARY(8000))
RETURNS VARBINARY(20)
AS
BEGIN
RETURN
ISNULL(HASHBYTES('SHA1', @msg), 0x7C7CDB427449446E8EA812C5981C5D6A) ;
END
GO
CREATE TABLE dbo.Messages
(
MessageID varbinary(8000) NOT NULL,
HashID varbinary(20) NOT NULL PRIMARY KEY,
CHECK (HashID = dbo.ufn_sha_hash(MessageID))
);
GO
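A hypothetical usage example: compute the hash in the INSERT so the CHECK constraint is satisfied (the MessageID value below is just a placeholder).
DECLARE @msg varbinary(8000)
SET @msg = 0x0102030405   -- placeholder MessageID value
INSERT INTO dbo.Messages (MessageID, HashID)
VALUES (@msg, dbo.ufn_sha_hash(@msg))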
Whether you should also use a surrogate key is a different question. It
would probably make sense to do so. Hope this helps.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||> CREATE FUNCTION dbo.ufn_sha_hash
> (@.msg VARBINARY(8000))
> RETURNS VARBINARY(20)
> AS
> BEGIN
> RETURN
> ISNULL(HASHBYTES('SHA1',@.msg),0x7C7CDB427449446E8EA812C5981C5D6A) ;
> END
> GO
> CREATE TABLE dbo.Messages
> (
> MessageID varbinary(8000) NOT NULL,
> HashID varbinary(20) NOT NULL PRIMARY KEY,
> CHECK (HashID = dbo.ufn_sha_hash(MessageID)),
> );
> GO
> Whether you should also use a surrogate key is a different question. It
> would probably make sense to do so. Hope this helps.
> --
> David Portas, SQL Server MVP
I didn't know that SQL could do SHA1 hashes, thanks. This table is a
replicated table so once you add surrogate keys, you have to deal with that.
Personally I would rather have a binary(20) column which replaces MessageID
column completely than deal with the surrogate key problems with Replicated
(Merge) databases.
The Messages table also has a FK (FolderID) for Folders table, so this
column could also be removed and replaced by a binary(20) column. Folders
Table PK (FolderID)would be replaced with SHA1 binary(20) as well.
Which brings a couple questions:
1. If we are going to store an SHA1 hash, wouldn't it be best to use:
binary(20) instead of varbinary(20)?
2. Will I have any collisions (PK violations) with SHA1 hash (more on this
in next sentence)?
The MessageID is a variable length Binary byte array, and is a Globally
UniqueIdentifier (it's just an older form of GUID before GUID was born) this
MessageID is from MAPI (Exchange Server). So if you had 50 million
varbinary(MAX) MessageIDs, and you ran SHA1 hash against all of them would
you find duplicated SHA1 hash values?
Thanks
Russell Mangel|||Russell Mangel wrote:
> I didn't know that SQL could do SHA1 hashes, thanks. This table is a
> replicated table so once you add surrogate keys, you have to deal with that.
> Personally I would rather have a binary(20) column which replaces MessageID
> column completely than deal with the surrogate key problems with Replicated
> (Merge) databases.
> The Messages table also has a FK (FolderID) for Folders table, so this
> column could also be removed and replaced by a binary(20) column. Folders
> Table PK (FolderID)would be replaced with SHA1 binary(20) as well.
> Which brings a couple questions:
> 1. If we are going to store an SHA1 hash, wouldn't it be best to use:
> binary(20) instead of varbinary(20)?
> 2. Will I have any collisions (PK violations) with SHA1 hash (more on this
> in next sentence)?
> The MessageID is a variable length Binary byte array, and is a Globally
> UniqueIdentifier (it's just an older form of GUID before GUID was born) this
> MessageID is from MAPI (Exchange Server). So if you had 50 million
> varbinary(MAX) MessageIDs, and you ran SHA1 hash against all of them would
> you find duplicated SHA1 hash values?
> Thanks
> Russell Mangel
1. True. Although VARBINARY is functionally equivalent because the
return value is always 20 bytes.
2. Not unless the MessageID exceeds 8000 bytes, which is the maximum
supported length for HashBytes. Although there is a theoretical risk of
a hash collision, you need 2^80 hashed messages on average before you
hit a duplicate. You are hashing GUIDs though. I don't know the GUIDs
used by Exchange but if they have similar qualities to the ones
generated by Windows then theoretically you are much, much more likely
to hit a duplicate GUID than a duplicate hash.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Advice requested - How do I convert a Base36 number to SQL int
Does anyone know if it is possible to create a Stored Procedure in
SQL7.0/2000 to do the following:
I need to convert an SQL int value to a Base36 value, and the other
direction Base36 to SQL int.
Example:
Take the (Base10) number 22, and convert to "M".
Take the (Base36) number "M" and convert to 22.
The Base36 digits are:
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
Thanks
Russell Mangel
Las Vegas, NV
Hi Russell
This is a solution for Hex which could be extended
http://www.umachandar.com/technical...ipts/Main89.htm
http://www.umachandar.com/technical...ipts/Main29.htm
For hex there is also an undocumented procedure xp_varbintohexstr
http://www.umachandar.com/technical...ipts/Main11.htm
John
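For what it's worth, a minimal sketch of the int-to-Base36 direction (the function name is invented, negative values are not handled, and a scalar UDF needs SQL 2000 or later):
CREATE FUNCTION dbo.fn_IntToBase36 (@value INT)
RETURNS VARCHAR(13)
AS
BEGIN
DECLARE @digits CHAR(36), @result VARCHAR(13)
SET @digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
SET @result = ''
IF @value = 0 RETURN '0'
WHILE @value > 0
BEGIN
-- take the least significant base-36 digit and prepend it
SET @result = SUBSTRING(@digits, (@value % 36) + 1, 1) + @result
SET @value = @value / 36
END
RETURN @result
END
The reverse direction walks the string and accumulates value = value * 36 + CHARINDEX(digit, @digits) - 1.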
"Russell Mangel" <russell@.tymer.net> wrote in message
news:ezmZ0qSEFHA.1396@.tk2msftngp13.phx.gbl...
> Hi,
> Does anyone know if it is possible to create a Stored Procedure in
> SQL7.0/2000 to do the following:
> I need to convert an SQL int value to a Base36 value, and the other
> direction Base36 to SQL int.
> Example:
> Take the (Base10) number 22, and convert to "M".
> Take the (Base36) number "M" and convert to 22.
> The Base36 digits are:
> "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> Thanks
> Russell Mangel
> Las Vegas, NV
>
Advice on table design which will allow me to enforce integrity
I have two tables Table A and B, below with some dummy data...
Table A (contains specific unique settings that can be requested)
Id, SettingName
1, weight
2, length
Table B (contains the setting values, here 3 values relate to weight
and 1 to length)
Id, Brand, SettingValue
1, A, 100
1, B, 200
1, null, 300
2, null, 5.3
(There is also a list of Brands available in another table). No primary
keys / referential integrity has been setup yet.
Basically depending upon the Brand requested a different setting value
will be present. If a particular brand is not present (signified by a
null in the Brand column in table B), then a default value will be
used.
Therefore if I request the weight and pass through a Brand of A, I will
get 100
If I request the weight but do not pass through a brand (i.e. null) I
will get 300.
My question is, what kind of integrity can I apply to avoid the user
specifying duplicate Ids and Brands in table B. I cannot apply a
composite key on these two fields as a null is present. Table B will
probably contain about 50 rows and probably 10 of them will be brand
specific. The reason it's done like this is that in the calling client code I
want to call some function, e.g.
getsetting(weight) ... result = 300
Or if it is brand specific
getsetting(weight,A) .... result = 100
Any advice on integrity or table restructuring would be greatly
appreciated. Its sql 2000 sp3.
Thanks
brad
As you have already realized, TableB isn't even in First Normal Form.
Attribute Value models like this are seldom viable and you haven't
stated any excuse not to use the standard, normalized approach:
CREATE TABLE TableB (id INTEGER NOT NULL, brand CHAR(1) DEFAULT 'Z' NOT
NULL REFERENCES Brands (brand), weight INTEGER NOT NULL, length INTEGER
NOT NULL, PRIMARY KEY (id,brand))
Either add the default ("Z" here) brand to the brands table, or if you
prefer not to do that, put the brand-specific info into a separate
table:
CREATE TABLE TableB (id INTEGER NOT NULL, default_weight INTEGER NOT
NULL, default_length INTEGER NOT NULL, PRIMARY KEY (id))
CREATE TABLE TableBBrands (id INTEGER NOT NULL REFERENCES TableB (id),
brand CHAR(1) NOT NULL REFERENCES Brands (brand), weight INTEGER NOT
NULL, length INTEGER NOT NULL, PRIMARY KEY (id,brand))
Personally, I would lean toward the former design. This meets all the
requirements that you have specified as far as I can see.
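Under that first design, the brand-with-fallback lookup might look something like this (the id and brand values are hardcoded purely for illustration):
SELECT COALESCE(b.weight, d.weight) AS weight
FROM TableB AS d
LEFT JOIN TableB AS b
ON b.id = d.id AND b.brand = 'A'   -- requested brand, if any
WHERE d.id = 1 AND d.brand = 'Z'       -- default row always exists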
--
David Portas
SQL Server MVP
--|||i would go with two tables B1(id, brand, value) with PK (id, brand) and
B2(id, value) PK (id)
but i think that a more appropriate solution is to have a separate table
for each "setting type" e.g.
WEIGHTS, LENGTHS, VOLUMES where you can have an appropriate data type for
each setting
HTH, Strider|||> i think that more appropriate solution is to have separate table
> for each "setting type" e.g.
> WEIGHTS, LENGTHS, VOLUMES
Do you mean separate COLUMNS? Why would you create separate tables for
each attribute?
--
David Portas
SQL Server MVP
--|||I should have also said that Table A will grow in size over time (i.e.
60 different settings), so you can have many settings, i.e. a user
could add additional settings in here, so making these into columns
would not be ideal. If possible I would also prefer not to have
to have a dummy brand.
David Portas wrote:
> > i think that more appropriate solution is to have separate table
> > for each "setting type" e.g.
> > WEIGHTS, LENGTHS, VOLUMES
> Do you mean separate COLUMNS? Why would you create separate tables for
> each attribute?
> --
> David Portas
> SQL Server MVP
> --|||>Why would you create separate tables for
>each attribute?
i don't know, perhaps transient brain disorder ;)|||Why can't you implement the necessary change control procedures to add
columns as the users require them? Allowing users the unfettered
ability to add new "attributes" to a system is a recipe for creating an
unusable mass of redundant and inconsistent data. Do you really expect
users to go through the process of identifying and eliminating
functional dependencies before they add new attributes? On the whole,
users aren't good database architects. At least if they were they would
work with proper tables and columns, constraints and keys - things
which aren't possible under your model.
Your question was how to enforce integrity but I don't see any
integrity in your design at all.
--
David Portas
SQL Server MVP
--|||Yup, I agree all valid points there David. Thanks for your input on
this one.
Brad
David Portas wrote:
> Why can't you implement the necessary change control procedures to add
> columns as the users require them? Allowing users the unfettered
> ability to add new "attributes" to a system is a recipe for creating an
> unusable mass of redundant and inconsistent data. Do you really expect
> users to go through the process of identifying and eliminating
> functional dependencies before they add new attributes? On the whole,
> users aren't good database architects. At least if they were they would
> work with proper tables and columns, constraints and keys - things
> which aren't possible under your model.
> Your question was how to enforce integrity but I don't see any
> integrity in your design at all.
> --
> David Portas
> SQL Server MVP
> --|||1) Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.
2) Google "EAV" and "attribute splitting" design flaws before you destroy
your database. It is virtually impossible to have any data integrity
with this "design" because of the need to write a complete RDBMS engine
in SQL constraints.
3) The thing falls apart in about one year of operation. That is based
on fixing these things as a consultant.
4) You need help from someone who knows even a little bit about data
modeling. You have not even gotten to the basic normal forms yet.|||[posted and mailed, please reply in news]
(obhayes@.hotmail.com) writes:
> My question is, what kind of integrity can I apply to avoid the user
> specifying duplicate Ids and Brands in table B. I cannot apply a
> composite key on these two fields as a null is present. Table B will
> probably contain about 50 rows and probably 10 of them will be brand
> specific. The reason its done like this is in the calling client code I
> want to call some function e.g.
> getsetting(weight) ... result = 300
> Or if it is brand specific
> getsetting(weight,A) .... result = 100
I would add a dummy primary key to TableB, possibly an IDENTITY column.
Then I would add a UNIQUE constraint on (Id, Brand). A UNIQUE constraint
does permit NULL values, but only one NULL.
(It's a bit amazing that a thread with so many replies did not include
this simple and straightforward suggestion.)
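As a sketch of that arrangement (the datatypes are guesses based on the sample data):
CREATE TABLE TableB
(
RowId int IDENTITY(1,1) NOT NULL PRIMARY KEY,   -- dummy key
Id int NOT NULL,
Brand char(1) NULL,
SettingValue decimal(10,2) NOT NULL,
CONSTRAINT UQ_TableB_Id_Brand UNIQUE (Id, Brand)
)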
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp