Browsing articles tagged with " Identity"

Clustered Indexes and UniqueIdentifiers

Jul 11, 2012   //   by Karen Lopez   //   Blog, Database, Database Design, Fun, Snark  //  No Comments

Tom LaRock (@sqlrockstar) posted today about clustered indexes on uniqueidentifiers.  He has a great image there, which got me thinking about making my own….

 

image

New in SQL Server 2012 SEQUENCEs: Why They Aren’t Just for Surrogate Keys

Feb 16, 2012   //   by Karen Lopez   //   Blog, Data, Data Modeling, Database, DLBlog, SQL Server  //  5 Comments

imageStarting with SQL Server 2012, database designers will have the ability to use SEQUENCEs to generate numbers, such as for populating surrogate keys (primary Keys with no meaning). Sounds like the IDENTITY property, right?  It’s not quite the same.  In SQL Server (and other DBMSs), the IDENTITY property is a characteristic of a column in a table.  A database designer can set this property up to automatically generate a value in that column and specify certain features such as what number to start with and what number to increment by.  These properties are set for a specific column and only one column in a table can have this identity property.  That’s why IDENTITY is almost always used as a surrogate primary key on a table.

To create a column with the IDENTITY property in SQL Server 2012:

CREATE TABLE 
    [ database_name . [ schema_name ] . | schema_name . ] table_name 
    [ AS FileTable ]
    ( { <column_definition> | <computed_column_definition> 
        | <column_set_definition> | [ <table_constraint> ] [ ,...n ] } )
    [ ON { partition_scheme_name ( partition_column_name ) | filegroup 
        | "default" } ] 
    [ { TEXTIMAGE_ON { filegroup | "default" } ] 
    [ FILESTREAM_ON { partition_scheme_name | filegroup 
        | "default" } ]
    [ WITH ( <table_option> [ ,...n ] ) ]
[ ; ]
 
<column_definition> ::=
column_name <data_type>
    [ FILESTREAM ]
    [ COLLATE collation_name ] 
    [ NULL | NOT NULL ]
    [ 
        [ CONSTRAINT constraint_name ] DEFAULT constant_expression ] 
      | [ IDENTITY [ ( seed ,increment ) ] [ NOT FOR REPLICATION ] 
    ]
    [ ROWGUIDCOL ] [ <column_constraint> [ ...n ] ] 
    [ SPARSE ] 

http://msdn.microsoft.com/en-us/library/ms174979(v=sql.110).aspx

I’ve highlighted the syntax that applies the IDENTITY to a column.  SQL Server will know which column has the identity property (and you can only have one of these columns in the same table).  Notice that there are only three options for an IDENTITY.

To create a SQUENCE in SQL Server 2012:

CREATE SEQUENCE [schema_name . ] sequence_name
    [ AS [ built_in_integer_type | user-defined_integer_type ] ]
    [ START WITH <constant> ]
    [ INCREMENT BY <constant> ]
    [ { MINVALUE [ <constant> ] } | { NO MINVALUE } ]
    [ { MAXVALUE [ <constant> ] } | { NO MAXVALUE } ]
    [ CYCLE | { NO CYCLE } ]
    [ { CACHE [ <constant> ] } | { NO CACHE } ]
    [ ; ]

http://msdn.microsoft.com/en-us/library/ff878091(v=sql.110).aspx

Notice that there is no TABLE or COLUMN referenced in that TSQL.  You are just creating a SEQUENCE.  SQL Server will have no idea what you are going to use it for or where it will be used.  In fact, you could create a SEQUENCE and never use it. Notice there are several more options for a SEQUENCE.

SEQUENCEs are separate objects in a database and can be used to populate more than one column even across several tables.  Because they aren’t bound to a table in any way, they can be used in ways that IDENTITY columns can’t be used.  For  instance:

  • One SEQUENCE can be used across more than one table. For instance, a Control Number or Document Number on several tables.  I’ve seen this used in other DBMSs to generate document numbers across subtypes, say Purchase Orders, Shipping Notices, Return to Vendor Numbers, etc.  There’s a requirement that they be unique across all types of documents.  This could not be done with IDENTITY.
  • Using a SEQUENCE allows tables to have more than one auto-generated number in multiple columns. Think of Order Number and Cancellation Number, both on the Order table. You can’t do that with IDENTITY.
  • Since sequences are generated completely outside a table, you can chose to store the resulting value in a different format, say VARCHAR or CHAR. Or, heaven forbid, you might want to append data to it or do other kinds of evil to it before storing it.
  • Since sequences are generated completely outside a table, you can store the results in a NULLable column.

I think the confusion sets in because we tend to think of auto-generated number as useful only as surrogate keys. But we have real data needs for generating numbers other than just row identifiers, even if that’s the primary use for this. In fact, one could have a table with a PK that uses IDENTITY and an unlimited number of columns that use a SEQUENCE to set their values.  Of course, SEQUENCEs can be used for surrogate primary keys, too.  The existence of SEQUENCEs gives us another option over IDENTITY.

So SEQUENCE is a feature that just generates numbers based on some characteristics. A designer has many ways to use the number once it is generated and tables are not dependent on that method to get that number, nor do they even “know” about how that value was provided.

This separation of the COLUMN from how the data is populated is a feature of using SEQUENCEs. How one uses this separation depends on the technical and business requirements for the data.

Identity Column Issues and Trade-offs

Dec 1, 2010   //   by Karen Lopez   //   Blog, Data, Data Modeling, Database, DLBlog  //  4 Comments

This post is based on a post I previously blogged on 30 May 2008…but like most design approaches, it still holds true.

If you’ve heard me present, then you know that one of my primary design mantras is:

Everything is a trade off.  You need to be able to explain, in both business and technical terms, the cost, benefits, and risks associated with every design decision.

If you can’t do that, then you aren’t doing design: you are copying design patterns from other designs and hoping for the best outcome, even if your current design has different requirements and risk tolerances.

I am often told that a database design must have surrogate keys on every table and that “right” way to implement surrogate keys is to use the Identity property in SQL Server (or the RowID in Oracle, or the Identity property in DB2).  These sorts of features of modern DBMSs do vary from vendor to vendor, so the cost, benefit, and risk associated with each varies by version and by vendor.  Most IT pros I speak with don’t realize that.

Many people I speak with assume, incorrectly, that Identity Property = Surrogate Key = Primary Key= Unique Index but this is not true. It is important for database designers to understand the differences between these concepts.

Nigel Rivett has written a great article over at Simple-talk.com about SQL Server’s identity property and some of the interesting “features” of this incrementing function:

Note: An identity column is not guaranteed to be unique nor consecutive. You should always place a unique index on an identity column if your system requires uniqueness.

Note: The next value is the step added to the current seed; not one more than the max value in the table, or even the step from the last or maximum value.

I’m guessing that 6+ out of 10 of experienced SQL Server designers and developers are not clear on these two gotchas.  Other DBMSs have technical issues as well.  When I’m questioned on why I don’t just slap on an identity property, call it a PK an move on to the next change request, I usually start asking questions:

  • Will the application need to assume that the values are always sequential?  What if the sequence is missing a few steps?  Will that break the code?
  • Will the Identity column value be displayed anywhere outside the database, such as on a report or on a screen?
  • Will the users be confused if a sequence is missing or if they “restart” in the middle of an order?
  • Will the Identity column value be used outside this database? How?  Where?
  • What are our plans for dealing with rows that exceed the maximum number of identity values (in some versions of DB2, identity values maxed out at 32k or so)?

Identity properties can be useful, but as in every design decision, there are uses that are appropriate and uses that are not — it all comes down to cost, benefit, and risk.

Subscribe via E-mail

Use the link below to receive posts via e-mail. Unsubscribe at any time. Subscribe to www.datamodel.com by Email


Recent Comments

Categories

Archive

UA-52726617-1