
You’re Doing it Wrong: Generalizations about Generalizations

Nov 17, 2010   //   by Karen Lopez   //   Blog, Data, Data Modeling  //  3 Comments

I have a couple of presentations where I describe how generalized data modeling can offer both benefits and unacceptable costs.  In my Data Modeling Contentious Issues presentation, the one where we vote via sticky notes, we debate the trade-offs of generalization in a data model and database design.  In 5 Classic Data Modeling Mistakes, I talk about over-generalization.

Over the last 20-some years (and there’s more “some” there than ever before), I’ve noticed a trend towards more generalized data models.  This means that instead of having a box for almost every noun in our business, we have concepts that have categories.  Drawing examples from the ARTS Data Model, instead of having entities for:

  • Purchase Order
  • Shipping Notice
  • Receipt
  • Invoice
  • etc.

…we have one entity, InventoryControlDocument, with a DocumentType instance of Purchase Order, Shipping Notice, Receipt, Invoice, etc.

See what we did there?  We took metadata that was on the diagram as separate boxes and turned them into rows in a table in the database.  This is brilliant, in some form, because it means when the business comes up with a new type of document we don’t have to create a new entity and a new table to represent that new concept.  We just add a row to the DocumentType table and we’re done.  Well, not exactly…we probably still have to update code to process that new type…and maybe add a new user interface for that…and determine what attributes of InventoryControlDocument apply to that document type so that the code can enforce the business rules.
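To make the trade-off concrete, here’s a minimal sketch of what that generalized design might look like as SQL Server DDL. The table and column names are my own simplification for illustration, not the actual ARTS schema:

    -- Hypothetical, simplified version of the generalized pattern
    CREATE TABLE DocumentType (
        DocumentTypeCode varchar(10)   NOT NULL PRIMARY KEY,
        Description      varchar(100)  NOT NULL
    );

    CREATE TABLE InventoryControlDocument (
        DocumentID       int           NOT NULL PRIMARY KEY,
        DocumentTypeCode varchar(10)   NOT NULL
            REFERENCES DocumentType (DocumentTypeCode),
        DocumentDate     date          NOT NULL,
        VendorID         int           NULL,   -- applies only to some document types
        TotalAmount      decimal(12,2) NULL
    );

    -- A new kind of business document is now a row, not a new table:
    INSERT INTO DocumentType (DocumentTypeCode, Description)
    VALUES ('PO',   'Purchase Order'),
           ('ASN',  'Shipping Notice'),
           ('RCPT', 'Receipt'),
           ('INV',  'Invoice');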

Ah! See what we did there this time?  We moved responsibility for managing data integrity from the data architect to the coders.  Sometimes that’s great and sometimes, well, it just doesn’t happen.

So my primary reason to raise generalization as an issue is that sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems.  Just because you engineered a requirement from a table to a row does not mean it is no longer your responsibility.  I’ve even seen architects become so enamoured with moving the work from their plate to another’s that they have generalized the heck out of everything while leaving the data quality responsibility up to someone else.  That someone else typically is not measured or compensated for data integrity, either.
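One way to keep at least some of that responsibility in the database is to write the per-type rules as constraints. A minimal sketch, using an invented rule that only purchase orders carry a vendor:

    -- Sketch: enforce a type-specific business rule in the database,
    -- rather than leaving it entirely to application code
    ALTER TABLE InventoryControlDocument
    ADD CONSTRAINT CK_ICD_VendorByType CHECK (
        (DocumentTypeCode = 'PO'  AND VendorID IS NOT NULL)
     OR (DocumentTypeCode <> 'PO' AND VendorID IS NULL)
    );

Constraints like this won’t cover every rule a subtype used to carry, which is exactly why the governance conversation still has to happen.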

Alec Sharp has written a few blog posts on generalization. These posts have some great examples of his 5 Ways to Go Wrong with Generalisation.   I especially like his use of the term literalism, since I never seem to get the word specificity out when I’m speaking. I recommend you check out all 5 ways, since I agree with each of them.

1 – Failure to generalize, a.k.a. literalism

2 – Generalizing too much

3 – Generalizing too soon

4 – Confusing subtypes with roles, states, or other multi-valued characteristics

5 – Applying subtyping to the wrong entity

By the way, Len Silverston and Paul Agnew talk about levels of generalization in their book The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling.  Generalization isn’t just a yes/no question.  Every data model structure you architect has a level of generalization.

I’m wondering: how many of you have used a higher level of generalization, and what have you done to ensure that the metadata you transformed into data still has integrity?  Leave your recommendations in the comments.

Creating an ERwin Data Modeler Template

Nov 16, 2010   //   by admin   //   Blog, Data, Data Modeling  //  3 Comments

You probably have some default settings or shop standards (Yes, I have standards!) for your ERwin data models.  A time-saving method for ensuring that you don’t have to set those defaults for every model file is to create a data model template, then use that when you create a new model.

The steps for creating and using a template:

  1. Open ERwin Data Modeler with a blank model (choose the type: Logical, Physical, or Logical/Physical)
  2. Set your defaults, create standard objects like domains or UDTs or UDPs – anything you’d like to appear or be enforced in future data models that make use of this template.
  3. Save the model as a template:

    Save As Template

    Notice that I’ve chosen to use underscores in the template file name.  ERwin seems to force the name to all upper case, so spaces or underscores will be needed to make the file name readable when you go to open it again.  The file will be saved with a special extension, .erwin_tmpl.  This tells ERwin it is a template file.

    Once you have saved this, it will be available on the FILE, NEW menu option for use when you create new models.

  4. Once you have this open, you’ll need to remember to do an immediate FILE, SAVE AS… to save it as a regular .erwin file.

At this point you have created an ERwin template file and used it to create a model that will have the reusable objects you want to share as well as enforce some standards.

Setting the Default Datatype in ERwin Data Modeler

Nov 15, 2010   //   by Karen Lopez   //   Blog, Data, Data Modeling  //  1 Comment

Out of the box, CA ERwin Data Modeler has a default datatype of char(18).  This default is used when you create an attribute or column in a data model and don’t specifically assign a datatype to that attribute.

I don’t know this for a fact, but I’m guessing the developers of ERwin chose this odd length so it would stand out as being a bit odd, reminding modelers to choose a “real” datatype and length at some point.  You can see in the figure below that I’ve added an attribute named test to the MOVIE entity, but did not assign a datatype, so ERwin gave it the default datatype of char(18).

MOVIE Entity

However, one of the problems with this is that it is still not odd enough.  One of the tips I picked up while speaking at the BeNeLux Modeling User Group was that you can set the default datatype in ERwin to something especially funky: char(-1).  That datatype is certain to:

  • Stand out in both the data model diagram and any DDL
  • Not generate valid DDL
  • Fail any DDL testing
  • Make a DBA really mad
  • Motivate database designers to set real datatypes

Sounds to me like the perfect datatype for being a default.
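To illustrate, here’s what happens if a char(-1) column slips through into generated DDL (a hypothetical MOVIE table):

    -- This DDL will not run; SQL Server rejects the negative length,
    -- which is exactly the point of using char(-1) as the default
    CREATE TABLE MOVIE (
        MovieID int      NOT NULL PRIMARY KEY,
        test    char(-1) NULL
    );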

Defaults in ERwin are model-specific, meaning that you set them on a .erwin file basis.  To do this in ERwin 7.3, you must first open your model, then go to MODEL, MODEL PROPERTIES:

 

MODEL, MODEL Properties

 

This will bring up the properties dialog.  Choose the DEFAULTS tab.

Defaults Tab

In the Default datatype section above, you can set the default datatype to char(-1) or whatever makes sense for your project.  I’ve seen some people make a good guess at the most generic datatype, something like varchar(50), but I prefer to assign an oddball datatype so that I can find these “unassigned” attributes and columns easily.  Note that you can also set separate defaults for the logical and physical models.  Since I’m choosing to set the default to an invalid datatype, I set them the same.

Having said this, I rarely create attributes manually. I usually create attributes via the domain approach, where domains have a “best guess” standard datatype that makes sense for the type of data contained in them.  For instance, a domain for email address would normally be varchar(254) in a model, since that is the Internet standard for email addresses.  However, attributes do get created without a domain all the time, so it makes sense to have a default datatype that meets your needs.
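On the physical side, a domain like that can be carried into SQL Server as a user-defined data type, so every column built on it inherits the standard length. A small sketch (the EmailAddress name is my own example):

    -- A domain-style user-defined data type for e-mail addresses
    CREATE TYPE dbo.EmailAddress FROM varchar(254) NOT NULL;

    CREATE TABLE Customer (
        CustomerID   int NOT NULL PRIMARY KEY,
        EmailAddress dbo.EmailAddress
    );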

Having set my default datatype in my model to char(-1), when I create a new attribute and don’t assign a datatype it will be set to the default as shown in the MOVIE entity below:

Movie Entity

The key to using an invalid datatype is to ensure that you’ve circled back and set them all to real datatypes and lengths before you derive a new model from them, generate DDL, or compare to a database.

You can use a report or the ERwin Query Tool to find all the attributes that have this datatype:

ERwin Query Tool
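And if a few of them make it all the way into a generated database, a quick catalog query can flag them. A sketch for SQL Server, checking for the out-of-the-box char(18) default:

    -- Find columns that still carry the tell-tale default datatype
    SELECT TABLE_NAME, COLUMN_NAME
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  DATA_TYPE = 'char'
      AND  CHARACTER_MAXIMUM_LENGTH = 18;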

More on the ERwin Query Tool in another blog post soon.

Getting SQL Server Help from the Best, Simply the Best

Nov 15, 2010   //   by Karen Lopez   //   Blog, Database, Social Networking  //  2 Comments

"Tina Turner" at SQLPASS Do you have a question about SQL Server database design or how things work?  Are you experienced in another RDMBS and are trying to apply a DB2/Oracle/Whatever design principle to SQL Server?

If you have a Twitter account you can post your question (in 140 characters or less) with the #SQLHelp hashtag and most likely someone will jump in to answer the question.

Thousands of SQL Server experts (many of whom do not dress as Tina Turner) want you to be a SQL Server success, so they monitor this hashtag to answer your question or give you tips on how to address an issue.  Of course, the format of Twitter means that you probably won’t want to ask about the meaning of life (A: 42) or big, complex issues, but even then there’s probably a blog post that someone can point you to.

Some recent examples of questions and answers to this tag:

Recent #SQLHelp questions and answers

I’ve made use of this “service” on Twitter a few times when I needed help with a physical design consideration.  It’s great that the SQL Community is willing to help like this.  Perhaps we should be posting questions under hashtags like #ERwinHelp and #ERSStudioHelp, too.

I’m SQLPASSed out – My Trip Report from Seattle

Nov 15, 2010   //   by Karen Lopez   //   Blog, Database, Professional Development, Social Networking, Speaking  //  2 Comments

Whew! 

I spent last week in Seattle attending the Professional Association for SQL Server (PASS) Summit (as if you could not tell from my tweet stream). Like most successful conferences, there was more to it than a bazillion sessions.  There were so many events: vendor events, unofficial events, pre-cons, post-cons, impromptu events and more.  I feel as if I couldn’t stretch myself far enough to take advantage of all of them.  Did I mention there were sessions, too?

We arrived in Seattle on Friday so that we could do our obligatory cross-border shopping at our favourite not-available-in-Canada retailers: Fry’s Electronics, Kohl’s, Target and Trader Joe’s.  It is my goal to shop at every Fry’s in the US; so far I have tackled more than half of them.  We didn’t buy much there, but it’s like visiting a gadget museum.  Geekiness for the win.  We spent time on Saturday shopping as well.  We were doing our best to help stimulate the US economy.

We also spent time visiting with friends, including Yanni Robel, John Robel, Corey Smith and Eva Smith.  We know Yanni and John from the SQLCruise.  We know Corey and Eva from numerous DAMA events.  But we built these friendships through the conversations we have on Twitter and Facebook.

On Sunday I participated in Freecon, a special event for bloggers and social media celebrities from the SQL Server world.  This event, hosted by Brent Ozar, covered topics such as developing content, dealing with plagiarism, monetizing writing, and consulting skills.  I’d love to see this event develop into a writers’ guild for those of us putting content out there. 

Monday was the opening session for the PASS Summit.  There was a Quiz Bowl “panel”, where SQL “elites” played a Jeopardy-like trivia game.  I’d love to see Enterprise Data World (EDW) offer a similar opening session event.

I’m not going to list all the sessions I attended, but I wanted to point out that there were several sessions that focused on data architecture topics.  My own session on “Starting with More than a Blank Page” focused on how to best adopt data model patterns/industry standard data models.  A session by Louis Davidson covered Intro to Database Design.  Normalization is a tough topic to cover in just over an hour and I thought he did a great job helping people visualize why the normal forms exist. 

It took me a while, but I managed to meet up with Neil Buchwalter, Product Manager of CA ERwin Data Modeler.  CA is a founding organization of PASS and Neil has a spot on the PASS Board.

Wednesday was SQLKilt day.  What a hoot. There were many attendees dressed in kilts, mostly just because they could, but also in honour of the Women in IT.  In fact, Jen McCown of MidnightDBA fame produced t-shirts to link the kilts to WIT.  That’s Sean McCown there, sporting a kilt and the t-shirt that asks “What are you doing” for women in IT.

Did I mention that several people found new jobs/gigs right at the conference?  There was a job board in the exhibits area, but most of these “connections” happened ad hoc during the lunches, coffee breaks and informal get-togethers that surrounded the event.  As I tweeted, #Network to #Getwork.

Network to Getwork

As I have blogged before, one of the best things about attending SQLPASS was that Rob and I knew hundreds of the attendees, even though it was my first time attending this event.  The connections we made on Twitter, Facebook and LinkedIn prior to the conference paid off in spades for us during the event.  Sure, I was a first timer, but it wasn’t a bunch of first conversations. For me, this is the real power of tying together virtual and in-person events.

I’m bringing home a DVD of the new version of SQL Server, codenamed “Denali”.  I’m looking forward to getting it installed, then moving on to all the evaluation licenses I received for third-party tools.

I will definitely be back for next year’s PASS Summit, again in Seattle in October 2011, and for SQLRally 2011 in Orlando in May.  I want to continue learning about SQL Server and making connections with industry leaders in this space.

New Blog Location

Nov 3, 2010   //   by Karen Lopez   //   Blog, Data  //  No Comments

While we’ve been using DotNetNuke as both our blog and website content management system for a long time, we are taking the plunge and moving our blog content off DNN and on to a self-hosted WordPress platform.

This is going to allow us to take advantage of better posting and reading features, as well as better support for multiple bloggers (like letting Rob Drysdale (@projmgr) post under his real name instead of mine).

You’ll also find it easier to comment on and participate in discussions about our blog posts. Remember, responding to and giving feedback to a blogger is one of the greatest gifts you can give to support more blogging.

If you subscribe to our blog via Feedburner, you won’t have to change anything: we’ve updated the feed link for you.  If you manually surf to our blog, the new address is:

www.datamodel.com

Overall this shouldn’t be a huge change, other than making it easier for us to post more relevant content.

If you have any questions or have any issues with this location change, please let us know.
