I have a couple of presentations where I describe how generalized data modeling can offer both benefits and unacceptable costs. In my Data Modeling Contentious Issues presentation, the one where we vote via sticky notes, we debate the trade-offs of generalization in a data model and database design. In 5 Classic Data Modeling Mistakes, I talk about over-generalization.
Over the last 20-some years (and there’s more “some” there than ever before), I’ve noticed a trend towards more generalized data models. This means that instead of having a box for almost every noun in our business, we have concepts that have categories. Drawing examples from the ARTS Data Model, instead of having entities for:
- Purchase Order
- Shipping Notice
…we have one entity for InventoryControlDocument that has a DocumentType instance of Purchase Order, Shipping Notice, Receipt, Invoice, etc.
See what we did there? We took metadata that was on the diagram as separate boxes and turned them into rows in a table in the database. This is brilliant, in some form, because it means when the business comes up with a new type of document we don’t have to create a new entity and a new table to represent that new concept. We just add a row to the DocumentType table and we’re done. Well, not exactly…we probably still have to update code to process that new type…and maybe add a new user interface for that…and determine what attributes of InventoryControlDocument apply to that document type so that the code can enforce the business rules.
Ah! See what we did there this time? We moved responsibility for managing data integrity from the data architect to the coders. Sometimes that’s great and sometimes, well, it just doesn’t happen.
So my primary reason to raise generalization as an issue is that sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems. Just because you engineered a requirement from a table to a row does not mean it is no longer your responsibility. I’ve even seen architects become so enamoured with moving the work from their plate to another’s that they have generalized the heck out of everything while leaving the data quality responsibility up to someone else. That someone else typically is not measured or compensated for data integrity, either.
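To make the trade-off concrete, here is a minimal sketch (not the ARTS model itself) of a generalized document table where some of the rules that moved from the diagram into data are still enforced by the database. The column names `ship_date` and `amount_due`, the type codes, and the rule that each type requires a particular attribute are all assumptions made up for illustration. It uses Python's built-in `sqlite3` module.

```python
import sqlite3


def build_schema(conn):
    """Create a generalized document table plus its type lookup table."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE document_type (
            document_type_code TEXT PRIMARY KEY,
            description        TEXT NOT NULL
        );

        CREATE TABLE inventory_control_document (
            document_id        INTEGER PRIMARY KEY,
            document_type_code TEXT NOT NULL
                REFERENCES document_type (document_type_code),
            ship_date          TEXT,  -- hypothetical attribute
            amount_due         REAL,  -- hypothetical attribute
            -- Hypothetical business rules, kept in the database:
            CHECK (document_type_code <> 'SHIPPING_NOTICE'
                   OR ship_date IS NOT NULL),
            CHECK (document_type_code <> 'INVOICE'
                   OR amount_due IS NOT NULL)
        );

        INSERT INTO document_type VALUES
            ('PURCHASE_ORDER',  'Purchase Order'),
            ('SHIPPING_NOTICE', 'Shipping Notice'),
            ('INVOICE',         'Invoice');
        """
    )


def try_insert(conn, type_code, ship_date=None, amount_due=None):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO inventory_control_document "
                "(document_type_code, ship_date, amount_due) VALUES (?, ?, ?)",
                (type_code, ship_date, amount_due),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    print(try_insert(conn, "PURCHASE_ORDER"))            # accepted
    print(try_insert(conn, "INVOICE"))                   # rejected: no amount_due
    print(try_insert(conn, "INVOICE", amount_due=10.0))  # accepted
    print(try_insert(conn, "PACKING_SLIP"))              # rejected: unknown type
```

Notice what the sketch also shows: adding a new row to `document_type` really is a one-line change, but the rules for that new type do not exist until someone writes them. Someone still has to own that step.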
Alec Sharp has written a few blog posts on Generalizations. These posts have some great examples of his 5 Ways to Go Wrong with Generalisation. I especially like his use of the term literalism since I never seem to get the word specificity out when I’m speaking. I recommend you check out his 5 reasons, since I agree with all of them.
1 – Failure to generalize, a.k.a. literalism
2 – Generalizing too much
3 – Generalizing too soon
4 – Confusing subtypes with roles, states, or other multi-valued characteristics
5 – Applying subtyping to the wrong entity.
By the way, Len Silverston and Paul Agnew talk about levels of generalization in their book The Data Model Resource Book, Vol 3: Universal Patterns for Data Modeling (affiliate link). Generalization isn’t just a yes/no position. Every data model structure you architect has a level of generalization.
I’m wondering how many of you have used a higher level of generalization, and what you’ve done to ensure that the metadata you transformed into data still has integrity.
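One possible approach, sketched here under made-up assumptions, is to keep the "which attributes apply to which type" rules as data too and audit documents against them on a schedule. The rule set, attribute names, and document shape below are hypothetical and only illustrate the idea.

```python
# Hypothetical rules: the attributes each document type must carry.
REQUIRED_ATTRIBUTES = {
    "PURCHASE_ORDER": {"supplier_id"},
    "SHIPPING_NOTICE": {"supplier_id", "ship_date"},
    "INVOICE": {"supplier_id", "amount_due"},
}


def audit_documents(documents, rules=REQUIRED_ATTRIBUTES):
    """Return a list of (document_id, problem) tuples for rule violations."""
    findings = []
    for doc in documents:
        doc_id = doc.get("document_id")
        doc_type = doc.get("document_type")
        required = rules.get(doc_type)
        if required is None:
            # A type nobody has written rules for is itself a finding.
            findings.append((doc_id, f"unknown document type {doc_type!r}"))
            continue
        missing = sorted(name for name in required if doc.get(name) is None)
        if missing:
            findings.append((doc_id, "missing " + ", ".join(missing)))
    return findings


if __name__ == "__main__":
    sample = [
        {"document_id": 1, "document_type": "INVOICE",
         "supplier_id": 7, "amount_due": 125.0},
        {"document_id": 2, "document_type": "SHIPPING_NOTICE",
         "supplier_id": 7},
        {"document_id": 3, "document_type": "PACKING_SLIP"},
    ]
    for finding in audit_documents(sample):
        print(finding)
```

An audit like this only reports problems after the fact, so it complements database constraints rather than replacing them.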
Leave your recommendations in the comments.
Update: I updated the link to Alec’s blog post. Do head over there to read his points on generalization.
This news arrived today:
Jefferies is also leading a US$425m covenant-lite credit to back Idera’s acquisition of Embarcadero Technologies. Idera is backed by TA Associates. The deal, which launches on Thursday, includes a US$25m revolving credit, a US$300m first-lien term loan and a US$100m second-lien term loan.
So last year we had Embarcadero attempting to purchase ERwin from CA, and now we have Idera, maker of SQL Server-focused database solutions, moving towards buying Embarcadero.
The Embarcadero-buying-ERwin deal fell through, in part, due to regulatory concerns over market consolidation of the database/data modeling tool business. I’m wondering how regulators will feel about this consolidation of tools.
I’ve worked with both vendors in the past. Both are based in Austin, TX. Standing by to see what happens next.
UPDATE: I’m now seeing official communications about the sale, with a very aggressive closing date. This is in contrast to the prolonged, ultimately failed acquisition attempt by Embarcadero for CA’s ERwin Data Modeler product.
Thoma Bravo, a leading private equity investment firm, today announces the sale of Embarcadero Technologies, a leading provider of software solutions for application and database development, to Idera, Inc., an application and server management software provider. The exit represents the culmination of a long and successful partnership between Thoma Bravo and Embarcadero Technologies. The deal is expected to close in mid-October, subject to normal closing conditions and approvals.
Are you an educator teaching at an accredited university or college? Do you want Azure cloud subscription grants for you and your students? All you need to do is register with some basic information at:
The current grant amounts (subject to change and regional differences, of course) for reference:
If you teach a technology-related course, you may be eligible for these grants. Remember, Azure includes more than Microsoft technologies, so these grants can cover a wide variety of course subjects.
…looks like to me.
Sure, you’ve got your own home-grown database security system all designed and working in development. And then you ask me to confirm that it’s “safe”. I’ll tell ya “it’s safe as long as you don’t actually put any data in it”.
I get asked to help teams increase the performance of their database (hint: indexes, query tuning and correct datatypes, in that order) or to help them scale it out for increasing workloads. But when I open it up to take a look, I see something that looks more like this meme.
All those cheats, workarounds and tricks they’ve used are going to make the engine optimizers work harder, make the tuning of queries that much harder and in the end it’s going to cost so much more to make it “go faster” or “go web scale”.
Where are the nail clippers in your data models and databases?
I’ve been posting about this for the last few years and I’ve finally carved out some time during my staycation to decommission our discussion group servers. This is long overdue, I know. But like any caring ListMistress, it’s been hard to say “it’s time”. It’s about 5 years past time, actually.
I know that some of you have expressed an interest in having me just continue to host these with no updates. But the technology the WebBoard software runs on is too old and out of support to do so. While there have been several physical servers over the years (starting with just mailing list software running on a PC in my basement), the vendor of the most recent software has been out of business for more than seven years. The software was last running on Windows 2003 and SQL Server 2000. And while I could likely install the software on an updated server, the installation process for this application requires a call home to a mothership that has long left the universe. So that’s not an option. There are also other considerations, in that the original vendor took no steps to make the software very security-mindful, and that has always bothered me.
The server (and database) I’m decommissioning today was put into production in 1998. Clinton was having a bad year, the International Space Station was just being built and InfoAdvisors had been incorporated for about a year.
I wish I had time to sort through some of our posts to see what the most fun, debatable or encouraging ones were. But what I do remember is that we built a community of data professionals who worked to make sure that everyone else working in the data field had the right resources to be successful. I will keep my database backup around and might spend some time rooting through it to find some gems. If you have some memories you’d like to share, please do so in the comments below.
Many of you were just lurkers, reading the content, occasionally asking for help (printing with ERwin, getting macros to work in ER/Studio, figuring out what the heck a conceptual model is, etc.) But some of you did wonderful things by answering so many posts and providing user-to-user support to help others get stuff done (image shows some of our most frequent posters). And some of you came for the debate. You know who you are.
I thought I’d share some stats with you about our community. Not all of these were data boards, but since our non-data ones were trivial, I’m not going to bother filtering them out. We’ve archived a great deal of content over the decades, so, again, this is data active on the server right now, not over the entire life of our communities.
Registered Users: 10,175
These also were adjusted over the years, but we hosted communities for:
Casewise Corporate Modeler
Data Modeling (various boards)
Other Data Modeling Tools (various boards)
Rational/InfoSphere Data Architect
Unlike almost all other online communities, we actively moderated every post to our boards. That means that a human being read every post to ensure it was on topic and not spam. We could not have done that without the help of our volunteer moderators:
Many of us are still active on other boards and social media. You should reach out to them and say thanks. They made this all happen.
I looked at web-based discussion software for my blog. I may still install some, but they all miss the feature that I really want – Email and web-based discussions, all integrated. The other issue is that there are now so many places on the web with data-focused discussions that I’m not sure standing up another one will add much value.
Here are some of the places you can go to get some data modeling community vibes:
There are also the usual internet locations of LinkedIn, Facebook and Twitter. But most of these are, let’s be candid, full of spam. I can’t really recommend any single source there.
I’m also still the moderator of dm-discuss on Yahoo Groups. I suggest you join that group if you are looking for vendor-independent discussions about data management and modeling.
I ran the infrastructure for these online communities, but you, readers and sometimes posters, delivered the content, which was the most important part. I’m hanging up my ListMistress tiara and using my Twitter to influence IT professionals to love their data now. I encourage you to find some non-data oriented communities and start influencing them to think about data, too. Then join some of the data ones and start helping each other, too.
I’m still here, still loving data. It’s just the server that is moving to a farm where it can play with other servers. I hope to see you in one of these other communities.
So Long, and Thanks for All the Data
My Dataversity Heart of Data Modeling webinar this month was titled The Best Data Modeler is a Lazy Data Modeler.
In this presentation I discuss tips for automating more of the mindless tasks in data modeling (printing, publishing, complex but rote naming of objects and more). My rules for when to automate a task:
- Don’t spend time doing things that a computer is faster and better at
- Automation is your friend
- Don’t try to automate everything at once
- Don’t try to rebuild an entire data modeling tool in a script
- Focus modeling time on mindful things, not mindless ones
- If you’ve automated it, you must ask vendors to make it a feature in their tool
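As a small illustration of the rote-naming kind of task, here is a minimal sketch of a script that derives physical column names from logical attribute names. The abbreviation list, the underscore convention, and the 30-character limit are all hypothetical; a real naming standard would supply its own. It is deliberately tiny, in the spirit of not rebuilding a modeling tool in a script.

```python
import re

# Hypothetical abbreviation standard: logical word -> physical abbreviation.
ABBREVIATIONS = {
    "customer": "CUST",
    "order": "ORD",
    "number": "NBR",
    "identifier": "ID",
    "description": "DESC",
}


def physical_name(logical_name, abbreviations=ABBREVIATIONS, max_length=30):
    """Turn a logical name such as 'Customer Order Number' into 'CUST_ORD_NBR'."""
    # Split on anything that is not a letter or digit.
    words = re.findall(r"[A-Za-z0-9]+", logical_name)
    if not words:
        raise ValueError("logical name contains no usable words")
    # Abbreviate known words, upper-case the rest.
    parts = [abbreviations.get(word.lower(), word.upper()) for word in words]
    name = "_".join(parts)
    if len(name) > max_length:
        # Flag it for a human instead of silently truncating.
        raise ValueError(f"{name!r} is longer than {max_length} characters")
    return name


if __name__ == "__main__":
    for logical in ("Customer Order Number", "Shipping Notice Date"):
        print(logical, "->", physical_name(logical))
```

Names that break the length rule raise an error rather than being truncated, which leaves the mindful decision with the modeler and the mindless part with the computer.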
Check out the recording when it goes live this week. And if you have examples of automation that we didn’t cover, let me know. I’d love to talk about them (and use them in my own data modeling activities).