One of the most clichéd blogging tricks is to declare something popular as dead. These desperate, clickbait posts are a favourite of click-focused bloggers, but they aren’t my style. Yet here I am, writing an “is dead” post. Today I’m sharing my responses to ongoing social media posts. They go something like this:
OP: No one loves my data models any more.
Responses: Data modeling is dead. Or…data models aren’t agile. Or…data models died with the waterfalls. Or…only I know how to do data models and all of you are doing it wrong, which is why they just look dead.
I bet I’ve read that sort of conversation at least a hundred times, first on mailing lists, then on forums, now on social media. It has been an ongoing battle for modelers since data models and dirt were discovered…invented…developed.
I think our issue around the love for data modeling, and logical data models specifically, is that we try to make these different types of models into different tasks. They aren’t. In fact, there are many types, many goals, and many points of view about data modeling. So as good modelers, we should first seek to understand what everyone in the discussion means by that term. And what do you know, even this fact is contentious. More on that in another post.
I do logical data modeling when I’m physical modeling. I don’t draw a whole lot of attention to it – it’s just how modeling is done on my projects.
Data Modeling is Dead Discussion
One current example of this discussion is taking place right now over on LinkedIn. Abhilash Gandhi posted:
During one of my project, when I raised some red flags for not having Logical Data Model, I was bombarded with comments – “Why do we need LDM”? “Are you kidding”? “What a waste of time!". The project was Data Warehouse with number of subject areas; possibility of number of data marts.
I have put myself into trouble by trying to enforce best practices for Data Modeling, Data Definitions, Naming Standards, etc. My question, am I asking or trying to do what may be obsolete or not necessary? Appreciate your comments.
There are responses that primarily back up the original poster’s feelings of being unneeded on modern development projects. Then I added another viewpoint:
I’ll play Devil’s advocate here and say that we Data Architects have also lost touch with the primary way the products of our data modeling efforts will be used. There are indeed all kinds of uses, but producing physical models is the next step in most. And we have lost the physical skills to work on the physical side. Because we let this happen, we also have failed to make physical models useful for teams who need them.
We just keep telling the builders how much they should love our logical models, but have failed to make the results of logical modeling useful to them.
I’ve talked about this in many of my presentations, webinars (sorry about the autoplay, it’s a sin, I know) and data modeling blog posts. It’s difficult to keep up with what’s happening in the modern data platform world. So most of us just haven’t. It’s not that we need to be DBAs or developers. We should, though, have a literacy level of the features and approaches to implementing our data models for production use. Why? I addressed that as well. Below is an edited version of my response:
We Don’t All Have to Love Logical Data Modeling
First of all, the majority of IT professionals do not need to love an LDM. They don’t even need to need them. The focus of the LDM is the business steward/owner (and if I had my way, the customer, too). But we’ve screwed up how we think of data models as artefacts that are "something done on an IT project". Sure, that’s how almost all funding gets done for modeling, and it’s broken. But it’s also a fact of life for the relatively immature world of data modeling.
We beat developers and project managers over the head with our logical data modeling, then ask them “why don’t you want us to produce data models?” We use extortion to get our beautiful logical data models done, then sit back and wonder why everyone sits at another lunch table.
I don’t waste time or resources trying to get devs, DBAs or network admins to love the LDMs. When was the last time you loved the enterprise-wide AD architecture? The network topology? The data centre blueprints and HVAC diagrams?
Data Models form the infrastructure of the data architecture, as do conceptual models and all the models made that would fill the upper rows of the Zachman Framework. We don’t force the HVAC guys to wait to plan out their systems until a single IT application project comes along to fund that work. We do it when we need a full plan for a data centre. Or a network. Or a security framework.
But here we are, trying to whip together an application with no models. So we tell everyone to stop everything while we build an LDM. That’s what’s killing us. Yes, we need to do it. But we don’t have to do it in a complete waterfall method. I tell people I’m doing a data model. Then I work on both an LDM and the PDM at the same time. The LDM I use to drive data requirements from business owners, the PDM to start to make it actually work in the target infrastructure. Yes, I LDM more at first, but I’m still doing both at the same time. Yes, the PDM looks an awful lot like the LDM at first.
Stop Yelling at the Clouds
The real risk we take is sounding like old men yelling at the clouds when we insist on working and talking like it is 1980 all over again. I do iterative data modeling. I’m agile. I know it’s more work for me. I’d love the luxury of spending six months embedded with the end users, coming up with a perfect and lovely logical data model. But that’s not the project I’ve been assigned to. It’s not the team I’m on. Working against the team invites a demand that no data modeling be done at all, and that database and data integration work be done by non-data professionals. You can stand on your side of the cubicle wall, screaming about how LDMs are more important, or you can use the data modeling skills you have to make it work.
When I’m modeling, I’m working with the business team drawing out more clarity of their business rules and requirements. I am on #TeamData and #TeamBusiness. When the business sees you representing their interests, often to a hostile third party implementer, they will move mountains for you. This is the secret to getting CDMs, LDMs, and PDMs done on modern development projects. Just do them as part of your toolkit. I would prefer to data model completely separately from everyone else. I don’t see that happening on most projects.
The #TeamData Sweet Spot
My sweet spot is to get to the point where the DBAs, Devs, QA analysts and Project Managers are saying "hey, do you have those database printouts ready to go with DDL we just delivered? And do you have the user ones, as well?" I don’t care what they call them. I just want them to call them. At that point, I know I’m also on #TeamIT.
The key to getting people to at least appreciate logical data models is to just do them as part of whatever modeling effort you are working on. Don’t say “stop”. Just model on. Demonstrate, don’t just tell: show your teams where the business requirements are written down, where they live. Then demonstrate how that leads to beautiful physical models as well.
Logical Data Modeling isn’t dead. But we modelers need to stop treating it like it’s a weapon. Long Live Logical!
I have a couple of presentations where I describe how generalized data modeling can offer both benefits and unacceptable costs. In my Data Modeling Contentious Issues presentation, the one where we vote via sticky notes, we debate the trade-offs of generalization in a data model and database design. In 5 Classic Data Modeling Mistakes, I talk about over-generalization.
Over the last 20 some years (and there’s more “some” there than ever before), I’ve noticed a trend towards more generalized data models. This means that instead of having a box for almost every noun in our business, we have concepts that have categories. Drawing examples from the ARTS Data Model, instead of having entities for:
- Purchase Order
- Shipping Notice
…we have one entity for InventoryControlDocument that has a DocumentType instance of Purchase Order, Shipping Notice, Receipt, Invoice, etc.
See what we did there? We took metadata that was on the diagram as separate boxes and turned them into rows in a table in the database. This is brilliant, in some form, because it means when the business comes up with a new type of document we don’t have to create a new entity and a new table to represent that new concept. We just add a row to the DocumentType table and we’re done. Well, not exactly…we probably still have to update code to process that new type…and maybe add a new user interface for that…and determine what attributes of InventoryControlDocument apply to that document type so that the code can enforce the business rules.
Ah! See what we did there this time? We moved responsibility for managing data integrity from the data architect to the coders. Sometimes that’s great and sometimes, well, it just doesn’t happen.
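The shift described above can be sketched in code. Here is a minimal sketch using SQLite, with invented names (`type_code`, `vendor_id`, `carrier` are illustrative, not from the ARTS model): the generalized table lets us add a new document kind with a single row, but the per-type business rule that the old one-entity-per-noun design carried now has to live in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generalized design: document kinds are rows, not separate tables
    CREATE TABLE DocumentType (
        type_code TEXT PRIMARY KEY,
        type_name TEXT NOT NULL
    );
    CREATE TABLE InventoryControlDocument (
        doc_id    INTEGER PRIMARY KEY,
        type_code TEXT NOT NULL REFERENCES DocumentType(type_code),
        vendor_id INTEGER,   -- meaningful only for purchase orders
        carrier   TEXT       -- meaningful only for shipping notices
    );
""")
conn.executemany("INSERT INTO DocumentType VALUES (?, ?)",
                 [("PO", "Purchase Order"), ("SHIP", "Shipping Notice")])

# Adding a brand-new document kind is just a row, with no DDL change...
conn.execute("INSERT INTO DocumentType VALUES ('INV', 'Invoice')")

# ...but the rules each entity used to enforce now live in code.
REQUIRED = {"PO": "vendor_id", "SHIP": "carrier"}

def insert_document(conn, type_code, **attrs):
    """Enforce the per-type rule the generalized schema no longer can."""
    required = REQUIRED.get(type_code)
    if required and attrs.get(required) is None:
        raise ValueError(f"{type_code} documents require {required}")
    conn.execute(
        "INSERT INTO InventoryControlDocument (type_code, vendor_id, carrier)"
        " VALUES (?, ?, ?)",
        (type_code, attrs.get("vendor_id"), attrs.get("carrier")))

insert_document(conn, "PO", vendor_id=42)   # fine
try:
    insert_document(conn, "PO")             # missing vendor: only code catches it
except ValueError as e:
    print(e)
```

Note where the integrity check ended up: not in the schema, but in `insert_document`. That is exactly the responsibility transfer the next paragraph is about.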
So my primary reason to raise generalization as an issue is that sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems. Just because you engineered a requirement from a table to a row does not mean it is no longer your responsibility. I’ve even seen architects become so enamoured with moving the work from their plate to another’s that they have generalized the heck out of everything while leaving the data quality responsibility up to someone else. That someone else typically is not measured or compensated for data integrity, either.
Sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems
Alec Sharp has written a few blog posts on Generalizations. These posts have some great examples of his 5 Ways to Go Wrong with Generalisation. I especially like his use of the term literalism since I never seem to get the word specificity out when I’m speaking. I recommend you check out his 5 reasons, since I agree with all of them.
1 – Failure to generalize, a.k.a. literalism
2 – Generalizing too much
3 – Generalizing too soon
4 – Confusing subtypes with roles, states, or other multi-valued characteristics
5 – Applying subtyping to the wrong entity.
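Sharp’s fourth pitfall, confusing subtypes with roles, is worth a quick sketch. This is my own minimal illustration with invented names, not an example from his posts: subtyping Person into Employee and Customer forces each person into one box, while treating role as a multi-valued characteristic lets one person hold several roles at once.

```python
from dataclasses import dataclass, field

# Pitfall version (don't do this): subtypes for what are really roles.
#   class Employee(Person): ...
#   class Customer(Person): ...
# What happens when an employee also buys from us?

# Role as a multi-valued characteristic instead:
@dataclass
class Person:
    name: str
    roles: set = field(default_factory=set)  # a person can hold many roles

alex = Person("Alex")
alex.roles.update({"Employee", "Customer"})  # both at once, no remodeling
print(alex.roles)
```

The same reasoning applies to states (Active/Inactive) and other characteristics that vary over time: they belong as attributes or related rows, not as subtypes.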
By the way, Len Silverston and Paul Agnew talk about levels of generalization in The Data Model Resource Book, Vol 3: Universal Patterns for Data Modeling (affiliate link). Generalization isn’t just a yes/no position. Every data model structure you architect has a level of generalization.
Every data model structure you architect has a level of generalization.
I’m wondering how many of you have used a higher level of generalization, and what you’ve done to ensure that the metadata you transformed into data still has integrity.
Leave your recommendations in the comments.
Update: I updated the link to Alec’s blog post. Do head over there to read his points on generalization.
I’ve been attending Enterprise Data World for more than 15 years. This event, focused on data architectures, data management, data modeling, data governance and other great enterprise-class methods, is part technical training and part revival for data professionals. It’s just that good.
This year the big bash is being held in Austin, TX, a thriving tech-oriented community, 27 April to 1 May. And this year’s theme is “The Transformation to Data-Driven Business Starts Here.”
And right now there’s a $200 Early Bird Discount going…plus if you use coupon code “DATACHICK” you can save $200 more on a multi-day registration or fifty bucks on a one day pass. There. I just saved you $400. And no, I get no kickbacks with this discount code. I don’t need them. I need you to be at this event, sharing your knowledge and meeting other data professionals. I need you to be part of the community of data professionals.
Top 10 Reasons You Need to Go to EDW 2014
- Data is HOT HOT HOT. I deemed 2013 The Year of Data and I see no signs that organizations are going back to software-is-everything thinking. 2014 is still going to be a year full of data. There’s even an executive, invitation-only CDOvision event co-located.
- Not Just Bullet Points. There are over 20 hours of scheduled networking events for you to chat with other data-curious people. Chatting with other data professionals is my favourite part of this event. Bring your business cards…er… .vcf contact file.
- Lots of Expertise. Not just data celebrities, but also other data professionals with thousands of hours of hands-on experiences, sharing their use cases around data. And not just data modeling. Big Data. Analytics. Methods. Tools. Open Data. Governance. NoSQL. SQL. RDBMS. Fun.
- Certifications. You can take advantage of the Pay-Only-If-You-Pass option for the CDMP on-site certification testing.
- Workshops. I’m doing a half day tutorial on Driving Development Projects with Enterprise Data Models. I’ll be talking about how data models fit within real-life, practical, get-stuff-done development projects. No ivory towers here.
- SIGs. There are special interest groups on data modeling products, industries and methods. You can meet people just like you and share your tips and tricks for data lovin’. I will be leading the ER/Studio SIG.
- Ice Cream. This conference has a tradition of the ice cream break on the exhibit floor. Nice ice cream, even.
- Austin. Austin is one of the more vibrant cities in Texas. So cool, it even has a Stevie Ray Vaughan statue. Museums, Theatres, indoor golf, clubs. There’s a reason why SxSW is held here.
- Vendors. Yes, we love them, too. Meet the product teams of the makers of the tools you use every day. Or meet new teams and ask for a demo. They are good people.
- Love Your Data. There’s no better way to show your love than to network with other data professionals and learn from industry leaders.
Come learn how to help your organization love data better. You might even see me in a lightning talk holding a martini. Or taking impromptu pics of @data_model and other data professionals. Or debating data management strategy with people from around the globe. In other words, talking data. With people who love their data. Join us.
If you’ve been to one of my “Stuff Your Database Says” sessions, you know I collect photos of how my data is messed up by information systems.
Many of my frequent flyer friends can confirm that integration between airlines, even alliance partners, is plagued with problems.
Here’s today’s boarding pass. I call this my “secret international alias”.
Join me and an excellent panel of industry experts in discussing the state of Data Modeling Governance. No, not Data Governance, but Data Modeling Governance. This free webinar is Thursday, 23 May at 2 PM EDT. While the formal part starts at 2 PM, you are free to join us 15 minutes before as we prepare for the event. You can even post your Data Modeling Governance questions then so that we can answer them.
We data architects spend a great deal of time advocating for organizations to treat their data as an asset. We champion the efforts to set up stewardship programs and data governance councils. We insist that data conform to enterprise naming and modeling standards. We enforce data policies, measure data quality, report deficiencies and track anomalies. But do we follow our own advice when it comes to managing “our” data – metadata and data models?
In this webinar, we’ll be tackling the questions of:
- Do you have budget (money and time) to govern the data modeling environment?
- How can we get the resources we need to properly govern our data models?
- Who sets permissions and manages them?
- When does data modeling by email work?
- Are data modeling artefacts part of the production systems operations?
- Are there multiple data modeling tools in your environment? For the right reasons?
- Are we loving our data models as much as our data?
- …plus more.
My panellists this week have a great deal of experience in working on a variety of enterprise environments:
- Anne Marie Smith, Ph.D. is an Information Management professional and consultant with broad experience across industries. She has exceptional, demonstrated skills in business requirements gathering and analysis, data governance and stewardship, data architecture, data and process modeling, strategic data management, meta data management, data quality management, master data management, data warehouse planning and design, project management, and information systems methodology development.
- David Loshin is the President of Knowledge Integrity, Inc, (www.knowledge-integrity.com), a consulting company focusing on customized information management solutions including information quality consulting and training, business intelligence, metadata, and data standards management. David is among Knowledge Integrity’s recognized experts in information management.
- Pete Stiglich is a Senior Healthcare Data Architect at Perficient. He has over 25 years of IT experience, having worked in Enterprise Data Architecture, Data Management, Data Modeling (Conceptual, Logical, Physical, Dimensional, Data Vault, EDW), Data Quality, DW/BI, MDM, Metadata Management, and Database Administration (DBA).
- You! I always consider the audience as the first panellist in a webinar. We have an open, engaging webinar configuration where attendees can chat with each other and ask questions of the panel.
You’ll need to pre-register, but it’s fast and free. Bring your questions, comments, snark and observations. See you Thursday.
Oh, and if you are late reading this and the webinar has already happened: no worries. We record every event and post it to the Dataversity website. You’ll miss all the great chatter in the chat room, though.
Some people believe that in an age of Facebook, Foursquare and Twitter, we should give up all our expectations of privacy. While I agree that I’ve been shocked by the amount of personal information that people share (sometimes even how much I share), I still believe that organizations need to have the right technologies, policies and training in place to protect against abuse of personal and sensitive data.
In a wilful privacy breach in 2011, a clerk at British Columbia’s insurance bureau (ICBC) accessed customer data in order to intimidate employees of another organization. One of the victims has launched legal proceedings against ICBC for failing to have suitable data protections in place. ICBC is a sort of universal automobile insurance organization in BC – everyone who wants a driver’s license there must get their insurance via this organization, so their data collection covers most adult BC residents.
Annette Oliver isn’t just worried about sensitive information being made public, but about how that data was used to terrorize her family and co-workers.
Annette Oliver alleges in her lawsuit that her husband’s van was torched on April 17, 2011, at about 2 a.m., which police believe was an arson.
Then on June 1, 2011, Oliver claims, she was at home when she heard three loud bangs at about 5 a.m. and discovered three bullet holes in the front of her house.
Oliver says her husband and two daughters were home at the time.
This wasn’t an isolated case: others had their cars burned and homes shot.
Three months later, on Dec. 14, 2011, the RCMP revealed the investigation had found a link to an ICBC employee, who allegedly accessed personal information of 65 people, including 13 identified as victims who were targeted.
ICBC said at the time the employee under investigation was a woman who had been at ICBC for 15 years before she was fired in August 2011.
It appears from the lawsuit that ICBC did not use monitoring technologies to monitor access. Or that they weren’t using them correctly. I’m always surprised by organizations that steward customer data and don’t do much to properly care for that data. We’ll see in the end whether or not ICBC had suitable protections.
Myths about Data Protection
- Data privacy breaches don’t really hurt people. This one makes me mad. Even something less physically harmful like having their identities stolen can cause years of trouble for your customers, not to mention great financial harm. But data breaches can and do physically harm people.
- Data privacy is about secrecy. No, data privacy protection is about controlling the usage of data for only the reasons for which it was collected. Among other things.
- If the data is available elsewhere, it doesn’t need to be protected in our database. No, IT professionals still have a duty to protect personal and sensitive data in their care.
- Data wants to be free, so we shouldn’t control how it’s used within the organization. Yeah? My cats want to be free, too. And we still don’t let them outside.
- Data protection is just a technology issue. Data protection is just a training issue. Data protection requires technological, process and people-based solutions.
- Encryption is all we need to do. No, because if people can read the data or download it, it’s not encrypted any more. Encryption helps when people walk away with the data. But people who use the data don’t see encrypted data.
- Data privacy requirements can be applied after the system goes into production. This one drives me crazy. Data protection requires effort at all phases of a project. There are architectural, design, development, deployment and maintenance components to be addressed. There are policies and procedures to be developed. There is monitoring and alerting to be practiced.
You know my mantra. Love your data because it’s not really yours. You have a professional duty to ensure it’s safe.
Read the full story at Metronews
I think we need to have an industry acronym now that this seems to happen every week. My proposals:
- Yet Another USB Breach (YAUB)
- Blame A Thumbdrive (BLAT)
- Yet Another Flashdrive Fail (YAFF)
I like the YAFF one best, so I’m going with that, even though the #FAIL really isn’t in the hardware, but in the abuse of policy and hardware to cause a data breach.
This week’s YAFF announcement comes again from Utah, where a contractor with access to sensitive health data lost a USB flash drive somewhere between Salt Lake City, Denver, and Washington, DC.
What’s different about this news story is that we get more insight as to why that data was on a portable device. And it’s just as I prognosticated in a previous post: the contractor was frustrated with an infrastructure issue.
The contractor, Goold Health Systems, handles Medicaid pharmacy transactions for the Health Department. Department spokesman Tom Hudachko said the GHS employee, identified only as a woman from Denver, was having trouble with an Internet connection Thursday while trying to upload the data to a server. The employee saved the personal information to an unencrypted USB memory stick and left the Health Department with the device. The employee lost the stick sometime in the following days while traveling between Salt Lake City, Denver and Washington, D.C.
The contractor lost her job over this.
People Forget Policy When They Are Frustrated or Stressed
I once found a QA contractor cursing at his computer because he was having trouble sending a large file via his Hotmail account. I offered to help. When he showed me what he was doing I just about had a heart attack. He had been trying to send our offshore contractor a copy of a production database backup. This backup contained names, addresses, phone numbers, credit card information (no, the legacy system shouldn’t have been storing this information, but it did), SSNs, Driver’s license numbers and other forms of ID. It was an identity theft treasure chest of awesome.
When I asked him why he was trying to email this information to our offshore contractor, he said he was frustrated that the corporate email system would not let him send such a large file.
He told me the only reason he did this was that he had to get the bug logged and fixed before the weekend because he had plans to be away. He also forgot that production data was never supposed to leave the building. I’m not sure he ever really felt that what he was doing was wrong, or had any idea why emailing sensitive data was wrong.
The other shock I got was that it was a production DBA who had given him the backup. When I asked the DBA why he did this without even asking what it was for, he said "I was really busy and didn’t have time."
I wonder just how many times this scenario plays out every day in offices around the world.
Love your data, even when you are stressed. Especially when you are stressed.