
10 Ways I Can Steal Your Data: eBook

I wrote an eBook sponsored by SolarWinds. I share real-life stories of non-traditional, non-hacker ways I can steal your data. You can download the PDF for free (registration required).


I’ve also been contributing a blog series over on THWACK: 5 More Ways I Can Steal Your Data; 5 More Ways I Can Steal Your Data: Work for You and Stop Working for You; 5 More Ways I Can Steal Your Data: Accessing Unmonitored Servers and Services; and 5 More Ways I Can Steal Your Data: Ask the Security Guard to Help Me Carry It Out. There’s one more post coming up soon, too.

Data protection from a data architect’s point of view is going to be a big focus of mine over the next year or so.  I’m hoping it will be yours, too.

You’re Doing it Wrong: Generalizations about Generalizations


I have a couple of presentations where I describe how generalized data modeling can offer both benefits and unacceptable costs.  In my Data Modeling Contentious Issues presentation, the one where we vote via sticky notes, we debate the trade-offs of generalization in a data model and database design.  In 5 Classic Data Modeling Mistakes, I talk about over-generalization.

Over the last 20-some years (and there’s more “some” there than ever before), I’ve noticed a trend towards more generalized data models. This means that instead of having a box for almost every noun in our business, we have concepts that have categories. Drawing examples from the ARTS Data Model, instead of having entities for:

  • Purchase Order
  • Shipping Notice
  • Receipt
  • Invoice
  • etc

…we have one entity, InventoryControlDocument, that has DocumentType instances of Purchase Order, Shipping Notice, Receipt, Invoice, etc.

See what we did there?  We took metadata that was on the diagram as separate boxes and turned them into rows in a table in the database.  This is brilliant, in some form, because it means when the business comes up with a new type of document we don’t have to create a new entity and a new table to represent that new concept.  We just add a row to the DocumentType table and we’re done.  Well, not exactly…we probably still have to update code to process that new type…and maybe add a new user interface for that…and determine what attributes of InventoryControlDocument apply to that document type so that the code can enforce the business rules.

Ah! See what we did there this time?  We moved responsibility for managing data integrity from the data architect to the coders.  Sometimes that’s great and sometimes, well, it just doesn’t happen.
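To make that concrete, here’s a minimal sketch of where the integrity rules end up once the model is generalized. The entity and type names follow the ARTS-style example above; the per-type required attributes are invented for illustration:

```python
# Hypothetical sketch: integrity rules that separate tables once enforced,
# now living in application code. Entity/type names follow the ARTS-style
# example; the required attributes per type are invented for illustration.

REQUIRED_ATTRIBUTES = {
    "Purchase Order":  {"supplier_id", "expected_date"},
    "Shipping Notice": {"carrier", "ship_date"},
    "Receipt":         {"received_date"},
    "Invoice":         {"supplier_id", "amount_due"},
}

def validate_document(doc_type, attributes):
    """Return rule violations for one InventoryControlDocument row.

    With one table per document type, NOT NULL columns and foreign keys
    would enforce most of this; after generalization, someone has to
    write (and govern) code like this instead."""
    required = REQUIRED_ATTRIBUTES.get(doc_type)
    if required is None:
        return ["unknown DocumentType: " + doc_type]
    present = {name for name, value in attributes.items() if value is not None}
    return [doc_type + " missing required attribute: " + attr
            for attr in sorted(required - present)]
```

Adding a new DocumentType is still just a row in the generalized design, but notice that the rule catalogue above has to grow with it. That dictionary is exactly the governance responsibility that doesn’t disappear when the boxes become rows.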

So my primary reason to raise generalization as an issue is that sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems.  Just because you engineered a requirement from a table to a row does not mean it is no longer your responsibility.  I’ve even seen architects become so enamoured with moving the work from their plate to another’s that they have generalized the heck out of everything while leaving the data quality responsibility up to someone else.  That someone else typically is not measured or compensated for data integrity, either.

Sometimes data architects apply these patterns but don’t bother to apply the governance of those rules to the resulting systems

Alec Sharp has written a few blog posts on Generalizations. These posts have some great examples of his 5 Ways to Go Wrong with Generalisation.   I especially like his use of the term literalism since I never seem to get the word specificity out when I’m speaking. I recommend you check out his 5 reasons, since I agree with all of them.

1 – Failure to generalize, a.k.a. literalism

2 – Generalizing too much

3 – Generalizing too soon

4 – Confusing subtypes with roles, states, or other multi-valued characteristics

5 – Applying subtyping to the wrong entity.

By the way, Len Silverston and Paul Agnew talk about levels of generalization in The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling (affiliate link). Generalization isn’t just a yes/no position. Every data model structure you architect has a level of generalization.

Every data model structure you architect has a level of generalization.

I’m wondering how many of you have used a higher level of generalization, and what you’ve done to ensure that the metadata you transformed into data still has integrity.

Leave your recommendations in the comments.

Update: I updated the link to Alec’s blog post.  Do head over there to read his points on generalization.

Your #1 Job….

Jan 6, 2015   //   by Karen Lopez   //   Blog, Data Governance, Data Modeling, Database Design, DLBlog  //  3 Comments

Tim Berners-Lee quote on CEOs connecting data

I hear frequently, especially from the DBA groups, that our number one job as data professionals is performance. That typically includes making sure database queries run fast, that systems meet expected uptimes, and that developers/DBAs can do their jobs as fast as possible without slowing down to consider whether or not they are doing the right thing for the data. In fact, I’ve been told many times that data quality is Job NULL, meaning that we shouldn’t care as much about data quality as we do about performance. The crazy things I’ve read:

  • Query running slow? Delete some rows and see if anyone notices.
  • Assign numeric datatypes to number-like columns so they will be smaller (and lose leading zeros).
  • Make columns small, even if it means losing data.
  • Shove data into a column with comma delimiters so that you don’t have to change the database.
  • Re-use a column for something it was never intended for.
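That leading-zeros loss is easy to demonstrate. A minimal Python sketch (the ZIP code value is just an example):

```python
# Hypothetical illustration of the "numeric datatype for number-like
# columns" mistake. A ZIP code is an identifier, not a quantity.
zip_code = "01002"

as_number = int(zip_code)    # "smaller" numeric storage...
print(as_number)             # 1002 -- the leading zero is silently gone

# You can only re-pad if you know every value was exactly 5 digits,
# which a numeric column no longer records:
repadded = str(as_number).zfill(5)
print(repadded)              # "01002" -- recoverable here, but only by luck
```

The same failure hits phone numbers, account numbers and any other “number-like” identifier; if you never do arithmetic on it, it isn’t a number.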

Developers and DBAs start thinking this way, for the most part, because they are measured and rewarded based on all kinds of factors other than data quality. And yet management expects systems to support exactly what Tim Berners-Lee says in this quote. Sure, making systems purr is one part of allowing data to be connected across sources. But misleading data, misunderstood data and plain old bad data mean that CEOs can’t run a company effectively.

Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events.

Most organizations are missing this ability to connect all the data together.

There are all kinds of presentations and blog posts about how to make systems run fast. There are so few about how to love your data so that the CEO can rely on it. The first person who needs to fix this mismatch of incentives and actions is the CEO. She needs to ensure that IT professionals are properly evaluated and motivated to produce both fast data and correct data, and to stop providing incentives for IT professionals to work against data quality.

DBAs and Developers want to do the right thing. It’s just that we are paying them to do the wrong things over the right things.  

I’m all ‘bout the data…

Dec 5, 2014   //   by Karen Lopez   //   Blog, Data, Data Governance, Fun, Parody, Snark  //  No Comments

AllAboutTheData

Big Challenges in Data Modeling: Ethics & Data Modeling–24 April


I have a great topic and panel for this month’s Big Challenges in Data Modeling webinar on Thursday, 24 April 2014, at 2:00 PM EDT. It’s free, but you have to register to get the login information.

Ethical Issues in Data Modeling

We’ll be talking about the nature of ethics, data and data modeling.  I bet all of you have been placed in a tough situation before, either by other IT professionals or by business users who ask you to do something that you aren’t sure is ethical.  Maybe it’s legal, maybe it isn’t.  Maybe it’s about protecting data or data quality.

Some of the topics I hope we can discuss:

  • What is the nature of ethics?
  • How do ethics differ from morality? Legality?
  • Can ethics be taught?
  • Where does ego come into play here?
  • What about Codes of Ethics and Codes of Conduct?
  • Is there one right answer? Is there an always wrong answer?
  • What’s the difference between a whistleblower and a tattletale?
  • What tools do we have in making ethical decisions?
  • How should we deal with unethical co-workers? Management? Customers?
  • What does it all mean, anyway?

Ethical Situations in Data and Data Modeling

  • If the answer is always “it depends”, what does it depend on?
  • What if faster data means lesser data quality?
  • Have you ever been asked to falsify a status report?
  • Have you had to deal with someone else who provided incorrect information to a business user or management?
  • Have you ever been asked to look the other way when security policies are being broken?
  • Have you raised an issue of data protection that was ignored? Or minimalized?
  • What about using production data for testing and development?
  • What if the data is right, but the transformations or reporting is wrong?
  • What if it’s intentionally wrong or misleading?
  • Have you ever had to deal with someone else’s ego?
  • Have you escalated an ethical issue? What about a legal one? A moral one?
  • Do data modelers have distinct areas that we need to watch out for when it comes to ethics?
  • Have you ever left a job or project due to ethical reasons?

 

Panelists

Len Silverston (http://www.univdata.com/ | @lensilverston ), author of Universal Data Models I, II, III, speaker, coach, consultant, trainer.

Denny Cherry (http://dcac.co/ | @mrdenny ), author of Basics of Digital Privacy, Securing SQL Server and other books, speaker, consultant and trainer.

Tamera M. Clark (http://clarkcreations.net/blog/ | @tameraclark ), speaker, volunteer, Business Intelligence expert.

Kerry Tyler (http://www.airbornegeek.com/ | @airbornegeek ), speaker, volunteer, Business Intelligence Developer.

YOU! Our webinars consider attendees as panelists. You’ll have the opportunity to ask questions, chat with other attendees and tell your own stories. You can even arrive early and stay late for our pre-show and after-show discussions.

Register now and bring your ethical questions and comments.

Data Modeling is Iterative. It’s not Waterfall

Mar 7, 2014   //   by Karen Lopez   //   Blog, Data, Data Governance, Data Modeling, Database Design, DLBlog  //  7 Comments

Sure, data modeling is taught in many training classes as a linear process for building software.  It usually goes something like this:

  1. Build a Conceptual Data Model.
  2. Review it with users.
  3. Build a Logical Data Model.
  4. Review it with users.
  5. Build a Physical Data Model.
  6. Give it to the DBA.
  7. GOTO step one on another project.

And most team members think it looks like this:

image

Training classes work this way because it’s a good way to learn notations, tools and methods.  But that’s not how data modeling works when the professionals do it on a real project.

Data modeling is an iterative effort. Those iterations can be sprints (typical for my projects) or longer intervals. Sometimes the iterations exist just between efforts to complete the data models, prior to generating a database. But it’s highly iterative, just like the software development part of the project.

In reality, data modeling looks more like this:

Data Model Driven Development - Karen Lopez

This is Data Model-Driven Development. The high-level steps work like this:

  1. Discuss requirements.
  2. Develop data models (all of them, some of them, one of them).
  3. Generate databases, XML schemas, file structures, whatever you might want to physically build. Or nothing physical, if that’s not what the team is ready for.
  4. Refine.
  5. Repeat.

These, again, are small intervals, not the waterfall steps of an entire project.  In fact, I might do this several times even in the same sprint. Not all modeling efforts lead to databases or physical implementations.  That’s okay.  We still follow an iterative approach.  And while the steps here look like the same waterfall list, they aren’t the same.

  • There isn’t really a first step.  For instance, I could start with an in-production database and move around the circle from there.
  • We could start with existing data models. In fact, that’s the ideal starting point in a well-managed data model-driven development shop.
  • The data models add value because they are kept in sync with what’s happening elsewhere – as a natural part of the process, not as a separate deliverable.
  • The modeling doesn’t stop.  We don’t do a logical model, then derive a physical model, throwing away the logical model.
  • Data modelers are involved in the project throughout its lifecycle, not just some arbitrary phase.
  • Modeling responsibilities may be shared among more roles.  In a strong data model-driven process, it is easier for DBAs and BAs to be hands-on with the data models.  Sometimes even users.  Really.

By the way, this iterative modeling approach isn’t unique to data models. All the models we might work on for a project should follow this process. Class diagrams, sequence diagrams, use cases, flow charts, etc. should all follow this process to deliver the value that has been invested in them. That’s what Agile means in “the right amount of [modeling] documentation”. Data model-driven development means that models are “alive”.

If you are a modeler reinforcing the wrong perception that data modeling needs a waterfall-like approach, you are doing it wrong. You might be causing more pain for yourself than anyone else on your project.

Data models aren’t just documentation checklist items. They model the reality of a living, breathing system at all points in its life. They deliver value because they are accurate, not because they are “done”.

Rob Ford’s Achilles Data Management

Feb 26, 2014   //   by Karen Lopez   //   Blog, Data, Data Governance, DLBlog, Snark  //  No Comments

Mayor Ford CC 2.0 http://commons.wikimedia.org/wiki/User:MTLskyline

I don’t usually blog about politics here, but when bad data management and bad people mix, it’s time for a post…

Toronto Star reporter Robyn Doolittle has reported that my world-famous (infamous?) mayor, Rob Ford, may have lost all the data from his previous election campaign.

Councillor Doug Ford claims the mayor’s former campaign manager, Nick Kouvalis, is refusing to turn over valuable 2010 voter database information.

Kouvalis, who also served for a time as Ford’s chief of staff, is now working for the John Tory campaign. The man who actually ran the database aspect of Rob Ford’s first mayoral campaign says the Fords were given everything right after the election.

“I made two DVDs with all of the data from the campaign — entire voters’ list with contact info, supporters, non-supporters, signs, volunteers, all voter contact records, etc. — and gave them both to Doug Ford,” said Conservative data expert Mitch Wexler.

And,

If it is in fact gone, it would be a serious blow to the mayor’s re-election hopes. Numerous political strategists involved in the 2010 race say what helped set Ford apart was that voter intelligence, much of it collected by Ford himself over his 10 years as a councillor in Etobicoke.

I’ll try not to comment on the use of the term “voter intelligence.” Just in case you’ve been hiding under a rock (not a crack rock, I presume), our mayor has been in a heap of trouble (NSFW) since he was elected. Actually, even before he was elected. It isn’t a partisan thing when I say I’m not a fan of my mayor. This is all about not respecting his behaviour. But back to the data thing…

Where Rob Ford’s Data Management Went Wrong

Well, pretty much every single thing he has done has been wrong. At least it feels that way. And sounds and looks that way. But if we focus on today’s issue of his reported data loss, I’m thinking he messed up by:

  • Giving source data to an external party without a backup.  When Ford handed over those record boxes full of 10 years of handwritten notes, he lost his source data.  All data deserves protection, even handwritten notes.  We in IT sometimes ignore paper data, but we shouldn’t.  It’s still data.
  • Storing personally identifiable and sensitive data insecurely.  I’m betting those file boxes were sitting next to his desk.  Sure, his desk is in city hall and I’m betting they have decent physical security.  But file boxes aren’t exactly locked cabinets. They also have a way of getting disposed of incorrectly.
  • Outsourcing data and database management without getting copies of data on a regular basis.  It’s sort of crazy to hand over critical data to a third party for management without insisting that you get copies of it on a regular basis.  Even if your relationship is strong, people leave companies or stop working for you (as we see in Rob Ford’s case).  Have you been getting data, models, code and documents from your vendors on a regular basis? You should.
  • Using data collected for a specific reason for another reason.  Allegedly this data was collected by Ford in fulfilling his duties as city councillor.  I’m not sure whether that means it can be used for fundraising and vote elicitation.  Sounds off to me. I wonder if all those people who called Ford asking for help with their trash collection and dead raccoon needs knew they were being added to a campaign database.
  • Waiting until he needed the data to ask for it.  It appears that the Ford brothers waited until it was time to campaign to play “who has the data”. It would be entirely possible (maybe even legally or ethically required) for the outsourcer to destroy all copies of the data when their work ended and the data was given back to Ford.
  • Getting copies of data and losing them.  It’s reported that the data was provided to Rob Ford’s brother, Councillor Doug Ford.  But it appears he lost the data.  That’s not good. Where are those DVDs now?  Again, this indicates that private and sensitive data probably wasn’t treated with the respect it deserves.

As data professionals, I believe it’s our job to ensure that all data is properly managed and protected.  That means monitoring paper and digital data, ensuring that good data management practices are followed, and ensuring that these practices are followed even when we outsource these activities. Please, go find out if anyone in your organization is doing a better job than Rob Ford is.  You might be shocked at what you find.
