Browsing articles from "January, 2012"

Another Zombie Job Posting…Data Architect Designer Implementer Operational Support

Jan 30, 2012 // by Karen Lopez // Blog, Data, Data Modeling, Database, Professional Development, Snark

I blogged over on Dataversity about "Hiring Data Professionals: Mason Dixon Lines and Zombies in Your Job Postings." In that rant, I talk about organizations that want to hire people who can do everything in the data column of the Zachman Framework.


I call these people "wonder candidates" and write about how they don’t exist in sufficient numbers to supply all the organizations in the world:

It would seem to make sense that if you were hiring a data professional you’d design a position that fills in the Data column, right? No? It turns out, though, that most people don’t think and work along a column. In my experience, people aren’t passionate about tasks that span columns from top to bottom. They normally aren’t skilled along the whole column, either. Referring to the Zachman Framework, what sorts of skills and passions would this candidate need? Planning, architecting, designing, building systems, building parts, and keeping the systems up and running.

I thought about my rant in this area while reading a job posting on Dataversity for a Data Architect.  I’m sure the people at Miami Children’s Hospital do amazing things, probably with very limited budgets.  That’s why these hiring organizations tell me they have to fill their positions with Architect Designer Developer Implementer Operational Support Wonder Candidates.  I’m going to pick on this posting, so apologies to the hospital for using them as an anti-pattern for finding good data architects.  I’m sure they are nice people there and really want to get to successful database and data warehouse solutions.  You might even want to apply for that job.

“Designs and constructs very large relational databases for data warehousing. Develops data models and is responsible for data acquisition, access analysis and design, and archive/recovery/load design and implementation. Integrates new data with existing data warehouses in design and planning.”

Right there we have the keywords design, constructs, develops, implementation.  These activities are done in different rows in the data column of the Zachman Framework.  There’s also this:

“…performance tuning, data retention policies, data classification, data security, and data acquisition…. Data modeling experience. Database and application object management, including DDL, table constraints and triggers, clusters, object storage allocation and tuning, indexing options and tradeoffs, partitioning, etc., experience.”

Those activities are clearly down in the lower half of the Framework. Yet data modeling, which exists along the entire data column, is not typically a strong skill set for people who work so far down in the Framework. So my guess is that professional data architects and modelers will not be qualified to do the clustering/partitioning/indexing/performance tuning part of the job, and that implementers who are qualified for that part won’t be qualified to prepare and maintain the data models the organization also wants out of this role.

If I were interviewing for this type of position, I’d focus on why this organization wants data models but doesn’t seem to want to fund a data architect. It sounds crazy, but I recommend that organizations not incur the costs of preparing and maintaining data models when they don’t want to work with professional data modelers. They won’t see many of the benefits of having an active data model, but they will incur all the costs and the risks associated with preparing incorrect ones.

I realize that there are many successful IT professionals who can work along many rows and columns.  I’ve worked with these amazing people.  But staffing a team of these amazing people is costly: they are difficult to find, expensive to hire, and tough to keep around because:

There may be people who can do a lot of those things, but in my experience they aren’t passionate about all of them. New hires won’t be happy, and the organization will not realize the economies it thinks it will.

I recommend that organizations that want to combine responsibilities do so across the columns within the same range of rows. That means combining positions where the thought processes are similar (business and data analysts, DBAs and developers, etc.). Analysts in general make for good analysts in other columns. Operational people tend to think operationally; builders tend to think mostly about building and don’t tend to plan well. Let’s not drag people up or down the rows.

Go now and check your job postings.  Do they reflect the true nature of the job?  Or are they actually full of zombies ready to drag someone to an assignment that they don’t really want?

Do you work with any of these Zombies?  People who have been hired to fill several jobs, but don’t have the passion or skills to do all of them?  How is that working?

Monte Carlo-ing Your Eventual Consistency Bets

Jan 10, 2012 // by Karen Lopez // Blog, Database

One of the features of not-only-SQL (NoSQL) data storage systems is the concept of eventual consistency (via Wikipedia):

Eventual Consistency… means that given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent.

For those of us coming from a transactional-system point of view, eventual consistency can be mind-boggling at first. Data presented in an inconsistent manner is usually seen as a data quality failure, something to be avoided. But in many non-transactional systems, giving up immediate consistency is a worthwhile trade-off for speed and scalability. Think about your Facebook page for a minute: how bad would it be if a friend’s update became visible to someone else before it was visible to you, as long as you eventually saw it?

Paul Cannon has a great write-up on using tools to estimate eventual consistency in Cassandra:

"The best part is that they also provided the world with an interactive demo, which lets you fiddle with N, R, and W, as well as parameters defining your system’s read and write latency distributions, and gives you a nice graph showing what you can expect in terms of consistent reads after a given time.

This terrific tool actually runs thousands of Monte Carlo simulations per data point (turns out the math to create a full, precise formulaic solution was too hairy) to give a very reliable approximation of consistency for a range of times after a write."
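To make the Monte Carlo idea a little more concrete, here is a minimal Python sketch of how such a simulation might work. It is not the demo's actual code. It assumes a simplified model in which every latency is exponentially distributed, acknowledgement latency is ignored, and the reader keeps the newest version it sees; the function name `prob_consistent` and its parameters are hypothetical. Each trial applies a write to N replicas, treats the write as committed once W replicas have it, issues a read t milliseconds later, and checks whether any of the first R replicas to respond already had the write.

```python
import random


def prob_consistent(n, r, w, t, trials=10_000, write_mean_ms=10.0,
                    read_mean_ms=5.0, seed=0):
    """Estimate the probability that a read issued t ms after a write
    commits sees that write.

    Simplified model (assumption): all latencies are exponential,
    acknowledgement latency is ignored, and the reader keeps the newest
    version among the responses it waits for.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")
    rng = random.Random(seed)
    consistent = 0
    for _ in range(trials):
        # Time (ms after the write starts) at which each replica applies it.
        applied = [rng.expovariate(1.0 / write_mean_ms) for _ in range(n)]
        # The write commits once w replicas have applied it.
        commit = sorted(applied)[w - 1]
        read_start = commit + t
        responses = []
        for i in range(n):
            request = rng.expovariate(1.0 / read_mean_ms)   # coordinator -> replica
            response = rng.expovariate(1.0 / read_mean_ms)  # replica -> coordinator
            # The replica returns the new value only if it has applied
            # the write by the time the read request reaches it.
            fresh = applied[i] <= read_start + request
            responses.append((request + response, fresh))
        # The coordinator answers after the first r responses arrive.
        responses.sort()
        if any(fresh for _, fresh in responses[:r]):
            consistent += 1
    return consistent / trials


for t in (0, 5, 20, 100):
    print(f"N=3 R=1 W=1 t={t:>3} ms: {prob_consistent(3, 1, 1, t):.3f}")
```

Running the loop at the bottom traces out, very roughly, the kind of "consistent reads after a given time" curve the demo plots. When R + W > N, every read quorum overlaps the write quorum, so this sketch returns 1.0 for any t; the interesting trade-offs show up when R + W ≤ N.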

Being able to plan your architecture to best fit the business need is what is important, not necessarily data purity at the cost of speed or reliability. Again, that sounds weird to a profession that has fought to keep data integrity on management’s radar, but the best design decisions balance cost, benefit and risk. Those of us in the data world need to understand that eventual consistency is often the best solution. Even if it feels weird.

Having tools that help us understand how to best architect the trade-offs is the first step in delivering the right data consistency for what the business needs.

7 Mistakes You *CAN* Afford to Make

Jan 2, 2012 // by Karen Lopez // Blog, Data, Data Modeling, Fun, Snark

I just saw that my article “Enterprise Data Modeling: 7 Mistakes You Can’t Afford to Make” in Information Management magazine made their list of the Top 10 Stories of 2011. In fact, it made the top five. My friend Corey Smith (@heycorey) then took it upon himself to ask:

I’d like to know the 7 mistakes I *can* afford to make, so I can stop worrying about them?

He didn’t specify whether he meant enterprise data management, so I’m responding in general. These are all mistakes, but one can afford to make them, in moderation.

  1. Peanut butter: I’m not a fan of it, especially when it’s hidden in other stuff like pretzels and vodka.
  2. Rounding Up:  Mathy people say that you should round up sometimes and round down other times.  And sometimes you don’t.  I say, make everyone happy and round up, right?
  3. Non-Alcoholic Beer:  I have to say that this stuff is usually quite bad, which is why it is a mistake.  But sometimes you need to have a beer and this is all they will let you drink. Like at lunch in the US or between real beers in the rest of the world.
  4. Twitter Grammar: 140 chars is some*s 2 short 2 say something 1derful.
  5. Running: What can I say?  I do it, and I want to do it more. It always feels like a mistake when I start doing it.  But I can afford to do more.
  6. Bacon: I prefer the veggie kind.  Others don’t.  We both feel as if the other is making a mistake.  That’s okay.
  7. February: It’s post holiday season, it’s not really spring and hardly anyone can pronounce it correctly.  But everyone has to do it.

So there you go, Corey: your list.

Everyone else, go check out my article as well as all the other Top Stories from Information Management.
