Browsing articles in "Database Design"

Figuring out Consistency Levels in Azure Cosmos DB

Apr 18, 2018   //   by Karen Lopez   //   Azure, Blog, Cosmos DB, Data, Database, Database Design  //  No Comments

Azure Cosmos DB's five levels of consistency: Strong, Bounded Staleness, Session, Consistent Prefix and Eventual

I have to admit: the first time I heard the term and explanation behind “Eventual Consistency”, I laughed.  That’s because I’ve spent my whole life fighting the good fight to ensure data is consistent.  That’s what transactions are for.  Fast forward several years, and we data professionals now understand that some data stories don’t require strict consistency for every reader of the data.

The key to that statement is reader. For the most part, we still don’t want inconsistent writes.

Consistency in the real world is a continuum from strictly consistent to eventually consistent.  Notice that consistency is still a goal.  But because it’s a continuum, there are many consistency schemes along the way.  I’ve always struggled a bit with understanding and explaining these levels.

We need these consistency levels because of the CAP Theorem, which says a distributed system can deliver at most two of Consistency, Availability and Partition Tolerance.  This is mostly due to physics: if I have distributed the same data over multiple locations, I have to give up one of the CAP guarantees to make the system work.

Let’s take a look at what the Cosmos DB documentation says about consistency levels (feel free to just scan this):

Consistency levels

You can configure a default consistency level on your database account that applies to all collections (and databases) under your Cosmos DB account. By default, all reads and queries issued against the user-defined resources use the default consistency level specified on the database account. You can relax the consistency level of a specific read/query request in each of the supported APIs. There are five consistency levels supported by the Azure Cosmos DB replication protocol that provide a clear trade-off between specific consistency guarantees and performance, as described in this section.

Strong:

  • Strong consistency offers a linearizability guarantee with the reads guaranteed to return the most recent version of an item.
  • Strong consistency guarantees that a write is only visible after it is committed durably by the majority quorum of replicas. A write is either synchronously committed durably by both the primary and the quorum of secondaries, or it is aborted. A read is always acknowledged by the majority read quorum; a client can never see an uncommitted or partial write and is always guaranteed to read the latest acknowledged write.
  • Azure Cosmos DB accounts that are configured to use strong consistency cannot associate more than one Azure region with their Azure Cosmos DB account.
  • The cost of a read operation (in terms of request units consumed) with strong consistency is higher than session and eventual, but the same as bounded staleness.

Bounded staleness:

  • Bounded staleness consistency guarantees that reads may lag behind writes by at most K versions or prefixes of an item, or by a time interval t.
  • Therefore, when choosing bounded staleness, the "staleness" can be configured in two ways: the number of versions K of the item by which reads lag behind writes, and the time interval t.
  • Bounded staleness offers total global order except within the "staleness window." The monotonic read guarantees exist within a region both inside and outside the "staleness window."
  • Bounded staleness provides a stronger consistency guarantee than session, consistent-prefix, or eventual consistency. For globally distributed applications, we recommend you use bounded staleness for scenarios where you would like to have strong consistency but also want 99.99% availability and low latency.
  • Azure Cosmos DB accounts that are configured with bounded staleness consistency can associate any number of Azure regions with their Azure Cosmos DB account.
  • The cost of a read operation (in terms of RUs consumed) with bounded staleness is higher than session and eventual consistency, but the same as strong consistency.

Session:

  • Unlike the global consistency models offered by strong and bounded staleness consistency levels, session consistency is scoped to a client session.
  • Session consistency is ideal for all scenarios where a device or user session is involved since it provides monotonic reads, monotonic writes, and read-your-own-writes (RYW) guarantees.
  • Session consistency provides predictable consistency for a session, and maximum read throughput while offering the lowest latency writes and reads.
  • Azure Cosmos DB accounts that are configured with session consistency can associate any number of Azure regions with their Azure Cosmos DB account.
  • The cost of a read operation (in terms of RUs consumed) with session consistency level is less than strong and bounded staleness, but more than eventual consistency.

Consistent Prefix:

  • Consistent prefix guarantees that in the absence of any further writes, the replicas within the group eventually converge.
  • Consistent prefix guarantees that reads never see out-of-order writes. If writes were performed in the order A, B, C, then a client sees either A; A, B; or A, B, C, but never an out-of-order result like A, C or B, A, C.
  • Azure Cosmos DB accounts that are configured with consistent prefix consistency can associate any number of Azure regions with their Azure Cosmos DB account.

Eventual:

  • Eventual consistency guarantees that in the absence of any further writes, the replicas within the group eventually converge.
  • Eventual consistency is the weakest form of consistency where a client may get the values that are older than the ones it had seen before.
  • Eventual consistency provides the weakest read consistency but offers the lowest latency for both reads and writes.
  • Azure Cosmos DB accounts that are configured with eventual consistency can associate any number of Azure regions with their Azure Cosmos DB account.
  • The cost of a read operation (in terms of RUs consumed) with the eventual consistency level is the lowest of all the Azure Cosmos DB consistency levels.

    https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels

It’s clear, isn’t it? No?  I’ll agree that reading prose about consistency levels can be difficult to really absorb.  In searching for more examples, I found a wonderful write-up that uses animations plus a baseball analogy. In that post, Michael Whittaker references the 2013 CACM article Replicated Data Consistency Explained Through Baseball (ACM subscription required) by Doug Terry of Microsoft Research.  If you don’t have access to the ACM library (you definitely should, by the way), you can find videos of talks he has given on this topic on the web.

Michael also has a more complex post on Visualizing Linearizability.  This is a topic I want to know more about, but first I have to tackle my challenge of saying Linearizability without stumbling.
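Since the documentation above mentions that the consistency level can also be relaxed from application code, here is a minimal, hedged sketch of what picking a level looks like in practice. It assumes the azure-cosmos Python package; the endpoint, key, database, container, item id and partition key are placeholders, and the exact keyword argument name may vary by SDK version.

    # A minimal sketch, not production code. Assumes the azure-cosmos Python
    # package; the endpoint, key, database, container, item id and partition
    # key below are placeholders. Check the SDK docs for the exact keyword
    # name in your version.
    from azure.cosmos import CosmosClient

    ENDPOINT = "https://your-account.documents.azure.com:443/"  # placeholder
    KEY = "<your-account-key>"                                  # placeholder

    # The default consistency level is set on the Cosmos DB account itself.
    # Here we ask for Eventual consistency for this client: relaxing the
    # level is allowed, tightening beyond the account default is not.
    client = CosmosClient(ENDPOINT, credential=KEY, consistency_level="Eventual")

    container = (
        client.get_database_client("orders-db")   # placeholder database
              .get_container_client("orders")     # placeholder container
    )

    # Reads through this client may lag the latest committed write, in
    # exchange for the lowest RU cost and latency of the five levels.
    item = container.read_item(item="order-123", partition_key="customer-42")
    print(item["id"])

The trade-off is the same one the RU notes in the documentation describe: the weaker the level you ask for, the cheaper and faster the read.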

It’s Always a Data Modeling Question…

Apr 9, 2018   //   by Karen Lopez   //   Blog, Data Modeling, Database Design  //  No Comments

Image: a “What do you mean by data model?” question on a beer menu

When you have been a data modeler for [redacted] decades, you learn to see the world through data modeler eyes.  Everything seems to be a data modeling question.

I was with a client for lunch one day and we asked the server “What do you have on tap?”  She was gone quite a long time, but came back and said “Beer.”  It turned out she was right.  But her answer was not that helpful.

Why is an Expert Asking Us What a Data Model Is?

One of the odd parts of every new project I have to deal with is getting everyone to understand that the question “Can you help us with a data model?” results in me asking, “What do you mean by data model?” That’s right, I have to ask team members what a data model is. You’d think experts would know better.

I have to do this because it seems like everyone has a different definition.  Most DBAs want a reverse-engineered image of a production database.  A business user wants data modeling that results in documentation of all the questions, answers and decisions that were made. A developer wants a specification of something they can build upon. An executive wants a high-level view of the data concepts a specific project will be addressing so she can approve scope and budgets.  A data scientist wants a consolidated view of both the physical data objects available to him and a logical definition of what they are. Finally, a data modeler wants a list of previously modeled entities so that she doesn’t have to start from scratch on every project.

It’s likely that every role in the organization wants a different data model with a different set of metadata. It’s also why we need to have a discussion about conceptual, logical and physical data models.  Even that set of terms has differing definitions. That’s nearly unforgivable given that we data modelers preach that we should use consistent definitions. (Note from author: this one bit led to one of the longest LinkedIn threads any of my posts has ever generated. Some of the comments are not fit for work.) Then, to make this even more complex, we need to discuss the primitives in the Zachman Framework as well.

Simple Tools Don’t Work Well For Complex Data Model Questions

This is why native database tools aren’t good enough to meet all those needs.  This is why a simple drawing tool isn’t enough.  What an enterprise needs are tools that can author, design, and present all those types of data models without creating duplicate copies of the underlying data concepts. It’s also why a data modeler needs to ask the question: what do *you* mean by data model?  You need to ask your team members what they are expecting before you start working.  You may need to negotiate priorities or formats.  You may need to create separate views of your models. That’s wonderful, though: delivering what they need from your data models. It’s all good.

It’s not that we modelers don’t know the 100+ possible answers to that question.  It’s that we know there are 100+ answers. That’s what data modeling is, after all: getting to the right answer for this requirement.

Note: This post is an updated version of one posted to community.embarcadero.com in 2015

10 Ways I Can Steal Your Data: eBook

I wrote an eBook sponsored by SolarWinds in which I share real-life stories of non-traditional, non-hacker ways I can steal your data.  You can download the PDF for free (registration required).


I’ve also been contributing a blog series over on THWACK: 5 MORE Ways I can Steal Your Data; 5 More Ways I Can Steal Your Data: Work for you and Stop Working for You; 5 More Ways I Can Steal Your Data: Accessing Unmonitored Servers and Services; and 5 More Ways I Can Steal Your Data: Ask the Security Guard to Help Me Carry it Out.  There’s one more post coming up soon, too.

Data protection from a data architect’s point of view is going to be a big focus of mine over the next year or so.  I’m hoping it will be yours, too.

How Deep is My Non-Love? Nested Dependencies and Overly Complex Design

Dec 4, 2017   //   by Karen Lopez   //   Blog, Data Modeling, Database, Database Design, SQL Server, WTF  //  No Comments

Relational databases have this nifty concept of objects (just things, not code objects) being dependent upon other things.  Sometimes those dependencies exist because of foreign key constraints, others because one object references another.  One example of the latter can be found in VIEWs.  A database VIEW is an object that references TABLEs or other VIEWs.  Of course, if a VIEW references other VIEWs, each of those VIEWs in turn references TABLEs or yet another VIEW.  And it’s that “or yet another VIEW” that can get modelers into trouble.

I reviewed a database design that had massively dependent VIEWs.  How did I know that? I used a proper data modeling tool to look at all the dependencies for one central VIEW.  And this is what my data modeling tool showed me:

Data Model with hundreds of dependencies (lines) between a handful of objects (squares)

That diagram shows how ONE VIEW is related to a whole bunch of other VIEWs and TABLEs in that design.  In reviewing the model, I saw that many of the VIEWs appeared to be duplicates or had very high overlap of content with other VIEWs. 
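If you don’t have a full modeling tool handy, you can get a rough, far less visual version of that dependency picture straight out of SQL Server’s catalog views. Here is a sketch, assuming pyodbc and a SQL Server database; the driver name, server, and database in the connection string are placeholders.

    # A rough sketch, assuming pyodbc and SQL Server; the driver name,
    # server, and database in the connection string are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=YourDatabase;Trusted_Connection=yes;"
    )

    # sys.sql_expression_dependencies records which objects each view
    # references, so view-on-view nesting shows up as rows where both
    # sides are views.
    sql = """
    SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
           OBJECT_NAME(d.referencing_id)        AS referencing_view,
           d.referenced_entity_name             AS referenced_object,
           o.type_desc                          AS referenced_type
    FROM   sys.sql_expression_dependencies AS d
    JOIN   sys.objects AS o
           ON o.object_id = d.referenced_id
    WHERE  OBJECTPROPERTY(d.referencing_id, 'IsView') = 1
    ORDER  BY referencing_view, referenced_object;
    """

    for row in conn.cursor().execute(sql):
        nested = " (nested view!)" if row.referenced_type == "VIEW" else ""
        print(f"{row.referencing_schema}.{row.referencing_view} -> {row.referenced_object}{nested}")

A modeling tool will walk the whole chain and draw it for you; a query like this at least tells you whether view-on-view nesting exists and where to start looking.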

How do VIEWs Like This Happen?

There are many reasons one would create a nested VIEW.  Like anything in a hierarchy, you could have objects that are regularly used both independently and as part of a group.  But that only explains one level of a VIEW hierarchy (nest).  What about VIEWs that are nested dozens of levels deep?  And why would a database have such a complex design around one VIEW?  These are the most common reasons I run into bad practices with VIEWs:

  • Designers who don’t understand the massive performance loss for massively nested VIEWS
  • Designers who design for theory, not for real world data stories
  • Designers who have no idea they are referencing another VIEW when they design their VIEW
  • Designers who are following a worst practice of creating a VIEW for every report and every window in an application
  • Designers who don’t collaborate with other designers and create their own set of VIEWs and dependencies
  • Designers who are compensated for doing work fast and not well
  • Designers who use DDL to do design, therefore never seeing the complexity of their designs
  • Data Governance policies that let anyone create objects in a database
  • A team environment where "everyone is a generalist"

I could go on.  While I can’t go into details here, in my review I recommended a complete refactoring of this overly complex design.  My guess is that this complexity was contributing to the performance problems experienced in this application.  I also recommended that a professional designer be brought in to refactor other issues with the database design.  I have no idea if this happened.  But I doubted that this application was going to meet its large-scale web application goals.

Why Am I Sharing This?

Because so many design issues I find in reviews trace back to the same causes of performance and data quality issues I’ve listed above.  I find that not using a real data modeling or design tool is the main contributing factor.  There’s a reason why physical-world architects and engineers use drawings and architectural diagrams. Models are also how they make successful modifications to the things they build.

Yes, physical objects are different from software/application/database objects. My position is that these latter objects need models at least as much as buildings and devices do.  We need tools to reverse engineer objects, to view the dependencies, to search, and to assess.  In other words, to model.  Engineering data solutions requires engineering tools like data modeling tools.  And, yes, data engineers who understand how to use those tools and how to model out the unnecessary complexity.

The Key to Keys at the North Texas SQL Server User Group – 17 March

Mar 15, 2016   //   by Karen Lopez   //   Blog, Data Modeling, Database, Database Design, DLBlog, Speaking, SQL Server  //  No Comments

I’m visiting Dallas this week to speak at the North Texas SQL Server User Group this Thursday.  I’ll be speaking about keys: primary keys, surrogate keys, clustered keys, GUIDs, SEQUENCEs, alternate keys…well, there’s a lot to cover about such a simple topic.  The reason I put this presentation together is that I see a lot of confusion about these topics. Some of it’s about terminology (“I can’t find anything about alternate keys in SQL Server…what the heck is that, anyway?”), some of it is misunderstandings (“what do you mean IDENTITIES aren’t unique! of course they are…they are primary keys!”), and some of it is just that the features are new (“Why the heck would anyone want to use a SEQUENCE?”).

We’ll be chatting about all these questions and more on Thursday, 17 March at the Microsoft venue in Irving, Texas starting at 6PM.

Attendance is free, but you need to register at http://northtexas.sqlpass.org/ to help organizers plan for the event.

Don’t worry if you don’t know about SQL Server or don’t use it: this presentation will focus on some SQL Server specific features, but the discussion is completely portable to other DBMSs.

So many of us have learned database design approaches from working with one database or data technology. We may have used only one data modeling or development tool. That means our vocabularies around identifiers and keys tend to be product-specific. Do you know the difference between a unique index and a unique key? What about the difference between RI, FK and AK? These concepts span data activities, and it’s important that your team members understand each other and where they, their tools, and their approaches need to support these features. We’ll look at the generic and proprietary terms for these concepts, as well as where they fit in the database design process. We’ll also look at implementation options in SQL Server and other DBMSs.
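To make a bit of that vocabulary concrete before the session, here is a small, hedged sketch: Python driving T-SQL through pyodbc, with all object names and connection details made up for illustration. It shows a surrogate primary key on an IDENTITY column, an alternate key (AK) implemented as a UNIQUE constraint, and a SEQUENCE as an alternative number generator. Note that it’s the PRIMARY KEY constraint, not the IDENTITY property, that actually enforces uniqueness.

    # Illustration only: assumes pyodbc and a SQL Server database you can
    # create objects in; all object names and the connection string are
    # hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=YourDatabase;Trusted_Connection=yes;",
        autocommit=True,
    )

    ddl = """
    -- A SEQUENCE is a standalone number generator. Unlike IDENTITY it is
    -- not tied to a single table and can be shared by applications.
    CREATE SEQUENCE dbo.CustomerNumber AS int START WITH 1000 INCREMENT BY 1;

    CREATE TABLE dbo.Customer
    (
        -- Surrogate key values come from IDENTITY, but IDENTITY alone does
        -- not guarantee uniqueness (think reseeds and IDENTITY_INSERT); the
        -- PRIMARY KEY constraint below is what enforces it.
        CustomerID     int IDENTITY(1,1) NOT NULL,

        -- An alternative generator: values drawn from the SEQUENCE above.
        CustomerNumber int NOT NULL
            CONSTRAINT DF_Customer_CustomerNumber
            DEFAULT (NEXT VALUE FOR dbo.CustomerNumber),

        CustomerCode   nvarchar(20)  NOT NULL,   -- natural/business identifier
        FullName       nvarchar(200) NOT NULL,

        -- Primary key, here also the clustered key.
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID),

        -- The alternate key (AK in modeling terms): SQL Server implements it
        -- as a UNIQUE constraint backed by a unique index.
        CONSTRAINT AK_Customer_CustomerCode UNIQUE (CustomerCode)
    );
    """

    conn.cursor().execute(ddl)
    print("Customer created with a surrogate PK, an alternate key, and a SEQUENCE.")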

Hope to see you there!

Is Logical Data Modeling Dead?

Feb 16, 2016   //   by Karen Lopez   //   Blog, Data Modeling, Data Stewardship, Database Design  //  7 Comments

One of the most clichéd blogging tricks is to declare something popular as dead.  These desperate, click-bait posts are popular among click-focused bloggers, but they’re not for me. Yet here I am, writing an “is dead” post.  Today, this is about sharing my responses to ongoing social media posts. They go something like this:

OP: No one loves my data models any more.

Responses: Data modeling is dead.  Or…data models aren’t agile.  Or…data models died with the waterfalls. Or…only I know how to do data models and all of you are doing it wrong, which is why they just look dead.

I bet I’ve read that sort of conversation at least a hundred times, first on mailing lists, then on forums, now on social media.  It has been an ongoing battle for modelers since data models and dirt were discovered…invented…developed.

I think our issue with the love for data modeling, and logical data models specifically, is that we try to make these different types of models into different tasks.  They aren’t.  In fact, there are many types, many goals, and many points of view about data modeling.  So as good modelers, we should first seek to understand what everyone in the discussion means by that term.  And what do you know, even that is contentious.  More on that in another post.

I do logical data modeling when I’m physical modeling.  I don’t draw a whole lot of attention to it – it’s just how modeling is done on my projects.

Data Modeling is Dead Discussion

One example of this discussion is taking place right now over on LinkedIn. Abhilash Gandhi posted:

During one of my project, when I raised some red flags for not having Logical Data Model, I was bombarded with comments – “Why do we need LDM”? “Are you kidding”? “What a waste of time!". The project was Data Warehouse with number of subject areas; possibility of number of data marts.

and

I have put myself into trouble by trying to enforce best practices for Data Modeling, Data Definitions, Naming Standards, etc. My question, am I asking or trying to do what may be obsolete or not necessary? Appreciate your comments.

There are responses that primarily back up the original poster’s feeling of being unneeded on modern development projects.  Then I added another viewpoint:

I’ll play Devil’s advocate here and say that we Data Architects have also lost touch with the primary way the products of our data modeling efforts will be used. There are indeed all kinds of uses, but producing physical models is the next step in most. And we have lost the physical skills to work on the physical side. Because we let this happen, we also have failed to make physical models useful for teams who need them.

We just keep telling the builders how much they should love our logical models, but have failed to make the results of logical modeling useful to them.

I’ve talked about this in many of my presentations, webinars (sorry about the autoplay, it’s a sin, I know) and data modeling blog posts. It’s difficult to keep up with what’s happening in the modern data platform world.  So most of us just haven’t.  It’s not that we need to be DBAs or developers.  We should, though, have a working literacy in the features and approaches for implementing our data models for production use.  Why? I addressed that as well.  Below is an edited version of my response:

We Don’t All Have to Love Logical Data Modeling

First of all, the majority of IT professionals do not need to love an LDM. They don’t even need to need them. The focus of the LDM is the business steward/owner (and if I had my way, the customer, too). But we’ve screwed up by thinking of data models as artefacts that are "something done on an IT project".  Sure, that’s how almost all funding for modeling gets done, and it’s broken. But it’s also a fact of life for the relatively immature world of data modeling.

We literally beat developers and project managers with our logical data modeling, then ask them “why don’t you want us to produce data models?” We use extortion to get our beautiful logical data models done, then sit back and wonder why everyone sits at another lunch table.

I don’t waste time or resources trying to get devs, DBAs or network admins to love the LDMs. When was the last time you loved the enterprise-wide AD architecture? The network topology? The data centre blueprints and HVAC diagrams?

Data models form the infrastructure of the data architecture, as do conceptual models and all the models that would fill the upper rows of the Zachman Framework. We don’t force the HVAC guys to wait to plan out their systems until a single IT application project comes along to fund that work. We do it when we need a full plan for a data centre. Or a network. Or a security framework.

But here we are, trying to whip together an application with no models. So we tell everyone to stop everything while we build an LDM. That’s what’s killing us.  Yes, we need to do it. But we don’t have to do it in a complete waterfall method.  I tell people I’m doing a data model, then I work on both an LDM and the PDM at the same time. I use the LDM to drive data requirements from business owners, and the PDM to start making it actually work in the target infrastructure. Yes, I LDM more at first, but I’m still doing both at the same time. Yes, the PDM looks an awful lot like the LDM at first.

Stop Yelling at the Clouds

The real risk we take is sounding like old men yelling at clouds when we insist on working and talking like it is 1980 all over again.  I do iterative data modeling. I’m agile. I know it’s more work for me. I’d love to have the luxury of spending six months embedded with the end users, coming up with a perfect and lovely logical data model. But that’s not the project I’ve been assigned to. It’s not the team I’m on. Working against the team just invites a demand that no data modeling be done and that database and data integration work be done by non-data professionals. You can stand on your side of the cubicle wall, screaming about how LDMs are more important, or you can work with the data modeling skills you have to make it work.

Are Your Data Models Agile or Fragile: Sprints
When I’m modeling, I’m working with the business team, drawing out more clarity around their business rules and requirements. I am on #TeamData and #TeamBusiness. When the business sees you representing their interests, often to a hostile third-party implementer, they will move mountains for you. This is the secret to getting CDMs, LDMs, and PDMs done on modern development projects. Just do them as part of your toolkit.  I would prefer to data model completely separately from everyone else. I don’t see that happening on most projects.

The #TeamData Sweet Spot

My sweet spot is to get to the point where the DBAs, Devs, QA analysts and Project Managers are saying "hey, do you have those database printouts ready to go with the DDL we just delivered? And do you have the user ones, as well?" I don’t care what they call them. I just want them to call them.  At that point, I know I’m also on #TeamIT.

The key to getting people to at least appreciate logical data models is to just do them as part of whatever modeling effort you are working on.  Don’t say “stop”.  Just model on.  Show, don’t tell, your teams where the business requirements are written down and where they live.  Then demonstrate how that leads to beautiful physical models as well.

Logical Data Modeling isn’t dead.  But we modelers need to stop treating it like it’s a weapon. Long Live Logical!

 

Thanks to Jeff Smith (@thatjeffsmith | blog) for pointing out the original post.
