What are Conceptual, Logical and Physical Data Models?

Like all good data architects, I want to define the terms I use on this blog, speaking engagements, and on my projects.  And like most data professionals, I’ve discovered that the industry has multiple, often conflicting, definitions of data modeling terms, which I find sadly ironic.  My friend Graeme Simsion has even done research in this naming conflict.

My uses of conceptual, logical, and physical come from the Information Engineering (IE) methods of data modeling.  Other uses and definitions arise from the database schema and academic world. Neither is wrong; it’s just that they are different.   The industry as a whole tends to use the IE definitions, so I tend to stick to them because they are used by the vast majority of practitioner data modelers and other team members.

Conceptual Data Model

A conceptual data model is a summary-level data model that is most often used on strategic data projects.  It typically describes an entire enterprise.  Due to its highly abstract nature, it may be referred to as a conceptual model.

Common characteristics of a conceptual data model:

  • Enterprise-wide coverage of the business concepts.  Think Customer, Product, Store, Location, Asset.
  • Designed and developed primarily for a business audience
  • Contains around 20-50 entities (or concepts) with no or extremely limited number of attributes described. Sometimes architects try to limit it to printing on one page.
  • Contains relationships between entities, but may or may not include cardinality and nullability.
  • Entities will have definitions.
  • Designed and developed to be independent of DBMS, data storage locations or technologies.  In fact, it would address digital and non-digital concepts. This means it would model paper records and artifacts as well as database artifacts.

Logical Data Model

A logical data model is a fully-attributed data model that is independent of DBMS, technology, data storage or organizational constraints.  It typically describes data requirements from the business point of view.  While common data modeling techniques use a relational model notation, there is no requirement that resulting data implementations must be created using relational technologies.

Common characteristics of a logical data model:

  • Typically describes data requirements for a single project or major subject area.
  • May be integrated with other logical data models via a repository of shared entities
  • Typically contains 100-1000 entities, although these numbers are highly variable depending on the scope of the data model.
  • Contains relationships between entities that address cardinality and nullability (optionality) of the relationships.
  • Designed and developed to be independent of DBMS, data storage locations or technologies.  In fact, it may address digital and non-digital concepts.
  • Data attributes will typically have datatypes with precisions and lengths assigned.
  • Data attributes will have nullability (optionality) assigned.
  • Entities and attributes will have definitions.
  • All kinds of other meta data may be included (retention rules, privacy indicators,  volumetrics, data lineage, etc.) In fact, the diagram of a logical data model may show only a tiny percentage of the meta data contained within the model.

A logical data model will normally be derived from and or linked back to objects in a conceptual data model.

Physical Data Model

A physical data model is a fully-attributed data model that is dependent upon a specific version of a data persistence technology.  The target implementation technology may be a relational DBMS, an XML document, a NoSQL data storage component, a spreadsheet or any other data implementation option.

Common characteristics of a physical data model:

  • Typically describes data requirements for a single project or application. Sometimes even a portion of an application.
  • May be integrated with other physical data models via a repository of shared entities
  • Typically contains 10-1000 tables, although these numbers are highly variable depending on the scope of the data model.
  • Contains relationships between tables that address cardinality and nullability (optionality) of the relationships.
  • Designed and developed to be dependent on a specific version of a DBMS, data storage location or technology.
  • Columns will  have datatypes with precisions and lengths assigned.
  • Columns will have nullability (optionality) assigned.
  • Tables and columns will have definitions.
  • Will also include other physical objects such as views, primary key constraints, foreign key constraints, indexes, security roles, store procedures, XML extensions, file stores, etc.
  • The diagram of a physical data model may show only a tiny percentage of the meta data contained within the model.

7 Comments

  • Karen, I just want to say thank you, I’ve started Lecturing and had conversations about different types of ERD and this post has just made it all make sense. Thank you.

  • Karen,

    Thank you. Awesome post about Data Modeling.

  • I’m not sure about “Data attributes will typically have datatypes with precisions and lengths assigned” in logical data model. Defining datatype is a pretty concrete decision that conflicts the statement “model that is independent of DBMS, technology, data storage or organizational constraints”. IMHO datatypes should be defined only in physical data model.

    • I can see what you are saying. But some modeling tools let you specify high level datatypes. Like “text, percentage, decimal, integer” without specifying the exact DBMS datatype like “nvarchar” or “bigint”. Some datatypes it makes sense to capture length or precision at requirements time because these are important things. For instance, e-mail addresses should be at least 254 characters wide because that’s the Internet standard for e-mail addresses. Others include specifying that CUSTOMER NUMBER is actually a character field just like VEHICLE IDENTIFICATION NUMBER and SOCIAL INSURANCE NUMBER are character data items.

      But we are overly bound by what our database and data modeling tools will allow us to do.

      Some datatypes are business and legal requirements. Requirements time is an appropriate time to collect and specify that data in those cases. Everything else could be the generic datatypes.

      • Unfortunately, most modeling tools cannot even draw logical data model. Instead, they use UML class notation and class attributes to represent logical data model and that is the main reason why is it possible to specify data-types in such modeling tools. But again, that model is more of a class diagram then a logical data model.

        I could agree with specifying high level (generic) data-types in logical data model as attribute descriptors, but not with a “datatypes with precisions and lengths assigned” you like specified. Quoted is a concrete technical specification that is used in physical data model upon implementation of table columns and unless it is specified as “generic” or “high-level” (meaning, not concrete or RDBMS dependent) it gets confusing, like in this case.

        The thing is you copy-pasted the same sentence both in description of logical and physical data model, but with logical data model we truly talk about high-level (generic) data-types while in physical data model it is a concrete RDBMS dependent type.

        • I’m confused by your first statement. The top 2 data modeling tools support ERD data modeling. The third does support UML, but it’s mostly a notation thing, not a UML tool. Also, I believe it’s very rare for UML modelers to prepare a logical data model, ever.

          I explained why I think one working at the requirements phase should capture requirements-level lengths and precisions. I gave examples. The examples I gave were certainly not DBMS specific. I’m sorry you are unhappy with my efficiency. I stand by the statements. I didn’t say logical models MUST have lengths and precisions. My experience shows that when business people give you an important requirement for data, you capture it. In your model. And we must capture it at the place it belongs. A business-driven datatype or length goes in the logical model, so that every physical model derived from it gets that requirement. If you wait until physical modeling time, it’s likely to be forgotten, not done, or done differently in every physical model. Don’t do that. Capture all requirements where the requirements model is. That’s the logical model.

          • Data length is not in question here since it can also be used to describe generic data-type. What I find a bit unclear is the term “datatype” since in the article it is not specified as generic or RDBMS dependent. Personally, in the description of logical data model I would write like this:

            “Data attributes will typically have generic datatypes with precisions and lengths assigned.”

            while in physical model description:

            “Columns will have RDBMS dependent datatypes with precisions and lengths assigned.”

            This clearly describes the term “datatype” in each of the models, so I hope you understand what I was going for. Still, a great article! I was not trying to search for a “needle in a haystack” but to point out the possible confusion due to similarity of terms.

            Thank you!

Leave a comment

Subscribe via E-mail

Use the link below to receive posts via e-mail. Unsubscribe at any time. Subscribe to www.datamodel.com by Email


Categories

Archive

UA-52726617-1