I get asked to help teams increase the performance of their database (hint: indexes, query tuning and correct datatypes, in that order) or to help them scale it out for increasing workloads. But when I open it up to take a look, I see something that looks more like this meme.
All those cheats, workarounds and tricks they’ve used are going to make the engine’s optimizer work harder, make query tuning that much harder, and in the end it’s going to cost so much more to make it “go faster” or “go web scale”.
Where are the nail clippers in your data models and databases?
I’m excited to announce that Thomas LaRock (@sqlrockstar | blog) and I will be presenting a full-day Training Day (PreCon) at SQLBits XIV in London, UK.
Our session, Designing For Performance: Myths and Misunderstandings, is going to feature hands-on labs, exercises and lots of challenges to help you master your own SQL Server superpowers. It will be held on Thursday, 5 March 2015. Registration is open NOW.
Everyone agrees that great database performance starts with great database design. So why do so many poorly designed databases exist in the world? Attend this session to understand why bad designs will always exist, what you can do to avoid them, and how best to work with them when needed.
Discussion topics will include:
- Server/Infrastructure design
- VM/Server configuration
- Physical file layouts
- HA/DR options
- Database design
- Table design
- PK/FK choices
- Index strategies
- Monitoring for performance
- Control reports
Attendees will leave this session with an understanding of the following:
- Why common issues are so common
- How to better anticipate issues before they happen
- How to deploy and implement design choices that benefit everyone
- Proper performance benchmarking and control reports
This will be my third SQLBits conference; they are a lot of fun and jam-packed with learning and networking. As you can see from the video above, Tom and I will also ensure that you aren’t just sitting through 8 hours of bullet points and sparse slides with funny pictures. We’ll be talking about what design approaches work in what situations and all the myths and misunderstandings out there about database design and configurations.
Register now and engage your own superpowers. Or just stand there looking pretty. It’s up to you.
Today is the general availability release date for the newest version of SQL Server, aptly named SQL Server 2014. I’m excited about many of the new features being rolled out today, but the one that will impact data architects, modelers and database designers is the new datatype being introduced. But first, for those of you with your heads stuck in the deep piping and spit-managing of databases, some background about datatypes:
A datatype is a categorization of data items, based on the range and types of data that it can contain and a set of actions that can be validly taken against that data.
As such, applying a datatype to a column in a database makes it work as another type of constraint. For example, a tinyint column can’t hold my Starbucks name (Kitty) because it constrains values to integers, and only a small subset of all integers at that.
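To make the “datatype as constraint” idea concrete, here’s a minimal sketch in Python. The `validate_tinyint` helper is hypothetical, standing in for what the engine does for you: SQL Server’s tinyint accepts only whole numbers from 0 through 255.

```python
# Hypothetical sketch (not SQL Server itself) of a datatype acting as
# a constraint: tinyint accepts only integers in the range 0-255.
def validate_tinyint(value):
    """Mimic the tinyint constraint: whole numbers 0 through 255 only."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("tinyint columns accept integers only")
    if not 0 <= value <= 255:
        raise ValueError("tinyint range is 0 through 255")
    return value

validate_tinyint(42)        # accepted
# validate_tinyint("Kitty")  would raise TypeError
# validate_tinyint(300)      would raise ValueError
```

The engine applies exactly this kind of check on every insert and update, which is why the datatype choice itself is a data quality tool.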
The number and type of datatypes (yes, I’m being meta there) varies depending on the strength and quality of the tequila the DBMS product management teams were drinking at their last Vegas Blow Out team building retreat, as called for in the ISO standard for databases, AKA ISO/IEC JTC 1/SC 32 – Data management and interchange.
One of the things that developers and DBAs will tell you is that choosing the right datatype is important for performance reasons. And by that, they mean the smallest datatype you can fit most of the data in. And maybe a bit smaller. Soooo much bad info out there, I know. When Knowledge Conquers Fear, we can love our data. Thank the Cosmos you have me here to help you out.
What’s new in SQL Server 2014: A New Datatype
This new datatype is exciting for me as a data & space enthusiast. The new feature finally allows modern database designers to properly specify the constraints for tracking time and location data in the same column. Yes, this means that your developers and DBAs no longer have to use comma-delimited values in their relational database designs when they need to track how much time and personal space they need to get up to speed on professional database design. And it’s big enough to store that many man-hours. Yeah. I said that.
BTW, it seems that Stack Overflow is *the* place to find info on how to implement comma-delimited values in database columns. Kids, don’t get your database design knowledge from random forums on the Internet.
Anyway, back to the news!
The new feature makes so much sense with Microsoft’s push to the Cloud, its embrace of NoSQL technologies and all. It’s AWESOME.
| Property | spacetime |
|---|---|
| Description | Defines a time and location in a universe. |
| Introduced in | SQL Server 2014 |
| Syntax | `spacetime [(fractional seconds precision)], (universe, 5DGeometry)` |
| Example | `CREATE TABLE Table1 ( Column1 spacetime(1000, 2014.12.0.2000.8) );` |
| Range | –∞ to +∞ and beyond [you’ll need a 5D monitor to view this range.] |
| Timezone offset range | Thank Venus, no, nope, never. We are scientists here. Use Multiuniversal Universal Time Coordinates (UTMC). |
| Daylight saving aware | Oh, for Carl’s sake. Do you really think something like spacetime needs to be sullied by DST? If you have to ask, you don’t ever need to use this datatype. Seriously. |
| Accuracy | +/- 10 Plancks. Depending on how far your server is from the Sun. Earth’s Sun, that is. |
| Microsoft Azure DB Support | Yes, of course. But only in Premium plans and higher. |
Special Considerations and Gotchas
Some gotchas with this new datatype:
- Due to the highly multi-dimensional, multiuniversal nature of this datatype, there isn’t any backwards compatibility. Unless, of course, you can fold spacetime and go back and change earlier versions of SQL Server. But if you could do that, you wouldn’t be reading my blog, would you?
- Just like the confusion over timestamps, you can’t really treat this like a date or time datatype. It’s special. And spatial.
- This means you can’t convert it to date, time, datetime, timestamp or spatial datatypes, either.
- The 5D geometry thing is way too complex to explain in a single blog post. But for those of you that managed to stick it out through some college level math, it involves parsecs (the correct usage of the term) and the double declining balance method of space depreciation. In this first rollout of spacetime, the geometry completely ignores most OctoDeca Bands. Except for Miller tracks.
- You can’t use normal date and geometrical math on data in the columns. You can bend or fold the values, but since space has no center, and time has no beginning or end, spacetime has no beginning or end. It is infinite. So the usual infinity rules apply.
- This datatype is only available via O365, but that makes sense since as announced today, SQL Server 2014 is also only available via O365 subscriptions.
- This datatype is only available at O365 plans at U3 and higher. Wait, I don’t think I should have said anything about the new Universe O365 plans. Forget I said anything. That’s probably not going to be a rule in our universe. Seriously. No NDA broken. I think.
Some of this post may have been inspired by some bad veggie April Fish (poisson d’avril) I had last night. If you want to get some real information about the new features of SQL Server 2014, you probably shouldn’t read random blogs on the internet on launch day. Especially when it’s 1 April.
Did you catch all the special references in this post? Let me know.
A developer, Justin Reese, just shared his own story about using the wrong datatype for phone numbers, over at DailyWTF. He did this in an XML document, but I see the same mistake being made over and over again in data models and database designs. In fact, this is a key part of my Data Modeling / Database Design Blunders presentation.
My client reported a strange bug on a certain page in an app I built for them. Where the contact information for a series of offices was being displayed, all the information was correct except for one piece: the phone number. For multiple locations, the phone number displayed was the same: 214-748-3647.
I love reading about his quest to track this problem down and what the issue turned out to be. I also love that he wrote a DailyWTF about himself. We all should be doing that: sharing our mistakes so that others can learn from them. I call that "Free Advice That’s Paid For" in my blog posts.
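The mystery number itself gives the bug away: 214-748-3647 is just 2,147,483,647, the maximum value of a signed 32-bit integer, with phone-number formatting applied. Here’s a quick sketch (the clamping line mimics what some lenient DBMS configurations do on overflow; stricter ones raise an error instead):

```python
# Why every office "had" the phone number 214-748-3647: that value is
# the largest signed 32-bit integer. Any 10-digit phone number above
# it cannot fit in a 32-bit INT column.
INT32_MAX = 2**31 - 1             # 2147483647

phone = 4695551234                # an illustrative 10-digit US number
stored = min(phone, INT32_MAX)    # lenient overflow behavior: clamp to max

digits = str(stored)
formatted = f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"
print(formatted)                  # 214-748-3647, the number from the bug
```

Every phone number larger than the maximum ends up stored as the same value, which is exactly why multiple locations all displayed it.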
In my presentation, my first blunder is using numeric datatypes for data values that aren’t actually numbers. Telephone numbers are one of them: they may have leading zeros, and we don’t usually do math on them. ZIP Codes are another example. Store them as INTEGER and you’ll lose leading zeros. And many postal codes have letters. Think you have only US customers? You might. But customers, people who may owe you money, have a way of moving around. Of course, every design decision comes down to cost, benefit and risk. So some designs may make a good case for using numeric datatypes for storing values that aren’t actually numbers. But all the protections for data quality and correct retrieval need to be designed in, too. That’s the trade-off. Also in my presentation I give a rule of thumb:
If business users call it a number, it ain’t a number.
Customer Number. Account Number. Vehicle Identification Number. Social Insurance Number. Social Security Number (yeah, it’s all numbers now, but nothing would stop the powers that be from changing that). This is especially true for numbers that are managed by people outside your organization. You just don’t know when they might decide to add letters or special characters.
I get feedback from at least one person at each presentation that my blunders are way too obvious or that they aren’t serious mistakes. As much as I see poor or inaccurate datatype selection, I have to politely disagree. These are the number one mistakes I see. They compromise data quality and even lead to tragic data errors. Storing numbers that in fact aren’t numbers as INTEGER or other numeric datatypes is error prone, and leads to nasty, slow queries due to all the casting and table scans that may happen. Eventually, those incorrect data values are going to come looking for you. Usually after work hours, in production. If you’ve never seen them in the wild, then either you don’t get out enough or you’ve been blessed by working with highly competent data modelers and database designers. And we all know how rare those are.
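The leading-zero problem mentioned above can be sketched in a few lines (the ZIP code here is just an illustrative example):

```python
# Store a New England ZIP code as an integer and the leading zero is
# gone; every consumer of the value must remember to zero-pad it back.
zip_as_text = "02134"           # a Massachusetts ZIP code
zip_as_int = int(zip_as_text)   # becomes 2134 -- the zero is lost

print(zip_as_int)               # 2134: naive display is now wrong
print(f"{zip_as_int:05d}")      # 02134: correct, but only if you re-pad
```

The text form needs no special handling anywhere; the integer form pushes a formatting obligation onto every report, screen and export that touches it.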
Database Design Contentious Issues
A highly interactive and popular session where attendees evaluate the options and best practices of common and advanced design issues, such as:
* Natural vs. Surrogate keys
* Classwords and other Naming Standards
* Varchar Minimums
* Who Calls the Shots and Who Does What?
…and others. Bring your votes, your debates, and your opinions.
Session Level: Intermediate
I so love giving this presentation because it is driven by audience members. Not a lecture by me, but a moderated, sometimes refereed, debate about issues in database design and data modeling.
But I’m betting you data architects are thinking "Why would I attend an event about SQL Server? I’m DBMS-agnostic." I think it’s great that we modelers and designers have skills that cross multiple vendor products. But I sometimes wonder if some of us pride ourselves on our agnosticism so much that we have actually become DBMS-illiterate. I know that most of us don’t work hands-on with databases as often as DBAs and developers do, but it is important that we understand and have a firm foundation on the platforms upon which our designs will be built.
SQL Saturdays are free events hosted by other data professionals and sponsored by vendors in the data community (that’s what keeps them free). Sometimes there is a small charge for lunch, but that’s it. For the Portland event, it appears that even lunch is free. You must pre-register, but it’s free. Did I mention it’s free?
So you can come to my session but what about the others? Here are ones that I’ve picked out the schedule that would be of value to those of us wearing the modeling/design hat on a project:
Does your company use agile development? It can deliver more value to customers with lower project risk. However, it can also make the system design change rapidly, and require frequent software rollouts. This session will focus on best practices for DBAs and developers to make design, testing and deployments easier. Keep your systems agile, not fragile!
Session Level: Intermediate
This session will describe the best practices for designing a data warehouse to get the most out of SQL Server. Doug has worked in data warehousing for 12 years and will blend experience with best practices and recommendations from Microsoft’s Fast Track program. Each version of SQL Server introduces new features specifically for data warehousing – by applying the correct technique, feature, hint, modeling approach and layout, the data warehouse will be faster and more scalable.
Session Level: Advanced
Dates and times seem simple at first. Kendra Little will show you there’s more to it than you think. She’ll give you five best practices that will help you select the right temporal data type, avoid common issues, and use the most effective techniques to aggregate data. She’ll also explain painful problems with query performance and how to avoid them. Choose wisely: the correct types and high performing data access logic will scale well and save development and administrative time.
Session Level: Intermediate
Self-promotion is oftentimes the best promotion you can get. In this session, we will talk about how to promote yourself, your brand and your career without looking like “That Guy”. We will discuss social media, communities, volunteering and other ways to get your name out… What are the first steps? Come find out.
I’m sure you’ve been told seeks are better than scans. I’m sure you’ve been told that a covering index is ideal. I’m sure you’ve been told small arrows are better than thick ones. Get the whole story.
Session Level: Intermediate
Microsoft offers three distinct platforms for data analysis and a variety of related reporting tools. When should you use BI Semantic Models, PowerPivot, tabular column storage, SSAS cubes or relational data marts? Learn about the right fit for each of these choices and what you need to know to use the next generation of BI reporting tools like Project Crescent, SSRS and SharePoint BI.
With the widespread availability of location and spatial data to both consumers and corporations (such as smartphone GPS data), there is a need to manage and analyze all this data as well. SQL Server 2008 introduces new standards-based spatial data types and associated functionality to the relational engine. Spatial data can be stored in the cloud using SQL Azure. And SQL Server 2008 R2 Reporting Services allows spatial data to be visualized as Maps. In this session, we’ll explore both the SQL spatial data types and SSRS maps, using demos to show this functionality in action. We’ll also cover enhancements to spatial functionality in the forthcoming "Denali" version of SQL Server.
Why these sessions? I believe that even if we aren’t responsible for finalizing a physical data model prior to implementation, it’s still a responsibility of ours to understand the above concepts so that we can work with models that include these design-time decisions. We may not be responsible for choosing all the indexes, but it’s important that our models have them. We need to understand the trade-offs around datatype choices, data warehouse architectures and newer DBMS features such as spatial datatypes, XML columns and others.
The line of responsibility between DBA, developer, and DA is constantly moving and may vary based on your project’s environment and culture. We must understand more about the target environments we are modeling for.
I hope to see you at the Portland SQL Saturday. The SQL community is great at sharing knowledge and we data architects need to be part of that sharing. It’s free, there will be prizes, and it’s fun. Be there.
Dr. Danielle Ofri has a frightening post about how data design can cause serious consequences. She talks about running up against a 1000-character limit when trying to create a patient history for someone about to undergo a risky surgery:
I panic for a moment, fearful that the computer has frozen and that I’ve lost all my work — something that happens all too frequently. But I soon realize that this is not the case. Instead, I’ve come up against a word limit.
It turns out that in our electronic medical record system there is a 1,000-character maximum in the “assessment” field. While I’ve been typing, the character number has been counting backward from 1,000, and now I’ve hit zero. The computer will not permit me to say anything more about my patient.
If you’d done any database design, you know that even if you design a good, business-driven design, others who use the database might apply their own rules to the data on the way in or out of the database.
I remember designing a Point of Sale system database for an appliance retailer. Our data model needed to support the selling of high-end appliances as well as bulk purchases of those appliances, so our transaction amount columns were sized generously. A developer on the application team thought our monetary field lengths were insanely large, so he enforced a limit of $9,999.99 on the total transaction. To make matters worse, this system went all the way into production with that limit. So on day one of the rollout, salespeople couldn’t sell the highest-margin items, such as a professional-quality stove ($14,000), or sell to developers who were buying 100 low-end stoves in one transaction. Their orders had to be chunked up into tiny $9,000 mini-transactions. Order completion was taking hours instead of minutes. Credit cards were being rejected for too many large transactions back to back. In other words, a nightmare deployment: organizations trying to spend tens of thousands of dollars were walking out and going to other retailers with “real systems” to make their purchases.
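The mismatch is easy to sketch. The column size and cap below are illustrative assumptions, not the actual system’s values: the database column was sized for large transactions, but the application layered its own, far smaller, limit on top.

```python
from decimal import Decimal

# Illustrative sketch of a column sized for the business versus an
# application-imposed cap (both values here are assumptions).
DB_COLUMN_MAX = Decimal("9999999999.99")  # e.g. a DECIMAL(12,2) column
APP_LIMIT = Decimal("9999.99")            # the developer's added cap

stove = Decimal("14000.00")               # professional-quality stove
fits_in_db = stove <= DB_COLUMN_MAX       # the design supported the sale
fits_in_app = stove <= APP_LIMIT          # the application refused it
print(fits_in_db, fits_in_app)            # True False
```

The database design was never the constraint; an undocumented rule bolted on above it was, which is exactly why it surfaced only in production.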
However, no lives were lost (that I know of). Some people may have gone longer without heat (furnaces) or with rotten food (freezers and fridges), but in the overall scheme of things the impact on customers did not carry life-and-death consequences.
If we get back to Dr. Ofri’s situation, though, she was faced with a terrible data dilemma: how to describe a complex patient history in 1000 characters. This number probably sounded like a huge number when the project team was adding that “feature” to the system. I’d even bet their own test data only went as high as 200 characters or so.
I’m also guessing that since this system is a fairly high-risk project that some expert user (or many) was responsible for approving the design of field lengths. Perhaps he or she also thought that 1000 characters was enough.
In desperation, I call the help desk and voice my concerns. “Well, we can’t have the doctors rambling on forever,” the tech replies.
That response from IT (even if it came from a help desk tech who has no clue as to why there is a limit) makes me mad and afraid at the same time. You’ve all heard it in your design reviews, haven’t you?
- That’s way too long of a field.
- It won’t fit on one window without scrolling
- No one has an e-mail address THAT long
- You are over modeling it
- That’s ridiculous. No one is going to sit there and type that long
- 255 was good enough for the last 10 years, it’s good enough for the next 10 years
- The indexes on that column will be too large
- Go ahead and make the column that long; we’ll just truncate it in the application
The business users are frightened by the negative comments and agree that 25 characters is sufficient for an e-mail address, not even realizing that some of their own e-mail addresses are longer than that.
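A sanity check like the sketch below would have caught the 25-character decision immediately. The helper function and the 254-character cap (the commonly cited maximum total length of an e-mail address under RFC 5321 with its errata) are mine, not something from the design review:

```python
# Sketch: size columns from the relevant standard, not from a guess
# in a design review. 254 characters is the commonly cited maximum
# total length of an e-mail address (RFC 5321 plus errata).
RFC_MAX_EMAIL = 254

def column_is_adequate(column_length, samples):
    """Adequate only if it covers the standard's maximum, not merely
    the longest value that happens to appear in the test data."""
    longest_seen = max(len(s) for s in samples)
    return column_length >= RFC_MAX_EMAIL and column_length >= longest_seen

samples = ["pat@example.com",
           "firstname.lastname@department.subsidiary.example.co.uk"]
print(column_is_adequate(25, samples))    # False: too small even for samples
print(column_is_adequate(254, samples))   # True
```

Note that the second sample alone already overflows a 25-character column; “the longest one we’ve seen” is never a defensible sizing rule.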
As I blogged recently about Over Modeling, it’s only over modeling if it doesn’t meet the business need. Sure, we might make concessions to technical feasibility (“make every column 2000 characters, just in case”), but our designs should be business driven.
Dr. Ofri ends her story by saying:
I’ve finally condensed my patient’s complicated medical conditions to exactly 1,000 characters. I quickly hit “save” before I lose everything. I wish him good luck on his operation, wondering if his surgeons will have to condense the entire operative report to 1,000 characters as well. What happens if there are complications?
For my next medical evaluation, I think I will use haiku.
I don’t know about you, but I wouldn’t want to read that about my own patient record.
Next up: What could we have done differently on our project