I prefer to think of it as trying to decipher the stories data is already telling. So more listening, less torture.
I’ve been attending Enterprise Data World for more than 15 years. This event, focused on data architectures, data management, data modeling, data governance and other great enterprise-class methods, is part technical training and part revival for data professionals. It’s just that good.
This year the big bash is being held in Austin, TX, a thriving tech-oriented community, 27 April to 1 May. And this year’s theme is “The Transformation to Data-Driven Business Starts Here.”
And right now there’s a $200 Early Bird Discount going…plus if you use coupon code “DATACHICK” you can save $200 more on a multi-day registration or fifty bucks on a one day pass. There. I just saved you $400. And no, I get no kickbacks with this discount code. I don’t need them. I need you to be at this event, sharing your knowledge and meeting other data professionals. I need you to be part of the community of data professionals.
Top 10 Reasons You Need to Go to EDW 2014
- Data is HOT HOT HOT. I deemed 2013 The Year of Data and I see no signs that organizations are going back to software-is-everything thinking. 2014 is still going to be a year full of data. There’s even an executive, invitation-only CDOvision event co-located.
- Not Just Bullet Points. There are over 20 hours of scheduled networking events for you to chat with other data-curious people. Chatting with other data professionals is my favourite part of this event. Bring your business cards…er… .vcf contact file.
- Lots of Expertise. Not just data celebrities, but also other data professionals with thousands of hours of hands-on experiences, sharing their use cases around data. And not just data modeling. Big Data. Analytics. Methods. Tools. Open Data. Governance. NoSQL. SQL. RDBMS. Fun.
- Certifications. You can take advantage of the Pay-Only-If-You-Pass option for the CDMP on-site certification testing.
- Workshops. I’m doing a half day tutorial on Driving Development Projects with Enterprise Data Models. I’ll be talking about how data models fit within real-life, practical, get-stuff-done development projects. No ivory towers here.
- SIGs. There are special interest groups on data modeling products, industries and methods. You can meet people just like you and share your tips and tricks for data lovin’. I will be leading the ER/Studio SIG.
- Ice Cream. This conference has a tradition of the ice cream break on the exhibit floor. Nice ice cream, even.
- Austin. Austin is one of the more vibrant cities in Texas. So cool, it even has a Stevie Ray Vaughan statue. Museums, Theatres, indoor golf, clubs. There’s a reason why SxSW is held here.
- Vendors. Yes, we love them, too. Meet the product teams of the makers of the tools you use every day. Or meet new teams and ask for a demo. They are good people.
- Love Your Data. There’s no better way to show your love than to network with other data professionals and learn from industry leaders.
Come learn how to help your organization love data better. You might even see me in a lightning talk holding a martini. Or taking impromptu pics of @data_model and other data professionals. Or debating data management strategy with people from around the globe. In other words, talking data. With people who love their data. Join us.
Update: It appears that this chart and other data visualizations have been removed from the website and report. I’m hoping that means that the authors will be refactoring them with improved graphics. Meanwhile, I’m going to leave my post below as is. There are good lessons and tips to be shared.
I know. I hear you. It’s still January and we might just have a winner, one that will be impossible to beat during the next 12 months. Incredible. As you may recall, in late 2011 I awarded Stupidest Bar Chart to a doozy from Klout. That bar chart was confusing, but not in the way this one is. First, put down your beverage of choice. Then take a look at this:
Yeah. That…chart. It’s kind of like a horizontal stacked bar chart. I don’t understand anything about it, though. This chart comes from an infographic at Deloitte.com on Analysis Trends for 2014.
Maybe zooming in will help?
Nope, doesn’t make it any clearer. In fact, it’s just as crazy, but bigger. Call it Big Crazy Data™.
Here are the issues and questions I have about it:
- What do the colours mean? If this were a stacked bar chart, the grey and blue areas would indicate different data. It appears that only some sections have data. But I’m not sure.
- What is the scale? Normally a bar chart would have an axis that indicates some measure and all the bars would be graphed against that axis. This has no axis.
- Why do some bars have signed numbers and one has a range? Why are some numbers unsigned? Even some delta numbers are unsigned.
- What do the relative sizes of the sections mean? In one bar we see a blue section labeled 285, but it’s larger than a section labeled 425-475.
- Where numbers appear, do they describe the section they are on or the section next to the number? I’m not sure.
- What does the relative position of the blue section mean? I’m not sure.
- Why are some of the labels in light grey and some in dark grey? I’m not sure.
- What are the units of measurement for these numbers? Are some percentages? Units of 1000s? 100,000s? Are they of people? Positions? Something else? I’m not sure.
- Do the endnotes there explain the numbers? No, they are just citations for reference materials used to create the report.
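The sizing complaint above is the easiest one to make concrete: in a faithful bar chart, every segment’s drawn length is its value times one shared scale factor, so a segment labeled 285 can never legitimately be drawn larger than one labeled 425-475. A minimal sketch of that invariant (the 285 and 425 come from the chart; the scale factor is my own arbitrary choice):

```python
# Minimal sketch: in a faithful bar chart, every segment's drawn width
# is its value times one shared, positive scale factor.
SCALE = 0.5  # assumed: 0.5 px per unit; any single positive scale works

def segment_width(value, scale=SCALE):
    """Drawn width of a bar segment, proportional to its value."""
    return value * scale

# The two offending segments: 285 vs. the low end of the 425-475 range.
w_small = segment_width(285)
w_large = segment_width(425)

# Drawn proportionally, 285 must always render smaller than 425.
assert w_small < w_large
```

Whatever single scale you pick, the ordering of the widths matches the ordering of the values; the Deloitte chart breaks exactly this property.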
Maybe the chart has an explanation inside the full document, Analytics Trends 2014: (And why some may not materialize)… No, same chart, no text that directly explains any of the numbers. To add some irony to this, the report itself is about Analytics and even covers trends in visualizations.
A Picture is Worth A Thousand Words, Unfortunately.
The report has something to say about data visualizations used in data analytics:
There’s no question that visualization has become a critical capability for organizations of virtually every shape and size. Easy-to-use software makes complex data accessible and understandable for almost any business user. From discovery and visual exploration to pattern and relationship identification, today’s visualization tools easily affirm the adage that a picture is worth a thousand words. Or, in this case, numbers.
This is especially true with big data, where visualization may even be a necessary capability for driving insights. That’s why visually oriented tools are rising in prominence for many big data applications. Users get to understand, explore, share, and apply data efficiently and collaboratively—often without the need for analytics professionals. And that’s where the risk comes in. In their eagerness to dive into data, users may choose polished graphics over thorough data preparation and normalization and rigorous analysis—glossing over important insights and analysis opportunities and potentially producing erroneous results. [emphasis mine]
Keep reading the report from that section. The irony burns.
What’s Going on with this Bar Chart?
I’d bet that the Analytics professionals at Deloitte know much better than this. The webpage and report for Analytics Trends are beautiful to look at. I’m guessing that a graphics designer has taken these numbers and created a beautiful, yet meaningless graphic with numbers. And just as the report predicts, people who don’t understand how to best use visualizations can gloss over important insights and analysis opportunities and potentially produce erroneous results. This report has some great points. And it’s pretty. Very, very pretty. But the distraction of bad visualizations makes it difficult for me to actually see the points the authors are trying to make.
My guess is also that this data, as a set, had no business being put together in one chart. I’m not sure, but they don’t seem to have the same measures or even be the same type of data. So putting them in one chart won’t help. This was a page in a report needing a graphic, so someone made one.
Jamie Calder (@jamiecalder) helped me “see” the story this chart is trying to tell: think of it as a math equation. That might get you there. But it’s still not an appropriate use of a bar chart. And Josh Fennessy (@joshuafennessy) has pointed out that this isn’t supposed to be a bar chart at all. It’s supposed to be a waterfall chart. But it’s dressed up as a bar chart, so I’m still going to leave it as a contender for Worst Bar Chart of 2014. Let’s just call it a self-nominated chart. Martin Ribunal has found what is most likely the original chart this one was copied from… er, inspired by, and has listed it in the comments below.
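For readers who haven’t met one: a waterfall chart walks an opening total up and down through signed deltas to a closing total, which is why the signed numbers and the floating positions of the blocks make sense in that reading. A minimal sketch of the underlying arithmetic (the figures are invented for illustration, not taken from the Deloitte chart):

```python
# Minimal waterfall-chart arithmetic: each floating block starts where the
# running total stood before its delta was applied.
# All figures below are invented for illustration.
start = 400
deltas = [+120, -60, +25]  # signed changes, as on a waterfall chart

blocks = []   # (bottom, height) of each floating block
total = start
for d in deltas:
    bottom = min(total, total + d)  # the block spans the change, up or down
    blocks.append((bottom, abs(d)))
    total += d

print(total)   # closing total: 485
print(blocks)  # [(400, 120), (460, 60), (460, 25)]
```

That running-total baseline is the whole trick, and it’s also what a plain stacked bar chart has no way to express.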
What Have We Learned About Data Visualizations?
- The best data analysis can be invalidated with bad data visualizations.
- If you develop content, insist that you have a say in the final published work. I know this is difficult in large corporate entities, but it’s important to ensure that your goals are met.
- The more accessible we make self-serve BI and data visualization tools, the more responsibility we have to educate, train, and mentor those using them.
- Show your visualizations to other people. Ask them what they see. Ask them if they are confused, what conclusions they might have and what questions they still have.
- Choose the right chart type to fit your data. Then use that chart correctly.
- If you need a graphic image, don’t mimic a recognized chart type.
- If you add a chart to a document, you should actually reference it in the text in a way that helps the reader understand it.
- If your chart has numbers, you have to say what those numbers measure, including some sort of unit of measure. And your graphics should correctly portray their relative sizes.
- If a chart leaves viewers saying “I’m not sure” more than once, it’s not working.
- Loving your data means loving how it is presented, too.
What Would You Ask?
What other questions do you have about this… graphic? How would you improve it?
I can’t bring myself to call it a bar chart any more. But it’s still dressed as a bar chart, so it fits the nomination category. If you find a bar chart or any other data visualization to nominate, let me know. I wouldn’t want something worse than this one to go unrecognized.
This legacy database system was used throughout Cologne during my recent visit. Do you know how to read it?
I also wonder how far back it goes… And whether technology will eventually make it obsolete?
On 26 September 1983, Stanislav Petrov took a stand against what his systems were telling him, and he may have changed the course of history. Petrov was working as a duty officer at the command center for the Oko nuclear early warning system. This is the place where the Soviets monitored incoming attacks, much like the US command center you remember from War Games. Earlier that month, the Soviet Union had shot down a Korean commercial jetliner over the Sea of Japan, claiming that it was on a spy mission. 269 people died in that incident, including a US Congressman. Some in the Soviet Union were fearful of a retaliatory strike by the US. Cold War tensions were high.
At the command center, Petrov was getting data that a launch of five missiles had been made in the US towards the Soviet Union. But instead of just reading that dashboard and acting, he used his own inner analytics system to process the data and decided not to report or react.
Had Petrov reported incoming American missiles, his superiors might have launched an assault against the United States, precipitating a corresponding nuclear response from the United States. Petrov declared the system’s indications a false alarm. Later, it was apparent that he was right: no missiles were approaching and the computer detection system was malfunctioning. It was subsequently determined that the false alarms had been created by a rare alignment of sunlight on high-altitude clouds and the satellites’ Molniya orbits, an error later corrected by cross-referencing a geostationary satellite.
Petrov later indicated the influences in this decision included: that he was informed a U.S. strike would be all-out, so five missiles seemed an illogical start; that the launch detection system was new and, in his view, not yet wholly trustworthy; and that ground radars failed to pick up corroborative evidence, even after minutes of delay.
– Wikipedia contributors. "Stanislav Petrov." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 26 Sep. 2012. Web. 26 Sep. 2012.
I’ve always wondered: if the system he was using had had a bunch of fancy dashboard features, like shiny 3D pie charts, moving average lines and drill-down-capable reports, would he still have been able to distrust the data? I’ve seen this sort of over-trust of data with data model diagrams. It seems the prettier or more advanced the presentation of the data is, the more people want to believe it is right. In fact, I’ve learned to present draft documents to people on my teams with hand-written notes/comments on them to sort of "break the ice" and show people that they are drafts. A modern solution might have included some sort of decision-making guidance that says "Confidence Factor of Attack: 99%" or something like that. And it would have been highlighted by some sort of red bar, showing just how confident the system was based on the data – bad data, it turns out.
More details about Petrov and his actions in the video above from History.com
This website brings together key open data sets such as White House visitors, lobbying, campaign donations, etc. As the URL shows, it’s a sub-site of the overall US open data project, http://data.gov. You can see in the image below the datasets that comprise the Ethics data site:
The data is available for download and the website offers some nifty ways of working with, visualizing, and embedding the data. For instance, I’ve embedded the White House Visitor data right here. Go ahead, do some searching or filtering, right here.
You can change the column order by using the Manage button:
You can set up some fairly decent filters (is, contains, etc.) on the columns, too. Here are the visitors named Karen Lopez:
That’s not me. (I seem to recall that I am mayor of the Lincoln Bedroom on Foursquare, though.) This is the problem with trying to use something like First Name and Last Name as a primary key. My data does show up in the Federal Campaign donations list, though. Only one donation…my other donation was returned to me because "Canadians can’t donate to US campaigns". Unfortunately for that candidate, they assumed that I was Canadian based on my residency, not my citizenship. They lost the money, but the other campaign got to keep my money. The entire world is one big data modeling problem, I tell ya. Get your semantics and your syntax right and you can take over the world. Or at least the US.
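The First Name + Last Name problem is easy to demonstrate: key a lookup by name and two different people silently collide. A minimal sketch with invented records (the second "Karen Lopez" stands in for any same-named stranger in the visitor logs):

```python
# Minimal sketch: using (first, last) name as a primary key.
# Records are invented for illustration.
visitors = [
    {"first": "Karen", "last": "Lopez", "city": "Washington"},
    {"first": "Karen", "last": "Lopez", "city": "Toronto"},  # a different person
]

by_name = {}
for v in visitors:
    key = (v["first"], v["last"])
    if key in by_name:
        print("collision:", key)  # two distinct people, one key
    by_name[key] = v

print(len(by_name))  # 1 -- the second Karen Lopez overwrote the first
```

A stable unique identifier per person, a URI for instance, is what makes a key like this unambiguous.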
The real power in open data is being able to find correlations. As Deputy CTO Vein mentions, one could match up the data from the White House visitors, lobbyists and campaign donations to see if you find any matches. That’s not bad, it’s just more information. This is tough to pull off with any certainty, though, due to that dang primary key issue I mentioned above. What might help this? URIs. Or some other way of uniquely identifying people and organizations.
To cross-match data, you’ll need to export the data using the API (Socrata) or download it to your own tools.
Data is available for download in these formats:
You can also discuss the datasets right on the site (registration required). There are only 7 datasets that are part of this ethics website, but the data stewards are eager to find out what datasets you’d like to see added. I’d also like to hear what data you think should be part of an ethics website focused on data. I’m thinking:
- Expenditures that required extra approval/oversight
- Travel data (who went where and why)
Some of the criticism that I’ve heard about data.gov is that there are too few datasets or that so much more could be provided. I’ve even heard complaints about money being spent on this service. As Tony Clement, Canadian MP and President of the Treasury Board (site | @tonyclementCPC), said recently about the Canadian open data initiatives: open data is about transparency. We can’t wait until we have all the data, in a perfect format, to share it. He also mentioned that open data is significantly reducing the Canadian Government’s costs for Freedom of Information Access requests. Think about it. What open data will become is self-serve FOIA. No waiting around for someone to spend weeks or months to find some data, then thousands of dollars to prepare and provide it.
I’m also hoping that the move to open data will allow government data architects to influence good data management practices. Exposing the data to sunshine is going to allow us, the people who fund the data collection and processing, to point out where the data is poor quality. The usability of the data sets and the ability to integrate them are going to be key in making them useful.
I’m thinking that I’d like to use some of these sets and others from data.gov for some upcoming demos.