10 Ways I Can Steal Your Data: eBook

I wrote an eBook sponsored by SolarWinds. In it, I share real-life stories of non-traditional, non-hacker ways I can steal your data.  You can download the PDF for free (registration required).

I’ve also been contributing a blog series over on THWACK: 5 More Ways I Can Steal Your Data; 5 More Ways I Can Steal Your Data: Work for You and Stop Working for You; 5 More Ways I Can Steal Your Data: Accessing Unmonitored Servers and Services; and 5 More Ways I Can Steal Your Data: Ask the Security Guard to Help Me Carry it Out.  There’s one more post coming up soon, too.

Data protection from a data architect’s point of view is going to be a big focus of mine over the next year or so.  I’m hoping it will be yours, too.

How Deep is My Non-Love? Nested Dependencies and Overly Complex Design

Dec 4, 2017   //   by Karen Lopez   //   Blog, Data Modeling, Database, Database Design, SQL Server, WTF  //  No Comments

Relational databases have this nifty concept of objects (just things, not code objects) being dependent upon other things.  Sometimes those dependencies exist due to foreign key constraints, others via references to other things.  One example of the latter can be found in VIEWs.  A database VIEW is an object that references TABLEs or other VIEWs.  Of course, if that VIEW references other VIEWs, then each of those VIEWs must in turn reference TABLEs or another VIEW.  And it’s that “or another VIEW” that can get modelers into trouble.
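To make the nesting concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The table and VIEW names (customer, v_customer, and so on) are hypothetical and were invented for this illustration; they are not from the design I reviewed.

```python
import sqlite3


def build_nested_views():
    """Create one TABLE and three VIEWs, each VIEW reading from the one before it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);

        -- Level 1: a VIEW over a TABLE
        CREATE VIEW v_customer AS
            SELECT id, name, region FROM customer;

        -- Level 2: a VIEW over a VIEW
        CREATE VIEW v_west_customer AS
            SELECT id, name FROM v_customer WHERE region = 'West';

        -- Level 3: a VIEW over a VIEW over a VIEW
        CREATE VIEW v_west_customer_name AS
            SELECT name FROM v_west_customer;

        INSERT INTO customer (id, name, region)
        VALUES (1, 'Ada', 'West'), (2, 'Grace', 'East');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_nested_views()
    # The query names only the top VIEW; the other two levels are invisible here.
    print(conn.execute("SELECT name FROM v_west_customer_name").fetchall())
```

Nothing in the final query hints that it passes through three layers before it reaches a TABLE, which is how a designer can end up referencing another VIEW without realizing it.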

I reviewed a database design that had massively dependent VIEWs.  How did I know that? I used a proper data modeling tool to look at all the dependencies for one central VIEW.  And this is what my data modeling tool showed me:

Data Model with hundreds of dependencies (lines) between a handful of objects (squares)

That diagram shows how ONE VIEW is related to a whole bunch of other VIEWs and TABLEs in that design.  In reviewing the model, I saw that many of the VIEWs appeared to be duplicates or had very high overlap of content with other VIEWs. 
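A modeling tool does this analysis for you, but the underlying idea can be sketched in a few lines of Python. The sketch below assumes you already have a mapping from each VIEW to the objects it references directly; the mapping and all object names in it are hypothetical. It reports how deep the nesting goes under one VIEW and how many objects that VIEW ultimately touches.

```python
def nesting_depth(name, depends_on):
    """Return 0 for a base TABLE, otherwise 1 + the depth of the deepest dependency."""
    memo = {}

    def depth(obj):
        if obj in memo:
            return memo[obj]
        children = depends_on.get(obj, ())
        result = 0 if not children else 1 + max(depth(child) for child in children)
        memo[obj] = result
        return result

    return depth(name)


def all_dependencies(name, depends_on):
    """Return every object reachable from `name`, directly or through other VIEWs."""
    seen = set()
    stack = list(depends_on.get(name, ()))
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(depends_on.get(obj, ()))
    return seen


# Hypothetical dependency map: VIEW name -> objects it references directly.
EXAMPLE_DEPENDENCIES = {
    "v_report": ["v_orders_enriched", "v_customer"],
    "v_orders_enriched": ["v_orders", "v_customer"],
    "v_orders": ["orders"],
    "v_customer": ["customer"],
}

if __name__ == "__main__":
    print(nesting_depth("v_report", EXAMPLE_DEPENDENCIES))
    print(sorted(all_dependencies("v_report", EXAMPLE_DEPENDENCIES)))
```

Even in this tiny example, v_customer is reached by two different paths. Multiply that by dozens of VIEWs and you get a diagram like the one above.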

How do VIEWs Like This Happen?

There are many reasons one would create a nested VIEW.  Like anything in a hierarchy, you could have objects that could be used independently and as part of a group on a regular basis.  But that only explains one level of a VIEW hierarchy (nest).   What about VIEWs that are nested dozens of levels deep?  And why would a database have such a complex design around one VIEW?  These are the most common reasons I run into bad practices with VIEWs:

  • Designers who don’t understand the massive performance loss for massively nested VIEWs
  • Designers who design for theory, not for real world data stories
  • Designers who have no idea they are referencing another VIEW when they design their VIEW
  • Designers who are following a worst practice of creating a VIEW for every report and every window in an application
  • Designers who don’t collaborate with other designers and create their own set of VIEWs and dependencies
  • Designers who are compensated for doing work fast and not well
  • Designers who use DDL to do design, therefore never seeing the complexity of their designs
  • Data Governance policies that let anyone create objects in a database
  • A team environment where “everyone is a generalist”.

I could go on.  While I can’t go into details here, in my review I recommended complete refactoring of this overly complex design.  It is my guess this complexity was contributing to performance problems experienced in this application.  I also recommended that a professional designer be used to refactor other issues with the database design.  I have no idea if this happened.  But I doubted that this application was going to meet its large scale web application goals.

Why Am I Sharing This?

Because so many of the performance and data quality issues I find in design reviews have the same causes I’ve listed above.  I find that not using a real data modeling or design tool is the main contributing factor.  There’s a reason why physical world architects and engineers use drawings and architectural diagrams. Models are also how they make successful modifications to the items they build.

Yes, physical objects are different than software/application/database objects. My position is that these latter objects need models at least as much as buildings and devices do.  We need tools to reverse engineer objects, to view the dependencies, to search, and to assess.  In other words, to model.  Engineering data solutions requires engineering tools like data modeling tools.  And, yes, data engineers who understand how to use those tools and how to model out the unnecessary complexity.
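As a rough illustration of what “reverse engineer objects and view the dependencies” means, here is a minimal sketch that reads VIEW definitions back out of a SQLite catalog with Python's built-in sqlite3 module. It is a crude stand-in for a real data modeling tool, and it makes assumptions: it matches identifiers in the VIEW text against known object names, so a column that happens to share a name with a TABLE would be reported as a false dependency, and names that differ only in letter case would be missed. The function name view_dependencies is my own invention for this example.

```python
import re
import sqlite3


def view_dependencies(conn):
    """Map each VIEW to the TABLEs and VIEWs named in its definition (text match only)."""
    rows = conn.execute(
        "SELECT name, type, sql FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchall()
    known_names = {name for name, _type, _sql in rows}

    dependencies = {}
    for name, obj_type, sql in rows:
        if obj_type != "view":
            continue
        # Heuristic: any identifier in the VIEW text that matches a known object name.
        tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql))
        dependencies[name] = sorted((tokens & known_names) - {name})
    return dependencies


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER, name TEXT);
        CREATE VIEW v_customer AS SELECT id, name FROM customer;
        CREATE VIEW v_names AS SELECT name FROM v_customer;
        """
    )
    print(view_dependencies(conn))
```

A real modeling tool parses the definitions properly and draws the result, but even a list like this shows designers something they never see when they work only in DDL.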

Join the live streaming Cloud Field Day 1 #CFD1

Sep 14, 2016   //   by Karen Lopez   //   Analytics, Blog, Cloud, Data, DLBlog, Events, Open Data, Speaking  //  1 Comment

I’m a delegate at the inaugural Cloud Field Day. We’ll be live streaming all the events, including our podcasts today and tomorrow.

Wednesday:

Podcast on Is Data Boring?

Podcast on Is Cloud Tech the solution or is it the Cloud Process?

Podcast on Is DevOps A Load of Crap?

Visit with Cisco

Visit with Druva

Visit with Scality

Visit with Docker

 

What’s a Data Professional Doing at #VMWorld?

Last week I attended VMWorld, the conference for VMWare customers and partners.  I know what you are thinking: “why would a DataChick go to a conference about virtualization technologies?” 

Yes, VMWare is a bit off my normal path of events and writings, but that makes it even more interesting to me. I attended because:

1. Tech Field Day Extra

Tech Field Day invited me to attend Tech Field Day Extra (#TFDx), which is an abbreviated version of their full events (like the Cloud Field Day 1 (#CFD1) I’m attending next week).  Tech Field Days bring in vendor product teams to demo and talk about their products with independent professionals, who then share their thoughts about what they heard with their audiences and communities. I attended the presentations for:

Docker:  Docker is software based on open standards that helps you package up all the parts of a solution and then deploy that anywhere.  You may have heard people talking about containers and how they help with successful DevOps processes. By using containers, solutions are easier to deploy and scale. More about Docker.

https://www.docker.com/what-docker#/VM

I’ll be writing more about Docker and Datachick data pros in another post.

Primary Data: Primary Data presented about their solution Datasphere, a data virtualization product that uses some nifty market-optimization-like processing to automatically move data to where it needs to be, when it needs to be there.  It’s “storage agnostic”, meaning that through rules and groups, data professionals can guide the right places for data to reside and let the system decide, if needed, the fastest place for that data to rest.

They also had me at the wonderful space graphics on their website.

http://primarydata.com

I’ll cover Primary Data in a future post, where I will talk about the use of rules, groups, and objectives metadata to manage the data virtualization and data orchestration that are possible.

Sandisk: (now owned by Western Digital)  Sandisk Data Center product teams talked with us about some deep-dive internal virtualization features that frankly are well beyond my skill level in virtual machines.  As an overview, they talked about using Flashsoft for VMWare APIs for managing IO for storage / caches.

https://www.sandisk.com/business/datacenter/resources/data-sheets/flashsoft-4-for-vmware-vsphere-6

I will be hearing from them again next week at Cloud Field Day 1, so I will be writing about them in a future post.

2. VMWorld Press

I was invited to VMWorld on a press credential.  That meant I had access to all sessions and exhibits.  I attended various press conferences and meetings.  I spent time talking to vendors who were most focused on data, DevOps and cloud technologies: Primary Data, SkyTap, SolarWinds, Datrium, Pure Storage, Dell Software, Turbonomic, X-IO, Github, Puppet, and SIOS.  Most of my coverage of these technologies happened via Twitter @datachick.  I expect from the conversations, though, that I will be covering these solutions and services in the longer term.  Once this series is completed, I’ll wrap it up with some thoughts on VMWorld.

3. Professional Development

Over the last couple of years I’ve been focusing a lot of my professional development on cloud technologies and processes.  This leads to learning more about hybrid technologies (cloud and on-prem, plus private clouds). All of this has shown me that I need to understand virtualization and data centre technologies more than I have had to know in the past.  Working in other communities has helped me make the contacts and friends that I need to be successful. I think every few years IT pros should attend an event that is related to, but not the focus of, their specialization to broaden their understanding beyond the tiny piece of the puzzle they work on.

I also found some time to attend sessions and I hope to get some posts up later about the ones I picked.

4. My Own Data Management Environments

While I was attending these sessions and talking to vendors, I was thinking about the data tools environments I manage: repositories, model marts, data management tools, configuration files, etc.  All of them can benefit from my implementing these technologies.  It’s sort of a “metadata centre” I need to think about, too. I’m hoping to write about those experiences as well.

Finally

The advent of Software-defined {Storage | Data Centre | Networks | Software :)} means that configurations, metadata, policies, and rules will need to be well-managed.  I see my job as a data professional just as applicable in managing data centre data as line of business data.  If we aren’t applying our rules to our own work, then why would the business trust us when we tell them they should be doing that with “their” data?

Join me at DellWorld 2016 in Austin, TX

Aug 15, 2016   //   by Karen Lopez   //   Blog, Cloud, Data, Data Modeling, DLBlog, NoSQL, Professional Development, SQL Server  //  1 Comment

I will be attending DellWorld 2016 as an influencer/media/analyst participant. This means that I’ll get access to the regular sessions, plus special engagements with product teams to see what they’ve been working on recently and what they want to do in the future. I’ve attended a couple of Dell on-site events and am looking forward to talking to key customers and real-world, hands-on data professionals. Also, doesn’t everyone want to visit Austin as much as possible?

If you will be attending Dell World, let me know. I hope we can #selfie. Or just have a real conversation. Or we can get breakfast tacos.

Microsoft Canada Excellence Centre (MCEC)–Great Stuff

Jun 28, 2016   //   by Karen Lopez   //   Blog, Careers, Data, DLBlog, Fun  //  No Comments

I love getting to see new technologies changing the world.  The opening of the new Vancouver Microsoft Canada Excellence Centre included prominent Microsoft and Canadian leaders, including our Geek Prime Minister.  Take a few minutes to see how all my favourite buzzwords come together:

Microsoft + my Canadian BF + Jobs + Deep Learning + AI + Machine Learning + Investing + Accessibility + YVR + SEA + Innovation + Prime Minister "knows how to code already" + Geek + Big news for Canada

This sort of “making a difference” is why I keep getting out of bed in the morning.
