Data modeling is the process by which we represent information system objects or entities and the connections between them. Such entities may be people, products, or anything else related to your business; regardless of the entity type, correctly modeling it results in a powerful database set up for fast information retrieval, efficient storage, and more.
SEE: Job description: Big data modeler (TechRepublic Premium)
Given the benefits that data modeling provides for database insights, it is important to learn how to effectively apply data modeling in your organization. In this guide, I’ll point out some important mistakes to avoid when modeling your data.
Jump to:
- Not viewing quality data models as an advantage
- Not taking into account the use of the data by the application
- Schemaless does not mean data modelless
- Failing to tame semi-structured data
- No plans for data model evolution
- Mapping the UI tightly to the fields and values of your data
- Incorrect or different granularity
- Inconsistent or non-existent naming patterns
- Not separating keys from indexes
- Starting too late with data modeling
Not viewing quality data models as an advantage
As Microsoft Power BI consultant Melissa Coates has noted, we sometimes optimize our data models for a particular use case, such as analyzing sales data, and the model quickly becomes harder to use when analysts need to analyze more than one thing.
For example, it can be difficult for analysts to analyze the intersection of sales and support conversations if models are optimized for sales data only. Not to mention the extra time, resources, and any costs involved in making additional models if a single model would have been enough.
To avoid this kind of model inefficiency, take the time to make sure your data model has wider applicability and makes long-term financial sense.
Not taking into account the use of the data by the application
One of the hardest things about data modeling is finding the right balance between competing interests, such as:
- The data needs of application(s)
- Performance goals
- How data is retrieved
It’s easy to get so caught up in the structure of the data that you don’t spend enough time analyzing how an application will use the data and finding the right balance between querying, updating, and processing data.
SEE: Recruitment package: Data scientist (TechRepublic Premium)
Another sign of this error is a lack of empathy for the other people who will use the data model. A good data model takes into account all users and use cases of an application and builds accordingly.
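As an illustration, here is a minimal Python sketch of that balance: the same customer-and-orders data shaped two different ways depending on how the application queries and updates it. All collection and field names are invented for this example.

```python
# Two ways to model the same customer/order data, depending on access patterns.
# All field and identifier names here are hypothetical.

# Read-optimized: orders embedded in the customer document.
# One lookup serves a "show a customer with their orders" page.
customer_embedded = {
    "_id": "cust-1001",
    "name": "Ada Lovelace",
    "orders": [
        {"order_id": "ord-1", "total": 42.50},
        {"order_id": "ord-2", "total": 17.00},
    ],
}

# Write-optimized: orders referenced by customer id.
# High-volume order inserts never rewrite the customer record.
customer = {"_id": "cust-1001", "name": "Ada Lovelace"}
orders = [
    {"_id": "ord-1", "customer_id": "cust-1001", "total": 42.50},
    {"_id": "ord-2", "customer_id": "cust-1001", "total": 17.00},
]
```

Neither shape is wrong in itself; the mistake is picking one without first asking how the application reads, updates, and processes the data.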
Schemaless does not mean data modelless
NoSQL databases (document, key-value, wide-column, etc.) have become an essential part of enterprise data architecture, given the flexibility they provide for unstructured data. While they are sometimes mistakenly described as databases with no schema, it is more accurate to think of NoSQL databases as having flexible schemas. And while data schemas are often conflated with data models, the two fulfill different functions.
A data schema instructs a database engine on how data is organized in the database, while a data model is more conceptual and describes the data and the relationships between the data. Whatever the confusion about how a flexible schema affects data modeling, developers need to model data in NoSQL databases just as they do in relational databases. Depending on the type of NoSQL database, that data model will be either simple (key-value) or more advanced (document).
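As a hypothetical sketch of "flexible schema, explicit model": even when the database enforces nothing, the application itself can carry the data model. The Product class and its fields below are invented for illustration.

```python
from dataclasses import dataclass, asdict

# Even with a "schemaless" document store, the application can carry an
# explicit data model. This dataclass is that model: the database never
# enforces it, the code does. All names are illustrative only.

@dataclass
class Product:
    sku: str
    name: str
    price_cents: int
    tags: list[str]

doc = asdict(Product(sku="SKU-1", name="Widget", price_cents=1299, tags=["new"]))
# `doc` is now a plain dict, ready to hand to a document database driver,
# e.g. collection.insert_one(doc) with a driver such as PyMongo.
print(doc)
```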
Failing to tame semi-structured data
Most data today is unstructured or semi-structured but, as with mistake number three, this doesn’t mean your data model should follow suit. While it can be tempting not to impose structure when ingesting data, that hands-off approach will almost inevitably hurt you later. You can’t avoid semi-structured data, but the way to deal with it is to apply rigor in the data model rather than hoping to fix things when retrieving the data.
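A minimal sketch of what that rigor can look like in Python, validating semi-structured records on ingest rather than patching them at read time. The required fields and the event shape are invented for illustration.

```python
# Semi-structured events must at least carry these fields and types
# before they are accepted. Extra fields remain allowed: the model is
# flexible, not absent. Field names are hypothetical.

REQUIRED = {"event_id": str, "timestamp": str, "payload": dict}

def validate_event(event: dict) -> dict:
    """Reject records that violate the model instead of fixing them at read time."""
    for field, expected_type in REQUIRED.items():
        if field not in event:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(event[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return event  # unmodeled extras like "source" pass through untouched

validate_event({"event_id": "e1", "timestamp": "2023-01-01T00:00:00Z",
                "payload": {"clicks": 3}, "source": "web"})
```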
No plans for data model evolution
Given how much work can go into mapping out your data model, it can be tempting to assume that once you’ve built the data model, your work is done. Not so, noted Prefect’s Anna Geller: “Building data assets is an ongoing process,” she said, because “as your analytic needs change over time, so does the schema.”
One way to make data model evolution easier, she continued, is to “split and decouple data transformations [to] make the whole process easier to build, debug and maintain in the long run.”
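A hypothetical Python sketch of that advice: each transformation is a small, decoupled step that can be tested, debugged, and replaced independently as the schema evolves. The function and field names are invented.

```python
# Three decoupled transformation steps instead of one monolith.
# When the schema changes, only the affected step is rewritten.

def parse(raw: dict) -> dict:
    """Map raw input fields onto the current model."""
    return {"user_id": raw["uid"], "amount_cents": int(raw["amt"] * 100)}

def enrich(record: dict) -> dict:
    """Derive fields; independent of how parsing works."""
    return {**record, "is_large": record["amount_cents"] > 100_000}

def load(record: dict) -> None:
    print("loading:", record)  # stand-in for a real write

# Composing the steps; swapping one out doesn't touch the others.
for raw in [{"uid": "u1", "amt": 1250.0}]:
    load(enrich(parse(raw)))
```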
Mapping the UI tightly to the fields and values of your data
As Tailwind Labs partner Steve Schoger has remarked, “Don’t be afraid to ‘think outside the database.’” He goes on to explain that you don’t necessarily have to map your user interface directly to every data field and value. This error usually stems from a fixation on the data model rather than on the underlying information architecture. Avoiding it means presenting data in ways that are more intuitive to the application’s audience than a one-to-one mapping of the underlying data model would be.
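As an illustration, a thin view-model layer can translate stored fields into what the audience actually reads, rather than echoing the data model one-to-one. The ticket fields and status labels below are invented for this sketch.

```python
# The stored record keeps terse model fields; the view model renders
# what users should see. All names here are hypothetical.

STATUS_LABELS = {"ack": "Acknowledged", "wip": "In progress", "done": "Resolved"}

def to_view_model(ticket: dict) -> dict:
    return {
        "title": ticket["subject"].strip().capitalize(),
        "status": STATUS_LABELS.get(ticket["status"], "Unknown"),
        # Two stored fields collapse into one displayed field.
        "assignee": f'{ticket["assignee_first"]} {ticket["assignee_last"]}',
    }

print(to_view_model({"subject": "login fails", "status": "wip",
                     "assignee_first": "Grace", "assignee_last": "Hopper"}))
```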
Incorrect or different granularity
In analytics, granularity refers to the level of detail we can see. At a SaaS company, for example, we might want to see consumption of our service per day, per hour, or per minute. It’s important to get the level of granularity in a data model right, because if it’s too granular you can end up with all kinds of unnecessary data, making everything complicated to decipher and sort.
But with too little granularity, you may lack the detail needed to uncover important findings or trends. Now add the possibility that your granularity is focused on daily numbers, but the company wants you to determine the difference between peak and off-peak consumption. At that point you would be dealing with mixed granularity and ultimately confusing users. Determining the exact data usage scenarios for internal and external users is an important first step in deciding how much granularity your model needs.
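A small Python sketch of why the finer grain is the safer starting point: minute-level events can always be rolled up into hourly or daily views, but daily totals can never be split back into peak and off-peak. The event data and field names are invented.

```python
from collections import defaultdict
from datetime import datetime

# Minute-level usage events (invented data) rolled up to hourly totals.
# The reverse derivation, daily totals back down to peak vs. off-peak,
# is impossible: the detail is gone once it is aggregated away.

events = [
    {"ts": "2023-06-01T09:15:00", "requests": 120},
    {"ts": "2023-06-01T09:45:00", "requests": 80},
    {"ts": "2023-06-01T22:10:00", "requests": 5},
]

hourly = defaultdict(int)
for e in events:
    hour = datetime.fromisoformat(e["ts"]).strftime("%Y-%m-%d %H:00")
    hourly[hour] += e["requests"]

print(dict(hourly))  # {'2023-06-01 09:00': 200, '2023-06-01 22:00': 5}
```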
Inconsistent or non-existent naming patterns
Rather than inventing a unique naming convention, take a standard approach to naming in your data models. If tables don’t follow consistent logic in how they are named, for example, the data model becomes very difficult to follow. It may seem clever to come up with obscure naming conventions that relatively few people will immediately understand, but this inevitably leads to confusion later on, especially when new people come on board to work with these models.
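Naming conventions can also be enforced mechanically instead of by memory. This is a minimal sketch assuming a snake_case convention; the rule and the table names are only examples, and whichever pattern you pick matters less than applying it everywhere.

```python
import re

# One explicit naming rule (snake_case here, as an example) that tooling
# or code review can check automatically.

NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_names(table_names: list[str]) -> list[str]:
    """Return the names that break the convention so reviews can catch them."""
    return [n for n in table_names if not NAME_PATTERN.match(n)]

print(check_names(["customer_order", "CustomerInvoice", "tblUsers"]))
# ['CustomerInvoice', 'tblUsers']
```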
Not separating keys from indexes
In a database, keys and indexes have different functions. As Bert Scalzo has explained, “Keys enforce business rules – that’s a logical concept. Indexes speed up database access – it’s a purely physical concept.”
Because many conflate the two, they fail to implement their candidate keys with a minimal set of indexes, and they slow down performance in the process. Scalzo continued with this advice: “Implement the fewest number of indexes [that] can support all keys effectively.”
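Scalzo’s distinction can be sketched with SQLite (any SQL engine would do): the key constraints below are the logical rules, each physically supported by a single index, so piling on further indexes would only slow writes. The table and column names are invented.

```python
import sqlite3

# Keys vs. indexes: PRIMARY KEY and UNIQUE are logical rules; SQLite
# backs each with exactly one index. No additional indexes are needed
# to enforce them. Names are illustrative only.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- logical key
        email   TEXT NOT NULL UNIQUE   -- candidate key, backed by one index
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError as e:
    print("key enforced:", e)  # the rule holds without any extra indexes
```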
Starting too late with data modeling
If the data model is the blueprint describing an application’s data and how that data interacts, it makes little sense to start building the application before a data modeler has fully mapped out the data model. Yet this is exactly what many developers do.
Understanding the shape and structure of data is critical to application performance and ultimately to the user experience. This should be the first consideration, and it brings us back to mistake number one: not viewing quality data models as an advantage. Failing to plan the data model is essentially planning to fail (and planning to do a lot of refactoring later to fix the errors).
Disclosure: I work for MongoDB, but the views expressed herein are mine.
SEE: Top data modeling tools (TechRepublic)