This time I want to talk about data in an informal way, so that we get a high-level understanding of what data is. That way we get a feeling for data as a concept, and for how we can shape and design data so that it becomes a valuable asset.
Let's meet data. Data lives in our computers and drives modern industry. Data is everywhere and plays a big part in our daily lives. We interact with data every day, when we search for something or when we watch a video from, for example, Netflix. But what is data, actually?
Data is physical
In a lot of ways, data has the same properties as things in our environment. Things must first be created, and before they are, they do not exist. The same is true for data. Things take up space in our environment, and the same is true for data. Data takes up space in our computer in the form of ones and zeroes.
To be clear, taking up space in our computer is a physical property, which means there is real physical space involved. Although the space that a single piece of data occupies is very small, it adds up when we store a lot of it. For example, when I store data in Google Drive, the data is stored in data centers. Because a lot of people store data in Google Drive, there is a lot of space involved: large data centers fill entire buildings, and some sites consist of several such buildings.
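To make the idea of space a bit more tangible, here is a minimal sketch in Python that counts how many bytes and bits a short sentence occupies once it is encoded. The sample sentence and the choice of UTF-8 encoding are assumptions made for illustration.

```python
# A short piece of text (sample value chosen for illustration).
text = "Data takes up space."

# Encoding turns the characters into bytes, the unit a computer actually stores.
encoded = text.encode("utf-8")

size_in_bytes = len(encoded)
size_in_bits = size_in_bytes * 8  # each byte is eight ones and zeroes

print(f"{size_in_bytes} bytes, {size_in_bits} bits")  # 20 bytes, 160 bits
```

Twenty bytes is next to nothing, but the same arithmetic applied to billions of files is what fills a data center.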
Data does not wear out
Data is a special kind of resource. For example, data does not get old and it does not wear out. You can also make an exact copy of data without losing any of the original properties. These properties are of great interest to modern industry. Video streaming services like Netflix, Amazon Video, HBO Now and YouTube make heavy use of the fact that data can be copied anywhere and does not wear out. By leveraging these attributes, a service can offer every viewer the same source quality, no matter how often the content has been copied or watched.
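The claim that a copy is exact can be checked rather than taken on faith. The sketch below, using only the Python standard library, copies a block of bytes a hundred times and compares a SHA-256 fingerprint of the last copy with that of the original. The sample content and the number of copies are arbitrary assumptions.

```python
import hashlib

original = b"one frame of a movie" * 1000  # stand-in for real content

# Copy the data over and over; bytes(bytearray(...)) builds a new object each time.
copy = original
for _ in range(100):
    copy = bytes(bytearray(copy))

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

copies_identical = fingerprint(copy) == fingerprint(original)
print(copies_identical)  # True
```

After a hundred generations of copying, the fingerprint is unchanged, which is exactly what a photocopy of a photocopy cannot offer.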
Data can be transformed
Another interesting property of data is that it can be transformed into any form that fits our needs. When we watch a video from Netflix, the image we see on screen is a representation of the data stream that is being downloaded. Between Netflix and the television there is a transformation step that converts the secured and compressed data stream into a high-resolution video stream, so that we can watch the movie.
Data is both a resource and a technology, one that has been specifically designed so that it can be transformed from one representation into another.
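A transformation between representations can be sketched in a few lines of Python. The example uses the standard `zlib` module to compress some text for transport and to restore it afterwards. Note the assumption: this is lossless, general-purpose compression used as a stand-in, not the video codecs or security measures a streaming service actually uses.

```python
import zlib

# Highly repetitive sample text, so the effect of compression is easy to see.
original = ("The same scene, frame after frame. " * 200).encode("utf-8")

compressed = zlib.compress(original)    # compact representation for transport
restored = zlib.decompress(compressed)  # back to the usable representation

print(len(original), "bytes before,", len(compressed), "bytes after compression")
print(restored == original)  # True
```

The compressed form is useless to a viewer and ideal for a network, while the restored form is the other way around. It is the same data in two representations.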
Data is anything we write down
Data is anything that we write down. The technology we choose to write down that ‘something’, can be pen and paper, zeroes and ones in a computer or a stick and sand at a beach. The fact that we’ve registered something makes it data.
Data is a technology
Although, strictly speaking, data does not have to be digital, nowadays it most often is. For this discussion we will assume that data is digital.
Digital data is a technology specifically designed for transformation. It makes use of the binary number system, which Gottfried Wilhelm Leibniz described in the early 1700s and which later found its use in electronics and computing. Because technology is all about design and use, data must be designed so that it can be used and applied effectively.
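To see the binary system at work, the following sketch writes a two-letter text as ones and zeroes and then reads it back. The sample text and the UTF-8 encoding are assumptions for illustration.

```python
text = "Hi"

# Render each byte of the UTF-8 encoding as eight binary digits.
bits = " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))
print(bits)  # 01001000 01101001

# Reverse the transformation: bits back to bytes, bytes back to text.
decoded = bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")
print(decoded)  # Hi
```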
Data must be designed
So, data is anything we write down in ones and zeroes, and it must be designed in order to be used effectively in our use cases. One aspect of designing data is understanding our domains and use cases. When we understand the vocabulary of our domains and their use cases, we know what must be written down so that the use cases operate properly. However, there is more to designing data than data modeling.
Data is fluid
My initial thought was that designing data is just data modeling. And fair enough, structuring data is part of designing data. But as we have seen, data has several properties that can be leveraged in order to create an effective data design. Because data can be created, copied and transformed without wearing out, data is inherently flexible. You could even argue that data is fluid. By making use of the fluid properties of data, we can create an effective data design that becomes a valuable asset in our domains.
Data can be costly
Data can be transformed, copied and stored. We can do that over and over again without wearing out the data. But when we keep on copying and storing data, it becomes a costly operation. Transforming data uses compute resources and storing data uses storage resources. The cost of these resources adds up over time, and it can become quite expensive to maintain a large set of data.
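A back-of-the-envelope calculation shows how these costs accumulate. Every number in the sketch below is hypothetical: the data set size, the number of copies, the storage price and the compute price are made up for illustration and do not reflect any real provider's pricing.

```python
# All figures below are hypothetical, chosen only to show how the costs add up.
DATASET_GB = 500               # size of one copy of the data set
COPIES = 3                     # e.g. original, backup, analytics copy
PRICE_PER_GB_MONTH = 0.02      # assumed storage price per GB per month
TRANSFORM_COST_PER_RUN = 1.50  # assumed compute cost of one transformation run
RUNS_PER_MONTH = 30

def monthly_cost() -> float:
    """Storage cost for all copies plus compute cost for all runs."""
    storage = DATASET_GB * COPIES * PRICE_PER_GB_MONTH
    compute = TRANSFORM_COST_PER_RUN * RUNS_PER_MONTH
    return storage + compute

yearly = 12 * monthly_cost()
print(f"per month: {monthly_cost():.2f}, per year: {yearly:.2f}")
```

Doubling the number of copies or the number of runs changes the result immediately, which makes a sketch like this a cheap way to reason about a design before building it.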
Data must be secured
There can be many reasons to secure data. The primary reason is that data is the property of whoever created it. It is very easy to copy or transform data. When an unauthorized entity, like a person or a service, has access to data, the data can be stolen or used in a way that was not intended. Security and control measures must be put in place in order to control the distribution, use and quality of the data.
Data must be managed
Handling data is quite complex. There are many more aspects to managing data, like data architecture, data development, data quality and master data, to name a few. These aspects of managing data are described in the DAMA Data Management Body of Knowledge (DAMA-DMBOK).
Data as a product
When data is seen as a product, there is automatically a shift in mindset. Instead of thinking of data in terms of schemas and data types, there is a new focus. When thinking about data as a product, you try to understand the data along with the ecosystem it exists in and the stakeholders that will use the data and are directly affected by the data product.
Because stakeholders are directly affected by the data, the product needs to be of the highest quality at the time of consumption. To guarantee that quality, an ecosystem must be put in place that provides safeguards, checks and validations.
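What such checks and validations could look like is sketched below. The record layout (`title`, `duration_minutes`, `rating`) and the three rules are hypothetical and only serve to illustrate the idea of testing data before a consumer sees it.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Check:
    name: str
    passed: bool

def validate(record: dict) -> List[Check]:
    """Run a few illustrative quality checks on one record."""
    return [
        Check("has title", bool(record.get("title"))),
        Check("duration is positive", record.get("duration_minutes", 0) > 0),
        Check("rating in range", 0 <= record.get("rating", -1) <= 10),
    ]

def is_fit_for_consumption(record: dict) -> bool:
    """A record is served only when every check passes."""
    return all(check.passed for check in validate(record))

good = {"title": "Some Movie", "duration_minutes": 95, "rating": 7.4}
bad = {"title": "", "duration_minutes": 95, "rating": 12}

print(is_fit_for_consumption(good))  # True
print(is_fit_for_consumption(bad))   # False
```

Keeping each check named makes it possible to report which safeguard failed, not just that something did.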
Data exists to be consumed. When I watch a Netflix movie, I consume the data stream. At the same time, I have expectations of that data stream. From my perspective, I want to consume a product of the highest quality, without interruptions to my experience. The same is true for other data products. There is always a stakeholder that consumes the data and has expectations. Data has some great properties that we can use in order to provide a great service to the stakeholder.
Products come in all shapes and sizes, and the same is true for data products. Depending on the requirements of the data consumer, we create a suitable container that provides a suitable context for the data to operate in. The consumer interacts with the container and, from their point of view, the container is the data product.
We see the same thing in real life. When I buy a bottle of water, the container is the bottle and the data is the water inside the bottle. I interact with the bottle to get to the water.
Examples of data products are:
| data product | content |
| --- | --- |
| Lookup data products | for getting data – any service that provides data by lookup |
| Notification data products | for pushing data – any publisher that provides data as a trigger |
| Smart data products | data combined with business logic – any service that is a stateful workflow engine |
| Feedback driven data products | parameterized data products – any service that can be tuned by means of parameters |
Data platforms provide data products to consumers. A data platform is a data factory that creates a data product on demand for a specific consumer. The container that provides the context for the data is arbitrary and can be anything from a data protocol to a service that provides access to the data by means of an API. Data consumers can be either internal or external to the data platform.
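As a rough illustration of the container idea, the sketch below models two of the data products from the table: a lookup data product that is pulled from, and a notification data product that pushes. The class names, method names and sample data are all hypothetical. A real platform would put these behind an API or a messaging system.

```python
from typing import Callable, Dict, List, Optional

class LookupDataProduct:
    """Container that serves data on request (pull)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)  # the consumer never touches this directly

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

class NotificationDataProduct:
    """Container that pushes data to subscribers when it is published (push)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, key: str, value: str) -> None:
        for callback in self._subscribers:
            callback(key, value)

lookup = LookupDataProduct({"NL": "Netherlands", "DE": "Germany"})
print(lookup.get("NL"))  # Netherlands

received = []
notifier = NotificationDataProduct()
notifier.subscribe(lambda key, value: received.append((key, value)))
notifier.publish("FR", "France")
print(received)  # [('FR', 'France')]
```

In both cases the consumer only ever deals with the container (`get`, `subscribe`), just as the buyer only ever handles the bottle.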
In this blog post we looked at data and saw that data is a special kind of resource that can be created, copied and transformed, and that does not wear out over time. We have also seen that data can be designed to be a valuable asset.
When we recognize data as a product, we design data from the consumer's perspective. That way we create an ecosystem that is based on quality of service. Data platforms create data products that operate within such an ecosystem, so that a data consumer can consume data of the highest quality.
This blog is deliberately abstract so that you can make your own translation. That way you can create your own designs based on the traits of data, and with the consumer in mind.