
Data Principles

Learning Goals

By the end of this section you will:

  • understand the issues involved in managing data, including acquisition, integrity, anomalies and security.

  • understand data representation and how the choices made about it influence the way data is stored and used.

Data principles are fundamental guidelines that ensure data is collected, stored, processed, and used in ways that maintain its quality, reliability, and security.

Data Management πŸ“ΒΆ

Delivering a data-driven digital solution requires a detailed understanding of the data involved. It is therefore vital to consider the data that underpins the application and the issues involved in managing that data.

These issues fall under four main categories: acquisition, integrity, anomalies and redundancy, and security and protection.

Acquisition πŸ“ΒΆ

Data acquisition is the process of collecting and capturing data from various sources and converting it into a usable digital format for further analysis and processing. When acquiring your data, you need to consider both the timeliness of the data acquisition and the ownership of the data.

Timeliness

Data captures a measurement at a single point in time, which means there is always a delay between when data is created and when it is entered. This delay is referred to as data timeliness.

The impact of data timeliness depends on the purpose of the data. A database that records daily temperature has a different timeliness demand from one that records the core temperature of a nuclear power plant. Although both databases record temperature, a delay of a minute would be of no concern for the daily temperature record but could be catastrophic for the nuclear power plant.

When exploring the data needs of your solution, consider what delay there may be in your data acquisition, as well as how much delay can be tolerated.
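Timeliness can be made concrete by measuring the delay as entry time minus creation time and comparing it with the tolerance your solution can accept. This is a minimal sketch that assumes each record carries both timestamps; the function names and the tolerances (one hour for a daily weather log, five seconds for a reactor sensor) are hypothetical values chosen for illustration, not real engineering limits.

```python
from datetime import datetime, timedelta


def acquisition_delay(created: datetime, entered: datetime) -> timedelta:
    """Return the delay between when a reading was taken and when it was entered."""
    return entered - created


def is_timely(created: datetime, entered: datetime, tolerance: timedelta) -> bool:
    """True if the acquisition delay is within the tolerance the solution accepts."""
    return acquisition_delay(created, entered) <= tolerance


# Hypothetical reading that was entered 90 seconds after it was taken.
created = datetime(2024, 1, 15, 9, 0, 0)
entered = datetime(2024, 1, 15, 9, 1, 30)

# A daily weather log might tolerate an hour of delay (assumed value)...
print(is_timely(created, entered, timedelta(hours=1)))    # True
# ...while a reactor core sensor might tolerate only seconds (assumed value).
print(is_timely(created, entered, timedelta(seconds=5)))  # False
```

The same 90-second delay is acceptable in one context and unacceptable in the other, which is why the tolerance has to be decided per solution.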

Ownership

There are three types of data ownership:

  1. Personal data ownership: refers to an individual’s ownership of their personal data, such as their name, address, and contact information.

  2. Corporate data ownership: refers to the ownership of data generated and collected by an organization or company, such as customer data, sales data, and financial data.

  3. Public data ownership: refers to data that is owned by the public or government, such as census data, public records, and government statistics.

It is possible for the data in a database to contain a mix of ownerships. Each type of data ownership has different handling requirements. It is important to identify the ownership of the data we are working with.
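A single database can hold data of several ownership types, so one way to keep track of them is to record which type applies to each field and then apply the matching handling rules. This is a minimal sketch; the table, its field names and the `fields_owned_by` helper are hypothetical.

```python
from enum import Enum


class Ownership(Enum):
    """The three types of data ownership described above."""
    PERSONAL = "personal"
    CORPORATE = "corporate"
    PUBLIC = "public"


# Hypothetical fields of a customer-orders table, tagged with their ownership.
FIELD_OWNERSHIP = {
    "customer_name": Ownership.PERSONAL,
    "customer_address": Ownership.PERSONAL,
    "order_total": Ownership.CORPORATE,
    "postcode_population": Ownership.PUBLIC,  # e.g. taken from census statistics
}


def fields_owned_by(kind: Ownership) -> list[str]:
    """List the fields with the given ownership, e.g. to apply privacy rules to them."""
    return [field for field, owner in FIELD_OWNERSHIP.items() if owner is kind]


print(fields_owned_by(Ownership.PERSONAL))  # ['customer_name', 'customer_address']
```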

Integrity πŸ“ΒΆ

Data integrity means keeping data accurate, complete, and consistent at every stage, from creation and storage to processing and sharing. It ensures that the data remains trustworthy and protected from unauthorized changes, loss, or damage. This is especially important in fields like healthcare, finance, and research, where reliable data is essential.

When analysing the data for your solution, you should consider both the integrity of the data as received and the steps you need to take to ensure its ongoing integrity.
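One widely used technique for checking ongoing integrity (a technique chosen here for illustration, not one this section prescribes) is to store a cryptographic checksum when the data is received and recompute it later: any change to the data, accidental or deliberate, produces a different checksum. The sketch uses Python's standard `hashlib` module; the CSV content and the `checksum` helper are made up.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# When the data is received, record its checksum alongside it.
received = b"id,temperature\n1,23.5\n2,24.1\n"
stored_checksum = checksum(received)

# Later, recompute the checksum to check the data has not changed.
unchanged = b"id,temperature\n1,23.5\n2,24.1\n"
tampered = b"id,temperature\n1,23.5\n2,29.1\n"  # one digit altered

print(checksum(unchanged) == stored_checksum)  # True
print(checksum(tampered) == stored_checksum)   # False
```

A checksum detects changes but does not prevent them, and it only helps if the stored checksum itself cannot be altered along with the data.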

Data integrity depends on the data being:

Anomalies & Redundancy

Data anomalies refer to inconsistencies or errors that arise when storing or manipulating data in a database. These anomalies can occur in various forms, such as:

Identifying and resolving these data anomalies is critical for maintaining data integrity in databases and ensuring that the data is accurate, complete, and consistent.
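The link between redundancy and anomalies can be seen in a small example. In this sketch (hypothetical data held in plain Python lists rather than a real database), a customer's address is repeated on every order. When the address is changed in only one row, the copies disagree; this is commonly called an update anomaly. The hypothetical `inconsistent_customers` function identifies the customers affected.

```python
from collections import defaultdict

# Hypothetical flat table: the address is repeated on every order (redundancy).
orders = [
    {"order_id": 1, "customer": "Ana", "address": "12 High St"},
    {"order_id": 2, "customer": "Ana", "address": "12 High St"},
    {"order_id": 3, "customer": "Ben", "address": "4 Low Rd"},
]

# Ana moves, but only one of her rows is updated: an update anomaly.
orders[0]["address"] = "7 New Ave"


def inconsistent_customers(rows):
    """Return customers whose repeated address values no longer agree."""
    addresses = defaultdict(set)
    for row in rows:
        addresses[row["customer"]].add(row["address"])
    return sorted(name for name, values in addresses.items() if len(values) > 1)


print(inconsistent_customers(orders))  # ['Ana']
```

Storing each address only once, in a separate customers table, would remove the redundancy that made this anomaly possible.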

Proper database design, normalization techniques, and data validation processes can help minimize the occurrence of data anomalies. We will learn about these techniques later in this Unit. For the purposes of the Explore phase, preventing such anomalies should form one of the requirements of your digital solution.
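Data validation can be as simple as checking each record before it is stored. This sketch applies presence, type, and range checks to a hypothetical temperature reading; the `validate_reading` function, its field names and the -50 to 60 °C range are assumptions chosen for illustration.

```python
def validate_reading(reading: dict) -> list[str]:
    """Return a list of validation errors for one hypothetical temperature record."""
    errors = []
    # Presence check: required fields must be supplied.
    for field in ("sensor_id", "temperature"):
        if field not in reading or reading[field] is None:
            errors.append(f"missing {field}")
    temp = reading.get("temperature")
    # Type check: the temperature must be numeric.
    if temp is not None and not isinstance(temp, (int, float)):
        errors.append("temperature is not a number")
    # Range check: the temperature must fall within a plausible range (assumed).
    elif temp is not None and not (-50 <= temp <= 60):
        errors.append("temperature out of range")
    return errors


print(validate_reading({"sensor_id": "S1", "temperature": 23.5}))  # []
print(validate_reading({"sensor_id": "S1", "temperature": 230}))   # ['temperature out of range']
print(validate_reading({"temperature": "hot"}))  # ['missing sensor_id', 'temperature is not a number']
```

Rejecting a record at this point is much cheaper than finding and repairing the inconsistency after it has been stored.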

Security and Protection

Security is essential to the success of any information system and the valuable data stored in it.

Threats

The potential threats to your digital solution include:

Solutions

Steps can be taken to minimise the risks presented by the potential threats. These include:

Data Representation

When we talk about representation, we mean how different kinds of information, like numbers or text, are turned into a form that computers can work with. Ultimately everything is stored as combinations of 0s and 1s: numbers are written in binary, while letters and symbols are first given numeric codes, which are then stored in binary. The way we do this is crucial because it affects how quickly and accurately computers can work with data.
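Python's built-in `format` and `ord` functions can show the bits behind a number and a letter. This short sketch pads to 8 bits purely for display.

```python
# A number is stored as a pattern of bits (0s and 1s).
number = 65
print(format(number, "08b"))       # 01000001

# A character is first mapped to a numeric code, and that code is stored in binary.
letter = "A"
code = ord(letter)                 # the character's numeric code
print(code, format(code, "08b"))   # 65 01000001
```

The same bit pattern can mean the number 65 or the letter A; which one depends entirely on how the data is interpreted, which is why representation choices matter.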

Text representation

Representing text is a significant issue when considering data. There are two main ways of representing text: ASCII and Unicode.

ASCII is like a simple language that early computers used, in which each letter, number, or symbol is represented by a specific number. For example, the letter ‘A’ is represented by the number 65. This made it easy for early computers to work with text, but it had limitations: standard ASCII can represent only 128 characters, mainly English letters, digits, and symbols.
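Python's built-in `ord` and `chr` functions convert between a character and its number, and `str.encode("ascii")` shows the limitation in practice: it fails as soon as a character falls outside ASCII's 128-character set.

```python
# ASCII assigns each character a number from 0 to 127.
print(ord("A"), ord("a"), ord("0"))  # 65 97 48
print(chr(65))                       # A

# Encoding text as ASCII works for plain English characters...
print("Hello".encode("ascii"))       # b'Hello'

# ...but fails for characters outside ASCII's set, such as an accented letter.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("Cannot encode:", err.object[err.start:err.end])  # Cannot encode: é
```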

Unicode, on the other hand, is like a more advanced and universal language for computers. It can represent characters from almost all the world’s languages, as well as emojis and special symbols. Like ASCII, it represents each character with a number, but it has room for vastly more of them, so every character in all these languages gets its own unique number. So whether you’re typing in English, Chinese, Arabic, or even using emojis, Unicode makes it possible for computers to understand and display the text correctly.
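The sketch below prints the Unicode number (code point, written U+ in hexadecimal) for characters from several scripts. It also shows how many bytes each takes in UTF-8, the most common way of storing Unicode text: Unicode defines the numbers, while an encoding such as UTF-8 defines how those numbers are stored as bytes (between one and four bytes per character).

```python
# Every character, in any script, has a unique Unicode code point.
for ch in ["A", "é", "中", "م", "😀"]:
    # Print the character, its code point, and its length in UTF-8 bytes.
    print(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8")))

# A U+0041 1
# é U+00E9 2
# 中 U+4E2D 3
# م U+0645 2
# 😀 U+1F600 4
```

The first 128 code points match ASCII, so plain English text takes one byte per character in UTF-8.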

Make sure you know which method your data uses.
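Knowing the encoding matters because reading text with the wrong one garbles it. This sketch checks whether a string is pure ASCII with `str.isascii()` (available from Python 3.7) and then shows what happens when UTF-8 bytes are decoded as Windows-1252 (`cp1252`), an encoding picked here only to illustrate the problem.

```python
text = "It’s 25°C"  # contains a curly apostrophe and a degree sign

# isascii() reports whether every character is in the ASCII range.
print("Plain text".isascii())  # True
print(text.isascii())          # False

# Reading UTF-8 bytes with the wrong encoding garbles the non-ASCII characters.
raw = text.encode("utf-8")
print(raw.decode("utf-8"))     # It’s 25°C
print(raw.decode("cp1252"))    # Itâ€™s 25Â°C
```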

Other Data representations

The storage of data needs to be consistent: each data point should be stored the same way from record to record. For example, 100,000 and 10⁵ are the same number, but they are represented differently. Before designing a database and storing data, you need to decide on the accepted format for this data.
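One way to enforce a single representation is to convert every incoming value to an agreed canonical form before it is stored. This sketch assumes the agreed format for the field is a whole number with no separators; the accepted input forms and the `to_canonical_number` helper are hypothetical. It also assumes the comma is a thousands separator, although in some regions the comma is a decimal separator, which is exactly why the format must be agreed in advance. `Decimal` is used instead of `float` so large values are not rounded.

```python
from decimal import Decimal


def to_canonical_number(value: str) -> int:
    """Convert inputs such as '100,000', '1e5' or '100000' to a single int form."""
    cleaned = value.replace(",", "").strip()  # drop thousands separators and spaces
    number = Decimal(cleaned)                 # accepts plain digits and forms like '1e5'
    if number != number.to_integral_value():
        raise ValueError(f"{value!r} is not a whole number")
    return int(number)


for raw in ["100,000", "1e5", "100000", " 100000 "]:
    print(raw, "->", to_canonical_number(raw))  # each converts to 100000
```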

[Figure: dates]

Some common data formats that need to be established:
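Dates are one format that typically needs to be agreed on, because the same string can be read differently: 05/03/2024 is 5 March in Australia but May 3 in the United States. This sketch assumes the solution accepts a few day-first input formats (the `ACCEPTED_FORMATS` list and `to_iso_date` helper are hypothetical) and stores every date in the ISO 8601 form YYYY-MM-DD.

```python
from datetime import datetime

# Hypothetical input formats the solution agrees to accept, tried in order.
ACCEPTED_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]


def to_iso_date(text: str) -> str:
    """Convert a date in one of the accepted formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognised date format: {text!r}")


for raw in ["05/03/2024", "2024-03-05", "5 March 2024"]:
    print(to_iso_date(raw))  # each prints 2024-03-05
```

Because `%d/%m/%Y` is tried first, a string like 05/03/2024 is always read day-first; choosing that order is itself part of agreeing on the format.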