Some variables are straightforward to measure without error – blood pressure, number of arrests, whether someone knew a word in a second language.
But many – perhaps most – are not. Whenever a measurement has a potential for error, a key criterion for the soundness of that measurement is reliability.
Think of reliability as consistency or repeatability in measurements.
Not only do you want your measurements to be accurate (i.e., valid), you want to get the same answer every time you use an instrument to measure a variable.
That instrument could be a scale, a test, or a diagnostic tool; reliability applies to a wide range of devices and situations.
So, why do we care? Why make such a big deal about reliability?
Well, researchers would have a very hard time testing hypotheses and comparing data across groups or studies if, each time they measured the same variable on the same individual, they got a different answer. This makes reliability very important in both the social and physical sciences.
Think about it: a basic tenet of science is replication, so without reliability, how can we be sure a failure to replicate isn’t due solely to measurement error?
For example, say we were testing a new antidepressant drug, with the outcome assessed via a scale made up of questions about depressive symptoms.
We would want the scale to be a reliable measure of depressive symptoms.
This reliability takes several forms. Here are a few examples.
Inter-rater reliability
We want to make sure that two different researchers who measure the same person for depression get the same depression score. If there is some judgment being made by the researchers, then we need to assess the reliability of scores across researchers.
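To make this concrete, here is a minimal sketch of one common way to quantify inter-rater agreement, Cohen's kappa (the scores below are made up for illustration, and kappa is just one of several statistics you could use here):

```python
from sklearn.metrics import cohen_kappa_score

# Made-up depression severity ratings assigned by two researchers to the same 8 participants
rater_a = ["mild", "moderate", "severe", "mild", "none", "moderate", "severe", "mild"]
rater_b = ["mild", "moderate", "moderate", "mild", "none", "moderate", "severe", "none"]

# Cohen's kappa corrects raw percent agreement for agreement expected by chance
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement
```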
Test-retest reliability
Likewise, even if no judgment is involved, we want to make sure the questions on the instrument are precise enough that they aren’t open to misinterpretation or swayed by the current mood of the participant. If a participant took the same depression test a week apart, would they get the same score?
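One simple way to check this, sketched below with made-up scores, is to correlate each participant's score at the first administration with their score a week later:

```python
from scipy.stats import pearsonr

# Made-up depression scores for the same 8 participants, one week apart
week_1 = [12, 18, 7, 22, 15, 9, 30, 11]
week_2 = [13, 17, 8, 21, 16, 10, 28, 12]

# A high correlation suggests the instrument gives stable scores over time
r, p_value = pearsonr(week_1, week_2)
print(f"Test-retest correlation: r = {r:.2f}")
```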
Internal consistency
Another form of reliability is consistency across the items on the scale. If every item on the scale really measures the same construct, then a participant’s responses should be similar across all items. If they’re not, then the items are not a reliable measure of the construct.
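A common statistic for internal consistency is Cronbach's alpha; the sketch below computes it by hand on made-up item responses (treat it as one illustrative option rather than the only choice):

```python
import numpy as np

# Made-up responses to 4 scale items from 5 participants (rows = participants, columns = items)
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

k = items.shape[1]                         # number of items
item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of participants' total scores

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")    # values around 0.7+ are often taken as acceptable
```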
Barr Moses says
Hi Audrey,
Thanks for sharing! This is very applicable to a pedagogical setting (and super spot on!), but in the enterprise, I’ve found that you need to apply a dataset-level approach to measuring data reliability. Here are some best practices I leveraged from our friends, the DevOps engineers, to track reliability:
Set SLOs and SLIs for data
Setting service level objectives (SLOs) and service level indicators (SLIs) for system reliability is an expected and necessary function of any SRE team, and in my opinion, it’s about time we applied them to data, too. Some companies are already doing this.
In the context of data, SLOs refer to the target range of values a data team hopes to achieve across a given set of SLIs. What your SLOs look like will vary depending on the demands of your organization and the needs of your customers. For instance, a B2B cloud storage company may have an SLO of 1 hour or less of downtime per 100 hours of uptime, while a ridesharing service will aim for as much uptime as humanly possible.
In short, your data SLIs are: freshness, distribution, volume, schema, and lineage.
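As a rough illustration of what tracking one of these SLIs might look like in practice, here is a minimal sketch that checks a freshness SLI against a hypothetical one-hour SLO (the table, the threshold, and the timestamps are all made up for this example):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLO: data in the table should never be more than 1 hour stale
FRESHNESS_SLO = timedelta(hours=1)

def freshness_slo_met(last_loaded_at: datetime) -> bool:
    """Compare the freshness SLI (time since the last load) against the SLO."""
    staleness = datetime.now(timezone.utc) - last_loaded_at
    return staleness <= FRESHNESS_SLO

# Example: the most recent load of a hypothetical `orders` table finished 45 minutes ago
last_load = datetime.now(timezone.utc) - timedelta(minutes=45)
print("Freshness SLO met:", freshness_slo_met(last_load))
```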
Full article here: https://towardsdatascience.com/what-is-data-reliability-66ec88578950?source=friends_link&sk=1e68914ac3f3e1bbb38f0f94c48074f0