Data integrity can be compromised in many ways, which makes data integrity practices an essential component of effective enterprise security protocols. Common causes include:
- Human action, whether malicious tampering or unintentional error
- Transfer errors, including unintended alterations or data compromise during transfer from one device to another
- Bugs, viruses/malware, hacking, and other cyber threats
- Compromised hardware, such as a device or disk crash
- Physical compromise to devices
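Transfer errors in particular can often be detected by comparing a checksum of a file before and after it is moved. The following is a minimal sketch in Python, assuming the original file and its copy are both accessible on disk. The function names `sha256_of_file` and `verify_copy` are hypothetical helpers chosen for illustration, not part of any course material.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_path, copy_path):
    """Return True when the two files have identical SHA-256 digests."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

For example, `verify_copy("survey_master.sav", "survey_backup.sav")` (both file names are placeholders) returns `False` if even a single byte changed during the copy. A matching digest shows only that the copy is identical to the source file, not that the source file itself is correct.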
As long as data are collected, stored, and shared, any metric that relies on them is vulnerable to internal and external quality issues. In social science statistical analysis, data integrity problems can arise at several stages: data downloading, data analysis, and data presentation. Based on the requirements of our degree program, we propose the following modules to infuse cybersecurity into our SOC 355 (Social Statistics) course.
Motivating Example:
Main empirical finding about changes in standard of living after divorce:
- For women, it declines by 73%
- For men, it increases by 42%
- American Sociological Association Book Award in 1986
- Between 1986 and 1993, cited in 348 social science articles and 250 law review articles
- Between 1986 and 1993, cited in 24 legal cases and by the Supreme Court
Weitzman (1996): “Unfortunately, the original cleaned master SPSS system no longer exists. I assumed it was being copied and reformatted as I moved for job changes and fellowships from the project’s original offices in Berkeley to Stanford (in 1979), then to Princeton (in 1983), back to Stanford (in 1984) and then to Harvard (in 1986). With each move, new programmers worked on the files to accommodate different computer systems.”
“When I could not replicate the analyses in my book with what I had mistakenly assumed was the archived master SPSS system file, I hired an independent consultant, Professor Angela Aidala from Columbia University, to help me untangle what had happened. She reviewed all of the project files, documentation, and codebooks, as well as the available data and programming files to determine a possible computational error in the standard of living statistic. But she could not do this without an accurate data file to work with. We then went back to the original questionnaires and recoded a random sample of about 25 percent of the cases. There were so many discrepancies between the questionnaires and the “dirty data” raw data file, and between the questionnaires and the mismatched SPSS system file, that we finally abandoned the effort and left a warning to all future researchers that both files at the Murray Center were so seriously flawed that they could not be used. It was a very sad, time consuming, and frustrating experience...”