Navigating Protection and Presence: Trade-offs around data suppression for small Pacific populations


Helen Turner https://orcid.org/0000-0002-9629-7691
Connor Flynn
Liliiana Flynn
Catherine Brockway
Rylan Chong
Apo Aporosa
Mata`uitafa Faiai
Chad Jansen
Alexander Stokes

Keywords

Data suppression, population statistics, data privacy, data sovereignty

Abstract

Introduction: Datasets, their analytics, and their interpretation are key decision support tools for Pacific Island communities, with the potential to shape public policy, healthcare, and social interventions in the Pacific ‘Blue Continent’. However, in the case of numerically small island populations, privacy concerns have motivated widespread use of data suppression. While suppression safeguards privacy, it also risks erasing the visibility of these populations, leading to a ‘statistical invisibility’ that obscures the social, health, and economic challenges they face. This study critically reviews the practice of data suppression, emphasizing its rationale in privacy protection, but also highlighting its impacts on resource allocation, advocacy, and equitable policy-making for Pacific populations.


Methods: We explored the rationale behind data suppression and its legal and regulatory context. Using case studies including the U.S. Census Bureau, the Centers for Disease Control and Prevention, and the Behavioral Risk Factor Surveillance System, we assess the impact of suppression thresholds and privacy-preserving methods on Pacific Island communities. We present a novel analysis of ICD code suppression across different levels of geographical units in the Pacific to illustrate its disproportionate impacts. We review alternative privacy-preserving methods, including data smoothing, statistical masking, and synthetic data generation, that could mitigate the effects of suppression without compromising individual privacy.


Findings and Conclusions: We recommend inclusive and transparent data practices to prevent data suppression from compounding the systemic marginalization of small Pacific populations. By critically evaluating current practices and proposing alternative strategies grounded in ‘Critical Data Theory’ and Pacific knowledge epistemology, this paper aims to inform policies that balance the protection of individual privacy with the accurate representation of small, geographically dispersed populations.




