Data mining is a powerful technology for recognizing useful patterns in complex data. It has repeatedly provided proven results in information systems. Its use has benefited many sectors, such as banking, retail, marketing, biology, medicine, telecommunication, and others, resulting in significant advancements for these industries. Lately, higher learning institutions have also been taking advantage of data mining techniques to further the field of education.
Generally, data mining (sometimes called data discovery or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. Operationally, automated processes come together to extract useful information (patterns, associations, changes, trends, anomalies, and significant structures) from large or complex data sets. Educational data mining (or learning analytics) is the process of collecting and analyzing a wide range of student data, often in great volume, to derive knowledge about learning habits and behaviors and to personalize educational interventions that maximize student outcomes.
The information to be mined is typically generated through the use of an online learning system (commonly referred to as a learning management system, or LMS for short). The availability of LMS data has revolutionized the learning experience in many ways by supporting personalization and differentiation, benefiting students and teachers alike. But at what cost? What consideration is given to student privacy? To what extent should student privacy be a concern?
The sophistication of the regulatory environment governing the appropriate use of student data has not kept up with the reality of learning analytics. We’re still in the Wild West when it comes to educational data mining. Privacy and big data analytics are simply in tension, and the appropriate balance between them is extremely fragile. Many think that the time has come to rethink choices that were made decades ago and to enforce constraints on the use of these data. Institutions should care more about students’ privacy. They should create the supportive infrastructure needed to manage privacy effectively. They should also play an important role in raising awareness regarding personal data and its use.
Students often show little or no concern about sharing their personal data, especially in an environment of increased sharing through social media, which creates a landscape of “digital promiscuity” (Murphy, 2014). Post-Snowden, despite a decrease in public trust in data security, no real change in the sharing of personal data has occurred. In fact, it is the very act of sharing that validates the authenticity of information. We essentially have changed our social and cultural norms around this topic—if we have not shared it on Facebook, did it really happen? Are relationships real if they are not reflected in a change of relationship status?
The asymmetrical power balance between students and institutions makes students vulnerable, with little access to resources to contest or refuse the sharing of their data. Students may not even know their data is being mined. The purpose of learning analytics is to support and increase the effectiveness of learning. This doesn’t really give higher education a choice: it cannot afford not to mine data. Institutions are actually obligated to provide the best educational opportunities possible. There are very strong incentives to use big data to improve the student experience. That’s the institution’s core fiduciary duty to taxpayers and its governing board. However, this duty clearly must be balanced by the need to establish a social contract with students, one that comprehensively addresses awareness of the benefits and vulnerabilities associated with data sharing. If this were a research study conducted under the auspices of the Common Rule that protects human subjects in the conduct of research, these students would have the right to refuse to participate, even for reasons with which the researchers might not agree. They could simply say “no.”
How do we address student vulnerability? How much agency can and should students have? In an attempt to answer these questions, some have proposed various frameworks. One in particular, regarding student agency, seems promising: Paul Prinsloo and Sharon Slade’s framework for student learning. This framework lays out governing principles to establish and maintain a healthier relationship between higher education institutions, students, and their data. The framework includes:
- Student agency and privacy self-management – how do we think “critically about the range of student control over what data will be analyzed, for what purposes, and how students will have access to verify, correct or supply additional information” (Prinsloo & Slade, 2015)?
- Rethinking consent and employing nudges – the value of transparent information exchange and student-centered learning analytics so that the students are viewed not as data, but as collaborators
- Developing partial privacy self-management – where students can choose what portion of their data is sharable based on different contexts and applications
- Adjusting privacy’s timing and focus – students should be able to limit their consent to a specific period of time and for a given context or purpose
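
To make the last two principles concrete, here is a minimal sketch of how consent scoped by data category, purpose, and time period might be represented. It is only an illustration under stated assumptions, not something Prinsloo and Slade specify: the `ConsentGrant` class, the `may_use` helper, and labels such as `quiz_scores` and `early_alert` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentGrant:
    """One student's permission to use one data category for one purpose.

    All field names and labels are hypothetical, chosen for illustration.
    """
    data_category: str  # e.g. "quiz_scores"
    purpose: str        # e.g. "early_alert"
    valid_from: date
    valid_until: date   # consent lapses after this date

    def covers(self, data_category: str, purpose: str, on: date) -> bool:
        # A grant applies only to its own category, purpose, and time window.
        return (
            self.data_category == data_category
            and self.purpose == purpose
            and self.valid_from <= on <= self.valid_until
        )


def may_use(grants, data_category, purpose, on):
    """Deny by default: a use is allowed only if some grant covers it."""
    return any(g.covers(data_category, purpose, on) for g in grants)


# A student shares quiz scores for early-alert advising during one term only.
grants = [
    ConsentGrant("quiz_scores", "early_alert", date(2024, 9, 1), date(2024, 12, 20)),
]

print(may_use(grants, "quiz_scores", "early_alert", date(2024, 10, 1)))  # True
print(may_use(grants, "quiz_scores", "marketing", date(2024, 10, 1)))    # False
print(may_use(grants, "quiz_scores", "early_alert", date(2025, 1, 15)))  # False
```

The deny-by-default check reflects the idea that students choose what is sharable: any use of data outside an explicit category, purpose, or time window is refused rather than assumed.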
The way forward involves:
- Developing a coherent approach to consent, one that accounts for the potential for social science discoveries about how people make decisions about personal data;
- Recognizing that people can engage in privacy self-management only selectively;
- Developing more substantive privacy rules; and
- Offering a “palette of ‘privacy solutions’” in order to move beyond the binary framing of confidentiality and consent (Gurses, 2015).
This represents a challenge not only in creating the processes and tools, but also in shifting existing culture. But these are necessary steps to take if we want data usage that helps institutions and respects student agency. As Daniel Solove wrote, “providing people with notice, access, and the ability to control their data is key to facilitating some autonomy in a world where decisions are increasingly made about them with the use of personal data, automated processes, and clandestine rationales, and where people have minimal abilities to do anything about such decisions” (2013, p. 1899).