With Cristiana Santos, Yvonne Lintao and Soheil Human, we recently got a paper accepted at WPES about cookie paywalls. Simply put, we wanted to unveil this new practice, for which the legal playing field is not level, and which is arguably a novel threat to privacy. The idea of cookie paywalls is to present visitors of a website with a simple alternative: either you “consent” to all tracking or you pay. Websites are no longer even pretending that your personal data is not monetized, because it is directly equated with a price. What started as a student project ended up making a bit of noise, and that’s good because we need the attention of regulators on such topics. Our research got featured on Gizmodo and on DataSkeptic, but let’s not close the story here.
Indeed, we pointed out in the conclusion of the paper a couple of directions to continue this research, and I jotted down a few ideas for a master's thesis proposal addressing the technical future work, which can be found below. So if you’re a master's student interested in working with me: drop me an email! Also note that the official proposal can be found here.
Large scale detection of Cookie Paywalls
- Security and Privacy
- Web Analysis
- Supervised Machine-Learning
Description and motivation
Most websites offer their content for free, though this free access often comes at a cost: personal data is collected to finance these websites, mostly through tracking and thus targeted advertising. Cookie walls and paywalls, used to obtain consent, recently generated interest from EU DPAs and seem to have grown in popularity. These paywalls present several novel problems, and their combination with cookie walls, coined cookie paywalls, has received little analysis so far. However, a previous study conducted by Morel et al. (see https://arxiv.org/abs/2209.09946, featured in Gizmodo https://gizmodo.com/cookie-paywall-eu-gdpr-pay-to-reject-accept-privacy-1849638363 and in the podcast Data Skeptic https://dataskeptic.com/) found that this growth may not be as significant as previously thought.
As this study was conducted on a limited number of websites (2800), a large-scale study may very well yield different results. Additionally, while these cookie paywalls do not seem to track users (prior to consent), they all use the Transparency and Consent Framework (TCF), which has previously been found to violate users' consent (see https://ieeexplore.ieee.org/document/9152617). Finally, some paywalls (the ‘cookieness’ of which remains uncertain) present different versions to users according to opaque criteria (such as IP address, type of web browser, etc.).
The goals of this master's thesis proposal are manifold, and can be adjusted to the participants' wishes. A first set of expectations can nonetheless be sketched as a starting point:
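Since every cookie paywall in the previous study relied on the TCF, a crawler could flag candidate pages by looking for TCF markers in the page source. The snippet below is only a minimal sketch under that assumption: `__tcfapi` and `__tcfapiLocator` are the names defined by the TCF v2 specification, while the function name `mentions_tcf` is hypothetical, and a purely static search like this one will miss consent platforms that are injected by external scripts at runtime.

```python
import re

# Names defined by the IAB TCF v2 specification: the __tcfapi function and
# the __tcfapiLocator iframe that a consent management platform installs.
TCF_MARKERS = re.compile(r"__tcfapi(?:Locator)?\b")


def mentions_tcf(html: str) -> bool:
    """Return True if the raw page source contains a TCF marker.

    Hypothetical helper: this is a static heuristic, not proof that the
    framework is actually active on the page.
    """
    return bool(TCF_MARKERS.search(html))


if __name__ == "__main__":
    sample = "<script>window.__tcfapi = function () {};</script>"
    print(mentions_tcf(sample))
```

A match only suggests that the page is worth a closer look, for instance with a real browser that can call the API.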
- a robust identification method of the different kinds of walls (cookie wall, paywall, cookie paywall). Two approaches are possible: an automatic classification using a classifier (see https://arxiv.org/abs/1903.01406) or a protocol-based classification (to be determined)
- the creation of a web crawler to detect cookie paywalls on a large scale
- an assessment of tracking practices on cookie paywalls
- prospectively, an investigation of users' mental models when interacting with these interfaces
Several groups can partake in the project to cover all the expected outputs, but they will in that case be expected to communicate with one another.
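To give a rough idea of what a protocol-based classification (first goal above) could look like, here is a minimal sketch that labels the text of a blocking banner with simple keyword rules. Everything in it is an assumption made for illustration: the keyword lists, the function names and the label strings are hypothetical, the patterns are English-only, and the code presumes that the text has already been extracted from an overlay that blocks access to the content (it cannot tell a wall from an ordinary banner).

```python
import re

# Hypothetical keyword lists, for illustration only. A real study would need
# multilingual patterns validated against a labelled dataset.
CONSENT_PATTERNS = [r"\bcookies?\b", r"\bconsent\b", r"\btracking\b"]
PAYMENT_PATTERNS = [
    r"\bsubscribe\b",
    r"\bsubscription\b",
    r"\bpay\b",
    r"[€$£]\s?\d",          # currency symbol before an amount
    r"\d\s?(?:€|eur\b)",    # amount before a euro sign or "EUR"
]


def _matches_any(patterns, text):
    """Return True if at least one pattern occurs in the text."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify_wall(banner_text: str) -> str:
    """Label the text of a blocking banner (assumed already extracted)."""
    has_consent = _matches_any(CONSENT_PATTERNS, banner_text)
    has_payment = _matches_any(PAYMENT_PATTERNS, banner_text)
    if has_consent and has_payment:
        return "cookie paywall"
    if has_consent:
        return "cookie wall"
    if has_payment:
        return "paywall"
    return "none"


if __name__ == "__main__":
    print(classify_wall("Accept all cookies or subscribe for 2.99 € per month"))
```

Such rules are cheap to run at crawl time and easy to audit, which makes them a useful baseline to compare against the classifier-based approach.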
Background and requirements
The students are required to have a technical understanding of web ecosystems and protocols. Knowledge of Python (for programming) and of machine-learning basics (if an automated approach is picked) is a plus. Students willing to engage in user studies are welcome as well.
Benefits and new skills
The students will work on an exciting new topic (see the recent media coverage), and will have the opportunity to build a new set of skills according to their own expectations (for industry as well as for research). New skills can involve:
- applied machine-learning (in one option of goal 1)
- understanding of legal issues in computer science
- conducting user studies (in the case of goal 4)
Note that a publication can be expected (depending on the results).
Cookies can be expected as well, but not at every meeting :)