Retroactive big data analysis of YOU with AI
Corporations and state actors collect vast amounts of data flowing through the internet daily. Their ability to analyse this data is likely [1] constrained by the immense computational resources required.
With further progress in AI, and with freely available models already offering context windows comfortably in the range of millions of tokens, it seems inevitable that by 2030 AI will be able to scan through huge amounts of data with ease.
Currently, the largest "data collector" we carry at all times is our smartphone (and by extension, our smartwatch). I use an iPhone, so perhaps my data hasn't been collected for advertising purposes as explicitly as Google collects it on Android, but even with Apple's consumer push for "Privacy", it seems unlikely this will hold [2]. We can be sure that there is an Apple-specific collection of data about me somewhere: location history, messaging metadata, purchase history, voice requests, heart rate, etc. By extension, other companies have their own personalised data: OpenAI from my AI requests, Brave Search from my searches [3], etc.
The next question is: Who can read through and use this data? It is clear that the data-collecting corporation can directly analyse it first-hand. What about state actors? This is less clear, but I would imagine if you are a person of interest, there is little you can do to stop them from being able to access your Apple data, your OpenAI data, your Google data etc.
What are the current repercussions of this data analysis?
As of 2025, this data collection is mostly for advertising purposes, which for most people falls somewhere on a spectrum from useful to downright annoying. Beyond that, realistic hyper-surveillance has to be specifically targeted at you, i.e. if someone wanted to analyse all your data, they'd have to go and pull it, collate it and run the software. However, once AI can parse through and give meaningful insights on everyone in every dataset in real time, we suddenly live in a very different society.
In this society, you do not need to be targeted specifically for your data to be analysed. Suddenly, these historically independent datasets on you, from your phone and your internet use, can be immediately combined into a new form, one that perfectly replicates you now and at every point in time, going back to the start of data collection. In this world, incentives have to be carefully aligned for this not to end badly for regular citizens. The best-case scenario is that we have hyper-pervasive and hyper-persuasive targeted advertising. The worst-case scenario extends far into the realms of sci-fi.
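To make the "combining independent datasets" step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Event` record, the `build_timelines` helper, the sample data, and above all the idea that every service exposes the same `user_id`. Real datasets would need messier identity matching, and this is not a description of how any company actually does it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    user_id: str         # hypothetical shared identifier across services
    timestamp: datetime
    source: str          # which dataset the event came from
    detail: str


def build_timelines(*datasets):
    """Merge independent event logs into one chronological timeline per user."""
    timelines = defaultdict(list)
    for dataset in datasets:
        for event in dataset:
            timelines[event.user_id].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e.timestamp)
    return dict(timelines)


# Made-up sample logs from three unrelated services.
phone = [
    Event("u1", datetime(2025, 3, 1, 8, 30), "phone", "location: gym"),
    Event("u1", datetime(2025, 3, 1, 12, 5), "phone", "location: office"),
]
search = [
    Event("u1", datetime(2025, 3, 1, 9, 15), "search", "query: knee pain after running"),
]
assistant = [
    Event("u2", datetime(2025, 3, 1, 10, 0), "assistant", "prompt: draft resignation letter"),
]

timelines = build_timelines(phone, search, assistant)
for event in timelines["u1"]:
    print(event.timestamp.isoformat(), event.source, event.detail)
```

The merge itself is trivial; the point is that once the identifiers line up, three harmless-looking logs become a single story about one person's morning, and nobody had to single that person out first.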
What can we realistically do about it?
You cannot realistically opt out of data collection from services like ChatGPT whilst still using them; the only way to be sure is to stop using the service. This helps to an extent and is definitely worth considering, though it may or may not be worth it for you.
However, for the biggest data culprit, your smartphone, a stronger option exists: GrapheneOS. This hardened version of Android, which ironically runs only on Pixel devices, stems data collection by removing Google entirely from the OS. If you do choose to use Google, for the Play Store or other services, it is sandboxed like any other third-party application and can therefore only collect the data you explicitly give it access to. Of course, mobile network operators will still be able to triangulate your location from cell tower data.
Is GrapheneOS worth it?
I ran GrapheneOS on a Pixel 4a, followed by a Pixel 6 Pro, and used it as a daily driver for nearly three years. I switched back to iPhone for a few reasons:
- The inconveniences were too annoying: the camera sucked (back then), apps downloaded from Aurora Store were glitchy at best, and F-Droid apps looked like they were coded in COBOL, and acted as such.
- No eSIM support (at the time).
- Poor banking app support (not GrapheneOS's fault, but still unfortunate).
- The ecosystem of iCloud and Handoff/Continuity is great (when it works). Also, until very recently, iCloud offered end-to-end encryption in the UK, which I took advantage of (RIP).
I realise, as I write this, that most of these have been fixed: you can now use eSIMs, banking apps like Revolut currently work, and the camera app takes full advantage of the great Pixel camera.
Finally, and most importantly, now that iCloud does not offer full end-to-end encryption for me, I have no reason to use iCloud. That cuts off a lot of the continuity features that make the Apple ecosystem convenient and worth the trade-off. I recently started using my work MacBook, which isn't signed into any Apple account, so I lose (on purpose) all of these features there as well.
Will I switch back to GrapheneOS? Will it counter data collection? Will it put a significant gap in my datasets before ASI comes and obliterates my privacy? I am still trying to answer some of these questions for myself, but I thank you for reading, and I hope it was thought-provoking.
[1] I like to think parsing these huge amounts of data is somewhat difficult, as that helps me sleep at night; nevertheless, this is most likely a naive, but useful, assumption when it comes to thinking about our future AI-riddled society.
[2] I am impressed by Apple's dedication to state-of-the-art cryptography like "Fully Homomorphic Encryption" for their software, but I don't really think it makes a dent in preventing the bulk of meaningful data collection.
[3] I imagine the data collection is somewhat muted, considering how they position their product, but who knows?