[disclaimer: all opinions expressed here are my own, not those of Random Hall as a whole or of MIT, blah blah etc.]

Hello, dear readers! (Does anyone still read this blog?)

I just spent the past weekend helping the Free Software Foundation run LibrePlanet 2017 at MIT’s own Stata Center. LibrePlanet is a conference for people who care about their digital freedoms, with presentations and social gatherings focused on the past, present, and future of free software and hardware. It was an awesome, crazy time with lots of great speakers and interesting topics, and I’m really glad I was able to help make it all happen.

On Sunday morning, Mike Gerwitz gave an excellent talk on Privacy, Security, and Freedom. One of the topics Mike discussed was Google Analytics, Google’s handy service that lets website owners see who’s visiting their sites. The thing is, all the data that Google Analytics gives web admins is also shared with Google; and since Google Analytics is installed on a huge number of sites, Google is able to track users all across the internet without their knowledge, correlating vast amounts of data to build a frighteningly detailed picture of who you are and what you like. As I listened to Mike’s talk, I remembered that Google’s data-hoarding tentacles had crept onto a website close to my heart: this very blog. I’d always been vaguely dissatisfied with this, but hadn’t really thought about it for a while. Now, though, as I remembered, a righteous flame of internet justice burst forth in my heart, and I vowed to purge this evil from my beloved home.

You might ask, why do I care so much? It’s not like this data is actually hurting anyone, is it? What do I have to hide? Well, I’m a political activist working to fight the destructive policies of an authoritarian regime. If too much data about my political activities or my personal life gets into the wrong hands, it could be used to interfere with activist groups I’m part of or to personally threaten me. There are a lot of people more directly threatened by Trump’s government who also have plenty to hide, even if they haven’t actually done anything wrong. And if you don’t feel threatened now, the thing about data is that it sticks around. You have no way of knowing what pieces of data about you might be stored somewhere, waiting to come back and bite you if things somehow get even worse than they are already.

With that in mind, I present, off the top of my head at 3am, my four principles for software developers to protect users’ privacy:

1. Transparency: Users should always know exactly what data is being collected about them, by whom, and how it’s used.

Giving users information on data collection helps them to evaluate risks and make informed decisions. Our webmin emeritus, lucci, did the very awesome thing of actually explaining to users how they were being tracked. We’ve had a little note in the footer of every page explaining that our website uses Google Analytics, and linking to a blog post where users can learn more about how it works and how to disable it using NoScript. This is much more honest and transparent about user tracking than almost anyone cares to be, and lucci deserves serious kudos.

2. Consent: Data collection should be opt-in whenever possible, and users should always be able to opt out and know how to do so.

Users need to be able to avoid data collection if they don’t want it. The data collection on Random Hall’s blog isn’t opt-in, since Google Analytics is on by default unless you disable it. lucci’s post did provide instructions for disabling tracking, though, which again is a lot better than almost every site that uses Google Analytics.
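For the programmers in the audience, here is a minimal sketch of what opt-in tracking could look like on the server side. It is only an illustration under my own assumptions, not how this blog or Google Analytics works: the function name should_track and the opted_in flag are hypothetical, and request headers are assumed to arrive as a plain dictionary. The one real-world piece is the DNT ("Do Not Track") request header, which some browsers send with the value 1 when the user has asked not to be tracked.

```python
from typing import Mapping


def should_track(headers: Mapping[str, str], opted_in: bool = False) -> bool:
    """Decide whether analytics may run for this request (hypothetical helper).

    Tracking is off by default. It is enabled only when the visitor has
    explicitly opted in AND has not sent a "DNT: 1" header.
    """
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # An explicit Do Not Track signal always wins over a stored opt-in.
    if normalized.get("dnt") == "1":
        return False

    # No opt-in recorded means no tracking.
    return opted_in
```

The important design choice is the default: a visitor nobody has asked is treated as someone who said no.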

3. Minimal collection: Don’t collect more data than you actually need.

Some kinds of data can be useful for providing services to users, but data is dangerous, and the less you collect the better. Any unnecessary data collection creates unnecessary risk for users if your database gets subpoenaed by the government or stolen by criminals. As lucci describes, we did minimize the amount of Google Analytics data presented to us, focusing only on what’s actually useful. However, that doesn’t necessarily mean that Google collected any less data; they just shared less of it with us. Also, I’m pretty sure nobody’s even glanced at our website’s analytics in ages. If we aren’t using that data, then it’s a liability rather than an asset, and we shouldn’t be collecting it in the first place.
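One common way to collect less is to throw away part of a visitor’s IP address before it ever gets written to a log. The sketch below is only an illustration under my own assumptions: the helper name anonymize_ip is hypothetical, and the prefix lengths (24 bits for IPv4, 48 bits for IPv6) are an arbitrary choice for the example, not a standard. It uses Python’s standard-library ipaddress module.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero out the host portion of an IP address (hypothetical helper).

    Keeps the first 24 bits of an IPv4 address or the first 48 bits of an
    IPv6 address; these prefix lengths are assumptions for this example.
    """
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48

    # strict=False lets us build the network from a host address; the
    # resulting network_address has every bit past the prefix set to zero.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.57"))           # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

A truncated address can still tell you roughly where your visitors come from, but it no longer points at one specific person’s connection. Whether that is enough depends on what else you store alongside it.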

4. Minimal sharing: Don’t share collected data with anyone you don’t need to share it with.

Again, this is an important part of minimizing risk to users. When you share data with third parties, you lose knowledge and control over how that data is used, and you provide more possible targets for people who want to steal your users’ data. When you use Google Analytics, all of the data on who visits your site is shared with Google; however, there are better options that keep your data under your control! Mike told us about a free-software analytics platform called Piwik that you can run on your own server, protecting your users’ privacy by not sharing their data with anyone else. I don’t think Random Hall’s blog needs analytics at all, but if my fellow bloggers disagree, I’ll be setting up Piwik for us to use in the future.

 

Whew; that concludes my impromptu 3am software ethics lesson. I feel like these sorts of ethical issues are one area where MIT’s CS curriculum is sorely lacking, and I’d like more of my classmates to think more carefully about how users are protected or threatened by the systems we use and the systems we design. I can’t make this stuff part of the Course 6 required class list, but I can at least shove it down the throats of unsuspecting prefrosh who just wanted to find out how Random Hall’s kitchens work. Good night everyone, and stay safe!

Thanks to my parents for instilling in me a healthy paranoia about digital and analog privacy, to my friends for listening to my rants, and to Mike Gerwitz for telling me about Piwik and for giving me the impetus to do what I should have done a long time ago.