By now, everyone is familiar with the events that happened in Christchurch, New Zealand. Long story short: someone went on a shooting spree in a mosque and streamed the entire thing online. There have been responses all around, ranging from shitty ones (blaming Muslims) to more wholesome ones, where people showed a great amount of support for the victims.
This post is not about the shooting itself, though. It’s about technology, and the technological illiteracy of certain actors.
Nearly as soon as the news broke, Facebook vowed to do their best to keep the video of the shooting off their platform. Google, too, made their best effort to remove the offending video from YouTube. However, that wasn’t enough for some companies who, believing that Google and Facebook hadn’t done enough, decided to pull their ads. The hands-off moderation approach Facebook and Google take got a lot of the blame for the situation. “What are you doing to ensure this doesn’t happen again?” Facebook and Google are being asked, with some proposing that Google and Facebook should moderate what people stream. The CEO of one of the companies pulling their ads even went ahead and stated the following:
“If the site owners can target consumers with advertising in microseconds, why can’t the same technology be applied to prevent this kind of content being streamed live?”
There’s only one problem with that.
Hot dog (determining what ad to serve is easy)
Computers have been standard office equipment for over twenty years, yet for some reason people are still painfully unaware of how they work. Hearing things like “oh, computers can do this, which means they should be able to do that as well” and “you made the computer do this, therefore you can make the computer do that” gets criminally common when you’re hanging around an average person (or the half of the human population that’s dumber than that).
The fact is that ‘targeting consumers within microseconds’ is easy. When determining what ad to show, you don’t have to do much work. You’ve got the end user’s profile, and you track them wherever they go. Did they search for ‘Steam’ and click the first link? Make a mark in the ‘likes video games’ box. Did they watch some game-related videos on YouTube? That’s a few more marks in the ‘likes video games’ box. Searching for Arduino and/or other microcontrollers that you could use as a simple F-13-emitting keyboard? Put a few marks in the ‘tech enthusiast’ box. Did they search for [insert video game title] flac soundtrack torrent download? That’s one mark for the ‘likes video games’ box, one mark for the ‘audio enthusiast’ box, and one mark for the ‘probably needs a VPN’ box.
When the advertiser gets asked for an ad to show to this hypothetical user, they take a look at said boxes. “Hmm, that’s a lot of marks in the ‘likes video games’ box,” thinks the advertiser, and serves the user an ad for a game that’s getting released in the next few weeks. “But wait, this user needs to see another ad!” says the website. The advertiser looks at said boxes again, sees a lot of interest in the video games box, but remembers it already showed the ad for Apex within the last 30 seconds and decides against showing it again. The next category on the list is ‘tech enthusiast’, so it decides to serve an ad for USB Type-C connectors (and I shit you not, the ad I got on Reddit a few days back was indeed an advertisement for bags of USB Type-C connectors that you could solder onto a PCB).
As you can see, it’s simple stuff: you have a list of categories both for the user and for every ad, and you compare them every time. It’s really not a lot of work, which is exactly why advertisers can serve you ads in milliseconds.
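To put the above in concrete terms, here’s a minimal sketch of that lookup in Python. Every name and number in it (the interest boxes, the ad inventory, the 30-second frequency cap) is made up for illustration; real ad exchanges are far more elaborate, but the core operation really is just comparing lists of categories:

```python
from time import time

# Toy user profile: interest "boxes" with a mark count for each.
user_profile = {
    "likes_video_games": 37,
    "tech_enthusiast": 12,
    "audio_enthusiast": 5,
}

# Toy ad inventory: each ad targets one interest category.
ad_inventory = [
    {"id": "apex_launch", "category": "likes_video_games"},
    {"id": "usb_c_connectors", "category": "tech_enthusiast"},
    {"id": "flac_player", "category": "audio_enthusiast"},
]

# Ads shown recently, with timestamps, for frequency capping.
recently_shown = {"apex_launch": time() - 10}  # shown 10 seconds ago
FREQUENCY_CAP_SECONDS = 30

def pick_ad(profile, inventory, shown):
    """Return an ad for the user's highest-scoring interest that
    hasn't already been shown within the frequency cap window."""
    # Rank interest boxes by how many marks they have, highest first.
    ranked = sorted(profile, key=profile.get, reverse=True)
    for category in ranked:
        for ad in inventory:
            if ad["category"] != category:
                continue
            last = shown.get(ad["id"])
            if last is None or time() - last > FREQUENCY_CAP_SECONDS:
                return ad
    return None  # nothing eligible; fall back to an untargeted ad

print(pick_ad(user_profile, ad_inventory, recently_shown))
# -> {'id': 'usb_c_connectors', 'category': 'tech_enthusiast'}
# The Apex ad loses out only because it was shown 10 seconds ago.
```

A dictionary lookup, a sort, and a timestamp comparison: that is the entire decision, which is why it fits comfortably into an ad exchange’s latency budget.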
Not hot dog (determining the content of the picture is hard)
In the last few years, we’ve seen vast improvements in recognizing what’s inside an image. Smartphones have AI-powered cameras that can somewhat accurately determine what they’re looking at. And by ‘somewhat accurately’ I mean hit or miss. On one hand, we do have facial recognition software, but that’s very specialized and useless outside its very narrow function. ContentID and similar systems can also work fairly accurately and quickly, as they compare the content of a video or stream against a database of known sounds and videos; but if you show ContentID something it hasn’t seen before, you’re not getting a hit.
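That asymmetry is the whole trick: fingerprint matching answers “have I seen this exact thing before?”, not “what is this?”. Here’s a vastly simplified sketch of the idea; the toy 16-bit fingerprints and the Hamming-distance threshold are illustrative assumptions, and real systems like ContentID derive far more robust audio and video fingerprints:

```python
# Vastly simplified fingerprint matching. A "fingerprint" here is just a
# 16-bit integer, and matching is a Hamming-distance check against a
# database of known content.

known_content = {
    0b1011_0110_1100_0011: "Some copyrighted movie",
    0b0100_1001_0011_1101: "Some copyrighted song",
}

MAX_DISTANCE = 3  # how many differing bits we still call a "match"

def hamming(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def lookup(fingerprint: int):
    """Return the closest known item within MAX_DISTANCE, or None."""
    best = min(known_content, key=lambda k: hamming(k, fingerprint))
    if hamming(best, fingerprint) <= MAX_DISTANCE:
        return known_content[best]
    return None  # never-seen-before content: no hit, by design

print(lookup(0b1011_0110_1100_0111))  # near-duplicate -> "Some copyrighted movie"
print(lookup(0b1111_0000_1111_0000))  # novel content  -> None
```

Matching against a database is cheap and accurate precisely because it never has to understand the footage; it only has to recognize footage it has already seen.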
Then you have more general attempts at object recognition, where accuracy is incredibly low. Most of the time, it will be able to tell a cat from a person. Sometimes it will even distinguish between a dog and a cat, but that’s where it ends. It won’t distinguish things where the differences are relatively small, such as dragons from wyverns (dragons have 4 legs, wyverns have 2). It also won’t distinguish things where the visible differences are nonexistent: it won’t tell a glass of juice from a glass of tea (often enough, the only difference between the two is the temperature, as certain kinds of juice have the same color as various types of tea). It won’t be able to distinguish lakes or even big rivers from the sea (unless you’re there and it takes a look at GPS and compass data).
In the same vein, it’s currently borderline impossible to reliably distinguish between a movie or TV series, gameplay of a first-person shooter, an airsoft or paintball video, police bodycam footage, and an active terrorist shooting: a situation that, for all intents and purposes, happens approximately never.
That’s not all, though. Let’s pretend for a moment that there will be a point where AI can distinguish whether a live stream is an active shooting or not. Remember how I explained that finding the right ad for a visitor of a webpage is a relatively simple and quick thing to do? Yeah, image recognition is anything but that. Combing through the vast amount of footage people stream every day and checking whether any given second of those streams is a shooting or not is going to take a few orders of magnitude more computational resources than determining what ad to show. Given how exceedingly rare this kind of event is, running all streams through such a system would be a bigger waste of power than Bitcoin.
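For a rough feel of the gap, here’s a back-of-the-envelope sketch; every number in it is an assumption pulled out of thin air for illustration, not a measured figure, but the shape of the result holds:

```python
# Back-of-envelope comparison of ad selection vs. scanning a live stream.
# All figures below are rough, illustrative assumptions.

# Ad selection: essentially a handful of lookups and comparisons.
ad_decision_ops = 1e4            # assumed operations per ad decision

# Video classification: run a deep model on sampled frames of a stream.
frames_per_second_analyzed = 1   # assumed sampling rate (1 frame/s)
ops_per_frame = 5e9              # assumed operations for one forward pass
stream_length_seconds = 3600     # one hour of stream

stream_ops = frames_per_second_analyzed * ops_per_frame * stream_length_seconds

print(f"One ad decision:   ~{ad_decision_ops:.0e} ops")
print(f"One hour of video: ~{stream_ops:.0e} ops")
print(f"Ratio:             ~{stream_ops / ad_decision_ops:.0e}x")
# With these assumed numbers, one hour of one stream costs roughly nine
# orders of magnitude more than one ad decision, and that's before
# multiplying by every concurrent stream on the platform.
```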
The alternative
There are two other ways a social media site could reliably moderate streams. The first is hiring people to watch them as they happen, which would require a lot of manpower and wouldn’t come cheap. The other is to disallow streaming (and/or posting videos) altogether unless you’re a verified user, which brings us to questions of freedom of expression, as well as whether the collateral damage is worth it. At the end of the day, the question arises: is it worth wasting so many resources? Is the vast collateral damage worth it when the only result is that the next wannabe livestream shooter throws their camera away and commits the shooting anyway, the only difference being one less LiveLeak video? Is it worth getting closer and closer to 1984/China-level censorship (with the only difference being that the censorship is performed by corporations rather than governments)?
There’s only one answer that’s objectively correct in this case: “No. It is not.” The people pulling ads because Facebook and Google don’t have systems to prevent a potential shooter from streaming their shooting are being outright unreasonable. And possibly the worst thing about it is that they don’t even realize it.
100 Groschen
This post should really have ended with the previous chapter, but this entire debacle over pulling ads only serves to highlight that the internet’s reliance on ads is a bad thing, to the point that even one of the original creators of the World Wide Web, Jean-François Groff, wonders whether things would be different if the internet had some micropayment infrastructure built into it from the start.
At the moment, Brave’s BAT token seems to be a somewhat promising step in a direction where advertisers don’t matter. On the tin, BAT says users earn tokens by watching ads, and then give said tokens out to the sites they visit. Hopefully one will be able to buy those tokens outright in the future, meaning websites will profit off your visit even if you don’t watch ads. Here’s to hoping that this project gains some traction and sees widespread adoption soon.
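For illustration, here’s a rough sketch of the general mechanism as I understand it from the tin: a user’s monthly token budget gets split across the sites they visited, proportionally to attention. The figures and the proportional-split rule below are my own assumptions, not a description of Brave’s actual algorithm:

```python
# Hedged sketch: split a user's monthly token budget across visited sites,
# proportionally to attention (time spent). All numbers are made up.

monthly_budget_tokens = 20.0

# Seconds of attention per site over the month (illustrative figures).
attention = {
    "news.example.com": 5400,
    "blog.example.org": 1800,
    "videos.example.net": 10800,
}

total_attention = sum(attention.values())
payouts = {site: monthly_budget_tokens * secs / total_attention
           for site, secs in attention.items()}

for site, amount in payouts.items():
    print(f"{site:>20}: {amount:.2f} tokens")
# The site that held the most attention gets the largest share,
# whether or not any ads were involved in funding the budget.
```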