The Changing Nature of the Internet
Why Google isn't AltaVista.
My response to a back-and-forth with @Matt in last Friday’s open forum got too long, so I decided to pitch it as a front page post.
By way of background, I’ve been an internet user since before there was an internet and have participated in dozens of social networking systems over the course of that time, most long gone, and watched the legal battles that emerged almost from the beginning, as well as the legislation designed to head off some of those conflicts. I’m an engineer by trade and have developed firmware and software applications since 1980, and nowadays those software apps are networked or connected in some fashion or another.
I kicked off Friday morning’s conversation observing,
I can’t believe how naive the Supreme Court’s opinion on algorithms is. “If your product does harm but you don’t deliberately program it to do harm and instead use machine learning, you can’t be held responsible.” What could go wrong?
This kicked off a lively back-and-forth that I won’t rehash here but @Matt posted this after I logged off for the night:
I don’t see what you call the “changing nature of the internet”. If you could assist me in seeing what you mean I’d appreciate it.
So first, some history. At the time Section 230 was being crafted, the Internet had existed for a while but the World Wide Web was hardly a blip. The act was passed in 1996 and was therefore being crafted and debated at least as far back as 1994, perhaps earlier. The Web was created in 1990, made available to the public in 1991, and the first browser that made its way into the hands of the general public (as opposed to geeks like me) was Netscape in 1995. So when I say that the internet has fundamentally changed since then, I’m not talking about the specifications or infrastructure. I’m talking about the fact that so much of what billions of people say, think, and do is online, as well as companies, governments, and any other entity you can think of. More important, however, is the way in which this information is used to directly impact all of our lives.
Let’s do a comparison of AltaVista to Google. The former was one of the first web-crawling search engines and the first to gain widespread use. Simplifying, it allowed you to enter search terms and returned pages that matched. You could add operators like “AND”, “NOT”, and “NEAR”, but fundamentally it was a predictable machine. Today’s version of Google, on the other hand, does not do anything predictable like that. The algorithms it runs on have been developed by machine learning. There is no human-understandable reason why (non-sponsored) pages rank at the top. In the end, the algorithms predict that more people are likely to click on link A rather than link B, and so A goes to the top. And even when things are deliberately excluded by a human decision (“don’t link to graphic violence,” “don’t link to COVID misinformation”), the method is to train machine learning systems on what types of things fall under those categories.
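To make the contrast concrete, here’s a toy sketch in Python. The documents and the “learned” weights are entirely invented: an AltaVista-style boolean query is a rule you can read and predict, while a learned ranker just sorts by scores a model produced from click data.

```python
# Toy contrast between AltaVista-style boolean matching and a learned
# ranker. Documents and "model" weights are invented for illustration.

docs = {
    "A": "hiking boots review and sizing guide",
    "B": "fashion boots for the city",
    "C": "trail running shoes versus hiking boots",
}

def boolean_search(query_terms, docs):
    """AltaVista-style: return every doc containing all query terms.
    Deterministic and explainable -- a page matches or it doesn't."""
    return sorted(d for d, text in docs.items()
                  if all(t in text.split() for t in query_terms))

# A learned ranker instead sorts by scores a trained model emitted;
# no human wrote a rule that explains the resulting order.
learned_weights = {"A": 0.91, "C": 0.78, "B": 0.12}  # hypothetical model output

def learned_rank(docs, weights):
    return sorted(docs, key=lambda d: weights[d], reverse=True)

print(boolean_search(["hiking", "boots"], docs))
print(learned_rank(docs, learned_weights))
```

The first function is fully auditable; the second one’s ordering is only as explainable as the numbers that came out of training.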
What does this mean in a practical sense? Take Amazon’s machine learning-based job applicant screener they created in 2014 and used for several years. They took exemplary employees, gathered up their application information (resume, background check, test results, etc.) and used it to train a machine learning system. They hoped to eliminate bias by letting a neutral algorithm pick the best candidates. It turns out it was dramatically worse than human screeners when it came to bias. But the information didn’t contain anything about age, sex, race or religion, so how could it give back biased choices? Well, even a human could infer a lot from that information. Did the applicant attend a historically black college? Did they play on a college championship women’s soccer team? Machine learning takes that to a whole new level. It was essentially asked to give back the applicants most like their existing star performers, and the algorithm found all kinds of ways to determine and rank “alikeness.”
Don’t think this can affect you? What categories of people are more likely to embezzle from a company? Who is more likely to miss days of work? Who is more likely to quit for a better job? If you have a close relative addicted to drugs, are you more likely to get behind in your car payments? How might caring for a parent with dementia impact your work performance, financial health, or willingness to go on business trips if you got a promotion? Companies won’t ask “are you a white male from a privileged background?” (Because that is who is more likely to embezzle) or “Is your wife thinking about getting pregnant?” or “How careful are you with your birth control” or “How old are your parents?” And they won’t task the algorithm with sussing that information out, even indirectly. But they don’t have to! This type of information gets incorporated by secondary, tertiary, or even more indirect routes. And there is no smoking gun. Machine learning algorithms aren’t logic trees, at least in the way we are used to them. You can’t look at the final algorithm and deduce that it is looking for questions about race or relatives’ drug use. It’s not. It’s looking for patterns of words and data.
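A tiny, fully invented sketch of how that proxying happens: the training data below never records sex, yet a crude “training” step ends up penalizing a token that merely correlates with it. The tokens, labels, and scoring rule are all made up for illustration.

```python
# Minimal illustration of proxy learning: no protected attribute appears
# in the data, but a correlated token still drives the learned score.

training = [
    # (resume tokens, was the applicant labeled a "star performer"?)
    ({"java", "golf_team"}, True),
    ({"java", "golf_team", "debate"}, True),
    ({"python", "golf_team"}, True),
    ({"java", "womens_chess_club"}, False),
    ({"python", "womens_chess_club", "debate"}, False),
]

def token_weights(training):
    """Score each token by how much more often it appears in 'star'
    resumes than in the rest -- a crude stand-in for model training."""
    weights = {}
    tokens = set().union(*(t for t, _ in training))
    for tok in tokens:
        star = sum(1 for t, y in training if y and tok in t)
        rest = sum(1 for t, y in training if not y and tok in t)
        weights[tok] = star - rest
    return weights

w = token_weights(training)
# Sex was never in the data, yet the learned weights penalize a token
# that merely correlates with it:
print(w["womens_chess_club"], w["golf_team"])
```

A real system does this across thousands of features at once, which is exactly why there is no single smoking gun to point at afterward.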
Primarily due to ChatGPT, this has finally reached the public’s awareness. But it’s already here and has been here for decades. It is increasing exponentially, though. The newest Apple processor has a whole bunch of cores in it: more than a dozen CPU cores, eight or more GPU cores (just for graphics), plus a 16-core Neural Engine optimized for machine learning.
Why so many? When you talk to Siri or type in a question, an Apple iPhone or Mac does most of the processing on the device itself out of security concerns. Alexa sends it up to the Net for processing. Alexa can correlate your speech or text with everything else the Internet (well, Amazon, Google, etc) knows about you. Siri can only correlate it with things your device specifically knows about you and so has to work a lot harder just to produce poorer results.
We have no idea where this information is being used. I can tell you one place you might not expect, though. Ten years ago, at the largest Health Informatics trade show, a few small but very well-appointed vendor booths showed up with scant and vague signage. Nonetheless they had a pretty steady flow of traffic. It took some digging but I finally figured out what they were offering. They were using machine learning and other techniques to help hospitals identify their most lucrative patients versus the ones they were most likely to lose money on. That’s just hospitals. And that was ten years ago. Imagine what your bank is doing, your credit card company, your employer, heck everybody!
Entering into the more paranoid realm, I’ve been following two things in China I suspect are more powered by machine learning than is being discussed. The first is the Chinese Social Credit System. While they publicly talk about penalizing people for having their dog off a leash or cheating on government job exams, it is widely known that it is also used to hound and harass people who speak out against the government. It would not surprise me at all if they are using machine learning to identify undesirables and deduct points from them. And I speculate an even more insidious use might be occurring in Xinjiang. Uyghurs have been put into re-education camps by the millions. One of the repeating themes from those affected is the arbitrary nature. Suddenly state agents show up and haul people away, seemingly without any warning indicators showing up. While it may be just due to coerced informants or a random policy of intimidation, I have to wonder if it could be a Chinese experiment in Precrime?
The Supreme Court was so pleased with themselves that they dodged a bullet by not addressing Section 230. But instead, they appear to have given a blanket tort defense to anyone who interposes an algorithm between themselves and the harm they do. I expect it won’t stand, but it is just another example of how clueless and out of touch this court is.
Thanks. Both for taking the time to explore the evolution of the beastie, and giving me a chance to think about something that’s been ubiquitous in my life for decades. And no, the djinn’s not going back into the bottle.
Bravo. This is quite enlightening.
“naive” “clueless” “out of touch”
Certainly possible. Perhaps even likely, given that relatively few people (especially the experts 😉 ) understand the ins and outs of this and related technology. Fewer still understand the history.
I do wonder though if the (unanimous) ruling was driven less by their [insert favored pejorative here] and more by other considerations. Justices do, after all, have lots of research minions who, among other things, elicit input from relevant experts* when making such rulings.
Would you be willing to steel(wo)man the “other side” of this?
@Mimai: I honestly don’t think I could steelman this one, but I would love for someone else to give it a try. I’m not a lawyer or anything close, but it seems to me they are saying that in order to be liable for harm you need to show intent. That is obviously not true, and the Supremes know it and wouldn’t make that argument, which means I obviously don’t understand their actual reasoning.
Excellent piece. Even I understood it. I cannot imagine how the new generation of fugitives will manage – we Boomers had it so much easier. Poof, you could disappear.
Couple years ago we applied to refinance our mortgage. The paperwork fetch quest was so onerous though that we tried to bail on the whole thing. (We’re grown-up like that.) But the bank – the Wells Fargo crime syndicate – basically insisted. How much paperwork did we end up doing? None. Suddenly we just had a much lower mortgage payment. So, thank you computer hive mind, but how does this actually help Wells Fargo?
This is very well written and thought-provoking.
I recall reading an article way back in 2010 in Wired about the NSA’s intelligence center in Utah, and the software they had which scans every bit of communication in the US looking for terror-related keywords. Not specifically targeted wiretaps, but a scanning program which could scan every single email, tweet, blog post or text message.
Again, this has been 13 years now.
Conversations like this always make me think of the opening scene in Terry Gilliam’s Brazil, where a clerk mistypes a name and sends a SWAT team to an entirely wrong address with horrific results.
Or like how the Stasi had an open file on literally every single man, woman, and child in East Germany, yet no one was more surprised by the public reaction to the fall of the Wall than the Stasi.
Point being, that even when security agencies have access to vast troves of data, they often are incapable of processing it or acting on it in any coherent way.
Which, far from being reassuring, is terrifying.
By keeping your money in their hands. Even if you have other financial interests elsewhere, they want you in their greedy paws.
I’m in favor of section 230.
I’m in favor because of a scenario where someone found out about, say, bottom surgery on the internet, found a surgeon in another state to do it, but had complications, and relatives in the state of residence of the trans person sue Google to shut down any reference to trans-affirming care. Because harm.
Do not want. Do not want to allow suits based on “my child found out how to make C4 on Google and then blew up my kitchen”. (By the way, a friend has a nephew that cooked C4 on the kitchen stove, but didn’t blow up the kitchen). This will destroy Google’s usefulness as a search engine.
A semi-common trope in older SF works was the personalized newspaper. It shows up in Clarke’s works, in Babylon 5, and elsewhere. It was also a feature used briefly in the late 90s when “web portals” were a kind of fad.
The idea was you’d get news about subjects, people, places, etc. you choose, perhaps with some general news as well (it varied). Crucially the user got to choose their news.
Social media ran away with the idea, but added its own recommendations to the mix. Not only suggesting you may be interested in this or that person or company or agency, but also showing posts the almighty algorithm determines you might like.
IMO, this makes such sites publishers and not mere carriers. And they should be subjected to the same rules and regulations, and responsibilities and liabilities, as any other publisher.
@Jay L. Gischer:
Just to be clear, the Supremes made no ruling on 230. That said, I’m also in favor of the original purpose of 230, which was to protect companies and administrators who provided a public forum for people to post, but who didn’t otherwise edit the content. It allowed them to moderate and get rid of offensive or dangerous posts, but didn’t hold them liable if they didn’t. It also provided protection to search engines like AltaVista which were highly rule based and easy to predict.
But the internet has evolved since then, and increasingly 230 is being used to protect companies and systems that are actively promoting and, by some definitions, editing specific content. It is widely and credibly believed that their primary motivation is to increase user engagement, especially engagements that result in the user reaching out to other users.
People and companies have been found liable when they incite others to harmful action even if they did not engage in the action themselves. The Supremes seem to be making an exception to this liability by saying if you interpose an algorithm in between you and the incitement, you are exempt.
And to repeat, that has nothing to do with 230.
Nice article, and great to see you as a headliner here!
Not sure this is steel manning, but my sense is that this is just something the current law isn’t able to deal with and it’s unreasonable to expect the courts to determine this on their own using existing inadequate law and precedents of limited relevance.
It is really the job of Congress to determine where the line should be drawn when it comes to tort and AI. Unfortunately, Congress does not want to do its job and is led by old people who probably know even less about the subject than SCOTUS does.
Two things. First, assuming I understand this correctly (which may be a big assumption) I am fine with 230 if it means that the tech company is protected when someone finds something on their site. What I have doubts about is when the tech company promotes associations. I think there is a difference between googling and finding out how to make C4 vs googling “the history of bombs in modern warfare” and google makes recommendations that include how to make C4 because the algorithm decided you might also be interested in knowing.
Second, we have specific courts, tax courts, to handle issues around taxation. You need specially trained judges. Taxes are just so complicated and money is so important. But you have generalists, and old generalists at that, making decisions about areas like IT about which I am sure they are mostly clueless. You have a bunch of other lawyers trying to explain this stuff to them, and so, based upon a few hours of learning about stuff that takes years to develop expertise in, they make decisions. My sense is that at best these decisions will be the same as flipping a coin. More likely they are deciding based upon their ideological background if pertinent, if not based upon favoring the big money interests or whoever paid for their most recent vacation.
First, the response of the AI is not limited to things you are actively googling. It can take into account all kinds of things. What music you listen to. What videos you watch. What your income level is. How much credit card debt you hold. Further, I’m not sure “decided” is the word here. Let’s say that the AI is being trained to increase engagement. It develops a correlation between promoting the instructions on how to make C4 with an increased likelihood of someone opening that link, liking it or forwarding it to someone else, so it is more likely to promote that content. That’s at the most simplistic level. At the next level, it notices that people with certain histories are even more likely to engage after seeing C4 instructions than the average person, so it promotes the hell out of it to them. The common characteristics could be that they frequently go to gun sites, or sites about mass shootings. And it might have nothing to do with violence. The user might have location history showing they frequently visit criminal mental health facilities.
The “decisions” here are simply based on value-free associations and level of engagement. The AI doesn’t “know” anything. It doesn’t know that this pattern it has identified is associated with violent behavior. It doesn’t know that C4 is an explosive that can be used for mass murder. But at the same time it is not simply returning the results of a search based on an exact match of a user’s search terms. The thing that tipped the AI over into sending that link can be seemingly unrelated. The user could have been listening to a Paul McCartney and Wings song, or watching a YouTube video on how to fix a broken lamp.
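For what it’s worth, here is a deliberately simplistic sketch of that kind of engagement optimizer. All the features and the click history are invented; the point is that the “decision” is nothing but an average of value-free correlations.

```python
# Toy engagement optimizer: it has no notion of what content *is*, only
# which feature patterns historically predicted a click.

# (user features, did they engage with the promoted link?)
history = [
    ({"visits_gun_sites", "likes_wings_songs"}, 1),
    ({"visits_gun_sites"}, 1),
    ({"likes_wings_songs"}, 0),
    ({"watches_lamp_repair"}, 1),
    (set(), 0),
]

def click_rate(feature, history):
    """P(engage | feature present), estimated from logged history."""
    hits = [y for feats, y in history if feature in feats]
    return sum(hits) / len(hits) if hits else 0.0

def promote_score(user_features, history):
    # The "decision" is just an average of per-feature correlations.
    if not user_features:
        return 0.0
    return sum(click_rate(f, history) for f in user_features) / len(user_features)

# A user who merely watches lamp-repair videos gets the content pushed
# harder than a user with no history at all:
print(promote_score({"watches_lamp_repair"}, history))
print(promote_score(set(), history))
```

Nowhere in this code is there any representation of what the promoted content means, which is exactly the problem.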
My father’s wife has three sisters, all of whom were convicted of embezzlement.
One was a perfectly ordinary padding of travel expenses for the reimbursement. The most basic, petty fraud possible.
One was pocketing tens of thousands of dollars a year from the church she worked for. Medium exciting embezzlement. Her son got to perform the arrest, and I don’t know whether this was a kindness or a hazing.
And the third… she was embezzling money from Alcoholics Anonymous to pay off her gambling debts. Well, to gamble with, but the plan was to win, pay off the gambling debts and then put the money back.
The question before the Court was whether Section 230 applied to recommendations made by an algorithm rather than a person. Quite simply, whether current law can deal with it. A question that is firmly in the Court’s domain, although the legislature can always change the law to make it clear.
The justices may not be experts, but I think it’s reasonable to put the burden of educating the justices on the various parties to the suit.
I think the Court got this wrong, but I also think it’s a hard case to show that this terrorist act was directly inspired by this piece of media being surfaced. I think you would need to show that across a broad group of people, there’s a higher statistical likelihood, in order to quantify the effects of the algorithmic suggestions.
The problem is that an algorithm can be as simple as “If X then Y”. Under current law (as I understand it), if SCOTUS had ruled the other way, someone could sue Google because they weren’t logged into YouTube and were shown an ad that they feel “harmed them”.
I understand the complexity of machine learning (though not the mechanics), and how unrelated things can modify the output. The issue, however, is two-fold.
1) Approximately 3.7 million videos are uploaded to YouTube every single day. YT does try to moderate those with extremist messages, but it’s a Herculean task, and some are going to slip through. And some of those are going to be recommended.
2) People can–and will–claim “harm” based on just about anything. Had the ruling gone the other way, that’s a can of worms that our court system is simply not ready to deal with.
At some point, Congress will have to look at what’s going on and fashion laws that deal with the situation in some way. Just completely flipping the rules in one fell swoop would be a very bad way to go about it.
Over 20 years ago, I was working for a company that would, based on the answers to 5 questions, send one of over 200 possible letters–and that was all coded by hand in Access!
From the best I can tell, YouTube (which was the subject of this suit) recommends videos based on other videos you watch, not on any larger data pool. If you’re getting “how to make C4” videos, you led the algorithm down that path. I watch YouTube quite extensively, and the only odd suggestions I’ve gotten are “disproving flat Earthers” (which is pretty much based on all the science videos I watch), and–for some unknown reason–videos about being autistic/aspergers.
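If that’s right, the general shape would be simple item-to-item co-occurrence, sketched below. This is a guess at the shape, not YouTube’s actual system, and the video names are made up.

```python
# Sketch of watch-history-driven recommendation via co-occurrence:
# "people who watched A also watched B."

from collections import Counter
from itertools import combinations

# Each entry is one (hypothetical) user's watch history.
sessions = [
    ["physics_lecture", "flat_earth_debunk", "rocket_test"],
    ["physics_lecture", "flat_earth_debunk"],
    ["cooking_basics", "knife_skills"],
]

# Count how often each pair of videos appears in the same history.
co_views = Counter()
for session in sessions:
    for a, b in combinations(sorted(set(session)), 2):
        co_views[(a, b)] += 1

def recommend(video, co_views, n=2):
    """Videos most often watched alongside `video`."""
    scores = Counter()
    for (a, b), count in co_views.items():
        if a == video:
            scores[b] += count
        elif b == video:
            scores[a] += count
    return [v for v, _ in scores.most_common(n)]

print(recommend("physics_lecture", co_views))
```

Even this trivial version shows the feedback loop: whatever you watched last is what steers what you see next.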
I sometimes try to mess with the YouTube algorithm. If it recommends something I actually find interesting, I’ll mark it “not interested” or “stop recommending this channel.” Then I may search for the subject or channel, watch a video, and maybe subscribe.
I don’t think this has any effect.
I think it’s well past time to regulate social media. Right now, as things are, without new, specific legislation to address the matter, the closest thing is to treat them as publishers, for the reasons I outlined in my post above.
Of course, Kathy’s First Law states “nothing is ever that simple.”
Even if declaring Fakebook, Twitter, Youtube, etc. as publishers were to work as intended (spoiler alert), what about recommendations and featured items, a sort of feed, in sites like Amazon, Audible, Scribd, eBay, Walmart, or even small(er) commercial sites selling clothes or food, for example?
@Mu Yixiao: What you are getting in your recommendations and ads is a result of these companies hiring hundreds, thousands of people to weed out offensive material and categorize misleading information. Part of the reason they have done that, maybe the bulk of the reason, is that they fear liability. They could reasonably interpret the court’s decision as saying they can save all that money, as they have no liability. I doubt Facebook, Google or YouTube will drop such efforts, but they may reduce them. And I could see Musk deciding to let the dogs run wild at Twitter.
You can’t talk about the difference between search engines now vs then without noting that now every search is being gamed so search engines have to work to see through false flags.
Re: the question of harm, I don’t think algorithms differ from any set of human standards. ‘Being well-spoken’ may not be intentionally designed to do harm and may have benefits as a norm, but you can certainly use that as a way to discriminate against people with dialects that sound ‘rougher’ or against people who speak perfectly fine but whose mannerisms are not exactly ‘comfortable’ for the world in which they wish to enter.
The problem is that real society never allows a status quo to persist, and most humans, deep down, are aware of this and want it to be true. Whereas algorithms will continuously view the world as if the choice of Pepsi or Coke is an eternal expression of actual desire.
I think that is very overstated.
Case law would settle down pretty quickly, and few people have the money to sue for funsies. A requirement that harm be shown limits things greatly, as demonstrating harm in these cases is difficult.
Courts are how we settle disputes in a civilized society — we should encourage that. The alternative is duels.
Congress already fashioned laws. The question is whether they apply.
(And the Court recently overturned Roe v Wade, so completely flipping rules is well within their pattern of behavior.)
Ruling that recommendations are not protected, or are less protected, would have a moderately chilling effect on some content.
It might require humans, somewhere, looking at the best performing content in tricky areas to see if that video about C4 is history or if it has instructions. They do this for child porn (algorithmic wide net identifies possible CSAM, which then gets a human review if it doesn’t match known CSAM content), they can do it for C4.
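That two-stage pipeline might look roughly like this in miniature. The hashes and the “looks_suspicious” classifier flag are stand-ins, not any real system’s API: a cheap automated wide net, exact matches auto-handled, near-misses routed to a human queue.

```python
# Sketch of a wide-net-plus-human-review moderation pipeline.

import hashlib

# Fingerprints of content already confirmed bad (hypothetical entries).
KNOWN_BAD_HASHES = {hashlib.sha256(b"known-bad-frame").hexdigest()}

def flag_stage(content: bytes, looks_suspicious: bool) -> str:
    """Return 'auto_remove', 'human_review', or 'pass'.

    `looks_suspicious` stands in for the output of a broad, cheap
    classifier that errs on the side of flagging too much.
    """
    digest = hashlib.sha256(content).hexdigest()
    if digest in KNOWN_BAD_HASHES:
        return "auto_remove"      # exact match to known content
    if looks_suspicious:
        return "human_review"     # wide net fired; a person decides
    return "pass"

print(flag_stage(b"known-bad-frame", False))
print(flag_stage(b"c4 history documentary", True))
print(flag_stage(b"cat video", False))
```

The expensive part is the humans in the middle tier, which is why the economics of the ruling matter so much.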
About making C4, I’m reminded of something.
The Mythbusters did a lot of shows using explosives, and perhaps even more with incendiary substances. Typically they used commercial explosives and accelerants, but on two occasions, as I recall, they made their own (legally, under supervision of competent authorities).
One time it was nitroglycerine (this may have been on Adam Savage’s next show, Savage Builds), and another time thermite (for the Hindenburg myths).
As I recall, while they showed video of these things being made, they did not disclose all the ingredients or steps used or taken. Meaning you couldn’t learn how to make either nitro or thermite by watching these shows.
Not all explosives are alike. I’m confident if I were stranded in the iron age due to a flux capacitor malfunction, or a busted Mr. Fusion, I could figure out how to make gunpowder. If I could hire or partner with a blacksmith, guns and bullets as well.
The world wide web is NOT the internet. The internet both predates the WWW/HTTP standards and is the framework/backbone of the WWW. Mosaic was developed in my back yard and was integral to growing the popularity of the web. One of the co-writers of Mosaic moved on to develop Netscape Navigator. Internet Explorer’s basis was the Mosaic browser, which is why you could see it credited in the “about” screen.
The internet was already being used well before the WWW became a thing. I was involved with Usenet as an evolution of the BBS concept back in the 80s. So when you say internet, my first thought isn’t WWW/HTTP related stuff. I was involved with web crawling in the mid 90s, and you’re really understating the tech that was available then. There were several spiders available, and some were far more advanced than others.
Honestly this pretty much sums up most of the post. Thus from my perspective the nature of the internet hasn’t changed much. We’re still using terms/tools/concepts developed in the 70s/80s and even whole models are still relevant such as the OSI model (although not as much as it was).
Defining that “harm” in a legislative sense that doesn’t involve conservatives rat fcking me because I don’t believe their BS is more than a bit hard. In a lot of ways my objection is that no one is defining anything specific, just using generalities like “we shouldn’t allow stuff that causes harm!!” People are demanding action yet can’t even outline what that action would look like in anything other than the most generic of buzzwords.
Now imagine 70+ year olds who can barely use a phone trying to define any of this in a legislative sense.
Both of those are actually pretty easy to make. Nitroglycerine is just glycerol with some easily obtainable acids mixed in. The big catch is brewing the batch in a safe manner to avoid the oopsie boom stuff. Thermite is just metal powder and rust (oxides). They use thermite for railroad repairs/maintenance in Europe.
I actually know a couple people who make black powder from scratch. Turns out there’s a few ways to do so.
I vaguely remember reading in the mid 90s about the NSA either building a new facility with on site power generation or upgrading a current location massively. The important part to me at the time was that the NSA needed to massively expand their computing power and that required new power generation capability.
The problem pre-AI was that the NSA was able to gather petabytes of data but wasn’t able to understand the vast majority of it. They were literally drowning in data, unable to see the forest for the trees. AI is changing that, though, and now all those big databases that companies have been building for decades are suddenly becoming very useful.
It’s a scary brave new world man.
That is a task YouTube has been failing at lately. They have been banning videos simply because their auto CC program incorrectly transcribed what was said. So people are getting videos banned for obscenity, without a single obscene comment or image, simply because they said a word the CC system got wrong.
@Matt: As I said in the top post, the big difference from the era when 230 was first drafted to just a few years later is the Web and, much more significantly, the shifting of so much of our lives and culture online. In researching that post I came across several posts from 1998 (!) trying to lay out the reasons companies should create a “web presence”. In other words, the corporate world had not yet even committed to paying for a web page!
And yes, many of the changes in the internet itself have been of “amount” rather than “type”: faster, more storage, more sites, better search engines, etc. But I would contend that the rise of trained machine learning is a difference in kind. Every day, more and more of our interactions are governed by algorithms that no human has designed. Without constant human intervention, these training-based algorithms again and again seem to weigh the basest bigotry and prejudice highly. (No, of course I realize that they have no concepts of bigotry and prejudice, so a better way of saying it is that the results these tools generate seem to very easily match those a typical bigot would produce.) At this point, with massive and costly intervention, we have either a) eliminated this bias to a fair extent or b) pushed it so deep into the algorithms we can’t discern it anymore, but it is still there.
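One way you might at least look for that buried bias: compare the model’s selection rate across a protected attribute that was deliberately held out of training. A minimal sketch with invented scores, groups, and threshold:

```python
# Disparate-rate check: fraction of each group scoring above a cutoff.
# A large gap is a red flag even when the model never saw the attribute.

def disparate_rate(scores, groups, threshold=0.5):
    """Selection rate per group at the given threshold."""
    rates = {}
    for g in set(groups):
        picked = [s >= threshold for s, grp in zip(scores, groups) if grp == g]
        rates[g] = sum(picked) / len(picked)
    return rates

# Hypothetical model outputs paired with a held-out attribute:
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
groups = ["a", "a", "a", "b", "b", "b"]
print(disparate_rate(scores, groups))
```

This only detects the outcome gap, of course; it says nothing about which buried correlations produced it, which is the author’s point.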
“Harm” isn’t nebulous. It’s what most civil lawsuits are about. “Tort” is the legal term and tort law is probably bigger than criminal law. So “harm” isn’t some airy-fairy concept, it’s the driving force behind a huge number of business decisions made every day. The Supremes seem to have carved out a blanket exception to this way our society handles risk and responsibility. I think a company could reasonably interpret the ruling this way: “if my engine uses an algorithm that wasn’t deliberately programmed to do harm then I can’t be found liable for any harm that results. I can’t even be brought to court over it.” And that’s a BFD.
If they have bias, and act on that bias, why wouldn’t you say that they have bigotry and prejudice? They have no consciousness, but they do have bigotry — nicely optimized and made more efficient.
We’re really moving past the point where “disproportionate impact” has any legal concern. And the current Republican obsession with denying structural racism just leans into this.
The next logical step is to decide that food safety only matters if you intend to cause food poisoning, and put it in writing. (There was a recent ruling dialing back public corruption prosecution to roughly this standard)