Toward AI Realism
Opening Notes on Machine Learning and Our Collective Future
June 7, 2024
How do we talk about generative AI productively? People tend to feel strongly about it, conversations run on vibes, and no one seems to know very much. Critics point out that models run on non-consensually harvested data, reproduce biases, and threaten to corrode our collective history; that scanners routinely fail to identify Black faces; that bullies use deepnudes to harass women and gender non-conforming people; that artists are being put out of work; and that education has been upended. And for what? A technology that’s a drain on the planet amid climate change. Extreme critics, also known as AI doomers, argue that the technology could end biological life, citing concerns ranging from rogue lethal autonomous weapons to emergent power-seeking behavior within the models themselves.
AI optimists have a different set of intuitions. They note that AI research has developed treatments for sickle cell anemia, made the COVID-19 vaccine possible, globally improved rural health, advanced cancer treatment, reduced pesticides, sniffed out wildfires, and pinpointed the world’s worst polluters, while autonomous vehicles and brain-computer interfaces allow greater self-determination for disabled people. Extreme Prometheans, like the effective accelerationism (e/acc) movement, champion boundless AI advancement even in hypothetical scenarios where billions die.1 “e/acc Leader Beff Jezos vs Doomer Connor Leahy,” YouTube video, 1:23:15, posted by “Machine Learning Street Talk,” April 30, 2024. In the video, Beff Jezos (a leading effective accelerationist) shrugs off the potential nuclear destruction of South Carolina at 1:06.
Most people try to blend the two positions into a reasonable middle ground: let’s use what’s good about artificial intelligence and toss the rest. AI debunkers, on the other hand, are annoyed by the conversation itself. For them, AI is a nothingburger with extra cheese—pure hype, the latest crypto-style hustle. Some fall into debunking because they’re exhausted by all the chatter. Others argue that trained machines will never replicate what organic brains do and therefore never be anything very interesting. Some, like Noam Chomsky, are concerned that AI research is passing itself off as science rather than engineering.2 “Debunking the Great AI Lie | Noam Chomsky, Gary Marcus, Jeremy Kahn,” YouTube video, 32:23, posted by “Web Summit,” November 14, 2022. Others just move the goalposts to make the problem disappear: If this unpaywalled AI model isn’t a robot overlord capable of changing diapers while correctly recalling nuanced minutiae about academic philosophy, is it really fair to call it intelligence?
I propose a fourth position: AI realism.3By AI realism, I do not mean political realism, a nationalist foreign policy position. AI realism includes the common sense meaning of staying grounded amid uncertainty. Like critical realism, it asserts that there are logical processes immune to direct change from human perception alone and, like economic realism, that economic outcomes have a logic distinct from the whims of human psychology. It also shares aesthetic realism’s commitment to represent subjects as they appear in everyday life without idealization. Beyond vibes, AI realists would be committed to grasping how the technology works, contextualizing it, and examining our intuitions, whether they be to vilify or idealize, to mystify or oversimplify. They would understand that models are not just commodities or platforms, but the unfolding outcome of the systemic logic of embedded material social relations. Large models are created for the purpose of profit maximization and trained on the data that humans have generated, ideologically, as subjects making sense of their lives within capitalist social relations. AI realism would measure the impact of machine learning in terms of months and years, rather than speculate about decades, centuries, and millennia. AI realism would entail intellectual humility, admit its own errors, and forgo wild leaps of logic without denying that the world is growing increasingly strange even as it becomes more predictable. Just because something isn’t unfolding as it would in a movie doesn’t mean it isn’t happening. And just because something feels like fiction doesn’t mean it isn’t true.
Developing AI realism requires taking on three tasks: (1) building a framework for coherent conversations about artificial intelligence, (2) reviewing how the most advanced model architectures roughly function, and (3) detailing trends in machine learning so that we know what we’re swimming in, what’s on the immediate horizon, and what’s as unreachable as the stars. Nowhere will I imply that artificial neural networks are conscious, have subjective experiences, or think. I use the term learning with no depth implied: machines cannot perform a generalizable task, then they are trained, and then they can. If there is a better word for that phenomenon, I’m open to it.
Why Generative AI Is Different
Simply put, an artificial intelligence is a machine capable of outputting autonomous predictions, inferences, and actions in response to a prompt. Smart appliances, Google Maps, Siri, Alexa, vehicle parking assist, social media moderation bots, facial recognition cameras, predictive policing algorithms, and autonomous drones are all AI projects that have been used by us or on us for at least a decade.4For more discussion on AI within households, see Jonathan Martineau and Jonathan Durand Folco, “The AI Fix: Algorithmic Capital and Social Reproduction,” Spectre 8 (2023): 22–41.
But none involve the qualitative advance in artificial intelligence known as transformer architecture. Transformer architecture, first developed in 2017 by Google researchers, made the growth of large language models (LLMs) possible and fueled rapid advancements in computer vision, robotics, sound and brain wave analysis, code generation, and scientific research overall.5Ashish Vaswani et al., “Attention is All You Need,” Advances in Neural Information Processing Systems 30 (NeurIPS, 2017), 5998–6008. Artificial intelligence is not just an appliance component or a type of consumer-facing app, but the imperfect category for machine learning, a subset of computer engineering projects. It’s important to note that the advances we’re witnessing caught almost the entirety of the capitalist world by surprise, including many industry insiders. AI is not the product of a grand conspiracy. The history of its development is open and traceable.
Before transformer architecture, artificial neural networks were primarily used for projects like superficial anomaly detection, image and speech recognition, machine translation, and sentiment analysis in social media posts and product reviews. Transformer architecture improved these capacities and added rich content generation, even steps toward automated problem solving. For example, instead of just analyzing known protein structures, DeepMind’s AlphaFold, first unveiled in 2018, went on to predict the structures of nearly all proteins known to science, a task that would have taken the world’s entire workforce of PhDs hundreds of thousands of years to complete.6John Jumper et al., “Highly Accurate Protein Structure Prediction with AlphaFold,” Nature 596 (2021): 583–89. A further step toward medical breakthroughs, AlphaFold 3 now predicts the structure and interactions of proteins, DNA, RNA, and ligands with high accuracy.7Josh Abramson et al., “Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3,” Nature (2024). In 2023, AI models discovered seven hundred new materials. And, of course, models are generating increasingly sophisticated permutations of text, image, music, video, 3D digital objects, and computer code from the matrix of humanity’s collective productivity, leaving some users enthralled with their newfound access to cultural productivity and many creative people bereft over the market devaluation of skill sets that not only pay the rent, but constitute how they make meaning in and through the world.
Large machine learning models are not programmed to solve a specific task. They are not hand-coded software. They’re not based on if-then logic. Modeled loosely after human brains, they’re trained inductively, built with parallelized hardware and advanced chips known as compute. Constructed from calculus, linear algebra, and statistics, these mathematical machines transform data into an internal knowledge infrastructure. They do not store the content they’re trained on. The models’ ability to reason, which is in part an emergent property, is still weak. However, the largest models, called foundation models, make inductive and abductive choices about patterns in training data, then apply what has been gleaned to new inputs. As of this writing, they cannot learn continually from experience, which would require extended memory and real-time retraining.8OpenAI’s GPT-4o does have the ability to remember interactions and learn from them. However, it is not retrained every time it learns. That is, its learning is not transformative to its structure. Transformer architecture is not limitless. It has a quadratic scaling problem: as the input grows, the attention computation grows with the square of its length, so longer inputs demand rapidly more resources.
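To see what that quadratic scaling means in practice, here is a back-of-the-envelope sketch: every token attends to every other token, so doubling the context quadruples the number of attention scores. The context lengths below are arbitrary choices of mine; real systems soften the cost with engineering tricks, but the baseline arithmetic holds.

```python
# Every token attends to every other token, so the number of pairwise attention
# scores per layer (and per attention head) grows with the square of the context length.
for context_length in [1_000, 2_000, 4_000, 8_000]:
    pairwise_scores = context_length ** 2
    print(f"{context_length:>6} tokens -> {pairwise_scores:>12,} attention scores")
```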
Large artificial neural networks are generative, but at the cost of consistency and intelligibility. As Remo Pareschi notes in his work on AI reasoning, getting the best output from an AI means collaborating with it. It doesn’t just do the thing.9Remo Pareschi, “Abductive reasoning with the GPT-4 language model: Case studies from criminal investigation, medical practice, scientific research,” Sistemi intelligenti 2 (August 2023): 435–44. Users have to explain what they want from the AI, talk to it like a coworker or employee. Outcomes are therefore syntheses of the mathematical model’s ability to assess patterns mixed with our own prompting abilities. This interactivity requirement may eventually downgrade the industrial value of programming languages. The goal, as stated by Nvidia CEO Jensen Huang, is that non-experts should be able to talk to computers using natural language prompts, not code.10Nandini Yadav, “Nvidia CEO Says Don’t Learn Coding because AI, Tech Giant Exec Says Jobs Will Be Hit,” India Today, March 1, 2024.
Engineers know how to build large neural networks, but knowing how to build things doesn’t translate to understanding them. The term black box is neither hype nor conspiracy. Any large neural net’s generative process is beyond the collective human capacity to calculate in real time.11Programmed computers also calculate faster than humans, but failure to execute our exact commands returns an error; causality is clear. Explainable AI (XAI) exists for certain tasks. For example, robot swarms are customizable. Liquid Neural Networks, which are improving vision for autonomous driving, have a smaller, interpretable architecture. However, for reasons that I hope this essay will ultimately make apparent, this level of control is currently not possible for generative AI. Such a lack of transparency, unnerving on its own, makes it difficult to trace not only causality in models, but also biases. Since the quantity of cleaned data affects quality, everything that can be scraped tends to go into foundation models: thoughtful analysis, public domain texts, news, recorded conversations, YouTube transcriptions, movie and television dialogue, literary fiction, debates within Marxist history, polemics against marginalization and oppression, love letters, free academic papers, screeds of all sorts, calls for genocide, and rationalizations for colonialism, racism, queer-bashing, transphobia, antisemitism, and Islamophobia.
For their own economic interests, tech firms have strengthened guardrails to prevent foundation models—pretrained AI models that can be adapted and fine-tuned for individual use cases—from inadvertently spewing antisocial content. But we don’t know how to replicate the guardrails or what inferences lurk below the interface on these closed-source networks. Nor does open source AI development solve ethical problems. The democratization of AI doesn’t preclude the creation and deployment of bespoke, toxic models.12For example, small open-source models offer a route towards intentional bias insertion, automated hacking, and nonconsensual deepfake and deepnude engineering.
While many on the broad left are scrambling to find ways to stop AI models from scraping their content, and while European centrists are passing weak legislative seawalls to ward against the tsunami of disinformation and misuse, the far-right is rushing to load the Internet with prospective training data in hopes of shifting what they call “woke corporate models.” Whether any of these gambits could work is anyone’s guess.
A Snapshot of AI History
AI was not dreamed into existence by celebrity tech moguls in underground mansion bunkers. It is born of human intelligence, the result of the collective labor power of millions of workers across a century and around the globe: computer scientists, roboticists, software engineers, mathematicians, geoscientists, ocean cable technicians, underground miners, landfill workers, equipment installers, teachers, data annotators, semiconductor processors, and the millions of paid and unpaid carers who maintain them.13Though pre-dating contemporary generative models, Kate Crawford’s book on the subject is excellent. Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (New Haven: Yale University Press, 2021). Making celebrity CEOs the protagonists of our story, even as villains, undermines labor’s contribution to computational engineering and obscures capitalist logics.
As industrial and imperial rivalries intensified during the early twentieth century, fantasies about a universal computing machine began to percolate. Human calculators gave way to mechanical calculators, which in turn were replaced with vacuum tube computers whose blinking on/off signals became mid-century Hollywood’s visual shorthand for techno-futuristic command and control. Transistors and then integrated circuits replaced vacuum tubes. Computers, once the size of whole buildings, became portable, and the transmissible data packets they processed became the stuff of the modern Internet. Computing, until then largely a military project, grounded the material infrastructure that enabled capital to push for anti-labor trade agreements, which globalized supply chains to exploit labor power at its cheapest, displacing local and regional networks of production, exchange, and consumption.
AI research emerged within this matrix.14Matteo Pasquinelli’s The Eye of the Master gives a detailed history of the development of artificial intelligence, arguing that AI is an imitation of collective human labor processes, and not of human brains. The Eye of the Master: A Social History of Artificial Intelligence (London: Verso Books, 2023). With consistent funding from the US Department of Defense, its early researchers fell into two camps: symbolic or connectionist. Adherents of the symbolic approach, also known as good old-fashioned AI (GOFAI), assumed humans ought to code deductive rationality into machines. They defined intelligence as the ability to manipulate logical symbols and form abstract theories about the world. Connectionists—the field’s eccentric fringe—argued instead that breakthroughs would emerge from building brain-like electronic networks. Machine learning, they held, should grow out of the mechanization of empirical sense-making. Machines would no more need to learn logic to become agentic than a child needs to learn topology to tie their shoes. Horrified, the symbolic camp expelled connectionists from funding streams and, by the 1970s, had written them out of AI history.
But, despite achievements in chess and logic, symbolic AI sharply plateaued. When GOFAI worked, it yielded precise, air-tight deductions, but its reach was limited: how could all human knowledge be encoded into symbols? And then there was the problem of robotic movement—Moravec’s Paradox. While machines might outperform humans at logic, programming them to complete simple physical tasks like twirling a pencil or lifting different types of objects was nearly impossible.
By the millennium, multicore processors and advanced graphics processing units provided the computational power neural networks required. As capitalists reorganized the Internet into platforms, the incomprehensibly vast data, now harvestable, transformed connectionist research. It turned out that machines didn’t require humans to code the meaning of the world into them. They just needed power and data.
Lots and lots and lots of data.
Quantitative to Qualitative Change: Data Relations and Data Production
From cave paintings to magnetic tape, humans have always stored information according to their material social needs. Punch card machines, invented at the close of the nineteenth century, tabulated data for war and colonial conquest as well as for labor management, supply chain control, and inventory tracking. Digital technologies were likewise organized through the lens of capital’s concerns: the acceleration of profit-extracting production and exchange.
Computation changed the way information moved during the last two decades of the twentieth century. In rooms full of clacking keyboards, data entry workers turned ink on paper into bits on a glowing glass screen. Corporate offices, once cluttered with overstuffed file cabinets, became aesthetically airy, now able to move records worldwide at the speed of light. This new era of data transfer helped capital pivot operations globally to wherever labor could be most cheaply exploited, while also facilitating lean production. Rapid information exchange also served to heal time gaps in what Harry Braverman describes as the severing of concept (intellect) and execution (action) in the labor process.15Harry Braverman, Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century (New York: NYU Press, 1998), 86. At home, Internet consumers could now relay messages to friends through data packets. On clunky cathode-ray tube monitors everywhere, pundits proclaimed we were entering a new age of openness and connectivity.
But the Internet, originally ARPANET, was not intended as a technology to reconnect with childhood friends. It was a US Department of Defense Cold War hub for maintaining communication during nuclear threats. In the 1970s, it expanded internationally and, by the 1980s, tech standardization allowed it to become a resource for thousands of academics and journalists. Soon, the World Wide Web burst onto the scene capturing nearly ten million users, producing an information explosion.
Our relationship to data was fundamentally different during the web’s early years: no endless scroll, no reason to capture and hold the user’s gaze, no web search ranking beyond keyword counts.16 The number of times a keyword appeared in a website determined its ranking until Google’s founders built a contextual model. Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks and ISDN Systems 30, no. 1–7 (1998): 107–117.
The operative metaphor for data collection was the signal-to-noise ratio. Data harvesting may have been a boon for the CIA after 9/11, but making it profitable was messy. You had to comb through heaps of metadata—details like login times, browsing duration, and clickstream paths—to find the important stuff. Users floated over the Internet, ghostlike, anonymous, even designing peer-to-peer networks to evade centralized servers where data might be collected. Early netizens might have worried about hackers revealing their deep dark secrets, but few worried about third-party entities selling their clicking patterns.
Then platform capital lured users with the promise of frictionless social connection. In exchange, the platforms logged interpersonal networks. Originally, Google hadn’t thought to mine individual user data. But after the dot-com bust, with no products to sell, they pivoted to monetizing metadata and search content. In 2003, they filed a patent for personal data harvesting files. In 2006, they bought the supposedly unmonetizable YouTube, scanned one-third of all published texts into Google Books, and began to photograph streets and roads to build their rudimentary world simulator Google Earth.17Google’s role in the rise of data capture is outlined in Shoshana Zuboff’s The Age of Surveillance Capitalism. Shoshana Zuboff and Karin Schwandt, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power (London: Profile Books, 2019). The problem of signal-to-noise ratio was solved: all data counted as a type of signal.
Early machine learning techniques helped platform capital decode us. The rise of social media and the global market penetration of smartphones ushered in an era defined by data analytics rather than data transfer. Suddenly, data brokers could make a fortune selling intel about individual psychologies to marketers and states. Police began using social media metadata—connections between accounts, likes, shares, and even the timing of posts—to map networks of activists and monitor protest potential, a seed project that quickly bloomed into predictive policing.18Andrew Guthrie Ferguson, The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement (New York: New York University Press, 2017).
In the 2010s, the connectionist camp harnessed this quantitative data explosion to build something qualitatively new. In 2009, Fei-Fei Li presented ImageNet, a paper detailing how she and her team hired thousands of low-paid data labelers from the gig platform Mechanical Turk to help organize a constellation of images.19Jia Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), 248–55, doi: 10.1109/CVPR.2009.5206848. In 2012, AlexNet, a deep convolutional network, was trained on Li’s data set.20Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM 60, no. 6 (2017): 84–90, doi.org/10.1145/3065386. In 2013, Word2Vec made word embeddings—mathematical representations of word meaning—practical at scale.21Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv, 2013. Soon, artificial neural nets were assessing sentiment in online posts, moderating content, organizing crowd control, scanning faces, determining credit scores, pricing real estate, vetting job applications, and helping sway elections.22Joy Buolamwini and Timnit Gebru’s work provides a history of discrimination against Black faces in scanning technology. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” in Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81 (2018): 77–91. Also see Joy Buolamwini, Unmasking AI: My Mission to Protect What Is Human in a World of Machines (New York: Random House, 2023). Artificial neural networks had reshaped the world into what Frank Pasquale calls the black box society, where machines make inscrutable calculations, pass their judgements onto humans, the results are accepted, and no one knows exactly why.23Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (Cambridge: Harvard University Press, 2015).
Of course, data transfer and data analytics are still ongoing. However, those pivotal moments in our social history are now materially fueling new technological projects. Our data is not just fodder for machines to analyze; it is a precondition for the machine’s existence. If the ethos of the data transfer era was seamless communication and the ethos of the data analytics era was surveillant control, how do we decipher the meaning of a world enthralled with the possibility of letting the steering wheel steer itself?
How Generative AI Works
As AI realists, we have to do some work to get a hairsbreadth grasp on these machines. Most importantly, it must be underscored that neural networks are not algorithms. Algorithms are used to train models, but the models themselves are not algorithms. So, then, what are they? Artificial neural models consist of nodes that hold numbers and are interconnected through a tunable network. This sounds complicated, but it can be understood more intuitively through an example. If we wanted to calculate the likelihood of a strike, we could weigh two important facts—say, the outcomes of two informal polls on a strike authorization vote, which in the example below I’ve called Vote Type A and Vote Type B. Each input number represents whether that vote carried. Weights are simply how much importance we give each input, on a scale from 0 to 1, and in this example we have set them ourselves according to what we consider meaningful. The example also includes something called the bias. Bias here doesn’t mean discrimination. In artificial networks, every neuron outside the input layer carries an extra tunable number added to its weighted sum, nudging it toward activating more or less readily; this is called the bias.
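Here is a minimal sketch of that strike-likelihood example written out as arithmetic. The specific numbers (poll outcomes, weights, bias) are illustrative choices of mine rather than values from the original figure; the point is only that a "neuron" is a weighted sum plus a bias, squashed into a likelihood.

```python
# Toy single-neuron "strike likelihood" calculator.
# Inputs: did informal strike-authorization polls A and B carry? (1 = yes, 0 = no)
# Weights: how much we trust each poll, on a 0-to-1 scale (set by hand here).
# Bias: a constant nudge added to the weighted sum before the squashing step.
import math

vote_a, vote_b = 1, 0          # poll A carried, poll B did not (illustrative)
weight_a, weight_b = 0.8, 0.4  # we consider poll A more meaningful
bias = -0.2                    # a slight baseline pessimism

weighted_sum = vote_a * weight_a + vote_b * weight_b + bias
likelihood = 1 / (1 + math.exp(-weighted_sum))  # sigmoid squashes the sum into a 0-1 likelihood
print(f"Estimated strike likelihood: {likelihood:.2f}")
```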
We might be able to calculate the above model in our heads, but with more data the math becomes less manageable. Let’s imagine an architecture where we have three polls for input data and four distinct outputs: strike, no strike, bosses settle before strike, and company dissolves before strike.
Now we have a small prediction engine, but it’s weak. We need more data and middle (hidden) layers in our architecture to handle the complexity.
Our model now has 55 parameters and seems pretty complex.24This model has fifty-five parameters because it has forty-three weighted connections and twelve biases. A bias accompanies every neuron in the middle layers and in the output layer. The input layer does not have a bias. For comparison, Yann LeCun’s classic handwriting recognition model from 1998 has seven layers and around 60,000 parameters.25Yann LeCun et al., “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE 86, no. 11 (1998): 2278–2324. GPT-4 is reported to have 1.76 trillion parameters, a complex self-attention transformer architecture, and adaptive processing, meaning that the model changes the way it processes based on the characteristics of queries.
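For readers who like to count, the bookkeeping works like this: each connection between layers carries a weight, and every neuron outside the input layer carries a bias. The layer sizes below are my own illustrative choice, since the diagram is not reproduced here, so the total differs from the fifty-five parameters above.

```python
# Counting parameters in a small fully connected network.
layer_sizes = [3, 8, 4]  # 3 poll inputs, one hidden layer of 8, 4 outcomes (illustrative sizes)

weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))  # 3*8 + 8*4 = 56 connections
biases = sum(layer_sizes[1:])                                       # 8 + 4   = 12 biases
print(weights + biases)  # 68 parameters for this particular layout
```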
Before transformers, neural networks processed tokens largely in sequence, with distant context fading as it passed along the chain. With transformers, data passes through attention blocks that share information laterally between tokens before passing it forward to the next layer. An attention block can use one vector to adjust the meaning of another, even if the relevant information sits at the far end of a long passage. Some layers might focus on grammar, others on semantics, others on tone. But as for how it works in practice, all we have are educated guesses. The model itself determines how attention blocks change content and on what grounds.
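For readers who want to see the arithmetic, below is a minimal sketch of the scaled dot-product attention introduced in the 2017 paper cited above. The toy sizes and random vectors are mine; real models use learned projection matrices and run many attention heads in parallel.

```python
# A minimal sketch of scaled dot-product attention (Vaswani et al., 2017):
# each token's vector is compared with every other token's vector, and the
# resulting scores decide how much of each token's information gets blended in.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 5, 16                      # 5 tokens, 16-dimensional vectors (toy sizes)
x = rng.normal(size=(seq_len, dim))       # stand-in token embeddings

W_q, W_k, W_v = (rng.normal(size=(dim, dim)) for _ in range(3))  # learned in a real model
queries, keys, values = x @ W_q, x @ W_k, x @ W_v

scores = queries @ keys.T / np.sqrt(dim)                                # pairwise relevance scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # softmax over each row
attended = weights @ values                                             # each token becomes a weighted blend
print(attended.shape)  # (5, 16): same shape as the input, now context-aware
```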
Attention blocks are not limited to language. OpenAI’s Sora produces moving images using visual patches rather than generating sentences from language tokens.26OpenAI released its Sora explainer on February 15, 2024, called “Video Generation Models as World Simulators.”
But this only describes what transformer models do once built. To understand how they come to do it, we need to examine the training process.
Let’s say that we want to build a foundation model from scratch—because we have tens of millions of dollars lying around to buy the required high-end chips and pay the water and electric bills.27Pengfei Li et al., “Making AI Less ‘Thirsty’: Uncovering and Addressing the Secret Water Footprint of AI Models,” arXiv, 2023. What would be required? Our first step would be to access cleaned data sets or to clean our own. The data-cleaning industry once fully relied on low-paid human annotators, but now many sets, like the Colossal Clean Crawled Corpus, are public domain. One unique aspect of data sets as a commodity is that they can be consumed without being exhausted, meaning that clean data sets can be used in perpetuity. Though data annotation continues to be a gig for human workers, significant elements of the cleaning process have been outsourced to AI models. One project, Dynosaur, has leveraged LLMs to reduce the cost of producing 800,000 instruction-tuning samples to just $12 USD.28Da Yin et al., “Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation,” arXiv, 2023.
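As a caricature of what "cleaning" means at its crudest, the sketch below drops duplicates and fragments too short to be useful; real pipelines, like the one behind the Colossal Clean Crawled Corpus, layer on language detection, boilerplate stripping, and toxicity filters. The sample documents are invented.

```python
# A toy data-cleaning pass: drop exact duplicates and very short fragments.
raw_documents = [
    "Workers at the plant voted to authorize a strike.",
    "Workers at the plant voted to authorize a strike.",  # exact duplicate
    "click here",                                         # too short / boilerplate
    "The contract expires at midnight on Friday.",
]

seen = set()
cleaned = []
for doc in raw_documents:
    text = doc.strip()
    if len(text.split()) < 5 or text in seen:  # crude length and duplicate filters
        continue
    seen.add(text)
    cleaned.append(text)

print(cleaned)  # two documents survive
```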
The amount of available cleaned data illustrates why poisoning algorithms such as Nightshade, which hope to corrupt image training models that use scraped content, are a weaker tool than they seem at first glance. Nightshade adds a drop of distorted data into a sea of sufficiently clean information.
But even clean data sets, below the surface, are murky to humans. They appear as scrambled clusters of uncontested facts broken into tokens and eventually into numbers. Time-crunched low-paid gig workers, machine learning engineers, and pre-processing AIs can’t correct factual or contextual errors in the same way a domain expert would. So who resolves these contradictions? While high-stakes data sets are reviewed by humans, lower stakes data sets are often resolved by the machines themselves without explanatory notes.
Now back to our model. Since we’re imagining that we’re building a foundation model, which, to be robust and complex, requires as much data as possible, it would be reasonable to use any clean data we could get our hands on. We cannot easily evaluate billions of tokenized, scrambled, decontextualized inputs for inaccuracies or discriminatory content. Next, we will organize the model architecture and set the weights and biases at random.29This is the part where machine learning engineers have control. As engineers, we will decide the model’s learning rate, the batch sizes (how much it learns at a time), number of epochs (how many times it goes through all the data), the number of layers in a neural network, and dropout rate (forcing the network not to over-rely on particular neurons). We cannot preload our assumptions into the weights like in the earlier example where we tried to predict our strike. The machine itself determines how to weigh the value of its connections. The bitter lesson has been that giving models helpful human hints during training counterintuitively slows their ability to reach benchmarks.30Richard Sutton, “The Bitter Lesson,” In the Loop, March 13, 2019. (And, per usual, no one knows why.) OpenAI’s Sora model illustrates how raw quantity yields quality. Its puppy videos didn’t improve because developers explained puppies to it or even because more puppy videos were added during training. More accurate puppies emerged solely from increases in compute power.31See OpenAI’s technical explainer for Sora called “Video Generation Models as World Simulators.”
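To make that division of labor concrete, here is roughly what the engineer's side of the bargain looks like: a handful of hand-set knobs, with everything else (the weights and biases themselves) left to training. The names and values below are illustrative, not a recipe.

```python
# The handful of training knobs engineers actually set by hand (illustrative values):
training_config = {
    "learning_rate": 3e-4,   # how big each adjustment step is
    "batch_size": 64,        # how many examples the model sees before each adjustment
    "epochs": 3,             # how many passes through the full data set
    "num_layers": 24,        # depth of the network
    "dropout_rate": 0.1,     # fraction of neurons randomly silenced so none is over-relied on
}
# Everything else, the weights and biases themselves, starts random
# and is set by the training process, not by the engineer.
```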
Okay, now that we’ve chosen our model architecture, we’re ready to load our data—in scrambled batches to ensure randomization. The model calculates the values for each layer and feeds them forward. As the model engages the data, it determines patterns and relationships at a calculation rate inherently unfathomable to human engineers while simultaneously adjusting its own settings using algorithms to reorganize its initially random parts into a working system.32For example, a model the size of GPT-4 makes an estimated ~1.7 quadrillion calculations per 0.1 sec. Once the forward pass of data is complete, a phase called backpropagation calculates error gradients moving backward from the final layer to the beginning. Then, using a technique called gradient descent, the model readjusts its weights based on this new information. When loss (error) settles toward a low point on the curve, the model is likely approaching a good set of parameters. This is what learning looks like for generative AI. We can visualize it as water droplets running downhill toward the deepest nearby basins.
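The whole loop can be caricatured in a few lines: a forward pass, a measure of error, a gradient, and a small step downhill, repeated until the weight settles near a good value. The one-parameter model below is mine and is nothing like a foundation model, but the principle (gradient descent on a loss) is the same.

```python
# A minimal sketch of gradient descent on a one-parameter model (y = w * x):
# forward pass, measure error, compute the gradient, step "downhill," repeat.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data where the "right" weight is 2.0
w = 0.1                # start from a (nearly) random weight
learning_rate = 0.05

for step in range(50):
    grad = 0.0
    for x, y in data:
        prediction = w * x        # forward pass
        error = prediction - y    # how wrong we are
        grad += 2 * error * x     # gradient of squared error with respect to w
    w -= learning_rate * grad / len(data)  # gradient descent: step toward lower loss

print(round(w, 3))  # ~2.0: the weight has "rolled downhill" to a good value
```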
We stop training when our model hits a plateau or reaches a benchmark. We might even leave the model a little underbaked to avoid overfitting. Overfitting is when a model organizes itself to fit the training data so well that it stops being able to properly assess new data because it clings too tightly to the original patterns. Large language models aren’t designed to directly reproduce items in their training sets. Artificial neural nets do not collage information. AI models are not intelligent hard drives pulling stored content from themselves.
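Overfitting shows up in practice as a gap between error on the training data and error on held-out data, and leaving the model "a little underbaked" usually means early stopping: halting when the held-out error stops improving. The loss numbers below are invented simply to show the shape of the decision.

```python
# Early stopping in caricature: keep training while held-out (validation) loss improves,
# stop once it starts rising even though training loss keeps falling -- the signature of overfitting.
train_loss = [2.1, 1.4, 0.9, 0.6, 0.4, 0.3, 0.2]  # illustrative numbers
val_loss   = [2.2, 1.5, 1.1, 0.9, 1.0, 1.2, 1.5]

best_epoch = val_loss.index(min(val_loss))
print(f"Stop after epoch {best_epoch}: validation loss starts climbing afterward.")
```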
Now that our model is trained, it will go through a phase called inference, an entirely separate testing process with new data sets. The inference phase is designed to test our new model for accuracy. After inference, we would subject it to safety tests (red teaming) and allow human domain experts to evaluate and perhaps attempt to tweak its responses.
Okay! Now we have…wait, what exactly do we have? What we have is a next-token predictor. If we build a language model, we have a text predictor. If it’s an image model, we have an image predictor. If it’s a moving image model, we have a visual motion predictor. With machine learning, generation is prediction. So, did we go through all that work just to build spicy autocorrect? Well, it depends on what we mean by just and spicy. For example, if I ask my cell phone to complete “my cat went…,” it quickly collapses into gibberish:
My cat went to the hospital and I was wondering if you could get a ride home from work and when you get home I can get you a ride home and…
While GPT-4 yields:
My cat went to explore the mysterious corners of the old attic, where she discovered a hidden world of forgotten toys and sunbeams dancing through dusty windows.
Why is there a difference? How can an LLM calculate the meaning of words like cat or sunbeam, let alone tune parameters to assess how cats relate to sunbeams?
The first difference is that large models have longer context windows than autocorrect. A context window is how much content a transformer can synthesize before making a prediction. GPT-3 has a context size of 2048 tokens. GPT-4 Turbo can process 128,000 tokens or about 200 single-spaced pages. Anthropic’s customers can buy context windows of up to a million tokens, which means the model can analyze around 2,000 pages of text before predicting the next word. Google has just announced a new architecture called infini-attention that gives models a theoretically infinite context window by creating a working memory within it.33Tsendsuren Munkhdalai, Manaal Faruqui, and Siddharth Gopal, “Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention,” arXiv, 2024. But context length doesn’t explain how a model knows that sunbeams dance through dusty windows. Is that just a quoted fragment pulled from its training data mashed together with another quote? It is easy—and comforting—to imagine that models simply replicate and reassemble snippets from their training data. But, as illustrated, that’s not how properly functioning models generate responses.
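In practice, a context window is just a budget: whatever does not fit is truncated (or summarized) before the model predicts. A minimal sketch, with window sizes taken from the figures above and a stand-in "conversation" of my own:

```python
# What a context window means in practice: the model can only "see" a fixed budget
# of tokens at once, so older material is dropped before the next prediction.
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit within the model's context budget."""
    return tokens[-context_window:]

conversation = list(range(5000))                   # stand-in for 5,000 tokens of chat history
print(len(fit_to_context(conversation, 2048)))     # a GPT-3-sized window: only 2,048 tokens survive
print(len(fit_to_context(conversation, 128_000)))  # a GPT-4 Turbo-sized window: all 5,000 fit
```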
The second difference is embeddings.
Humans use symbols to represent phenomena: letters to represent a cat (C-A-T), for example. Machines use numbers. In language models, each word is a set of coordinates mapped to other coordinates in multidimensional concept space.34Word and grammatical tokens, to be more accurate. These tangles of vectors, called embeddings, allow the model to discern relationships between concepts as if they were spatial distances. The point for cat, for instance, would be closer to rat than to fog. Formed during training, embeddings can be imagined as multidimensional sculptures that capture relationships and complexity.35Embeddings can also be transferred. The fact that they are relatively static means that we can analyze them to understand how they work, to see how the models make sense of information. Researchers at Anthropic are at the very beginnings of finding structural meaning in their embedding models. See Adly Templeton and Tom Conerly et al., “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet,” Anthropic, May 21, 2024. The number of embeddings used by GPT-4 is secret, but we can assume the model is more advanced than GPT-3, which had a vocabulary of 50,000 embeddings each with 12,288 dimensions of meaning.
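A toy version of the geometry: if we invent three-dimensional coordinates for cat, rat, and fog, we can measure their relatedness as the angle between vectors. Real embeddings have thousands of learned dimensions and far richer structure, but the comparison works the same way.

```python
# Embeddings in caricature: words as coordinates, relatedness as the angle between them.
# These 3-dimensional vectors are invented for illustration; real models learn them.
import numpy as np

embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "rat": np.array([0.8, 0.7, 0.2]),
    "fog": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine_similarity(embeddings["cat"], embeddings["rat"]), 2))  # high: near neighbors
print(round(cosine_similarity(embeddings["cat"], embeddings["fog"]), 2))  # low: distant concepts
```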
Neural networks can be trained on anything patterned: language, images, videos, music, brainwaves, code, engineering blueprints, motion.36Thanks to my “informant,” Mr. J, who works at an engineering firm and informed me in early 2022 that a well-known computer-aided design and engineering software company was requiring its clients to move all their data to its platform, including blueprints and customer service exchanges. Our speculation that the goal of this capture might be the creation of an AI-generated engineering model seems increasingly plausible. Due to successful training on the latter, roboticists are coming closer to solving the problem of fluid machine movement. The result: artificial intelligence is stepping out of the two-dimensional frame and into the world.
Physical AI Agents
Shakey, the first mobile robot able to reason about its own actions, resembled a wobbly filing cart on wheels. Built at the Stanford Research Institute beginning in 1966, it served for half a century as the model for realistic robotic horizons. Machines might surpass humans at calculation, but movement, philosophers argued, required a world awareness, an embodied intelligence that machines could never have.37Hubert L. Dreyfus, What Computers Can’t Do: A Critique of Artificial Reason (New York: Harper & Row, 1972); Hubert L. Dreyfus, What Computers Still Can’t Do: A Critique of Artificial Reason (Cambridge, MA: MIT Press, 1992); Hubert L. Dreyfus, “Intelligence Without Representation,” in Steven Harnad, ed., Artificial Intelligence: The Case Against Cognitive Science (Westport, CT: Ablex Publishing, 1993), 377–411. This physical self-knowledge is so natural to humans that capital has long degraded it as unskilled.38As pointed out in Pasquinelli, The Eye of the Master.
Though reinforcement learning (RL) is an older machine learning technique based on behaviorist theories of operant conditioning, pairing it with deep neural networks has enhanced its potential. Known as deep RL, the combination enables robots to learn in unstructured environments with unforeseen objects. Virtual reality is now part of its training gym. Within Nvidia’s Omniverse, hundreds of versions of the same robot can simultaneously build skill sets in challenging virtual landscapes using different trial-and-error methods.39Screenshots from the virtual gyms give a good sense of the scope of the project, as in NVIDIA’s Omniverse Isaac Gym. The real-world robot then acquires this information via transfer learning so it can continue its training in the physical world.
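The trial-and-error principle itself is simple enough to sketch, even though the robotics stack built on top of it is not. Below, an "agent" tries two invented gripping actions, receives invented rewards, and updates its estimate of which action works; this is tabular Q-learning in miniature, a far cry from Nvidia's pipeline but the same underlying logic.

```python
# Reinforcement learning in caricature: try actions, collect rewards,
# and nudge the value estimate for each action toward what was observed.
import random

actions = ["grip_soft", "grip_hard"]
reward_for = {"grip_soft": 1.0, "grip_hard": -1.0}  # invented: soft grip succeeds, hard grip drops the part
q_values = {a: 0.0 for a in actions}                # the agent's running estimate of each action's value
learning_rate, exploration = 0.1, 0.2

random.seed(0)
for trial in range(500):
    # Mostly pick the best-known action, but sometimes explore a random one.
    if random.random() < exploration:
        action = random.choice(actions)
    else:
        action = max(q_values, key=q_values.get)
    reward = reward_for[action]
    q_values[action] += learning_rate * (reward - q_values[action])  # nudge estimate toward the reward

print(q_values)  # the agent has learned that the soft grip is the better policy
```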
This is not theoretical or merely on the horizon. Virtually trained robotic AI is in its industrial piloting phase. New, cloud-based robotics-as-a-service models are allowing companies to run robot fleets 24/7. Agility Robotics’ Digit is loading boxes at Amazon’s Seattle facility.40Agility Robotics, “Agility Robotics Broadens Relationship with Amazon,” October 24, 2023. That’s in addition to Amazon’s existing non-humanoid swarms. Apptronik just signed a deal with Mercedes-Benz.41Apptronik, “Apptronik and Mercedes-Benz Enter Commercial Agreement,” March 15, 2024. BMW signed with Figure, which uses OpenAI’s GPT model to converse with users.42Brian Heater, “BMW Will Deploy Figure’s Humanoid Robot at South Carolina Plant,” TechCrunch, January 18, 2024. In Canada, auto parts company Magna is using Sanctuary AI’s humanoid, Phoenix. Boston Dynamics is claiming that its new Atlas, now electric, will be ready for industrial deployment in a few years.43Aleksandra Sagan, “Magna to Pilot Sanctuary AI Humanoid Robots,” The Logic, April 11, 2024. But robot timelines may have just gotten a boost: Nvidia has launched a robotics foundation model, GR00T, alongside its Blackwell chip, which is five times faster than the compute that current models use.44 NVIDIA, “NVIDIA Blackwell Platform Arrives to Power a New Era of Computing,” March 18, 2024. See also NVIDIA, “NVIDIA Project GR00T: Generalist Robot 00 Technology.”
Digital transfer in the twentieth century involved moving printed content into digital space. Now digitization involves whole persons. Meta’s Ego-Exo4D project, for instance, uses augmented reality to sync participants’ first-person perspective to external third-person cameras, capturing their fine-motor skills from subjective and objective viewpoints: a bicycle mechanic fixes a wheel, a stylist snips hair between her fingers, someone plays piano, cooks, climbs a rock wall.45ego-exo4d-data.org. For full paper, see Kristen Grauman et al., “Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives,” arXiv, 2024. Medical prosthetics and remote-operated industrial robots similarly produce human behavioral data for machine learning. Humanoids aren’t the only robot style, of course. Four-legged models are already deployed for industrial purposes, as are autonomous vehicular machinery, drones, and AI-trained robot arms.46Boston Dynamics, Spot®. Autonomous drones are used in precision agriculture, crop monitoring, wind turbine safety, and environmental monitoring. Other deployed AI robotics projects include: Magnilearn (language education); AMP (robotic recycling sortation); LocusRobotics; AutoStore (warehouse automation); MisoRobotics (line cooking); PathAI (pathology labs); Waymo (automated taxi); LivePerson (customer service chatbots and sentiment analysis); Thedacare (assisting hospital staff); Epiroc (autonomous drilling rigs); FarmWise (weeding); Greenfield (weeding, planting); H2L (tulip field health); Solix (comprehensive farm-field care); Cat (automated self-driving mining trucks); Komatsu (automated drilling systems).
However, autonomous physical machines are still in their infancy. Unsurprisingly, caretaking and household reproductive labor are still weak points for industrial development.47Much digital ink has been spilt on the failure of Japanese care robots in the 2010s. However, future predictions regarding the viability of these robots should be modified given the subsequent development of generative AI (which the robots pre-date). There is a big difference between a robotic stuffed animal from 2010 and advanced voice conversationalists like GPT-4o. So is agentic AI, defined as autonomous AI that can draw conclusions about how to best complete tasks for users, although the new GPT-4o model’s multimodal fluidity is a step in the direction of agentic deployment. Foundation models, as of this writing, cannot reliably generate actions on behalf of users without complex sets of permissions.48However, humans have designed entire autonomous video game production companies using Auto-GPT, including automated CEOs. AIs can be combined for agentic action. For example, GPT can be set up to interact with Zapier in order to complete mundane tasks. Yet a frictionless, agentic AI product has not yet been launched. OpenAI has hinted that GPT-5 will be agentic. However, cybersecurity between proprietary platforms presents a challenge to capital beyond the scope of AI agent design. Transformer architecture is also a rough fit for building machines that could engage in autonomous exploration and a suboptimal fit for building self-driving cars, which may be better served by smaller, explainable liquid neural networks.49Transformer architecture creates inherently static models that require updating and retraining. It lacks built-in mechanisms for continuous spatial-temporal reasoning and the ability to update its own neural network in real-time. For an example of how liquid neural networks function, see point “6.2.1 Liquid neural networks” in Gayashan Porawagamage et al., “A Review of Machine Learning Applications in Power System Protection and Emergency Control: Opportunities, Challenges, and Future Directions,” Frontiers in Smart Grids 3 (2024).
Nor are current generative models artificial general intelligences (AGIs). Existing models have been largely trained to do one task well: language generation or image generation or brain wave interpretation or movement. Until recently, multimodal systems have mostly been separate specialist models stitched together—three AIs in a trench coat. However, OpenAI’s new GPT-4o and Google’s newest version of Gemini are natively multimodal, meaning that mixtures of data types such as text, images, video, and sound went into their training sets. This is why OpenAI and Google talk about steps toward general intelligence. Models are indeed becoming less narrow. But AGI is a slippery term with no fixed meaning. Some use it to refer to “AIs that would think like humans.” Others use it to insinuate sci-fi superintelligence. Others use it to suggest that machines can achieve self-consciousness.
But when capitalists say AGI, it means something different: OpenAI defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.”50 While this definition, part of OpenAI’s 2018 charter, has been scrubbed, it is still available at Anna Tong, Jeffrey Dastin, and Krystal Hu, “OpenAI Researchers Warned Board of AI Breakthrough Ahead of CEO Ouster, Sources Say,” Reuters, November 23, 2023 and “AGI Defined,” The Batch, January 10, 2024. Capital is not aiming to produce self-aware machines. Firms that give little thought to the rich internal lives of their meat-bag workers probably aren’t overly concerned about developing machines with deep thoughts. The hope seems to be for obedient, cost-cutting, productive things. And also for destructive things.
Algorithmic Killers and Deskillers
An algorithm is a step-by-step solution. Capitalism has always been algorithmic. The objectification of African peoples into slaves and the colonization of whole lifeworlds into exploitable resources would have been impossible without algorithmic protocols. For hundreds of years, capitalists have used scientific management to analyze workers’ behavior to maximize relative surplus value. The three volumes of Karl Marx’s Capital are, in a way, the first text dedicated to cracking the system’s core algorithms to reveal its hidden layers.
Harry Braverman’s Labor and Monopoly Capital extends Marx’s deskilling thesis, even to those tasked with conceptual labor.51Braverman, Labor and Monopoly Capital. The claim that intellectual laborers are part of a “professional managerial class,” and not workers makes it difficult to account for AI job displacement and its social effects. Furthermore, holding that the science and tech laborers who build AI are not workers precludes the claim that the workers who build machines (as opposed to the machines themselves) produce value. Consequently, proponents of this view must conclude that management produces commodities and surplus-value and, even more absurdly, that we no longer live under capitalism despite capital’s organization of our material conditions. Occam’s Razor suggests that intellectual labor is labor and that those who build the world are workers no matter which human body parts contribute to that construction. The crushing unemployment in the tech sector shows the precarity of intellectual labor even while the invention of sociable machines winks at the expendability of affective labor. Braverman’s antagonists argued that conceptual labor was immune to proletarianization. And yet here we are. The World Economic Forum has predicted that 42 percent of workplace tasks will be automated by 2027.52Saadia Zahidi et al., The Future of Jobs Report 2023 (World Economic Forum, April 30, 2023), 6. Matt Welsh, former Harvard professor of computer science, is telling potential junior software engineers that they will soon be replaceable at the cost of around $31 per year.53CS50, “Large Language Models and The End of Programming – CS50 Tech Talk with Dr. Matt Welsh,” YouTube video, 1:23:15. See also Jackie Davalos and Dina Bass, “Microsoft’s AI Copilot is Beginning to Automate the Coding Industry,” Bloomberg News, May 2, 2024. Many who remain employed will likely be subject to labor speed-ups and reskilling. Of course, new job categories will emerge. But unlike inventions such as steam, electricity, and digital computing, the entire hope for AI is that it will “outperform humans at most economically valuable work.” Human ability is being reframed as what machines can’t do yet, even though the source of machine ability is our art, our stories, our conversations, our engineering innovations, our habits, even our motor skills.
Proof-positive that conscious machines with rich interior lives are not the goal, AI fighter pilots beat human fighter pilots because they’ll risk self-destruction to win aerial dogfights.54Dave Hambling, “AI Thrashes Human Fighter Pilot 5-0 in Simulated F-16 Dogfights,” New Scientist, August 25, 2020; Tara Copp, “An AI-Controlled Fighter Jet Took the Air Force Leader for a Historic Ride. What That Means for War,” AP News, May 3, 2024. A new US Department of Defense initiative, Replicator, is transporting thousands of lethal autonomous drones to the Indo-Pacific to leverage against a potential invasion of Taiwan.55Defense Innovation Unit, “Replicator.” These drones are unlikely to operate through transformer-architected AIs and will likely use simpler, more interpretable AIs. Taiwan Semiconductor Manufacturing Company is the world’s dominant fabricator of the most advanced computer chips. The geopolitical calculations here are dizzying. If China and the United States were to engage over Taiwan, the military hardware itself would depend on Taiwan’s manufacturing infrastructure.
But it’s not just tension between the United States and China at play. War is increasingly waged like a video game. Ukraine is an ongoing testing ground for autonomous drone warfare. And when it comes to transformer models, AI’s black-box nature can even be part of its ideological charm: welcome to AI-washing. Israel’s The Gospel picks bombing targets in Gaza while a separate AI, Lavender, chooses which Palestinians to execute, cloaking genocide in technical rationality.56Yuval Abraham, “‘Lavender’: The AI Machine Directing Israel’s Bombing Spree in Gaza,” +972 Magazine, April 3, 2024. See Antony Loewenstein, The Palestine Laboratory: How Israel Exports the Technology of Occupation Around the World (London: Verso Books, 2023). Back in the United States, the Texas Department of Public Safety, the Department of Homeland Security, and the Los Angeles Police Department subscribe to—alongside tools from home-grown companies—a monitoring system from Cobwebs Technologies, an Israeli firm, originally trained on surveilling Palestinians’ social media and physical activities.57Cobwebs Technologies. “Cobwebs Technologies, an Israeli Firm Presents Its Anti-terror Tech to High-profile U.S. Delegation,” PR Newswire, July 10, 2019. Cobwebs Technologies was purchased by a US venture capital firm after the start of the bombing of Gaza in fall 2023. The name of the company reads like a bitter response to Hassan Nasrallah’s comment in 2000 that Israel is as broad, diaphanous, and fragile as a cobweb. The system continually collects online data from the public, monitoring discussions across communities, even following people onto the dark web. The technology enables police to geofence locations and create instant target cards for individual protesters, which include a profile photograph of them, their online browsing history, a brokered data profile, and real-time location tracking.
As mentioned earlier, the most popular position when it comes to AI is “up with the good stuff, down with the bad.” This common-sense intuition forgets that what counts as good and bad is an ongoing contestation, that we are the ethical agents, not the machines, and that we live inside an organizational logic where commodities are launched for profit, not for good. The AI safety movement talks about making sure machines understand human values, but in a world where the economic operating system actively devalues human flourishing and climate health, that directive seems like a blurry set of prompts. It’s hard to imagine the AI embedding vectors for “human values” as pointing to anything other than a set of vague niceties or a multidimensional wreck of contradictions.
Getting Real About AI
The internet boom…[might] end up looking a lot like the CB radio: initially a cult among specialists; a sudden, skyrocketing surge in popularity, and then, well…not much, really.58Oliver Burkeman, “Waiting for the Revolution,” The Guardian, December 4, 2000.
Do our computer pundits lack all common sense? The truth is no online database will replace your daily newspaper.… [Negroponte] predicts we’ll buy books and newspapers straight over the Internet. Uh, sure.59Clifford Stoll, “Why the Web Won’t Be Nirvana,” Newsweek, February 26, 1995. Stoll’s skepticism in this short quote is directed at Nicholas Negroponte, whose predictions about computers with cameras on them, buttonless cellphone screens, and Internet-enabled cars seemed far-fetched at the time.
Given the overwhelming hype, many in the 1990s and early 2000s argued that the Internet was little more than a passing fad for tech enthusiasts in wealthy nations. But excitement levels don’t change material logic.60Hype and disillusionment are accepted features of the capitalist development cycle and rigorously studied by investors. See Gartner’s many analyses of hype cycles. While some consumer commodities can survive on hype, if productive commodities don’t advance individual capitalists’ quest for profitability, they fail.61See Karl Marx, Capital, vol. 2, ch. 20, on distinctions between Department I and Department II, the organizational requirements for the means of production as compared to the means of consumption. There’s very little hype surrounding ocean Internet cables or industrial lubricants and quite a bit surrounding semiconductors at the moment, but capitalist infrastructure collapses if any of them disappear. As Søren Mau details in Mute Compulsion, economic power radiates through and is constitutive of capitalist society. It compels workers to sell their labor power. It compels capitalists to adopt productivity accelerants at the right moment or risk obsolescence.62Søren Mau, Mute Compulsion: A Marxist Theory of the Economic Power of Capital (London: Verso Books, 2023). The market is an algorithm of punishment and reward. But its algorithmic quality also implies an element of predictability, meaning we can cut through illusions—including our wildest hopes and deepest fears—to calculate its logical dependencies. So let’s make a few predictions based on what we now know.
It’s reasonable to assume that AI advancement will continue its steady acceleration. Even if venture capitalists were to zip their wallets, global defense budgets would almost certainly include AI research as a line item. Not that there’s any reason to believe capital has any intention to divest—quite the opposite. Public support for outlawing AI research seems unlikely given its proven ability to advance medicine. Since, as I’ve tried to establish here, the choices of capitalists, broadly speaking, aren’t inherently irrational or hype-dependent and their goals don’t require solving hard problems like “machine consciousness,” and since individual implementations of accelerated AI will produce profits for those who get to market first (and doom for those who fail to adopt AI wisely), we can assume that pressures on capital for AI development and adoption will continue. This will cause what Marx called the organic composition of capital to rise further, and employment will decline in many sectors, but at no point in the foreseeable future would it be machines all the way down.63My point that the economy won’t be machines all the way down isn’t because “capital needs it to be so or it would no longer be capitalism”—which is circular reasoning—but because of likely economic and social limitations. While 3D-printed buildings may be automated someday, it’s not likely that an AI plumber is going to come replace the busted pipe under the floorboards of older houses without human intervention. Nor is it likely that preschool teaching will be entirely ceded to AI. Nor will mental health crisis management and nursing. Nor will AI safety engineering.
Infrastructure collapse, however, is a genuine threat. AIs have bodies. Data centers alone account for an estimated 4 percent of the world’s carbon footprint. In fact, training AI models is so resource intensive that Microsoft is already signing contracts for nuclear power to support its AI goals. The higher organic composition of capital from increased machinery could potentially lead to lower-priced AI components, but struggle over conflict minerals, shortages in scientific and engineering labor, and even carefully organized targeted strikes could disrupt AI development. Since neural networks are statistically driven, they inherently contain margins of error, so it’s not absurd to speculate that error-prone models, architected for agency, could be capable of both preventing and causing disasters. While explainable and interpretable models function for some tasks, it’s hard to imagine generative models, particularly large language models, without black box components in their architecture, because language itself is fundamentally complex.
Artificial intelligence is a contested term. The word intelligence itself has a dark history and all too often ends up meaning the capacity for knowledge production in service of those who dominate. The artificial part of AI, however, is less contentious. But should it be? Machine learning models are made of mathematics—the web of existence—and models are designed to organize themselves via information about humanity and the natural world through algorithms that human math scholars invented. We may not all be model-building engineers, but our lives and bodies are the content that constructs the machines. AI reflects us. Just because something doesn’t directly crawl from the earth doesn’t make it alien. If AI is a form of collective general intelligence, and part of this collective intelligence yields the means to sustain life, why wouldn’t there be collective ownership of our data and our machines in the same way we assert the right to our bodies, our natural resources, and our cities?64Karl Marx, Grundrisse: Foundations of the Critique of Political Economy, trans., Martin Nicolaus (London: Penguin Books, 1973); Wendy H Wong, We, the Data: Human Rights in the Digital Age (Cambridge, MA: MIT Press, 2023).
Capitalism is an operating system. Operating systems can be changed. All we have is each other, the earth, the sky, our animal co-inhabitants, and these machines we’ve created. Humanity’s legacy should not be exploitation, the reduction of life into ingestible data sets, and the rationalization of genocide. Our current operating system has brought the world to its present state, but we are creative and resourceful beings. We can do better. And it is up to us. There is no technological solution because we are the origin of all technology. The future is our responsibility. It’s time to get real.