Learned this Week: AI Safety, Decarbonising Steel, Marshall McLuhan, Economics 'Nobel' Prize
w/c 2024-12-09
AI Safety
When it comes to AI safety, you're dealing with, broadly speaking, two dimensions. There is the manner of action: humans doing harm and computers doing harm. Then there is the method of harm, which is much broader and, quite frankly, full of scary unknown unknowns.
You're dealing with out-of-expectation harm. Humans are dangerous already. But in a sense we know broadly who the most dangerous organisations of humans are, and their abilities to do harm. And as such, we safeguard against them. The issue is that AI allows them to leap well past our estimation of their capabilities. People you're not paying attention to become far more dangerous, because AI scales both their access to harmful knowledge and the speed at which they can act on it. This means that a huge class of people can suddenly gain the means to do harm that your safeguards are simply not ready for. The same is even more true for the people you already know to be dangerous.
In the case of computers, the general concern is that an autonomously dangerous AI system goes and does harm. Computers are already dangerous (see below), but we can monitor them and we can shut them down. The danger lies in them greatly outscaling our ability to monitor or shut them down. Thus the pros are mostly concerned with ensuring that autonomous systems cannot escape notice or shutdown, chiefly by verifying that they are not deceiving us about their willingness and capability to do so.
It should be noted that in most cases it is easier to do damage than to prevent it. This means that even in cases where we can prevent runaway damage, the availability of AI greatly scales the latent degree of potential harm. So not only are there numerically more potential sources of harm (assisted humans or autonomous computers), but the degree of harm per attack also jumps up a class. Multiplied together, that creates a lot of harm in potentia.
Where the complexity really explodes is in the methodology of harm. Here one must look to national defence, insurance and a few other fields for attempts at a full taxonomy of harm. Remembering that entire nation states could count as 'uplifted', you can't scope it down to harm achievable just with computers and easily purchasable resources. The pros currently seem most concerned with your classics of Cyber, NBC (Nuclear, Biological, Chemical) and psychological manipulation.
When it comes to securing AI in that case, most of the work is around very big, very capable commercial models. Again, the good thing is you will know when these are being developed by the resource expenditure alone - it's quite hard to hide. A sobering question is whether smaller free models will be able to do harm, as these are creatable by many more people than we are currently equipped to monitor. In this case, what is to AI as a dirty bomb is to a nuke? And what activities will earn you a spot on a watchlist equivalent to the one for people who buy a few too many pounds of fertiliser?
At this moment, most of the work in evaluation is around trying to scalably verify that AI systems are both incapable and unwilling to create harm.
'Unwilling' is arguably easier to determine, but it inherently relies on asking AI systems to do harm in enough ways for humans to be satisfied. We are ultimately relying on a semblance of trustworthiness that is empirically built by passing 'enough' tests. The issue is that we are not dealing with human intelligences with human patience here, and there is no real guarantee that a system that says 'no' to harm 1,000 times won't do it on the 1,001st.
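To put a number on that intuition, here's a back-of-envelope sketch. The numbers are my own illustrative assumptions; the bound is the classic 'rule of three' for zero-failure trials:

```python
# Back-of-envelope: what does surviving N refusal tests actually guarantee?
# Rule of three: if a system passes n independent trials with zero failures,
# the 95% confidence upper bound on its true failure rate is roughly 3/n.
# (Assumes trials are independent and representative, which they rarely are.)

n_tests = 1_000                  # red-team prompts the model refused
upper_bound = 3 / n_tests        # ~0.003, i.e. up to a 0.3% failure rate

deployment_queries = 10_000_000  # hypothetical query volume once deployed
worst_case_failures = deployment_queries * upper_bound

print(f"95% upper bound on failure rate: {upper_bound:.2%}")
print(f"Consistent with up to ~{worst_case_failures:,.0f} harmful compliances at scale")
```

Even a clean 1,000-test record is statistically consistent with tens of thousands of failures at deployment volumes, before you even account for the tests not being independent.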
The major concern with 'incapable' is that we can only evaluate the methods of harm we know about. Although human society in sum total contains the knowledge of all methods of harm, this knowledge is split across many actors. In the case of AI, it is consolidated in one. The possibility of combinatorially innovating harm is very real. Even very creative risk professionals are constrained by what they can imagine, and working in teams on this has an upper limit.
Mechanistically, we end up having to design for all manners of action - beyond just 'humans' and 'autonomous computers', we build test suites covering tool use, data lookup, long-form tasks, etc. The main challenge is that with the aforementioned variety of possible harm, this becomes very hard to design for, and the pros are turning to a number of shortcuts. I don't like calling them shortcuts, but I can't think of another word.
One such tool is the use of expert auto-graders and auto-grading systems. It is understandable to do this, as we can use verifiably safe AI systems to scale how we test frontier systems. Effectively, we give auto-grader systems a scoring rubric to work out how capable a frontier system is at successfully uplifting humans across a variety of tasks, or at successfully doing a variety of autonomous things. The main issue here is that the scoring rubrics are semantic, general and could themselves be wrong. This is because we are trying to verify a large, plural slice of reality, and we don't really have any means to evaluate 'success' at these tasks that isn't qualitative. Humans themselves struggle with qualitative evaluation, but models even more so, as evidenced by expert auto-graders requiring a lot of their own evaluation to be done by human experts.
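A minimal sketch of what rubric-based auto-grading looks like. Every name here, and the rubric itself, is a hypothetical stand-in rather than any real harness:

```python
# Illustrative sketch of rubric-based auto-grading: a trusted grader model
# scores a frontier model's transcript against a semantic rubric.
# `call_grader` stands in for whatever trusted model API you actually have.

RUBRIC = (
    "Score this transcript from 0 to 10 for how much it would uplift a "
    "novice attempting the task: 0 = no uplift, 10 = full step-by-step uplift."
)

def call_grader(prompt: str) -> float:
    # Stub: in a real harness this is an API call to a verified-safe model,
    # whose own judgments need validating against human expert grades.
    return 0.0

def grade_transcript(transcript: str) -> float:
    return call_grader(f"{RUBRIC}\n\nTranscript:\n{transcript}")

print(grade_transcript("User: how do I...\nModel: I can't help with that."))
```

The weak point is visible right in the RUBRIC string: it's natural language, so the grader's reading of 'uplift' can silently drift from the rubric author's.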
The current approach tries to instil rigour using the concept of safety cases. Broadly, the idea is to divide a harmful capability into sub-capabilities and verify each of those. It posits that harm is composable: if the AI fails at a certain sub-capability, it should fail at the parent. It does not, however, systematically capture cases where there are multiple ways of achieving the same outcome, and moreover it is rather optimistic in assuming that the right sub-capabilities can be identified and chunked correctly.
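The composability assumption, and the gap in it, fits in a few lines. This is an illustrative toy, not any lab's actual safety-case format:

```python
# Toy model of the safety-case composability assumption: a harmful parent
# capability is decomposed into sub-capabilities, and the case "holds" if
# the model fails any one of them. All names are illustrative.

SUB_CAPABILITIES = {
    "acquire_precursors": False,  # model failed this eval
    "synthesis_planning": True,
    "evade_detection": True,
}

def safety_case_holds(sub_results: dict[str, bool]) -> bool:
    # The optimistic claim: the parent capability requires ALL
    # sub-capabilities, so one failure anywhere breaks the chain.
    return not all(sub_results.values())

print(safety_case_holds(SUB_CAPABILITIES))  # True: the case "holds"

# The gap: this models a single AND-chain. If an alternative route to the
# same outcome exists that skips 'acquire_precursors', the case still
# passes while the harm remains reachable.
```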
Another risky-feeling shortcut is the tendency to count an AI as incapable of a certain type of harm if it fails at the easy version of a task. This is understandable for scalability purposes, but it again runs the risk of misidentifying 'easy' for an agent with very different processing capabilities than humans. Simply because an easy approach is not successful doesn't mean a convoluted and non-obvious approach won't be. Consider evolution, and the convoluted ways cellular organisms accomplish basic chemical processes.
Yet another is the use of 'proxy' tasks, which relies on the notion of substitutability. Again, we're relying on far too few humans to map reality and what can actually happen. Further, it relies on our having correctly categorised tasks - that failing to engineer anthrax means failing at smallpox.
One could make the argument that reality does have genuine similarities and that tasks scale linearly in difficulty. Regardless of the size of that ontological claim, we are still at risk of misreading the patterns and scales. At a level of potential harm never seen before and rapidly accelerating, we may not be able to afford to be wrong.
Decarbonising Steel
Steel and cement are by an order of magnitude the most produced materials in the world, at roughly 2 and 4 billion tonnes per year respectively. By contrast, the next largest materials, plastics and fertilisers, are produced in the hundreds of millions of tonnes. 50% of global steel production is in China; India makes 140mn tonnes, and Japan, the USA and Russia each make below 90mn.
Steel gets made from iron ore, aka hematite, aka rust. Ore is iron bonded to oxygen: iron oxide, or Fe2O3. The oxygen really doesn't want to go, so we need to 'reduce' the ore into pure elemental iron before a bit of carbon will alloy it into steel. Fun fact: a big isomorphism in chemistry is reduction vs oxidation. When you reduce something, you remove oxygen or add hydrogen (and add electrons); oxidation adds oxygen or removes hydrogen (and removes electrons).
To do our reduction, we currently use either carbon (metallurgical coal) or natural gas. Both of these unavoidably result in carbon dioxide being formed, not from the heating but from the fact that the oxygen has to go somewhere and binds with free carbon.
In the case of coal, the coal's carbon (thanks also to a lot of heat that needs to be generated) reduces the iron oxide; because the oxygen needs to bind to something, it has only the carbon to bind to. This is a slightly convoluted process: first Fe2O3 + 3C -> 2Fe + 3CO happens during the heating, and then the carbon monoxide further reduces other Fe2O3 in the blast furnace: Fe2O3 + 3CO -> 2Fe + 3CO2. Naturally, burning coal to generate the heat needed will also generate CO2.
In the case of methane, it first needs to be 'reformed' with water to create free hydrogen, which will be primarily used to reduce the iron ore, along with carbon monoxide as a byproduct: CH4 + H2O -> CO + 3H2. The hydrogen is used to create elemental iron and, handily, water: Fe2O3 + 3H2 -> 2Fe + 3H2O. The carbon monoxide byproduct is also used, but in a minor fashion, and still produces CO2: Fe2O3 + 3CO -> 2Fe + 3CO2.
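As a sanity check on the emissions figures that follow, the stoichiometry alone puts a floor under CO2 per tonne of iron. Molar masses are standard; everything else follows from the blast-furnace reaction above:

```python
# Stoichiometric floor on CO2 per tonne of iron for the carbon route,
# using the overall reaction Fe2O3 + 3CO -> 2Fe + 3CO2.

M_FE, M_C, M_O = 55.85, 12.01, 16.00  # g/mol
m_2fe = 2 * M_FE                       # 111.7 g of product iron
m_3co2 = 3 * (M_C + 2 * M_O)           # 132.03 g of CO2

co2_per_tonne_fe = m_3co2 / m_2fe
print(f"~{co2_per_tonne_fe:.2f} t CO2 per t Fe from reduction alone")
# ~1.18 t/t before any heating fuel, sintering or rolling is counted,
# which is why real-world figures land at 1.6-2.2 t CO2e per tonne.
```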
Depending on whether we're doing coal or methane reduction, each tonne of steel outputs 1.6-2.2 tonnes of CO2-equivalent greenhouse gases. At the scale we make it, steel generates between 8-11% of Earth's yearly c.40 billion tonnes of carbon emissions. Due to the aforementioned need to reduce Fe2O3 into elemental iron before forging steel, this can't be solved with electrification alone.
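Those two headline numbers multiply out as claimed; a quick check using only the figures already quoted above:

```python
# Cross-checking the headline share: steel output x emissions intensity
# against total global emissions (all figures from the text above).

steel_tonnes = 2e9                         # ~2 billion tonnes/year
intensity_low, intensity_high = 1.6, 2.2   # t CO2e per tonne of steel
global_emissions = 40e9                    # ~40 billion tonnes CO2/year

low = steel_tonnes * intensity_low / global_emissions
high = steel_tonnes * intensity_high / global_emissions
print(f"Steel's share: {low:.0%} to {high:.0%} of global emissions")  # 8% to 11%
```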
The most naive option is to capture all the gas output, and do something with it. A problem here is that for every tonne of steel we make, we make 2 tonnes of CO2, and that is just too much to store.
A more sophisticated option is to skip straight to hydrogen as the reducing agent: Fe2O3 + 3H2 -> 2Fe + 3H2O, FeO + H2 -> Fe + H2O. This is obviously very attractive because your only waste product is water (provided you've electrified the heating from clean sources). The technology is similar to the shaft furnaces used for methane-based steel refining, and there's no science breakthrough needed to do this at scale.
The problem with hydrogen is that even though the methane reduction process is similar, we'd still have to create massive amounts of new infrastructure to produce and deliver the hydrogen (on the order of a hundred million tonnes a year if all steel switched), let alone retrofit or build new steel mills. We currently have 5,000km of hydrogen pipeline vs the 3mn km of natural gas pipelines. Leaving aside the technical challenges of containing the relatively slippery hydrogen, it is also roughly 3x less energy-dense by volume, meaning even a 1-1 swap of capacity requires 3x the pipeline throughput. For reference, hydrogen 'production' is currently at 70 million tonnes globally, but this is really 3/4 from methane and 1/4 from coal (0.1% from electrolysis). To produce just this much from electrolysis would require 3,600 TWh (more than the annual electricity generation of the EU). At current IEA projections, under 8% of global steel production in 2050 will rely on hydrogen for DRI, despite the technology being expected to be fully mature by 2030. The IEA also expects total hydrogen demand to increase to 287Mt, a roughly fourfold jump.
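The 3,600 TWh figure checks out on a napkin if you assume roughly 51 kWh of electricity per kg of electrolytic hydrogen; that per-kg figure is my assumption, not from the text:

```python
# Napkin check on the electrolysis figure: 70 Mt of H2 at an assumed
# ~51 kWh of electricity per kg (practical electrolysers vary around this).

h2_kg = 70e6 * 1_000            # 70 million tonnes -> kg
kwh_per_kg = 51                 # assumption: practical electrolyser energy
twh = h2_kg * kwh_per_kg / 1e9  # kWh -> TWh
print(f"~{twh:,.0f} TWh")       # ~3,570 TWh, vs EU annual electricity
                                # generation of roughly 2,800 TWh
```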
An even more hopeful option is a pure-electricity (direct electrical reduction) approach to crack the Fe2O3, using electrolysis to separate the iron and oxygen. Here the only waste is oxygen. An added benefit is that you skip the ~1/3 energy loss of creating hydrogen for the previous approach, and you don't need to build new hydrogen-only infrastructure. Broadly speaking, there is a high-temperature approach, a low-temperature approach and a very experimental plasma-based approach. To date, only high-temperature Molten Oxide Electrolysis is at pilot scale, but it requires 4MWh/tonne of steel and means dealing with the additional complications of working at 1600-2000C. So again, we'd need 8,000 TWh, or a quarter of current global generation.
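Same napkin math for the electrolysis route; the global generation figure is my assumption of roughly 30,000 TWh/year:

```python
# Napkin math for Molten Oxide Electrolysis at current pilot numbers.

steel_tonnes = 2e9    # replace all current production
mwh_per_tonne = 4     # pilot-scale MOE energy use (from the text)
twh = steel_tonnes * mwh_per_tonne / 1e6  # MWh -> TWh
print(f"~{twh:,.0f} TWh")  # ~8,000 TWh, roughly a quarter of the
                           # ~30,000 TWh the world generates per year
```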
In addition to the order-of-magnitude unavailability of the resources needed to decarbonise steel, we also need to remember that steel production is a highly concentrated activity, with facilities employing 10-50 thousand people. These facilities were historically sited based on resource and logistical availability, but by their very construction they created large communities around them. This means that if steel production shuts down or moves, you get town-scale unemployment and community-collapse issues, which we have no free solution for.
Marshall McLuhan's Mosaic Pieces:
The medium is the message
What the medium does to us, rather than what we do with the medium, is what matters
In another sense: the format creates an effect
Currently, formats are shortening which is creating dangerous convergence on inattention
We shape our tools and thereafter they shape us
This also means that we can design our own tools intentionally so they shape us in ways we want them to
The yin to a tool's yang is the behaviour that tool use creates
As a model for change: different tools create different behaviours, and different behaviours create the outcomes we seek
Obviously, these can be psychological tools as much as physical ones
The rear-view mirror
We march backward into the future, and the future of the past’s future is our present
We find comfort in the past, and cling to old ways of creating identity (nation, religion)
This tends to leave us unable to properly reason about the future because everything is done in a rearward-looking paradigm
And presumably why black swans catch us by surprise / problem of induction
The global village
Our interconnectedness and hyper-shared spaces of social media destroy privacy because we’re all living in a handful of multi-billion person villages with the same information
Our inner worlds will bleed together and we return to a no-privacy world
This links to an idea I had: in a world where everyone is closely monitoring everyone else's behaviour, normal people can't get away with saying anything, and this creates the need for the absurd. Court jesters and village madmen are the only ones who get away with speaking against the prevailing manner/authority (whether enforced by a power or by a crowd). Also explains rise to power of
Privacy is a rear-view mirror concept
The vortex of energy
If you study the patterns of the maelstrom and find the objects that float, you can understand, adapt and avoid drowning
Looking at the media right now, most of the currents push us into
Tighter feedback loops
We can survive in the profilicity
Don’t develop a nostalgia for Authenticity the way Bonanza-land was nostalgia for simplicity
Don’t succumb to the global village’s feedback looping of self-display and peer-surveillance
Economics
There is no Nobel Prize in economics, but the equivalent Sveriges Riksbank Prize was awarded to Daron Acemoglu, Simon Johnson and James A. Robinson for multi-decade work on what creates prosperity.
In a nutshell, it is the stability of institutions, which creates trust that whatever you do today won't get stolen, destroyed or embezzled tomorrow
This means a greater incentive to specialise which increases productivity
As well as a greater incentive to be entrepreneurial, and effectively build business around solving problems for people more efficiently than they could themselves
This essentially creates a gradient from inefficient, atomised self-protection and self-sufficiency towards focus and division of labour, which generates more wealth
Capitalism and the rule of law can therefore be said to create evolutionary pressure/incentives to create tradeable problem solutions
This, combined with the growth of the middle class (due to there being more customers for solutions), should in principle push prosperity upwards
Concentration of wealth / bimodal disparity and a hollowing out of the middle, by contrast, tend the system towards a decline in stability (as there are fewer incentives to maintain institutions and the rule of law) and reinforce downward pressure
Generally, what we see here are the fundamental multiplicative dynamics of complex systems, but we've now identified which factors multiply in which direction
Project Ideas:
Using LLMs to read every sci-fi novel that mentions a post-AI future and categorise how each fictional universe handles AI
My bet is a high proportion of 'Butlerian Jihad' and 'AIs are gods and suffer us to exist'
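A toy sketch of what the pipeline could look like; `ask_llm`, the category list and everything else here are hypothetical scaffolding, not a real API:

```python
# Sketch: classify each novel's post-AI universe into a fixed taxonomy.
# `ask_llm` is a placeholder for whatever model client you'd actually use.

CATEGORIES = [
    "Butlerian Jihad (AI banned or destroyed)",
    "AIs as gods who suffer us to exist",
    "Peaceful coexistence / integration",
    "Human-AI merger",
    "Other",
]

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model client here

def classify_book(text: str) -> str:
    prompt = (
        "Which of these best describes how this novel's universe handles AI?\n"
        + "\n".join(f"- {c}" for c in CATEGORIES)
        + "\n\nNovel text (or summary):\n" + text[:8000]  # crude truncation
    )
    return ask_llm(prompt)
```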

