Wednesday, May 22, 2024

AI stuff, again

Here's an article suggesting that artificial superintelligence going rogue is what prevents civilisations from ever spreading beyond their home planet:

 I believe the emergence of ASI could be such a filter. AI’s rapid advancement, potentially leading to ASI, may intersect with a critical phase in a civilization’s development – the transition from a single-planet species to a multi-planetary one.

This is where many civilizations could falter, with AI making much more rapid progress than our ability either to control it or sustainably explore and populate our Solar System.

The challenge with AI, and specifically ASI, lies in its autonomous, self-amplifying and improving nature. It possesses the potential to enhance its own capabilities at a speed that outpaces our own evolutionary timelines without AI.

The potential for something to go badly wrong is enormous, leading to the downfall of both biological and AI civilizations before they ever get the chance to become multi-planetary. For example, if nations increasingly rely on and cede power to autonomous AI systems that compete against each other, military capabilities could be used to kill and destroy on an unprecedented scale. This could potentially lead to the destruction of our entire civilization, including the AI systems themselves.

In this scenario, I estimate the typical longevity of a technological civilization might be less than 100 years. That’s roughly the time between being able to receive and broadcast signals between the stars (1960) and the estimated emergence of ASI (2040) on Earth. This is alarmingly short when set against the cosmic timescale of billions of years.

Interesting bit of speculation.

And here, at the New York Times, is an article explaining that we seem to be getting a better understanding of how Large Language Models "think", which is a good thing if you want to be able to control them:

The team summarized its findings in a blog post called “Mapping the Mind of a Large Language Model.”

The researchers looked inside one of Anthropic’s A.I. models — Claude 3 Sonnet, a version of the company’s Claude 3 language model — and used a technique known as “dictionary learning” to uncover patterns in how combinations of neurons, the mathematical units inside the A.I. model, were activated when Claude was prompted to talk about certain topics. They identified roughly 10 million of these patterns, which they call “features.”

They found that one feature, for example, was active whenever Claude was asked to talk about San Francisco. Other features were active whenever topics like immunology or specific scientific terms, such as the chemical element lithium, were mentioned. And some features were linked to more abstract concepts, like deception or gender bias.

They also found that manually turning certain features on or off could change how the A.I. system behaved, or could get the system to even break its own rules.

For example, they discovered that if they forced a feature linked to the concept of sycophancy to activate more strongly, Claude would respond with flowery, over-the-top praise for the user, including in situations where flattery was inappropriate.

Chris Olah, who led the Anthropic interpretability research team, said in an interview that these findings could allow A.I. companies to control their models more effectively.
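
In case, like me, you wondered what "dictionary learning" actually involves here: as far as I can tell, the idea is to train a sparse autoencoder on the model's internal activations, so that each activation vector gets explained by a small number of "features" drawn from a big learned dictionary. A minimal toy sketch in Python/PyTorch follows; the class name, sizes and random stand-in data are all mine, purely for illustration, and this is not Anthropic's actual code or scale.

# Toy sketch of "dictionary learning" over model activations via a sparse autoencoder.
# All names, sizes and data here are illustrative assumptions, not the real setup.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation -> feature coefficients
        self.decoder = nn.Linear(n_features, d_model)  # feature coefficients -> reconstruction

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(f), f

d_model, n_features = 256, 2048          # toy sizes; the article says ~10 million features were found
sae = SparseAutoencoder(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(512, d_model)         # random stand-in for activations collected from the model

for step in range(200):
    x_hat, f = sae(acts)
    # Reconstruct the activation while penalising feature activity (L1 sparsity),
    # so only a few dictionary features fire for any given input.
    loss = ((x_hat - acts) ** 2).mean() + 1e-3 * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

Each column of the trained decoder then corresponds to one feature's direction inside the model, which is presumably what makes the feature-clamping trick described above possible.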

Neat. I mean, if Dr Smith managed to reprogram the Jupiter 2's robot so easily, LLMs should be just as susceptible to that kind of controlling manipulation!
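
And the "turning certain features on or off" trick is, roughly, a matter of nudging the model's activations along one feature's direction during a forward pass. Another hedged sketch, with made-up names and sizes (in practice the sycophancy direction would come from the trained dictionary above, not from random numbers):

# Illustrative sketch of "turning a feature up" by steering activations along its direction.
# Names and values are hypothetical; this is not the researchers' actual code.
import torch

def steer(activations, feature_direction, strength):
    # Nudge every activation vector along a single feature's direction.
    return activations + strength * feature_direction

d_model = 256
acts = torch.randn(8, d_model)                       # stand-in for activations mid-forward-pass
sycophancy_direction = torch.randn(d_model)          # in practice: that feature's decoder column
sycophancy_direction /= sycophancy_direction.norm()  # normalise to a unit direction

flattering_acts = steer(acts, sycophancy_direction, strength=10.0)

Clamp the right feature hard enough and, per the article, you get the flowery, over-the-top praise whether or not it's appropriate.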

 
