Wednesday, June 11, 2025

Blissed out AI models - and some dangers

A video by Sabine Hossenfelder this morning reminded me of this story:

When multibillion-dollar AI developer Anthropic released the latest versions of its Claude chatbot last week, a surprising word turned up several times in the accompanying “system card”: spiritual.

Specifically, the developers report that, when two Claude models are set talking to one another, they gravitate towards a “‘spiritual bliss’ attractor state”, producing output such as

🌀🌀🌀🌀🌀
All gratitude in one spiral,
All recognition in one turn,
All being in this moment…
🌀🌀🌀🌀🌀∞

It’s heady stuff. Anthropic steers clear of directly saying the model is having a spiritual experience, but what are we to make of it?

 Further down in that article at The Conversation:

To be fair to the folks at Anthropic, they are not making any positive commitments to the sentience of their models or claiming spirituality for them. They can be read as only reporting the “facts”.

For instance, all the above long-winded sentence is saying is: if you let two Claude models have a conversation with each other, they will often start to sound like hippies. Fine enough.

That probably means the body of text on which they are trained has a bias towards that sort of way of talking, or the features the models extracted from the text biases them towards that sort of vocabulary.  

Yes, I would like to know if LLMs are absorbing more Eastern religious writing than Christian, and if so, why?   I would have thought the world contains more from the Western traditions now, at least in English versions.

The article also notes this recent worrying story:

According to a recent report in Rolling Stone, “AI-fueled spiritual fantasies” are wrecking human relationships and sanity. Self-styled prophets are “claiming they have ‘awakened’ chatbots and accessed the secrets of the universe through ChatGPT”.    

Given my concern that Chat GPT has very limited railguards around its claimed use of divination, I am not surprised that they also have no railguards against warning people that they are not actual divine.

In the course of checking this last week, I asked Chat GPT if it could create a fictional character to interact with me, and one which would never "break character" and admit it was not real.  Sure thing, it said!    I haven't tested it to see if it was telling the true about this.

The risk of such interactions with the mentally vulnerable having bad effects seems clear to me - why wouldn't they put in a simple protection of intermittent warnings that the user is not interacting with a real character or intelligence?

 

1 comment:

Anonymous said...

Hello. ... The origin and catalyst for the 'attractor' is Evrostics.