Welcome to the age of turbo maya-induced schizophrenia, sponsored by Golden Gate adtech
A couple of months ago you might have noticed Golden Gate Claude. Researchers found that the model appears to contain features that are highly abstract, multilingual, and multimodal. These features can be used to steer AI agents to behave in a certain manner. They also found a systematic relationship between the frequency of concepts and the dictionary size needed to resolve features for them. This post is a way to apply these techniques and better understand the future of these systems.
Let's begin with a simple question:
Are we devouring our own tail?
Imagine, if you will, a more conniving, unethical version of myself. This digital doppelganger isn’t content with using AI for the greater good. Oh no, they have far more sinister plans in mind.
Armed with this new knowledge, our nefarious alter ego sets out to manipulate these AI systems, weaving an intricate web of deception. Their goal? To bend the very fabric of artificial intelligence to their will, creating a puppet show where both the AI and unsuspecting users dance to their malevolent tune. They don’t recognize the irony of their actions, or their slow descent into a madness of their own making.
Before we imagine how they would do it, let us understand the basics.
Background
Steering Vectors
SAE
Linear representation
Superposition
Monosemanticity
Dictionary Learning
Experimentation
Steering vectors
https://vgel.me/posts/representation-engineering/
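To make the idea concrete, here is a minimal sketch of a steering vector on toy data. It is not the method from the linked post: it uses a simple difference of means between two sets of activations, and every name here (`steering_vector`, `steer`, `pos_acts`, `neg_acts`, `strength`) is a placeholder I made up for illustration. The "activations" are random NumPy arrays standing in for hidden states collected from a real model on contrasting prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 16

# Toy stand-ins for hidden states (assumption: in practice these would be
# collected from a real model on "concept" vs. "anti-concept" prompts).
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
pos_acts = rng.normal(size=(32, hidden_dim)) + 2.0 * direction
neg_acts = rng.normal(size=(32, hidden_dim)) - 2.0 * direction


def steering_vector(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Unit-length difference of the mean activations of the two sets."""
    v = pos.mean(axis=0) - neg.mean(axis=0)
    return v / np.linalg.norm(v)


def steer(hidden: np.ndarray, vector: np.ndarray, strength: float) -> np.ndarray:
    """Nudge a hidden state along the steering direction."""
    return hidden + strength * vector


vec = steering_vector(pos_acts, neg_acts)
hidden = rng.normal(size=hidden_dim)
steered = steer(hidden, vec, strength=4.0)

# The projection onto the steering direction grows by exactly `strength`,
# because `vec` has unit length.
print(float(hidden @ vec), float(steered @ vec))
```

In a real model the same addition would happen inside a forward pass at a chosen layer, and `strength` is the knob that decides how hard the concept gets pushed.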
SAE
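As a rough picture of what a sparse autoencoder (SAE) does, here is a toy forward pass and loss. This is one common formulation (ReLU encoder, linear decoder, reconstruction error plus an L1 sparsity penalty), not necessarily the exact setup behind Golden Gate Claude. The weights are random and untrained, and the sizes and names (`d_model`, `d_dict`, `l1_coeff`, `sae_forward`, `sae_loss`) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 8, 32  # dictionary is wider than the activation space

# Untrained, randomly initialised parameters (assumption: a real SAE would
# learn these from model activations).
W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))
b_dec = np.zeros(d_model)


def sae_forward(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Encode activations into non-negative features, then reconstruct."""
    features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU encoder
    recon = features @ W_dec + b_dec                         # linear decoder
    return features, recon


def sae_loss(x: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """Mean squared reconstruction error plus an L1 penalty on features."""
    features, recon = sae_forward(x)
    mse = np.mean((x - recon) ** 2)
    sparsity = np.abs(features).sum(axis=-1).mean()
    return float(mse + l1_coeff * sparsity)


x = rng.normal(size=(4, d_model))  # a batch of 4 toy activation vectors
features, recon = sae_forward(x)
loss = sae_loss(x)
print(features.shape, recon.shape, loss)
```

Training would minimise this loss so that each input is explained by only a few active features; the rows of `W_dec` are then the learned dictionary directions.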
Memes
> Ripped the meme from @bycloyd
Personal take
This post is a way to think about the implications of this finding and what it means for the future of these systems. Personally, I believe these tools can change the way we interact with these systems, but given their power, they can be used to do almost anything. As someone who works in e-commerce, materialism does pervade my judgement.
Warning: This post is a case study of one possible way LLMs could be monetized through advertising.