AI’s risk: Big tech’s bold moves, strange missteps & the search for safety

As AI becomes central to search, decision-making, and even creative work, the question isn’t just whether these models can perform, but how much risk they carry when they fail.

Big tech often makes moves that confuse. Last year, OpenAI moved away from a non-profit structure, pleasing investors by becoming more like a normal startup. However, at the same time, the AI giant closed their superalignment team, which focused on the long-term risks of AI. As a consumer of their product, I’m still wondering where Open AI stands when it comes to AI risk.

And AI risk is significant.

AI has been hallucinating a lot. Recently Google’s Gemini refused to generate images of white people, especially white men. Instead, users were able to generate images of Black popes and female Nazi soldiers. Google’s efforts to make LLM less biased has backfired. Google apologized and paused the feature.

As AI becomes central to search, decision-making, and even creative work, the question isn’t just whether these models can perform, but how much risk they carry when they fail.

Google’s “AI Overview” feature told users they could use glue to stick cheese to pizza and eat one rock per day. It even asked users to ‘drink a couple of liters of light-colored urine to pass kidney stones and said that ‘geologists recommend humans eat one rock per day’. A few months ago, Google also upset Indian IT minister Rajeev Chandrasekhar when Gemini gave a biased opinion about Prime Minister Narendra Modi. Gemini also inaccurately depicted people of color in Nazi-era uniforms, showcasing historically inaccurate and insensitive images.

In another incident, Microsoft’s Bing chat told a New York Times reporter to leave his wife.

Experts like Joscha Bach say Gemini’s behaviour reflects the social processes and prompts fed into it rather than being solely algorithmic. As per MIT Technology Review, the models that power AI-powered search engines simply predict the next word (or token) in a sequence, which makes them appear fluent but also leaves them prone to making things up. They have no ground truth to rely on, but instead choose each word purely on the basis of a statistical calculation. Worst of all? There’s probably no way to fix things. A good reason we shouldn’t blindly trust AI search engines.

According to research, people trust the advice of AI ethics advisors just as much as human ethics advisors, and they assign the same responsibility to both. Is that well found though?

Earlier this month, a group of researchers from multiple universities argued that LLM agents should be evaluated primarily on the basis of their riskiness, not just how well they perform. In real-world, application-driven environments, especially with AI agents, unreliability, hallucinations, and brittleness are ruinous. One wrong move could spell disaster when money or safety are on the line.

What if we could measure this risk?

A UK project called Safeguarded AI aims to build AI systems that can provide quantitative guarantees about the effect of AI on the real world, like risk scores. The project wanted to form AI safety mechanisms by bringing together scientific world models and mathematical proofs. These proofs were to explain the AI’s work, while humans verified whether the AI model’s safety checks are correct. In August, Yoshua Bengio, known as the ‘godfather’ of AI, joined this project, a sign of how crucial such work is.

A paper from OpenAI shows that a little bit of bad training can make AI models go rogue. A group of researchers discovered that fine-tuning a model (in their case, OpenAI’s GPT-4o) by training it on code that contains certain security vulnerabilities could cause the model to respond with hateful, or obscene, or otherwise harmful content, even when the user inputs completely benign prompts.

The researchers found that this problem is generally pretty easy to fix. They could detect evidence of this so-called misalignment and even shift the model back to its regular state through additional fine-tuning with true information.

AI’s growing influence demands more than clever features and flashy launches, it calls for accountability. As AI reshapes industries and everyday life, the path forward must balance ambition with caution. After all, progress without safeguards doesn’t just confuse, it can endanger the very trust these technologies depend on.