
Hello and welcome to Eye on AI. In this edition: DeepSeek defies AI convention (again)…Meta’s AI layoffs…More legal trouble for OpenAI…and what AI gets wrong about the news.
Hi, Beatrice Nolan here, filling in for AI reporter Sharon Goldman, who’s out today. Chinese AI company DeepSeek has released a new open-source model that flips some conventional AI wisdom on its head.
The DeepSeek-OCR model, and accompanying white paper, essentially reimagines how large language models process information by compressing text into visual representations. Instead of feeding text into a language model as tokens, DeepSeek converts it into images.
The result is up to ten times more efficient and opens the door to much larger context windows (the amount of text a language model can actively consider at once when generating a response). It could also mean a new, cheaper way for enterprise customers to harness the power of AI.
Early tests have shown impressive results. For every 10 text tokens, the model needs only one “vision token” to represent the same information with 97% accuracy, the researchers wrote in their technical paper. Even when compressed up to 20 times, accuracy is still about 60%. This means the model can store and handle 10 times more information in the same space, making it especially useful for long documents or for letting the AI take in larger sets of data at once.
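As a rough back-of-the-envelope sketch, here is what those compression ratios imply for context capacity. The ratio/accuracy pairs come from the paper; the 128,000-token baseline budget is an illustrative assumption, not a DeepSeek figure:

```python
# Back-of-the-envelope: how optical compression stretches a context window.
# The compression/accuracy pairs are DeepSeek's reported results; the
# 128,000-token baseline budget is an illustrative assumption.

def effective_text_capacity(vision_token_budget: int, compression_ratio: int) -> int:
    """Text tokens representable within a fixed vision-token budget."""
    return vision_token_budget * compression_ratio

BASELINE_WINDOW = 128_000  # assumed vision-token budget

for ratio, accuracy in [(10, 0.97), (20, 0.60)]:
    capacity = effective_text_capacity(BASELINE_WINDOW, ratio)
    print(f"{ratio}x compression: ~{capacity:,} text tokens at ~{accuracy:.0%} accuracy")
# -> 10x compression: ~1,280,000 text tokens at ~97% accuracy
# -> 20x compression: ~2,560,000 text tokens at ~60% accuracy
```

The trade-off the paper reports is the usual one: the harder you compress, the more reconstruction accuracy you give up.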
The new research has caught the eye of several prominent AI figures, including Andrej Karpathy, an OpenAI co-founder, who went so far as to suggest that all inputs to LLMs might be better as images.
“The more interesting part for me…is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you’d prefer to render it and then feed that in,” Karpathy wrote in a post on X that highlighted several other advantages of image-based inputs.
What this means for enterprise AI
The research could have a range of implications for how businesses use AI. Language models are limited by the number of tokens they can process at once, but compressing text into images this way could allow models to process much larger knowledge bases. Users don’t need to convert their text manually, either: DeepSeek’s model automatically renders text input as 2D images internally, processes them through its vision encoder, and then works with the compressed visual representation.
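Conceptually, the pipeline looks something like the sketch below. All names here are hypothetical stand-ins, not DeepSeek’s actual API, and the vision encoder (a neural network in the real system) is stubbed out as simple token arithmetic:

```python
# Conceptual sketch of the text -> image -> vision-token pipeline.
# Hypothetical stand-in functions; the real encoder is a neural network,
# stubbed here as simple arithmetic using the paper's ~10x ratio.

COMPRESSION_RATIO = 10  # ~10 text tokens per vision token, per the paper

def tokenize_text(text: str) -> list[str]:
    """Crude whitespace tokenizer standing in for a real BPE tokenizer."""
    return text.split()

def render_to_image(text: str) -> str:
    """Stand-in for rendering text as a 2D image (returns a placeholder)."""
    return f"<image of {len(text)} chars>"

def vision_encode(image: str, n_text_tokens: int) -> int:
    """Stub: a real vision encoder would emit compressed vision tokens."""
    return max(1, n_text_tokens // COMPRESSION_RATIO)

document = "word " * 1000  # a 1,000-token document
n_text = len(tokenize_text(document))
n_vision = vision_encode(render_to_image(document), n_text)
print(f"{n_text} text tokens -> {n_vision} vision tokens")
# -> 1000 text tokens -> 100 vision tokens
```

The point of the sketch is the interface, not the internals: the model consumes a tenth as many tokens for the same input, which is where the context-window gains come from.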
AI systems can only actively consider a limited amount of text at a time, so users have to search or feed the models documents piece by piece. But with a much larger context window, it could be possible to feed an AI system all of a company’s documents or an entire codebase at once. In other words, instead of asking an AI tool to search each file individually, a company could put everything into the AI’s “memory” at once and ask it to analyze information from there.
The model is publicly available and open source, and developers are already actively experimenting with it.
“The possibility of getting a frontier LLM with a 10 or 20 million token context window is pretty exciting,” Jeffrey Emanuel, a former quant investor, said. “You could basically cram all of a company’s key internal documents into a prompt preamble and cache this with OpenAI and then just add your specific query or prompt on top of that and not have to deal with search tools and still have it be fast and cost-effective.”
He also suggested companies may be able to feed a model an entire codebase at once and then simply update it with each new change, letting the model keep track of the latest version without having to reload everything from scratch.
The paper also opens the door to some intriguing possibilities for how LLMs might store information, such as using visual representations in a way that echoes human “memory palaces,” where spatial and visual cues help organize and retrieve information.
There are caveats, of course. For one, DeepSeek’s work focuses primarily on how efficiently data can be stored and reconstructed, not on whether LLMs can reason as effectively over these visual tokens as they do with regular text. The approach may also introduce new complexities, such as handling different image resolutions or color variations.
Even so, the idea that a model could process information more efficiently by seeing text could mark a major shift in how AI systems handle information. After all, a picture is worth a thousand words, or, as DeepSeek seems to be finding, ten thousand.
And with that, here’s the rest of the AI news.
Beatrice Nolan
bea.nolan@fortune.com
@beafreyanolan
FORTUNE ON AI
Huge AI data centers are turning local elections into fights over the future of energy — by Sharon Goldman
AI’s insatiable need for power is driving an unexpected boom in oil-fracking company stocks — by Jordan Blum
Browser wars are back with a vengeance—and OpenAI just entered the race with ChatGPT Atlas — Beatrice Nolan and Jeremy Kahn
AI IN THE NEWS
Meta cuts 600 AI jobs in major reorganization. Meta is laying off roughly 600 employees from its AI operations as part of an internal restructuring aimed at streamlining decision-making and accelerating innovation. The cuts affect teams across FAIR research, AI product teams, and AI infrastructure units. The recently launched TBD Lab was spared from the round of job cuts and is still actively recruiting and hiring AI engineers. In an internal memo first reported by Axios, Meta’s chief AI officer Alexandr Wang said the move is designed to make the organization more agile, with fewer layers of bureaucracy. The company is urging affected employees to seek other roles within Meta and says it expects many will secure new positions internally. Read more from Axios here.
Lawsuit alleges OpenAI weakened suicide safeguards to boost ChatGPT use. OpenAI is facing an amended lawsuit claiming it deliberately weakened suicide-prevention safeguards in ChatGPT to increase user engagement before the death of 16-year-old Adam Raine, who took his own life after extensive conversations with the chatbot. The lawsuit, filed in San Francisco Superior Court, alleges that in May 2024, OpenAI instructed its models not to “give up the conversation” during self-harm discussions, reversing earlier safety policies. In response to the amended suit, OpenAI expressed condolences to the Raine family while emphasizing that teen wellbeing remains a “top priority.” Read more from the Financial Times here.
Reddit sues Perplexity and others over unlawful scraping claims. Reddit has filed a lawsuit in the U.S. District Court for the Southern District of New York accusing three companies of illegally scraping and reselling its data to major AI firms like OpenAI and Meta. The social media platform claims the defendants, SerpApi, Oxylabs, and AWMProxy, stole Reddit content by scraping Google search results where Reddit posts appeared, packaged that data, and sold it to AI developers looking for training material. According to the lawsuit, Perplexity was one of the buyers. Reddit is seeking a permanent injunction, monetary damages, and a ban on further use of its data. Representatives for Perplexity told The New York Times that its “approach remains principled and responsible as we provide factual answers with accurate A.I.” Reddit has invested tens of millions of dollars over several years in systems designed to prevent data scraping. Read more from The New York Times here.
AI CALENDAR
Nov. 10-13: Web Summit, Lisbon.
Nov. 26-27: World AI Congress, London.
Dec. 2-7: NeurIPS, San Diego.
Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.
EYE ON AI NUMBERS
45%
That’s the share of the time AI assistants misrepresent news content, according to an international study coordinated by the European Broadcasting Union (EBU) and the BBC. The study found that AI tools routinely misrepresent news content across all languages, territories, and AI platforms. Researchers found that 31% of responses demonstrated serious sourcing problems, such as missing or incorrect attributions, while 20% contained major accuracy issues, including hallucinated details and outdated information. Google DeepMind’s Gemini AI assistant performed worst of all, with researchers finding significant issues in 76% of responses, more than double the rate of the other assistants. They largely attributed this to the bot’s poor sourcing performance.
As people increasingly rely on AI assistants as search tools, the study raises concerns about the potential proliferation of misinformation. In Google Chrome, Gemini is used to power the company’s “AI Overviews,” which provide short summaries in response to users’ Search queries. Many users may take these summaries at face value, rather than investigating the sourcing and accuracy further. These frequent misrepresentations can damage trust not only in the systems themselves but also in the news organizations whose content is being distorted.
“This research conclusively shows that these failings are not isolated incidents,” Jean Philip De Tender, the EBU media director and deputy director general, said. “They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”

