Researchers upend AI status quo by eliminating matrix multiplication in LLMs

Illustration of a brain inside of a light bulb. (credit: Getty Images)

Researchers claim to have developed a new way to run AI language models more efficiently by eliminating matrix multiplication from the process. The approach fundamentally redesigns neural network operations that are currently accelerated by GPU chips. The findings, detailed in a recent preprint paper from researchers at the University of California, Santa Cruz, UC Davis, LuxiTech, and Soochow University, could have deep implications for the environmental impact and operational costs of AI systems.

Matrix multiplication (often abbreviated to “MatMul”) is at the center of most neural network computational tasks today, and GPUs are particularly good at executing the math quickly because they can perform large numbers of multiplication operations in parallel. That ability momentarily made Nvidia the most valuable company in the world last week; the company currently holds an estimated 98 percent market share for data center GPUs, which are commonly used to power AI systems like ChatGPT and Google Gemini.
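To make that parallelism concrete, here is a minimal Python sketch (not taken from the paper) of an ordinary matrix multiplication written out as explicit multiply-and-add steps. The function name `matmul_with_count` and the toy 2x2 matrices are illustrative assumptions. The point is only that multiplying an m-by-k matrix by a k-by-n matrix takes m × n × k scalar multiplications, and that each output cell can be computed independently of the others, which is the kind of work a GPU spreads across many cores.

```python
def matmul_with_count(a, b):
    """Multiply two matrices (lists of rows) and count scalar multiplications."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * n for _ in range(m)]
    mults = 0
    for i in range(m):          # each output row
        for j in range(n):      # each output column
            for p in range(k):  # one multiply-accumulate per inner index
                out[i][j] += a[i][p] * b[p][j]
                mults += 1
    return out, mults


# Toy example: a 2x2 by 2x2 product needs 2 * 2 * 2 = 8 multiplications.
product, mults = matmul_with_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, mults)
```

The multiplication count grows with the product of all three dimensions, so the matrices inside a model with billions of parameters involve vastly more of these operations per token than this toy case.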

In the new paper, titled “Scalable MatMul-free Language Modeling,” the researchers describe creating a custom 2.7 billion parameter model without using MatMul that features similar performance to conventional large language models (LLMs). They also demonstrate running a 1.3 billion parameter model at 23.8 tokens per second on a GPU that was accelerated by a custom-programmed FPGA chip that uses about 13 watts of power (not counting the GPU’s power draw). The implication is that a more efficient FPGA “paves the way for the development of more efficient and hardware-friendly architectures,” they write.
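As a rough back-of-the-envelope reading of those two figures, dividing the FPGA's power draw by its throughput gives the energy spent per generated token. The sketch below uses only the numbers quoted above; the helper name `joules_per_token` is hypothetical, and the result covers the FPGA alone since the GPU's power draw is not included in the 13-watt figure.

```python
def joules_per_token(power_watts, tokens_per_second):
    """Energy per token in joules: watts are joules per second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


# Figures quoted in the article: about 13 W at 23.8 tokens per second.
fpga_energy = joules_per_token(13.0, 23.8)
print(f"{fpga_energy:.2f} J per token")
```

Under those assumptions the FPGA spends a little over half a joule per token.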
