NJIT AI Researcher Makes Small Language Models Accessible
While ordinary people around the world are waking up to large language models on the cloud, researchers at New Jersey Institute of Technology want you to know about the power of small models on your own hardware.
It’s not unlike fifty years ago, when people were becoming aware of business computers the size and cost of a car, unaware of the imminent personal computing revolution.
The advantages this time are personalization and privacy. By running a small language model on your own server, laptop or even smartphone, the model doesn’t have to be built for general knowledge or made available to anyone else, explained Yingcong Li, assistant data science professor in NJIT’s Ying Wu College of Computing.
Li already helped conquer one downside, which is training efficiency. Small models can’t answer questions nearly as well as large ones with more than 50 billion parameters. Li, as a University of Michigan doctoral student until summer 2025 and then as a new NJIT faculty member in the fall 2025 semester, worked on research that largely closes the gap by giving small models mathematical hints from large models about how to answer questions.
If the small model succeeds after receiving its hint, then a shorter and more difficult hint is provided to see if it still works. If the small model fails, then the hint is lengthened to become easier. The goal is to find what Li and collaborators call an expert anchor — the shortest possible hint where the model can still maintain a specific success rate. Their intention is to make small-model training three times faster, and they presented this project, BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning, at the recent Neural Information Processing Systems conference in San Diego.
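The search procedure described above can be sketched as a simple binary search over hint length. This is a minimal illustration, not the BREAD implementation; the `evaluate` function is a hypothetical stand-in for running the small model on a batch of problems with the first `hint_len` tokens of a large model's solution prefixed as a hint.

```python
def find_expert_anchor(evaluate, max_hint_len, target_rate=0.8):
    """Binary-search the shortest hint length whose success rate
    stays at or above target_rate. Assumes success rate is
    (roughly) non-decreasing in hint length."""
    lo, hi = 0, max_hint_len
    anchor = max_hint_len
    while lo <= hi:
        mid = (lo + hi) // 2
        if evaluate(mid) >= target_rate:
            anchor = mid          # hint works: try a shorter, harder one
            hi = mid - 1
        else:
            lo = mid + 1          # hint too short: lengthen to make it easier
    return anchor

# Toy evaluator: pretend the model needs at least 12 hint tokens to succeed.
rate = lambda n: 0.9 if n >= 12 else 0.3
print(find_expert_anchor(rate, max_hint_len=64))  # → 12
```

In practice each call to `evaluate` would mean running many model rollouts, so halving the search space at each step, rather than shrinking the hint one token at a time, is what makes the anchor search affordable.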
Now she is moving on to the next problem, which is how to acquire a suitably pre-trained small model. Individuals and organizations lack the training resources of AI giants like Anthropic, Google or OpenAI. Li feels that might not be a problem, because small models for specific kinds of problems may already exist inside large ones, waiting to be discovered.
Commercial large models can’t be used for this application because their internals are mostly secret. There are research-focused alternatives such as DeepSeek, Meta’s Llama, Mistral and Qwen, which Li’s new software could access, as described in her proposal, Efficient and Adaptable Language Models via Sub-Model Search.
Li described what she calls a router. The router considers your task’s requirements, such as the budget, topic and time allotted. It finds the ideal sub-model by adjusting the depth and width of its processing. Depth is the number of stages, called layers, used to transform your input. Width is how many computing units, in components called attention heads and multilayer perceptrons, are applied at each stage.
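The routing idea can be sketched as choosing a depth-and-width configuration under a compute budget. This is a hypothetical illustration of the concept, not Li's actual software; the candidate table and the cost model are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class SubModel:
    depth: int    # transformer layers retained
    heads: int    # attention heads retained per layer
    score: float  # estimated quality on the target task

    def cost(self):
        # Toy cost model: compute grows with depth times width.
        return self.depth * self.heads

def route(candidates, budget):
    """Return the highest-scoring sub-model whose cost fits the budget."""
    feasible = [c for c in candidates if c.cost() <= budget]
    return max(feasible, key=lambda c: c.score) if feasible else None

candidates = [
    SubModel(depth=32, heads=32, score=0.95),  # near-full model, cost 1024
    SubModel(depth=16, heads=16, score=0.88),  # mid-size slice, cost 256
    SubModel(depth=8,  heads=8,  score=0.74),  # smallest slice, cost 64
]
best = route(candidates, budget=300)
print(best.depth, best.heads)  # → 16 16
```

A real router would also weigh the topic and time allotted, and would search over many more configurations than this three-row table, but the trade-off it navigates is the same: the deepest, widest sub-model that still fits the user's constraints.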
In real-world use, a developer would describe parameters to her software, which would then extract only the large language model parts that do whatever is needed, such as image processing or mathematical reasoning. The resulting small model would have the large intelligence that it needs, but would be small enough to run locally instead of requiring a cutting-edge data center.
One downside, Li said, is that while large models receive major updates a couple of times per year, a derived small model would not get those updates. That could mean the small model’s performance stagnates, or the model could remain exposed to security risks for too long. Another drawback is that small models may need substantial updates if their owner’s needs change.
For many individual researchers and smaller organizations, “They want to build their own model, but it is impossible to build just as strong as large companies,” Li observed. “They just want to do their own case. They don’t need to understand all languages or do very complicated reasoning problems.”