The two most obvious use cases are now commonplace. Did we do anything else interesting? Yes, we did! First, the two obvious ones:
1. Availability of Hardware and Kernels for Cost-Efficient Deployment: Inference costs go down by up to 99%, and fine-tuning costs go down by up to 50x. The exact savings depend on usage, but cost comes down to storage plus compute, and that compute runs on highly tuned hardware, saving even more.
2. Advanced Fine-Tuning and Parameter-Efficient Methods: Open-weight models have led to the proliferation of Parameter-Efficient Fine-Tuning (PEFT) methods, which are a cornerstone of modern LLM development.
Fine-tuning is a subject in its own right; I will cover it in a separate article.
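To make the PEFT point concrete, here is a minimal sketch of the parameter accounting behind LoRA, one of the most common PEFT methods. The hidden size and rank below are illustrative values, not taken from any particular model:

```python
# Toy illustration of the LoRA idea: instead of updating a full d x d
# weight matrix, train two small low-rank factors B (d x r) and A (r x d)
# whose product is added to the frozen pretrained weight.
d, r = 4096, 8  # hidden size and LoRA rank (illustrative values)

full_params = d * d          # parameters updated by full fine-tuning
lora_params = d * r + r * d  # parameters updated by LoRA

print(f"full fine-tuning: {full_params:,} trainable params")
print(f"LoRA (rank {r}):   {lora_params:,} trainable params")
print(f"reduction:        {full_params // lora_params}x")
```

At these (made-up) dimensions the trainable parameter count drops by a factor of 256, which is why PEFT makes fine-tuning affordable on modest hardware.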
Now, moving on to the interesting possibilities opened up by open-weight models:
3. Quantization and Efficient Deployment: Ask us about some powerful work we did on running decently capable models on drones and IoT devices. Ask us about our quantization aware training pipelines.
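As a flavor of what quantization does under the hood, here is a toy sketch of symmetric int8 quantization of a single weight vector. The weights are made-up values, and a real quantization-aware pipeline is far more involved:

```python
# Minimal sketch of symmetric int8 post-training quantization for one
# weight tensor, using plain Python lists. Values are illustrative.
weights = [0.12, -0.53, 0.87, -1.24, 0.04]

scale = max(abs(w) for w in weights) / 127     # map the largest weight to 127
q = [round(w / scale) for w in weights]        # int8 representation
dq = [v * scale for v in q]                    # dequantized approximation

max_err = max(abs(w - d) for w, d in zip(weights, dq))
assert all(-128 <= v <= 127 for v in q)        # fits in a signed byte
print("quantized:", q)
print("max reconstruction error:", max_err)
```

Storing one byte per weight instead of four (or two) is where the memory savings on drones and IoT devices come from; the reconstruction error stays below half the quantization step.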
4. Algorithmic and Architectural Research: The availability of open weights allows for direct, in-depth research on the fundamental components of large models. We:
- Tested and validated new regularization techniques (e.g., Dropout variants) at a large scale.
- Experimented with novel activation functions and compared their performance against standard functions without the need for multi-month pre-training runs, especially in our specialized application (finance models).
- Investigated the effects of different layer architectures (e.g., adding or removing residual connections, changing the attention mechanism) on model performance and stability.
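To illustrate the activation-function experiments above, here is a toy sketch of a pluggable activation in a one-neuron feed-forward "block". The block and values are illustrative; real experiments run inside a full transformer layer:

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # tanh approximation of GELU, a common alternative to ReLU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def ffn(x, w1, w2, act):
    """One-neuron feed-forward block with a swappable activation."""
    return act(x * w1) * w2

x, w1, w2 = 0.5, 1.2, 0.8
print("ReLU block:", ffn(x, w1, w2, relu))
print("GELU block:", ffn(x, w1, w2, gelu))
```

Because the activation is just a function argument, the same pretrained weights can be probed with different nonlinearities without retraining anything else.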
5. Federated Learning and On-Device Training – Data Privacy solutions: In an academic context, open-weight models are a cornerstone of federated learning research. They allow for the distribution of model weights to multiple devices (e.g., smartphones, hospital servers) where training is performed on local, sensitive data. Only the weight updates are then aggregated, ensuring data privacy. This is a crucial area of research in healthcare and finance, where data cannot be centrally pooled.
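The aggregation step described above is the heart of federated averaging (FedAvg). Here is a toy sketch with made-up clients and weights, where only weight deltas (never raw data) reach the server:

```python
# Toy sketch of FedAvg: each client trains locally on private data and
# sends only a weight update; the server averages the updates, weighted
# by each client's dataset size. All numbers are illustrative.

def fed_avg(global_w, client_updates):
    """client_updates: list of (delta, n_samples) pairs."""
    total = sum(n for _, n in client_updates)
    avg_delta = [
        sum(delta[i] * n for delta, n in client_updates) / total
        for i in range(len(global_w))
    ]
    return [w + d for w, d in zip(global_w, avg_delta)]

global_w = [0.0, 1.0]
updates = [([0.2, -0.1], 100),   # e.g., hospital A, 100 local samples
           ([0.4,  0.1], 300)]   # e.g., hospital B, 300 local samples
new_w = fed_avg(global_w, updates)
print(new_w)
```

The larger client contributes proportionally more to the averaged update, and the sensitive local data never leaves either device.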
6. Deep Introspection, Model Interpretability, and Specialized Benchmarking: Direct access to the weights lets us engineer domain-specific solutions backed by specialized datasets and well-designed benchmarking and evaluation capabilities.
7. Context window expansion: Techniques like RoPE (Rotary Position Embeddings), ALiBi (Attention with Linear Biases), and NTK-Aware Scaled RoPE modify the positional encoding of the transformer architecture to extrapolate to longer sequences. This kind of low-level modification is key to enabling models to process and reason over large documents or entire codebases without resorting to chunking or summarization.
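A sketch of the frequency computation behind NTK-aware scaled RoPE may help. Standard RoPE derives its rotation frequencies from a fixed base; the NTK-aware variant rescales that base so the low frequencies stretch to cover longer contexts. The dimensions and scaling factor below are illustrative:

```python
# Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
# NTK-aware scaling replaces base with base * alpha^(d / (d - 2)),
# stretching the low-frequency wavelengths for longer contexts.
# Values (dim=8, base=10000, alpha=4) are illustrative.

def rope_inv_freqs(dim, base=10000.0, alpha=1.0):
    if alpha != 1.0:  # NTK-aware base rescaling
        base = base * alpha ** (dim / (dim - 2))
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

standard = rope_inv_freqs(8)
ntk = rope_inv_freqs(8, alpha=4.0)
print("standard:", standard)
print("NTK-aware:", ntk)
```

Note that the highest frequency (i = 0) stays at 1.0 in both cases, while the lower frequencies shrink under NTK-aware scaling, which is exactly what lets the model extrapolate to longer sequences without retraining the positional scheme.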
Adding Additional Layers
Adding layers to a pre-trained open-weight model is a powerful technique for adapting it to a specific task without the massive computational cost of training a model from scratch. This process is a form of transfer learning, where the general knowledge learned by the pre-trained model is leveraged for a new, more specific task.
- Customizing the “Head”: One of the most common practices is to “chop off” the final layer(s) of the model, which are typically a classification or regression head. New layers are then added and fine-tuned on a new dataset. The original model’s weights are often frozen, preventing them from changing and “forgetting” their pre-trained knowledge. This is a very efficient way to adapt a model trained on a general domain (e.g., text generation) to a specialized task like sentiment analysis or medical diagnosis.
- Injecting New Functionality: We can add new layers to a model to inject new capabilities. For example, a new layer could be added to handle a specific type of input data, such as a layer to process images and pass their embeddings to the text-based layers of a multimodal model. This kind of architectural modification allows for the creation of unique, specialized AI systems. Look up our Quant finance models.
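The head-swapping pattern above can be sketched in a few lines. The classes, weights, and dimensions here are made-up stand-ins, not our actual models:

```python
# Minimal sketch of "chopping off the head": a frozen pretrained
# backbone feeds a new, trainable task-specific head.

class FrozenBackbone:
    """Stand-in for a pretrained model whose weights are never updated."""
    def __init__(self, w):
        self.w = w  # frozen pretrained weight

    def embed(self, x):
        return [xi * self.w for xi in x]

class SentimentHead:
    """New layer added on top; only these parameters receive gradients."""
    def __init__(self, w, b):
        self.w, self.b = w, b

    def score(self, emb):
        return sum(e * self.w for e in emb) + self.b

backbone = FrozenBackbone(w=0.5)
head = SentimentHead(w=1.0, b=-0.2)
score = head.score(backbone.embed([1.0, 2.0, 3.0]))
print(score)
```

During fine-tuning, only the head's two parameters would be updated; the backbone keeps its pretrained knowledge intact, which is the point of freezing it.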
Validation and Trust Layers
The ability to add validation layers or components is more about building trust and ensuring the model behaves as intended, rather than just improving performance. We implemented a variety of checks and balances using this ability.
- Guardrails and Content Filtering: We inserted specific layers and modules into the model’s pipeline that act as “guardrails.” These components analyze the model’s outputs and prevent it from generating unsafe, biased, or harmful content. Unlike closed APIs, where you have to rely on the provider’s pre-trained safety filters, open-weight models allowed us to create custom filters for our specific use cases.
- Explainability and Interpretability Layers: Access to the weights allowed for the implementation of XAI (Explainable AI) tools. We added modules that track which parts of the input a model is paying attention to (via attention maps) or which neurons are most active. This provides a level of transparency that is critical for debugging model behavior, understanding why a decision was made, and building trust, especially in high-stakes fields like finance or medicine.
- Robustness and Adversarial Defense: We added layers that specifically look for and defend against adversarial attacks. These layers were trained to detect and neutralize maliciously crafted inputs designed to make the model fail or produce incorrect outputs.
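As a flavor of the guardrail idea, here is a toy sketch of a post-generation content filter that sits after the model and blocks outputs matching a custom policy. The blocklist terms are placeholders, not a real policy:

```python
# Toy post-generation guardrail: checks a model output against a custom
# policy before it reaches the user. Terms are illustrative placeholders.

BLOCKED_TERMS = {"secret_api_key", "internal_only"}

def guardrail(text):
    """Return (allowed, text_or_reason) for a candidate model output."""
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return False, f"blocked: output contained '{term}'"
    return True, text

print(guardrail("The forecast looks stable."))
print(guardrail("Here is the secret_api_key you asked for."))
```

A production guardrail would be a trained classifier rather than a string match, but the architectural point is the same: with open weights, the filter is a component you own and can tailor, not a fixed API behavior.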
There are a lot of opportunities with open-weight models, but just as many complexities. Good engineering discipline in testing and measuring these models is essential if you go down this path. No single answer spans all the problems; solid engineering is the key to taking advantage of these opportunities.