3 Open Source Models you may not have heard of

Number 3 will blow. your. mind.

Even as closed-source models keep advancing, there's plenty of meaningful innovation happening in open source. I'd like to cover some basics of model naming, as well as three models I played around with, to give you a sense of what they might be good for.

If you're following along from my last post on LM Studio, you'll be happy to know all of these models are Staff Picks in the app and can be used right after downloading from HuggingFace. Try them out today!

A Primer on Model Naming Conventions

Typically these models carry some verbiage in their names that gives you information about the general "power" of the model. Take this name, for example:

Qwen2 Math 1.5B

The notable parts of the name are a) the 1.5B and b) Math. The 1.5B refers to the number of parameters, in this case 1.5 billion. Not every model includes this, but those that do help us establish a baseline for how powerful the model is. Compared to a model like Mistral 7B, Qwen2 Math 1.5B has just over 4x fewer parameters. I could write a whole other post on parameterization and whether more parameters is better, but for now just know that more parameters generally means more resources (RAM, electricity, etc.) needed to run the model, while lower parameter counts tend to align with more energy-efficient models.
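To make the resource point concrete, here's a back-of-envelope sketch (my own arithmetic, not numbers from any model card): the memory needed just to hold the weights is roughly the parameter count times the bytes per parameter.

```python
# Rough estimate of memory needed just to hold model weights.
# Real usage adds activations, KV cache, and runtime overhead on top.
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

for name, params in [("Qwen2 Math 1.5B", 1.5e9), ("Mistral 7B", 7e9)]:
    fp16 = weight_memory_gb(params, 2.0)   # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)   # 4-bit quantized, common for local builds
    print(f"{name}: ~{fp16:.1f} GB at fp16, ~{int4:.2f} GB at 4-bit")
```

So the 1.5B model's weights fit in a few gigabytes where the 7B model starts to squeeze a typical laptop.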

We also know, plainly, that Qwen2 Math 1.5B is particularly suited for solving math problems. Much easier to parse than the parameter bit. A good model name will usually include a nod to the model's strength, like Mathstral or Codestral.

SmolLM

Had to start with this one. The name is adorable and immediately pulled me in, since it's a play on the term "Large Language Model." Going back to naming conventions, we know it's a particularly small model.

On LM Studio, you can install SmolLM 360M v0.2, one of the smallest models I've seen publicly available. The Limitations section of its HuggingFace model card notes that the model "can handle creative writing and basic Python writing." So if you're looking for a lightweight local alternative to Claude, this might be up your alley.

Reading the documentation piqued my curiosity: is it good at creative tasks like writing a poem? I think that's up for debate.
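If you'd rather poke at it from code than from the chat window, LM Studio can serve whatever model you've loaded through an OpenAI-compatible local server (port 1234 by default). Here's a minimal sketch, assuming the server is running and the openai Python package is installed; the model identifier below is hypothetical, so copy whatever LM Studio actually shows for your download.

```python
# Query a model loaded in LM Studio via its OpenAI-compatible local server.
from openai import OpenAI

# LM Studio doesn't validate the API key; any placeholder string works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="smollm-360m-v0.2",  # hypothetical name; use the one LM Studio displays
    messages=[{"role": "user", "content": "Write a short poem about a mojito."}],
)
print(response.choices[0].message.content)
```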

Sometimes the mojito's charm is divine...

StableCode

Next up is StableCode, built by Stability AI, the same team behind Stable Diffusion. Note that the model name doesn't include the parameter count, but the HuggingFace page does (~3B, btw).

This one is suited for coding-related inquiries and, according to Stability AI, is particularly well suited to lighter hardware. I tested it out with a simple ask: write a call to a Dungeons and Dragons 5th Edition API. Did it do well? Sort of. Here's the code:
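(What follows is my cleaned-up reconstruction, assuming the public dnd5eapi.co endpoint and the requests library, rather than StableCode's verbatim output.)

```python
# Fetch a spell from the public D&D 5e API and print a few details.
import requests

def get_spell(index: str) -> dict:
    """Look up a spell by its index, e.g. 'fireball'."""
    url = f"https://www.dnd5eapi.co/api/spells/{index}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of failing silently
    return response.json()

if __name__ == "__main__":
    spell = get_spell("fireball")
    print(spell["name"])
    print(spell["desc"][0])
```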

StableCode's raw output had a few floating, unmatched parentheses near the bottom of the code. After fixing those minor syntax errors, it worked as far as I could tell. The output is correct based on the API documentation!

There are bigger questions about whether coding with an AI assistant actually makes you more productive, and those may be worth considering. For now, though, I'd say it can for sure help with some basic problems. Start small and expect to do some refactoring.

Qwen2

Xenophobes beware! This model comes from the strange and mysterious lands of China via the Alibaba Group and their subsidiary, Alibaba Cloud. What subversive goals could the CCP have with releasing such a powerful model other than to dismantle the United States of America?!?

That was a joke. In all seriousness, there's been a real renaissance in open-source LM technology in China, and models produced by Alibaba have topped HuggingFace performance leaderboards. Being open source also lends them credibility, so I'm inclined to support the genuine innovation a model like this might be paving the way for, not shady businessmen averse to regulation here at home.

Today's model is Qwen2 Math 1.5B, which, as the name suggests, is geared toward math problems. I tried it earlier this week on some basic math problems, and it actually got everything correct. The formatting is a little wonky, though.

I thought I could trip it up with some parentheses.
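To give a flavor of what I mean (my own example here, not the exact prompt from my test): something like 3 × (4 + 2)² - 5, where the parentheses resolve first (4 + 2 = 6), then the exponent (6² = 36), and finally 3 × 36 - 5 = 103. That kind of nesting is exactly where I'd expect a small model to slip on order of operations.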

Also correct! Thought I could trip it up with a poorly worded question as well.

Conclusion

Your mileage may certainly vary with these models, but I hope this post inspires you to play around with models you may not have considered otherwise. Thanks!

Another post, another picture of Cha Cha. This time after his morning duties since it's been getting cold out here.

I TUCKED HIM IN AFTER THIS, I PROMISE
