This is about running the full model with o1-level performance, not the distilled models like those running on a Raspberry Pi.
Someone wrote a post on X/Twitter on how to achieve this for ~USD 6,000. This is the original post, but if you don’t have an X/Twitter account, you can use this link to view the whole thread. Others have also written about the same setup here and here, so I won’t repeat the details.
Here is a video capture of the model’s output in real time:
This setup is impressive for a few reasons:
- CPU-Only Processing – No GPUs are involved (see the sketch after this list)
- Decent Token Generation Speed – 6-8 tokens per second
- Energy Efficient – Operates on <400W of power
- Cost-Effective – ~USD 6,000 total, a fraction of the estimated USD 100,000+ required for a GPU-based setup
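
For anyone who wants to try something similar on their own hardware, here is a minimal sketch of CPU-only inference using the llama-cpp-python bindings and a quantized GGUF checkpoint, with a rough tokens-per-second measurement. The model path, thread count, and context size below are placeholder assumptions, not the configuration from the referenced build.

```python
# Minimal sketch of CPU-only inference with llama-cpp-python.
# The model path, thread count, and context size are assumptions;
# adjust them to your hardware and whichever quantization you download.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="models/full-model-q8_0.gguf",  # hypothetical path to a quantized GGUF file
    n_ctx=8192,        # context window; larger values need more RAM
    n_threads=64,      # roughly match the number of physical cores
    n_gpu_layers=0,    # 0 = keep every layer on the CPU
)

prompt = "Explain, step by step, why the sky is blue."
start = time.perf_counter()
result = llm(prompt, max_tokens=512)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(result["choices"][0]["text"])
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tokens/s")
```

On a machine like the one described, you would want the whole quantized model to fit in system RAM, since any swapping to disk will wreck the token rate.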
The setup is not exactly cheap, but it is within a research or hobbyist-level budget. It will be very interesting to see how much further it can be optimized to make it even more affordable without compromising quality.
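
As a quick back-of-envelope check on running cost, the snippet below combines the quoted power draw and token rate with an assumed electricity price of USD 0.15 per kWh (swap in your local rate):

```python
# Back-of-envelope electricity cost from the figures quoted above
# (<400 W draw, 6-8 tokens/s). The USD 0.15/kWh rate is an assumption.
POWER_KW = 0.4          # upper bound on power draw, in kilowatts
TOKENS_PER_SECOND = 7   # midpoint of the 6-8 tokens/s range
PRICE_PER_KWH = 0.15    # assumed electricity price in USD

seconds_per_million_tokens = 1_000_000 / TOKENS_PER_SECOND
kwh_per_million_tokens = POWER_KW * seconds_per_million_tokens / 3600
cost_per_million_tokens = kwh_per_million_tokens * PRICE_PER_KWH

print(f"~{kwh_per_million_tokens:.1f} kWh, ~USD {cost_per_million_tokens:.2f} per million tokens")
```

That works out to roughly 16 kWh, or a couple of dollars of electricity, per million generated tokens, so the running cost is small next to the ~USD 6,000 of hardware.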