I wrote about how to run the DeepSeek R1 model locally – that is, on your own hardware. But if you do not want to commit USD 6K to that, there are other options. There are certainly many providers now offering DeepSeek R1 via an API, but those are still running on someone else’s stack – you have to send your data to them and trust that they do the right thing, such as maintaining proper data hygiene. Another option, if you don’t have the hardware but want complete control, is to run it in the cloud. Here we explore using AWS for this.
GPU Option
DeepSeek is currently not one of the model providers on Amazon Bedrock. That does not mean you cannot run it. The official article from AWS suggests three ways of running it:
- The DeepSeek-R1 model in Amazon Bedrock Marketplace
- The DeepSeek-R1 model in Amazon SageMaker JumpStart
- DeepSeek-R1-Distill models using Amazon Bedrock Custom Model Import
Option 3 is out for this exercise as we are only interested in the full 671B model.
Following the steps in the official article, you can choose DeepSeek-R1 from the model catalog (surprisingly, us-east-1 only has the distilled models; I had to choose us-east-2 for this):


The recommended instance type is ml.p5e.48xlarge, which costs a whopping USD 124.375 per hour for on-demand usage in us-east-2. Needless to say, I didn’t proceed with this option. Option 2 has a similar cost, as it also recommends the same instance type.
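For completeness, if you did want to go the SageMaker JumpStart route programmatically rather than through the console, the deployment would look roughly like the sketch below. The model ID, IAM role, and request payload are assumptions – check the JumpStart catalog for the exact identifier and input schema in your region.

```python
# Rough sketch of deploying DeepSeek-R1 through SageMaker JumpStart (option 2).
# The model_id and request payload are assumptions -- check the JumpStart
# catalog for the exact identifier and input schema in your region.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="deepseek-llm-r1",                              # assumed JumpStart model ID
    role="arn:aws:iam::123456789012:role/MySageMakerRole",   # placeholder role ARN
)

# Deploys onto the same recommended (and expensive) ml.p5e.48xlarge instance.
predictor = model.deploy(
    instance_type="ml.p5e.48xlarge",
    accept_eula=True,   # some JumpStart models require accepting an EULA
)

# Quick smoke test against the endpoint (payload format is an assumption).
print(predictor.predict({"inputs": "How many r's are in the word strawberry?"}))

# The endpoint bills by the hour, so tear it down when you are done.
predictor.delete_endpoint()
```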
CPU Option
Instead of using a GPU, I went with the CPU-only option. The cheapest instance type with at least 768 GB of RAM – the recommended minimum for running the full model – is r5a.24xlarge:
| Instance name | r5a.24xlarge |
| --- | --- |
| On-demand hourly rate | $5.424 (USD) |
| vCPU | 96 |
| Memory | 768 GiB |
| Storage | EBS only |
| Network performance | 20 Gigabit |
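If you prefer to script the setup, launching this instance with boto3 looks roughly like the sketch below. The AMI ID, key pair, and security group are placeholders to replace with your own, and the root volume is sized to comfortably hold the 404 GB model download since the instance is EBS-only.

```python
# Rough sketch of launching the r5a.24xlarge with boto3. The AMI ID, key pair
# and security group are placeholders -- substitute your own values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",     # placeholder: any recent 64-bit Linux AMI
    InstanceType="r5a.24xlarge",         # 96 vCPU, 768 GiB RAM
    KeyName="my-key-pair",               # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        # The instance is EBS-only, so size the root volume to hold the
        # 404 GB model plus the OS with some headroom.
        "DeviceName": "/dev/xvda",
        "Ebs": {"VolumeSize": 600, "VolumeType": "gp3"},
    }],
)
print(response["Instances"][0]["InstanceId"])
```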
Running Ollama, the full model (404 GB) took around 35 minutes to download; it appears there was some throttling after the first 5 minutes. Loading the model into memory took another 8 minutes before it could be used. After that I asked it the classic strawberry question. So how did it perform? (The following video is captured in real time.)
Not great. Token output is about 0.5-1 token/s.
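If you want to reproduce the measurement, a minimal sketch is to send the same prompt to Ollama’s local HTTP API and compute tokens per second from the stats in the final streamed chunk. This assumes the default address (localhost:11434) and the deepseek-r1:671b model tag.

```python
# Minimal sketch: send the prompt to the local Ollama server and compute
# output tokens/s from the stats Ollama includes in the final streamed chunk.
# Assumes the default address (localhost:11434) and the deepseek-r1:671b tag.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:671b",
        "prompt": "How many r's are in the word strawberry?",
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        # eval_duration is reported in nanoseconds
        tps = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)
        print(f"\n\nOutput speed: {tps:.2f} tokens/s")
```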
It runs, but the performance is hardly usable for any real-world purpose. Then again, do you really need a 671B-parameter model for your problem? Maybe you do if you are doing research or tackling problems that require deep understanding. For the common use cases out there, the 32B or smaller distilled models will probably be fine – and those require far fewer resources to run.
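To put that in numbers, here is a quick back-of-envelope calculation from the on-demand hourly rate and the observed output speed:

```python
# Back-of-envelope cost per 1,000 output tokens on the r5a.24xlarge,
# using the on-demand rate and the throughput observed above.
hourly_rate = 5.424                      # USD, us-east-2 on-demand
for tokens_per_sec in (0.5, 1.0):
    seconds_per_1k = 1000 / tokens_per_sec
    cost_per_1k = hourly_rate * seconds_per_1k / 3600
    print(f"{tokens_per_sec} tok/s -> ${cost_per_1k:.2f} per 1K output tokens")
```

At roughly USD 1.50–3.00 per thousand output tokens, the economics alone make this setup hard to justify for anything beyond experimentation.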
Conclusion
Running the full DeepSeek R1 model in the cloud is certainly possible, but practicality is another matter. The GPU option, while powerful, comes at an eye-watering cost. The cheapest CPU option, though, suffers from performance issues that make it nearly unusable for real-world applications.
So is it worth running the full 671B model yourself? Unless you have a specific need for such a massive model, it’s likely overkill. For most practical applications, the distilled 32B or smaller versions offer a much more reasonable balance between cost and performance.
Ultimately, while self-hosting DeepSeek R1 gives you full control, the trade-offs in cost and speed mean that for most users, cloud-based API access or a smaller model may be the better choice.