About launch timeout
We are now updating this Space to make the second-stage model available, but downloading and loading the second-stage model increases the launch time, and we are getting the following error:
Runtime error
launch timed out, space was not healthy after 30 min
Could you make the launch time limit longer?
Also, how much host memory is available in this Space? Using the second-stage model will increase memory usage too. As for GPU memory, I've confirmed that this app works with 24 GB of VRAM, so I think it's OK on an A10, but I'm not sure whether it will work with the current amount of host memory.
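One way to check how much host memory is visible from inside a running Space is a small Python sketch like the one below. It is not taken from the Space's code and assumes a Linux container; the cgroup file paths are assumptions about how the container is set up. Inside a container, the physical-RAM figure from `os.sysconf` usually reflects the whole machine, so the cgroup limit is the more relevant number when one is set.

```python
import os
from typing import Optional

GIB = 1024**3


def physical_memory_gib() -> float:
    """Total physical RAM reported by the OS, in GiB.

    Inside a container this is usually the host machine's RAM,
    not the container's own memory limit.
    """
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def cgroup_limit_gib() -> Optional[float]:
    """Container memory limit in GiB, or None if unset or unreadable.

    Assumes a Linux cgroup v2 ("memory.max") or cgroup v1
    ("memory.limit_in_bytes") layout; other setups return None.
    """
    for path in ("/sys/fs/cgroup/memory.max",
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file not present under this cgroup layout
        if raw == "max":  # cgroup v2 spelling for "no limit"
            return None
        return int(raw) / GIB
    return None


if __name__ == "__main__":
    print(f"physical RAM: {physical_memory_gib():.1f} GiB")
    limit = cgroup_limit_gib()
    print("container limit:", "none" if limit is None else f"{limit:.1f} GiB")
```

Note that both functions report GiB (1024^3 bytes), which is slightly larger than the decimal GB figures quoted in this thread.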
I increased the launch timeout, but you are right: the actual issue is running out of memory (OOM). This Space is assigned 46 GB of memory. How much memory do you think you need?
Is the high memory usage only at startup while loading the model, or does it also consume a lot of memory at runtime?
I updated the error message to reflect the OOM and increased the memory for the Space to 64 GB.
@chris-rannou
Thanks a lot!
How much memory do you think you need?
Before pushing it, I tested this app on a GCP A100 instance with 85 GB of RAM, so 85 GB is definitely sufficient, but I wasn't sure what the minimum requirement is. It seems to be working with 64 GB of host memory now. Thanks.
Is the high memory usage only at startup while loading the model, or does it also consume a lot of memory at runtime?
It consumes a lot of memory at runtime too. When I run the app on the instance mentioned above, it uses about 40-50 GB of memory.
The Space seems to stabilize at around 54 GB of memory, but with a few spikes that went beyond the 64 GB limit.
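Since the OOM here comes from short spikes rather than the steady-state level, a periodic memory sample can miss the moment that matters. A minimal sketch, assuming a Unix host where Python's `resource` module is available, is to log the process's peak resident set size (a high-water mark) after each stage. `log_peak` is a hypothetical helper, not part of the Space's code, and `RUSAGE_SELF` does not count memory used by child processes such as data-loader workers.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """High-water mark of this process's resident memory, in GiB.

    ru_maxrss is the largest RSS seen so far, so short spikes are
    captured even if the process later settles at a lower level.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Units differ by platform: kilobytes on Linux, bytes on macOS.
    divisor = 1024**3 if sys.platform == "darwin" else 1024**2
    return peak / divisor


def log_peak(stage: str) -> None:
    """Print the peak RSS reached up to the given stage (hypothetical helper)."""
    print(f"[{stage}] peak RSS so far: {peak_rss_gib():.2f} GiB")


if __name__ == "__main__":
    log_peak("startup")
    # Stand-in for model loading or inference: touch about 256 MiB.
    buf = b"x" * (256 * 1024**2)
    log_peak("after allocation")
```

Calling `log_peak` after model loading and after each inference request would show which stage pushes memory past the limit.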
Thanks for the info. I was encountering CUDA OOM errors when I ran the app with a larger batch size, but that's fixed now and it seems to be working. Thanks for your help.