138 Commits (0ed72befe18e086ac160f1a55aa69b37c928ebb9)

Author SHA1 Message Date
comfyanonymous 0ed72befe1 Change log levels. Logging level now defaults to info. --verbose sets it to debug. (see sketch below) 12 months ago
comfyanonymous 65397ce601 Replace prints with logging and add --verbose argument. 12 months ago
comfyanonymous dce3555339 Add some Tesla Pascal GPUs to the fp16 working-but-slower list. 12 months ago
comfyanonymous 88f300401c Enable fp16 by default on mps. 1 year ago
comfyanonymous 929e266f3e Manual cast for bf16 on older GPUs. 1 year ago
comfyanonymous 0b3c50480c Make --force-fp32 disable loading models in bf16. 1 year ago
comfyanonymous f83109f09b Stable Cascade Stage C. 1 year ago
comfyanonymous aeaeca10bd Small refactor of is_device_* functions. 1 year ago
comfyanonymous 66e28ef45c Don't use is_bf16_supported to check for fp16 support (see sketch below). 1 year ago
comfyanonymous 24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 1 year ago
comfyanonymous 4b0239066d Always use fp16 for the text encoders. 1 year ago
comfyanonymous f9e55d8463 Only auto enable bf16 VAE on nvidia GPUs that actually support it. 1 year ago
comfyanonymous 1b103e0cb2 Add argument to run the VAE on the CPU. 1 year ago
comfyanonymous e1e322cf69 Load weights that can't be lowvramed to target device. 1 year ago
comfyanonymous a252963f95 --disable-smart-memory now unloads everything like it did originally. 1 year ago
comfyanonymous 36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate. Let me know if this breaks anything. (see sketch below) 1 year ago
comfyanonymous 2f9d6a97ec Add --deterministic option to make PyTorch use deterministic algorithms (see sketch below). 1 year ago
comfyanonymous b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 1 year ago
comfyanonymous ba07cb748e Use faster manual cast for fp8 in unet. 1 year ago
comfyanonymous 57926635e8 Switch text encoder to manual cast. Use fp16 text encoder weights for CPU inference to lower memory usage. 1 year ago
comfyanonymous 340177e6e8 Disable non-blocking on MPS. 1 year ago
comfyanonymous 9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of CPU memory. 1 year ago
comfyanonymous 2db86b4676 Slightly faster lora applying. 1 year ago
comfyanonymous ca82ade765 Use .itemsize to get dtype size for fp8 (see sketch below). 1 year ago
comfyanonymous 31b0f6f3d8 UNET weights can now be stored in fp8. --fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats supported by PyTorch. (see sketch below) 1 year ago
comfyanonymous 0cf4e86939 Add some command line arguments to store text encoder weights in fp8. PyTorch supports two variants of fp8: --fp8_e4m3fn-text-enc (the one that seems to give better results) and --fp8_e5m2-text-enc. (see sketch below) 1 year ago
comfyanonymous 7339479b10 Disable xformers when it can't load properly (see sketch below). 1 year ago
comfyanonymous dd4ba68b6e Allow different models to estimate memory usage differently. 1 year ago
comfyanonymous 8594c8be4d Empty the cache when the torch cache exceeds 25% of free memory (see sketch below). 1 year ago
comfyanonymous c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 1 year ago
comfyanonymous fd4c5f07e7 Add a --bf16-unet to test running the unet in bf16. 1 year ago
comfyanonymous 9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 1 year ago
comfyanonymous 88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 1 year ago
comfyanonymous 20d3852aa1 Pull some small changes from the other repo. 1 year ago
Simon Lui eec449ca8e Allow Intel GPUs to do LoRA casting on GPU since they support BF16 natively. 1 year ago
comfyanonymous 1cdfb3dba4 Only do the cast on the device if the device supports it. 1 year ago
comfyanonymous 321c5fa295 Enable pytorch attention by default on xpu. 1 year ago
comfyanonymous 0966d3ce82 Don't run text encoders on xpu because there are issues. 1 year ago
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 1 year ago
Simon Lui 4a0c4ce4ef Some fixes to generalize CUDA specific functionality to Intel or other GPUs. 1 year ago
comfyanonymous b8c7c770d3 Enable bf16-vae by default on ampere and up. 2 years ago
comfyanonymous a57b0c797b Fix lowvram model merging. 2 years ago
comfyanonymous f72780a7e3 The new smart memory management makes this unnecessary. 2 years ago
comfyanonymous 30eb92c3cb Code cleanups. 2 years ago
comfyanonymous 51dde87e97 Try to free enough vram for control lora inference. 2 years ago
comfyanonymous cc44ade79e Always shift text encoder to GPU when the device supports fp16. 2 years ago
comfyanonymous a6ef08a46a Even with forced fp16 the cpu device should never use it. 2 years ago
comfyanonymous f081017c1a Save memory by storing text encoder weights in fp16 in most situations. Do inference in fp32 to make sure quality stays the exact same. (see sketch below) 2 years ago
comfyanonymous 0d7b0a4dc7 Small cleanups. 2 years ago
Simon Lui 9225465975 Further tuning and fix mem_free_total. 2 years ago
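
Sketches for a few of the commits above. These are illustrative reconstructions under stated assumptions, not the repository's actual code; any helper names are hypothetical.

Re 0ed72befe1 / 65397ce601 (logging): a minimal sketch of defaulting to info and letting --verbose lower the threshold to debug, using Python's standard argparse and logging modules:

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true", help="Enable debug logging.")
args = parser.parse_args()

# Default to INFO; --verbose lowers the threshold to DEBUG.
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
logging.debug("Only visible with --verbose.")
logging.info("Visible by default.")
```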
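
Re 66e28ef45c: torch.cuda.is_bf16_supported() answers a different question than "does this card run fp16 well"; bf16 needs newer hardware, so gating fp16 on it misclassifies older cards that handle fp16 fine. A sketch of checking compute capability instead (the project's exact cutoff may differ):

```python
import torch

if torch.cuda.is_available():
    # Wrong gate: bf16 support is a stricter requirement than fp16 support,
    # so this would disable fp16 on e.g. Turing cards that handle it fine.
    # fp16_ok = torch.cuda.is_bf16_supported()

    # Closer: decide from the compute capability (hypothetical cutoff).
    major, minor = torch.cuda.get_device_capability()
    fp16_ok = (major, minor) >= (6, 0)
```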
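
Re 36a7953142: lowvram mode previously went through accelerate. The underlying idea, keep weights in system RAM and move each submodule to the GPU only for its own forward pass, can be sketched generically with forward hooks (the concept, not the project's implementation):

```python
import torch

def attach_lowvram_hooks(model: torch.nn.Module, device: str = "cuda"):
    # Keep weights in system RAM; shuttle each child module to the GPU only
    # for the duration of its own forward pass, then evict it again.
    def pre_hook(module, args):
        module.to(device)
        return tuple(a.to(device) if torch.is_tensor(a) else a for a in args)

    def post_hook(module, args, output):
        module.to("cpu")
        return output

    for child in model.children():
        child.register_forward_pre_hook(pre_hook)
        child.register_forward_hook(post_hook)
```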
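
Re 2f9d6a97ec: PyTorch's standard switch for this is torch.use_deterministic_algorithms(), presumably what --deterministic toggles:

```python
import torch

# Prefer deterministic kernels; ops without a deterministic implementation
# raise instead of silently varying between runs.
torch.use_deterministic_algorithms(True)

# Per the PyTorch reproducibility notes, some CUDA ops additionally need the
# environment variable CUBLAS_WORKSPACE_CONFIG=:4096:8 to be set.
```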
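
Re ca82ade765: hardcoded byte-size tables break once fp8 dtypes appear; torch.dtype.itemsize (available in recent PyTorch) covers every dtype uniformly:

```python
import torch

def dtype_size(dtype: torch.dtype) -> int:
    # Bytes per element, with no lookup table to keep in sync.
    return dtype.itemsize

for dt in (torch.float32, torch.float16, torch.float8_e4m3fn):
    print(dt, dtype_size(dt))  # 4, 2, 1
```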
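
Re 31b0f6f3d8 and ba07cb748e: fp8 here is a storage format; since general fp8 matmuls aren't available, each layer upcasts its weight to the compute dtype just before use. A minimal sketch of that split (fp32 compute so it runs on CPU; on GPU the compute dtype would typically be fp16 or bf16):

```python
import torch

def manual_cast_linear(x, weight_fp8, bias=None, compute_dtype=torch.float32):
    # Weights are *stored* in fp8 (1 byte/element); the cast to the compute
    # dtype happens just-in-time, one layer at a time.
    return torch.nn.functional.linear(
        x.to(compute_dtype), weight_fp8.to(compute_dtype), bias)

w = torch.randn(64, 64).to(torch.float8_e4m3fn)  # stored format
y = manual_cast_linear(torch.randn(1, 64), w)
```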
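
Re 0cf4e86939: the two fp8 variants split their bits differently; e4m3fn spends more on mantissa (precision), e5m2 more on exponent (range), which is consistent with e4m3fn giving better results for storing weights. Their properties can be inspected directly:

```python
import torch

for dt in (torch.float8_e4m3fn, torch.float8_e5m2):
    fi = torch.finfo(dt)
    # e4m3fn: smaller range, finer precision; e5m2: the reverse.
    print(dt, "max:", fi.max, "eps:", fi.eps)
```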
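
Re 7339479b10: the usual pattern for "disable when it can't load properly" is to attempt the import once at startup and fall back on any failure, since broken builds can raise more than ImportError; a sketch:

```python
try:
    import xformers
    import xformers.ops
    XFORMERS_AVAILABLE = True
except Exception:
    # Broken installs can fail with errors other than ImportError, so catch
    # broadly and fall back to another attention implementation.
    XFORMERS_AVAILABLE = False
```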
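
Re 8594c8be4d: torch.cuda.empty_cache() returns cached blocks to the driver but slows later allocations, so it only pays off when the cache holds a large share of what's left. A sketch of the 25% threshold, assuming the standard CUDA allocator statistics:

```python
import torch

def maybe_empty_cache(device=None):
    if not torch.cuda.is_available():
        return
    # Memory the caching allocator holds but isn't actively using:
    cached = (torch.cuda.memory_reserved(device)
              - torch.cuda.memory_allocated(device))
    free_device, _total = torch.cuda.mem_get_info(device)
    # Release the cache only when it exceeds 25% of total free memory.
    if (free_device + cached) > 0 and cached / (free_device + cached) > 0.25:
        torch.cuda.empty_cache()
```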
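
Re f081017c1a: storing text encoder weights in fp16 halves their memory, while upcasting to fp32 at inference time keeps the compute path the same as an all-fp32 run; in miniature:

```python
import torch

weight = torch.randn(768, 768).half()  # stored in fp16: half the memory
x = torch.randn(1, 768)

# Upcast per-op: only the one-time rounding of the weights to fp16 differs
# from a pure-fp32 pipeline; the arithmetic itself runs in fp32.
out = x @ weight.float().t()
```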