104 Commits (ae3e4e9ad821c12b955f6b2343e6255e6d71eaf7)

Author SHA1 Message Date
Simon Lui eec449ca8e Allow Intel GPUs to cast LoRA weights on the GPU since they support BF16 natively. 1 year ago
comfyanonymous 1cdfb3dba4 Only do the cast on the device if the device supports it. 1 year ago
comfyanonymous 321c5fa295 Enable pytorch attention by default on xpu. 1 year ago
comfyanonymous 0966d3ce82 Don't run text encoders on xpu because there are issues. 1 year ago
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 1 year ago
Simon Lui 4a0c4ce4ef Some fixes to generalize CUDA specific functionality to Intel or other GPUs. 1 year ago
comfyanonymous b8c7c770d3 Enable bf16-vae by default on ampere and up. 2 years ago
comfyanonymous a57b0c797b Fix lowvram model merging. 2 years ago
comfyanonymous f72780a7e3 The new smart memory management makes this unnecessary. 2 years ago
comfyanonymous 30eb92c3cb Code cleanups. 2 years ago
comfyanonymous 51dde87e97 Try to free enough vram for control lora inference. 2 years ago
comfyanonymous cc44ade79e Always shift text encoder to GPU when the device supports fp16. 2 years ago
comfyanonymous a6ef08a46a Even with forced fp16 the cpu device should never use it. 2 years ago
comfyanonymous f081017c1a Save memory by storing text encoder weights in fp16 in most situations. Do inference in fp32 to make sure quality stays the exact same. 2 years ago
comfyanonymous 0d7b0a4dc7 Small cleanups. 2 years ago
Simon Lui 9225465975 Further tuning and fix mem_free_total. 2 years ago
Simon Lui 2c096e4260 Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes. 2 years ago
comfyanonymous e9469e732d --disable-smart-memory now disables loading model directly to vram. 2 years ago
comfyanonymous 3aee33b54e Add --disable-smart-memory for those that want the old behaviour. 2 years ago
comfyanonymous 2be2742711 Fix issue with regular torch version. 2 years ago
comfyanonymous 89a0767abf Smarter memory management. Try to keep models in vram when possible. Better lowvram mode for controlnets. 2 years ago
comfyanonymous 1ce0d8ad68 Add CMP 30HX card to the nvidia_16_series list. 2 years ago
comfyanonymous 4a77fcd6ab Only shift text encoder to vram when CPU cores are under 8. 2 years ago
comfyanonymous 3cd31d0e24 Lower CPU thread check for running the text encoder on the CPU vs GPU. 2 years ago
comfyanonymous 22f29d66ca Try to fix memory issue with lora. 2 years ago
comfyanonymous 4760c29380 Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI 2 years ago
comfyanonymous 18885f803a Add MX450 and MX550 to list of cards with broken fp16. 2 years ago
comfyanonymous ff6b047a74 Fix device print on old torch version. 2 years ago
comfyanonymous 1679abd86d Add a command line argument to enable backend:cudaMallocAsync 2 years ago
comfyanonymous 5f57362613 Lower lora ram usage when in normal vram mode. 2 years ago
comfyanonymous 490771b7f4 Speed up lora loading a bit. 2 years ago
KarryCharon 3e2309f149 Fix missing mps import 2 years ago
comfyanonymous 0ae81c03bb Empty cache after model unloading for normal vram and lower. 2 years ago
comfyanonymous e7bee85df8 Add arguments to run the VAE in fp16 or bf16 for testing. 2 years ago
comfyanonymous ddc6f12ad5 Disable autocast in unet for increased speed. 2 years ago
comfyanonymous 8d694cc450 Fix issue with OSX. 2 years ago
comfyanonymous dc9d1f31c8 Improvements for OSX. 2 years ago
comfyanonymous 2c4e0b49b7 Switch to fp16 on some cards when the model is too big. 2 years ago
comfyanonymous 6f3d9f52db Add a --force-fp16 argument to force fp16 for testing. 2 years ago
comfyanonymous 1c1b0e7299 --gpu-only now keeps the VAE on the device. 2 years ago
comfyanonymous 3b6fe51c1d Leave text_encoder on the CPU when it can handle it. 2 years ago
comfyanonymous b6a60fa696 Try to keep text encoders loaded and patched to increase speed. load_model_gpu() is now used with the text encoder models instead of just the unet. 2 years ago
comfyanonymous 97ee230682 Make highvram and normalvram shift the text encoders to vram and back. This is faster on big text encoder models than running them on the CPU. 2 years ago
comfyanonymous 62db11683b Move unet to device right after loading on highvram mode. 2 years ago
comfyanonymous 8248babd44 Use pytorch attention by default on nvidia when xformers isn't present. Add a new argument --use-quad-cross-attention 2 years ago
comfyanonymous f7edcfd927 Add a --gpu-only argument to keep and run everything on the GPU. Make the CLIP model work on the GPU. 2 years ago
comfyanonymous fed0a4dd29 Some comments to say what the vram state options mean. 2 years ago
comfyanonymous 0a5fefd621 Cleanups and fixes for model_management.py. Hopefully fix regression on MPS and CPU. 2 years ago
comfyanonymous 67892b5ac5 Refactor and improve model_management code related to free memory. 2 years ago
space-nuko 499641ebf1 More accurate total 2 years ago