347 Commits (9ba440995a41f3da266012e014d1ad55ad91a032)

Author SHA1 Message Date
comfyanonymous 9ba440995a It's actually possible to torch.compile the unet now. 2 years ago
comfyanonymous 51d5477579 Add key to indicate checkpoint is v_prediction when saving. 2 years ago
comfyanonymous ff6b047a74 Fix device print on old torch version. 2 years ago
comfyanonymous 9871a15cf9 Enable --cuda-malloc by default on torch 2.0 and up. Add --disable-cuda-malloc to disable it. 2 years ago
comfyanonymous 55d0fca9fa --windows-standalone-build now enables --cuda-malloc 2 years ago
comfyanonymous 1679abd86d Add a command line argument to enable backend:cudaMallocAsync 2 years ago
comfyanonymous 3a150bad15 Only calculate randn in some samplers when it's actually being used. 2 years ago
comfyanonymous ee8f8ee07f Fix regression with ddim and uni_pc when batch size > 1. 2 years ago
comfyanonymous 3ded1a3a04 Refactor of sampler code to deal more easily with different model types. 2 years ago
comfyanonymous 5f57362613 Lower lora ram usage when in normal vram mode. 2 years ago
comfyanonymous 490771b7f4 Speed up lora loading a bit. 2 years ago
comfyanonymous 50b1180dde Fix CLIPSetLastLayer not reverting when removed. 2 years ago
comfyanonymous 6fb084f39d Reduce floating point rounding errors in loras. 2 years ago
comfyanonymous 91ed2815d5 Add a node to merge CLIP models. 2 years ago
comfyanonymous b2f03164c7 Prevent the clip_g position_ids key from being saved in the checkpoint. This is to make it match the official checkpoint. 2 years ago
comfyanonymous 46dc050c9f Fix potential tensors being on different devices issues. 2 years ago
comfyanonymous 606a537090 Support SDXL embedding format with 2 CLIP. 2 years ago
comfyanonymous 6ad0a6d7e2 Don't patch weights when multiplier is zero. 2 years ago
comfyanonymous d5323d16e0 latent2rgb matrix for SDXL. 2 years ago
comfyanonymous 0ae81c03bb Empty cache after model unloading for normal vram and lower. 2 years ago
comfyanonymous d3f5998218 Support loading clip_g from diffusers in CLIP Loader nodes. 2 years ago
comfyanonymous a9a4ba7574 Fix merging not working when model2 of model merge node was a merge. 2 years ago
comfyanonymous bb5fbd29e9 Merge branch 'condmask-fix' of https://github.com/vmedea/ComfyUI 2 years ago
comfyanonymous e7bee85df8 Add arguments to run the VAE in fp16 or bf16 for testing. 2 years ago
comfyanonymous 608fcc2591 Fix bug with weights when prompt is long. 2 years ago
comfyanonymous ddc6f12ad5 Disable autocast in unet for increased speed. 2 years ago
comfyanonymous 603f02d613 Fix loras not working when loading checkpoint with config. 2 years ago
comfyanonymous af7a49916b Support loading unet files in diffusers format. 2 years ago
comfyanonymous e57cba4c61 Add gpu variations of the sde samplers that are less deterministic but faster. 2 years ago
comfyanonymous f81b192944 Add logit scale parameter so it's present when saving the checkpoint. 2 years ago
comfyanonymous acf95191ff Properly support SDXL diffusers loras for unet. 2 years ago
mara c61a95f9f7 Fix size check for conditioning mask. The wrong dimensions were being checked: [1] and [2] are the image size, not [2] and [3]. This results in an out-of-bounds error if one of them actually matches. 2 years ago
comfyanonymous 8d694cc450 Fix issue with OSX. 2 years ago
comfyanonymous c3e96e637d Pass device to CLIP model. 2 years ago
comfyanonymous 5e6bc824aa Allow passing custom path to clip-g and clip-h. 2 years ago
comfyanonymous dc9d1f31c8 Improvements for OSX. 2 years ago
comfyanonymous 103c487a89 Cleanup. 2 years ago
comfyanonymous 2c4e0b49b7 Switch to fp16 on some cards when the model is too big. 2 years ago
comfyanonymous 6f3d9f52db Add a --force-fp16 argument to force fp16 for testing. 2 years ago
comfyanonymous 1c1b0e7299 --gpu-only now keeps the VAE on the device. 2 years ago
comfyanonymous ce35d8c659 Lower latency by batching some text encoder inputs. 2 years ago
comfyanonymous 3b6fe51c1d Leave text_encoder on the CPU when it can handle it. 2 years ago
comfyanonymous b6a60fa696 Try to keep text encoders loaded and patched to increase speed. load_model_gpu() is now used with the text encoder models instead of just the unet. 2 years ago
comfyanonymous 97ee230682 Make highvram and normalvram shift the text encoders to vram and back. This is faster on big text encoder models than running it on the CPU. 2 years ago
comfyanonymous 5a9ddf94eb LoraLoader node now caches the lora file between executions. 2 years ago
comfyanonymous 9920367d3c Fix embeddings not working with --gpu-only 2 years ago
comfyanonymous 62db11683b Move unet to device right after loading on highvram mode. 2 years ago
comfyanonymous 4376b125eb Remove useless code. 2 years ago
comfyanonymous 89120f1fbe This is unused but it should be 1280. 2 years ago
comfyanonymous 2c7c14de56 Support for SDXL text encoder lora. 2 years ago