138 Commits (0ed72befe18e086ac160f1a55aa69b37c928ebb9)

Author SHA1 Message Date
comfyanonymous 0ed72befe1 Change log levels. Logging level now defaults to info. --verbose sets it to debug. (see sketch below) 12 months ago
comfyanonymous 65397ce601 Replace prints with logging and add --verbose argument. 12 months ago
comfyanonymous dce3555339 Add some Tesla Pascal GPUs to the fp16 working-but-slower list. 12 months ago
comfyanonymous 88f300401c Enable fp16 by default on mps. 1 year ago
comfyanonymous 929e266f3e Manual cast for bf16 on older GPUs. 1 year ago
comfyanonymous 0b3c50480c Make --force-fp32 disable loading models in bf16. 1 year ago
comfyanonymous f83109f09b Stable Cascade Stage C. 1 year ago
comfyanonymous aeaeca10bd Small refactor of is_device_* functions. 1 year ago
comfyanonymous 66e28ef45c Don't use is_bf16_supported to check for fp16 support (see sketch below). 1 year ago
comfyanonymous 24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 1 year ago
comfyanonymous 4b0239066d Always use fp16 for the text encoders. 1 year ago
comfyanonymous f9e55d8463 Only auto enable bf16 VAE on nvidia GPUs that actually support it. 1 year ago
comfyanonymous 1b103e0cb2 Add argument to run the VAE on the CPU. 1 year ago
comfyanonymous e1e322cf69 Load weights that can't be lowvramed to target device. 1 year ago
comfyanonymous a252963f95 --disable-smart-memory now unloads everything like it did originally. 1 year ago
comfyanonymous 36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate. Let me know if this breaks anything. (see sketch below) 1 year ago
comfyanonymous 2f9d6a97ec Add --deterministic option to make PyTorch use deterministic algorithms (see sketch below). 1 year ago
comfyanonymous b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 1 year ago
comfyanonymous ba07cb748e Use faster manual cast for fp8 in unet. 1 year ago
comfyanonymous 57926635e8 Switch text encoder to manual cast. Use fp16 text encoder weights for CPU inference to lower memory usage. 1 year ago
comfyanonymous 340177e6e8 Disable non-blocking on MPS. 1 year ago
comfyanonymous 9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of CPU memory. 1 year ago
comfyanonymous 2db86b4676 Slightly faster lora applying. 1 year ago
comfyanonymous ca82ade765 Use .itemsize to get dtype size for fp8 (see sketch below). 1 year ago
comfyanonymous 31b0f6f3d8 UNET weights can now be stored in fp8. --fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats supported by PyTorch. (see sketch below) 1 year ago
comfyanonymous 0cf4e86939 Add some command line arguments to store text encoder weights in fp8. PyTorch supports two variants of fp8: --fp8_e4m3fn-text-enc (the one that seems to give better results) and --fp8_e5m2-text-enc. (see sketch below) 1 year ago
comfyanonymous 7339479b10 Disable xformers when it can't load properly (see sketch below). 1 year ago
comfyanonymous dd4ba68b6e Allow different models to estimate memory usage differently. 1 year ago
comfyanonymous 8594c8be4d Empty the cache when the torch cache exceeds 25% of free memory (see sketch below). 1 year ago
comfyanonymous c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 1 year ago
comfyanonymous fd4c5f07e7 Add a --bf16-unet to test running the unet in bf16. 1 year ago
comfyanonymous 9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 1 year ago
comfyanonymous 88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 1 year ago
comfyanonymous 20d3852aa1 Pull some small changes from the other repo. 1 year ago
Simon Lui eec449ca8e Allow Intel GPUs to do LoRA casting on GPU since they support BF16 natively. 1 year ago
comfyanonymous 1cdfb3dba4 Only do the cast on the device if the device supports it. 1 year ago
comfyanonymous 321c5fa295 Enable pytorch attention by default on xpu. 1 year ago
comfyanonymous 0966d3ce82 Don't run text encoders on xpu because there are issues. 1 year ago
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 1 year ago
Simon Lui 4a0c4ce4ef Some fixes to generalize CUDA specific functionality to Intel or other GPUs. 1 year ago
comfyanonymous b8c7c770d3 Enable bf16-vae by default on ampere and up. 2 years ago
comfyanonymous a57b0c797b Fix lowvram model merging. 2 years ago
comfyanonymous f72780a7e3 The new smart memory management makes this unnecessary. 2 years ago
comfyanonymous 30eb92c3cb Code cleanups. 2 years ago
comfyanonymous 51dde87e97 Try to free enough vram for control lora inference. 2 years ago
comfyanonymous cc44ade79e Always shift text encoder to GPU when the device supports fp16. 2 years ago
comfyanonymous a6ef08a46a Even with forced fp16 the cpu device should never use it. 2 years ago
comfyanonymous f081017c1a Save memory by storing text encoder weights in fp16 in most situations. Do inference in fp32 to make sure quality stays the exact same. (see sketch below) 2 years ago
comfyanonymous 0d7b0a4dc7 Small cleanups. 2 years ago
Simon Lui 9225465975 Further tuning and fix mem_free_total. 2 years ago
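
Sketches for a few of the commits above. These are illustrative reconstructions under stated assumptions, not the repository's actual code; any helper names are hypothetical.

Re 0ed72befe1 / 65397ce601 (logging): a minimal sketch of defaulting to info and letting --verbose lower the threshold to debug, using Python's standard argparse and logging modules:

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true", help="Enable debug logging.")
args = parser.parse_args()

# Default to INFO; --verbose lowers the threshold to DEBUG.
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
logging.debug("Only visible with --verbose.")
logging.info("Visible by default.")
```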
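
Re 66e28ef45c: torch.cuda.is_bf16_supported() answers a different question than "does this card run fp16 well"; bf16 needs newer hardware, so gating fp16 on it misclassifies older cards that handle fp16 fine. A sketch of checking compute capability instead (the project's exact cutoff may differ):

```python
import torch

if torch.cuda.is_available():
    # Wrong gate: bf16 support is a stricter requirement than fp16 support,
    # so this would disable fp16 on e.g. Turing cards that handle it fine.
    # fp16_ok = torch.cuda.is_bf16_supported()

    # Closer: decide from the compute capability (hypothetical cutoff).
    major, minor = torch.cuda.get_device_capability()
    fp16_ok = (major, minor) >= (6, 0)
```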
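
Re 36a7953142: lowvram mode previously went through accelerate. The underlying idea, keep weights in system RAM and move each submodule to the GPU only for its own forward pass, can be sketched generically with forward hooks (the concept, not the project's implementation):

```python
import torch

def attach_lowvram_hooks(model: torch.nn.Module, device: str = "cuda"):
    # Keep weights in system RAM; shuttle each child module to the GPU only
    # for the duration of its own forward pass, then evict it again.
    def pre_hook(module, args):
        module.to(device)
        return tuple(a.to(device) if torch.is_tensor(a) else a for a in args)

    def post_hook(module, args, output):
        module.to("cpu")
        return output

    for child in model.children():
        child.register_forward_pre_hook(pre_hook)
        child.register_forward_hook(post_hook)
```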
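
Re 2f9d6a97ec: PyTorch's standard switch for this is torch.use_deterministic_algorithms(), presumably what --deterministic toggles:

```python
import torch

# Prefer deterministic kernels; ops without a deterministic implementation
# raise instead of silently varying between runs.
torch.use_deterministic_algorithms(True)

# Per the PyTorch reproducibility notes, some CUDA ops additionally need the
# environment variable CUBLAS_WORKSPACE_CONFIG=:4096:8 to be set.
```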
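
Re ca82ade765: hardcoded byte-size tables break once fp8 dtypes appear; torch.dtype.itemsize (available in recent PyTorch) covers every dtype uniformly:

```python
import torch

def dtype_size(dtype: torch.dtype) -> int:
    # Bytes per element, with no lookup table to keep in sync.
    return dtype.itemsize

for dt in (torch.float32, torch.float16, torch.float8_e4m3fn):
    print(dt, dtype_size(dt))  # 4, 2, 1
```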
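
Re 31b0f6f3d8 and ba07cb748e: fp8 here is a storage format; since general fp8 matmuls aren't available, each layer upcasts its weight to the compute dtype just before use. A minimal sketch of that split (fp32 compute so it runs on CPU; on GPU the compute dtype would typically be fp16 or bf16):

```python
import torch

def manual_cast_linear(x, weight_fp8, bias=None, compute_dtype=torch.float32):
    # Weights are *stored* in fp8 (1 byte/element); the cast to the compute
    # dtype happens just-in-time, one layer at a time.
    return torch.nn.functional.linear(
        x.to(compute_dtype), weight_fp8.to(compute_dtype), bias)

w = torch.randn(64, 64).to(torch.float8_e4m3fn)  # stored format
y = manual_cast_linear(torch.randn(1, 64), w)
```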
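
Re 0cf4e86939: the two fp8 variants split their bits differently; e4m3fn spends more on mantissa (precision), e5m2 more on exponent (range), which is consistent with e4m3fn giving better results for storing weights. Their properties can be inspected directly:

```python
import torch

for dt in (torch.float8_e4m3fn, torch.float8_e5m2):
    fi = torch.finfo(dt)
    # e4m3fn: smaller range, finer precision; e5m2: the reverse.
    print(dt, "max:", fi.max, "eps:", fi.eps)
```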
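
Re 7339479b10: the usual pattern for "disable when it can't load properly" is to attempt the import once at startup and fall back on any failure, since broken builds can raise more than ImportError; a sketch:

```python
try:
    import xformers
    import xformers.ops
    XFORMERS_AVAILABLE = True
except Exception:
    # Broken installs can fail with errors other than ImportError, so catch
    # broadly and fall back to another attention implementation.
    XFORMERS_AVAILABLE = False
```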
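
Re 8594c8be4d: torch.cuda.empty_cache() returns cached blocks to the driver but slows later allocations, so it only pays off when the cache holds a large share of what's left. A sketch of the 25% threshold, assuming the standard CUDA allocator statistics:

```python
import torch

def maybe_empty_cache(device=None):
    if not torch.cuda.is_available():
        return
    # Memory the caching allocator holds but isn't actively using:
    cached = (torch.cuda.memory_reserved(device)
              - torch.cuda.memory_allocated(device))
    free_device, _total = torch.cuda.mem_get_info(device)
    # Release the cache only when it exceeds 25% of total free memory.
    if (free_device + cached) > 0 and cached / (free_device + cached) > 0.25:
        torch.cuda.empty_cache()
```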
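
Re f081017c1a: storing text encoder weights in fp16 halves their memory, while upcasting to fp32 at inference time keeps the compute path the same as an all-fp32 run; in miniature:

```python
import torch

weight = torch.randn(768, 768).half()  # stored in fp16: half the memory
x = torch.randn(1, 768)

# Upcast per-op: only the one-time rounding of the weights to fp16 differs
# from a pure-fp32 pipeline; the arithmetic itself runs in fp32.
out = x @ weight.float().t()
```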