wan2.1_i2v_720p_14B_fp16.safetensors

The model file wan2.1_i2v_720p_14B_fp16.safetensors is a high-fidelity image-to-video (I2V) diffusion model based on the Wan 2.1 architecture. It is designed to generate 720p videos and requires substantial hardware: with 14 billion parameters stored in FP16 (half precision, 2 bytes per parameter), the weights alone occupy roughly 28 GB.
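The memory figure follows from simple arithmetic: FP16 stores each weight in 2 bytes, so 14 billion parameters need about 28 GB (roughly 26.1 GiB) before activations or any separately stored encoder or VAE weights are counted. The sketch below, using only the Python standard library, checks this against an actual file by reading the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor dtypes and shapes) without loading any weights. The function names are hypothetical helpers for illustration, not part of the `safetensors` library.

```python
import json
import struct

# Bytes per element for common safetensors dtype strings.
DTYPE_BYTES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "F8_E4M3": 1, "F8_E5M2": 1,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1,
    "U64": 8, "U32": 4, "U16": 2, "U8": 1, "BOOL": 1,
}


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def summarize(path):
    """Count parameters, total weight bytes, and parameters per dtype."""
    params, nbytes, per_dtype = 0, 0, {}
    for info in read_safetensors_header(path).values():
        count = 1
        for dim in info["shape"]:
            count *= dim
        params += count
        nbytes += count * DTYPE_BYTES[info["dtype"]]
        per_dtype[info["dtype"]] = per_dtype.get(info["dtype"], 0) + count
    return params, nbytes, per_dtype


def fp16_weight_gib(num_params):
    """Weight memory in GiB when every parameter is stored as FP16 (2 bytes)."""
    return num_params * 2 / 1024**3
```

For example, `fp16_weight_gib(14e9)` gives about 26.1 GiB, and `summarize("wan2.1_i2v_720p_14B_fp16.safetensors")` (assuming the file is available locally) reports the exact parameter count and byte total for that checkpoint, which may differ slightly from the nominal "14B".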

Wan 2.1 is a cutting-edge, open-source video foundation model developed by Alibaba's Wan-AI team. Released in early 2025, this 14-billion-parameter variant specializes in image-to-video (I2V) generation, transforming a static image into a high-definition 720p video with realistic physics and complex motion.

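On a single A100, generating a 4-second 720p video at 24 fps (96 frames) takes approximately 12 to 18 minutes; on two RTX 4090s, expect 25 to 30 minutes. Actual times depend on the sampler and the number of sampling steps. Note that Wan 2.1's reference code samples with flow matching (a UniPC solver by default) rather than DDIM.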
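Why 720p generation is this slow can be sketched by estimating the transformer's sequence length per denoising step. This is a rough model under stated assumptions, not figures from the source: it assumes Wan 2.1's video VAE keeps the first frame and compresses the remaining frames 4x in time and 8x in each spatial dimension (so frame counts take the form 4n+1), and that the diffusion transformer then groups latents into 1x2x2 patches. `frame_count`, `snap_to_4n_plus_1`, and `latent_tokens` are hypothetical helpers.

```python
def frame_count(seconds, fps):
    """Number of frames in a clip of the given duration."""
    return round(seconds * fps)


def snap_to_4n_plus_1(frames):
    """Round down to the nearest 4n+1 frame count (assumed VAE requirement)."""
    return ((frames - 1) // 4) * 4 + 1


def latent_tokens(frames, height, width, t_stride=4, s_stride=8, patch=(1, 2, 2)):
    """Estimate transformer tokens for one clip.

    Assumption: a causal video VAE keeps the first frame and compresses the
    rest by t_stride in time and s_stride in height and width, after which
    latents are split into patches of shape `patch` (time, height, width).
    """
    lat_t = (frames - 1) // t_stride + 1
    lat_h = height // s_stride
    lat_w = width // s_stride
    return (lat_t // patch[0]) * (lat_h // patch[1]) * (lat_w // patch[2])
```

Under these assumptions, the 96-frame clip would be snapped to 93 frames, giving `latent_tokens(93, 720, 1280)` = 24 × 45 × 80 = 86,400 tokens at 1280×720. Because self-attention cost grows roughly with the square of sequence length, modest increases in duration or resolution translate into disproportionately longer generation times.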