Abstract: Transferring visual language models (VLMs) from the image domain to the video domain has recently yielded great success on human action recognition tasks. However, standard recognition ...
Abstract: Human-centric instructional videos provide opportunities for users to learn real-world multistep tasks, such as cooking, makeup, and using professional tools. However, these lengthy videos ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results