Question: Use cases for vision-language models on edge devices
So... I'm thinking about pitching an ASIC accelerator that can support vision-language models (VLMs) for edge devices. I'm curious whether there are any compelling use cases for running a VLM on the edge (as opposed to offloading VLM workloads to the cloud).
The immediate benefits I see to running VLMs on the edge are:
1. Privacy (no data leaves the device)
2. Latency (no round trip to send images/input data to a remote machine for inference)
Reason 1 is valid, but it lacks the "wow factor" as an argument for supporting VLMs specifically on edge devices, since it is workload-agnostic. Reason 2 is fair, but where would a latency-sensitive VLM application actually be needed?
Curious to hear folks' opinions on supporting VLMs for edge devices!