Milestone Systems Brings Advanced Video Intelligence to Cities Worldwide
Milestone Systems unveiled its Vision Language Model (VLM) in late 2025, bringing a new level of video intelligence to smart city applications. The technology helps cities understand and respond to traffic incidents more quickly and effectively by transforming real-world video footage into precise, actionable text summaries. Powered by artificial intelligence, the VLM converts complex visual information into clear descriptions of events, accelerating decision-making in control rooms, traffic management centers, and emergency services.
Milestone notes that the solution is available through an API, opening the door to a new generation of agentic smart city applications that can autonomously analyze situations and support coordinated responses by city services.
The Vision Language Model is powered by NVIDIA Cosmos Reason and has been further trained on more than 75,000 hours of collected video data. In collaboration with NVIDIA, Milestone leverages the latest advances in AI reasoning to help cities and organizations make better operational decisions, enhance traffic safety, and improve overall urban resilience. According to the company, the launch of the Vision Language Model marks a significant step forward in the evolution of video management platforms, positioning video not only as evidentiary material but as a key source of intelligence within modern smart city ecosystems.
More information about the solution is available on the company’s official website: https://www.milestonesys.com/company/news/press-releases/milestone-launches-vision-language-model