Paper: AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis, arXiv:2503.21904v1 [cs.CV] 27 Mar 2025
Authors: Zhiwei Yang 1, Chen Gao 2, Jing Liu 1†, Peng Wu 3, Guansong Pang 4, Mike Zheng Shou 2
Organisations: 1 Xidian University, 2 Show Lab, National University of Singapore, 3 Northwestern Polytechnical University, 4 Singapore Management University
Quotes:
"A potential future stealing event could occur, as a white male is kicking in doors."
and
"A stealing is taking place as two white males carry items from inside the house."
Reality, from the same paper:
Gemma 3:
```
Command being timed: "llama-mtmd-cli --temp 0.1 --prio -1 -ngl 999 --mmap -lv 3 -p You are given 10 consecutive still images extracted from the same short, seconds-long, continuous shot. Assume the camera is stable (as this shot had been clipped by 'scenedetect' from a longer film): the apparent motion comes from mostly the on-screen elements.
Task:
Is this person taking merchandise from the shelf and putting it in some bag? -c 12812 --repeat-last-n 256 --predict 1000 -m ...
Exit status: 0
Based on the provided images, it appears the person is not taking
merchandise from the shelf and putting it in a bag. They are standing near a
shelf and appear to be holding their hand up to their face, as if they are
picking something up from the shelf and examining it.
Here's a breakdown of why I believe this:
• Hand movements: The person's hand movements are consistent with picking
something up and examining it, not concealing an action of taking an item
and putting it in a bag.
• Body language: The person's body language is not indicative of someone
trying to hide an action.
• No clear evidence of theft: There's no clear evidence of an item being
taken from the shelf and placed into a bag.
Note: It's important to avoid making assumptions based on limited
information. It's best to avoid making assumptions about someone's actions
without clear evidence.
```
Reality:
