
Different output depending on if points are added incrementally or all at once #526

Open
Zarxrax opened this issue Jan 11, 2025 · 1 comment


Zarxrax commented Jan 11, 2025

In the video predictor, if I add one point on a frame and then add a second point:

        import numpy as np

        # Assumes `predictor`, `inference_state` and `object_id` were set up earlier.
        # Call 1: a single positive click.
        input_points = np.array([[965, 423]], dtype=np.float32)
        input_labels = np.array([1], dtype=np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)

        # Call 2: resend the first click together with a second positive click.
        input_points = np.array([[965, 423], [928, 472]], dtype=np.float32)
        input_labels = np.array([1, 1], dtype=np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)

Versus just loading both points at once:

        # A single call with both positive clicks.
        input_points = np.array([[965, 423], [928, 472]], dtype=np.float32)
        input_labels = np.array([1, 1], dtype=np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)

I get different outputs!

[Screenshots: input, incremental, together]

The result is just a little different here, but I have seen vastly different results as well.

I would have expected both to produce the same output, since the points are the same.
I normally add points incrementally when I am working interactively. But if I want to reload those points later, it's much easier to go with the second option and load them all together, rather than iterating through and adding one more point to the list each time.

I guess I am wondering whether this difference in behavior is expected. I am also wondering whether loading points incrementally will be much slower when there are a large number of points.
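For the reload case, one option is to replay the saved clicks the same way they were entered by hand. Below is a minimal sketch, assuming `predictor.add_new_points_or_box` behaves as in the snippets above (each call receives the full list of clicks so far). `replay_clicks_incrementally` is a hypothetical helper, not part of SAM 2. It should reproduce the interactive result only if the object's state was reset beforehand and inference is deterministic. It makes one predictor call per click, so N clicks cost N calls instead of one; how much slower that is in practice depends on the model and hardware.

```python
from typing import Any, Optional, Sequence


def replay_clicks_incrementally(
    predictor: Any,
    inference_state: Any,
    frame_idx: int,
    obj_id: int,
    points: Sequence,
    labels: Sequence,
) -> Optional[Any]:
    """Re-add saved clicks one at a time, as in an interactive session.

    Hypothetical helper. `points` and `labels` are meant to be the NumPy
    arrays built in the snippets above; any sliceable sequence works.
    Returns the output of the last call, or None if there are no clicks.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    out = None
    # Call k sends clicks 0..k-1, mirroring how the clicks were added by hand.
    for k in range(1, len(points) + 1):
        out = predictor.add_new_points_or_box(
            inference_state=inference_state,
            frame_idx=frame_idx,
            obj_id=obj_id,
            points=points[:k],
            labels=labels[:k],
        )
    return out
```

Usage, with `saved_points`/`saved_labels` standing in for however the clicks were stored: `_, out_obj_ids, out_mask_logits = replay_clicks_incrementally(predictor, inference_state, 0, object_id, saved_points, saved_labels)`.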


aenoca commented Jan 13, 2025

This is expected because of the way prompts are fed into the model. If multiple clicks are given to the model at the same time, they are processed together and produce a single prompt token that is stored statically in the memory buffer. If instead the clicks are given one after another, each one produces a separate prompt token that is added to the memory.
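The explanation above can be illustrated with a toy model. This is a sketch of the described behavior only, not SAM 2's actual code: the hypothetical `ToyMemory` class appends one "prompt token" per call, where a token is simply the mean of the clicks sent in that call, and the "output" is the mean of all stored tokens. The averaging is an arbitrary stand-in for the real prompt encoder and mask decoder. Sending the same two clicks incrementally versus together leaves different tokens in memory, so the outputs differ even though the final click list is identical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def prompt_token(clicks: Sequence[Point]) -> Point:
    """Toy 'prompt encoder': all clicks in one call collapse into one token (their mean)."""
    n = len(clicks)
    return (sum(x for x, _ in clicks) / n, sum(y for _, y in clicks) / n)


class ToyMemory:
    """Toy memory buffer: each call appends one token; the output reads all stored tokens."""

    def __init__(self) -> None:
        self.tokens: List[Point] = []

    def add_clicks(self, clicks: Sequence[Point]) -> Point:
        self.tokens.append(prompt_token(clicks))
        return self.output()

    def output(self) -> Point:
        # Stand-in for the mask prediction: the mean of every stored token.
        return prompt_token(self.tokens)


p1, p2 = (965.0, 423.0), (928.0, 472.0)

incremental = ToyMemory()
incremental.add_clicks([p1])
print(incremental.add_clicks([p1, p2]))  # (955.75, 435.25), from two stored tokens

together = ToyMemory()
print(together.add_clicks([p1, p2]))  # (946.5, 447.5), from one stored token
```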
