Different output depending on if points are added incrementally or all at once #526

Zarxrax · 2025-01-11T20:14:02Z

In the video predictor, if I add one point on a frame, and then add a second point,

        input_points = np.array([[965, 423]], dtype=np.float32)
        input_labels = np.array([1], np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)
        
        input_points = np.array([[965, 423], [928, 472]], dtype=np.float32)
        input_labels = np.array([1, 1], np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)

Versus just loading both points at once

        input_points = np.array([[965, 423], [928, 472]], dtype=np.float32)
        input_labels = np.array([1, 1], np.int32)
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state, frame_idx=0, obj_id=object_id, points=input_points, labels=input_labels)

I get different outputs!

The result is just a little different here, but I have seen vastly different results as well.

I would have expected them to both produce the same output since the points are the same.
I would normally add points incrementally when I am working on it interactively. But if I want to reload those points at a later time, it's much easier to go with the second option and load all of them together, rather than iterating through and adding one more point to the list each time.

I guess I am wondering if this difference in behavior is expected? And I am also wondering if it will be much slower to load points incrementally for a large number of points?


aenoca · 2025-01-13T16:39:00Z

This is expected, due to the way prompts are fed into the model. If multiple clicks are given to the model at the same time, they are processed together and produce a single prompt token that is stored statically in the memory buffer. If the prompts are instead given one after the other, each prompt produces a separate prompt token that is added to the memory.
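If the goal is to reproduce the interactive result when reloading saved points, one option is to replay them as growing prefixes, mirroring the incremental calls in the original post. Below is a minimal sketch of that idea. `replay_points_incrementally` is a hypothetical helper, not part of the library, and the `add_new_points_or_box` keyword arguments are copied from the snippets above rather than checked against any particular release.

```python
import numpy as np


def replay_points_incrementally(predictor, inference_state, frame_idx, obj_id, points, labels):
    """Replay saved clicks one at a time (hypothetical helper, not a library function).

    Calls predictor.add_new_points_or_box once per saved point, each time passing
    the first n points, and returns the result of the final call.
    """
    points = np.asarray(points, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.int32)
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    result = None
    for n in range(1, len(points) + 1):
        # Pass the first n points, as in the incremental snippet in the original post.
        result = predictor.add_new_points_or_box(
            inference_state=inference_state,
            frame_idx=frame_idx,
            obj_id=obj_id,
            points=points[:n],
            labels=labels[:n],
        )
    return result
```

Note that this makes one call per saved point, so reloading N points costs N calls instead of one. Whether that is noticeably slower for a large number of points would need to be measured.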
