
Predicted point of regard ~10x bigger on demo #20

Open · rogeriochaves opened this issue Jan 24, 2021 · 6 comments

@rogeriochaves commented Jan 24, 2021

Hello there!

For some reason the predicted PoR is way off screen. To try to debug it, I ran the person calibration again on an already trained network, then saved the gaze_n_vector variable used during training and the g_cnn variable used during prediction in frame_processor.py. If I plot them separately I get this:

[image: gaze_n_vector and g_cnn plotted separately]

Leaving the clear error aside, if I plot them together I get this:

[image: gaze_n_vector and g_cnn plotted together]

Now, if I fit a linear regression, I get a coefficient of almost exactly 0.1 for both axes:

[image: linear regression fits, coefficient ≈ 0.1 on both axes]

Applying those coefficients, I get a prediction that makes more sense:

[image: corrected predictions after applying the coefficients]

Why is that? Is some part of the calculation missing during prediction in frame_processor.py? Why is the PoR always 10x bigger?
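
For reference, here is roughly how I recovered that 0.1 coefficient (a minimal sketch; the .npy file names are made up, the arrays are the ones I saved from frame_processor.py):

    import numpy as np

    # gaze_n_vector: calibration targets saved during training,
    # g_cnn: network predictions saved during prediction; both shaped (N, 2)
    gaze_n_vector = np.load('gaze_n_vector.npy')
    g_cnn = np.load('g_cnn.npy')

    for axis, name in zip((0, 1), ('x', 'y')):
        # fit target = coef * prediction + intercept, per axis
        coef, intercept = np.polyfit(g_cnn[:, axis], gaze_n_vector[:, axis], 1)
        print(f'{name}: coef = {coef:.3f}, intercept = {intercept:.1f}')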

@shalinidemello (Collaborator) commented:
Did you calibrate your camera to obtain its intrinsic parameters and, more importantly, the extrinsic parameters (rotation, translation) between your camera and your monitor, as described in step 2b of https://github.com/NVlabs/few_shot_gaze/blob/master/demo/README.md?

@rogeriochaves (Author) commented Feb 9, 2021

Thanks for replying!
No, to be honest I skipped this step at first, but I have now tried to follow it through again. I couldn't get the newer Ver. 2 to work, neither the Matlab version (I'm using Octave) nor the C version. But I did get the Ver. 1 Matlab version to work (I think?), although I don't know what to do with the outputs:

Average reprojection error by TNM : 0.169550 pixel.

==== Parameters by TNM ====
R =

  -0.914077  -0.030128   0.404420
  -0.020261   0.999384   0.028656
  -0.405034   0.018000  -0.914124

T =

   285.23
   100.70
   159.83

n1 =

   0.333519
   0.048892
  -0.941475

n2 =

   0.484883
   0.098207
  -0.869048

n3 =

   0.047996
   0.011300
  -0.998784

d1 = 351.78
d2 = 318.01
d3 = 377.73
points =

   285.234   100.701   159.834
   125.271    97.155    88.953
   282.221   200.639   161.634
   285.234   100.701   159.834

points =

    84.2177    71.2328   727.2730     1.0000
   -84.5563    66.3955   681.2629     1.0000
    79.7462   170.9574   733.1905     1.0000
    84.2177    71.2328   727.2730     1.0000

points =

   -32.1676    36.4152   728.7079     1.0000
  -176.3115    36.0734   629.4738     1.0000
   -41.7646   135.0200   742.3086     1.0000
   -32.1676    36.4152   728.7079     1.0000

points =

   262.8758    95.4368   625.1049     1.0000
    96.8575    90.4655   680.2247     1.0000
   259.9411   195.3936   625.2806     1.0000
   262.8758    95.4368   625.1049     1.0000

points =

   184.726    85.967   443.553
    20.357    81.775   385.108
    16.615   181.607   388.967
   180.984   185.798   447.412
   184.726    85.967   443.553

points =

   126.533    68.558   444.271
   -25.520    66.614   359.213
   -31.825   165.886   366.914
   120.228   167.830   451.971
   126.533    68.558   444.271

points =

   274.055    98.069   392.469
   111.064    93.810   384.589
   108.090   193.758   385.577
   271.081   198.016   393.457
   274.055    98.069   392.469

How should I tweak monitor.py based on this? Can you give me an example? It's a lot of numbers, and the documentation is not clear.

FYI, I'm using a MacBook Pro webcam, if that makes things simpler.

@swook (Contributor) commented Feb 14, 2021

I am not 100% sure, but I imagine that you need to update https://github.com/NVlabs/few_shot_gaze/blob/master/demo/monitor.py#L28 (and its inverse) based on these values that you determined:

T =

   285.23
   100.70
   159.83

I imagine that this is the translation from the screen to camera coordinate system in millimeters.

So for example, you could probably define (yet again, I'm not 100% sure):

    def monitor_to_camera(self, x_pixel, y_pixel):

        # 285.23 / 100.7 / 159.83 are the components of your T,
        # i.e. the screen-to-camera translation in mm
        x_cam_mm = 285.23 + ((int(self.w_pixels/2) - x_pixel)/self.w_pixels) * self.w_mm
        y_cam_mm = 100.7 + (y_pixel/self.h_pixels) * self.h_mm
        z_cam_mm = 159.83

        return x_cam_mm, y_cam_mm, z_cam_mm

and a corresponding camera_to_monitor
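
For completeness, a corresponding camera_to_monitor could simply invert the mapping above (same caveats, still not 100% sure about the convention):

    def camera_to_monitor(self, x_cam_mm, y_cam_mm):

        # invert monitor_to_camera: camera-space mm back to monitor pixels
        x_mon_pixel = int(self.w_pixels/2) - ((x_cam_mm - 285.23) / self.w_mm) * self.w_pixels
        y_mon_pixel = ((y_cam_mm - 100.7) / self.h_mm) * self.h_pixels

        return x_mon_pixel, y_mon_pixel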

@zwfcrazy commented Apr 13, 2021

> [quoting @swook's suggested monitor_to_camera above]

Not exactly.
The code in the following places:

    d = -gaze_cam_origin[2] / g_cam_forward[2]   (two occurrences)

    def monitor_to_camera(self, x_pixel, y_pixel):

    def camera_to_monitor(self, x_cam_mm, y_cam_mm):

assumes that the z axis of the camera and the z axis of the monitor are parallel, and that there is no translation in the z direction, i.e. z = 0.
However, from the R and T given by @rogeriochaves, it can be seen that neither of the two assumptions holds.
In order to correctly apply the calibration results, you need to:

  1. apply a full coordinate transformation in monitor.py, using not only the translation vector T but also the rotation matrix R;
  2. change the way the PoR is calculated: instead of assuming z = 0, find the intersection between the gaze ray and the monitor plane (usually the xy plane of the monitor). A sketch of both steps follows at the end of this comment.

BTW, the R and T given by the calibration process actually describe the relationship between the chessboard pattern displayed on the monitor and the camera, which may not equal the relationship between the monitor itself and the camera. You need to find the relationship between the chessboard pattern and the monitor as well.
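
A rough sketch of steps 1 and 2 (my own illustration, not code from the repo; it assumes R and T map monitor coordinates into camera coordinates, everything in mm):

    import numpy as np

    def intersect_gaze_with_monitor(gaze_cam_origin, g_cam_forward, R, T):
        o = np.asarray(gaze_cam_origin, dtype=float).reshape(3)  # gaze origin, camera coords
        g = np.asarray(g_cam_forward, dtype=float).reshape(3)    # gaze direction, camera coords
        R = np.asarray(R, dtype=float)
        T = np.asarray(T, dtype=float).reshape(3)

        # Monitor plane in camera coordinates: its normal is the monitor's
        # z axis rotated into camera space, and T is a point on the plane.
        n = R @ np.array([0.0, 0.0, 1.0])

        # Ray-plane intersection: o + d*g lies on the plane when
        # n . (o + d*g - T) = 0
        d = np.dot(n, T - o) / np.dot(n, g)
        p_cam = o + d * g

        # Express the intersection in monitor coordinates (z should be ~0)
        p_mon = R.T @ (p_cam - T)
        return p_mon[0], p_mon[1]  # x, y on the monitor plane, in mm

This replaces the z = 0 shortcut (d = -gaze_cam_origin[2] / g_cam_forward[2]) with a general ray-plane intersection.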

@ShreshthSaxena commented:
> [quoting @swook's suggestion and @zwfcrazy's explanation above]

So is the TNM monitor calibration needed for a default laptop webcam (where the assumptions of z = 0 and Δy = 10 mm fit)? I've got the model to run, but I'm wondering if there's some way to improve accuracy further through calibration.

@zwfcrazy commented May 1, 2021

> [quoting @ShreshthSaxena's question above]

Every laptop hardware configuration is different, but the assumption of z = 0 should be OK to use. However, you need to at least measure Δy and Δx with a ruler if you really don't want to do the calibration (Δy = the distance between the camera and the upper edge of the monitor; Δx = the distance between the camera and the left edge of the monitor, usually equal to monitor width / 2). An example follows below.
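
For example, plugging measured offsets into monitor.py's monitor_to_camera might look like this (a sketch only; the 160.0 and 10.0 are hypothetical measurements in mm, and the sign convention mirrors @swook's version above):

    dx_mm = 160.0  # measured: camera to the monitor's left edge (~ monitor width / 2)
    dy_mm = 10.0   # measured: camera to the monitor's upper edge

    def monitor_to_camera(self, x_pixel, y_pixel):

        # keep z = 0: camera and monitor planes assumed parallel
        x_cam_mm = dx_mm - (x_pixel / self.w_pixels) * self.w_mm
        y_cam_mm = dy_mm + (y_pixel / self.h_pixels) * self.h_mm
        z_cam_mm = 0.0

        return x_cam_mm, y_cam_mm, z_cam_mm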

However, a good calibration won't help improve accuracy in this case. I believe the accuracy is limited by the image resolution: I did an experiment, and it turned out that you almost cannot recognize the eye movement in images taken for two target points that are less than 2 cm apart on the screen. Increasing the image resolution might be a solution, but it would also increase the complexity of the neural network, and you would need to build a high-resolution training dataset as well. So I think this remains an open problem.
