Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize video sequence data fetch #67

Open
kevinsbarnard opened this issue Dec 13, 2023 · 9 comments
Open

Optimize video sequence data fetch #67

kevinsbarnard opened this issue Dec 13, 2023 · 9 comments
Assignees
Labels
core Changes to the application core enhancement New feature or request

Comments

@kevinsbarnard
Copy link
Collaborator

Fetching the full information for a video sequence by name via Vampire Squid (v1/videosequences/name/{name}) is very slow and bogs down the load process. We should optimize this where possible.

Potential solutions:

  1. Lazy-load the video data for annotations that don't need it during the query process
  2. Explore modifications to Vampire Squid (look at query, create specialized endpoint)
  3. Run a direct SQL query for the relevant info
@kevinsbarnard kevinsbarnard added enhancement New feature or request core Changes to the application core labels Dec 13, 2023
@kevinsbarnard kevinsbarnard self-assigned this Dec 13, 2023
@kevinsbarnard
Copy link
Collaborator Author

This problem is most prevalent for queries that span several video sequences.

Example/test query in MBARI VARS:

Observer: gsainz

@hohonuuli
Copy link
Member

hohonuuli commented Dec 14, 2023

Run a direct SQL query for the relevant info

As an alternative, if you tell me how you want to query and what you want returned, I can add an endpoint. I'd prefer to not have apps rely on SQL connections.

Also, probably in early January, I'll be rolling out an very updated vampire-squid (See https://github.com/mbari-org/vampire-squid/tree/feature/hibernate). I do expect some performance improvements, we can also tweak caching in the service to help with performance (At the moment, I have caching disabled).

@kevinsbarnard
Copy link
Collaborator Author

I'd prefer to not have apps rely on SQL connections.

Totally agree. If you don't mind adding an endpoint for this, I'll put together some SQL that illustrates the use case. Essentially it would just be /v1/videosequences/name/{name}, but for several names in bulk.

My hunch is that the JPA NamedQuery VideoSequence.findByName is the cause of the slowdown here because of the nested entities -- perhaps it would help to create a NamedNativeQuery for this.

@kevinsbarnard
Copy link
Collaborator Author

Here's an example query for the minimal information needed by VARS GridView:

SELECT
    vs.name, 
    v.start_time, 
    v.duration_millis, 
    vr.container, 
    vr.uri, 
    vr.width, 
    vr.height
FROM 
    video_sequences vs 
INNER JOIN 
    videos v ON v.video_sequence_uuid = vs.uuid 
INNER JOIN 
    video_references vr ON vr.video_uuid = v.uuid 
WHERE 
    vs.name IN (
        'Ventana 4460',
        'Ventana 4461',
        'Ventana 4462',
        'Ventana 4463',
        'Ventana 4464'
    )

@hohonuuli
Copy link
Member

So as a starting point, I think this is what you're asking for. Note that I would lean toward providing complete "media" data to stay aligned with the other endpoints. There's a view set up for those purposes so ...

Request

POST http://foo.bar/vam/v1/media/videosequence
Content-Type: application/json
Accept: application/json

[
  "Ventana 4460",
  "Ventana 4461",
  "Ventana 4462",
  "Ventana 4463",
  "Ventana 4464"
]

Response

HTTP/1.1 200 OK
Server: nginx/1.25.3
Date: Thu, 14 Dec 2023 17:53:40 GMT
Content-Type: application/json;charset=utf-8
Transfer-Encoding: chunked
Connection: close
Content-Encoding: gzip
Access-Control-Allow-Origin: *

[
{
    "video_sequence_uuid": "7242d245-10bc-4a8e-a504-f9da55c963aa",
    "video_reference_uuid": "3b595823-103b-4581-952c-c4a017f39638",
    "video_uuid": "0347adcc-9854-4001-86c8-eaf8d42f4a3d",
    "video_sequence_name": "Doc Ricketts 1234",
    "camera_id": "Doc Ricketts",
    "video_name": "Doc Ricketts 1234 20191216T182204Z",
    "uri": "http://m3.shore.mbari.org/videos/M3/master/DocRicketts/2019/12/1234/D1234_20191216T182204Z_prores.mov",
    "start_timestamp": "2019-12-16T18:22:04Z",
    "duration_millis": 900070,
    "container": "video/quicktime",
    "width": 1920,
    "height": 1080,
    "frame_rate": 0.0,
    "size_bytes": 25959073985,
    "sha512": "AED13DC3EB51A334D3662E90625E0E6B009C89E280F39B8B0E7AC96882D466F3886411D53ADD661575D771ECDBCF9272E46A80902D23A80B6A9FB08FADBC6D82"
  },
  {
    "video_sequence_uuid": "7242d245-10bc-4a8e-a504-f9da55c963aa",
    "video_reference_uuid": "b3f7cb31-d7f6-4e99-a446-0014566d66cc",
    "video_uuid": "0347adcc-9854-4001-86c8-eaf8d42f4a3d",
    "video_sequence_name": "Doc Ricketts 1234",
    "camera_id": "Doc Ricketts",
    "video_name": "Doc Ricketts 1234 20191216T182204Z",
    "uri": "http://m3.shore.mbari.org/videos/M3/mezzanine/DocRicketts/2019/12/1234/D1234_20191216T182204Z_h264.mp4",
    "start_timestamp": "2019-12-16T18:22:04Z",
    "duration_millis": 900070,
    "container": "video/mp4",
    "width": 1920,
    "height": 1080,
    "frame_rate": 0.0,
    "size_bytes": 2516065108,
    "sha512": "3956ECA1C66B806F120EACA792E1DCDE7810F8B595FDFD9B9EEAF702AEACCBCD7B80EEDED4E7F5763E4C203413BD75832CBA4C9D454722160B02642AE3754735"
  },
 ...
]

@kevinsbarnard
Copy link
Collaborator Author

Yep, exactly, that would be awesome. So more or less:

SELECT 
    * 
FROM 
    unique_videos uv 
WHERE 
    uv.video_sequence_name IN (
        'Ventana 4460',
        'Ventana 4461',
        'Ventana 4462',
        'Ventana 4463',
        'Ventana 4464'
    )

hohonuuli added a commit to mbari-org/vampire-squid that referenced this issue Dec 14, 2023
@hohonuuli
Copy link
Member

@kevinsbarnard I've added this to the upcoming version of vampire-squid. Sorry you'll have to wait a few weeks (Neil and I have to tear down ship replication and change some database tables). Note that there will be openapi docs in that release.

Just a note that I implemented this as a JPA Projection rather than native SQL. Projections handle the idosyncrasies of the different databases. The resulting SQL is pretty good, it looks like:

select vse1_0.uuid,
       v1_0.uuid,
       vr1_0.uuid,
       vse1_0.name,
       vse1_0.camera_id,
       v1_0.name,
       vr1_0.uri,
       v1_0.start_time,
       v1_0.duration_millis,
       vr1_0.container,
       vr1_0.video_codec,
       vr1_0.audio_codec,
       vr1_0.width,
       vr1_0.height,
       vr1_0.frame_rate,
       vr1_0.size_bytes,
       vr1_0.description,
       vse1_0.description,
       v1_0.description,
       vr1_0.sha512
from video_sequences vse1_0
         left join videos v1_0 on vse1_0.uuid = v1_0.video_sequence_uuid
         left join video_references vr1_0 on v1_0.uuid = vr1_0.video_uuid
where vse1_0.name in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
order by v1_0.start_time

The endpoint is as listed above. It will accept limit and offset query params.

@kevinsbarnard
Copy link
Collaborator Author

Thanks so much @hohonuuli! I'll work this into gridview when the time comes.

@lonnylundsten
Copy link

@kevinsbarnard I'm guessing this is completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to the application core enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants