Optimize video sequence data fetch #67

kevinsbarnard · 2023-12-13T23:35:47Z

Fetching the full information for a video sequence by name via Vampire Squid (v1/videosequences/name/{name}) is very slow and bogs down the load process. We should optimize this where possible.

Potential solutions:

Lazy-load the video data for annotations that don't need it during the query process
Explore modifications to Vampire Squid (look at query, create specialized endpoint)
Run a direct SQL query for the relevant info

The text was updated successfully, but these errors were encountered:

kevinsbarnard · 2023-12-13T23:40:06Z

This problem is most prevalent for queries that span several video sequences.

Example/test query in MBARI VARS:

Observer: gsainz

hohonuuli · 2023-12-14T16:37:34Z

Run a direct SQL query for the relevant info

As an alternative, if you tell me how you want to query and what you want returned, I can add an endpoint. I'd prefer to not have apps rely on SQL connections.

Also, probably in early January, I'll be rolling out an very updated vampire-squid (See https://github.com/mbari-org/vampire-squid/tree/feature/hibernate). I do expect some performance improvements, we can also tweak caching in the service to help with performance (At the moment, I have caching disabled).

kevinsbarnard · 2023-12-14T17:41:25Z

I'd prefer to not have apps rely on SQL connections.

Totally agree. If you don't mind adding an endpoint for this, I'll put together some SQL that illustrates the use case. Essentially it would just be /v1/videosequences/name/{name}, but for several names in bulk.

My hunch is that the JPA NamedQuery VideoSequence.findByName is the cause of the slowdown here because of the nested entities -- perhaps it would help to create a NamedNativeQuery for this.

kevinsbarnard · 2023-12-14T17:48:51Z

Here's an example query for the minimal information needed by VARS GridView:

SELECT
    vs.name, 
    v.start_time, 
    v.duration_millis, 
    vr.container, 
    vr.uri, 
    vr.width, 
    vr.height
FROM 
    video_sequences vs 
INNER JOIN 
    videos v ON v.video_sequence_uuid = vs.uuid 
INNER JOIN 
    video_references vr ON vr.video_uuid = v.uuid 
WHERE 
    vs.name IN (
        'Ventana 4460',
        'Ventana 4461',
        'Ventana 4462',
        'Ventana 4463',
        'Ventana 4464'
    )

hohonuuli · 2023-12-14T17:57:23Z

So as a starting point, I think this is what you're asking for. Note that I would lean toward providing complete "media" data to stay aligned with the other endpoints. There's a view set up for those purposes so ...

Request

POST http://foo.bar/vam/v1/media/videosequence
Content-Type: application/json
Accept: application/json

[
  "Ventana 4460",
  "Ventana 4461",
  "Ventana 4462",
  "Ventana 4463",
  "Ventana 4464"
]

Response

HTTP/1.1 200 OK
Server: nginx/1.25.3
Date: Thu, 14 Dec 2023 17:53:40 GMT
Content-Type: application/json;charset=utf-8
Transfer-Encoding: chunked
Connection: close
Content-Encoding: gzip
Access-Control-Allow-Origin: *

[
{
    "video_sequence_uuid": "7242d245-10bc-4a8e-a504-f9da55c963aa",
    "video_reference_uuid": "3b595823-103b-4581-952c-c4a017f39638",
    "video_uuid": "0347adcc-9854-4001-86c8-eaf8d42f4a3d",
    "video_sequence_name": "Doc Ricketts 1234",
    "camera_id": "Doc Ricketts",
    "video_name": "Doc Ricketts 1234 20191216T182204Z",
    "uri": "http://m3.shore.mbari.org/videos/M3/master/DocRicketts/2019/12/1234/D1234_20191216T182204Z_prores.mov",
    "start_timestamp": "2019-12-16T18:22:04Z",
    "duration_millis": 900070,
    "container": "video/quicktime",
    "width": 1920,
    "height": 1080,
    "frame_rate": 0.0,
    "size_bytes": 25959073985,
    "sha512": "AED13DC3EB51A334D3662E90625E0E6B009C89E280F39B8B0E7AC96882D466F3886411D53ADD661575D771ECDBCF9272E46A80902D23A80B6A9FB08FADBC6D82"
  },
  {
    "video_sequence_uuid": "7242d245-10bc-4a8e-a504-f9da55c963aa",
    "video_reference_uuid": "b3f7cb31-d7f6-4e99-a446-0014566d66cc",
    "video_uuid": "0347adcc-9854-4001-86c8-eaf8d42f4a3d",
    "video_sequence_name": "Doc Ricketts 1234",
    "camera_id": "Doc Ricketts",
    "video_name": "Doc Ricketts 1234 20191216T182204Z",
    "uri": "http://m3.shore.mbari.org/videos/M3/mezzanine/DocRicketts/2019/12/1234/D1234_20191216T182204Z_h264.mp4",
    "start_timestamp": "2019-12-16T18:22:04Z",
    "duration_millis": 900070,
    "container": "video/mp4",
    "width": 1920,
    "height": 1080,
    "frame_rate": 0.0,
    "size_bytes": 2516065108,
    "sha512": "3956ECA1C66B806F120EACA792E1DCDE7810F8B595FDFD9B9EEAF702AEACCBCD7B80EEDED4E7F5763E4C203413BD75832CBA4C9D454722160B02642AE3754735"
  },
 ...
]

kevinsbarnard · 2023-12-14T18:02:43Z

Yep, exactly, that would be awesome. So more or less:

SELECT 
    * 
FROM 
    unique_videos uv 
WHERE 
    uv.video_sequence_name IN (
        'Ventana 4460',
        'Ventana 4461',
        'Ventana 4462',
        'Ventana 4463',
        'Ventana 4464'
    )

hohonuuli · 2023-12-14T20:03:30Z

@kevinsbarnard I've added this to the upcoming version of vampire-squid. Sorry you'll have to wait a few weeks (Neil and I have to tear down ship replication and change some database tables). Note that there will be openapi docs in that release.

Just a note that I implemented this as a JPA Projection rather than native SQL. Projections handle the idosyncrasies of the different databases. The resulting SQL is pretty good, it looks like:

select vse1_0.uuid,
       v1_0.uuid,
       vr1_0.uuid,
       vse1_0.name,
       vse1_0.camera_id,
       v1_0.name,
       vr1_0.uri,
       v1_0.start_time,
       v1_0.duration_millis,
       vr1_0.container,
       vr1_0.video_codec,
       vr1_0.audio_codec,
       vr1_0.width,
       vr1_0.height,
       vr1_0.frame_rate,
       vr1_0.size_bytes,
       vr1_0.description,
       vse1_0.description,
       v1_0.description,
       vr1_0.sha512
from video_sequences vse1_0
         left join videos v1_0 on vse1_0.uuid = v1_0.video_sequence_uuid
         left join video_references vr1_0 on v1_0.uuid = vr1_0.video_uuid
where vse1_0.name in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
order by v1_0.start_time

The endpoint is as listed above. It will accept limit and offset query params.

kevinsbarnard · 2023-12-14T20:50:06Z

Thanks so much @hohonuuli! I'll work this into gridview when the time comes.

lonnylundsten · 2024-09-20T22:35:54Z

@kevinsbarnard I'm guessing this is completed.

kevinsbarnard added enhancement New feature or request core Changes to the application core labels Dec 13, 2023

kevinsbarnard self-assigned this Dec 13, 2023

hohonuuli added a commit to mbari-org/vampire-squid that referenced this issue Dec 14, 2023

mbari-org/vars-gridview#67 - Added endpoint

ea082ba

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize video sequence data fetch #67

Optimize video sequence data fetch #67

kevinsbarnard commented Dec 13, 2023

kevinsbarnard commented Dec 13, 2023

hohonuuli commented Dec 14, 2023 •

edited

Loading

kevinsbarnard commented Dec 14, 2023

kevinsbarnard commented Dec 14, 2023

hohonuuli commented Dec 14, 2023

kevinsbarnard commented Dec 14, 2023

hohonuuli commented Dec 14, 2023

kevinsbarnard commented Dec 14, 2023

lonnylundsten commented Sep 20, 2024

Optimize video sequence data fetch #67

Optimize video sequence data fetch #67

Comments

kevinsbarnard commented Dec 13, 2023

kevinsbarnard commented Dec 13, 2023

hohonuuli commented Dec 14, 2023 • edited Loading

kevinsbarnard commented Dec 14, 2023

kevinsbarnard commented Dec 14, 2023

hohonuuli commented Dec 14, 2023

Request

Response

kevinsbarnard commented Dec 14, 2023

hohonuuli commented Dec 14, 2023

kevinsbarnard commented Dec 14, 2023

lonnylundsten commented Sep 20, 2024

hohonuuli commented Dec 14, 2023 •

edited

Loading