Sequential Topic Segmentation / Session Chapters #9

evamaxfield · 2021-04-09T00:44:08Z

Use Case

Please provide a use case to help us understand your request in context

YouTube has a "Video Chapters" feature that splits the timeline bar into chapters based on timestamps found in the video description. Example:

Similarly, it would be incredibly useful to jump around a meeting video / transcript based on the minutes items of the meeting.

Solution

Please describe your ideal solution

This is going to take a lot of work on the backend side and a bit of work on the frontend.

We could be fancy and train a topic model or use some sort of seeded clustering, and we likely will at some point, but as a first-pass implementation it may be interesting to see how far the following gets us:

Look for common phrases: "Moving on to...", "Call the roll", "Attendance", etc. and apply breakpoints there.
Additionally, parse all the minutes item attachments (docs, presentations, etc.) for every minutes item of an event and store the list of words UNIQUE to each specific minutes item. Then search the transcript for those words. Find the breakpoints by taking a moving window sum, over the transcript, of the counts of each minutes item's unique words.

I.e.

"minutes_item_1": ["municipal", "broadband", "light"],
"minutes_item_2": ["it", "department"],

lets talk about the municipal broadband bill that would enable seattle city light to serve customers with broadband...
...
moving on to funding the seattle IT department...
...

The moving window word count would be able to see that at some point we switch from using specific words found in minutes item 1 to using specific words found in minutes item 2. If we can combine that with looking for the "section splitter sequences" ("moving on", "call the roll", etc) I think it may be a good first pass, fast and cheap chapter identifier.
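A minimal sketch of the moving-window word count described above, assuming the transcript is already lowercased and split into tokens. The function names (`unique_words`, `window_counts`, `dominant_item`) and the window size are hypothetical and chosen only for illustration; they are not part of any existing CDP code.

```python
from collections import Counter


def unique_words(item_words):
    """Keep only the words that appear in exactly one minutes item."""
    owners = Counter(w for words in item_words.values() for w in set(words))
    return {
        item: {w for w in words if owners[w] == 1}
        for item, words in item_words.items()
    }


def window_counts(tokens, vocab, window=5):
    """Moving-window sum: for each window start, count tokens found in vocab."""
    hits = [1 if t in vocab else 0 for t in tokens]
    return [sum(hits[i:i + window]) for i in range(max(1, len(hits) - window + 1))]


def dominant_item(tokens, item_words, window=5):
    """Label each window with the minutes item whose unique words occur most.

    Windows that contain no unique words at all are labeled None.
    """
    uniq = unique_words(item_words)
    counts = {item: window_counts(tokens, vocab, window) for item, vocab in uniq.items()}
    n_windows = len(next(iter(counts.values())))
    labels = []
    for i in range(n_windows):
        best = max(counts, key=lambda item: counts[item][i])
        labels.append(best if counts[best][i] > 0 else None)
    return labels


items = {
    "minutes_item_1": ["municipal", "broadband", "light"],
    "minutes_item_2": ["it", "department"],
}
transcript = (
    "lets talk about the municipal broadband bill that would enable seattle "
    "city light to serve customers with broadband moving on to funding the "
    "seattle it department"
).split()
labels = dominant_item(transcript, items, window=5)
```

The point where `labels` switches from `minutes_item_1` to `minutes_item_2` is a candidate chapter breakpoint. A real transcript would need stop-word handling (a unique word like "it" is very noisy) and a much larger window.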

Then store chapter identifiers as annotations in the transcript for the frontend to parse.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

Topic modeling? Clustering?

Additionally, we should give whatever pipeline we create the ability to skip this step if chapter starts are provided by the user, as Seattle Channel event descriptions include them in most cases nowadays.

Stakeholders

Please add any individual persons or teams that should be brought in for discussion on the project

Frontend to actually make the video chapters viewer.
Backend for both pipeline and transcript mutation.

Major Components

Please add any major components that need to be done for this project

Function to get unique n-grams from minutes item attachments for all minutes items in event
Function to apply moving window sum using unique n-grams
Function to find "common" section splitter sequences
Function to weight and merge moving window sum and section splitter sequences
Function to store into transcript as annotation
Pipeline to wrap the whole thing (may include as part of primary event processing with option to not?)
Frontend video timeline to parse and use chapter annotations
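As a rough illustration of the "section splitter" and merge components listed above, the sketch below finds sentences containing splitter phrases and snaps candidate boundaries (for example, from the moving window sum) to the nearest one. The phrase list comes from this issue; the function names, the `max_shift` parameter, and the snapping rule itself are assumptions, and a real merge step would probably weight the two signals rather than snap.

```python
import re

# Phrases taken from the issue text; a real list would be longer.
SPLITTERS = re.compile(
    r"\b(moving on to|call the roll|attendance|next up)\b", re.IGNORECASE
)


def splitter_indices(sentences):
    """Return indices of sentences that contain a section splitter phrase."""
    return [i for i, s in enumerate(sentences) if SPLITTERS.search(s)]


def snap_boundaries(candidates, splitters, max_shift=3):
    """Move each candidate boundary to the nearest splitter sentence.

    A candidate is left unchanged if no splitter lies within max_shift sentences.
    """
    snapped = []
    for c in candidates:
        near = [s for s in splitters if abs(s - c) <= max_shift]
        snapped.append(min(near, key=lambda s: abs(s - c)) if near else c)
    return sorted(set(snapped))


sentences = [
    "Please call the roll.",
    "Let's talk about the municipal broadband bill.",
    "City Light could serve customers with broadband.",
    "Moving on to funding the Seattle IT department.",
    "The budget is tight this year.",
]
splitters = splitter_indices(sentences)
boundaries = snap_boundaries([2], splitters, max_shift=1)
```

Here the word-count candidate at sentence 2 is moved to sentence 3, where the speaker actually says "Moving on to...".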

Dependencies

Please add any other major or minor project dependencies here

(required) cdp-frontend, enable npm install for the entire CDP web application #6

Other Notes

Please add any extra notes here

My one concern is how to handle many-session events. We only store minutes items on the event level and not on the session level, but we will need to find a way to gracefully handle this.


evamaxfield · 2021-04-09T00:53:05Z

This could also be rolled in with LVN topic clustering on timeline: live example.

evamaxfield · 2021-04-09T00:54:48Z

And related is some very old work on "events over time": https://github.com/CouncilDataProject/seattle_v1/blob/master/projects/quick_analysis.ipynb

evamaxfield · 2021-08-22T18:06:10Z

More ideas copied from Slack:

Timestamped minutes items / "the YouTube chapters idea": YouTube introduced a "chapters" feature where, if the author puts timestamps followed by text in the video description, it creates chapters that can be hovered over on the play bar. I don't have an example on me, but basically it's a similar idea to just "timestamping the minutes items" of our meetings. To achieve this, we would need a function that runs during event ingestion and tries to match minutes items to sentences in the transcript, i.e. "these 50 sentences relate to minutes item 1 and these next 70 to minutes item 2". Because all of our sentences are timestamped, we can get timestamped event minutes items using this method. (To store this value, we can store the time in seconds of the minutes item on the EventMinutesItem model.)
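A small sketch of how sentence-level assignments could turn into timestamped minutes items, assuming each transcript sentence carries a start time in seconds and has already been matched to a minutes item (or to none). The `Sentence` class and `minutes_item_start_times` function are hypothetical stand-ins, not the actual cdp-backend transcript model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sentence:
    """Hypothetical stand-in for a timestamped transcript sentence."""

    text: str
    start_time: float  # seconds from the start of the session
    minutes_item: Optional[str]  # matched minutes item, or None


def minutes_item_start_times(sentences: List[Sentence]) -> Dict[str, float]:
    """Return the start time of the first sentence matched to each minutes item."""
    starts: Dict[str, float] = {}
    for sentence in sentences:
        item = sentence.minutes_item
        if item is not None and item not in starts:
            starts[item] = sentence.start_time
    return starts


example = [
    Sentence("Please call the roll.", 0.0, None),
    Sentence("Let's talk about municipal broadband.", 42.5, "minutes_item_1"),
    Sentence("City Light could serve customers.", 51.0, "minutes_item_1"),
    Sentence("Moving on to funding the IT department.", 930.2, "minutes_item_2"),
]
start_times = minutes_item_start_times(example)
```

Each value in `start_times` is the kind of "time in seconds" that could be stored alongside a minutes item.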

During event ingestion, we could also run a sentiment analysis over the whole event and over each "chapter" or "minutes item transcript section", so that we have an overall sentiment for the meeting and for each minutes item. This can be stored in a new table (and maybe in the transcript as an annotation).
These both culminate in giving the legislation page a bit more functionality. My thinking:
"Let's make it easy for someone to enter the site and search for 'legislation about upzoning the city', 'legislation for increasing parks funding for social programs', etc."

The user flow would then be: search (for legislation) for "parks funding for social programs" and be taken to relevant pieces of legislation. When they click on one, they see the previously discussed page of "title, abstract, status, etc.", but in the tree viz of the matter history, where we link to each event, we can link directly to either (or both!) the whole event or the start point of the discussion on that matter. Additionally, we can show how positive or negative the discussion of the bill was in that specific meeting.

evamaxfield · 2021-08-22T18:10:01Z

I have a rough start on this: create an embedding for each minutes item and each sentence in the transcript, then run a moving window distance comparison. For each minutes item, find the collection of sentences (the window) that minimizes the window-to-minutes-item distance.

From there we can find the "strict boundaries" of the windows by looking for trigger words. I.e. "moving on to...", "next up...", etc.
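A toy sketch of the moving-window distance comparison, assuming the embeddings already exist as NumPy arrays. The 2-dimensional vectors below are made up for illustration, and using the mean sentence vector with cosine distance is an assumption, not necessarily what the prototype does; `best_window` is a hypothetical name.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_window(sentence_vecs, item_vec, window=2):
    """Return (start_index, distance) of the window closest to the minutes item.

    Each window is represented by the mean of its sentence embeddings.
    """
    best = (0, float("inf"))
    for start in range(len(sentence_vecs) - window + 1):
        mean_vec = sentence_vecs[start:start + window].mean(axis=0)
        dist = cosine_distance(mean_vec, item_vec)
        if dist < best[1]:
            best = (start, dist)
    return best


# Made-up embeddings: the first two sentences resemble item 1, the last two item 2.
sentence_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
item_1 = np.array([1.0, 0.0])
item_2 = np.array([0.0, 1.0])

start_1, _ = best_window(sentence_vecs, item_1, window=2)
start_2, _ = best_window(sentence_vecs, item_2, window=2)
```

The window starts found this way are only soft boundaries; the trigger words mentioned above would then be used to tighten them.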

For word embeddings we have one of:

evamaxfield · 2021-09-01T19:06:49Z

Starting prototype work here: https://github.com/JacksonMaxfield/cue-queue

evamaxfield transferred this issue from another repository Aug 25, 2021

evamaxfield added the proposal A detailed proposal / spec for a CDP feature label Aug 25, 2021

evamaxfield changed the title ~~Aligning Video to Minutes~~ Discourse Segmentation / Session Chapters Aug 25, 2021

evamaxfield changed the title ~~Discourse Segmentation / Session Chapters~~ Linear Topic Segmentation / Session Chapters Sep 1, 2021

evamaxfield mentioned this issue Sep 1, 2021

feature/transcript-annotation-cleanup-and-section-annotation-type CouncilDataProject/cdp-backend#97

Merged

evamaxfield changed the title ~~Linear Topic Segmentation / Session Chapters~~ Sequential Topic Segmentation / Session Chapters Sep 16, 2021

evamaxfield mentioned this issue Oct 21, 2021

Topics (or X) Timeline in Meeting and Instance as a Whole #23

Open

evamaxfield added feature New feature or request data Requires some data analysis or computational modeling labels Oct 21, 2021
