Fintwit Voyager

A summarization chain that processes finance podcasts published on YouTube.

At the time of writing, PyTorch does not support some operations on the Apple silicon Metal Performance Shaders (MPS) backend, so the audio-to-text transcription and diarization steps must be run in a Google Colab notebook.

Several LLM-mediated steps are introduced to achieve the following:

1. Identify the speakers' names from the YouTube video metadata (video title, channel title and video description).
2. Map the naive SPEAKER_x diarization labels in the transcript to the speakers' real names, using a retrieval-augmented system.
3. Summarize the final transcript, with the SPEAKER_x labels replaced by the speakers' real names.
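The three LLM steps need some plain-Python glue around the model calls. The sketch below is a hypothetical illustration, not the code in pipeutils.py or simplified_pipe.py: it assumes the diarized transcript is a list of (label, text) turns, and the function names metadata_prompt, retrieve_context and replace_labels are made up. For step 2 it swaps in simple keyword matching in place of whatever retriever the project actually uses; the retrieved passages would then be handed to an instruct model, which decides which name belongs to which label (the turn that mentions a name is often not spoken by that person).

```python
import re
from collections import defaultdict

# Assumed transcript shape: (label, text) turns from diarization.
# This is a hypothetical format, not necessarily what the project produces.
Segment = tuple[str, str]

# Matches whole labels only, so SPEAKER_1 is never matched inside SPEAKER_10.
LABEL_RE = re.compile(r"\bSPEAKER_\d+\b")


def metadata_prompt(video_title: str, channel_title: str, description: str) -> str:
    """Step 1: build a prompt asking an instruct model for the speakers' names."""
    return (
        "List the full names of the people who speak in this podcast episode, "
        "one name per line.\n"
        f"Video title: {video_title}\n"
        f"Channel title: {channel_title}\n"
        f"Video description: {description}\n"
    )


def retrieve_context(
    segments: list[Segment], names: list[str], window: int = 1
) -> dict[str, list[str]]:
    """Step 2 (retrieval half, keyword matching as a stand-in retriever):
    for each SPEAKER_x label, collect the turns that mention any part of a
    candidate name, plus `window` neighbouring turns on each side."""
    name_tokens = {part.lower() for name in names for part in name.split()}
    evidence: dict[str, list[str]] = defaultdict(list)
    for i, (_, text) in enumerate(segments):
        words = set(re.findall(r"[a-z']+", text.lower()))
        if not name_tokens & words:
            continue
        for label, turn in segments[max(0, i - window): i + window + 1]:
            passage = f"{label}: {turn}"
            if passage not in evidence[label]:  # skip repeats from overlapping windows
                evidence[label].append(passage)
    return dict(evidence)


def replace_labels(transcript: str, mapping: dict[str, str]) -> str:
    """Step 3 (pre-processing): swap resolved labels for real names;
    labels missing from `mapping` are left unchanged."""
    return LABEL_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), transcript)


if __name__ == "__main__":
    segments = [
        ("SPEAKER_00", "Welcome back, today I'm joined by Jane Doe."),
        ("SPEAKER_01", "Thanks for having me."),
        ("SPEAKER_00", "Let's talk about rates."),
    ]
    print(retrieve_context(segments, ["Jane Doe"]))
    print(replace_labels("SPEAKER_00: Hi.\nSPEAKER_01: Hello.", {"SPEAKER_01": "Jane Doe"}))
```

Using a word-boundary regex in replace_labels, rather than str.replace, keeps a mapping for SPEAKER_1 from also rewriting the start of SPEAKER_10.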

Instruction-tuned models were used for steps 1 and 2, and a long-context model was used for step 3.
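A plausible reason for this split is prompt length: the step 1 and 2 prompts are short, while step 3 has to take in the whole relabeled transcript. A minimal sketch of that routing, assuming made-up model names and window sizes, and a crude 4-characters-per-token estimate in place of a real tokenizer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str            # hypothetical placeholder, not the model the project used
    context_tokens: int  # total window for prompt plus generated output


# Hypothetical routing: instruct models for steps 1 and 2,
# a long-context model for step 3 (window sizes are illustrative only).
ROUTING = {
    1: ModelSpec("instruct-model", 4_096),
    2: ModelSpec("instruct-model", 4_096),
    3: ModelSpec("long-context-model", 32_768),
}


def approx_tokens(text: str) -> int:
    """Rough heuristic of ~4 characters per token for English text;
    use the model's real tokenizer for anything precise."""
    return -(-len(text) // 4)  # ceiling division


def fits(step: int, prompt: str, reserved_output: int = 1_024) -> bool:
    """True if the prompt plus room for the answer fits the step's model window."""
    return approx_tokens(prompt) + reserved_output <= ROUTING[step].context_tokens


if __name__ == "__main__":
    transcript = "x" * 50_000  # stand-in for a long podcast transcript
    for step, spec in ROUTING.items():
        print(step, spec.name, fits(step, transcript))
```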

https://twitter.com/fintwit_voyager

Repository: nahuel89p/summarization_chain
