[SlidesLive] Use token for JSON retrieval and improve metadata extraction #29958

dirkf · 2021-09-13T13:59:23Z

Please follow the guide below

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

The SlidesLive extractor now needs a player_token query parameter in the JSON URL, whose value is given by the data-player-token attribute in the HTML element (<div>) with id="player".
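As a rough illustration of the lookup described above (not the code in this PR), the token could be pulled from the page HTML and appended to the player URL roughly as follows. The function names are hypothetical, and the regex assumes the `id` and `data-player-token` attributes sit on the same opening `<div>` tag.

```python
import re
from urllib.parse import urlencode


def extract_player_token(webpage):
    """Return the data-player-token of the <div id="player"> tag, or None."""
    # Locate the opening tag of the div whose id is "player".
    tag = re.search(r'<div\b[^>]*\sid=(["\'])player\1[^>]*>', webpage)
    if not tag:
        return None
    # Read the attribute from that tag only, not from the whole page.
    token = re.search(r'\sdata-player-token=(["\'])(.*?)\1', tag.group(0))
    return token.group(2) if token else None


def build_player_url(video_id, player_token):
    """Build the player URL with the player_token query parameter."""
    query = urlencode({'player_token': player_token})
    return 'https://ben.slideslive.com/player/%s?%s' % (video_id, query)
```

A real extractor would more likely use youtube-dl's own HTML helpers than a hand-written regex; this only shows where the value comes from and where it goes.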

Also, some of the expected metadata (e.g. timestamp) wasn't being collected.

Resolves #29954.
May resolve #30881.

dirkf · 2021-09-13T14:53:36Z

The second download test fails (E) because ffmpeg is needed to download m3u8. Isn't ffmpeg in the CI VMs?

pukkandan · 2021-09-15T18:34:14Z

You are supposed to just use skip_download when downloading from manifests. Even if you have ffmpeg, the md5 may not be consistent making the test useless

dirkf · 2021-09-15T21:49:01Z

Thanks, that makes sense, in that it accords with what I've seen. The second test didn't have skip_download, though (as the diffs show), so I wonder if it ever passed before.

Wu-Chenyang · 2021-09-20T09:32:51Z

> The second test didn't have skip_download, though (as the diffs show), so I wonder if it ever passed before.

I think he means that you should add a skip_download to the second test.

dirkf · 2021-09-20T09:39:48Z

Indeed, not running the failing part of the test would stop it failing.

The point is that this test as originally written didn't skip the download test (see git blame) and so I wondered whether it ever did pass when it was written.

pukkandan · 2021-09-20T09:48:04Z

download test with ffmpeg passes on the machine the test is written in. Once ffmpeg version changes, it may no longer pass. So ig the maintainers never tested it 🤷

dirkf · 2021-09-20T10:23:52Z

Is this ffmpeg version dependency because different versions may download different-sized initial pieces of the video, which should be addressed here?

Or is it that different versions of ffmpeg may reassemble the video in equivalent but bit-different ways? In which case we should be testing some invariant(s) rather than the MD5 of the download.

pukkandan · 2021-09-20T11:43:47Z

I am not entirely sure. It's just something I have noticed from experience

btw, the link in ur comment is broken

dirkf · 2021-09-20T12:27:25Z

> btw, the link in ur comment is broken

Apparently, to put an anchor in GH Markdown you say:

<a id="anchor-name">text to be target of anchor</a>

And then this makes an anchor with actual HTML name user-content-anchor-name, which therefore you link with s/t like [link text](page_url#user-content-anchor-name)

Who could guess? Anyway, better now.

kumuji · 2021-12-06T11:51:15Z

Hi! Thank you for fixing the issue with slideslive!

I am trying to download my video I recorded for a conference: https://slideslive.com/38972123/mix3d-outofcontext-data-augmentation-for-3d-scenes
On the master branch I was having an issue with authorization (401), but with your commit it is solved.

However, now I get an Error related to parsing JSON:

youtube_dl.utils.ExtractorError: 38972123: Failed to parse JSON  (caused by JSONDecodeError('Expecting value: line 1 column 1 (char 0)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Do you have a clue where I should look to solve it?

dirkf · 2021-12-06T17:11:53Z

The supposed JSON downloaded from 'https://ben.slideslive.com/player/' + video_id turns out to be a playlist (#EXTM3U ...). If you download this, extract the slides.json URL from the playlist, and download that, you get something like this:

https://cdn.slideslive.com/data/presentations/38972123/v2/slides.json?1638065435


{
  u'slides': [
    {
      u'image': {
        u'name': u'X52r__1637863423__0000__Adfi'
      },
      u'type': u'image',
      u'time': 0
    },
    {
      u'video': {
        u'service': u'yoda',
        u'export_as_video': True,
        u'muted': True,
        u'duration_ms': 21280,
        u'resolution': {
          u'width': 1920,
          u'height': 1080
        },
        u'id': u'vvzVOxYp_ynO'
      },
      u'type': u'video',
      u'time': 4440
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0002__tkva'
      },
      u'type': u'image',
      u'time': 25720
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0003__QKb1'
      },
      u'type': u'image',
      u'time': 30520
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0004__nShc'
      },
      u'type': u'image',
      u'time': 39240
    },
    {
      u'video': {
        u'service': u'yoda',
        u'export_as_video': True,
        u'muted': True,
        u'duration_ms': 2880,
        u'resolution': {
          u'width': 1920,
          u'height': 1080
        },
        u'id': u'dGeEm6bx_0e1'
      },
      u'type': u'video',
      u'time': 43240
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0006__Czuy'
      },
      u'type': u'image',
      u'time': 46120
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0007__POUv'
      },
      u'type': u'image',
      u'time': 52880
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0008__3R0E'
      },
      u'type': u'image',
      u'time': 57240
    },
    {
      u'video': {
        u'service': u'yoda',
        u'export_as_video': True,
        u'muted': True,
        u'duration_ms': 14320,
        u'resolution': {
          u'width': 1920,
          u'height': 1080
        },
        u'id': u'Uk9xX-x5Muo4'
      },
      u'type': u'video',
      u'time': 65480
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0010__1LjO'
      },
      u'type': u'image',
      u'time': 79800
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0011__ueTK'
      },
      u'type': u'image',
      u'time': 82320
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0012__QGwP'
      },
      u'type': u'image',
      u'time': 86920
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0013__HuJC'
      },
      u'type': u'image',
      u'time': 88520
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0014__OBFk'
      },
      u'type': u'image',
      u'time': 90240
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0015__5oLL'
      },
      u'type': u'image',
      u'time': 93440
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0016__PIRH'
      },
      u'type': u'image',
      u'time': 100920
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0017__WXeP'
      },
      u'type': u'image',
      u'time': 107120
    },
    {
      u'video': {
        u'service': u'yoda',
        u'export_as_video': True,
        u'muted': True,
        u'duration_ms': 9200,
        u'resolution': {
          u'width': 1920,
          u'height': 1080
        },
        u'id': u'DYmVED0Js_gb'
      },
      u'type': u'video',
      u'time': 122240
    },
    {
      u'image': {
        u'name': u'X52r__1637863423__0019__YgcM'
      },
      u'type': u'image',
      u'time': 131440
    }
  ],
  u'slide_qualities': [
    u'big',
    u'medium'
  ]
}

As a test I extracted the service_name (`.service`) and service_id (`.id`) from `.slides[1].video` in the above and got this:


[info] Available formats for vvzVOxYp_ynO:
format code           extension  resolution note
hls-group_A1-audio_5  mp4        audio only 
dash-5                m4a        audio only [eng] DASH audio  128k , m4a_dash container, mp4a.40.2 (44100Hz)
dash-1                mp4        416x234    [eng] DASH video  145k , mp4_dash container, avc1.4d400d, video only
hls-275               mp4        416x234     275k , avc1.4d400d, video only
dash-2                mp4        640x360    [eng] DASH video  365k , mp4_dash container, avc1.4d401e, video only
hls-495               mp4        640x360     495k , avc1.4d401e, video only
dash-3                mp4        768x432    [eng] DASH video  730k , mp4_dash container, avc1.4d401e, video only
hls-860               mp4        768x432     860k , avc1.4d401e, video only
dash-0                mp4        1280x720   [eng] DASH video 3000k , mp4_dash container, avc1.4d401f, video only
hls-3130              mp4        1280x720   3130k , avc1.4d401f, video only
dash-4                mp4        1920x1080  [eng] DASH video 4500k , mp4_dash container, avc1.4d4028, video only
hls-4629              mp4        1920x1080  4629k , avc1.4d4028, video only (best)

However I suppose the other video items are also relevant, and the image data may be as well. Perhaps a playlist should be extracted in this case?

dteney · 2022-01-06T17:27:46Z

Hi @dirkf
Thanks for all your work. Is there currently a way to make this patch work to download SlidesLive audio/video? I'm the original poster of the issue about SlidesLive downloads being broken, so I'm still very interested in this feature. Thanks again!

dirkf · 2022-01-06T17:50:55Z

Depending on what type of yt-dl installation you have, you can replace the extractor/slideslive.py file with the PR version.

However, as you appear to have the Windows self-extracting version, it's not so easy. It is reported that WinRAR can manipulate the yt-dl self-extracting archive (~8MB). Or if Python is installed on your Windows system (rather than using the version bundled into the self-extracting archive), you can install yt-dl 2021.12.17 with pip or pip3 and then update the extractor.

When replacing the extractor, any compiled version (*.pyc, *.pyo) needs to be removed as well.
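One possible way to clear the stale compiled files is sketched below. The helper name is hypothetical, and it assumes the compiled files sit either next to the module or in a `__pycache__` directory beside it.

```python
import pathlib


def remove_compiled(module_path):
    """Delete compiled files for one module and return the paths removed."""
    module = pathlib.Path(module_path)
    stem = module.stem
    # Compiled files written next to the source, e.g. slideslive.pyc
    candidates = list(module.parent.glob(stem + '.py[co]'))
    # Python 3 bytecode cache, e.g. __pycache__/slideslive.cpython-39.pyc
    cache = module.parent / '__pycache__'
    if cache.is_dir():
        candidates += list(cache.glob(stem + '.*.py[co]'))
    removed = []
    for path in candidates:
        path.unlink()
        removed.append(path)
    return removed
```

It would be called with the path of the replaced file, for example the `extractor/slideslive.py` inside the installed `youtube_dl` package.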

dteney · 2022-01-07T11:44:15Z

Thanks @dirkf, I'm fine with that. I was referring to the latest issue, "Failed to parse JSON", reported above. I get the same thing; example below.

python -m youtube_dl --verbose -x --audio-format mp3 https://slideslive.com/38957205/accessibility-hci-ml-papers
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '-x', '--audio-format', 'mp3', 'https://slideslive.com/38957205/accessibility-hci-ml-papers']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.9 (CPython) - Windows-10-10.0.18363-SP0
[debug] exe versions: none
[debug] Proxy map: {}
[SlidesLive] 38957205: Downloading webpage
[SlidesLive] 38957205: Downloading JSON metadata
ERROR: 38957205: Failed to parse JSON (caused by JSONDecodeError('Expecting value: line 1 column 1 (char 0)')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

dirkf · 2022-01-07T12:41:49Z

Ah, I understand.

Let's assume that the unparsable JSON is always a .m3u8 playlist pointing to JSON that has a similar format to the one I quoted, ie has a slides list each of whose items is either an image or a video, tagged with a time, presumably the time from the start of the presentation.

So is the right thing just to extract a playlist of all the videos in time order?
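To make the question concrete, here is a minimal sketch of picking the video items out of a slides JSON shaped like the one quoted earlier and ordering them by their `time` field. The function name is hypothetical, and turning each item into an actual playlist entry is left out.

```python
def video_items(slides_json):
    """Return (start_ms, duration_ms, video_id) for each video slide, by time."""
    slides = slides_json.get('slides', [])
    # Keep only the items tagged as videos; image slides are ignored here.
    videos = [s for s in slides if s.get('type') == 'video']
    # 'time' is taken to be the offset from the start of the presentation.
    videos.sort(key=lambda s: s.get('time', 0))
    return [
        (s.get('time', 0), s['video'].get('duration_ms'), s['video']['id'])
        for s in videos
    ]
```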

dteney · 2022-01-07T12:50:26Z

> slides list each of whose items is either an image or a video, tagged with a time, presumably the time from the start of the presentation.
> So is the right thing just to extract a playlist of all the videos in time order?

I think that would be the right thing to do. Looking at the `duration_ms` fields, I'm guessing each chunk corresponds to a portion of the video to be synced with a specific slide. That did not use to be the case, so I wonder why the change, given that the website has no new functionality compared to ~6 months ago. Anyway, thanks again @dirkf.

dteney · 2022-01-28T18:29:37Z

Anyone (@dirkf or others) keen to implement a fix for this? I can offer a personal donation to the developer or the project if anyone can bump this up in the list of request priorities :)

vojtad · 2022-02-01T10:03:28Z

CTO from SlidesLive here.

When you fetch 'https://ben.slideslive.com/player/' + presentation_id using the player token you get a custom M3U8 playlist which contains video service (#EXT-SL-VOD-VIDEO-SERVICE-NAME) and ID (#EXT-SL-VOD-VIDEO-SERVICE-ID) for the actual video and link to a JSON (#EXT-SL-VOD-SLIDES-JSON-URL) which contains information about all the slides.

When video service is youtube you can use usual YouTube implementation to download video with that ID.

When video service is yoda you can use one of the servers from #EXT-SL-VOD-VIDEO-SERVERS (preferably always the first one) to download that particular video using the following URL: https://$VIDEO_SERVER/$VIDEO_ID/master.m3u8, which is an Apple HLS playlist.
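The steps above could be sketched roughly like this. The tag names are the ones given above, but the value syntax is an assumption: each tag is taken to be written as `#TAG:value`, and the server list is accepted either as a JSON array or as a comma-separated string. All function names are hypothetical.

```python
import json


def parse_sl_playlist(m3u8_text):
    """Collect the #EXT-SL-* tags into a dict keyed by tag name (no '#')."""
    info = {}
    for line in m3u8_text.splitlines():
        line = line.strip()
        if not line.startswith('#EXT-SL-'):
            continue
        # Split at the first ':' only, so URL values keep their own colons.
        tag, _, value = line[1:].partition(':')
        info[tag] = value.strip()
    return info


def parse_servers(value):
    """Assumed formats: a JSON array of hosts, or a comma-separated list."""
    try:
        servers = json.loads(value)
        if isinstance(servers, list):
            return [str(s) for s in servers]
    except ValueError:
        pass
    return [s.strip() for s in value.split(',') if s.strip()]


def video_source(info):
    """Return ('youtube', id), ('hls', master playlist URL) or (service, None)."""
    service = info.get('EXT-SL-VOD-VIDEO-SERVICE-NAME', '').lower()
    video_id = info.get('EXT-SL-VOD-VIDEO-SERVICE-ID')
    if service == 'youtube':
        return ('youtube', video_id)
    if service == 'yoda':
        servers = parse_servers(info.get('EXT-SL-VOD-VIDEO-SERVERS', ''))
        if servers:
            # Prefer the first server, as suggested above.
            return ('hls', 'https://%s/%s/master.m3u8' % (servers[0], video_id))
    return (service, None)
```

The slides JSON link is then available as `info['EXT-SL-VOD-SLIDES-JSON-URL']`. Since no compatibility is promised, a real extractor would need to tolerate missing or renamed tags.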

Slides JSON contains images and videos. These are static slides and animations. Video slides are a little bit trickier to download and right now there isn't a straightforward way to do that, unfortunately. Each slide has a time field which is a timestamp in millis from the presentation start when it should appear. Video slides also have their duration in millis in the video.duration_ms field.
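As an illustration of how the `time` field might be used, the sketch below derives a display window for each slide. It assumes a slide stays on screen until the next one appears, which is an inference rather than something stated here; the function name is hypothetical.

```python
def slide_windows(slides):
    """Return (start_ms, end_ms, type) per slide; end_ms is None for the last."""
    ordered = sorted(slides, key=lambda s: s['time'])
    windows = []
    # Pair every slide with its successor; the last slide has none.
    for current, following in zip(ordered, ordered[1:] + [None]):
        end = following['time'] if following else None
        windows.append((current['time'], end, current['type']))
    return windows
```

In the slides.json quoted earlier in this thread, each video slide's `time` plus its `video.duration_ms` equals the next slide's `time` (for example 4440 + 21280 = 25720), which fits this reading.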

However, we are in the process of upgrading our infrastructure a little bit right now. I cannot promise any kind of compatibility, unfortunately. Also, the custom M3U8 playlist might contain slides information directly in the future instead of the JSON link.

Also, I would like to remind everyone to download videos in moderation. Traffic is not free and if too many videos would be downloaded causing high traffic charges we would have to start fighting against it. Also, videos are a property of conferences organizers.

EDIT: I would love to help with fixing youtube-dl but it will have to wait until we finish the infrastructure upgrade.

dirkf · 2022-02-01T10:40:08Z

Your contribution is highly appreciated, as is the reminder to respect the site's resources. When a conference user archives a session for private use it may well save the site the cost of serving that session repeatedly whenever the user reviews it. And while wishing the site success and longevity, circumstances might lead to the session being unavailable online.

dteney · 2022-02-01T12:13:21Z

> Also, I would like to remind everyone to download videos in moderation. Traffic is not free and if too many videos would be downloaded causing high traffic charges we would have to start fighting against it. Also, videos are a property of conferences organizers.
>
> EDIT: I would love to help with fixing youtube-dl but it will have to wait until we finish the infrastructure upgrade.

This is great to hear, thanks for getting involved in the discussion @vojtad.

You may be interested in hearing that my use case is to download conference talks in audio format for offline/>1x listening (e.g. with an MP3 player). The possible time savings for academics make this a key feature of conferences turning online. Conferences that choose YouTube as a host make this effortless. So I think it's indeed in SlidesLive's interest to make this possible (perhaps as a built-in feature?).

dirkf added 2 commits September 13, 2021 13:06

Use player_token in JSON retrieval

91557e7

Improve metadata extraction

666a963

dirkf force-pushed the df-slideslive-token-patch branch from 56c6a1c to 666a963 Compare September 13, 2021 14:04

dirkf force-pushed the master branch from 01bf89e to 4c6fba3 Compare August 26, 2022 07:51

HireTheHero mentioned this pull request Sep 8, 2022

Downloading from slideslive.com is broken #30881

Closed

6 tasks
