Scraper for .nfo exports from kodi/plex #689

Draft
wants to merge 17 commits into
base: master
from

Conversation

Phasetime
Contributor

@Phasetime Phasetime commented Aug 15, 2021

Adds a scraper for .nfo export files (Kodi, Plex, etc.) located in the same directory as the video file. Resolves #484, #429

@Phasetime Phasetime marked this pull request as draft August 15, 2021 13:25
@Phasetime Phasetime marked this pull request as ready for review August 15, 2021 15:17
@Phasetime Phasetime changed the title from "added scraper for .nfo files from kodi" to "Scraper for .nfo exports from kodi/plex" Aug 15, 2021
@bnkai bnkai added the script Scraper executes a script label Aug 15, 2021
@Phasetime
Contributor Author

Phasetime commented Aug 16, 2021

Going into draft again until I change get_file_path to use GraphQL instead of direct SQLite access.

@Phasetime Phasetime marked this pull request as draft August 16, 2021 02:03
@AiWABR

AiWABR commented Sep 3, 2021

I am getting this error when trying to use the Kodi scraper:

"Error running scraper script: exec: "python": executable file not found in %PATH%"

@Emilaia

Emilaia commented Sep 19, 2021

Hey, thanks for making this. I adapted some of it for my own use, but wanted to let you know it's possible to get images from the folder, however not directly.

The short of it is, if you convert the image to a base64 string, you no longer need to return a public url via the scraper results, and can just return that instead. This is how I did it based on your code:

#new import
import base64

# Changes done to the bottom of your script, so the first executed part:
imagePath = os.path.dirname(videoFilePath) + "/poster.jpg" # you already had the videoFilePath variable, I'm just getting an image on that same directory named "poster.jpg".
lookup_xml(nfoFilePath, fragment['title'], imagePath) # Added an extra parameter to pass this path to the next method

# Changes done to your query_xml def:
res={'title':title}    
with open(imagePath, "rb") as image_file:
    res['image'] = "data:image/jpeg;base64," + base64.b64encode(image_file.read()).decode()

This is a non-smart, hardcoded demonstration, but it should give you the gist of what you can do. It might be a good idea to add it.
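A slightly less hardcoded variant of the idea above could look like the following. This is only a sketch: the function name `poster_data_url` and the default file name `poster.jpg` are assumptions for illustration, not part of the PR. It guesses the MIME type from the file name and returns `None` when no poster exists, so the caller can simply skip the `image` field.

```python
import base64
import mimetypes
import pathlib


def poster_data_url(video_path, poster_name="poster.jpg"):
    """Return a data URL for an image next to the video, or None if absent.

    poster_name is a hypothetical default; adjust it to your own naming.
    """
    poster = pathlib.Path(video_path).parent / poster_name
    if not poster.is_file():
        return None
    # guess_type works from the file name only; fall back to a generic type
    mime, _ = mimetypes.guess_type(poster.name)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(poster.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

In the scraper result this would be used as `res["image"] = url` only when the returned value is not `None`.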

@peterpannimmerland

peterpannimmerland commented Sep 19, 2021

Got this Error Message:

2021-09-19 15:03:44
Error
could not unmarshal json: EOF
2021-09-19 15:03:44
Error
scraper: KeyError: 'tags'
2021-09-19 15:03:44
Error
scraper: if res["tags"] is not None:
2021-09-19 15:03:44
Error
scraper: File "kodi.py", line 39, in query_xml
2021-09-19 15:03:44
Error
scraper: res = query_xml(nfoFilePath, fragment["title"])
2021-09-19 15:03:44
Error
scraper: File "kodi.py", line 103, in <module>
2021-09-19 15:03:44
Error
scraper: Traceback (most recent call last):
2021-09-19 15:03:44
Error
scraper: Exact match found for Defiance
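The traceback above shows `res["tags"]` being read before the key exists. A minimal sketch of one way to avoid that class of error is shown here; `merge_tags` is a hypothetical helper name, not something from the PR. It creates the list on first use and skips empty names.

```python
def merge_tags(res, names):
    """Append tag names to res["tags"], creating the list only when needed."""
    cleaned = [{"name": n.strip()} for n in names if n and n.strip()]
    if cleaned:
        # setdefault avoids the KeyError raised by a plain res["tags"] lookup
        res.setdefault("tags", []).extend(cleaned)
    return res


result = merge_tags({"title": "Example"}, ["Tag A", None, "  Tag B "])
print(result)
```

Reading the value with `res.get("tags")` instead of `res["tags"]` is the other common option when the key may be absent.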

@Phasetime
Contributor Author

@peterpannimmerland should be fixed now

@peterpannimmerland

Great, Thanks a lot.
But now there is a new error:

2021-09-26 08:56:13
Error
could not unmarshal json: EOF
2021-09-26 08:56:13
Error
scraper: AttributeError: 'NoneType' object has no attribute 'text'
2021-09-26 08:56:13
Error
scraper: if actor.find("type").text == "Actor":
2021-09-26 08:56:13
Error
scraper: File "kodi.py", line 46, in query_xml
2021-09-26 08:56:13
Error
scraper: res = query_xml(nfoFilePath, fragment["title"])
2021-09-26 08:56:13
Error
scraper: File "kodi.py", line 103, in <module>
2021-09-26 08:56:13
Error
scraper: Traceback (most recent call last):
2021-09-26 08:56:13
Error
scraper: No exact match found for brazzersexxtra.21.04.17.kristina.rose.and.tru.kait.two.wives.one.cock. Matching with Two Wives One Cock!
2021-09-26 08:56:13
Debug
Scraper script started

I have no idea why the scraper can't find the files. In my directory the files are sorted like this:

image

Screenshot from Stash:

image

regards,
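The `AttributeError` in the log above comes from calling `.text` on the result of `actor.find("type")` when the `<actor>` element has no `<type>` child. A sketch of a more tolerant actor loop follows, assuming the three actor layouts seen in this thread; the sample XML and the name `performers_from_nfo` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def performers_from_nfo(root):
    """Collect performer names from <actor> elements, tolerating missing children."""
    performers = []
    for actor in root.findall("actor"):
        # findtext returns None when <type> is absent instead of raising
        kind = actor.findtext("type")
        if kind is not None and kind.strip() != "Actor":
            continue  # a <type> is present but it is not "Actor"
        # Prefer <name>; fall back to the text directly inside <actor>
        name = actor.findtext("name") or actor.text
        if name and name.strip():
            performers.append({"name": name.strip()})
    return performers


# Hypothetical sample covering the layouts discussed above
SAMPLE = """<movie>
  <actor><name>First Performer</name><type>Actor</type></actor>
  <actor><name>Second Performer</name></actor>
  <actor>Third Performer</actor>
  <actor><name>Some Director</name><type>Director</type></actor>
</movie>"""

print(performers_from_nfo(ET.fromstring(SAMPLE)))
```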

@Phasetime
Contributor Author

@peterpannimmerland can you provide that nfo file for me? It seems to differ from the schema I thought was universal. BTW you can test it again; that bug should be fixed as well.

@peterpannimmerland

Hi, the .nfo file is attached. Please rename .txt to .nfo

brazzersexxtra.21.04.17.kristina.rose.and.tru.kait.two.wives.one.cock.txt

I updated your script and am getting new errors :-)

2021-10-03 09:56:53
Error
could not unmarshal json: EOF
2021-10-03 09:56:53
Error
scraper: SyntaxError: invalid syntax
2021-10-03 09:56:53
Error
scraper: ^
2021-10-03 09:56:53
Error
scraper: else if actor.find("name") != None:
2021-10-03 09:56:53
Error
scraper: File "kodi.py", line 49
2021-10-03 09:56:41
Error
Error loading scraper /root/.stash/scrapers/javdb.yml: yaml: unmarshal errors:
line 21: field sceneByName not found in type scraper.config
line 25: field sceneByQueryFragment not found in type scraper.config
2021-10-03 09:56:41
Error
Error loading scraper /root/.stash/scrapers/ThePornDB.yml: yaml: unmarshal errors:
line 16: field sceneByName not found in type scraper.config
line 20: field sceneByQueryFragment not found in type scraper.config
2021-10-03 09:56:41
Error
Error loading scraper /root/.stash/scrapers/SARJ-LLC.yml: yaml: unmarshal errors:
line 3: field sceneByName not found in type scraper.config
line 11: field sceneByQueryFragment not found in type scraper.config
2021-10-03 09:56:41
Error
Error loading scraper /root/.stash/scrapers/JavLibrary.yml: yaml: unmarshal errors:
line 22: field sceneByName not found in type scraper.config
line 26: field sceneByQueryFragment not found in type scraper.config

Thanks for your great work

regards


@mid-dev-media mid-dev-media left a comment

Some necessary changes are required.

@CapgrasDelusion2

CapgrasDelusion2 commented Dec 2, 2021

Hi, thanks for making this. I'm getting the following error:

ERRO[2021-12-02 14:37:33] [Scrape / Kodi XML] File "/root/.stash/scrapers/kodi.py", line 49
ERRO[2021-12-02 14:37:33] [Scrape / Kodi XML] else if actor.find("name") != None:
ERRO[2021-12-02 14:37:33] [Scrape / Kodi XML] ^
ERRO[2021-12-02 14:37:33] [Scrape / Kodi XML] SyntaxError: invalid syntax
ERRO[2021-12-02 14:37:33] could not unmarshal json: EOF

I get it with all nfo files, not just one. A representative nfo is here:
Natasha Nice - My First Sex Teacher

All NFOs were created using this:

https://forum.kodi.tv/showthread.php?tid=360299

Thanks in advance for any help.

EDIT: I guess since I'm here, just confirming: my data is in /volume1/Adult. Docker has /volume1 mounted as /data. So my base directory before should be /volume1 and my base directory after should be /data, correct?

@bnkai
Collaborator

bnkai commented Dec 3, 2021

@CapgrasDelusion2 for your specific error, try changing the else if on line 49 of kodi.py to elif. I'm not sure it will work afterwards, but that should take care of this specific error. Bear in mind that this PR is still a draft, so it might need some more fixes.
For your second question: if the storage volume is mapped as /volume1 -> /data in the Docker container, you should see /data/Adult when you try to add a library from Stash.

@CapgrasDelusion2

Worked like a charm, thank you very much

@Phasetime Phasetime marked this pull request as ready for review January 9, 2022 22:33
Comment on lines +15 to +20

# If you want to ingest image files from the .nfo the path to these files may need to be rewritten. Especially when using a docker container.
rewriteBasePath = False
# Example: Z:\Videos\Studio_XXX\example_cover.jpg -> /data/Studio_XXX/example_cover.jpg
basePathBefore = r'Z:\Videos'  # raw string so the backslash is not read as an escape
basePathAfter = "/data"
Collaborator

@bnkai bnkai Jan 10, 2022

I don't think we need that anymore since you added the base64 option.
From what I understand, the poster is either a URL, which we return as a string,
or a full path (as far as I understand, Kodi requires a full path, not a relative one?), which we base64 encode.

With that in mind I adjusted the code a bit and pasted it below.

Contributor Author

But if the script runs inside a container, won't the base64 encoding also fail unless the base path is rewritten?

Collaborator

Hm, I didn't actually think of that...
You mean if the nfos were generated using a different OS or setup...
You are probably right; your rewrite covers the case of Windows to Linux or Docker. What happens with a Linux -> Linux or Linux -> Docker setup? It might need a bit of adjusting, as the .replace("\\", "/") might replace something it shouldn't in that case.
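One way to handle both the Windows -> Docker and the Linux -> Docker case without a blanket `.replace("\\", "/")` is to parse the source path with the matching `pathlib` flavour. The sketch below is an assumption about how this could be done, not the PR's implementation; `rewrite_base_path` and its parameters are hypothetical names, and it assumes the destination is always a POSIX path.

```python
from pathlib import PurePosixPath, PureWindowsPath


def rewrite_base_path(path, base_before, base_after):
    """Map a path recorded in an .nfo onto the path the scraper can see."""
    # Decide how to parse the source path from the shape of base_before
    is_windows = "\\" in base_before or (len(base_before) > 1 and base_before[1] == ":")
    flavour = PureWindowsPath if is_windows else PurePosixPath
    try:
        relative = flavour(path).relative_to(flavour(base_before))
    except ValueError:
        return path  # not located under base_before, leave it untouched
    # Re-join the parts under the new base using forward slashes
    return str(PurePosixPath(base_after).joinpath(*relative.parts))


print(rewrite_base_path(r"Z:\Videos\Studio_XXX\example_cover.jpg", r"Z:\Videos", "/data"))
print(rewrite_base_path("/volume1/Adult/cover.jpg", "/volume1", "/data"))
```

Because a POSIX source path is never split on backslashes, a Linux file name that happens to contain a backslash is left intact.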

@bnkai
Collaborator

bnkai commented Jan 10, 2022

The code below seems to work OK for me on Linux, with the above assumptions.
Only the .py file was modified.

import sys
import pathlib
import mimetypes
import base64
import json
from urllib.parse import urlparse

import xml.etree.ElementTree as ET

try:
    import py_common.graphql as graphql
    import py_common.log as log
except ModuleNotFoundError:
    print(
        "You need to download the folder 'py_common' from the community repo (CommunityScrapers/tree/master/scrapers/py_common)",
        file=sys.stderr)
    sys.exit(1)
"""  
This script parses kodi nfo files for metadata. The .nfo file must be in the same directory as the video file and must be named exactly alike.
"""


def query_xml(path, title):
    res = {"title": title}
    try:
        tree = ET.parse(path)
    except Exception as e:
        log.error(f'xml parsing failed:{e}')
        print(json.dumps(res))
        sys.exit(1)

    # findtext returns None for a missing <title> and "" for an empty one,
    # so building the log message cannot fail on a None value
    nfo_title = tree.findtext("title")
    if title == nfo_title:
        log.info(f"Exact match found for {title}")
    else:
        log.info(f"No exact match found for {title}. Matching with {nfo_title}!")

    # Extract metadata from xml
    if nfo_title:
        res["title"] = nfo_title

    if tree.find("plot") is not None:
        res["details"] = tree.find("plot").text

    if tree.find("releasedate") is not None:
        res["date"] = tree.find("releasedate").text
    elif tree.find("premiered") is not None:
        res["date"] = tree.find("premiered").text

    # Both <tag> and <genre> elements are returned as tags
    if tree.find("tag") is not None:
        res["tags"] = [{"name": x.text} for x in tree.findall("tag")]
    if tree.find("genre") is not None:
        if "tags" in res:
            res["tags"] += [{"name": x.text} for x in tree.findall("genre")]
        else:
            res["tags"] = [{"name": x.text} for x in tree.findall("genre")]

    if tree.find("actor") is not None:
        res["performers"] = []
        for actor in tree.findall("actor"):
            if actor.find("type") is not None:
                if actor.find("type").text == "Actor":
                    # findtext avoids an AttributeError when <name> is missing
                    res["performers"].append({"name": actor.findtext("name")})
            elif actor.find("name") is not None:
                res["performers"].append({"name": actor.find("name").text})
            else:
                res["performers"].append({"name": actor.text})

    if tree.find("studio") is not None:
        res["studio"] = {"name": tree.find("studio").text}

    if tree.find("art") is not None:
        posterElem = tree.find("art").find("poster")
        if posterElem is not None and posterElem.text is not None:
            if uri_validator(posterElem.text):
                # if image is a valid url return the url
                res["image"] = posterElem.text
            elif pathlib.Path(posterElem.text).is_file():
                # if image is a file return its base64 string
                res["image"] = make_image_data_url(posterElem.text)
            else:  # non valid image text
                log.warning(f"Non valid image data <{posterElem.text}>")
    return res


def uri_validator(u):
    try:
        result = urlparse(u)
        return all([result.scheme, result.netloc, result.path])
    except ValueError:
        # urlparse raises ValueError for malformed input such as a bad IPv6 netloc
        return False


def make_image_data_url(image_path):
    # type: (str,) -> str
    mime, _ = mimetypes.guess_type(image_path)
    with open(image_path, 'rb') as img:
        encoded = base64.b64encode(img.read()).decode()
    return 'data:{0};base64,{1}'.format(mime, encoded)


if sys.argv[1] == "query":
    fragment = json.loads(sys.stdin.read())
    s_id = fragment.get("id")
    if not s_id:
        log.error("No ID found")
        sys.exit(1)

    # Assume that .nfo/.xml is named exactly alike the video file and is at the same location
    # Query graphQL for the file path
    scene = graphql.getScene(s_id)
    if scene:
        scene_path = scene.get("path")
        if scene_path:
            p = pathlib.Path(scene_path)
            f = p.with_suffix(".nfo")
            if not f.is_file():
                # fall back to an upper-case extension
                f = p.with_suffix(".NFO")
            if not f.is_file():
                log.info(f"No nfo/xml files found for the scene: {p}")
                print("{}")
                sys.exit(0)
            res = query_xml(f, fragment["title"])
            print(json.dumps(res))
            sys.exit(0)
    log.error(f"No scene found for {s_id}")
    sys.exit(1)

@nymeras
Contributor

nymeras commented Mar 23, 2022

Would it be possible to also locate images that share the same path as the video/nfo? For instance:

video.mkv
video.nfo
video-fanart.jpg
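A lookup for such sidecar images could be sketched as follows. The suffix and extension lists are assumptions chosen for illustration (only `-fanart.jpg` appears in this thread), and `find_sidecar_image` is a hypothetical helper, not part of the PR.

```python
import pathlib

# Hypothetical priority order; only "-fanart" is taken from the example above
IMAGE_SUFFIXES = ("-poster", "-fanart", "-thumb", "")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def find_sidecar_image(video_path):
    """Return the first image named after the video file, or None."""
    video = pathlib.Path(video_path)
    for suffix in IMAGE_SUFFIXES:
        for ext in IMAGE_EXTENSIONS:
            # e.g. video.mkv -> video-fanart.jpg in the same directory
            candidate = video.with_name(video.stem + suffix + ext)
            if candidate.is_file():
                return candidate
    return None
```

The returned path could then be base64 encoded in the same way as a poster path found inside the .nfo.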

@adultsesamestreet

How would someone like myself, who has only a limited experience dabbling with some very basic coding, implement this scraper into their setup? Thank you in advance

@edgar1016

Getting the following error message

2022-05-27 17:34:06
Error   
could not unmarshal json from script output: EOF
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML] TypeError: can only concatenate str (not "NoneType") to str
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML]     log.info("No exact match found for " + title + ". Matching with " + tree.find("title").text + "!")
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML]   File "F:\Stash\scrapers\kodi.py", line 34, in query_xml
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML]     res = query_xml(f, fragment["title"])
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML]   File "F:\Stash\scrapers\kodi.py", line 110, in <module>
2022-05-27 17:34:06
Error   
[Scrape / Kodi XML] Traceback (most recent call last):
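This `TypeError` occurs when the .nfo contains a `<title>` element with no text, so `.text` is `None` and string concatenation fails. A sketch of a title comparison that tolerates this is shown below; `pick_title` is a hypothetical helper and the message wording for the empty case is an assumption.

```python
import xml.etree.ElementTree as ET


def pick_title(root, fallback):
    """Return (title, log_message), keeping the fallback when <title> is empty."""
    # findtext gives None for a missing element and "" for an empty one
    nfo_title = (root.findtext("title") or "").strip()
    if not nfo_title:
        return fallback, f"No <title> text in nfo, keeping {fallback!r}"
    if nfo_title == fallback:
        return nfo_title, f"Exact match found for {nfo_title}"
    return nfo_title, f"No exact match found for {fallback}. Matching with {nfo_title}!"


title, message = pick_title(ET.fromstring("<movie><title/></movie>"), "scene.name")
print(title, "|", message)
```

Formatting with an f-string also avoids the failure on its own, since `None` is rendered as text instead of raising.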

@Phasetime Phasetime marked this pull request as draft June 9, 2022 10:10
@jake4dave4

Hi, this scraper looks like it would be very useful but am I right in thinking it does not work at the moment? Has it been abandoned?

I've tried really hard to get it working but I always get the same error: "scraper kodi: could not unmarshal json from script output: EOF"
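In every log posted in this thread, the "could not unmarshal json ... EOF" line accompanies a Python traceback, which suggests the script exited before writing anything to stdout. One possible mitigation, sketched here under that assumption, is to catch unexpected errors and still emit a JSON object. `run_scraper` and `scrape` are hypothetical names; the empty `{}` result mirrors what the code posted earlier prints when no .nfo is found.

```python
import json
import sys
import traceback


def run_scraper(scrape, fragment):
    """Call scrape(fragment) and return the JSON text to write to stdout."""
    try:
        result = scrape(fragment)
    except Exception:
        # The traceback goes to stderr so stdout still carries valid JSON
        traceback.print_exc(file=sys.stderr)
        result = None
    return json.dumps(result if result is not None else {})


def failing_scrape(fragment):
    # Stand-in for a scraper that hits a bug like the ones reported above
    raise KeyError("tags")


print(run_scraper(failing_scrape, {"title": "Example"}))
```

The underlying traceback still shows up in the log, so the real error stays visible while the scrape fails gracefully.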

@TgSeed TgSeed mentioned this pull request Sep 26, 2022
action: script
script:
- python
# use python3 instead if needed
Contributor

According to the documentation, Stash is supposed to detect which Python is available on the system: either python or python3. See: https://docs.stashapp.cc/in-app-manual/scraping/scraperdevelopment/#actions

If the documentation's statement holds, this comment can be deleted.

Labels
script Scraper executes a script
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scrape from Plex