Scraper for .nfo exports from kodi/plex #689

Draft
wants to merge 17 commits into
base: master
115 changes: 115 additions & 0 deletions scrapers/kodi.py
@@ -0,0 +1,115 @@
import sys
import pathlib

import mimetypes
import base64

import json
import xml.etree.ElementTree as ET

import py_common.graphql as graphql
import py_common.log as log
"""
This script parses Kodi .nfo files for metadata. The .nfo file must be in the same directory as the video file and must have the same name as the video file (apart from the extension).
"""

# To ingest image files referenced in the .nfo, the paths to these files may need to be rewritten, especially when running in a Docker container.
rewriteBasePath = False
# Example: Z:\Videos\Studio_XXX\example_cover.jpg -> /data/Studio_XXX/example_cover.jpg
basePathBefore = r'Z:\Videos'
basePathAfter = "/data"
Comment on lines +15 to +20
Collaborator

@bnkai bnkai Jan 10, 2022

I don't think we need this anymore since you added the base64 option.
As I understand it, we either have a URL, which we return as a string,
or a full path, which we base64 encode (also, from what I understand, Kodi requires a full path, not a relative one?).

With that in mind, I adjusted the code a bit and pasted it below.
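
The URL-or-path distinction described above could be sketched roughly as follows. This is an illustration only: the function name `resolve_poster` and the fallback MIME type are hypothetical and are not taken from the PR.

```python
import base64
import mimetypes
import pathlib
from urllib.parse import urlparse


def resolve_poster(value):
    """Return a URL unchanged, a data URL for a readable local file, else None.

    Hypothetical helper; not part of the scraper in this PR.
    """
    value = value.strip()
    # http(s) URLs are passed through as plain strings
    if urlparse(value).scheme in ("http", "https"):
        return value
    # Anything else is treated as a local file path
    path = pathlib.Path(value)
    if not path.is_file():
        return None
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode()
    # Fallback MIME type is an assumption for files with unknown extensions
    return "data:{0};base64,{1}".format(mime or "application/octet-stream", encoded)
```

A `None` result would then be the caller's cue to log a warning instead of setting `res["image"]`.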

Contributor Author

But if the script runs inside a container, won't the base64 encoding also fail without rewriting the base path?

Collaborator

Hm, I didn't actually think of that.
You mean if the .nfo files were generated on a different OS or setup?
You are probably right; your rewrite covers the Windows -> Linux and Windows -> Docker cases. What happens with a Linux -> Linux or Linux -> Docker setup? It might need a bit of adjusting, as the .replace("\\", "/") might replace something it shouldn't in that case.
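
One possible way to handle both cases without a blanket `.replace("\\", "/")` is to let `pathlib`'s pure path classes do the splitting. The sketch below is only a suggestion under stated assumptions: the helper name `rewrite_base_path` is hypothetical, and the heuristic for deciding whether the source path is a Windows path (a backslash or a drive letter in the configured base path) is an assumption, not something the PR defines.

```python
from pathlib import PurePosixPath, PureWindowsPath


def rewrite_base_path(original, base_before, base_after):
    """Map a path under base_before to the same relative path under base_after.

    Returns None if original is not located under base_before.
    Hypothetical helper; not part of the scraper in this PR.
    """
    # Assumption: the source is a Windows path only if the configured base
    # contains a backslash or a drive letter. Otherwise POSIX rules apply,
    # so backslashes that are legal in Linux file names are left alone.
    looks_windows = "\\" in base_before or PureWindowsPath(base_before).drive != ""
    flavour = PureWindowsPath if looks_windows else PurePosixPath
    try:
        relative = flavour(original).relative_to(flavour(base_before))
    except ValueError:
        return None
    # The target is assumed to be a POSIX path (Linux host or container)
    return str(PurePosixPath(base_after).joinpath(*relative.parts))
```

With the example from the script, `rewrite_base_path(r"Z:\Videos\Studio_XXX\example_cover.jpg", r"Z:\Videos", "/data")` gives `/data/Studio_XXX/example_cover.jpg`, while a Linux -> Docker rewrite keeps any backslash inside a file name untouched.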


def query_xml(path, title):
    res = {"title": title}
    try:
        tree = ET.parse(path)
    except Exception as e:
        log.error(f'xml parsing failed:{e}')
        print(json.dumps(res))
        sys.exit(1)

    # Guard against a missing <title> element before comparing
    titleElem = tree.find("title")
    nfoTitle = titleElem.text if titleElem is not None else None
    if title == nfoTitle:
        log.info(f"Exact match found for {title}")
    else:
        log.info(f"No exact match found for {title}. Matching with {nfoTitle}!")

    # Extract metadata from xml
    if nfoTitle is not None:
        res["title"] = nfoTitle

    if tree.find("plot") is not None:
        res["details"] = tree.find("plot").text

    if tree.find("releasedate") is not None:
        res["date"] = tree.find("releasedate").text

    # Both <tag> and <genre> elements are mapped to tags
    if tree.find("tag") is not None:
        res["tags"] = [{"name": x.text} for x in tree.findall("tag")]
    if tree.find("genre") is not None:
        if "tags" in res:
            res["tags"] += [{"name": x.text} for x in tree.findall("genre")]
        else:
            res["tags"] = [{"name": x.text} for x in tree.findall("genre")]

    if tree.find("actor") is not None:
        res["performers"] = []
        for actor in tree.findall("actor"):
            if actor.find("type") is not None:
                # Entries with a <type> other than "Actor" are skipped
                if actor.find("type").text == "Actor":
                    res["performers"].append({"name": actor.find("name").text})
            elif actor.find("name") is not None:
                res["performers"].append({"name": actor.find("name").text})
            else:
                res["performers"].append({"name": actor.text})

    if tree.find("studio") is not None:
        res["studio"] = {"name": tree.find("studio").text}

    if tree.find("art") is not None:
        if tree.find("art").find("poster") is not None:
            posterElem = tree.find("art").find("poster")
            if posterElem.text is not None:
                if not rewriteBasePath and pathlib.Path(posterElem.text).is_file():
                    res["image"] = make_image_data_url(posterElem.text)
                elif rewriteBasePath:
                    rewrittenPath = posterElem.text.replace(basePathBefore, basePathAfter).replace("\\", "/")
                    if pathlib.Path(rewrittenPath).is_file():
                        res["image"] = make_image_data_url(rewrittenPath)
                    else:
                        log.warning("Can't find image: " + rewrittenPath + ". Is the base path correct?")
                else:
                    log.warning("Can't find image: " + posterElem.text + ". Are you using a docker container? Maybe you need to change the base path in the script file.")
    return res

def make_image_data_url(image_path):
    # type: (str,) -> str
    mime, _ = mimetypes.guess_type(image_path)
    with open(image_path, 'rb') as img:
        encoded = base64.b64encode(img.read()).decode()
    return 'data:{0};base64,{1}'.format(mime, encoded)

if sys.argv[1] == "query":
    fragment = json.loads(sys.stdin.read())
    s_id = fragment.get("id")
    if not s_id:
        log.error("No ID found")
        sys.exit(1)

    # Assume that the .nfo/.xml has the same name as the video file and is at the same location
    # Query graphQL for the file path
    scene = graphql.getScene(s_id)
    if scene:
        scene_path = scene.get("path")
        if scene_path:
            p = pathlib.Path(scene_path)

            res = {"title": fragment["title"]}

            f = p.with_suffix(".nfo")
            if f.is_file():
                res = query_xml(f, fragment["title"])
            else:
                log.info(f"No nfo/xml files found for the scene: {p}")

            print(json.dumps(res))
            sys.exit(0)
10 changes: 10 additions & 0 deletions scrapers/kodi.yml
@@ -0,0 +1,10 @@
name: "Kodi XML"
sceneByFragment:
  action: script
  script:
    - python
    # use python3 instead if needed
Contributor

According to the documentation, Stash is supposed to detect which Python executable is available on the system: either python or python3. See: https://docs.stashapp.cc/in-app-manual/scraping/scraperdevelopment/#actions

If the documentation is accurate, this comment can be deleted.
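
For readers unfamiliar with what such detection amounts to, here is a rough illustration using the standard library. It is not Stash's implementation; the function name `find_python` and the order in which the candidates are tried are assumptions made for this sketch.

```python
import shutil


def find_python(candidates=("python3", "python")):
    """Return the first candidate name that resolves on PATH, or None.

    Hypothetical helper; the candidate order is an assumption.
    """
    for name in candidates:
        if shutil.which(name):
            return name
    return None
```

If a lookup like this happens before the script is launched, the `python` entry in the YAML works on systems that only ship `python3`, and the "use python3 instead" hint becomes unnecessary.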

    - kodi.py
    - query

# Last Updated August 15, 2021