Scraper for .nfo exports from kodi/plex #689

Draft: wants to merge 17 commits into base: master
Changes from 11 commits
113 changes: 113 additions & 0 deletions scrapers/kodi.py
@@ -0,0 +1,113 @@
"""
This script parses Kodi .nfo files for metadata. The .nfo file must be in the same directory as the video file and must have the same name, with the extension replaced by .nfo.
"""
import os
import sys
import json
import sqlite3
import mimetypes
import base64
import xml.etree.ElementTree as ET
debug = False  # set to True to print diagnostic messages to stderr


# If you want to ingest image files referenced in the .nfo, the paths to these files may need to be rewritten, especially when running in a Docker container.
rewriteBasePath = False
# Example: Z:\Videos\Studio_XXX\example_cover.jpg -> /data/Studio_XXX/example_cover.jpg
basePathBefore = r'Z:\Videos'
basePathAfter = "/data"
Comment on lines +15 to +20
Collaborator

@bnkai, Jan 10, 2022


I don't think we need that anymore, since you added the base64 option.
From what I understand, we either have a URL, in which case we return it as a string,
or a full path (also, from what I understand, Kodi requires a full path, not a relative one?), which we base64-encode.

With that in mind, I adjusted the code a bit and pasted it below.
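A minimal sketch of the approach described in this comment (an illustration, not the code the reviewer pasted): a poster value is returned unchanged when it is a URL and embedded as a base64 data URL when it is a local file. The helper name `poster_to_image` is hypothetical, and treating only `http`/`https` values as URLs is an assumption.

```python
import base64
import mimetypes
import os
from urllib.parse import urlparse


def poster_to_image(value):
    """Hypothetical helper: return a URL unchanged, or a local file as a data URL.

    Assumes only http(s) values are URLs; anything else is treated as a
    filesystem path. Returns None when that path does not exist.
    """
    if urlparse(value).scheme in ("http", "https"):
        # Remote image: hand the URL string back as-is
        return value
    if not os.path.isfile(value):
        return None
    # Local image: embed the file contents as a base64 data URL
    mime, _ = mimetypes.guess_type(value)
    with open(value, "rb") as img:
        encoded = base64.b64encode(img.read()).decode()
    return "data:{0};base64,{1}".format(mime, encoded)
```

Note that `urlparse` reports a Windows drive letter such as `Z:` as a one-letter scheme, which is why the check compares against `http`/`https` explicitly rather than testing for any scheme.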

Contributor Author


But if the script runs inside a container, won't the base64 encoding also fail without rewriting the base path?

Collaborator


Hm, I didn't actually think of that...
You mean if the .nfo files were generated on a different OS or setup?
You are probably right: your rewrite covers the case from Windows to Linux or Docker. What happens in a Linux -> Linux or Linux -> Docker setup? It might need some adjusting, since the .replace("\\", "/") might replace something it shouldn't in that case.
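One possible way to address the Linux -> Linux concern is to replace only the leading base-path prefix and to convert backslashes only when that prefix looks like a Windows path. This is a sketch under assumptions: `rewrite_path` is a hypothetical name, and its `before`/`after` arguments stand in for the script's `basePathBefore`/`basePathAfter`.

```python
import ntpath
import posixpath


def rewrite_path(path, before, after):
    """Hypothetical helper: map a path stored in an .nfo to a local path.

    Only a leading `before` prefix is replaced. Backslashes are converted to
    forward slashes only when `before` looks like a Windows path, so a Linux
    file name that legitimately contains a backslash is left untouched.
    Returns the path unchanged when it does not start with `before`.
    """
    before = before.rstrip("/\\")
    windows_source = "\\" in before or ntpath.splitdrive(before)[0] != ""
    if windows_source:
        # Windows paths are case-insensitive, so compare the prefix that way
        matches = path.lower().startswith(before.lower())
    else:
        matches = path.startswith(before)
    rest = path[len(before):]
    seps = "/\\" if windows_source else "/"
    # Reject partial matches such as prefix Z:\Videos against Z:\VideosOld\...
    if not matches or (rest and rest[0] not in seps):
        return path
    if windows_source:
        rest = rest.replace("\\", "/")
    return posixpath.join(after, rest.lstrip("/"))
```

With the script's example values, `rewrite_path(r"Z:\Videos\Studio_XXX\example_cover.jpg", r"Z:\Videos", "/data")` gives `/data/Studio_XXX/example_cover.jpg`, while a Linux prefix such as `/mnt/media` keeps any backslashes in the rest of the path.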


def query_xml(path, title):
    tree = ET.parse(path)
    # print(tree.find("title").text, file=sys.stderr)
    # findtext() returns None instead of raising when <title> is missing
    nfo_title = tree.findtext("title")
    if title == nfo_title:
        debug_print(f"Exact match found for {title}")
    else:
        debug_print(f"No exact match found for {title}. Matching with {nfo_title}!")

    # Extract metadata from xml
    res = {"title": title}
    if tree.find("title") is not None:
        res["title"] = tree.find("title").text
    if tree.find("plot") is not None:
        res["details"] = tree.find("plot").text
    if tree.find("releasedate") is not None:
        res["date"] = tree.find("releasedate").text
    if tree.find("tag") is not None:
        res["tags"] = [{"name": x.text} for x in tree.findall("tag")]
    if tree.find("genre") is not None:
        # Genres are merged into the tag list
        if "tags" in res:
            res["tags"] += [{"name": x.text} for x in tree.findall("genre")]
        else:
            res["tags"] = [{"name": x.text} for x in tree.findall("genre")]
    if tree.find("actor") is not None:
        res["performers"] = []
        for actor in tree.findall("actor"):
            if actor.find("type") is not None:
                # Typed entries are kept only if typed as "Actor"; other person types are skipped
                if actor.find("type").text == "Actor" and actor.find("name") is not None:
                    res["performers"].append({"name": actor.find("name").text})
            elif actor.find("name") is not None:
                res["performers"].append({"name": actor.find("name").text})
            else:
                # Plain <actor>Name</actor> without child elements
                res["performers"].append({"name": actor.text})
    if tree.find("studio") is not None:
        res["studio"] = {"name": tree.find("studio").text}

    if tree.find("art") is not None:
        if tree.find("art").find("poster") is not None:
            posterElem = tree.find("art").find("poster")
            if posterElem.text is not None:
                if not rewriteBasePath and os.path.isfile(posterElem.text):
                    res["image"] = make_image_data_url(posterElem.text)
                elif rewriteBasePath:
                    rewrittenPath = posterElem.text.replace(basePathBefore, basePathAfter).replace("\\", "/")
                    if os.path.isfile(rewrittenPath):
                        res["image"] = make_image_data_url(rewrittenPath)
                    else:
                        debug_print("Can't find image: " + rewrittenPath + ". Is the base path correct?")
                else:
                    debug_print("Can't find image: " + posterElem.text + ". Are you using a Docker container? Maybe you need to change the base path in the script file.")

    return res

def debug_print(s):
    # Print diagnostics to stderr only when the module-level `debug` flag is set
    if debug:
        print(s, file=sys.stderr)

# Would be nicer with the Stash API instead of direct SQLite access
def get_file_path(scene_id):
    db_file = "../stash-go.sqlite"

    con = sqlite3.connect(db_file)
    cur = con.cursor()
    filepath = None
    # Parameterized query instead of string concatenation
    for row in cur.execute("SELECT * FROM scenes WHERE id = ?;", (scene_id,)):
        # debug_print(row)
        filepath = row[1]
    con.close()
    return filepath

def make_image_data_url(image_path):
    # type: (str) -> str
    # Embed the image file as a base64 data URL
    mime, _ = mimetypes.guess_type(image_path)
    with open(image_path, 'rb') as img:
        encoded = base64.b64encode(img.read()).decode()
    return 'data:{0};base64,{1}'.format(mime, encoded)

if sys.argv[1] == "query":
    fragment = json.loads(sys.stdin.read())
    res = {"title": fragment["title"]}
    # Assume that the .nfo is named exactly like the video file and is in the same directory
    # WORKAROUND: Read the file name from the db until the file name is provided in the fragment
    videoFilePath = get_file_path(fragment["id"])

    if videoFilePath is None:
        debug_print("No scene found with id " + str(fragment["id"]))
    else:
        # Reconstruct the .nfo file name by replacing the video file's extension
        nfoFilePath = os.path.splitext(videoFilePath)[0] + ".nfo"

        if os.path.isfile(nfoFilePath):
            res = query_xml(nfoFilePath, fragment["title"])
        else:
            debug_print("No file found at " + nfoFilePath)

    print(json.dumps(res))
    sys.exit(0)
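To make the element-to-field mapping in `query_xml` concrete, the following self-contained sketch parses an inline sample .nfo and derives the .nfo path from a video path. The sample XML is made up for illustration, the element names are only the ones `query_xml` reads (real Kodi or Plex exports may contain others), and `nfo_path_for`/`parse_nfo` are hypothetical names; `parse_nfo` roughly mirrors the script's logic rather than importing it.

```python
import os
import xml.etree.ElementTree as ET

# Made-up sample .nfo content using the elements the script looks for
SAMPLE_NFO = """<movie>
  <title>Example Scene</title>
  <plot>A short description.</plot>
  <releasedate>2021-08-15</releasedate>
  <studio>Example Studio</studio>
  <tag>tag-one</tag>
  <genre>genre-one</genre>
  <actor><name>Performer A</name><type>Actor</type></actor>
  <actor><name>Director B</name><type>Director</type></actor>
  <actor>Performer C</actor>
</movie>"""


def nfo_path_for(video_path):
    # splitext only treats a dot in the last path component as the extension
    return os.path.splitext(video_path)[0] + ".nfo"


def parse_nfo(root):
    """Hypothetical helper: condensed version of the mapping in query_xml."""
    res = {}
    for element, field in (("title", "title"), ("plot", "details"), ("releasedate", "date")):
        if root.find(element) is not None:
            res[field] = root.find(element).text
    # <tag> and <genre> entries both become Stash tags
    tags = [{"name": e.text} for e in root.findall("tag") + root.findall("genre")]
    if tags:
        res["tags"] = tags
    performers = []
    for actor in root.findall("actor"):
        name = actor.find("name")
        kind = actor.find("type")
        if kind is not None:
            # Typed entries are kept only when typed as "Actor"
            if kind.text == "Actor" and name is not None:
                performers.append({"name": name.text})
        elif name is not None:
            performers.append({"name": name.text})
        else:
            # Plain <actor>Name</actor> without child elements
            performers.append({"name": actor.text})
    if performers:
        res["performers"] = performers
    if root.find("studio") is not None:
        res["studio"] = {"name": root.find("studio").text}
    return res


print(nfo_path_for("/data/videos/scene.mp4"))
print(parse_nfo(ET.fromstring(SAMPLE_NFO)))
```

In this sample, "Director B" is dropped because its `<type>` is not "Actor", while the untyped "Performer C" is kept via the element's own text.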
10 changes: 10 additions & 0 deletions scrapers/kodi.yml
@@ -0,0 +1,10 @@
name: "Kodi XML"
sceneByFragment:
  action: script
  script:
    - python
    # use python3 instead if needed
Contributor


According to the documentation, Stash is supposed to detect which Python is available on the system: either python or python3. See: https://docs.stashapp.cc/in-app-manual/scraping/scraperdevelopment/#actions

If the documentation's claim holds, this comment can be deleted.

    - kodi.py
    - query

# Last Updated August 15, 2021