Scraper for .nfo exports from kodi/plex #689
Draft
Phasetime wants to merge 17 commits into stashapp:master from Phasetime:kodi
Changes from 11 commits
Commits
2ce64fe added scraper for .nfo files from kodi (Phasetime)
be41f82 remove redundant lookup_xml(), last updated in py (Phasetime)
6fd808d update last updated in yml (Phasetime)
10742b6 added performer, art, studio scraping (Phasetime)
820751e updated info about rewriteBasePath (Phasetime)
7b608de updated info about rewriteBasePath #2 (Phasetime)
31c6d37 fix error when no art is included (Phasetime)
015a5fb make the scraper always return a result (Phasetime)
2c50697 omitted debugging statements for production (Phasetime)
d99e45c fix keyError 'tags' (Phasetime)
b9ce84d fix actor type error (Phasetime)
d7386ca Merge branch 'stashapp:master' into kodi (Phasetime)
542a519 Control flow error (Phasetime)
3312b9c Merge branch 'kodi' of https://github.com/Phasetime/CommunityScrapers… (Phasetime)
544f3ea Added graphQL support instead of direct SQLite access + logging (Phasetime)
26c3989 worked in #827 cleanup, pathlib (Phasetime)
7ec69fa more pathlib (Phasetime)
@@ -0,0 +1,113 @@
import os
import sys
import json
import sqlite3
import mimetypes
import base64
import xml.etree.ElementTree as ET
"""
This script parses Kodi .nfo files for metadata. The .nfo file must be in the same directory as the video file and have the same base name (e.g. scene.mp4 -> scene.nfo).
"""
# Set to True to print debug messages to stderr.
# Named DEBUG so that it is not overwritten by the debug() function defined below.
DEBUG = False

# If you want to ingest image files referenced in the .nfo, the paths to these files may need to be rewritten, especially when using a Docker container.
rewriteBasePath = False
# Example: Z:\Videos\Studio_XXX\example_cover.jpg -> /data/Studio_XXX/example_cover.jpg
basePathBefore = r'Z:\Videos'
basePathAfter = "/data"

def query_xml(path, title):
    tree = ET.parse(path)
    # print(tree.find("title").text, file=sys.stderr)
    titleElem = tree.find("title")
    if titleElem is not None:
        if title == titleElem.text:
            debug("Exact match found for " + str(title))
        else:
            debug("No exact match found for " + str(title) + ". Matching with " + str(titleElem.text) + "!")

    # Extract metadata from the XML
    res = {"title": title}
    if tree.find("title") is not None:
        res["title"] = tree.find("title").text
    if tree.find("plot") is not None:
        res["details"] = tree.find("plot").text
    if tree.find("releasedate") is not None:
        res["date"] = tree.find("releasedate").text
    if tree.find("tag") is not None:
        res["tags"] = [{"name": x.text} for x in tree.findall("tag")]
    if tree.find("genre") is not None:
        if "tags" in res:
            res["tags"] += [{"name": x.text} for x in tree.findall("genre")]
        else:
            res["tags"] = [{"name": x.text} for x in tree.findall("genre")]
    if tree.find("actor") is not None:
        res["performers"] = []
        for actor in tree.findall("actor"):
            if actor.find("type") is not None:
                # Typed entries: keep only those of type "Actor"
                if actor.find("type").text == "Actor":
                    res["performers"].append({"name": actor.find("name").text})
            elif actor.find("name") is not None:
                res["performers"].append({"name": actor.find("name").text})
            else:
                res["performers"].append({"name": actor.text})
    if tree.find("studio") is not None:
        res["studio"] = {"name": tree.find("studio").text}

    if tree.find("art") is not None:
        if tree.find("art").find("poster") is not None:
            posterElem = tree.find("art").find("poster")
            if posterElem.text is not None:
                if not rewriteBasePath and os.path.isfile(posterElem.text):
                    res["image"] = make_image_data_url(posterElem.text)
                elif rewriteBasePath:
                    rewrittenPath = posterElem.text.replace(basePathBefore, basePathAfter).replace("\\", "/")
                    if os.path.isfile(rewrittenPath):
                        res["image"] = make_image_data_url(rewrittenPath)
                    else:
                        debug("Can't find image: " + rewrittenPath + ". Is the base path correct?")
                else:
                    debug("Can't find image: " + posterElem.text + ". Are you using a docker container? Maybe you need to change the base path in the script file.")

    return res

def debug(s):
    if DEBUG: print(s, file=sys.stderr)

# Would be nicer with Stash API instead of direct SQLite access
def get_file_path(scene_id):
    db_file = "../stash-go.sqlite"

    filepath = None
    con = sqlite3.connect(db_file)
    cur = con.cursor()
    # Bind the id as a parameter instead of concatenating it into the SQL string
    for row in cur.execute("SELECT * FROM scenes WHERE id = ?;", (int(scene_id),)):
        #debug(row)
        filepath = row[1]
    con.close()
    return filepath

def make_image_data_url(image_path):
    # type: (str,) -> str
    mime, _ = mimetypes.guess_type(image_path)
    with open(image_path, 'rb') as img:
        encoded = base64.b64encode(img.read()).decode()
    return 'data:{0};base64,{1}'.format(mime, encoded)

if len(sys.argv) > 1 and sys.argv[1] == "query":
    fragment = json.loads(sys.stdin.read())
    res = {"title": fragment["title"]}
    # Assume that the .nfo is named exactly like the video file and is at the same location
    # WORKAROUND: Read file name from db until filename is given in the fragment
    videoFilePath = get_file_path(fragment["id"])

    if videoFilePath is None:
        debug("No scene found for id " + str(fragment["id"]))
    else:
        # Reconstruct file name for .nfo by swapping the extension
        nfoFilePath = os.path.splitext(videoFilePath)[0] + ".nfo"

        if os.path.isfile(nfoFilePath):
            res = query_xml(nfoFilePath, fragment["title"])
        else:
            debug("No file found at " + nfoFilePath)

    print(json.dumps(res))
    sys.exit(0)
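
For orientation, here is a minimal sketch of the kind of input the script expects and of how the sidecar path can be derived with pathlib (which the later commits mention). The sample XML, the file path, and the helper names `nfo_path_for` and `collect_tags` are illustrative assumptions and not part of this PR; only the element names (title, plot, tag, genre) are taken from the script. `PurePosixPath` is used so the example behaves the same on every OS; real code would presumably use `Path`.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical sample; only the element names come from the script in this PR.
SAMPLE_NFO = """<movie>
  <title>Example Scene</title>
  <plot>Short description.</plot>
  <genre>Drama</genre>
  <tag>HD</tag>
  <tag>Outdoor</tag>
</movie>"""


def nfo_path_for(video_path):
    """Return the sidecar .nfo path: same directory and base name as the video."""
    return PurePosixPath(video_path).with_suffix(".nfo")


def collect_tags(root):
    """Merge <tag> and <genre> elements into a list of {"name": ...} dicts."""
    return [{"name": el.text} for el in root.findall("tag") + root.findall("genre")]


root = ET.fromstring(SAMPLE_NFO)
print(nfo_path_for("/data/Studio_XXX/example.scene.mp4"))
print(root.findtext("title"))
print(collect_tags(root))
```

Note that `with_suffix` only swaps the last extension, so dots elsewhere in the file name or in a directory name are left alone.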
@@ -0,0 +1,10 @@
name: "Kodi XML"
sceneByFragment:
  action: script
  script:
    - python
    # use python3 instead if needed

According to the documentation, Stash is supposed to detect which Python is available on the system: either python or python3. See: https://docs.stashapp.cc/in-app-manual/scraping/scraperdevelopment/#actions If that statement in the documentation holds, this comment can be deleted.

    - kodi.py
    - query

# Last Updated August 15, 2021

I don't think we need that anymore since you added the base64 option.
From what I understand, we have either a URL, which we return as a string, or a full path, which we base64-encode. (Also, from what I understand, Kodi requires a full path, not a relative one?)
With that in mind, I adjusted the code a bit and pasted it below.
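
The code the reviewer pasted is not included in this capture. Purely as an illustration of the idea described in the comment (return a URL unchanged, base64-encode a local file), a minimal sketch could look like the following. The function name `resolve_image` and the fallback MIME type are assumptions made for this example; this is not the reviewer's actual code.

```python
import base64
import mimetypes
import os
from urllib.parse import urlparse


def resolve_image(value):
    """Return `value` unchanged if it is an http(s) URL, a base64 data URL if it
    is an existing local file, and None otherwise."""
    # Check the scheme explicitly: a Windows path such as Z:\Videos\a.jpg
    # would otherwise be parsed as having the scheme "z".
    if urlparse(value).scheme in ("http", "https"):
        return value
    if os.path.isfile(value):
        mime, _ = mimetypes.guess_type(value)
        with open(value, "rb") as img:
            encoded = base64.b64encode(img.read()).decode()
        # Assumed fallback when the type cannot be guessed from the file name.
        return "data:{0};base64,{1}".format(mime or "application/octet-stream", encoded)
    return None
```

A path that does not exist on the machine running the script (for example a host path seen from inside a container) yields None here, which is the case the next comment raises.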

But if the script runs inside a container, won't the base64 encoding also fail without rewriting the base path?

Hm, I didn't actually think of that...
You mean if the nfos were generated using a different OS or setup...
You are probably right; your rewrite covers the case from Windows to Linux or Docker. What happens if it's a Linux -> Linux or Linux -> Docker setup? It might need a bit of adjusting, as the .replace("\\", "/") might replace something it shouldn't in that case.
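
One possible way to address the concern above, sketched under assumptions: treat backslashes as separators only when the configured source base path looks like a Windows path, and leave POSIX paths untouched. The function name `rewrite_base_path` is hypothetical, and the sketch assumes the destination (for example a Docker container) always uses POSIX paths. It relies only on the standard pathlib pure-path classes, so it never touches the filesystem.

```python
from pathlib import PurePosixPath, PureWindowsPath


def rewrite_base_path(path, base_before, base_after):
    """Map `path` from under `base_before` to under `base_after`.

    Backslashes count as separators only when `base_before` looks like a
    Windows path (contains a backslash or a drive letter). For POSIX sources
    a literal backslash in a file name is preserved.
    """
    is_windows = "\\" in base_before or PureWindowsPath(base_before).drive != ""
    flavour = PureWindowsPath if is_windows else PurePosixPath
    try:
        relative = flavour(path).relative_to(flavour(base_before))
    except ValueError:
        return None  # path is not located under base_before
    # Assumption: the destination is always a POSIX-style path.
    return str(PurePosixPath(base_after).joinpath(*relative.parts))


print(rewrite_base_path("Z:\\Videos\\Studio_XXX\\example_cover.jpg", "Z:\\Videos", "/data"))
print(rewrite_base_path("/mnt/media/Studio_XXX/example_cover.jpg", "/mnt/media", "/data"))
```

Returning None when the path is not under the configured base makes a misconfigured base path easy to detect and log, instead of silently producing a path that does not exist.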