-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scrapy-spider-metadata support. #75
Changes from all commits
ffbe2b7
07df6eb
1c1608e
7e3da4f
0a86dc2
71f2b8c
4d4c5ac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -35,8 +35,23 @@ | |||
def run(self, args, opts): | ||||
result = { | ||||
'project_type': 'scrapy', | ||||
'spiders': sorted(self.crawler_process.spider_loader.list()) | ||||
'spiders': sorted(self.crawler_process.spider_loader.list()), | ||||
} | ||||
try: | ||||
from scrapy_spider_metadata import get_spider_metadata | ||||
except ImportError: | ||||
pass | ||||
else: | ||||
result['metadata'] = {} | ||||
for spider_name in result['spiders']: | ||||
spider_cls = self.crawler_process.spider_loader.load(spider_name) | ||||
metadata_dict = get_spider_metadata(spider_cls) | ||||
try: | ||||
# make sure it's serializable | ||||
json.dumps(metadata_dict) | ||||
except (TypeError, ValueError): | ||||
continue | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should we have a warning here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The command prints a JSON text which some other components read, we could print warnings to stderr but I don't think anything will read them (but I'm not sure). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Still, it seems currently it's the only place where the error could be show; without it there is no indication of issue at all. But maybe there is a better place for it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried to do that and found that the command explicitly disables logging:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might need to find another place to log these errorrs, or document it (in scrapy-spider-metadata?). But it seems this shouldn't block this PR. |
||||
result['metadata'][spider_name] = metadata_dict | ||||
if opts.debug: | ||||
output = subprocess.check_output( | ||||
['bash', '-c', self.IMAGE_INFO_CMD], | ||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import os | ||
import subprocess | ||
import sys | ||
from pathlib import Path | ||
from typing import Tuple, Optional, Union | ||
|
||
|
||
def call_command(cwd: Union[str, os.PathLike], *args: str) -> Tuple[str, str]: | ||
result = subprocess.run( | ||
args, | ||
cwd=str(cwd), | ||
stdout=subprocess.PIPE, | ||
stderr=subprocess.PIPE, | ||
universal_newlines=True, | ||
) | ||
assert result.returncode == 0, result.stderr | ||
return result.stdout, result.stderr | ||
|
||
|
||
def call_scrapy_command(cwd: Union[str, os.PathLike], *args: str) -> Tuple[str, str]: | ||
args = (sys.executable, "-m", "scrapy.cmdline") + args | ||
return call_command(cwd, *args) | ||
|
||
|
||
def create_project(topdir: Path, spider_text: Optional[str] = None) -> Path: | ||
project_name = "foo" | ||
cwd = topdir | ||
call_scrapy_command(str(cwd), "startproject", project_name) | ||
cwd /= project_name | ||
(cwd / project_name / "spiders" / "spider.py").write_text(spider_text or """ | ||
from scrapy import Spider | ||
|
||
class MySpider(Spider): | ||
name = "myspider" | ||
""") | ||
return cwd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we lack tests on CI which check a case where the library is not installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3.6 and 3.7 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this line detected as covered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of run() is detected as covered: https://app.codecov.io/github/scrapinghub/scrapinghub-entrypoint-scrapy/commit/ebf23f19ff3bc1853a35f001a8b2df7e1b1ec63b/blob/sh_scrapy/commands/shub_image_info.py
Not sure if codecov can detect coverage for code run as a separate process.