added support for recipeInstruction as string; added step formatting #503

Open
wants to merge 1 commit into
base: master

@JohannesFleischer commented Oct 24, 2024

This MR adds scraping support for websites that store recipeInstructions as a single string rather than as an array of objects.

This includes chefkoch.de and therefore closes #278 and #498
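
For illustration only, here is a minimal Python sketch of the idea, not the code in this MR (Recipe Buddy itself is not written in Python). The helper name `normalize_instructions` is hypothetical. It assumes recipeInstructions arrives as a plain string, a list of strings, or a list of HowToStep-like objects with a `text` key, and that a single string can be split into steps on line breaks.

```python
from typing import Any, List


def normalize_instructions(raw: Any) -> List[str]:
    """Return recipe steps as a list of non-empty strings.

    Hypothetical helper: accepts a single string, a list of strings, or a
    list of HowToStep-like dicts that carry their text under a "text" key.
    """
    if raw is None:
        return []
    if isinstance(raw, str):
        # Assumption: a single string holds one step per line.
        return [line.strip() for line in raw.splitlines() if line.strip()]
    if isinstance(raw, dict):
        # A lone object is treated like a one-element list.
        raw = [raw]

    steps: List[str] = []
    for item in raw:
        if isinstance(item, str):
            text = item
        elif isinstance(item, dict):
            text = str(item.get("text", ""))
        else:
            continue  # Skip shapes this sketch does not understand.
        if text.strip():
            steps.append(text.strip())
    return steps
```

With this shape, a caller does not need to know which variant a site uses; both `"Mix.\n\nBake."` and `[{"@type": "HowToStep", "text": "Mix."}, "Bake."]` come back as a flat list of step strings.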


It also adds some minor formatting of the steps, with paragraphs and numbering:

image
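
Since the screenshot is not reproduced here, the exact output format is unknown. As a rough Python sketch of what "paragraphs and numbering" could look like, assuming each step is numbered from 1 and steps are separated by a blank line (the function name `format_steps` is hypothetical):

```python
from typing import List


def format_steps(steps: List[str]) -> str:
    """Number each step from 1 and separate steps with a blank line."""
    numbered = [f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1)]
    return "\n\n".join(numbered)
```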

@iwontknow

Is there someone who can review this merge request and merge it?
I can't add recipes because I get this error everywhere I try to get a recipe.

@JohannesFleischer
Author

@iwontknow you could verify/test if my changes fix your problem or tell me which site you want to scrape from, and then I could take a look when I have some time.

@iwontknow commented Nov 28, 2024

> @iwontknow you could verify/test if my changes fix your problem or tell me which site you want to scrape from, and then I could take a look when I have some time.

I will try it tomorrow and report back.

Edit - 2024-11-29:
I have tested it on two different sites that were not working before, and so far both work with this commit.

Sites:
chefkoch.de, ndr.de/ratgeber/kochen/rezepte/

@JohannesFleischer Can you check why https://www.lidl-kochen.de/rezeptwelt/klassischer-gurkensalat-150209, for example, isn't working? It just shows: "Unable to extract Recipe metadata from provided url"

@OPVL commented Feb 27, 2025

Screenshot 2025-02-27 at 14 16 33

If you take a look at the script tag for that recipe (Cmd+F: ld+)

code
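
To inspect the same thing outside the browser, a small Python sketch like the one below can pull the raw text of every `application/ld+json` script tag out of a page. It uses only the standard library's `html.parser`; the class and function names are hypothetical, and it deliberately returns the contents as unparsed strings so that malformed JSON can still be examined.

```python
from html.parser import HTMLParser
from typing import List


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._in_json_ld = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html: str) -> List[str]:
    """Return the unparsed contents of all JSON-LD script tags in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```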

You can see the JSON is malformed in the description field. The first character of the description is a raw newline, which causes Recipe Buddy to fail to parse it as JSON.
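
The failure mode can be reproduced in a few lines of Python. This is only an illustration with a made-up fragment, not the actual lidl-kochen.de payload: a strict JSON parser rejects an unescaped control character inside a string, and JavaScript's `JSON.parse` is similarly strict. Python's decoder happens to offer a `strict=False` switch that accepts such characters, which hints at how a more lenient consumer could still read the data.

```python
import json

# Hypothetical JSON-LD fragment: the description value starts with a raw
# (unescaped) newline character, as described in the comment above.
raw = '{"@type": "Recipe", "description": "\nKlassischer Gurkensalat"}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    # Strict parsing refuses the control character inside the string.
    strict_ok = False

# strict=False tells Python's decoder to accept control characters in strings.
lenient = json.loads(raw, strict=False)
```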

Screenshot 2025-02-27 at 14 19 03

The easiest way to fix this is with a find and replace, or by improving the escape logic in rb to handle things like this. The problem is that you need to handle the ld contents as a string: you cannot simply tell it to parse and escape the description, as it won't find a value for that key, if it initialises a JSON object at all.
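
One way to do that string-level handling, sketched in Python under the assumption that the only defect is raw control characters inside string literals: walk the text once, track whether the cursor is inside a string, and escape control characters only there. The function name `escape_control_chars` is hypothetical, and this is not how rb currently works.

```python
import json


def escape_control_chars(text: str) -> str:
    """Escape raw control characters that occur inside JSON string literals.

    Whitespace between tokens is left untouched, so already-valid JSON
    passes through unchanged.
    """
    out = []
    in_string = False
    escaped = False  # True when the previous character was a backslash
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
                out.append(ch)
            elif ch == "\\":
                escaped = True
                out.append(ch)
            elif ch == '"':
                in_string = False
                out.append(ch)
            elif ord(ch) < 0x20:
                # json.dumps produces a quoted escape; drop the two quotes.
                out.append(json.dumps(ch)[1:-1])
            else:
                out.append(ch)
        else:
            if ch == '"':
                in_string = True
            out.append(ch)
    return "".join(out)
```

The repaired text can then be handed to a strict parser as usual, for example `json.loads(escape_control_chars(raw))`.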

Just had a quick check, and schema.org was able to parse the recipe no problemo, which is a tad confusing, but I guess they're handling their escapes properly. I might take a look into how the data is being handled as a side quest.

Development

Successfully merging this pull request may close these issues.

Error when importing from chefkoch.de
3 participants