feature: add shopify scraping to crawling options setup #2533

Open
3 tasks
skeptrunedev opened this issue Oct 4, 2024 · 0 comments

Description

We have successfully created a CLI for ingesting a Shopify store into a Trieve index (https://github.com/devflowinc/shopify-scraper).

The issue is that this CLI is separate from the other scraping options, when ideally it would be one of them.

Functionally, this should be implemented the same way as the OpenAPI spec handling: if a shopify_url is specified on the crawl_options, we try to pull it separately before waiting on the crawl to return.
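A minimal sketch of that ordering, assuming a hypothetical `CrawlOptions` struct that carries only an optional `shopify_url`. The `Step` enum and `plan_steps` function are illustrative names and are not taken from the Trieve codebase; the real crawl-worker would perform the work rather than return a plan.

```rust
/// Hypothetical subset of the crawl options; only the field discussed here.
#[derive(Debug, Clone, Default)]
struct CrawlOptions {
    shopify_url: Option<String>,
}

/// Illustrative ingestion steps, listed in the order they would run.
#[derive(Debug, PartialEq)]
enum Step {
    /// Pull the Shopify store separately, before the crawl result is awaited.
    PullShopify(String),
    /// Wait on the regular crawl to return.
    AwaitCrawl,
}

/// Returns the steps for a dataset: the Shopify pull comes first when a
/// `shopify_url` is present, and the crawl wait always follows.
fn plan_steps(crawl_options: Option<&CrawlOptions>) -> Vec<Step> {
    let mut steps = Vec::new();
    if let Some(url) = crawl_options.and_then(|o| o.shopify_url.as_deref()) {
        steps.push(Step::PullShopify(url.to_string()));
    }
    steps.push(Step::AwaitCrawl);
    steps
}
```

Calling `plan_steps(None)` or passing options without a `shopify_url` yields only the crawl wait, so datasets that do not use Shopify would be unaffected under this design.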

This is the block of code to reference:

if let Some(crawl_options) = &crawl_options {
    if let Some(spec) = &spec {
        if page_tags.contains(&crawl_options.openapi_tag) {

  • feature: add shopify_products_json_url field to crawl setup on dataset creation
  • feature: Rust code to get the products.json from the URL
  • feature: Rust code to queue chunks for creation similar to the upload command on the CLI
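The second and third tasks could be sketched roughly as below. This covers only the pure parts: deriving the products.json URL from a store URL and mapping already-parsed products to chunk payloads. The HTTP fetch, JSON parsing, and queueing are left out because they depend on crates and internals not shown in this issue. The `Product` and `ChunkPayload` structs, their field names, and both function names are assumptions for illustration, not the actual Shopify schema bindings or the Trieve chunk type.

```rust
/// Hypothetical, trimmed-down view of one entry in a store's products.json.
#[derive(Debug, Clone)]
struct Product {
    title: String,
    handle: String,
    body_html: String,
    tags: Vec<String>,
}

/// Hypothetical chunk payload to be queued for creation.
#[derive(Debug, Clone, PartialEq)]
struct ChunkPayload {
    link: String,
    chunk_html: String,
    tag_set: Vec<String>,
}

/// Builds the products.json URL for a store, tolerating a trailing slash.
fn products_json_url(store_url: &str) -> String {
    format!("{}/products.json", store_url.trim_end_matches('/'))
}

/// Maps each product to one chunk whose link points at the product page.
fn products_to_chunks(store_url: &str, products: &[Product]) -> Vec<ChunkPayload> {
    let base = store_url.trim_end_matches('/');
    products
        .iter()
        .map(|p| ChunkPayload {
            link: format!("{}/products/{}", base, p.handle),
            chunk_html: format!("<h1>{}</h1>{}", p.title, p.body_html),
            tag_set: p.tags.clone(),
        })
        .collect()
}
```

One chunk per product is the simplest mapping; splitting by variant would be an alternative if variant-level search results are wanted.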

Target(s)

crawl-worker,server

Requirement to close

PR adding the described functionality.

Community channels

Matrix is preferred. Reach out on Discord or Matrix for further assistance.
