[algolia-pro] CLI indexing. Crawl indexing get stuck at 0 and doesn't start indexing. #337

Open
Sogl opened this issue Feb 3, 2023 · 4 comments
Labels: algolia-pro (Algolia Pro plugin)

Sogl commented Feb 3, 2023

I prepared my sitemap for indexing Flex objects as described here:
https://getgrav.org/premium/algolia-pro/docs/backend#crawl-page-search-intermediate

Code in my plugin:

public function onSitemapProcessed(Event $e)
{
    $sitemap = $e['sitemap'];
    $directory = $this->grav['flex']->getDirectory('therapies');
    foreach ($directory->getCollection()->filterBy(['published' => true]) as $therapy) {
        $route = "therapies/{$therapy->slug}";
        $entry = new SitemapEntry(
            Utils::url($route, true),
            date('Y-m-d', $therapy->updated_at),
            'daily',
            '1.0'
        );
        $sitemap[Utils::url($route)] = $entry;
    }
    $e['sitemap'] = $sitemap;
}

I use the #therapy body selector:
image

But I can't start indexing:

% bin/plugin algolia-pro index                

Re-indexing Algolia Search
==========================


 131/131 [============================] 100% 2 secs/2 secs -- Index Config: pages | Algolia Index: pages-ru-grav
   0/157 [>---------------------------]   0% < 1 sec/< 1 sec -- Index Config: pages | Algolia Index: pages-ru-grav
                                                                                   
  Unable to display the estimated time if the maximum number of steps is not set.  
                                                                            

Both progress lines show the same index name, as you can see. In Algolia the index has the proper name:
image

I also tried with the additional --indexes parameter:

% bin/plugin algolia-pro index --indexes=crawl

Re-indexing Algolia Search
==========================


   0/157 [>---------------------------]   0% < 1 sec/< 1 sec -- %message%
                                                                                   
  Unable to display the estimated time if the maximum number of steps is not set. 

The same error. What's this?

P.S. What I found about this error:
symfony/symfony#47244
fr05t1k/codeception-progress-reporter#12

Sogl added the algolia-pro (Algolia Pro plugin) label on Feb 3, 2023
rhukster (Member) commented Feb 3, 2023

It sounds like the problem is that the indexing gets stuck at 0 and doesn't start indexing? The message from the progress bar about being unable to display the estimated time is not the real issue?

Your issue title is confusing.

Sogl changed the title from "[algolia-pro] CLI indexing. Unable to display the estimated time if the maximum number of steps is not set." to "[algolia-pro] CLI indexing. Crawl indexing get stuck at 0 and doesn't start indexing." on Feb 3, 2023
Sogl (Author) commented Feb 3, 2023

> It sounds like the problem is that the indexing gets stuck at 0 and doesn't start indexing?

Yes.

I did some debugging and found that my Flex pages simply can't be opened during the crawl:

image

CrawlPageSearch.php line 230 ($page is null):

$page = $pages->find($route);
...
if ($page instanceof PageInterface) {
    $this->addRecordFromResponse($page, $response, $url, $records, $status);
} else {
    $status[] = [
        'status' => 'error',
        'msg' => 'Page Not Found: ' . $route,
        'url' => $url
    ];
...

I think it's because my routes are created dynamically in my Flex plugin:

public function onPluginsInitialized(): void
{
    if (!$this->isAdmin()) {
        $this->router();
    }
}

public function router()
{
    /** @var Uri $uri */
    $uri = $this->grav['uri'];
    $route = Uri::getCurrentRoute()->getRoute();

    if (Utils::startsWith($route, '/therapies') && !Utils::contains($route, '.')) {
        $this->enable([
            'onPagesInitialized' => ['addTherapyPage', 0]
        ]);
    }
}

public function addTherapyPage()
{
    $route = Uri::getCurrentRoute()->getRoute();

    $normalized = trim($route, '/');
    if (!$normalized) {
        return;
    }

    $parts = explode('/', $normalized, 2);
    $key = array_shift($parts);
    $path = array_shift($parts);
    

    /** @var Pages $pages */
    $pages = $this->grav['pages'];
    if ($pages->find($route)) {
        /** @var Debugger $debugger */
        $debugger = $this->grav['debugger'];
        $debugger->addMessage("Page {$route} already exists, page cannot be added", 'error');
        return;
    }

    $flex = Grav::instance()->get('flex');
    $therapy = $flex->getObject($path, 'therapies');

    $page = $pages->find('/therapies/therapy');
    if ($page) {
        $page->id($page->modified() . md5($route));
        $page->slug(basename($route));
        $page->folder(basename($route));
        $page->route($route);
        $page->rawRoute($route);
        $page->modifyHeader('object', $path);

        if ($therapy) {
            $title = $therapy->getProperty('title');
            $page->title($title);

            $page->media($therapy->getMedia());

            $page->content($therapy->getProperty('description'));
        }

        $pages->addPage($page, $route);
    }
}

What is the best way to add such objects to the index?

rhukster (Member) commented Feb 3, 2023

So let me get this straight: if you go to the route /therapies/depressiva-u-devushki-33-h-let in your browser, what do you get? A working page or a 404? I ask because the crawler is saying that the page is not accessible, as it's getting a 404 response code when crawling it.

Sogl (Author) commented Feb 3, 2023

> So let me get this straight: if you go to the route /therapies/depressiva-u-devushki-33-h-let in your browser, what do you get? A working page or a 404?

A working page.

It can't find my route in the routes list (/system/src/Grav/Common/Page/Pages.php):

/**
 * Find a page based on route.
 *
 * @param string $route The route of the page
 * @param bool   $all   If true, return also non-routable pages, otherwise return null if page isn't routable
 * @return PageInterface|null
 */
public function find($route, $all = false)
{
    $route = urldecode((string)$route);

    // Fetch page if there's a defined route to it.
    $path = $this->routes[$route] ?? null;      // HERE

The same happens with the findSiteBasedRoute($route) check.
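
A minimal sketch of the behaviour described in this thread, reduced to plain PHP so it runs without Grav. ToyPages and bootPages are hypothetical stand-ins, not Grav APIs: ToyPages only imitates the exact-key route lookup quoted above, and bootPages imitates a plugin that adds the dynamic page only for the route of the current request. The '/' route used for the second lookup is an assumption about what a process other than the browser request might see as its current route.

```php
<?php
// Hypothetical stand-in for the Pages service: a plain route => page map.
final class ToyPages
{
    /** @var array<string, string> route => page title */
    private $routes = [];

    public function addPage(string $title, string $route): void
    {
        $this->routes[$route] = $title;
    }

    // Exact-key lookup, as in the quoted find(): unknown routes yield null.
    public function find(string $route): ?string
    {
        $route = urldecode($route);

        return $this->routes[$route] ?? null;
    }
}

// Imitates the route-gated plugin: the dynamic page is registered only
// when the *current* route is under /therapies and contains no dot.
function bootPages(string $currentRoute): ToyPages
{
    $pages = new ToyPages();
    $pages->addPage('Therapy template', '/therapies/therapy');

    $underTherapies = strpos($currentRoute, '/therapies') === 0;
    $hasNoDot = strpos($currentRoute, '.') === false;
    if ($underTherapies && $hasNoDot) {
        $pages->addPage('Dynamic therapy page', $currentRoute);
    }

    return $pages;
}

$target = '/therapies/depressiva-u-devushki-33-h-let';

// Browser request: the current route is the therapy route itself,
// so the dynamic page is registered and the lookup succeeds.
$browser = bootPages($target);
var_dump($browser->find($target));

// Assumed other context: the current route is '/', so the dynamic
// route was never added and the lookup returns null.
$cli = bootPages('/');
var_dump($cli->find($target));
```

In this toy model the page works in the browser but is missing from the route map whenever the lookup runs under a different current route, which matches the symptom reported here. Whether that is the actual cause in Algolia Pro is not confirmed in this thread.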
