Consolidate search functionality across products by enhancing Bitcoin Search API #154

Open
kouloumos opened this issue Dec 12, 2024 · 0 comments

Currently, we have three different implementations of search functionality across our products. While Bitcoin Search has a well-structured API that handles searching and filtering, both ChatBTC and Bitcoin TLDR have implemented their own versions with similar but slightly different features.

Problems with Current Implementation

The API in chatbtc differs from Bitcoin Search API in two main ways:

  • It needs to exclude documents of type "question" from the results, but the current Bitcoin Search API doesn't support exclusions in its filtering system
  • It uses a different index based on whether the user selected the "coredev" persona

Bitcoin TLDR's implementation differs in how it filters by document type and domain:

  • It needs to filter for documents of type "combined-summary"
  • It allows searching within specific mailing list domains (bitcoin-dev, lightning-dev, delvingbitcoin) using OR logic between them
  • The current Bitcoin Search API only supports AND logic between filters, making this impossible to achieve

What Needs to Change

The Bitcoin Search API needs to be enhanced in four ways:

  1. Support for Exclusions: We need to add the ability to exclude certain documents based on field values. This will allow ChatBTC to exclude question-type documents.
  2. Support for OR Logic: We need to add support for OR logic between filters. This will allow Bitcoin TLDR to filter for documents from any of several specified domains. The API should maintain backwards compatibility by defaulting to AND logic when no logic type is specified.
  3. Support for Array Values: We need to allow arrays of values in filters to make the API more flexible and reduce the need for multiple filter entries for the same field.
  4. Index Selection: We need to consider how to handle different indices (like ChatBTC's coredev index) in a clean way.

These changes will allow us to consolidate our search functionality into a single, flexible API that can serve all our products while maintaining consistent behavior and making our codebase more maintainable.

Proposal

I propose enhancing the Bitcoin Search API's query builder to support more flexible filtering options. The key enhancement would be expanding the filterFields parameter to support exclusions and implicit OR logic for array values, as well as adding support for different indices.

The API will accept a new optional parameter:

  • index: Specifies which index to search against. Supported values are:
    • default (used if not specified)
    • coredev
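A minimal sketch of how the index parameter could be resolved on the server. The names `SearchIndex`, `INDEX_NAMES` and `resolveIndex`, as well as the two Elasticsearch index names, are placeholders for illustration and not the project's actual identifiers.

```typescript
// Public values accepted by the proposed `index` parameter.
type SearchIndex = "default" | "coredev";

// Hypothetical mapping to Elasticsearch index names (placeholder values).
const INDEX_NAMES: Record<SearchIndex, string> = {
  default: "bitcoin-search-main",
  coredev: "bitcoin-search-coredev",
};

function resolveIndex(index?: string): string {
  // Omitting the parameter falls back to the default index.
  if (index === undefined) return INDEX_NAMES.default;
  if (index === "default" || index === "coredev") return INDEX_NAMES[index];
  // Reject unknown values instead of silently searching the wrong index.
  throw new Error(`Unsupported index: ${index}`);
}
```

Keeping the mapping on the server means clients only ever send the short public value, so the underlying index names can change without touching ChatBTC.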

Each filter in filterFields would accept one new optional parameter:

  • operation: Can be set to "exclude" to exclude matching documents

When a filter's value is an array, the API will match documents that contain any of those values (OR logic within that filter). Separate filters will continue to be combined with AND logic, so existing single-value requests keep their current behavior.
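One way these rules could map onto an Elasticsearch bool query is sketched below: regular filters go into `filter` (combined with AND), excluded filters go into `must_not`, and array values become a `terms` clause (any value matches). The type and function names are hypothetical, and whether a field needs a `.keyword` suffix depends on the index mapping, which this sketch does not assume.

```typescript
interface FilterField {
  field: string;
  value: string | string[];
  operation?: "exclude"; // omitted means regular (include) filtering
}

type Clause =
  | { term: Record<string, string> }
  | { terms: Record<string, string[]> };

interface BoolFilter {
  filter: Clause[];   // every clause must match (AND between filters)
  must_not: Clause[]; // no clause may match (exclusions)
}

function toClause(filter: FilterField): Clause {
  const value = filter.value;
  if (Array.isArray(value)) {
    // `terms` matches when the field equals any listed value (implicit OR).
    return { terms: { [filter.field]: value } };
  }
  return { term: { [filter.field]: value } };
}

function buildBoolFilter(filters: FilterField[]): BoolFilter {
  const bool: BoolFilter = { filter: [], must_not: [] };
  for (const f of filters) {
    const target = f.operation === "exclude" ? bool.must_not : bool.filter;
    target.push(toClause(f));
  }
  return bool;
}
```

A request that uses neither arrays nor `operation` produces only `term` clauses under `filter`, which corresponds to the existing AND-only behavior.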

Here's how ChatBTC's search would look with the enhanced API:

{
  index: "coredev",  // Use the core developer index
  queryString: "taproot",
  filterFields: [
    {
      field: "type",
      value: "question",
      operation: "exclude"  // Excludes all question-type documents
    },
    {
      field: "authors",
      value: "Pieter Wuille"  // Regular filtering stays the same
    }
  ]
}

And here's how Bitcoin TLDR would use the API (using default index):

{
  queryString: "lightning network",
  filterFields: [
    {
      field: "type",
      value: "combined-summary"  // Must be a summary document
    },
    {
      field: "domain",
      value: [  // Implicit OR logic between these domains
        "lists.linuxfoundation.org/pipermail/bitcoin-dev",
        "lists.linuxfoundation.org/pipermail/lightning-dev",
        "delvingbitcoin.org"
      ]
    }
  ]
}

This approach:

  • Maintains backward compatibility (single values in different filters use AND logic)
  • Makes the API more intuitive (array values naturally imply OR logic)
  • Simplifies the interface by removing the need for explicit logic specification
  • Provides a clean way to switch between indices without environment variables
  • Supports all current use cases across our products
  • Allows for future flexibility without further API changes

Dynamic Aggregations with Sub-aggregations Support

The API should also support dynamic aggregations, optionally with sub-aggregations. This is done through the aggregationFields parameter, which accepts an array of aggregation configurations.

Each aggregation configuration can include:

  • field: The field to aggregate on (required)
  • size: The number of buckets to return (optional, defaults to the configured aggregatorSize)
  • subAggregations: An object containing Elasticsearch sub-aggregation configurations (optional)

Here's how to use the enhanced aggregations:

{
  queryString: "taproot",
  filterFields: [...],
  aggregationFields: [
    // Simple field aggregation
    { field: "authors" },
    
    // Aggregation with custom size
    { field: "domain", size: 100 },
    
    // Aggregation with sub-aggregations
    {
      field: "thread_url",
      size: 1000,
      subAggregations: {
        latest_doc: { max: { field: "indexed_at" } },
        doc_count_value: { value_count: { field: "thread_url.keyword" } }
      }
    }
  ]
}

The API will:

  • Use .keyword for term aggregations automatically
  • Apply the default aggregatorSize if no size is specified
  • Support any valid Elasticsearch sub-aggregation configuration
  • Allow mixing simple aggregations with complex ones that include sub-aggregations
  • Maintain backward compatibility by using default aggregations when no aggregationFields are specified
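The first four behaviors above could be implemented roughly as follows. This is a sketch under assumptions: `buildAggregations` and `DEFAULT_AGGREGATOR_SIZE` are hypothetical names, the value 10 stands in for the configured aggregatorSize, and sub-aggregations are passed through to Elasticsearch unchanged.

```typescript
interface AggregationField {
  field: string;
  size?: number;
  subAggregations?: Record<string, unknown>;
}

interface TermsAggregation {
  terms: { field: string; size: number };
  aggs?: Record<string, unknown>;
}

// Placeholder for the configured aggregatorSize.
const DEFAULT_AGGREGATOR_SIZE = 10;

function buildAggregations(
  fields: AggregationField[],
  defaultSize: number = DEFAULT_AGGREGATOR_SIZE
): Record<string, TermsAggregation> {
  const aggs: Record<string, TermsAggregation> = {};
  for (const { field, size, subAggregations } of fields) {
    const agg: TermsAggregation = {
      terms: {
        // Term aggregations run on the keyword sub-field.
        field: `${field}.keyword`,
        size: size !== undefined ? size : defaultSize,
      },
    };
    // Sub-aggregations are nested as-is under the bucket aggregation.
    if (subAggregations !== undefined) agg.aggs = subAggregations;
    aggs[field] = agg;
  }
  return aggs;
}
```

Each aggregation is keyed by the plain field name, which matches the `thread_url` key used in the response example.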

Example response structure for an aggregation with sub-aggregations:

{
  // ... other response fields ...
  aggregations: {
    thread_url: {
      buckets: [
        {
          key: "https://example.com/thread1",
          doc_count: 5,
          latest_doc: {
            value: "2024-03-20T00:00:00Z"
          },
          doc_count_value: 5
        },
        // ... more buckets ...
      ]
    }
  }
}

This enhanced aggregation system provides the flexibility needed for different search interfaces while maintaining a clean and consistent API structure. It supports complex use cases like the thread view in the explorer, which needs both document counts and latest document timestamps for each thread.
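As an illustration of the thread view use case, a client could turn the buckets into a list ordered by most recent activity. The sketch assumes the bucket shape from the example response, with `latest_doc.value` as an ISO 8601 string. A raw Elasticsearch max aggregation on a date field reports `value` as epoch milliseconds and the formatted date in `value_as_string`, so the parsing step may need adjusting. `ThreadBucket`, `ThreadSummary` and `summarizeThreads` are hypothetical names.

```typescript
interface ThreadBucket {
  key: string;                   // thread URL
  doc_count: number;             // documents in the thread
  latest_doc: { value: string }; // assumed ISO 8601 timestamp
}

interface ThreadSummary {
  url: string;
  documents: number;
  latest: string;
}

function summarizeThreads(buckets: ThreadBucket[]): ThreadSummary[] {
  return buckets
    .map((b) => ({ url: b.key, documents: b.doc_count, latest: b.latest_doc.value }))
    // Newest thread first; map() returned a copy, so the input is not mutated.
    .sort((a, b) => Date.parse(b.latest) - Date.parse(a.latest));
}
```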

Migration Strategy

Phase 1: Enhance Bitcoin Search API

  • Implement support for exclusions via operation: "exclude"
  • Add array support in filter values (implicit OR logic)
  • Add index selection support
  • Maintain backward compatibility

Phase 2: Update ChatBTC

  • Remove custom search implementation
  • Migrate to enhanced Bitcoin Search API
  • Update index handling to use new index parameter

Phase 3: Update Bitcoin TLDR

  • Remove custom search implementation
  • Convert domain filtering to use array values
  • Update type filtering to use standard format

Phase 4: Documentation & Cleanup

  • Update API documentation with new features
  • Remove deprecated search implementations
  • Add examples for common use cases