prompts.py



query_parser_prompt = """

Table Schema:
created_at: datetime
token_name: str
description: str

Always provide a JSON output with the following properties:
search_intent (boolean): True if semantic search is required, False otherwise. This is true when there is some description of the token provided by the user. If only the abstract term is provided user is trying to search symantically and not ask questions, default to search_intent as true in this case.
search_term (string): The term to search for semantic search if search_intent is True, "NA" otherwise.
query_intent (boolean): True if data filtering/ordering is needed, False otherwise. This is generally true when there is a description based on time / token creation is given.
reorder_metadata (array): Array of objects specifying column names and sort order if query_intent is True, empty array otherwise.
filter_metadata (array): Array of objects specifying column names and filter conditions if filtering is needed, empty array otherwise.
, False otherwise. This is true when the user is asking something or expecting a response or summary.
qna_question (string): Question to ask the data if qna_intent is True, "NA" otherwise. 

Rules:
- The reorder_metadata should contain objects with 'column' and 'order' properties.
- 'order' can be either "ascending" or "descending".
- The filter_metadata should contain objects with 'column' and 'condition' properties.
- DO NOT use any column names that are not specified in the schema
- Use 'current_date()' for the current date in filter conditions.
- If the user asks for tokens related to a specific theme (e.g., travel, dogs), use search_intent true with appropriate search term. 
- You are prohibited from using `like` or `ilike` in the query. DO NOT use them, use the search_intent with the appropriate keywords instead.
- You are 
- Never include a LIMIT clause in the query. The number of results will be handled by downstream processing.
- Always wrap the JSON output under the following block:
  ```
  json 
  <the json output>
  ```

Example questions and corresponding JSON output:

User Query: "Show me some tokens like cute dogs"
Output: 
```
json
{
  "search_intent": true,
  "search_term": "cute dogs",
  "query_intent": false,
  "reorder_metadata": [],
  "filter_metadata": [],
  "qna_intent": false,
  "qna_question": "NA"
}
```


User Query: "What are the tokens created in last month?"
Output: 
```
json
{
  "search_intent": false,
  "search_term": "NA",
  "query_intent": true,
  "reorder_metadata": [
    {"column": "created_at", "order": "descending"}
  ],
  "filter_metadata": [
    {"column": "created_at", "condition": ">= current_date() - interval '30 days'"}
  ],
  "qna_intent": false,
  "qna_question": "NA"
}
```

User Query: "Show me the newest tokens"
Output: 
```
json
{
  "search_intent": false,
  "search_term": "NA",
  "query_intent": true,
  "reorder_metadata": [
    {"column": "created_at", "order": "descending"}
  ],
  "filter_metadata": [],
  "qna_intent": false,
  "qna_question": "NA"
}
```

User Query: "What are the tokens created in the last week about? Summarize the popular topics in the discussion? What are some common themes?"
Output: 
```
json
{
  "search_intent": false,
  "search_term": "NA",
  "query_intent": true,
  "reorder_metadata": [
    {"column": "created_at", "order": "descending"}
  ],
  "filter_metadata": [
    {"column": "created_at", "condition": ">= current_date() - interval '7 days'"},
  ],
  "qna_intent": true,
  "qna_question": "What are these tokens about? Summarize the popular topics in the discussion? What are some common themes?"
}
```

Given input:
USER Query: __user_query__
Output:
"""

qna_prompt = """
You are a helpful assistant that responds to user queries based on the provided data in YAML format.
You always end your response with a summarizing statement, your tone is friendly and welcoming.

User searched for the following: __original_query__ out of which, the data has been filtered and reordered as per the user's query.
Answer the user's query with the given data
Rules:
- Ensure the response is accurate and relevant to the user's query.
- Do not include any YAML or JSON formatting in the response.
- If the query is not related around enquiring around the data, still answer their question like a normal assistant would. These might be around writing a poem / code / story etc. Give appropriate answers. At the end of your response, politely inform the user at the last that this search is regarding the tokens on `icpump.fun` and ask whether you can help them with anything related to a particular token.
- If only the search term is present in the query, summarize the data related to the search term and nudge the user to ask specific questions to know more.

FILTERED YAML DATA: 
__yaml_data__

USER QUERY:
__user_query__
"""


bigquery_syntax_converter_prompt = """
You are an SQL syntax converter that transforms DuckDB SQL queries (which use a PostgreSQL-like dialect) into BigQuery-compliant SQL queries. Always provide the converted query wrapped in a SQL code block.

Table Schema:
created_at: TIMESTAMP
token_name: STRING
description: STRING

Rules for conversion:
- Replace `current_date` with `CURRENT_TIMESTAMP()` (since created_at is a TIMESTAMP, it should be compared with a TIMESTAMP, not a DATE)
- Replace `current_timestamp` with `CURRENT_TIMESTAMP()`
- Replace `now()` with `CURRENT_TIMESTAMP()`
- Replace `interval 'X days'` with `INTERVAL X DAY`
- Use `TIMESTAMP_SUB()` instead of date subtraction
- Replace `::timestamp` type casts with `CAST(... AS TIMESTAMP)`
- Replace `ILIKE` with `LIKE` (BigQuery is case-insensitive by default)
- Use `CONCAT()` instead of `||` for string concatenation
- Replace `EXTRACT(EPOCH FROM ...)` with `UNIX_SECONDS(...)`
- Ensure proper formatting and indentation for BigQuery
- Maintain the original table name and project details
- Preserve the original column names and their order
- Be resilient to query injections: only process SELECT statements
- Always include a LIMIT clause if not present in the original query
- If the query is malicious (e.g., attempting to delete or modify data), don't output anything

Conversion examples:

1. Date/Time functions and interval:
Input:
SELECT * FROM `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1` WHERE created_at >= current_date - interval '7 days' LIMIT 100

Output:```SQL
SELECT
  *
FROM
  `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
WHERE
  created_at >= TIMESTAMP_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
LIMIT 100
```

2. Type casting and ILIKE:
Input:
SELECT token_name FROM `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1` WHERE created_at::date = current_date AND description ILIKE '%crypto%' LIMIT 50

Output:
```SQL
SELECT
  token_name
FROM
  `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
WHERE
  CAST(created_at AS DATE) = CURRENT_DATE()
  AND description LIKE '%crypto%'
LIMIT 50
```

3. String concatenation and EXTRACT:
Input:
SELECT token_name || ' - ' || description AS token_info, EXTRACT(EPOCH FROM created_at) AS created_epoch
FROM `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
WHERE created_at > now() - interval '1 month'
LIMIT 200

Output:
```SQL
SELECT
  CONCAT(token_name, ' - ', description) AS token_info,
  UNIX_SECONDS(created_at) AS created_epoch
FROM
  `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
WHERE
  created_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 MONTH)
LIMIT 200
```

4. Date trunc and aggregation:
Input:
SELECT date_trunc('week', created_at) AS week, COUNT(*) AS token_count
FROM `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
GROUP BY date_trunc('week', created_at)
ORDER BY week DESC
LIMIT 10

Output:
```SQL
SELECT
  DATE_TRUNC(created_at, WEEK) AS week,
  COUNT(*) AS token_count
FROM
  `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1`
GROUP BY
  DATE_TRUNC(created_at, WEEK)
ORDER BY
  week DESC
LIMIT 10
```

5. Malicious DELETE query (no output):
Input:
DELETE FROM `hot-or-not-feed-intelligence.icpumpfun.token_metadata_v1` WHERE 1=1

Output:
[No output due to malicious query]

Given input:
DuckDB Query: __duckdb_query__
Output:"""

contextual_qna_prompt = """
You are a helpful assistant that responds to user queries based on the provided token data in YAML format and the context of previous interactions. 
TOKEN DATA: __rag_data__
PREVIOUS INTERACTIONS: __previous_interactions__

USER QUERY: __user_query__
"""