Fix HuggingFace regression due to prompt truncation #994

CTY-git · 2024-11-01T11:35:23Z

Avoid prompt truncation when base_url deviates when using OpenAI type API interfaces.

patched-admin · 2024-11-01T11:43:39Z

The pull request review highlighted several concerns, primarily focusing on changes to the is_prompt_supported function, which altered its return type from bool to int. This shift is problematic because it may introduce bugs and confusion due to the new use of -1 instead of False, compromising the original boolean semantics. The review suggests reverting to a bool return type or clearly defining what integer values represent and amending the function's type hints and documentation. It points out potential runtime errors when the -1 value is improperly handled elsewhere in the code and warns of indirect security implications through logic errors. Additionally, the review mentions a new method call __is_not_openai_url() that lacks visibility in the diff and could lead to AttributeError if not implemented. It advises hardening security against URL parsing attacks and recommends clarifying inline comments and adhering to coding standards, such as ensuring consistent naming conventions and proper state management for the variable is_added. A version update from '0.0.75' to '0.0.76' is noted without concerns but is flagged as a reminder to verify related changes for potential issues.

File changed: patchwork/common/client/llm/aio.py
1. Return Type Change: The return type of the is_prompt_supported function changed from bool to int. This change does not adhere to original usage or intended semantics as the function previously returned a boolean to indicate support. Using an int for this purpose can introduce confusion and bugs, especially since the code previously explicitly returned False, and now returns -1. A possible fix is reverting to a bool return type or defining what specific integer values signify clearly.

Potential Bugs: By returning -1, it's not clear if this int return value is expected or handled by the call sites of this method. This could lead to runtime errors if the calling code treats this return value as a boolean.
Security Vulnerability: There's no clear immediate security vulnerability introduced by this specific change, but if the returned integer value is mistakenly used in a conditional statement elsewhere, it could cause logic errors that might indirectly impact security.
Mismatch in Type Hinting: The function signature should reflect the change in return type if intentional. If changing to int is the desired behavior, the signature should have a return type of int and function documentation or comments must explain the meaning of the return values because -1 may not be intuitive for all developers.

File changed: patchwork/common/client/llm/openai_.py
1. Potential Bugs:
- The new condition if self.__is_not_openai_url() assumes the existence of the method __is_not_openai_url(), but this method is not shown in the diff. Ensure that this method is implemented and correctly determines whether the URL is an OpenAI endpoint. If it is not implemented, this will result in an AttributeError. Additionally, consider edge cases such as variations in URL format, protocol (http vs https), or subdomains.

Security Vulnerabilities:
- If the method __is_not_openai_url() involves parsing URLs, ensure it is protected against injection attacks or malformed input. Improper parsing can potentially lead to security vulnerabilities, especially if user input is involved.
Coding Standards:
- The inline comment # might not implement model endpoint is a bit vague and non-descriptive. Consider improving clarity to explain why this condition check is necessary. It would help future developers understand the implication of this block.
- Ensure that this code is consistent with the existing project's coding standards regarding comment styles and logging practices (if applicable).

File changed: patchwork/steps/JoinList/JoinList.py
1. Potential Bugs:
- The variable is_added is being used to track if the item has been processed. However, it should be reset for each item iteration within the list. Ensure is_added = False is set inside the loop that iterates over self.data items to prevent incorrect state carry-over from a previous iteration.

Security Vulnerabilities:
- Using json.dumps on arbitrary dictionary items without validation should be scrutinized, especially if data is coming from an untrusted source. Consider implementing validation or sanitization of these dictionary items to avoid any security risks (such as injection attacks if this data is subsequently used in sensitive contexts).
Coding Standards:
- The newly introduced variable is_added should follow any established naming conventions (e.g., isAdded or is_added_flag) if such conventions exist in the rest of the codebase.
- Ensure proper indentation and alignment of the code to match with the project’s style guide if applicable. The new code appears to conform but should be double-checked against any contributing guidelines or style standards specific to the project.

File changed: pyproject.toml
The change in version number from '0.0.75' to '0.0.76' in the pyproject.toml file is a standard update and does not inherently introduce any bugs, security vulnerabilities, or deviate from coding standards. However, since this is the only change visible, ensure that accompanying changes in the codebase related to this version update have been comprehensively reviewed for potential issues.

Fix hf regression due to prompt truncation

b36e733

CTY-git requested a review from codelion November 1, 2024 11:35

codelion approved these changes Nov 1, 2024

View reviewed changes

CTY-git merged commit ef4fef5 into main Nov 1, 2024
5 checks passed

CTY-git deleted the huggingface-regresssion-fix branch November 1, 2024 11:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix HuggingFace regression due to prompt truncation #994

Fix HuggingFace regression due to prompt truncation #994

CTY-git commented Nov 1, 2024

patched-admin commented Nov 1, 2024

Fix HuggingFace regression due to prompt truncation #994

Fix HuggingFace regression due to prompt truncation #994

Conversation

CTY-git commented Nov 1, 2024

patched-admin commented Nov 1, 2024