Add o1 support with structured output #1025

CTY-git · 2024-11-15T06:55:19Z

PR Checklist

The commit message follows our guidelines: Code of conduct
Tests for the changes have been added (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)
Does this PR introduce a breaking change?
Include PR in release notes?

PR Type

What is the current behavior?

Currently o1 models are not supported due to outdated tiktoken version. Current tiktoken version is 0.7.0
In addition, o1 models do not support structured outputs.
Issue Number: N/A

What is the new behavior?

I have bumped our own fork of code2prompt to depend on version ^0.8.0 of tiktoken which adds encoding mapping of o1 models.

I have opted to always follow the recommendation here when structured output is required of o1 models.

Other information

patched-codes · 2024-11-15T09:36:43Z

File Changed: `patchwork/common/client/llm/openai_.py`

Details: Function name __o1_chat_completion uses double underscores, which is typically reserved for name mangling in Python. This violates the snake_case naming convention rule.

Affected Code Snippet:

def __o1_chat_completion(
    self,
    messages: Iterable[ChatCompletionMessageParam],
    model: str,
    frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,
    logprobs: Optional[bool] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    n: Optional[int] | NotGiven = NOT_GIVEN,
    presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    response_format: dict | completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,
    stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,
    top_p: Optional[float] | NotGiven = NOT_GIVEN,
):

Start Line: 127
End Line: 141

Details: The function __o1_chat_completion lacks a proper docstring explaining its purpose, parameters, and return type.

Affected Code Snippet:

def __o1_chat_completion(
    self,
    messages: Iterable[ChatCompletionMessageParam],
    model: str,
    frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,
    logprobs: Optional[bool] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    n: Optional[int] | NotGiven = NOT_GIVEN,
    presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    response_format: dict | completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,
    stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,
    top_p: Optional[float] | NotGiven = NOT_GIVEN,
):

Start Line: 127
End Line: 141

Details: The function __o1_chat_completion is missing a return type annotation, which violates the rule of including type hints for function return values.

Affected Code Snippet:

def __o1_chat_completion(
    self,
    messages: Iterable[ChatCompletionMessageParam],
    model: str,
    frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,
    logprobs: Optional[bool] | NotGiven = NOT_GIVEN,
    max_tokens: Optional[int] | NotGiven = NOT_GIVEN,
    n: Optional[int] | NotGiven = NOT_GIVEN,
    presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,
    response_format: dict | completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,
    stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,
    temperature: Optional[float] | NotGiven = NOT_GIVEN,
    top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,
    top_p: Optional[float] | NotGiven = NOT_GIVEN,
):

Start Line: 127
End Line: 141

File Changed: `patchwork/patchflows/GenerateUnitTests/GenerateUnitTests.py`

Violation of rule: Do not write functions without proper docstrings including parameter types, return types, and purpose

Details: The __init__ and run methods in the GenerateUnitTests class lack proper docstrings. Docstrings should include parameter types, return types, and a clear description of the method's purpose.

Affected Code Snippet:

class GenerateUnitTests(Step):
    def __init__(self, inputs):
        super().__init__(inputs)

        final_inputs = yaml.safe_load(_DEFAULT_INPUT_FILE.read_text())
        if final_inputs is None:
            final_inputs = {}

        final_inputs.update(inputs)

        # ... rest of the method ...

    def run(self):
        outputs = CallCode2Prompt(self.inputs).run()
        new_file_name = f"test_file.{self.inputs['test_file_extension']}"
        new_file_path = Path(outputs["uri"]).with_name(new_file_name)
        Path(outputs["uri"]).rename(new_file_path)
        outputs["uri"] = str(new_file_path)
        # ... rest of the method ...

Start Line: 13
End Line: 55

Violation of rule: Do not omit type annotations and type hints for function parameters and return values

Details: The __init__ and run methods lack type annotations for their parameters and return values. Type hints should be added to improve code readability and maintainability.

Affected Code Snippet:

class GenerateUnitTests(Step):
    def __init__(self, inputs):
        super().__init__(inputs)

        # ... rest of the method ...

    def run(self):
        outputs = CallCode2Prompt(self.inputs).run()
        # ... rest of the method ...

Start Line: 13
End Line: 55

Violation of rule: Do not use f-strings with untrusted user input

Details: The code uses f-strings to create pr_title and branch_prefix. While the current usage doesn't appear to directly include user input, it's a potential security risk if self.__class__.__name__ could be manipulated by user input in any way.

Affected Code Snippet:

final_inputs["pr_title"] = f"PatchWork Unit Tests generated"
final_inputs["branch_prefix"] = f"{self.__class__.__name__.lower()}-"

Start Line: 31
End Line: 32

Violation of rule: Do not define input/output structures without using TypedDict

Details: The code uses dictionaries for input and output structures (final_inputs, outputs) without using TypedDict to define their structure. This can lead to potential runtime errors and makes the code harder to maintain.

Affected Code Snippet:

final_inputs = yaml.safe_load(_DEFAULT_INPUT_FILE.read_text())
if final_inputs is None:
    final_inputs = {}

final_inputs.update(inputs)

# ... and later in the run method ...

outputs = CallCode2Prompt(self.inputs).run()

Start Line: 16
End Line: 55

Violation of rule: Do not skip using Annotated for StepTypeConfig

Details: The code doesn't use Annotated for StepTypeConfig, which is recommended for better type checking and documentation of step configurations.

Affected Code Snippet:

class GenerateUnitTests(Step):
    def __init__(self, inputs):
        super().__init__(inputs)
        # ... rest of the method ...

Start Line: 13
End Line: 55

File Changed: `patchwork/patchflows/init.py`

Details: The __all__ list in the __init__.py file is not using consistent quotation marks for the module names. All items should use double quotes for consistency.

Affected Code Snippet:

__all__ = [
    "AutoFix",
    "DependencyUpgrade",
    "GenerateREADME",
    "PRReview",
    "ResolveIssue",
    "GenerateDocstring",
    "GenerateUnitTests",
]

Start Line: 9
End Line: 17

Details: The imports in the __init__.py file are not sorted using isort. The GenerateUnitTests import is out of alphabetical order.

Affected Code Snippet:

from .AutoFix.AutoFix import AutoFix
from .DependencyUpgrade.DependencyUpgrade import DependencyUpgrade
from .GenerateDocstring.GenerateDocstring import GenerateDocstring
from .GenerateREADME.GenerateREADME import GenerateREADME
from .GenerateUnitTests.GenerateUnitTests import GenerateUnitTests
from .PRReview.PRReview import PRReview
from .ResolveIssue.ResolveIssue import ResolveIssue

Start Line: 1
End Line: 7

File Changed: `patchwork/steps/ModifyCode/ModifyCode.py`

Details: The code diff shows an inconsistent use of whitespace. The change removes a trailing whitespace, which is good practice, but it's not a major violation of the provided rules.

Affected Code Snippet:

             if new_code is None:
                 continue
-            
+
             replace_code_in_file(uri, start_line, end_line, new_code)
             modified_code_file = dict(path=uri, start_line=start_line, end_line=end_line, **extracted_response)
             modified_code_files.append(modified_code_file)

Start Line: 88
End Line: 93

This review is based on a thorough analysis of the provided code diff and pull request information against the given rules. The change made is actually an improvement in code style by removing unnecessary whitespace. However, this doesn't constitute a major violation of any of the specified rules.

No other clear and major violations of the provided rules were identified in the given code diff. The PR seems to focus on adding support for o1 models and structured output, which is not directly related to the code changes shown in the diff.

It's worth noting that without seeing the full context of the file, it's challenging to make a comprehensive assessment of all the rules. The diff only shows a small portion of the code, which limits the scope of the review.

File Changed: `pyproject.toml`

Details: The version specification for patched-code2prompt is using an imprecise development version format.

Affected Code Snippet:

patched-code2prompt = "~0.9.0.dev3"

Start Line: 36
End Line: 36

Details: The patchwork-cli package version is not following the correct development version format.

Affected Code Snippet:

[tool.poetry]
name = "patchwork-cli"
version = "0.0.79"

Start Line: 1
End Line: 3

Copilot reviewed 6 out of 6 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (3)

patchwork/common/client/llm/openai_.py:32

The line contains a tab character which should be removed.

gpt-4

patchwork/common/client/llm/openai_.py:115

The line is unreachable due to the preceding return statement.

return self.client.chat.completions.create(**NotGiven.remove_not_given(input_kwargs))

patchwork/common/client/llm/openai_.py:177

The string concatenation should use f-string formatting for consistency.

"content": f"Given the following data, format it with the given response format: {o1_choice.message.content}"

CTY-git added 2 commits November 15, 2024 14:51

lint and add o1 support with structured outputs

ff6812a

lint

5d684db

patched-codes bot deleted a comment from patched-admin Nov 15, 2024

CTY-git requested a review from jonahdc November 15, 2024 07:35

jonahdc approved these changes Nov 15, 2024

View reviewed changes

patched-admin requested a review from Copilot November 20, 2024 02:57

Copilot AI reviewed Nov 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add o1 support with structured output #1025

Add o1 support with structured output #1025

CTY-git commented Nov 15, 2024

patched-codes bot commented Nov 15, 2024

Add o1 support with structured output #1025

Are you sure you want to change the base?

Add o1 support with structured output #1025

Conversation

CTY-git commented Nov 15, 2024

PR Checklist

PR Type

What is the current behavior?

What is the new behavior?

Other information

patched-codes bot commented Nov 15, 2024

File Changed: patchwork/common/client/llm/openai_.py

File Changed: patchwork/patchflows/GenerateUnitTests/GenerateUnitTests.py

File Changed: patchwork/patchflows/__init__.py

File Changed: patchwork/steps/ModifyCode/ModifyCode.py

File Changed: pyproject.toml

Choose a reason for hiding this comment

File Changed: `patchwork/common/client/llm/openai_.py`

File Changed: `patchwork/patchflows/GenerateUnitTests/GenerateUnitTests.py`

File Changed: `patchwork/patchflows/init.py`

File Changed: `patchwork/steps/ModifyCode/ModifyCode.py`

File Changed: `pyproject.toml`