use indentation from response rather than rstudioapi? #9
I'm assuming this is related to some weirdness I get when calling pal using a locally served model. I use ollama and Qwen2.5-coder:14b. Here's my .Rprofile:
I've created a toy assistant using prompt_new and the following system prompt:
(The prompt might or might not give some hints about what my problem is.) When I serve Qwen with ollama from the command line using the above-mentioned system prompt and the same comment, the output comes back correctly formatted.
However, when I trigger the pal within RStudio, the output is mangled and sometimes contains markdown formatting, like so:

Anyways, thanks for developing the package! Despite the rough edges, I'm already having a ton of fun with it.
Thanks for the kind comment @Koalha! Stoked to hear it's functional enough that you're able to have fun with it.

This is interesting. I had assumed that the kind of missing-newline behavior you were seeing was an artifact of qwen2.5-coder:14b just not being very effective when I observed something similar last week, but now that I've run that model from the command line I do see that it tends to get line breaks right more often from that interface. I would note that, with pal's interface, I do see newlines sometimes, so they're not totally disappearing:

pal-qwen-newline.mov

It does seem like, from either interface, the model often wants to surround output in ```r fences, and pal won't be in the business of "fixing" that. I'll need to check whether there's some sort of newline disappearance that's due to pal or elmer, but I would also entertain the idea that ollama is doing something smart to infer when newlines are probably warranted but missing.
Thanks for the quick reply @simonpcouch! Seems I should have done some more testing related to the backticks. It's actually good to hear that the problem is probably related to the specific model I was using (and maybe to its interaction with ollama), since that means I can just focus on finding a model that plays better with pal.
Hey @Koalha, you were right! The culprit was this line:

Line 153 in f3455dd
I don't remember why I initially introduced that line in the first hacky hours of the package, and removing it seems to resolve the problem. Qwen 2.5 seems to chunk up tokens in a way that triggers this bug more often than other models do; from Qwen, I see newlines included in chunks like:

Going to go ahead and close this issue, too, as I do think we'll continue to rely on the RStudio API to reformat.
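A minimal sketch of the class of bug described above — written in Python for brevity, since the behavior is language-agnostic (pal itself is R), and using hypothetical names rather than pal's actual internals. If a model streams a newline as its own chunk, any per-chunk trimming before insertion silently deletes that newline, while inserting chunks verbatim preserves formatting:

```python
# Hypothetical illustration, not pal's actual code: some models emit a
# newline as a standalone streamed chunk, so trimming each chunk drops it.
chunks = ["x <- 1", "\n", "y <- 2"]

# Buggy: trim each chunk before appending it to the document.
buggy = "".join(chunk.strip() for chunk in chunks)
print(repr(buggy))  # 'x <- 1y <- 2' -- the newline is lost

# Fixed: append chunks verbatim; the newline survives.
fixed = "".join(chunks)
print(repr(fixed))  # 'x <- 1\ny <- 2'
```

This is also consistent with the observation that the bug surfaces more with some models than others: it only bites when a chunk boundary happens to fall exactly at a newline.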
Should look into whether models are getting indentation right already.