Does not properly segment within quotations #118

Open
Hgherzog opened this issue Feb 10, 2023 · 2 comments
Comments

@Hgherzog

Hgherzog commented Feb 10, 2023

When dealing with a long statement of facts quoted from legal text, the text is not split into sentences between left double quotation marks (“) and right double quotation marks (”). These are different from the straight " character. I cannot share the text here as it deals with sensitive content.

import pysbd
# above_text holds the quoted legal text (not shared here)
seg = pysbd.Segmenter(language='en')
sentences = seg.segment(above_text)

Returns a list of length 1 and does not split by sentences. The expected behavior is to split the text into sentences within the quotations.

@libTorrentUser

libTorrentUser commented Jul 14, 2023

I confirm the... bug? Not sure if it is a bug or intentional, but:

import pysbd

text = 'He said "hello. And then world."'
seg = pysbd.segmenter.Segmenter(language='en', clean=True)
print(seg.segment(text))
['He said "hello. And then world."']

I was expecting

[
    'He said', 
    '"hello.',
    'And then world."'
]

It almost does the right thing when using single quotes. Almost. The sentence is split correctly, but it treats the terminating single quote as a sentence of its own:

import pysbd

text = "He said 'hello. And then world.'"
seg = pysbd.segmenter.Segmenter(language='en', clean=True)
print(seg.segment(text))
[
    "He said 'hello.",
    'And then world.',
    "'"]
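Until that is addressed, one possible workaround for the stray closing quote is to post-process the list. This is only a sketch: `merge_stray_quotes` and `CLOSING_QUOTES` are hypothetical names, not part of pysbd, and the set of characters treated as closing quotes is an assumption.

```python
from typing import List

# Characters treated as closing quotes (an assumption; adjust as needed).
CLOSING_QUOTES = set("'\"’”")


def merge_stray_quotes(sentences: List[str]) -> List[str]:
    """Attach segments made up only of closing quotes to the previous segment."""
    merged: List[str] = []
    for sentence in sentences:
        stripped = sentence.strip()
        is_only_quotes = bool(stripped) and all(ch in CLOSING_QUOTES for ch in stripped)
        if merged and is_only_quotes:
            # Glue the orphaned quote onto the end of the previous sentence.
            merged[-1] = merged[-1].rstrip() + stripped
        else:
            merged.append(sentence)
    return merged


# The list below is the output reported above for the single-quote case.
print(merge_stray_quotes(["He said 'hello.", "And then world.", "'"]))
```

Applied to the output above, the lone `"'"` entry is folded back into `'And then world.'`.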

@ghnp5

ghnp5 commented Feb 2, 2025

diasks2/pragmatic_segmenter#13 and #45 seem to have the answer - it was a design choice.

But I wonder if we could have an option to change that behavior, as a parameter in the constructor... ?
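In the meantime, something like the wrapper below might approximate that option from the outside. It is a rough sketch, not pysbd's API: `segment_inside_quotes` and `QUOTED` are hypothetical names, it takes any segmenter function as an argument, and it assumes quotes are balanced and not nested. It handles both straight (") and curly (“ ”) double quotes. The `naive_segment` stand-in is only there so the example runs without pysbd installed.

```python
import re
from typing import Callable, List

# A span in straight ("...") or curly (“...”) double quotes; no nesting assumed.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def segment_inside_quotes(text: str, segment: Callable[[str], List[str]]) -> List[str]:
    """Segment text outside and inside double-quoted spans separately."""
    out: List[str] = []
    pos = 0
    for match in QUOTED.finditer(text):
        # Text before the quotation is segmented on its own.
        before = text[pos:match.start()].strip()
        if before:
            out.extend(s.strip() for s in segment(before) if s.strip())
        span = match.group(0)
        open_q, inner, close_q = span[0], span[1:-1], span[-1]
        parts = [s.strip() for s in segment(inner) if s.strip()]
        if parts:
            # Re-attach the quote marks to the first and last inner sentence.
            parts[0] = open_q + parts[0]
            parts[-1] = parts[-1] + close_q
            out.extend(parts)
        else:
            out.append(span)
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        out.extend(s.strip() for s in segment(rest) if s.strip())
    return out


def naive_segment(text: str) -> List[str]:
    """Stand-in segmenter: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


print(segment_inside_quotes('He said "hello. And then world."', naive_segment))
```

With pysbd, the second argument would be the bound method, e.g. `pysbd.Segmenter(language='en').segment`. The result then depends on how pysbd segments each piece.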
