OPT20039: Test the limits of Pecha annotation transfer functionality #102

Open
3 of 5 tasks
tenzin3 opened this issue Dec 30, 2024 · 1 comment

tenzin3 commented Dec 30, 2024

Description

The primary functionality of this toolkit is transferring span annotations from one text to another. Other important features, such as alignment annotation transfer, heavily depend on this capability. Therefore, we aim to thoroughly test the limits of this functionality.
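To make the core operation concrete, here is a minimal sketch of span transfer. It assumes a character-level diff from Python's standard `difflib` rather than whatever algorithm the toolkit itself uses, and `transfer_span` and `map_boundary` are hypothetical names. The sketch diffs the source against the target and shifts each span boundary through the regions the diff marks as identical. Boundaries that fall inside changed text are snapped outward to the edge of the corresponding target region.

```python
from difflib import SequenceMatcher


def _covers(i1: int, i2: int, pos: int, is_end: bool) -> bool:
    # A start offset belongs to the region it opens; an end offset to the region it closes.
    return (i1 < pos <= i2) if is_end else (i1 <= pos < i2)


def map_boundary(opcodes, pos: int, is_end: bool) -> int:
    """Map one span boundary from source to target using difflib opcodes."""
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal" and _covers(i1, i2, pos, is_end):
            # Identical text on both sides: the offset shifts by a constant.
            return j1 + (pos - i1)
    for tag, i1, i2, j1, j2 in opcodes:
        if tag != "equal" and _covers(i1, i2, pos, is_end):
            # Boundary inside changed text: snap outward to the target region's edge.
            return j2 if is_end else j1
    raise ValueError(f"cannot map offset {pos}")


def transfer_span(source: str, target: str, start: int, end: int) -> tuple[int, int]:
    """Transfer the half-open span [start, end) of `source` onto `target`."""
    # autojunk=False stops difflib from treating very frequent characters as junk in long texts.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    opcodes = matcher.get_opcodes()
    return map_boundary(opcodes, start, False), map_boundary(opcodes, end, True)
```

Snapping outward is only one possible policy. The benchmark cases are meant to reveal where simple rules like this break down.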

[Image: Illustration of a Simple Text Annotation Transfer]

Expected Output

Establish clear metrics to determine the conditions under which annotation transfer is possible.
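One possible starting point for such metrics assumes that each transferred span is compared with a hand-made gold span in the target text. It combines exact-match accuracy with a character-level F1 that gives partial credit for overlapping spans. Both metrics and the `evaluate` and `char_f1` helpers are illustrative choices, not something the toolkit defines.

```python
def char_f1(pred, gold):
    """Character-level F1 between (start, end) spans; None means 'no transfer'."""
    if gold is None:
        # The span should not transfer at all: reward only a refusal.
        return 1.0 if pred is None else 0.0
    gold_chars = set(range(gold[0], gold[1]))
    pred_chars = set() if pred is None else set(range(pred[0], pred[1]))
    overlap = len(pred_chars & gold_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Average exact-match rate and character F1 over aligned lists of spans."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equally long, non-empty span lists")
    exact = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    f1 = sum(char_f1(p, g) for p, g in zip(predicted, gold)) / len(gold)
    return {"exact_match": exact, "char_f1": f1}
```

Treating a gold value of `None` as "this span should not be transferred" lets the same metric reward a correct refusal.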

Gold-standard data brainstorming

Implementation Steps

  • Prepare benchmark data.
  • Research other tools for text span annotation transfer.
  • Test those tools against the benchmark data.
  • Enhance the toolkit's functionality based on the test results.
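For the benchmark data itself, one option is to store each case as a structured record and derive character offsets from the span strings, so that missing or ambiguous spans are caught early. This is a minimal sketch. The `BenchmarkCase` class is hypothetical, and its field names mirror the variable names used in the brainstorming examples.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCase:
    """One hypothetical gold-standard record: the two texts plus the source spans to transfer."""
    name: str
    source_text: str
    target_text: str
    annotation_spans_to_transfer: list[str]

    def source_offsets(self) -> list[tuple[int, int]]:
        """Resolve each span string to (start, end) offsets, rejecting missing or repeated spans."""
        offsets = []
        for span in self.annotation_spans_to_transfer:
            count = self.source_text.count(span)
            if count != 1:
                raise ValueError(f"span {span!r} occurs {count} times in source_text")
            start = self.source_text.index(span)
            offsets.append((start, start + len(span)))
        return offsets
```

Gold target offsets would still need to be recorded by hand for each case.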

Reviewer

tenzin3 self-assigned this Dec 30, 2024

tenzin3 commented Dec 30, 2024

Benchmark Data Brainstorming

When preparing gold-standard data, we should consider every kind of difference that can occur between source and target texts, so that the benchmark covers all the cases the transfer has to handle. Below are the scenarios, each with an example and the annotation spans to transfer:


i) Segmentation Differences

Explanation: Differences in how the text is segmented or spaced, such as the use of newlines, spaces, or tabs.

source_text = "This is first sentence.This is      second sentence.This is third sentence."
target_text = "This is first sentence.\nThis is  second sentence.This is     third sentence."

annotation_spans_to_transfer => [
  "This is first sentence.",
  "This is      second sentence.",
  "This is third sentence."
]
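When two texts differ only in whitespace, as in this case, one simple approach is to compare them with all whitespace removed while remembering where each remaining character came from. This is a sketch under that assumption, and the helper names are hypothetical. It deliberately refuses texts that differ in anything other than whitespace.

```python
def strip_whitespace_with_map(text: str) -> tuple[str, list[int]]:
    """Drop whitespace, keeping the original index of every remaining character."""
    chars, index_map = [], []
    for i, ch in enumerate(text):
        if not ch.isspace():
            chars.append(ch)
            index_map.append(i)
    return "".join(chars), index_map


def transfer_span_ignoring_whitespace(source: str, target: str, start: int, end: int):
    """Transfer [start, end) between texts that differ only in whitespace."""
    src_stripped, src_map = strip_whitespace_with_map(source)
    tgt_stripped, tgt_map = strip_whitespace_with_map(target)
    if src_stripped != tgt_stripped:
        raise ValueError("texts differ in more than whitespace")
    # Positions (in stripped coordinates) of the span's non-whitespace characters.
    inside = [k for k, i in enumerate(src_map) if start <= i < end]
    if not inside:
        return None  # the span contains only whitespace
    # Same stripped position on both sides, so look up the original target index.
    return tgt_map[inside[0]], tgt_map[inside[-1]] + 1
```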

ii) Extra Text in the Middle

Explanation: The source text contains additional content not present in the target text.

source_text = "This is first sentence.This is the second sentence.This is third sentence."
target_text = "This is first sentence.This is second sentence.This is third sentence."

annotation_spans_to_transfer => [
  "This is the second sentence."
]
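A first diagnostic for cases like this is simply to list where the two texts differ. This sketch uses `SequenceMatcher.get_opcodes()` from Python's standard `difflib`, and the `describe_differences` wrapper is hypothetical. A `delete` region is text that exists only in the source, which is exactly the "extra text" situation.

```python
from difflib import SequenceMatcher


def describe_differences(source: str, target: str) -> list[tuple[str, str, str]]:
    """List every non-identical region as (tag, source_text, target_text)."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (tag, source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For the example above, the only difference reported is a deleted `" the"`. Because it lies strictly inside the annotated sentence, both span boundaries sit in unchanged text.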

iii) Missing Text in the Middle

Explanation: The source text is missing content that is present in the target text.

source_text = "This is first sentence.This is the sentence.This is third sentence."
target_text = "This is first sentence.This is second sentence.This is third sentence."

annotation_spans_to_transfer => [
  "This is the sentence."
]
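One hypothesis worth testing is that a transfer can be trusted when both span boundaries are "anchored", meaning the diff places them inside text that is identical in both versions, even if the interior of the span changed. The `is_anchored` helper below is a hypothetical sketch built on `difflib`.

```python
from difflib import SequenceMatcher


def is_anchored(source: str, target: str, pos: int, is_end: bool = False) -> bool:
    """Return True if the span boundary at `pos` lies in text the diff marks as identical."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            continue
        # Start offsets must fall within the block; end offsets may coincide with its end.
        inside = (i1 < pos <= i2) if is_end else (i1 <= pos < i2)
        if inside:
            return True
    return False
```

In this example both boundaries of `This is the sentence.` are anchored, while a position inside the changed word `the` is not.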

iv) Extra Sentence in the Middle

Explanation: The source text contains an entire extra sentence, between the first and second sentences, that is not present in the target text.

source_text = "This is first sentence.Starting from here is nice.This is second sentence.This is third sentence."
target_text = "This is first sentence.This is second sentence.This is third sentence."

annotation_spans_to_transfer => [
  "Starting from here is nice."
]
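Here the annotated sentence has no counterpart in the target, so the correct outcome is arguably to transfer nothing. A pure boundary-mapping approach may still return some short fragment of the target. One safeguard worth testing is to measure what fraction of the span's characters the diff actually aligns and to skip spans below a threshold. The helper name and any particular threshold (0.5 in the test) are assumptions.

```python
from difflib import SequenceMatcher


def aligned_fraction(source: str, target: str, start: int, end: int) -> float:
    """Fraction of the characters in source[start:end] that difflib aligns with the target."""
    if end <= start:
        raise ValueError("empty span")
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    aligned = 0
    # Each (i, j, size) is a run of identical characters; the last triple is a size-0 sentinel.
    for i, _j, size in matcher.get_matching_blocks():
        overlap = min(i + size, end) - max(i, start)
        if overlap > 0:
            aligned += overlap
    return aligned / (end - start)
```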

v) Word/Phrase Differences

Explanation: The source text contains different words or phrases compared to the target text.

source_text = "This is first sentence.This is my own second sentence.This is third sentence."
target_text = "This is first sentence.This is second sentence.This is third sentence."

annotation_spans_to_transfer => [
  "This is my own second sentence."
]
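When whole words are added or replaced, diffing tokens rather than characters may keep the alignment from pairing up stray letters that happen to be shared inside the changed phrase. This is a sketch under that assumption. `TOKEN_RE` is a simple word-and-punctuation tokenizer chosen for these English examples. Tibetan text would need a proper syllable or word tokenizer (for example, splitting on the tsheg `་`). All names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumed tokenizer for the English examples: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[tuple[str, int, int]]:
    """Return (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def transfer_span_by_words(source: str, target: str, start: int, end: int):
    """Transfer [start, end) by diffing token sequences; None if a boundary token changed."""
    src, tgt = tokenize(source), tokenize(target)
    matcher = SequenceMatcher(
        None, [t for t, _, _ in src], [t for t, _, _ in tgt], autojunk=False
    )
    covered = [k for k, (_, s, e) in enumerate(src) if s >= start and e <= end]
    if not covered:
        return None
    first, last = covered[0], covered[-1]
    mapped = {}
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            continue
        for k in (first, last):
            if i1 <= k < i2:
                mapped[k] = j1 + (k - i1)
    if first not in mapped or last not in mapped:
        return None
    # Convert the mapped boundary tokens back to character offsets in the target.
    return tgt[mapped[first]][1], tgt[mapped[last]][2]
```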

vi) Reordering of Content

Explanation: The order of sentences or phrases differs between the source and target texts.

source_text = "This is first sentence. This is third sentence. This is second sentence."
target_text = "This is first sentence. This is second sentence. This is third sentence."

annotation_spans_to_transfer => [
  "This is third sentence.",
  "This is second sentence."
]
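A diff such as `difflib.SequenceMatcher` produces a monotonic (non-crossing) alignment, so when two sentences swap places it cannot align both of them in full. One way around this, assuming the texts can be segmented into sentences, is to compare the annotated span against every target sentence regardless of position. The period-based `SENTENCE_RE` and the 0.8 similarity threshold are assumptions made for these English examples. Tibetan text would need segmentation on its own punctuation, such as the shad `།`.

```python
import re
from difflib import SequenceMatcher

# Assumed segmentation for the English examples: a sentence runs up to the next period.
SENTENCE_RE = re.compile(r"\S[^.]*\.")


def sentence_spans(text: str) -> list[tuple[int, int]]:
    return [(m.start(), m.end()) for m in SENTENCE_RE.finditer(text)]


def transfer_span_order_free(source, target, start, end, min_ratio=0.8):
    """Return the target sentence most similar to source[start:end], wherever it is."""
    span_text = source[start:end]
    best, best_ratio = None, 0.0
    for t_start, t_end in sentence_spans(target):
        ratio = SequenceMatcher(
            None, span_text, target[t_start:t_end], autojunk=False
        ).ratio()
        if ratio > best_ratio:
            best, best_ratio = (t_start, t_end), ratio
    return best if best_ratio >= min_ratio else None
```

Comparing against every sentence costs one diff per target sentence, which may become slow for long texts. A real implementation might restrict the search to a window of nearby sentences.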

vii) Combination of the above cases.
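Combined cases are probably easiest to produce synthetically. Start from clean target sentences, apply several of the perturbations above at once, and record the gold mapping while building the source. This is a rough sketch of such a generator. The perturbation choices, the filler sentence, and all function names are hypothetical.

```python
import random

# Hypothetical filler sentence, borrowed from case iv, with no counterpart in the target.
EXTRA_SENTENCE = "Starting from here is nice."


def perturb_sentence(sentence: str, rng: random.Random) -> str:
    """Apply one random intra-sentence change, or none."""
    words = sentence.split(" ")
    if len(words) < 2:
        return sentence
    k = rng.randrange(1, len(words))
    choice = rng.choice(["spaces", "extra_word", "none"])
    if choice == "spaces":
        # Case i: widen the gap between two words.
        return " ".join(words[:k]) + "    " + " ".join(words[k:])
    if choice == "extra_word":
        # Cases ii/v: insert a word that the target does not have.
        return " ".join(words[:k] + ["the"] + words[k:])
    return sentence


def build_combined_case(sentences: list[str], seed: int = 0):
    """Return (source_text, target_text, gold); gold pairs each source span with its
    target span, or with None when the span should not be transferred."""
    rng = random.Random(seed)
    target_text = "".join(sentences)
    target_spans, pos = [], 0
    for s in sentences:
        target_spans.append((pos, pos + len(s)))
        pos += len(s)

    # (index of the original sentence or None, perturbed text)
    pieces = [(i, perturb_sentence(s, rng)) for i, s in enumerate(sentences)]
    if len(pieces) > 1:
        # Case vi: swap two neighbouring sentences.
        j = rng.randrange(len(pieces) - 1)
        pieces[j], pieces[j + 1] = pieces[j + 1], pieces[j]
    # Case iv: insert a sentence that has no counterpart in the target.
    pieces.insert(rng.randrange(len(pieces) + 1), (None, EXTRA_SENTENCE))

    source_text, gold = "", []
    for orig_index, text in pieces:
        if source_text:
            source_text += rng.choice(["", "\n"])  # Case i: optional line break
        start = len(source_text)
        source_text += text
        target_span = None if orig_index is None else target_spans[orig_index]
        gold.append(((start, len(source_text)), target_span))
    return source_text, target_text, gold
```

Varying `seed` yields many combined cases from the same base sentences, each with its gold mapping recorded during construction.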
