Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synthetic image generator #751

Merged
merged 23 commits into from
Jul 15, 2024
Merged

Conversation

mwawrzos
Copy link
Collaborator

@mwawrzos mwawrzos commented Jul 11, 2024

The PR allows users to add multimodal data to the synthetic prompts.

Generated images:

  • are filled with uniform noise,
  • will have randomized shape (controlled with mean_size and dimiensions_stddev parameters),
  • user can choose image format from PNG or JPEG for base64 encoding.

@@ -0,0 +1,87 @@
import base64
import os

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'os' is not used.
@@ -0,0 +1,95 @@
import base64
from io import BytesIO, StringIO

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'StringIO' is not used.
)

# exception is raised, when PIL.Image.resize is called with negative values
image = next(sut)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable image is not used.
@patch("pathlib.Path.exists", return_value=True)
@patch(
"PIL.Image.open",
return_value=DUMMY_IMAGE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return_value=DUMMY_IMAGE,
return_value=Image.new("RGB", (100, 100), color="blue"),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing against the DUMMY_IMAGE in an assertion below, so I prefer to keep it named, but I moved the variable definition from the global to the local scope.

from enum import Enum, auto
from io import BytesIO
from pathlib import Path
from typing import List, Optional, Tuple, cast

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'List' is not used.
import base64
from enum import Enum, auto
from io import BytesIO
from pathlib import Path

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'Path' is not used.
from enum import Enum, auto
from io import BytesIO
from pathlib import Path
from typing import Generator, Optional, Tuple, cast

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'Generator' is not used.
Import of 'Tuple' is not used.
Import of 'cast' is not used.
from typing import Generator, Optional, Tuple, cast

import numpy as np
from genai_perf.exceptions import GenAIPerfException

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'GenAIPerfException' is not used.
@@ -0,0 +1,87 @@
import base64
from io import BytesIO
from pathlib import Path

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'Path' is not used.
import base64
from io import BytesIO
from pathlib import Path
from unittest.mock import patch

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'patch' is not used.

import numpy as np
import pytest
from genai_perf.exceptions import GenAIPerfException

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'GenAIPerfException' is not used.
@nv-hwoo nv-hwoo merged commit 92b2f3d into vision-language Jul 15, 2024
5 checks passed
@nv-hwoo nv-hwoo deleted the synthetic-image-generator branch July 15, 2024 21:32
nv-hwoo added a commit that referenced this pull request Jul 18, 2024
* POC LLaVA VLM support (#720)

* POC for LLaVA support

* non-streaming request in VLM tests

* image component sent in "image_url" field instead of HTML tag

* generate sample image instead of loading from docs

* add vision to endpoint mapping

* fixes for handling OutputFormat

* refactor - extract image preparation to a separate module

* fixes to the refactor

* replace match-case syntax with if-elseif-else

* Update image payload format and fix tests

* Few clean ups and tickets added for follow up tasks

* Fix and add tests for vision format

* Remove output format from profile data parser

* Revert irrelevant code change

* Revert changes

* Remove unused dependency

* Comment test_extra_inputs

---------

Co-authored-by: Hyunjae Woo <[email protected]>

* Support multi-modal input from file for OpenAI Chat Completions (#749)

* add synthetic image generator (#751)

* synthetic image generator

* format randomization

* images should be base64-encoded arbitrarly

* randomized image format

* randomized image shape

* prepare SyntheticImageGenerator to support different image sources

* read from files

* python 3.10 support fixes

* remove unused imports

* skip sampled image sizes with negative values

* formats type fix

* remove unused variable

* synthetic image generator encodes images to base64

* image format not randomized

* sample each dimension independently

Co-authored-by: Hyunjae Woo <[email protected]>

* apply code-review suggestsions

* update class name

* deterministic synthetic image generator

* add typing to SyntheticImageGenerator

* SyntheticImageGenerator doesn't load files

* SyntheticImageGenerator always encodes images to base64

* remove unused imports

* generate gaussian noise instead of blank images

---------

Co-authored-by: Hyunjae Woo <[email protected]>

* Add command line arguments for synthetic image generation (#753)

* Add CLI options for synthetic image generation

* read image format from file when --input-file is used

* move encode_image method to utils

* Lazy import some modules

* Support synthetic image generation in GenAI-Perf (#754)

* support synthetic image generation for VLM model

* add test

* integrate sythetic image generator into LlmInputs

* add source images for synthetic image data

* use abs to get positive int

---------

Co-authored-by: Marek Wawrzos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants