-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New OpenSearch API source implementation #4603
Conversation
List<Map<String, Object>> jsonListData = new ArrayList<>(); | ||
|
||
String requestBody = new String(httpData.toInputStream().readAllBytes(), StandardCharsets.UTF_8); | ||
List<String> jsonLines = Arrays.asList(requestBody.split(REGEX)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be better to create the Pattern
in the constructor. Then you can call splitPattern.split(requestBody))
. This should avoid compilation each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
violationRules { | ||
rule { //in addition to core projects rule | ||
limit { | ||
minimum = 0.90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new project. Can we aim for 100% coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
Thread.currentThread().interrupt(); | ||
throw new RuntimeException(ex); | ||
} | ||
LOG.info("Started OpenSearch API source on port " + sourceConfig.getPort() + "..."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG.info("Started OpenSearch API source on port " + sourceConfig.getPort() + "..."); | |
LOG.info("Started OpenSearch API source on port {}.", sourceConfig.getPort()); |
Let's avoid these ellipses at the end. They are unclear and make it seem that more is coming.
Also, please use SLF4J interpolation over string concatenation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
|
||
public class OpenSearchAPISourceConfig extends BaseHttpServerConfig { | ||
|
||
static final String DEFAULT_ENDPOINT_URI = "/opensearch"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default should just be /
since this matches existing OpenSearch domains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
import java.util.Arrays; | ||
import java.util.ArrayList; | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run a code formatter over your changes. There is a lot of whitespace that is off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
if (buffer == null) { | ||
throw new IllegalStateException("Buffer provided is null"); | ||
} | ||
if (server == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot here that is similar to the http
source. Can we extend your work from #4570 to have more of this shared in common?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
|
||
@Post("/_bulk") | ||
public HttpResponse doPostBulk(final ServiceRequestContext serviceRequestContext, final AggregatedHttpRequest aggregatedHttpRequest, | ||
@Param("pipeline") Optional<String> pipeline, @Param("routing") Optional<String> routing) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing Optional
in as a parameter is an anti-pattern. You don't actually know if it is null
or not. So you have tw checks you need to make pipeline != null && pipeline.isPresent()
. Just take in a String
and expect it to possibly be null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
@Post("/{index}/_bulk") | ||
public HttpResponse doPostBulkIndex(final ServiceRequestContext serviceRequestContext, final AggregatedHttpRequest aggregatedHttpRequest, @Param("index") Optional<String> index, | ||
@Param("pipeline") Optional<String> pipeline, @Param("routing") Optional<String> routing) throws Exception { | ||
requestsReceivedCounter.increment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consolidate all of this logic in processBulkRequest
. The only thing that each of these methods should do is create the BulkAPIRequestParams
and then callprocessBulkRequest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable . I am going to address this.
f052e71
to
1ad4250
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactoring work!
|
||
private static final Logger LOG = LoggerFactory.getLogger(OpenSearchAPIService.class); | ||
|
||
// TODO: support other data-types as request body, e.g. json_lines, msgpack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO
really true? I think this is the only codec we need for _bulk
.
fe8659f
to
77930fc
Compare
918dfd9
to
0e7bf17
Compare
.../java/org/opensearch/dataprepper/plugins/source/opensearchapi/OpenSearchAPISourceConfig.java
Outdated
Show resolved
Hide resolved
.scheme(SessionProtocol.HTTP) | ||
.authority(AUTHORITY) | ||
.method(HttpMethod.GET) | ||
.path("/health") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a health check endpoint? This is not something that the OpenSearch API would normally have.
If anything, maybe we should support the root API?
GET /
We could add this in a follow-on PR. But, I think we want to remove the /health
endpoint.
Do you want to create a follow-on PR for the root endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will resolve this. The default health check endpoint is /health
which is the default endpoint for Http source. I will modify the BastHttpSource implementation to ensure that the OpenSearch API source can control the endpoint /
for health check.
private static final ObjectMapper mapper = new ObjectMapper(); | ||
private final String PLUGIN_NAME = "opensearch_api"; | ||
private final String TEST_PIPELINE_NAME = "test_pipeline"; | ||
private final String TEST_INDEX = "test-index"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have this be a variable generated in @BeforeEach
.
private String testIndex;
...
@BeforeEach
void setUp() {
testIndex = UUID.randomUUID().toString();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will resolve this.
class OpenSearchAPISourceTest { | ||
private static final ObjectMapper mapper = new ObjectMapper(); | ||
private final String PLUGIN_NAME = "opensearch_api"; | ||
private final String TEST_PIPELINE_NAME = "test_pipeline"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also make this a random value generated in the setup method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will resolve this.
integerHashMap.put("batch_size", 1); | ||
final PluginSetting pluginSetting = new PluginSetting("blocking_buffer", integerHashMap); | ||
pluginSetting.setPipelineName(TEST_PIPELINE_NAME); | ||
return new BlockingBuffer<>(pluginSetting); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a mock buffer instead?
@ParameterizedTest | ||
@ValueSource(booleans = {false, true}) | ||
public void testBulkRequestAPIJsonResponse413(boolean includeIndexInPath) throws JsonProcessingException { | ||
testBulkRequestJsonResponse413(includeIndexInPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's inline this. We don't need an extra method here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will resolve this.
|
||
// Fill in the buffer | ||
WebClient.of().execute(testRequestHeaders, testHttpData).aggregate() | ||
.whenComplete((i, ex) -> assertSecureResponseWithStatusCode(i, HttpStatus.OK)).join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this expecting an OK
? Shouldn't it be the 408?
private final HttpData serializedRequestBadEmptyNewLines = HttpData.ofUtf8("\n\n\n\n\n\n\n \n"); | ||
private final HttpData serializedRequestBadInvalidJson = HttpData.ofUtf8("{\"text\":"); | ||
|
||
private final MultiLineJsonCodec multiLineJsonCodec = new MultiLineJsonCodec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the createObjectUnderTest()
method pattern. It allows more flexibility as tests grow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will resolve this.
|
||
try { | ||
if (buffer.isByteBuffer()) { | ||
buffer.writeBytes(content.array(), null, bufferWriteTimeoutInMillis); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably want to support chunking. But, we can do this in a follow-on PR. Can you create a PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dlvenable. I will create a separate PR to follow up on adding the chunking functionality.
* Bulk API supports query parameters "pipeline", "routing" and "refresh" | ||
*/ | ||
@Blocking | ||
public class OpenSearchAPIService implements BaseHttpService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to support end-to-end acknowledgements, but we can do this in a follow-on PR. Can you create a GitHub issue to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e1fe2ba
to
472c912
Compare
Replaced by #5024. |
Description
In order for DataPrepper to support all OpenSearch Document API(s), we need to build a new source similar to the existing http source. This pull request is intended to implement a new OpenSearch API source like
opensearch_api
similar tohttp
source. This source should support the Document API Bulk.This pull request includes the following:
_bulk
<index>/_bulk
pipeline
androuting
. (TODO:pipeline
parameter handling on Sink side)Example pipeline configuration with
opensearch_api
source looks like:Issues Resolved
Contributes to #248
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.