[Retrospective] Release Version 2.13.0 #4602

Closed
gaiksaya opened this issue Apr 2, 2024 · 13 comments

@gaiksaya
Member

gaiksaya commented Apr 2, 2024

Related release issue?

#4433

How to use this issue?

Please add comments to this issue; they can be small or large in scope. Honest feedback is important for improving our processes. Suggestions are also welcome but not required.

What will happen to this issue post release?

There will be one or more discussions about how the release went and how the next release can be improved. This ticket will then be updated with the notes from those discussions, alongside action items.

github-actions bot added the untriaged label Apr 2, 2024
@sandervandegeijn

Great, going to install it on the first clusters today. Thanks for all the hard work!

Glad to see there are some core improvements as well; the focus seems to lie heavily on the ML/LLM/GenAI stuff. The whole industry is jumping on this bandwagon, so I do understand, but I'm a bit worried other things are taking a back seat. Where are we headed in the near future?

@gaiksaya
Member Author

gaiksaya commented Apr 4, 2024

@sandeshkr419

sandeshkr419 commented Apr 4, 2024

Time to speed up the gradle check as well for the next release. It takes an eternity (60-90 minutes on good days) for gradle check to finish. We could consider splitting some of the test modules out of it into an independent action item.

Tracking issue: opensearch-project/OpenSearch#832

Impact - friction in development process.

@bbarani
Member

bbarani commented Apr 5, 2024

Great, going to install it on the first clusters today. Thanks for all the hard work!

Glad to see there are some core improvements as well; the focus seems to lie heavily on the ML/LLM/GenAI stuff. The whole industry is jumping on this bandwagon, so I do understand, but I'm a bit worried other things are taking a back seat. Where are we headed in the near future?

Thanks for your feedback. Multiple enhancements, features and bug fixes were added to the 2.13.0 release in the OpenSearch core repo as listed here, including performance improvements for Terms Aggregation. A terms aggregation without sub-aggregations can now be computed by directly using pre-computed term frequencies instead of iterating through all documents, delivering an 85-100x latency improvement on the big5 benchmark for keyword terms. This optimization applies to keyword type fields for a single term aggregation.
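To illustrate the idea behind that optimization, here is a minimal Python sketch. It is not the OpenSearch or Lucene implementation: the corpus, the toy inverted index and the function names are all hypothetical, and a real engine has further conditions to check that the sketch ignores. It only shows why reading per-term document counts from an index avoids visiting every document.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: one keyword value per document.
DOCS = ["error", "info", "error", "warn", "info", "error"]


def terms_agg_by_scanning(docs):
    """Baseline: visit every document and count its term."""
    counts = Counter()
    for term in docs:
        counts[term] += 1
    return counts


def build_index(docs):
    """Build a toy inverted index: term -> list of matching doc ids."""
    postings = defaultdict(list)
    for doc_id, term in enumerate(docs):
        postings[term].append(doc_id)
    return postings


def terms_agg_from_index(postings):
    """Shortcut: one lookup per distinct term, no per-document work."""
    return Counter({term: len(doc_ids) for term, doc_ids in postings.items()})


if __name__ == "__main__":
    index = build_index(DOCS)
    print(terms_agg_by_scanning(DOCS).most_common())
    print(terms_agg_from_index(index).most_common())
```

Both functions return the same buckets. The difference is that the second does work proportional to the number of distinct terms rather than the number of documents, which is where a large gain on a high-volume keyword field would come from.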

We are also looking into categorizing core changes in the OpenSearch roadmap for better visibility.

By the way, are you looking for any specific changes? Do you have any recommendations to improve the core roadmap? Your input is very much appreciated.

@sandervandegeijn

sandervandegeijn commented Apr 6, 2024

Hi Bbarani,

It was more of a general observation. What I'm writing here is by no means an attack; I'm really happy with the OpenSearch project and all the effort that is being put in by everyone.

I can also appreciate the attention that has gone to the core. But the shift in development effort seems noticeable, for example when it comes to the security analytics module. Some improvements were made again, but the general usability is still pretty poor and the backlog is not very actively managed. I'm looking from the outside in, of course, but the speed of development on security analytics has diminished, and that seems to correlate with the shift to AI-related functionality. I seem to notice more bugs in Dashboards and related plugins, and those backlogs aren't always actively managed either: bugs are creeping in and are not always solved that fast. The bug in searchable snapshots (disk running out of space -> cluster crashes) also comes to mind.

I'm a bit worried that the project has had so many plugins and so much functionality added in a short amount of time; all of it needs to be maintained along with everything that was already built. Adding features is relatively easy, but maintaining them and keeping everything coherent for the long haul is always the challenge, let alone phasing out functionality to free up time/capacity. QA could be in the crumple zone, and I'm afraid that I might be seeing the first signs of that. Shifting some more attention to QA and maintenance could be a good idea (not talking extremes here, maybe 5-15%, and see how that goes).

The improvement to the project roadmap is also nice. I follow it regularly, and I must say I can't always figure out what the focus of the next release will be or what the focus will be in the medium term. It's always somewhat of a surprise what is in the release notes :)

In terms of functionality, I'm thinking of general QA (see above) and polishing what's already there but might be a bit rough. A couple of points to give an idea:

  • Easy to understand overview of the cluster state (issue is there and worked on)
  • Foolproof disaster recovery (issue is there and worked on for the long term)
  • Better overview of errors for a sysadmin (instead of combing through logs)
  • Implementation of searchable snapshots in the ILM

In most cases the basic functionality is already there, but the user needs to script it, know how to use it, etc. What's already there should work like a charm and be easy to use imho.

Again, many thanks for everything that's being done. I'm contributing where I can (not being a particularly good Java dev ;), I'm more in the filing issues and testing stuff department), so don't take this the wrong way!

@Pallavi-AWS
Member

Thanks @sandervandegeijn, your feedback is valuable. It would be great to get community feedback on the project roadmap. @bbarani, is there any way we can enable comments or opening issues against the OpenSearch roadmap?

On Security Analytics, @praveensameneni can you please scrub the backlog with help from Jimish (couldn't find his id) and update this thread? Thanks.

gaiksaya removed the untriaged label Apr 8, 2024
@praveensameneni
Member

@sandervandegeijn, thank you for the feedback and for opening issues and feature requests in the Security Analytics repo. As you may have noticed, we have been actively working on fixing issues (bugs) and optimizing rule executions these past few months. We merged over 55 PRs (including backports) in the last month alone and have more to go. The response and interest from the community has been very encouraging, to say the least, and we continue to make inroads into the list of new features and capabilities.

We would like to actively partner and solicit feedback from the community, and are starting office hours to answer questions, triage issues and discuss feature requests and enhancements.

Wednesdays (Bi-weekly): 09:00 AM PT / 17:00 UTC
Meeting id: https://chime.aws/6979836516

@sandervandegeijn

Great to see there is progress, especially on the triage meetings. I attend the ones from the Security guys quite frequently, and they give both structure to the backlog management and input from the community. I'll try to attend (if I'm not in the car on holiday ;) )

@bbarani
Member

bbarani commented Apr 10, 2024

Please join us on Monday, April 15th 2024 at 9AM PST for the retrospective meeting of the 2.13.0 release to provide feedback, suggestions and improvements.

Meeting link: https://chime.aws/3770472381

@stephen-crawford
Contributor

stephen-crawford commented Apr 10, 2024

Just a note to be discussed during the retro: @DarshitChanpura opened pull requests for all repos with the updated certificates they needed at the end of February. Some repos still had not merged these changes by the time the release candidates were generated, and they ran into issues. How can we establish efficient patterns to prevent this sort of thing in the future?

Fortunately, DC was able to help everyone get the issues resolved and we did not need to delay, but it is scary nonetheless. DC is a really active contributor and was able to resolve the issues during the release itself, but what if the contributor had been less involved than DC? I worry that delayed PR merges, especially for core and/or security-focused components, could cause issues down the road.

@hdhalter
Contributor

Documentation highs and lows

Highs:

  • There is a requirement for devs to create documentation PRs for their features, and kudos to the devs for creating 92% (35 of 38) of the doc PRs for 2.13! A big improvement from 2.12, where they created 75%, and from 2.11, where they created 13%. Two of the 2.13 PRs were created by non-AWS devs, which is always great to see.

Lows:

  • While the entrance criteria were met for the features we knew about, 10 more doc PRs were submitted after the first RC date of 3/19. With 1 tech writer OOO and 1 working on another release delivering at the same time (DP2.7), there was a heavy load on 1 author to do the majority of reviews within the 2-week window.
  • Manual tracking and status updates in the roadmap and unified scorecard led to incomplete information that resulted in last-minute, unplanned work.

@gaiksaya
Member Author

Friendly reminder about the retrospective meeting tomorrow. Here is the link to the retrospective board: https://github.com/orgs/opensearch-project/projects/201

Thanks!

@gaiksaya
Member Author

Below are some of the action items discussed during the 2.13.0 retrospective:

  1. Follow up (probably before the release cycle starts) with the component teams to have a proper assignee for the release issue in each repo. This saves a lot of time otherwise spent chasing maintainers for updates on failing tasks.
  2. Automate roadmap updates to avoid misses about upcoming features and related PRs, documentation updates, etc.
  3. Set a strict timeline for submission of documentation PRs. Submitting PRs after the first RC puts a heavy load on 1 author to do the majority of reviews within the 2-week window.

Tagging the release managers for the next few releases to take these action items into consideration: @zelinh @rishabh6788

Thanks!
