[Design Proposal] Enhanced quality, CI and version update mechanism around Core Plans used to build Habitat ecosystem #3450

Closed
davymcaleer opened this issue Aug 25, 2020 · 8 comments


@davymcaleer

Title

Enhanced quality, CI and version update mechanism around core plans used to build Habitat ecosystem

Motivation

As a project owner and core plan maintainer,

I want to uplift the quality of all the core plans needed to build and maintain the wider ecosystem, so that downstream users gain quality improvements, improved version update cycles and increased assurance in the underlying ecosystem for Habitat.

Specification

There are 4 aspects to this design change, which together aim to get us to a more automated and higher level of quality across the ecosystem:

  1. Addition of more complete testing and documentation around a large set of plans used to build and maintain the ecosystem
  2. Implementation of new CI pipeline against each of these plans
  3. A move to a new GitHub org and a multi-repo structure
  4. Introduction of new tooling to initially monitor the health of each plan, with a goal to automate version updates

These 4 aspects are being applied initially to the set of Core Plans that underpin the ecosystem: the currently defined Base Plans, plus an additional set of plans needed to build out the entire toolset around Habitat and Builder. This initial set contains 120 plans, and we are now adding ~62 transitive dependencies, for a total of ~182 plans in the initial phase.

The first aspect being applied to this group of plans is the addition of more complete tests around every plan. These have already been implemented and cover more than anything in the existing repo. While doing this we have also worked to ensure every plan has complete documentation, making it much more consumable and understandable for users new to the project.

Along with the improvement in quality, the Chef folks have been working on a much more automated, maintainable and supportable way to incrementally update the versions of base plans. This helps with day-to-day support and also allows us to move towards more frequent refresh cycles. The desire is to automate much more of the daily maintenance and give everyone involved more time to look at further improvements to the wider ecosystem, as opposed to performing repetitive tasks that should be handled by automation.

The initial uplift of these underlying base plans and the additional ecosystem-building plans was done in a new location, https://github.com/chef-base-plans (name of org to be finalised), to add a refreshed build system and allow much quicker ways to build, test and add comprehensive improvements. The individual build pipelines, along with more comprehensive testing across all the plans, were implemented using Azure DevOps Pipelines, whose current status is open for the entire community to see: https://dev.azure.com/chefcorp-partnerengineering/Chef%20Base%20Plans/_dashboards/dashboard/18b9cff5-e4ec-4a90-81d4-5c64848b3760

Screenshot 2020-08-25 at 8 35 56 PM

The updated structure within this new GitHub org allowed us to move more rapidly and bring the initial batch of plans to a uniform state in a short space of time. This remains entirely Open Source and open to further contributions.

Once the additional automation is fully applied to these plans in the new location, we will have a daily check and merge on version updates for each of them, as well as much improved linting and issue creation for any problems found with any of the plans. At present the initial part of this automated tooling works from a visibility standpoint only, which gives us a clearer view into the state of this subset of plans. An example of the current state of the dashboard is below:

Screenshot 2020-08-25 at 8 35 01 PM

As we progress further we’d like to provide a read-only version of this to everyone. There is some more refinement to be done before then, and the focus for now is on getting more complete automation in place.
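The proposal does not describe how the version-check tooling works internally, so the following is only a minimal sketch of what a daily "is this plan behind upstream?" check might look like. It assumes the plan's version is declared on a `pkg_version=` line in `plan.sh` and that the latest upstream version is already known; the function names and the version numbers in the example are hypothetical and are not taken from the actual tooling.

```python
import re

# Matches a line such as: pkg_version="1.2.3" or pkg_version=1.2.3
PKG_VERSION_RE = re.compile(r"""^pkg_version=["']?([^"'\s]+)["']?\s*$""", re.MULTILINE)


def read_pkg_version(plan_text):
    """Return the pkg_version declared in the text of a plan.sh, or None."""
    match = PKG_VERSION_RE.search(plan_text)
    return match.group(1) if match else None


def version_key(version):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def needs_update(plan_text, latest_upstream):
    """True when the plan's version is older than the given upstream version."""
    current = read_pkg_version(plan_text)
    if current is None:
        raise ValueError("no pkg_version found in plan")
    return version_key(current) < version_key(latest_upstream)


# Illustrative plan text; the version numbers here are made up for the example.
example_plan = '''pkg_name=musl
pkg_origin=core
pkg_version="1.1.19"
'''

if __name__ == "__main__":
    print(needs_update(example_plan, "1.1.24"))
```

A real check would also need to discover the latest upstream release for each plan and cope with non-numeric version schemes, neither of which this sketch attempts.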

This is the initial phase of an overall set of improvements with a view to getting to much more frequent refreshes by ensuring we always have an up-to-date set of plans ready for a full rebuild. The intention is that we are also making efficiency savings to offer better support to the full range of core plans and provide a more stringent SLA to all plans in the new location. We also intend to adopt more plans into this location over time and ensure we continue to improve the underlying ecosystem.

Downstream Impact

The most obvious impact is on the daily admin and maintenance of the wider set of core plans, since we now have 2 locations to work from. Chef staff are looking to take on this burden; however, anyone wanting to raise issues against the moved set of plans will need to do so in the new org. Over time we’ll look at making all of this a cleaner experience and continue to improve the organisation and quality of all plans, documentation and tests.

A PR will be raised as a result of this design proposal to remove from this repo all of the core plans that now live in the new org. Further PRs will be raised as needed for any future transitions.

@MindNumbing
Contributor

I think this kind of change is very positive. Obviously I have been working to make it a reality, but the workflow for updates using the above has been nicer. The integration of botanist and higher-level inspec testing should help point out failures before they become a problem, and has already done so for several issues (#3395 & #3448), so 👍 from me.

@jsirex
Contributor

jsirex commented Aug 26, 2020

Good news. The same approach is used by the Debian Team (salsa.debian.org). I have also split my habitat plan repositories.
However, there are a number of questions you must address too:

  1. "Where to cut". For example, there are postgresql and postgresql-client packages. Should some repositories contain multiple tightly coupled packages?
  2. Package generation: postgresql has 9.6, 10, 11, 12 versions. Should each be a separate repository, or just a branching strategy?
  3. bin-package pattern: if I change only the service part (hook, config), the whole package has to be rebuilt. Huge waste of resources. The alternative is to split the package into binary and service: postgresql-bin / postgresql-server. Is it worth it?

Want to hear your opinion.

@davymcaleer
Author

@jsirex thanks for the feedback on this one, much appreciated! Responses below:

  1. Our first pass has uncoupled packages so that each can build in isolation. Further changes will be dealt with on a case-by-case basis. In this initial phase we are bringing across anything needed to build the ecosystem, as well as any closely related packages needed to build other Chef tools used around that ecosystem.
  2. As per the current model, if we pull over a package then we’ll pull over every version and have a different repo for each version.
  3. Out of scope for the first phase of this design proposal. It would be good to discuss this in the future though.

@stevendanna
Contributor

I'm glad to see work being put into the long-term maintenance of the Habitat core-plans! Thank you for putting this proposal together.

My own maintainership activities have tapered off, so my comments here are intended mostly just as items for your consideration and not blockers.

  1. Addition of more complete testing and documentation around a large set of plans used to build and maintain the ecosystem

Great!

  2. Implementation of new CI pipeline against each of these plans

I'm very much in favor of more CI. I would love to see a bit more detail either here or in some developer documentation about the new CI system. Currently, I can see that the Azure pipelines reference a repository that I (and I assume others) don't have access to:

https://github.com/chef-base-plans/musl/blob/master/azure-pipelines.yml#L14

While I think it is fine for the CI pipelines to depend on some private resources (for example, Expeditor isn't open source), I think the tasks and build instructions should be visible to maintainers who might need to locally reproduce a build failure.

  3. Moved out to a new Github org and a multi-repo structure

I'm a bit apprehensive about this change since I think it will make some types of maintenance activities harder and/or dependent on GitHub. That said, I am willing to go with it if it means we get more automation around version bumping and other maintenance tasks. It isn't clear to me from the proposal why we think this was a necessary change to get the results we want, so it might be good to include some detail there for posterity.

More importantly, we should clarify what the membership and access policy on the new GitHub org will look like. Currently, at least two of our top 10 contributors are from outside of Chef:

> git shortlog -sne | head -11
  1036	Graham Weldon <[email protected]>
   614	Fletcher Nichol <[email protected]>
   406	Scott Macfarlane <[email protected]>
   355	Scott Macfarlane <[email protected]>
   345	Jamie Winsor <[email protected]>
   236	Nell Shamrell-Harrington <[email protected]>
   177	Steven Danna <[email protected]>
   176	Romain Sertelon <[email protected]>
   148	echohack <[email protected]>
   147	Gavin Didrichsen <[email protected]>
   144	Ian Henry <[email protected]>

So I think it is important to be very clear about how we expect the contribution process to work in this new structure.

Personally, if I need to fork hundreds of repositories to contribute regularly to core-plans, it will be a little annoying.

  4. Introduction of new tooling to initially monitor the health of each plan, with a goal to automate version updates

Great! From your description it appears this tooling focuses on version bumping and linting at the moment, which are definitely two top priorities.

Will this tooling be open for contribution? For example, I would like to develop some tooling to check for common linking errors in plans. Is that a feature that we would be able to contribute to this tooling?

It might be nice to get a bit more detail on this tooling, and how maintainers can interact with it.

For example, one of the nice things about @predominant's groundskeeper application is that it is very easy to run on a local fork and carve up the data using standard command line tools.

These 4 aspects are being applied initially to the set of Core Plans that underpin the ecosystem, these currently are the defined Base Plans along with an additional set of plans that are needed when building out the entirety of the toolset around Habitat and Builder. In total we have 120 plans in this initial set, although we are now adding an additional ~62 transitive dependencies to get to ~182 plans in total in the initial phase.

What is the desired timeline for rolling this out to more plans?

As a follow-up, I think we should update the base-plans policy document to make sure we are all on the same page about this new expanded set, what restrictions (if any) they have on being rapidly updated, and what terminology we are using.

Specifically, while I bet it won't be an issue, some plans such as core/curl, core/dex, core/openssl often need to be updated with same-day urgency in response to a security advisory or an internal need. Clearly the changes here are aimed at making such rapid response more likely, but I think some clear policy might help during the transition period.

GitHub ate some of my more extensive comments, so that'll do for now.

@davymcaleer
Author

Thanks for the feedback @stevendanna - much appreciated. With respect to your comments:

@stevendanna > While I think it is fine for the CI pipelines to depend on some private resources (for example, Expeditor isn't open source); I think the tasks and build instructions should be visible to maintainers who might need to locally reproduce a build failure.

@davymcaleer > We’ll move some of the docs we have in that private repo out into a public one to explain the inner workings and also give better ability to reproduce build failures locally. We’re also going to use that repo to allow for raising of more general issues and suggestions around the structure as a whole and flagging the need for any new plans that don’t yet exist.

@stevendanna > It isn't clear to me from the proposal why we think this was a necessary change to get the results we want, so it might be good to include some detail there for posterity.

@davymcaleer > The switch to repo per plan went hand in hand with using the Azure DevOps pipelines and associated tooling, which gave us a very effective and fast way to build out a repeatable pattern that resulted in improved-quality CI. We also built templating into the GitHub org along with this. The initial ~120 plans were transitioned and completed in 4 or 5 weeks using the pattern, and the speed with which we’ve been able to layer on everything has fit very well with the repo per plan. When we started experimenting it was pretty obvious it was going to get very noisy if we tried to do this in the same GitHub org as the rest of the Habitat work and the existing core-plans, hence the new org.

On your queries about contributing to the new org versus how things are currently contributed to: nothing changes. This is all still Open Source and the contribution guidelines as per https://github.com/chef/chef-oss-practices/ still stand. We’ll also be updating those to ensure we add the new location.

With respect to the updated automation tooling for versioning and linting: we’re not planning on Open Sourcing this just yet, so contributions would be limited to raising issues that suggest improvements, such as types of problems the tooling currently misses. We will be looking to make the dashboard for the automation tooling accessible to the community in the near future in a read-only format, which should give a current view of the state of the building blocks of the ecosystem. I’m also looking forward to showing more of the backend tooling and how it was built and deployed as Kubernetes microservices in some future presentations. It would make a good discussion at ChefConf or even a Habitat Community webinar in the near future.

@stevendanna > What is the desired timeline for rolling this out to more plans?
@davymcaleer > The additional ~62 plans are in flight at present and being added to the new structure. Obviously getting this design proposal closed out is key to ensuring all of that can move ahead smoothly. We are also aiming to get another refresh done by end of September in parallel with all of this work, and to make those refreshes ever more frequent. I’d suggest we then look to the community for a further nomination of say ~30 core plans that are key to the wider ecosystem, and that the community and ourselves work together on adopting those in a similar fashion, say by end of Q4 this year.

@stevendanna > As a follow-up, I think we should update the base-plans policy document to make sure we are all on the same page about this new expanded set, what restrictions (if any) do they have on being rapidly updated, and what terminology we are using.
@davymcaleer > The intent is that the Base Plan policy document and the list of base plans will be updated as part of the PR following on from this design proposal, with links to all the new repos provided to make it as straightforward as possible to navigate to any of the base plans that no longer sit in the habitat-sh GitHub organization.
Concerns about any critical plans are also warranted, and yes, we are aiming to be able to perform full refreshes very frequently, as long as the rest of the ecosystem can handle it. Reactive updates to any particular plan will be handled in the same way they are now. However, all of this is aimed at earlier warning, less stale plans and less backlog all round, as well as pushing less of the version maintenance onto the community.

@predominant
Collaborator

This is going to be very annoying as a heavy core plans contributor, to clone hundreds of repos. I'd like to propose we keep it as a single repository.

I'd also like massive changes like this discussed in public in the early stages, rather than being done behind closed doors.

@davymcaleer
Author

Thanks for the feedback @predominant

Automating ongoing version increments is designed to remove the burden of doing this manually from members of the community, allowing the community to provide value to the ecosystem in a more additive way. All of this also ties into the internal efforts to perform more frequent core plan refreshes, giving a more up-to-date set of core plan packages to both the community and our customer base and reducing the number of CVEs that both are exposed to at any point in time.
In terms of the repo structure and movement, we're aiming to leave the current core-plans repo structure intact but remove the folders for the plans that move to the new org. We'll also update the docs to make it straightforward to navigate to the new location, with all the new tests, CI pipelines and documentation in place. As previously noted, we're also going to document how the new structure works in the new org as part of all of this.
The 2 engineers who have been working full-time with the current 130 repos in the new structure for several months have found no additional burden from the break-out into individual repos versus the single repo in the habitat-sh org. They found the new structure much more straightforward to work with, and version increments are much more thoroughly tested, with reduced effort per plan.
To your point regarding discussing changes such as this in public: once we had agreed this structure internally, between the content team, the Habitat team and Product management, we were then free to discuss it with the wider community, which is the purpose of this Design Proposal as per the Chef OSS Practices - https://github.com/chef/chef-oss-practices/blob/master/contributors/guide/design-proposals.md

@adamhjk
Contributor

adamhjk commented Sep 3, 2020

I am not doing the work, and I'm not involved at all day to day. I'll drop a few things that might be worth considering about why things were set up the way they were.

  1. When repos are separated out, the burden of maintainership across the community, in particular for abandoned repositories, is brutal. While it may be possible to pay someone (i.e. chef employees) to do it, asking a community maintainer to understand and sign up for that many separate locations is a lot. Having hundreds or thousands of disparate repositories is a nightmare to manage for human beings, whatever upside you might get for the robots.

  2. Builder was designed to provide exactly this as a service - the job was to be attached directly to the repositories, and to trigger both the build and test environment directly. There shouldn't really be a need for a CI pipeline, in that builder itself should be able to spin up the environment, build the software, and test it in place. I get that may not be how it evolved, but it seems a shame not to extend the service to a task so tightly coupled with the application's long-term health.

  3. I think your response on @predominant asking for more discussion is... not my favorite response. Essentially you're saying: "we all decided it was a good idea, and then we told you" - which might be a policy someplace, but it's not a very good one in reality. I would urge you to reconsider both as a matter of policy and a matter of community building.

Best,
Adam


6 participants