Containerise small services #807

Open
grischard opened this issue Dec 9, 2022 · 22 comments

@grischard
Collaborator

grischard commented Dec 9, 2022

There are a number of small services that currently each live on a full server and could live inside a container instead.

  • Identify small services that could be containerised
  • Identify server to host containers
  • Choose container system
  • Write container files
@Firefishy
Member

I am going to test OSQA as a container first.

@Firefishy Firefishy self-assigned this Dec 9, 2022
@tomhughes
Member

Yes sure why not pick literally the hardest possible thing to try first.

@Firefishy
Member

Choose container system... examples could be:

  • k3s (unlikely, only recommended for 100% disposable workloads, e.g. CI)
  • kubernetes (no.)
  • nomad (tempting)
  • docker without any bells and whistles managed using chef with systemd
  • podman without any bells and whistles managed using chef with systemd
  • others?
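
As a rough illustration of the "podman without any bells and whistles managed using chef with systemd" option, the sketch below renders a plain systemd unit that runs a single container in the foreground. This is a sketch under stated assumptions, not the configuration used here: the function name `render_podman_unit`, the service name, the image reference, and the port mapping are all hypothetical, and a configuration management tool such as chef would normally template this file rather than Python.

```python
from typing import Optional


def render_podman_unit(name: str, image: str, publish: Optional[str] = None) -> str:
    """Render a systemd unit that runs one podman container in the foreground.

    All arguments are illustrative; nothing here reflects a real deployment.
    """
    run_args = ["/usr/bin/podman", "run", "--rm", "--name", name]
    if publish:
        # host:container port mapping, e.g. "8080:80"
        run_args += ["--publish", publish]
    run_args.append(image)

    lines = [
        "[Unit]",
        f"Description=Container {name}",
        "Wants=network-online.target",
        "After=network-online.target",
        "",
        "[Service]",
        f"ExecStart={' '.join(run_args)}",
        f"ExecStop=/usr/bin/podman stop {name}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical service and image names, for illustration only.
    print(render_podman_unit("example-service", "registry.example/example-service:latest", "8080:80"))
```

The point of this shape is that systemd supervises the container process directly, so no separate orchestrator or long-running container daemon is needed.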

@tomhughes
Member

Well, there are really two separate questions: what to use to build images and what to use to deploy them.

As I understand it, there are other, less horrible languages than Dockerfile for describing images, and those images can be deployed with any of the major systems, just as podman, for example, can deploy an image built from a Dockerfile.

@tomhughes
Member

Some articles on container systems and orchestration systems.

I think https://buildah.io/ is what I was thinking of as the main alternative to Dockerfile - it can actually use Dockerfiles but the native way is just based around writing a script that uses buildah commands to manipulate the image.

@Firefishy
Member

Dockerfiles are the beast I know, and by a million miles they have more adoption than the alternatives.

There are alternatives for building OCI-compatible container images, with varying levels of completeness. Other than https://buildah.io/ there is also https://github.com/genuinetools/img (uses Dockerfile) and https://github.com/GoogleContainerTools/jib (Java images).

@grischard
Collaborator Author

The stuff that runs on naga (redirectors, blog aggregator, munin server that we're going to demise (#501)) might be good.

@grischard
Collaborator Author

grischard commented Dec 15, 2022

https://hardware.osm.org, which runs on idris, even has a Dockerfile already in https://github.com/osmfoundation/osmf-server-info

@pnorman pnorman pinned this issue Dec 16, 2022
@Firefishy
Member

Firefishy commented Feb 12, 2023

@jpds

jpds commented Apr 12, 2023

You probably want to deploy https://github.com/google/cadvisor on every Docker host and point Prometheus at that.

@tomhughes
Member

Well, that rather depends on how it works. If it depends on the docker daemon then it probably won't work for us.

@tomhughes
Member

A quick test on naga suggests it doesn't manage to collect anything useful in our setup. All it finds is some basic host hardware metrics:

# HELP machine_cpu_cores Number of logical CPU cores.
# TYPE machine_cpu_cores gauge
machine_cpu_cores{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447"} 64 1681303543392
# HELP machine_cpu_physical_cores Number of physical CPU cores.
# TYPE machine_cpu_physical_cores gauge
machine_cpu_physical_cores{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447"} 16 1681303543392
# HELP machine_cpu_sockets Number of CPU sockets.
# TYPE machine_cpu_sockets gauge
machine_cpu_sockets{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447"} 2 1681303543392
# HELP machine_dimm_capacity_bytes Total RAM DIMM capacity (all types memory modules) value labeled by dimm type.
# TYPE machine_dimm_capacity_bytes gauge
machine_dimm_capacity_bytes{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447",type="Registered-DDR4"} 2.06158430208e+11 1681303543392
# HELP machine_dimm_count Number of RAM DIMM (all types memory modules) value labeled by dimm type.
# TYPE machine_dimm_count gauge
machine_dimm_count{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447",type="Registered-DDR4"} 12 1681303543392
# HELP machine_memory_bytes Amount of memory installed on the machine.
# TYPE machine_memory_bytes gauge
machine_memory_bytes{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447"} 2.02673983488e+11 1681303543392
# HELP machine_nvm_avg_power_budget_watts NVM power budget.
# TYPE machine_nvm_avg_power_budget_watts gauge
machine_nvm_avg_power_budget_watts{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",system_uuid="32353537-3835-584d-5136-303130324447"} 0 1681303543392
# HELP machine_nvm_capacity NVM capacity value labeled by NVM mode (memory mode or app direct mode).
# TYPE machine_nvm_capacity gauge
machine_nvm_capacity{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",mode="app_direct_mode",system_uuid="32353537-3835-584d-5136-303130324447"} 0 1681303543392
machine_nvm_capacity{boot_id="d474a5a2-46e3-45bb-bd70-8babbdf0fd25",machine_id="0787fff1c8bb4e53a49da871f67e3246",mode="memory_mode",system_uuid="32353537-3835-584d-5136-303130324447"} 0 1681303543392
# HELP machine_scrape_error 1 if there was an error while getting machine metrics, 0 otherwise.
# TYPE machine_scrape_error gauge
machine_scrape_error 0

@tomhughes
Member

Also, generating metrics with timestamps (rather than letting the server add them) is generally considered bad form and can cause problems with metric ingestion.

@jpds

jpds commented Apr 12, 2023

generating metrics with timestamps (rather than letting the server add them) is generally considered bad form

Do you have a source for this? Because just searching for "timestamp" on my own Grafana brings up:

  • grafana_build_timestamp
  • node_boot_time_seconds
  • prometheus_config_last_reload_success_timestamp_seconds
  • prometheus_tsdb_lowest_timestamp_seconds
  • thanos_bucket_store_blocks_last_loaded_timestamp_seconds

@tomhughes
Member

I mean the fact that each of those metrics has a timestamp like 1681303543392 after the metric value at the end of the line. When that is there, the server will use it as the timestamp to associate with the value, instead of using its own clock to get a scrape time.

See https://promlabs.com/blog/2022/12/15/understanding-duplicate-samples-and-out-of-order-timestamp-errors-in-prometheus/#buggy-client-side-timestamps for some discussion of the potential issues with it.
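
To make the distinction concrete, here is a minimal Python sketch that splits a line of Prometheus text exposition output into series, value, and the optional trailing timestamp (milliseconds since the epoch). The helper names `parse_sample` and `strip_timestamp` are hypothetical, the parsing is deliberately simplified, and the sample line is taken from the output above with its labels shortened.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    series: str                  # metric name plus any {label="value"} block
    value: float
    timestamp_ms: Optional[int]  # None when the exporter leaves timestamping to the server


def _split_sample(line: str) -> Tuple[str, List[str]]:
    """Split a sample line into the series part and the remaining tokens."""
    brace = line.rfind("}")
    if brace != -1:
        # Label values may contain spaces, so split only after the closing brace.
        return line[: brace + 1], line[brace + 1:].split()
    name, *rest = line.split()
    return name, rest


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one exposition line; return None for blank lines and # comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    series, rest = _split_sample(line)
    timestamp_ms = int(rest[1]) if len(rest) > 1 else None
    return Sample(series, float(rest[0]), timestamp_ms)


def strip_timestamp(line: str) -> str:
    """Return the line without any client-side timestamp."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    series, rest = _split_sample(stripped)
    return f"{series} {rest[0]}"


if __name__ == "__main__":
    # Labels shortened from the cAdvisor output quoted earlier in the thread.
    with_ts = 'machine_cpu_sockets{machine_id="0787fff1c8bb4e53a49da871f67e3246"} 2 1681303543392'
    print(parse_sample(with_ts))
    print(parse_sample("machine_scrape_error 0"))
    print(strip_timestamp(with_ts))
```

In the quoted output, only `machine_scrape_error 0` has no trailing timestamp, so it is the one sample the server would stamp with its own scrape time.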

@jpds

jpds commented Apr 12, 2023

Ah yes - that's already discussed on cAdvisor issue #2526.

However, if you don't see one of these https://github.com/google/cadvisor/blob/master/metrics/prometheus.go#L138 metrics here, then something is misconfigured.

@tomhughes
Member

Well, I didn't do any configuration. I just ran the executable, as there didn't seem to be any clear instructions telling me to do anything else.

As I say, because we are using podman, if it relies on talking to dockerd to get statistics then it's probably not going to work.

@Firefishy
Member

dmca.osm.org is now a container: openstreetmap/chef@4ac7cf5

@grischard grischard added the epic label Nov 13, 2023
@grischard
Collaborator Author

grischard commented Nov 15, 2023

Left to containerise on Ridley: tracked in #1028

@Firefishy
Member

https://github.com/openstreetmap/birthday20-website/ is a static site I generated from WordPress using wp2static.

wp2static did a reasonable job, though some cleanup was required.

Linked issue: #1125

@Firefishy
Member

SoTM 2007, 2008 and 2009 are now containers.

@Firefishy
Member

I need to work out what is left on the list to complete this.
