
Issues mirroring Oracle Linux #119

Open
OlsenOliver opened this issue Oct 25, 2023 · 25 comments · Fixed by #124
Labels
bug Something isn't working

Comments

@OlsenOliver

I'm trying to mirror https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64, but the packages are never downloaded (only primary.xml, primary.xml.gz and repomd.xml are). The logs stop at this line:
- Retrieving packages list from /home/repo/download-mirror-ol7-1698133890/primary.xml.gz
Debian 11/12 and EPEL mirrors work fine without any problems. I've tried rewriting the URL as https://yum.oracle.com/repo/OracleLinux/OL$releasever/latest/$basearch, but I still get no more than the three files mentioned. Any ideas? The version used is 3.7.3.

@lbr38
Owner

lbr38 commented Oct 25, 2023

Hi

I just tried on my side and it's working fine. Maybe the Oracle repository was temporarily inaccessible? Have you tried again later?

Here are the steps I followed on a fresh 3.7.3 install:

[screenshot]

  • Created a new mirror from the 'oracle' source repository:

[screenshot]

  • Packages are correctly synced:

[screenshot]

@OlsenOliver
Author

Thanks for the rapid feedback. I suspect you are right that the repo must have been temporarily down, as it works like a charm now :-)

@OlsenOliver
Author

However, it failed twice this afternoon after downloading approx. 8 GB (both times on a firefox package...), so it looks like Oracle has some issues.

@lbr38
Owner

lbr38 commented Oct 26, 2023

Okay, does it fail with an error message?

I will try on my side and see if it fails before the end too. I'll let you know ;)

@OlsenOliver
Copy link
Author

Seems to be a timeout, not an error with Repomanager. Will give it another shot soon.

(4243/26267) ➙ getPackageSource/firefox-102.13.0-2.0.1.el7_9.src.rpm ... Curl error: Operation timed out after 300001 milliseconds with 438304768 out of 623893972 bytes received
Download error
TOTAL DURATION: 15m26s

@lbr38
Owner

lbr38 commented Oct 26, 2023

Wow, that's a big one (600+ MB package).

The timeout is internal to the code: I set it to 300000 milliseconds (= 5 min), which means that 5 min is not enough to download this package... The bandwidth of the Oracle repo is probably limited or saturated.

I will see about making the timeout parameter customizable in a future release. In the meantime I can tell you how to change it in the code.
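A quick way to check whether a longer timeout would actually help is to extrapolate from the numbers curl prints when it gives up. This is a minimal sketch, assuming the average throughput stays constant for the rest of the file; the function name is hypothetical and not part of repomanager:

```php
<?php
/**
 * Hypothetical helper: estimate how many seconds a full download would take,
 * assuming the average throughput observed before the curl timeout stays constant.
 */
function estimateFullDownloadSeconds(int $bytesReceived, int $elapsedMs, int $totalBytes): float
{
    if ($bytesReceived <= 0 || $elapsedMs <= 0) {
        throw new InvalidArgumentException('Need positive bytes received and elapsed time');
    }
    // Average throughput observed so far, in bytes per second.
    $bytesPerSecond = $bytesReceived / ($elapsedMs / 1000);

    return $totalBytes / $bytesPerSecond;
}

// Figures taken from the "Operation timed out" log line above.
$seconds = estimateFullDownloadSeconds(438304768, 300001, 623893972);
printf("%.0f s (~%.1f min)\n", $seconds, $seconds / 60); // prints "427 s (~7.1 min)"
```

At that rate the package would need roughly 7 minutes, so a 600-second timeout should be enough as long as the throughput does not drop further.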

@OlsenOliver
Author

There seems to be some throttling going on. The first 8 GB of today's test took 3 minutes, then firefox-102.13.0-2.0.1.el7_9.src.rpm took 4 minutes until a timeout. This happened twice yesterday and again just now. Maybe I should try limiting my download bandwidth and see if it makes any difference.

(4245/26267) ➙ getPackageSource/firefox-102.14.0-1.0.1.el7_9.src.rpm ... Curl error: Operation timed out after 300000 milliseconds with 350861776 out of 623832219 bytes received
Download error
TOTAL DURATION: 10m7s

@lbr38
Owner

lbr38 commented Oct 26, 2023

Okay, if you wish to increase the internal timeout, here is how to do it:

Enter the docker container:

docker exec -it repomanager /bin/bash

Edit the Mirror.php file:

vim /var/www/repomanager/controllers/Repo/Mirror/Mirror.php

Go to line 126 and change the curl timeout value from 300 (5 minutes) to 600 (10 minutes), or more if needed:

curl_setopt($this->curlHandle, CURLOPT_TIMEOUT, 300); // set timeout

Save the file and exit the container (Ctrl + D).

You can then retry your mirror task.
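Instead of raising the fixed limit by hand, the timeout could come from configuration, and libcurl can also abort only when a transfer stalls (CURLOPT_LOW_SPEED_LIMIT together with CURLOPT_LOW_SPEED_TIME), so a slow but still-progressing 600 MB download is not cut off. This is a hypothetical sketch: the function names and the REPOMANAGER_CURL_TIMEOUT variable are assumptions, not existing repomanager settings.

```php
<?php
/**
 * Hypothetical sketch, not repomanager's actual code: read the download timeout
 * from an environment variable and build curl options that abort on stalled
 * transfers instead of on large-but-progressing ones.
 */
function timeoutFromEnv(string $varName, int $default): int
{
    $raw = getenv($varName);
    // getenv() returns false when the variable is not set.
    if ($raw === false || !preg_match('/^\d+$/', $raw)) {
        return $default;
    }
    return (int) $raw;
}

/**
 * Option names are returned as strings so this function runs without the curl
 * extension; they are resolved with constant() when applied to a handle.
 */
function downloadCurlOptions(int $totalTimeout, int $minBytesPerSecond = 1024, int $stallSeconds = 60): array
{
    return [
        // Overall limit in seconds; 0 means no overall limit in libcurl.
        'CURLOPT_TIMEOUT'         => $totalTimeout,
        // Abort if the average speed stays below $minBytesPerSecond ...
        'CURLOPT_LOW_SPEED_LIMIT' => $minBytesPerSecond,
        // ... for $stallSeconds seconds in a row.
        'CURLOPT_LOW_SPEED_TIME'  => $stallSeconds,
    ];
}

function applyCurlOptions(CurlHandle $handle, array $options): void
{
    foreach ($options as $name => $value) {
        curl_setopt($handle, constant($name), $value);
    }
}

// Usage (REPOMANAGER_CURL_TIMEOUT is a hypothetical variable name):
// $ch = curl_init($url);
// applyCurlOptions($ch, downloadCurlOptions(timeoutFromEnv('REPOMANAGER_CURL_TIMEOUT', 0)));
```

The design choice here is to treat "too slow for a minute" as the failure condition rather than "not finished after N minutes", which fits mirrors where file sizes vary from a few KB to several hundred MB.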

@OlsenOliver
Author

Thanks, done that now, and I also added a 200 Mbit cap on my vNIC for good measure. It's a bit strange that the timeout consistently hit after downloading 8 GB, so I'll try being a bit more "bandwidth friendly" and see if it makes any difference. We do have a 10 Gbit link (university), so unless I'm being actively throttled I should not need more than 5 minutes to download 600 MB :-)
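The arithmetic behind that expectation, using the 623,893,972-byte firefox-102.13.0 .src.rpm from the earlier log:

$$
\frac{623\,893\,972 \times 8\ \text{bit}}{300\ \text{s}} \approx 16.6\ \text{Mbit/s},
\qquad
\frac{623\,893\,972 \times 8\ \text{bit}}{200\ \text{Mbit/s}} \approx 25\ \text{s}
$$

So finishing within the 5-minute timeout only requires a sustained ~16.6 Mbit/s, far below both the 200 Mbit cap and the 10 Gbit link.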

@OlsenOliver
Author

The timestamps seem to indicate some active throttling, so hopefully the increased timeout plus the NIC cap will let me complete the sync:

-rw-r--r-- 1 www-data www-data 623791567 Oct 26 08:45 firefox-102.10.0-1.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115019748 Oct 26 08:45 firefox-102.10.0-1.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623872704 Oct 26 08:46 firefox-102.11.0-2.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115026904 Oct 26 08:46 firefox-102.11.0-2.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623812956 Oct 26 08:47 firefox-102.12.0-1.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115041844 Oct 26 08:47 firefox-102.12.0-1.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623893972 Oct 26 08:48 firefox-102.13.0-2.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115061876 Oct 26 08:48 firefox-102.13.0-2.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623832219 Oct 26 08:52 firefox-102.14.0-1.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115071576 Oct 26 08:53 firefox-102.14.0-1.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623967941 Oct 26 09:01 firefox-102.15.0-1.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115086712 Oct 26 09:02 firefox-102.15.0-1.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 623828057 Oct 26 09:10 firefox-102.15.1-1.0.1.el7_9.src.rpm
-rw-r--r-- 1 www-data www-data 115090880 Oct 26 09:11 firefox-102.15.1-1.0.1.el7_9.x86_64.rpm
-rw-r--r-- 1 www-data www-data 601878528 Oct 26 09:18 firefox-102.3.0-6.0.1.el7_9.src.rpm

@lbr38
Owner

lbr38 commented Oct 26, 2023

Indeed, I'm trying on my side and I'm facing the same issue on roughly the same package (firefox-xx). The package downloads very slowly (I increased the timeout to 15 min on my side).

The content of this Oracle repo seems similar to the CentOS 7 base repo, except that the latter does not include source packages (.src.rpm). Depending on your needs, maybe you could consider mirroring the CentOS 7 base repo instead? (http://mirror.centos.org/centos-7/7/os/$basearch/)
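The $basearch in that URL (like the $releasever tried at the start of this thread) is a yum/dnf repo variable that must be replaced with a concrete value before anything is fetched. A minimal sketch, assuming plain string substitution; the function name is hypothetical and this is not necessarily how repomanager expands these variables:

```php
<?php
/**
 * Hypothetical helper: expand yum/dnf-style variables ($releasever, $basearch, ...)
 * in a repository URL. Variables without a value in $vars are left untouched.
 */
function expandRepoVars(string $url, array $vars): string
{
    $map = [];
    foreach ($vars as $name => $value) {
        $map['$' . $name] = (string) $value;
    }
    // strtr() with an array tries longer keys first and never re-scans replaced text.
    return strtr($url, $map);
}

echo expandRepoVars('http://mirror.centos.org/centos-7/7/os/$basearch/', ['basearch' => 'x86_64']), "\n";
// http://mirror.centos.org/centos-7/7/os/x86_64/
```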

@OlsenOliver
Author

Since both are built from the same RHEL sources, I guess you are right, but for the large Oracle databases and other critical applications we run, I think I'll stick with the painfully slow Oracle repo. Having firefox from version 52 to 115, including the src.rpm's, doesn't really help either ;-)
3 hrs 15 min and only 22 GB, whereas the first 8 GB were done in 3 min.

@lbr38
Owner

lbr38 commented Oct 26, 2023

I may have some kind of fix for you:

In the mirroring process I added a condition to ignore packages that do not match the desired architecture (x86_64 in your case). In other words, the fix prevents .src.rpm packages from being synchronized.

This halves the number of packages to synchronize and the size of the final repo.

I'm testing the fix and it seems to work for the moment, but the firefox-xx packages still take a long time to copy (even though they're only .rpm packages)... :/

If you want to test the fix, you will have to add the code below.

Enter the docker container:

docker exec -it repomanager /bin/bash

Edit the Rpm.php file:

vim /var/www/repomanager/controllers/Repo/Mirror/Rpm.php

Go to line 200 and add the following code right after the foreach statement:

/**
 *  If package does not match the current mirroring arch then skip it
 */
if ($data['arch'] != $this->currentArch) {
    continue;
}

[screenshot]

Save the file and exit the container (Ctrl + D).

This fix should be part of the next release.

@OlsenOliver
Author

Superb work :-) Spinning up a new container with sufficient space in my homelab to give it a shot in a few minutes.
Ideally this would be an option when adding a mirror where you can tick off "exclude src.rpm's when mirroring this repo". Given that a Firefox src.rpm is 600-700 MB per release and there are a few dozen of them in the OL7 repo (I now have 129 Firefox packages, rpm and src.rpm, and I'm still not done downloading all releases...), this will save a considerable amount of space and bandwidth.

@lbr38
Owner

lbr38 commented Oct 26, 2023

On my side it took ~1 hour 15 min to sync the Oracle repo with the patch and no bandwidth limitation:

[screenshot]

Total repo size is 29GB.

> Ideally this would be an option when adding a mirror where you can tick off "exclude src.rpm's when mirroring this repo"

The thing is, there is actually already an option to choose whether to sync source packages or not :-) But Oracle seems to have decided to mix the source packages with the x86_64 packages and reference them all in the same repodata/primary.xml file, rather than separating them into different directories, which makes the option useless in this case.
I don't know whether this is good or bad practice, but it's the first time I've seen a repo like that (I was used to CentOS repos), so I will have to rethink parts of the mirroring process.

@OlsenOliver
Author

You're right, I missed that one. Anyhow, Oracle is Oracle, so I guess there will always be some tinkering needed to get it all right. I haven't downloaded and recompiled an src.rpm for more than a decade, so I guess I can live without them. 29GB is quite an improvement over whatever my download will end up at, so nice job..!! ;-)

We mainly use Debian for other Linux tasks (adding repos for it gave me no issues), but we intend to block outbound internet traffic for backend servers (typically running OL with some Oracle software), so having a local Oracle Linux repo is a nice way to still keep them up to date :-)

Thanks a lot, I really appreciate the effort..!!

@OlsenOliver
Author

I didn't get to test the revised code yesterday, but looking at it now, I assume it will only download the x86_64.rpm packages? In my (still ongoing!) download I also have close to 3400 noarch.rpm files, so if my assumption is right, a few packages will be missing from my repo. I have no clue about PHP or how to rewrite your code, but the logic to me would be a function that does "download everything except the src.rpm files".

@lbr38
Owner

lbr38 commented Oct 27, 2023

I'm currently working on a fix to make sure the mirroring process is downloading only what you have specified.

In other words, if you want to download x86_64 and noarch packages, you will have to select those archs when creating a new mirror. Source packages will only be downloaded if the src arch is selected as well.

I think this should meet your need.

Hoping to have the fix tested and published next week!
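The selection rule described above (keep a package only if its arch is one of the archs chosen for the mirror, so noarch and src packages are downloaded only when selected) can be sketched like this. The function name and sample entries are hypothetical; the `arch` key mirrors the `$data['arch']` field used in the Rpm.php snippet earlier in the thread.

```php
<?php
/**
 * Hypothetical sketch: keep only package entries whose arch was selected for the
 * mirror. Source packages use the pseudo-arch "src", so they are dropped unless
 * "src" is selected, even when listed in the same primary.xml as binary packages.
 */
function filterPackagesByArch(array $packages, array $selectedArchs): array
{
    $wanted = array_flip($selectedArchs); // arch => index, for O(1) lookups

    return array_values(array_filter(
        $packages,
        static fn (array $package): bool => isset($wanted[$package['arch']])
    ));
}

// Illustrative entries only.
$packages = [
    ['name' => 'firefox', 'arch' => 'x86_64'],
    ['name' => 'firefox', 'arch' => 'src'],
    ['name' => 'tzdata',  'arch' => 'noarch'],
    ['name' => 'glibc',   'arch' => 'i686'],
];

foreach (filterPackagesByArch($packages, ['x86_64', 'noarch']) as $package) {
    echo $package['name'], '.', $package['arch'], "\n";
}
// firefox.x86_64
// tzdata.noarch
```

Compared with the single `!= $this->currentArch` check, matching against a set of selected archs keeps noarch packages without pulling in the source RPMs.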

@lbr38 lbr38 mentioned this issue Oct 30, 2023
@Starbix
Contributor

Starbix commented Jul 2, 2024

I'm experiencing the same issue again with v4.2.1. I'm trying to mirror https://yum.oracle.com/repo/OracleLinux/OL8/baseos/latest/x86_64/ and I have set up a source repo for it.
However, when creating a repository, I get this output and then nothing happens:

Packages will be retrieved from following URLs:
 • https://yum.oracle.com/repo/OracleLinux/OL8/baseos/latest/x86_64

Getting repomd.xml from https://yum.oracle.com/repo/OracleLinux/OL8/baseos/latest/x86_64/repodata/repomd.xml ... OK

Getting primary.xml.gz from https://yum.oracle.com/repo/OracleLinux/OL8/baseos/latest/x86_64/repodata/83776d0e840a292528e430b887eb3131ef26d711e65e4bc070c7326f3cb3ec33-primary.xml.gz ... OK

Retrieving packages list from /home/repo/download-mirror-oracle8-baseos-int-1719917965/primary.xml.gz ...

Do you have any idea what could cause this? The Oracle repo doesn't seem to be down.
EDIT: it seems to be stuck here: $xml = new SimpleXMLElement(file_get_contents($primaryFile), LIBXML_PARSEHUGE);
The primary.xml is possibly just too large; AppStream works fine.
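That line first reads the whole decompressed primary.xml into a PHP string and then has libxml2 build the complete document tree, so memory grows with the size of the file. A streaming pull parser such as PHP's XMLReader visits one node at a time instead. A minimal sketch, assuming only each package's name, arch and location href are needed; the function name is hypothetical, and the element names and namespace follow the primary.xml format:

```php
<?php
/**
 * Hypothetical sketch: stream package entries out of a (decompressed) primary.xml
 * with XMLReader, yielding one small array per <package> instead of building the
 * whole document tree in memory.
 */
function iteratePrimaryPackages(string $path): Generator
{
    $commonNs = 'http://linux.duke.edu/metadata/common';
    $reader = new XMLReader();
    if (!$reader->open($path, null, LIBXML_PARSEHUGE)) {
        throw new RuntimeException("Cannot open $path");
    }

    $current = null;
    try {
        while ($reader->read()) {
            // Only look at nodes in the "common" namespace (skips rpm:* and text nodes).
            if ($reader->namespaceURI !== $commonNs) {
                continue;
            }
            if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->localName === 'package') {
                if ($current !== null) {
                    yield $current;
                }
                $current = null;
            } elseif ($reader->nodeType === XMLReader::ELEMENT) {
                switch ($reader->localName) {
                    case 'package':
                        $current = ['name' => null, 'arch' => null, 'href' => null];
                        break;
                    case 'name':
                    case 'arch':
                        if ($current !== null) {
                            $current[$reader->localName] = $reader->readString();
                        }
                        break;
                    case 'location':
                        if ($current !== null) {
                            $current['href'] = $reader->getAttribute('href');
                        }
                        break;
                }
            }
        }
    } finally {
        $reader->close();
    }
}

// Usage: foreach (iteratePrimaryPackages($primaryFile) as $pkg) { /* $pkg['arch'], $pkg['href'] */ }
```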

@lbr38
Owner

lbr38 commented Jul 2, 2024

Hello

It might be due to reaching the PHP memory limit. I will test on my side and let you know.

@lbr38 lbr38 reopened this Jul 2, 2024
@lbr38 lbr38 added bug Something isn't working labels Jul 2, 2024
@Starbix
Contributor

Starbix commented Jul 2, 2024

Indeed, I set the memory limit to 2G and it was able to parse the XML (it threw a cURL error afterwards; the provided GPG key doesn't seem to work).
It'd be great if it were possible to change the memory limit through an environment variable.
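Something along these lines could read the limit from an environment variable and apply it with ini_set() before parsing. The variable name and helper functions are hypothetical; the value is checked against PHP's shorthand notation (K, M, G, or -1 for no limit) first, so a malformed value is rejected up front:

```php
<?php
/**
 * Hypothetical sketch: apply a PHP memory_limit taken from an environment variable.
 */
function shorthandToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // PHP's "no limit"
    }
    if (!preg_match('/^(\d+)([KMG]?)$/i', $value, $m)) {
        throw new InvalidArgumentException("Invalid memory size: $value");
    }
    $multipliers = ['' => 1, 'K' => 1024, 'M' => 1024 ** 2, 'G' => 1024 ** 3];

    return (int) $m[1] * $multipliers[strtoupper($m[2])];
}

function applyMemoryLimitFromEnv(string $varName): ?string
{
    $raw = getenv($varName);
    if ($raw === false || trim($raw) === '') {
        return null; // keep the current limit
    }
    shorthandToBytes($raw); // throws on malformed values
    $previous = ini_set('memory_limit', trim($raw));

    return $previous === false ? null : $previous;
}

// Usage (TASK_MEMORY_LIMIT is a hypothetical variable name):
// applyMemoryLimitFromEnv('TASK_MEMORY_LIMIT'); // e.g. TASK_MEMORY_LIMIT=2G
```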

@lbr38
Owner

lbr38 commented Jul 2, 2024

Alright, could you tell me which parameters you set for the mirroring? On my side it worked (it failed on a missing GPG key, but primary.xml was parsed without problems).

[screenshot]

[screenshot]

In a future release I will see how to catch those memory limit errors and display a message. I will also see how to make the limit customizable in the settings tab.

@Starbix
Contributor

Starbix commented Jul 2, 2024

While trying to reproduce, I saw that PHP got OOM-killed. This was most likely the cause. I'm running repomanager on Kubernetes, so I think I'll just need to give some nodes more memory and give the pod a reasonable amount of max memory. So I think this issue can be closed again.
The memory consumption suddenly jumps to 4GiB+ when syncing (repomanager.task-run, not php-fpm itself).
Do you know what a reasonable memory limit for repomanager is, or how it scales with lots of repositories?

Anyway, thanks for the software and quick support!

@lbr38
Owner

lbr38 commented Jul 3, 2024

Hard to tell; it depends on the repository and on how much data is stored in its primary.xml file. The Oracle repository seems to store a huge amount of data in primary.xml.

For example, here are 3 repo sync tasks, in the following order:

  • oracle (~16000 packages)
  • epel (~13000 packages)
  • nginx (~364 packages)

[screenshot]

Indeed, the Oracle repo sync consumes way too much memory compared to EPEL, which is quite a big repo too. I'm trying to add some memory-freeing code to the script, but it doesn't seem to free any memory at all. I have to investigate more.

@lbr38 lbr38 mentioned this issue Jul 12, 2024
@lbr38
Owner

lbr38 commented Jul 12, 2024

After some tests and research, I was able to measure the memory consumed by PHP in real time while syncing the Oracle repo, and I can confirm that the PHP script does not consume more memory than is allocated to it (512MB by default). On average it peaks at around 340MB while mirroring the Oracle repo.

However, I also observed the high memory usage (4GB+). It may come from external libraries used by the PHP script (such as curl or libxml2). The high usage seems to appear while the XML file is being parsed (so I suspect libxml2), but analyzing memory consumption is a complex subject and I do not have enough knowledge to tell whether there is a memory leak somewhere.
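One way to make this gap visible is to log PHP's own accounting next to the resident set size of the whole process. memory_get_usage() only reports memory handled by PHP's allocator, so allocations a library such as libxml2 makes through the system allocator may not appear in it (and likely do not count towards memory_limit either), whereas VmRSS in /proc/self/status (Linux only) covers the entire process. The helper names are hypothetical:

```php
<?php
/**
 * Hypothetical sketch: compare PHP-managed memory with the whole process RSS.
 */
function parseVmRssBytes(string $procStatus): ?int
{
    // The line looks like "VmRSS:\t  123456 kB".
    if (preg_match('/^VmRSS:\s+(\d+)\s+kB$/m', $procStatus, $m) !== 1) {
        return null;
    }
    return (int) $m[1] * 1024;
}

function memorySnapshot(): array
{
    $status = is_readable('/proc/self/status') ? file_get_contents('/proc/self/status') : false;

    return [
        'php_bytes'      => memory_get_usage(true),      // memory held by PHP's allocator
        'php_peak_bytes' => memory_get_peak_usage(true),
        'rss_bytes'      => $status === false ? null : parseVmRssBytes($status), // whole process, Linux only
    ];
}

// Usage: take a snapshot before and after the XML parsing step and log both, e.g.
// error_log(json_encode(memorySnapshot()));
```

If rss_bytes grows by gigabytes while php_bytes stays near a few hundred MB, the extra memory is being allocated outside PHP's allocator, which would be consistent with the libxml2 suspicion.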

Anyway, I have added error handling during the execution of tasks (repomanager.task-run) which will display an error message on the web interface if PHP reaches the memory limit on a running task.

[screenshot]

I have also added a parameter to increase the memory allocated to repomanager.task-run if needed.

[screenshot]

3 participants