Error BARMAN #1010

Open
alainmahe opened this issue Aug 28, 2024 · 5 comments

@alainmahe

[barman@SRV_barman barman]$ barman backup srv_postres
WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').
Starting backup using rsync-concurrent method for server srv_postres in /SVG_FS/srv_postres/base/20240828T094848
Backup start at LSN: 247/15000028 (000000010000024700000015, 00000028)
This is the first backup for server srv_postres
ERROR: The backup has failed starting backup
Asking PostgreSQL server to finalize the backup.
ERROR: Backup failed writing backup label.
DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'
Processing xlog segments from file archival for srv_postres
000000010000024700000014
000000010000024700000015
000000010000024700000015.00000028.backup
EXCEPTION: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
See log file for more details.

2024-08-28 09:48:48,429 [3269512] barman.backup_executor INFO: 16400, fib_data, /pg_tblspce
2024-08-28 09:48:49,040 [3269512] barman.backup_executor INFO: Backup start at LSN: 247/15000028 (000000010000024700000015, 00000028)
2024-08-28 09:48:49,047 [3269512] barman.backup_executor INFO: This is the first backup for server srv_postres
2024-08-28 09:48:49,071 [3269512] barman.backup_executor ERROR: The backup has failed starting backup
2024-08-28 09:48:49,071 [3269512] barman.backup_executor INFO: Asking PostgreSQL server to finalize the backup.
2024-08-28 09:48:52,720 [3269512] barman.backup ERROR: Backup failed writing backup label.
DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'
2024-08-28 09:48:52,803 [3269512] barman.wal_archiver INFO: Found 3 xlog segments from file archival for srv_postres. Archive all segments in one run.
2024-08-28 09:48:52,803 [3269512] barman.wal_archiver INFO: Archiving segment 1 of 3 from file archival: srv_postres/000000010000024700000014
2024-08-28 09:48:53,018 [3269512] barman.wal_archiver INFO: Archiving segment 2 of 3 from file archival: srv_postres/000000010000024700000015
2024-08-28 09:48:53,245 [3269512] barman.wal_archiver INFO: Archiving segment 3 of 3 from file archival: srv_postres/000000010000024700000015.00000028.backup
2024-08-28 09:48:53,395 [3269512] barman.cli ERROR: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
See log file for more details.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/barman/cli.py", line 2390, in main
    args.func(args)
  File "/usr/lib/python3.6/site-packages/barman/cli.py", line 546, in backup
    backup_name=args.backup_name,
  File "/usr/lib/python3.6/site-packages/barman/server.py", line 1651, in backup
    self.backup_manager.remove_wal_before_backup(backup_info)
  File "/usr/lib/python3.6/site-packages/barman/backup.py", line 1259, in remove_wal_before_backup
    with tempfile.TemporaryFile(mode="w+", dir=xlogdb_dir) as fxlogdb_new:
  File "/usr/lib64/python3.6/tempfile.py", line 624, in TemporaryFile
    _os.unlink(name)
OSError: [Errno 5] Input/output error: '/SVG_FS/srv_postres/wals/tmp3os2d4wd'
2024-08-28 09:49:01,416 [3269547] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-08-28 09:49:01,438 [3269547] barman.backup_executor WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').

[barman@SRV_barman barman]$ barman diagnose
WARNING: No backup strategy set for server 'srv_postres' (using default 'concurrent_backup').
{
"global": {
"config": {
"barman_home": "/SVG_FS",
"barman_user": "barman",
"compression": "gzip",
"configuration_files_directory": "/etc/barman/conf.d",
"errors_list": [],
"log_file": "/var/log/barman/barman.log",
"log_level": "INFO",
"minimum_redundancy": "0",
"retention_policy": "REDUNDANCY 35"
},
"system_info": {
"barman_ver": "3.10.0",
"kernel_ver": "Linux SRV_barman 5.15.0-106.131.4.el8uek.x86_64 #2 SMP Fri Sep 22 16:00:58 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux",
"python_ver": "Python 3.6.8",
"release": "RedHat Linux Red Hat Enterprise Linux release 8.8 (Ootpa)",
"rsync_ver": "rsync version 3.1.3 protocol version 31",
"ssh_ver": "",
"timestamp": "2024-08-28T09:45:08.523948+02:00"
}
},
"models": {},
"servers": {
"srv_postres": {
"active_model": null,
"backups": {
"20240828T093529": {
"backup_id": "20240828T093529",
"backup_label": "'START WAL LOCATION: 247/12000060 (file 000000010000024700000012)\nCHECKPOINT LOCATION: 247/12000098\nBACKUP METHOD: streamed\nBACKUP FROM: master\nSTART TIME: 2024-08-28 09:35:30 CEST\nLABEL: Barman backup srv_postres 20240828T093529\nSTART TIMELINE: 1\n'",
"begin_offset": 96,
"begin_time": "2024-08-28T09:35:29.762677+02:00",
"begin_wal": "000000010000024700000012",
"begin_xlog": "247/12000060",
"compression": null,
"config_file": "/pg_data/MyDB/postgresql.conf",
"copy_stats": null,
"deduplicated_size": null,
"end_offset": 304,
"end_time": "2024-08-28T09:35:31.149047+02:00",
"end_wal": "000000010000024700000012",
"end_xlog": "247/12000130",
"error": "failure writing backup label ([Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T093529/data/backup_label')",
"hba_file": "/pg_data/MyDB/pg_hba.conf",
"ident_file": "/pg_data/MyDB/pg_ident.conf",
"included_files": null,
"mode": "rsync-concurrent",
"pgdata": "/pg_data/MyDB",
"server_name": "srv_postres",
"size": null,
"status": "FAILED",
"systemid": "6744984042826244766",
"tablespaces": [
[
"fib_data",
16400,
"/pg_tblspce"
]
],
"timeline": 1,
"version": 110003,
"xlog_segment_size": 16777216
}
},
"config": {
"active": true,
"archiver": true,
"archiver_batch_size": 0,
"autogenerate_manifest": false,
"aws_profile": null,
"aws_region": null,
"azure_credential": null,
"azure_resource_group": null,
"azure_subscription_id": null,
"backup_compression": null,
"backup_compression_format": null,
"backup_compression_level": null,
"backup_compression_location": null,
"backup_compression_workers": null,
"backup_directory": "/SVG_FS/srv_postres",
"backup_method": "rsync",
"backup_options": "concurrent_backup",
"bandwidth_limit": null,
"barman_home": "/SVG_FS",
"barman_lock_directory": "/SVG_FS",
"basebackup_retry_sleep": 30,
"basebackup_retry_times": 0,
"basebackups_directory": "/SVG_FS/srv_postres/base",
"check_timeout": 30,
"cluster": "srv_postres",
"compression": "gzip",
"config_changes_queue": "/SVG_FS/cfg_changes.queue",
"conninfo": "host=srv_postres port=5432 user=barman dbname=postgres password=REDACTED",
"create_slot": "manual",
"custom_compression_filter": null,
"custom_compression_magic": null,
"custom_decompression_filter": null,
"description": "NON_PROD PostgreSQL Master server",
"disabled": false,
"errors_directory": "/SVG_FS/srv_postres/errors",
"forward_config_path": false,
"gcp_project": null,
"gcp_zone": null,
"immediate_checkpoint": false,
"incoming_wals_directory": "/SVG_FS/srv_postres/incoming",
"last_backup_maximum_age": null,
"last_backup_minimum_size": null,
"last_wal_maximum_age": null,
"lock_directory_cleanup": true,
"max_incoming_wals_queue": null,
"minimum_redundancy": 0,
"msg_list": [],
"name": "srv_postres",
"network_compression": false,
"parallel_jobs": 1,
"parallel_jobs_start_batch_period": 1,
"parallel_jobs_start_batch_size": 10,
"path_prefix": null,
"post_archive_retry_script": null,
"post_archive_script": null,
"post_backup_retry_script": null,
"post_backup_script": null,
"post_delete_retry_script": null,
"post_delete_script": null,
"post_recovery_retry_script": null,
"post_recovery_script": null,
"post_wal_delete_retry_script": null,
"post_wal_delete_script": null,
"pre_archive_retry_script": null,
"pre_archive_script": null,
"pre_backup_retry_script": null,
"pre_backup_script": null,
"pre_delete_retry_script": null,
"pre_delete_script": null,
"pre_recovery_retry_script": null,
"pre_recovery_script": null,
"pre_wal_delete_retry_script": null,
"pre_wal_delete_script": null,
"primary_checkpoint_timeout": 0,
"primary_conninfo": null,
"primary_ssh_command": null,
"recovery_options": "",
"recovery_staging_path": null,
"retention_policy": "redundancy 5 b",
"retention_policy_mode": "auto",
"reuse_backup": null,
"slot_name": null,
"snapshot_disks": null,
"snapshot_gcp_project": null,
"snapshot_instance": null,
"snapshot_provider": null,
"snapshot_zone": null,
"ssh_command": "ssh postgres@srv_postres",
"streaming_archiver": false,
"streaming_archiver_batch_size": 0,
"streaming_archiver_name": "barman_receive_wal",
"streaming_backup_name": "barman_streaming_backup",
"streaming_conninfo": "host=srv_postres port=5432 user=barman dbname=postgres password=REDACTED",
"streaming_wals_directory": "/SVG_FS/srv_postres/streaming",
"tablespace_bandwidth_limit": null,
"wal_conninfo": null,
"wal_retention_policy": "simple-wal 5 b",
"wal_streaming_conninfo": null,
"wals_directory": "/SVG_FS/srv_postres/wals"
},
"status": {
"archive_command": "rsync -a %p barman@SRV_barman:/SVG_FS/srv_postres/incoming/%f",
"archive_mode": "on",
"archive_timeout": 900,
"archived_count": 9005,
"checkpoint_timeout": 300,
"config_file": "/pg_data/MyDB/postgresql.conf",
"current_archived_wals_per_second": 0.00214236564998093,
"current_lsn": "247/1400A4C8",
"current_size": 39049876757.0,
"current_xlog": "000000010000024700000014",
"data_checksums": "off",
"data_directory": "/pg_data/MyDB",
"failed_count": 7546,
"has_backup_privileges": true,
"has_monitoring_privileges": true,
"hba_file": "/pg_data/MyDB/pg_hba.conf",
"hot_standby": "on",
"ident_file": "/pg_data/MyDB/pg_ident.conf",
"is_archiving": true,
"is_in_recovery": false,
"is_superuser": true,
"last_archived_time": "2024-08-28T09:37:31.465322+02:00",
"last_archived_wal": "000000010000024700000013",
"last_failed_time": "2024-08-28T09:33:44.354622+02:00",
"last_failed_wal": "0000000100000246000000A1",
"max_replication_slots": "10",
"max_wal_senders": "10",
"postgres_systemid": "6744984042826244766",
"replication_slot": null,
"replication_slot_support": true,
"server_txt_version": "11.3",
"stats_reset": "2024-07-09T18:10:18.974990+02:00",
"synchronous_standby_names": [
""
],
"version_supported": true,
"wal_compression": "off",
"wal_keep_segments": "0",
"wal_level": "replica",
"xlog_segment_size": 16777216
},
"system_info": {
"kernel_ver": "Linux srv_postres 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue May 14 11:55:25 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux",
"python_ver": "",
"release": "RedHat Linux Red Hat Enterprise Linux Server release 7.9 (Maipo)",
"rsync_ver": "rsync version 3.1.2 protocol version 31",
"ssh_ver": ""
},
"wals": {
"last_archived_wal_per_timeline": {
"00000001": {
"compression": "gzip",
"name": "000000010000024700000013",
"size": 17711,
"time": 1724830650.0249295
}
}
}
}
}
}

@martinmarques
Contributor

Is wals_directory on a WORM (write once, read many) partition? I think the problem occurs during the first backup, when Barman removes WALs that are no longer needed and the file system refuses to unlink those files:

  File "/usr/lib/python3.6/site-packages/barman/backup.py", line 1259, in remove_wal_before_backup
    with tempfile.TemporaryFile(mode="w+", dir=xlogdb_dir) as fxlogdb_new:
  File "/usr/lib64/python3.6/tempfile.py", line 624, in TemporaryFile
    _os.unlink(name)
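A minimal sketch for checking this outside Barman, assuming it is run as the barman user on SRV_barman with the same Python 3.6: it repeats the `tempfile.TemporaryFile(mode="w+", dir=...)` call from the traceback in a given directory. On Linux, when Python cannot use `O_TMPFILE`, `TemporaryFile` falls back to creating a named file and unlinking it straight away while the descriptor is still open, which is the `_os.unlink(name)` line that raised EIO here. The function name `probe_temporaryfile` is hypothetical.

```python
import errno
import sys
import tempfile


def probe_temporaryfile(directory):
    """Repeat the tempfile.TemporaryFile call from the traceback in `directory`.

    Returns (True, "ok") on success, or (False, description) when the OS
    raises an error while creating, writing or unlinking the file.
    """
    try:
        # Same call as barman/backup.py line 1259 in the traceback.
        with tempfile.TemporaryFile(mode="w+", dir=directory) as fh:
            fh.write("barman probe\n")
            fh.flush()
            fh.seek(0)
            if fh.read() != "barman probe\n":
                return False, "read-back mismatch"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "?")
        return False, "{} ({}): {}".format(name, exc.errno, exc)
    return True, "ok"


if __name__ == "__main__":
    # Default is the wals directory from the log; pass another path as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/SVG_FS/srv_postres/wals"
    ok, detail = probe_temporaryfile(target)
    print("{}: {}".format(target, detail))
```

If this prints an EIO error for the wals directory but succeeds for a local directory such as /tmp, the failure would sit in the file system layer rather than in Barman.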

@alainmahe
Author

Hello,
Thank you for your reply.
The problem is that the data directory is not created:
DETAILS: [Errno 2] No such file or directory: '/SVG_FS/srv_postres/base/20240828T094848/data/backup_label'

[barman@SRV_barman base]$ ls -l 20240828T094848
total 0
-rwxrwxrwx. 1 root root 1067 Aug 28 09:48 backup.info
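Both failures in the log happen under /SVG_FS/srv_postres, so the missing data directory may share a cause with the I/O error. A minimal sketch that repeats by hand the step reported as failing in the base directory: creating a `data` subdirectory under a backup directory and writing a `backup_label` file into it. It works in a throwaway directory (the `barman-probe-` prefix, the label content and the helper names are hypothetical) and removes it afterwards; the default base path is the `basebackups_directory` from the diagnose output.

```python
import os
import shutil
import tempfile


def _write_file(path, text):
    # Write and fsync so errors deferred by the file system surface here.
    with open(path, "w") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())


def probe_backup_layout(base_dir):
    """Create <probe>/data/backup_label under base_dir, then remove <probe>.

    Returns a list of (step, error) pairs; an empty list means every step
    succeeded. The hypothetical "barman-probe-" prefix keeps the probe
    directory from being mistaken for a real backup ID.
    """
    try:
        probe_dir = tempfile.mkdtemp(prefix="barman-probe-", dir=base_dir)
    except OSError as exc:
        return [("mkdtemp", exc)]
    data_dir = os.path.join(probe_dir, "data")
    label = os.path.join(data_dir, "backup_label")
    steps = [
        ("mkdir data", lambda: os.mkdir(data_dir)),
        ("write backup_label", lambda: _write_file(label, "LABEL: probe\n")),
        ("stat backup_label", lambda: os.stat(label)),
    ]
    errors = []
    for step, action in steps:
        try:
            action()
        except OSError as exc:
            errors.append((step, exc))
            break  # later steps depend on the earlier ones
    try:
        shutil.rmtree(probe_dir)
    except OSError as exc:
        errors.append(("cleanup", exc))
    return errors


if __name__ == "__main__":
    base = "/SVG_FS/srv_postres/base"
    problems = probe_backup_layout(base)
    if not problems:
        print("{}: all steps succeeded".format(base))
    for step, exc in problems:
        print("{}: {} failed: {}".format(base, step, exc))
```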

@martinmarques
Contributor

Did you check for errors at the OS level? The error comes from the OS when calling _os.unlink(name).

What FS holds the /SVG_FS/srv_postres/ directory? Can you share the output from df /SVG_FS/srv_postres/?
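`df` answers this directly; where a scripted check is handier, the same information is available in /proc/mounts on Linux. A minimal sketch (the helper name `mount_for` is hypothetical) that returns the source, mount point and type of the mount holding a path by picking the longest mount point that contains it. For a FUSE mount the type field is typically `fuse` or `fuse.<name>`.

```python
import os


def _unescape(field):
    # /proc/mounts encodes space, tab, newline and backslash as octal escapes.
    return (field.replace("\\040", " ").replace("\\011", "\t")
                 .replace("\\012", "\n").replace("\\134", "\\"))


def mount_for(path, mount_lines=None):
    """Return (source, mount_point, fs_type) for the mount that holds `path`.

    `mount_lines` defaults to the contents of /proc/mounts; passing lines
    explicitly makes the function testable without a real mount table.
    """
    if mount_lines is None:
        with open("/proc/mounts") as fh:
            mount_lines = fh.read().splitlines()
    target = os.path.realpath(path)
    best = None
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mount_point, fs_type = (_unescape(f) for f in fields[:3])
        # Match whole path components, so /a/b does not match /a/bc.
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = (source, mount_point, fs_type)
    return best
```

On SRV_barman, `print(mount_for("/SVG_FS/srv_postres/wals"))` would show which mount and file system type Barman is writing WAL files to.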

@alainmahe
Author

Hello:

blobfuse2 4.0G 12K 4.0G 1% /SVG_FS/srv_postres

@martinmarques
Contributor

Have you tested manually writing and deleting files on that file system?

The errors you shared point to writing and deleting files in the wals and backup directories.
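One way to run that manual test while separating the two kinds of deletion involved: a minimal sketch (the helper name `write_delete_check` is hypothetical; it assumes it is run as the barman user) that writes a small file in a directory and deletes it once after closing it, and once while it is still open in the same order as the `tempfile.TemporaryFile` fallback (create, unlink, then write). Running it against both the wals and base directories could show whether only the unlink-while-open pattern fails on the blobfuse2 mount.

```python
import os
import tempfile


def _close_quietly(fd):
    try:
        os.close(fd)
    except OSError:
        pass


def write_delete_check(directory):
    """Write a small file in `directory` and delete it, two ways.

    Returns one (mode, step, error) tuple per mode; step is "ok" and error
    is None when every operation in that mode succeeded.
    """
    results = []
    for unlink_while_open in (False, True):
        mode = "unlink while open" if unlink_while_open else "unlink after close"
        try:
            fd, path = tempfile.mkstemp(prefix="probe-", dir=directory)
        except OSError as exc:
            results.append((mode, "create", exc))
            continue
        step, closed = "write", False
        try:
            if unlink_while_open:
                # Same order as the TemporaryFile fallback: create, unlink, use.
                step = "unlink"
                os.unlink(path)
                step = "write"
                os.write(fd, b"probe\n")
                step, closed = "close", True
                os.close(fd)
            else:
                os.write(fd, b"probe\n")
                step, closed = "close", True
                os.close(fd)
                step = "unlink"
                os.unlink(path)
            results.append((mode, "ok", None))
        except OSError as exc:
            results.append((mode, step, exc))
        finally:
            if not closed:
                _close_quietly(fd)
            if os.path.lexists(path):
                try:
                    os.unlink(path)  # best-effort cleanup of the probe file
                except OSError:
                    pass
    return results


if __name__ == "__main__":
    for d in ("/SVG_FS/srv_postres/wals", "/SVG_FS/srv_postres/base"):
        for mode, step, err in write_delete_check(d):
            print("{} [{}]: {} {}".format(d, mode, step, err or ""))
```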

I would also recommend upgrading to the latest release, 3.11.1, as we've made a small change to the exception handling for errors caused by OS permission issues. The real cause of the failure may currently be hidden, and the changes in 3.11.1 may show us where the problem is.
