Delete corrupt file on the Isilon #6604

Open
hackartisan opened this issue Jan 16, 2025 · 7 comments
@hackartisan
Member

hackartisan commented Jan 16, 2025

figgy-web-staging1 still / again cannot get a listing or interact with the directory `/mnt/hydra_sources/

To replicate, try to `ls` that location.

It definitely worked for a minute after the box was rebuilt. But now it does not.

previous ticket: #6593

hackartisan changed the title from "staging1 box still has trouble connecting to local_uploads mount directory" to "staging1 box has trouble connecting to local_uploads mount directory" on Jan 16, 2025
@hackartisan
Member Author

It looks like a bunch of clean:expired_local_files jobs are starting and never finishing, then eventually you can't get a file listing at that location.
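One way to keep stuck runs from piling up is to have each run take a non-blocking lock and exit if another run already holds it. This is only a sketch: the helper name and lock path are hypothetical, it is not how Figgy's cleanup job is implemented, and the lock file would need to live on local disk rather than on the affected mount.

```python
import fcntl
import os


def try_acquire_lock(lock_path):
    """Return a file descriptor holding an exclusive lock, or None if it is taken.

    lock_path is a hypothetical location on local disk, not on the network mount.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A run that gets `None` back would exit without touching the mount. This does not fix the hang itself: a stuck run keeps holding the lock, so later runs skip instead of stacking up behind it.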

@tpendragon
Contributor

We killed the tasks with `sudo kill $(ps aux | grep 'expired_local_files' | awk '{print $2}') -9`
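Note that a `ps aux | grep` pipeline also matches the `grep` process itself, because its own command line contains the pattern; `pgrep -f` avoids that. For anyone scripting this, here is a minimal sketch of the same filtering in Python. The function name is hypothetical, and it assumes the usual 11-column `ps aux` layout.

```python
import os


def matching_pids(ps_output, needle):
    """Return PIDs from `ps aux`-style text whose command line contains needle.

    Skips the header row and any grep process that only matched itself.
    """
    pids = []
    for line in ps_output.splitlines():
        # ps aux prints 10 fixed columns followed by the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed row
        command = fields[10]
        program = os.path.basename(command.split()[0])
        if needle in command and program != "grep":
            pids.append(int(fields[1]))
    return pids
```

The returned PIDs could then be passed to `os.kill`. A process blocked in I/O against the mount may not exit right away even after SIGKILL.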

We do need that task to run, though.

@tpendragon
Contributor

tpendragon commented Jan 21, 2025

What we've found:

We can't read or delete that file in SMB mounts.

If we SSH to the Isilon, we still can't read the file.

The file in question (on the Isilon): /ifs/hydra/binaries/ingest_scratch/local_uploads_broken/1435651a747a13f36d59cb498ce70b04

On the Figgy boxes this is /mnt/hydra_sources/ingest_scratch/local_uploads_broken/1435651a747a13f36d59cb498ce70b04

From what we've been able to tell the other files are fine, but this one is holding up the cleanup process.

Edit: The paths were originally under local_uploads, but we were able to run `mv local_uploads local_uploads_broken; mkdir local_uploads` so that our cleanup task stops hanging on this file and freezing Figgy.
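The rename workaround works because renaming a directory only changes the directory entry and never reads the files inside it. A minimal sketch of the same move-aside step in Python, with a hypothetical function name and the `_broken` suffix taken from the command above:

```python
import os


def quarantine_directory(parent, name, suffix="_broken"):
    """Rename parent/name to parent/name + suffix and recreate an empty parent/name.

    Returns the quarantined path. Refuses to overwrite an existing quarantine dir.
    """
    src = os.path.join(parent, name)
    dst = src + suffix
    if os.path.exists(dst):
        raise FileExistsError(dst)
    # Rename touches only the directory entry, not the file contents.
    os.rename(src, dst)
    # The fresh directory gets default permissions; ownership may need adjusting.
    os.mkdir(src)
    return dst
```

For example, `quarantine_directory("/mnt/hydra_sources/ingest_scratch", "local_uploads")` would mirror the `mv` and `mkdir` pair.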

tpendragon changed the title from "staging1 box has trouble connecting to local_uploads mount directory" to "Delete corrupt file on the Isilon" on Jan 21, 2025
@aruiz1789

aruiz1789 commented Jan 21, 2025

@tpendragon We have an ongoing conversation with the Dell support team because we are having an issue replacing a drive in the storage device. We will provide more information once we are done with Dell support. We believe that issue is not related to the Figgy problem, but it needs to be ruled out.

@aruiz1789

@tpendragon Can you try deleting the file once again? All CIFS shares look normal and I am able to list the file in question.

pulsys@figgy-web-staging1:~$ ls -al /mnt/hydra_sources/ingest_scratch/local_uploads_broken/1435651a747a13f36d59cb498ce70b04
-rwxr-xr-x 1 deploy www-data 6269599 Dec 18 21:39 /mnt/hydra_sources/ingest_scratch/local_uploads_broken/1435651a747a13f36d59cb498ce70b04

@tpendragon
Contributor

tpendragon commented Jan 21, 2025

@aruiz1789 Still broken: `cat /mnt/hydra_sources/ingest_scratch/local_uploads_broken/1435651a747a13f36d59cb498ce70b04` just hangs.

It's also broken ON the Isilon, when SSH'd into diglibdata1. I don't think it has anything to do with SMB, or if it does, it's some sort of file lock or something.
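To check files like this without tying up a shell, the read can be attempted in a child process with a timeout. This is a rough sketch with a hypothetical helper name, not something we ran here. It only reads the first byte, so a file that fails further in would need a fuller read.

```python
import subprocess
import sys

# Child program: open the file given as argv[1] and read a single byte.
_READ_ONE_BYTE = "import sys; open(sys.argv[1], 'rb').read(1)"


def probe_read(path, timeout=5.0):
    """Return 'ok', 'error', or 'hang' for an attempt to read the start of path."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _READ_ONE_BYTE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait again, because a child stuck in
        # uninterruptible I/O may not exit even after SIGKILL.
        proc.kill()
        return "hang"
    return "ok" if returncode == 0 else "error"
```

The child is deliberately not waited on after the kill, so the caller returns even if the read never unblocks.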

@aruiz1789

Hey @tpendragon, I unmounted and remounted the share but still can't see the content of the file. I can, however, list the directory path, which I was not able to do before.

I also mounted the share on a "Rocky Linux 9.5" VM and got the same results, the system hangs while reading the file. So we can eliminate the client OS version being the issue.

When you tried deleting the file from the storage device, did you get a busy error, or did it just hang like it does from the client? If you get a busy error, we can try rebooting the client to release the connection to the file.

I will also get in touch with the storage admin to see if we can pinpoint the cause as I do not have access to it.

acozine self-assigned this on Jan 24, 2025