Importer errors: FATAL: PCDATA invalid Char value 11 #129

Open
12 tasks done
bkiahstroud opened this issue Mar 21, 2024 · 4 comments
@bkiahstroud

bkiahstroud commented Mar 21, 2024

Story

Affected importer IDs

  • 92
  • 117
  • 280
  • 281
  • 282
  • 283
  • 284
  • 285
  • 286
  • 287
  • 288
  • 289
@afred

afred commented Mar 25, 2024

This one is due to vertical tabs (character value 11, which XML 1.0 does not allow) in the source XML. We are fixing it at the source and will post here once the importers can be retried.
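
For anyone who wants to check a file locally, here is a minimal sketch of how the offending characters could be detected and stripped. It is not the fix applied at the source. The method names are hypothetical, and replacing each bad character with a space is an assumption (deleting it outright may suit some fields better).

```ruby
# Control characters that XML 1.0 forbids. Tab (9), LF (10) and CR (13) are
# allowed, so they are left out of the character class.
INVALID_XML_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Hypothetical helper: list the distinct forbidden code points found in `xml`.
def invalid_xml_chars(xml)
  xml.scan(INVALID_XML_CONTROL_CHARS).map(&:ord).uniq
end

# Hypothetical helper: replace each forbidden character with a space
# (an assumption; the real cleanup may have handled them differently).
def strip_invalid_xml_chars(xml)
  xml.gsub(INVALID_XML_CONTROL_CHARS, ' ')
end

sample = "<pbcoreDescription>line one\vline two</pbcoreDescription>"
p invalid_xml_chars(sample)
puts strip_invalid_xml_chars(sample)
```

Here `"\v"` is Ruby's escape for the vertical tab, which is the "Char value 11" in the error message.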

@afred

afred commented Mar 29, 2024

I gathered a list of AAPB IDs from the filenames printed in the error messages. I'm hoping that these are the only ones that need to be looked at, rather than the 10K files for each importer.
aapb_ids_vertical_tab_in_pbcore.txt
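
One way to double-check that list, instead of relying only on the error messages, would be to scan an importer directory for XML files that contain a vertical tab. This is a rough sketch of a different technique from the one described above. The method name is hypothetical, and it assumes the files sit directly inside the directory with a `.xml` extension.

```ruby
# Hypothetical helper: return the sorted basenames of *.xml files directly
# under `dir` whose raw bytes include a vertical tab (byte value 11).
def xml_files_with_vertical_tabs(dir)
  Dir.glob(File.join(dir, '*.xml'))
     .select { |path| File.binread(path).include?("\v") }
     .map { |path| File.basename(path) }
     .sort
end
```

Calling it with an importer's directory, for example `xml_files_with_vertical_tabs(Rails.root.join('tmp', 'imports', importer.name).to_s)` from a Rails console, would give filenames to compare against the attached list.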

@bkiahstroud
Author

Script to switch broken XML files with the fixed ones:

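# Run from a Rails console. Swaps each broken XML file for its fixed copy,
# keeping the broken original in tmp/imports/i129-broken-xml.
imports_root = Rails.root.join('tmp', 'imports')
fixed_xml_files = Dir.glob(imports_root.join('i129-fixed-xml', '*.xml'))
broken_dir = imports_root.join('i129-broken-xml')
imp_ids = [92, 117, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289]

# Each importer's files live in a directory named after the importer
imp_dirs = imp_ids.map { |id| imports_root.join(Bulkrax::Importer.find(id).name) }

# Map each fixed filename to its path in the first importer directory that has it
results = {}
fixed_xml_files.each do |f|
  filename = File.basename(f)
  dir = imp_dirs.find { |d| File.exist?(File.join(d, filename)) }
  results[filename] = File.join(dir, filename) if dir
end

# Make sure the backup directory exists; otherwise mv would rename the
# first file to 'i129-broken-xml' instead of moving it into a directory
FileUtils.mkdir_p(broken_dir)

# Move the broken file aside, then copy the fixed file into its place
results.each do |filename, path|
  FileUtils.mv(path, broken_dir)
  FileUtils.cp(imports_root.join('i129-fixed-xml', filename), path)
end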

@bkiahstroud
Author

All the affected importers have been rerun and no longer error 🙌

@bkiahstroud bkiahstroud moved this from In Development to Client Verification in AMS / GBH Apr 4, 2024