scripts not working with unicode input on mac #1
Comments
Hi there, can you upload the updated script? I seem to be having some issues.
Hi Saad, Try the following command: Waleed
Hi Waleed,
Hey @wammar, hope you're well. I also want to extract entire articles in both the English and target-language (Urdu) versions. Could you please point me in the right direction? I haven't programmed in Perl before, so I'm a little bit confused.
Sorry Saad, but I can hardly write Perl myself.
Hey @roeeaharoni, I also want to extract entire articles in both the English and target-language (Urdu) versions. Could you please point me in the right direction? I haven't programmed in Perl before, so I'm a little bit confused.
If you're getting the following error:
Go to scripts > extract.pl
PS: This change worked on Ubuntu and Mac.
@randinterval @wammar, I am getting the error below:
@imrrahul You're using both the wrong path and the wrong files (enwikivoyage).
Hi!
I tried to use the tool according to the README file on Mac OS X, with Hebrew as the source language and Arabic as the target. When I executed the following command (after installing the dependencies):
./build-corpus.sh ar hewiki-20141102 > titles_he_ar.txt
I got the following output:
Target language code: ar
Using hewiki-20141102-langlinks.sql.gz
Using hewiki-20141102-page.sql.gz
Reading page data from hewiki-20141102-page.sql.gz...
iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings
read 0 documents
Reading langlinks data from hewiki-20141102-langlinks.sql.gz...
iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings
read 0 documents
I fixed this by changing the Perl scripts that called iconv with the encoding name 'utf8' so that they call it with 'utf-8' instead, and it seems to work fine now.
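For anyone who wants to check which spelling their local iconv accepts before editing the scripts, here is a minimal sketch in POSIX shell. The helper name `pick_utf8_name` is hypothetical and not part of this repository; it only relies on the standard `-f` and `-t` options of iconv and on the fact that an unsupported encoding name makes iconv exit with a non-zero status.

```shell
# pick_utf8_name: hypothetical helper (not part of this repository).
# Prints the first UTF-8 spelling that the local iconv accepts, or
# returns non-zero if none of the candidates work.
pick_utf8_name() {
  for name in UTF-8 utf-8 utf8; do
    # Convert one ASCII byte from the candidate encoding to itself;
    # iconv exits non-zero when it does not recognise the name.
    if printf 'x' | iconv -f "$name" -t "$name" >/dev/null 2>&1; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}
```

Running `pick_utf8_name` on the affected machine should print a name that can safely replace 'utf8' in the iconv calls; if it prints nothing, the problem is probably something other than the spelling of the encoding.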
Best regards,
Roee