scripts not working with unicode input on mac #1

Open
roeeaharoni opened this issue Nov 22, 2014 · 10 comments

Comments

@roeeaharoni

Hi!

I tried to use the tool according to the README file, on Mac OS X, with Hebrew as the source language and Arabic as the target. When I executed the following command (after installing the dependencies):

./build-corpus.sh ar hewiki-20141102 > titles_he_ar.txt

I got the following output:

Target language code: ar
Using hewiki-20141102-langlinks.sql.gz
Using hewiki-20141102-page.sql.gz
Reading page data from hewiki-20141102-page.sql.gz...
iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings
read 0 documents
Reading langlinks data from hewiki-20141102-langlinks.sql.gz...
iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings
read 0 documents

I tried to fix this by changing the Perl scripts that called iconv with the parameter 'utf8' so that they call it with 'utf-8' instead, and it seems to work fine now.

Best regards,
Roee

@randinterval

Hi there, can you upload the updated script? I seem to be having some issues.
saad@Arc:/Desktop/wiki$ ./build-corpus.sh en enwiki-20151102 > titles.txt
Target language code: en
Using enwiki-20151102-langlinks.sql.gz
Using enwiki-20151102-page.sql.gz
Reading page data from enwiki-20151102-page.sql.gz...
read 37804388 documents
Reading langlinks data from enwiki-20151102-langlinks.sql.gz...
read 0 documents
saad@Arc:/Desktop/wiki$ ./build-corpus.sh ur urwiki-20151123 > titles.txt
Target language code: ur
Using urwiki-20151123-langlinks.sql.gz
Using urwiki-20151123-page.sql.gz
Reading page data from urwiki-20151123-page.sql.gz...
read 401280 documents
Reading langlinks data from urwiki-20151123-langlinks.sql.gz...
read 0 documents

@randinterval

@roeeaharoni @redpony @wammar

@wammar
Member

wammar commented Dec 1, 2015

Hi Saad,

Try the following command:
./build-corpus.sh en urwiki-20151123

Waleed

@randinterval

Hi Waleed,
Thank you so much! 👍 @wammar

@randinterval

Hey @wammar, hope you're well. I also want to extract entire articles in both the English and target-language (Urdu) versions. Could you please point me in the right direction? I haven't programmed in Perl before, so I'm a little bit confused.

@wammar
Member

wammar commented Dec 14, 2015

Sorry, Saad, but I can hardly write Perl myself.

@randinterval

Hey @roeeaharoni, I also want to extract entire articles in both the English and target-language (Urdu) versions. Could you please point me in the right direction? I haven't programmed in Perl before, so I'm a little bit confused.

@ghost

ghost commented Mar 6, 2017

If you're getting the following error:

iconv: conversion from utf8 unsupported
iconv: try 'iconv -l' to get the list of supported encodings

Open scripts/extract.pl and change the two instances of "utf8" to "utf-8", on line 29 and line 53.
After that it should work.

PS: This change worked on Ubuntu & Mac
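
Before editing the script, it can help to confirm which spelling of the encoding name the local iconv actually accepts, as the error message itself suggests. The following is only a minimal sketch: the helper name iconv_accepts is hypothetical and not part of this repository, and the result depends on which iconv implementation is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeeds if the local
# iconv accepts the given name as a source encoding.
iconv_accepts() {
  printf 'x' | iconv -f "$1" -t UTF-8 >/dev/null 2>&1
}

# Try the spellings discussed in this thread; results vary by platform.
for name in utf8 utf-8 UTF-8; do
  if iconv_accepts "$name"; then
    echo "$name: accepted"
  else
    echo "$name: rejected"
  fi
done
```

If "utf8" is rejected while "utf-8" is accepted, the rename described above is the likely fix for that machine.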

@imrrahul

imrrahul commented Jun 2, 2018

@randinterval @wammar Please help, I am getting the error below:
[screenshot from 2018-06-03 00-29-13]

@twielfaert

@imrrahul You're using both the wrong path and the wrong files (enwikivoyage).
