Translate English actor names to Arabic, reusing the actor dictionary that already exists in English #180

Open
YanLiang1102 opened this issue Jun 23, 2017 · 8 comments

@YanLiang1102

No description provided.

@YanLiang1102 YanLiang1102 self-assigned this Jun 23, 2017
YanLiang1102 added a commit that referenced this issue Jun 23, 2017
…dictionary and update the lda-kmeans code
@YanLiang1102

Finished uploading the data to MongoDB; it took almost 6 hours. We have 18,000+ records in the existing English dictionaries, but for most of them the corresponding Arabic name can't be found.
I added some extra records of the form [['ar_actor'], [alternative_ar_names for the actor]], with their multiple roles inserted into the DB. Altogether 5,696 records were inserted.

YanLiang1102 added a commit that referenced this issue Jun 25, 2017
@YanLiang1102

YanLiang1102 commented Jun 25, 2017

Something to be aware of: we store every actor that the coders tagged, so we need to come up with an intelligent way to clean them up.
For example, this word: القوات
[image]
We need to choose which entry is the best one to keep.

@YanLiang1102

This can be done using wiki: if the old service can find the name, let the old service handle it; if the old service can't find it, go to wiki to find the name. That way we can fully utilize our existing English dictionary.
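
A minimal sketch of that fallback order, assuming both lookups can be wrapped as functions that take an English name and return an Arabic name or None. The names `find_arabic_name`, `old_service`, and `wiki_lookup` are hypothetical and stand in for whatever the real service calls look like; the dictionary contents are placeholders, not real data.

```python
from typing import Callable, Optional

# A lookup takes an English actor name and returns the Arabic name, or None on a miss.
Lookup = Callable[[str], Optional[str]]


def find_arabic_name(english_name: str, old_service: Lookup, wiki_lookup: Lookup) -> Optional[str]:
    """Try the existing service first; only fall back to wiki when it finds nothing."""
    arabic = old_service(english_name)
    if arabic:
        return arabic
    return wiki_lookup(english_name)


# Stub lookups for illustration only (placeholder strings, hypothetical actors).
old_service_stub = {"EXAMPLE_ACTOR_A": "ar-name-a"}.get
wiki_stub = {"EXAMPLE_ACTOR_B": "ar-name-b"}.get

for name in ("EXAMPLE_ACTOR_A", "EXAMPLE_ACTOR_B", "EXAMPLE_ACTOR_C"):
    print(name, "->", find_arabic_name(name, old_service_stub, wiki_stub))
```

Keeping the two lookups as plain callables means the wiki step can later be swapped for a browser-based scraper without touching the fallback logic.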

YanLiang1102 added a commit that referenced this issue Jun 28, 2017
…the service we already have can't find our ar name, first commit of code for this
YanLiang1102 added a commit that referenced this issue Jun 29, 2017
@YanLiang1102

YanLiang1102 commented Jun 29, 2017

We can't get the full response HTML using Python requests, so we cannot see the ar_url for some names.
This page should help:
https://stackoverflow.com/questions/37969536/why-are-lis-not-showing-up-with-python-requests-response
We need to use Selenium to mimic a browser so that the page returns everything.


Two things are needed to make Selenium work:
1. Point the Selenium driver to where Firefox is installed; find the path with 'which firefox'.
2. Point the driver to geckodriver. Store it under '/usr/local/bin'; otherwise you need to export the path where you store it.
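
The two steps above can be sketched as follows. This is a rough sketch, not the code from the commits: it assumes a Selenium 4 style API (`Options`, `Service`), which is newer than what was available when this was written, and the helper names `locate_binaries`, `make_driver`, and `fetch_rendered_html` are hypothetical.

```python
import shutil


def locate_binaries():
    """Return what `which firefox` and `which geckodriver` would print (None if not on PATH)."""
    return {
        "firefox": shutil.which("firefox"),
        "geckodriver": shutil.which("geckodriver"),
    }


def make_driver():
    """Build a Firefox driver with explicit paths to the browser and to geckodriver."""
    # Imported lazily so the path check above also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    paths = locate_binaries()
    if not all(paths.values()):
        raise RuntimeError(f"missing binaries on PATH: {paths}")

    options = Options()
    options.binary_location = paths["firefox"]                # step 1: where Firefox lives
    service = Service(executable_path=paths["geckodriver"])   # step 2: where geckodriver lives
    return webdriver.Firefox(service=service, options=options)


def fetch_rendered_html(url):
    """Load a page in the browser and return the HTML after scripts have run."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


print(locate_binaries())
```

If geckodriver is already under '/usr/local/bin', `shutil.which` should find it without any extra PATH export.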


A good source on using Selenium in Python 👍 http://thiagomarzagao.com/2013/11/12/webscraping-with-selenium-part-1/

@YanLiang1102

YanLiang1102 commented Jun 29, 2017

This name returns nothing on wiki, but searching Google directly finds it:
SIBGHATULLAH_MOJADEDI
The correct name should be:
Sibghatullah Mojaddedi
This means the English dictionary that we already have might not be accurate at all.
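
One possible way to catch near-miss spellings like this is fuzzy string matching with the standard library's `difflib`. This is only a sketch of one option, not something the project is known to use; the function name `closest_titles`, the 0.9 cutoff, and the candidate list (which in practice might come from search results) are all assumptions.

```python
import difflib


def closest_titles(dict_name, candidate_titles, cutoff=0.9):
    """Return candidate titles whose spelling is close to the dictionary name."""
    # Dictionary names look like FIRST_LAST; compare them case-insensitively with spaces.
    query = dict_name.replace("_", " ").lower()
    by_lower = {title.lower(): title for title in candidate_titles}
    matches = difflib.get_close_matches(query, list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]


# Hypothetical candidate list for illustration.
candidates = ["Sibghatullah Mojaddedi", "Babrak Karmal"]
print(closest_titles("SIBGHATULLAH_MOJADEDI", candidates))
# → ['Sibghatullah Mojaddedi']
```

A high cutoff keeps false matches down, at the cost of missing names whose transliteration differs by more than a letter or two.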

@YanLiang1102

Directly using a wiki URL is case-sensitive. For example,
BABRAK_KARMAL does not return a record on wiki, but a wiki URL with something like Babrak_Karmal will return something.
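
A small sketch of normalizing the dictionary's upper-case names before building the URL. The helper names `to_wiki_title` and `wiki_url` are hypothetical, and capitalizing each part is a heuristic that will be wrong for some names (particles that should stay lower-case, for instance).

```python
from urllib.parse import quote


def to_wiki_title(dict_name):
    """Turn FIRST_LAST into First_Last, the casing wiki page URLs usually expect."""
    return "_".join(part.capitalize() for part in dict_name.split("_") if part)


def wiki_url(dict_name, lang="en"):
    """Build a Wikipedia page URL for a dictionary name."""
    return f"https://{lang}.wikipedia.org/wiki/{quote(to_wiki_title(dict_name))}"


print(to_wiki_title("BABRAK_KARMAL"))  # → Babrak_Karmal
print(wiki_url("BABRAK_KARMAL"))       # → https://en.wikipedia.org/wiki/Babrak_Karmal
```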

YanLiang1102 added a commit that referenced this issue Jun 29, 2017
@YanLiang1102

Think about making it run in parallel.
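
One way this could look, as a sketch: since each lookup mostly waits on the network, a thread pool from the standard library is a reasonable first thing to try. The function name `lookup_all`, the worker count, and the stub lookup are assumptions for illustration; with Selenium, each worker would likely need its own driver rather than sharing one.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(names, lookup, max_workers=8):
    """Run `lookup` over all names concurrently and return {name: result}."""
    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = list(pool.map(lookup, names))
    return dict(zip(names, results))


def stub_lookup(name):
    # Stand-in for the real name lookup; just reformats the input.
    return name.replace("_", " ").title()


print(lookup_all(["BABRAK_KARMAL", "SIBGHATULLAH_MOJADEDI"], stub_lookup))
```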

YanLiang1102 added a commit that referenced this issue Jul 2, 2017
YanLiang1102 added a commit that referenced this issue Jul 4, 2017