Bug: Redevelopment of the Quora scraper #918

Closed
3 tasks done
Saurabh254 opened this issue May 12, 2024 · 4 comments
Assignees
Labels
gssoc GSSoC 2024

Comments


Saurabh254 commented May 12, 2024

Describe the feature

As a GSSoC'24 contributor, I want to improve my development skills by contributing to scrape-up, and I'll be working on this issue.
Note that I'm a contributor to the Python package pyquora (a Quora scraper).

I'm working on this because pyquora lacks some features, such as fetching answers by a search query.

I would also like to make the scrape-up Quora scraper better.

The parts I'll be covering are:

  • Fetch user answers
  • Get answers by a search query
  • Fetch user profile

Add Screenshots

I will cover every detail in the image below:
2024-05-12-21:34:49-screenshot

I will also cover the top answers:

2024-05-12-21:35:45-screenshot

Record

  • I agree to follow this project's Code of Conduct
  • I'm a GSSoC'24 contributor
  • I want to work on this issue
@viththagi

Hi @Saurabh254, I would like to work on this issue. My steps would be:

1. Web scraping using libraries such as BeautifulSoup and Selenium.
2. Understanding the website structure by inspecting the HTML of the comments section.
3. Efficiency considerations:
   Add delays between requests to avoid overloading the website's server.
   Handle pagination.
4. Sentiment analysis using libraries such as TextBlob or NLTK.
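Step 3 (delays between requests and pagination) could look roughly like the sketch below. It is a minimal, stdlib-only illustration, not project code: `paginate`, `fake_fetch`, and the cursor-based page format are hypothetical, and in a real scraper `fetch_page` would wrap an HTTP request.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page format: a fetcher takes a cursor (None for the first page)
# and returns (items on this page, cursor for the next page or None if done).
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def paginate(fetch_page: PageFetcher, delay: float = 1.0, max_pages: int = 10) -> Iterator[str]:
    """Yield items page by page, sleeping `delay` seconds between requests."""
    cursor: Optional[str] = None
    for page_number in range(max_pages):
        if page_number > 0:
            time.sleep(delay)  # throttle: pause before every request after the first
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # no further pages
            break


# Usage with a fake in-memory fetcher standing in for real HTTP calls.
FAKE_PAGES = {None: (["a1", "a2"], "p2"), "p2": (["a3"], None)}


def fake_fetch(cursor: Optional[str]) -> Tuple[List[str], Optional[str]]:
    return FAKE_PAGES[cursor]


answers = list(paginate(fake_fetch, delay=0.01))
print(answers)  # ['a1', 'a2', 'a3']
```

The `max_pages` cap is a safety net so a site that keeps returning a next cursor cannot keep the scraper running indefinitely.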

@Saurabh254
Author

@nikhil25803 you can assign me this task. :)

@Saurabh254
Author

> Hi @Saurabh254, I would like to work on this issue. My steps would be:
>
> 1. Web scraping using libraries such as BeautifulSoup and Selenium.
> 2. Understanding the website structure by inspecting the HTML of the comments section.
> 3. Efficiency considerations: add delays between requests to avoid overloading the website's server, and handle pagination.
> 4. Sentiment analysis using libraries such as TextBlob or NLTK.

We don't have to use Selenium, because not every system supports it.
I'd rather use regex to extract the JSON data from the page.
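A minimal sketch of the regex approach, assuming the page embeds its data as a JSON object assigned to a JavaScript variable inside a `<script>` tag. The marker `window.__DATA__` and the helper `extract_embedded_json` are hypothetical; Quora's actual markup has to be checked in the page source. The regex only locates where the JSON starts, and `json.JSONDecoder.raw_decode` then parses exactly one JSON value from that position, since a regex alone cannot reliably match nested braces.

```python
import json
import re
from typing import Any, Optional

# Hypothetical marker: assumes the page assigns its data to a JS variable.
# Inspect the real page source and adjust this pattern accordingly.
DATA_PATTERN = re.compile(r"window\.__DATA__\s*=\s*")


def extract_embedded_json(html: str) -> Optional[Any]:
    """Return the first JSON value assigned to the marker, or None if absent."""
    match = DATA_PATTERN.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at match.end() and ignores the
    # trailing text (e.g. ";</script>"), so nesting is handled by the JSON parser.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


html = '<script>window.__DATA__ = {"answers": [{"id": 1, "upvotes": 5}]};</script>'
print(extract_embedded_json(html)["answers"][0]["upvotes"])  # 5
```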

@nikhil25803
Member

Go ahead @Saurabh254

Note

  • Please create a separate module for this, following the existing folder and project structure (if the module already exists, just add your features as functions in that module).
  • Do not use the Selenium WebDriver, as it is not compatible with all devices and cloud platforms.
  • Before making any changes, please check whether the module you want to add already exists. If it does, add your functionality as a method; otherwise, create a separate module and class for it.

All the best 👨‍💻
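Following the maintainer's note, a feature module might be laid out as one class with one method per feature. This is a hypothetical skeleton, not the existing scrape-up API: the class name `Quora`, the method names, the injected `fetch` callable, and the Quora URL layout (`/profile/<name>`, `/profile/<name>/answers`, `/search?q=`) are all assumptions. Each method only fetches raw HTML; parsing (for example, the regex-based JSON extraction suggested in this thread) is left out.

```python
from typing import Callable
from urllib.parse import quote, urlencode

# Assumed URL layout; verify against the live site before relying on it.
BASE_URL = "https://www.quora.com"


class Quora:
    """Hypothetical sketch: one class per module, one method per feature."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # `fetch` takes a URL and returns the page HTML. It is injected so the
        # class does not depend on a specific HTTP client and can be faked in tests.
        self.fetch = fetch

    def _profile_url(self, username: str, tab: str = "") -> str:
        url = f"{BASE_URL}/profile/{quote(username)}"
        return f"{url}/{tab}" if tab else url

    def fetch_user_profile(self, username: str) -> str:
        """Return the raw HTML of a user's profile page."""
        return self.fetch(self._profile_url(username))

    def fetch_user_answers(self, username: str) -> str:
        """Return the raw HTML of a user's answers tab."""
        return self.fetch(self._profile_url(username, "answers"))

    def search_answers(self, query: str) -> str:
        """Return the raw HTML of a search-results page for `query`."""
        return self.fetch(f"{BASE_URL}/search?{urlencode({'q': query})}")


# Usage with a fake fetcher that simply echoes the requested URL.
quora = Quora(fetch=lambda url: url)
print(quora.fetch_user_answers("Jane-Doe"))  # https://www.quora.com/profile/Jane-Doe/answers
print(quora.search_answers("python scraping"))  # https://www.quora.com/search?q=python+scraping
```

Injecting `fetch` keeps the class usable with any HTTP client (and without Selenium), which matches the compatibility concern raised above.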

Development

No branches or pull requests

3 participants