Friendly Hello from another parking scraper #229

Open
defgsus opened this issue Nov 14, 2021 · 11 comments

@defgsus
Collaborator

defgsus commented Nov 14, 2021

Hi there. Are you still working on this?

I wrote a couple of parking space scrapers myself at https://github.com/defgsus/parking-scraper

Data is persisted at https://github.com/defgsus/parking-data

Maybe we can join forces.

I'm more interested in recording than in actual use by driving people, but who knows?

@kiliankoe
Member

Hey! It would be great to join forces, adapt the scrapers you've written to our format and dump our historical data into your data repo as well. According to our dumps, the data goes back to July 2015 for many lots and, for the others, to whenever their scrapers were introduced, but @jklmnn knows how current the data there is. Also please be aware that the archive expands quite a bit when extracted.

> I'm more interested in recording than in actual use by driving people, but who knows?

Haha, same, who needs a car anyways 😅

@defgsus
Collaborator Author

defgsus commented Nov 15, 2021

Cool, i will just start the port.

Oh no, it's Monday.. Need to earn some money first.

Only kidding. Thanks for the fast reply. Thought that not much is going on here because of the pending pull requests. I will certainly add some more.

Your archive is quite something! Here's the one-year-celebration of mine: https://defgsus.github.io/blog/2021/04/08/one-year-parking.html

@jklmnn
Member

jklmnn commented Nov 15, 2021

Great work! Our archive is up to date to the end of 2020. I will add this year at the start of 2022. For more current data we also have an API (@kiliankoe do you remember if and where this is documented?). However, we're not advertising this API as it puts a strain on the server (which is also the reason that the largest request that can be made is limited to a week per lot). This API contains all data since the server was upgraded (09/2020). I try to make sure that all historical data is always available either in the archives or via the API.
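
Since a single request is capped at one week per lot, a longer time range has to be split into week-sized windows on the client side. Here is a minimal sketch of that splitting only. It assumes nothing about the actual endpoint and makes no request, and the name `week_windows` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

# Limit mentioned in the thread: at most one week of data per lot and request.
MAX_WINDOW = timedelta(days=7)


def week_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (from, to) pairs covering start..end, each at most 7 days long."""
    if end < start:
        raise ValueError("end must not be before start")
    cursor = start
    while cursor < end:
        upper = min(cursor + MAX_WINDOW, end)
        yield cursor, upper
        cursor = upper
```

Each pair could then be sent as one request, ideally with a pause in between to keep the load on the server low.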

About the pending PRs, they're the ones that got stuck at some point in the process; we merged multiple others in the meantime. But you're right, we're currently not super active because, as you said, it's Monday and someone has to earn the money to keep the servers running ;).

defgsus added a commit to defgsus/ParkAPI that referenced this issue Nov 16, 2021
@defgsus
Collaborator Author

defgsus commented Nov 16, 2021

Good morning. Earned some money in the meantime? I did, but it's not enough..

However, i had some trouble setting up the local development. I'm not the raw SQL type of guy. So i added a few lines to the README.md to help others. #230 (EDIT: actually, now i found the park_api/setupdb.py)

Actually, i wanted to see if Frankfurt is still failing (#153). I have the same problem in other scrapers. Some websites issue certificates which are fine for a browser but not for the requests api.
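
One common cause of "fine in a browser, rejected by Python" is a server that sends an incomplete certificate chain: browsers can fetch or cache the missing intermediate certificate, while Python's TLS stack does not. That is only an assumption here, not a confirmed diagnosis for Frankfurt (#153). Under that assumption, a minimal standard-library sketch that keeps verification switched on and adds the missing intermediate certificate to the trust store could look like this (`build_ssl_context` and `extra_ca_file` are hypothetical names):

```python
import ssl
from typing import Optional


def build_ssl_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context, optionally trusting extra PEM certificates."""
    # Start from the system defaults: certificate and hostname checks enabled.
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        # Hypothetical path to a PEM file holding the intermediate certificate
        # that the website fails to send.
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

The context can be passed to `urllib.request.urlopen(url, context=...)`. With `requests`, the `verify` argument accepts a path to a CA bundle for the same purpose, which avoids disabling verification altogether.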

Anyways, i have two immediate proposals:

  1. Add some command-line args to the scraper, so that one can work on a particular website without needing to scrape all the others as well. If the list of cities is growing that would certainly be helpful.
  2. Switch the whole project to Django. Actually, i'm not sure if Django is the best of choices for a (potentially) high-demand API server, but it's not a bad choice either and it has a usable database ORM, migrations, an integrated shell and command-line tasks (that can be cron-jobbed) out of the box. I'm a Django fan, admittedly. That would require a move of the data to a new database but that would fit with "Proper database schema" (#224) i guess. I also agree with "Exclude static data / modules." (#144) in the long term.
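
The first proposal can be sketched with `argparse`. Everything named here is hypothetical: the `SCRAPERS` registry, the placeholder scraper functions and the `--city` flag are illustrations, not the existing ParkAPI interface.

```python
import argparse
from typing import Callable, Dict, List, Optional


def _scrape_dresden() -> str:
    # Placeholder for a real scraper; returns a marker string for the demo.
    return "Dresden scraped"


def _scrape_frankfurt() -> str:
    return "Frankfurt scraped"


# Hypothetical registry mapping a city name to its scraper callable.
SCRAPERS: Dict[str, Callable[[], str]] = {
    "Dresden": _scrape_dresden,
    "Frankfurt": _scrape_frankfurt,
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Scrape parking data")
    parser.add_argument(
        "-c", "--city",
        action="append",            # the flag may be repeated
        choices=sorted(SCRAPERS),   # unknown cities are rejected by argparse
        help="only scrape this city (default: all cities)",
    )
    return parser.parse_args(argv)


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Run the selected scrapers, or all of them when no city is given."""
    args = parse_args(argv)
    selected = args.city or sorted(SCRAPERS)
    return [SCRAPERS[name]() for name in selected]
```

Calling `run(["--city", "Frankfurt"])` then touches only the Frankfurt scraper, which is the point when debugging a single failing website.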

Waddjasay?

EDIT

If you ask why i would like to help refactor a whole project that i didn't know two days ago and whose public website i haven't even loaded without all the script blockers, here are a few motivational points (because i was wondering myself):

  • I like collecting this kind of data. And i made some mistakes along the way in my own project which are still in there because nobody else seems to look at it.
  • So it would certainly be helpful to work on a canonical standard as a team
  • The legacy code is much like my own Python projects from 6 years ago. I'd do it quite differently today. I know Django quite well and it would help to make the code, data, CI and community involvement more durable... i think. Using the Django REST framework, the API can be automatically documented, throttled, or parts of it restricted via access keys, etc...
  • In other words: I'd like to refresh the code at a couple of points and would actually rather invest the time to port it to a new framework

@defgsus
Collaborator Author

defgsus commented Nov 18, 2021

Hello again!

Guess one should not come along and right away call other people's projects legacy code. It's just enthusiasm. I don't want to hijack anything. Still, interested in discussion.

@jklmnn
Member

jklmnn commented Nov 18, 2021

Sorry for the late reply. I wanted to point you to park_api/setupdb.py but I see you already found it. Since it's not documented yet, could you change #230 to include that in the setup instructions?

You're certainly right about the state of the code, most of it is legacy and only maintained as needed for it not to break. The main reason for that is the lack of time, energy and people (as far as I can tell only @kiliankoe and I actively work on the current code base outside of module contributions). For my side, the biggest restriction is time and motivation (both are hard to find for programming after working full time in software engineering). So we'd be really thankful for any help that improves the current implementation.

Tbh I don't have any experience with Django but if that provides us with a (more or less) easy implementation of the full stack including the database I'd be in favor of that. @kiliankoe do you have any opinion on that? About #224, that's the best I could come up with, but I'm certainly no database expert.

About refactoring the project, I'd say rewriting is the proper term. Most of the code has been written by different people (many of whom aren't working actively on that code base anymore) and then updated by me. I think a new implementation would be faster and yield better results than trying to understand the current code base and refactor it. Especially since we don't have a specific goal for refactoring (other than "improving", which is not exactly specific).

@jklmnn
Member

jklmnn commented Nov 18, 2021

Also for better communication you can join #parkendd:matrix.org or I can invite you into the OKF Germany Slack.

@defgsus
Collaborator Author

defgsus commented Nov 18, 2021

Thanks for the reply! So i guessed about right about the state of the code. I just don't want to put anyone off but it looks like it should be rewritten to make further contributions easier. I've been using Django for a few years now, and also use it to earn some of that money we were talking about. After some introduction it's quite a friendly and intuitive framework (until you start extending the admin interface ;-)

My current workload is acceptable and i can certainly spend a few hours each week. I will check the matrix link above.

I will update the #230 pull-req. Can you please just clarify the environments a bit? As i understand it:

  • The unittests will always use testing
  • Otherwise it defaults to development
  • And on the live system you probably set the env to staging or production

So calling setupdb.py will probably create the development tables, and to create the testing DB i need to set the environment variable env=testing.

Just nod if that's right

@jklmnn
Member

jklmnn commented Nov 18, 2021

Yes, the unittests use testing and on our server we use production. setupdb.py, like all other scripts, uses the env environment variable to select the environment. I don't really know why we have staging though. It would probably be useful if we had a specific test system but that is probably not going to happen.
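
The environment selection discussed above can be sketched as follows. This is not the actual ParkAPI code: the function name `current_env` is hypothetical, and the fallback to development follows the understanding stated earlier in the thread rather than a checked implementation.

```python
import os
from typing import Mapping, Optional

# The four environment names mentioned in this thread.
KNOWN_ENVS = ("development", "testing", "staging", "production")


def current_env(environ: Optional[Mapping[str, str]] = None) -> str:
    """Read the `env` variable, falling back to development (assumed default)."""
    environ = os.environ if environ is None else environ
    name = environ.get("env", "development")
    if name not in KNOWN_ENVS:
        raise ValueError(f"unknown environment: {name!r}")
    return name
```

Under that assumption, creating the testing tables would mean running the setup script with the variable set, for example `env=testing python park_api/setupdb.py`.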

@defgsus
Collaborator Author

defgsus commented Nov 21, 2021

Hi there,

here's a new prototype https://github.com/defgsus/ParkAPI2

(don't mind the 2, it's still supposed to become 1 ;)

Right now it's just a basic framework for tying parking lots to cities, states and countries with geo-coordinates on all entities. The most interesting parts are the models and the store methods in https://github.com/defgsus/ParkAPI2/tree/master/web/park_data/models

Certainly stuff to discuss. The pool entity from #224 can be added besides the cities.
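
To make the hierarchy described above concrete, here is a framework-free sketch using plain dataclasses. It is not the ParkAPI2 code (those are Django models in the linked directory). All class names, field names and example values are assumptions for illustration, and the coordinates are approximate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GeoEntity:
    """Common base: every entity has a name and optional geo-coordinates."""
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass(frozen=True)
class Country(GeoEntity):
    pass


@dataclass(frozen=True)
class State(GeoEntity):
    country: Optional[Country] = None


@dataclass(frozen=True)
class City(GeoEntity):
    state: Optional[State] = None


@dataclass(frozen=True)
class ParkingLot(GeoEntity):
    city: Optional[City] = None


def location_path(lot: ParkingLot) -> List[str]:
    """Return the names from country down to the lot, skipping missing links."""
    city = lot.city
    state = city.state if city else None
    country = state.country if state else None
    chain = [country, state, city, lot]
    return [entity.name for entity in chain if entity is not None]


# Example values only (approximate coordinates, hypothetical lot).
germany = Country("Germany")
saxony = State("Saxony", country=germany)
dresden = City("Dresden", latitude=51.05, longitude=13.74, state=saxony)
lot = ParkingLot("Example Lot", latitude=51.05, longitude=13.74, city=dresden)
```

A pool entity as proposed in #224 could be added as one more optional link on `ParkingLot` next to `city`.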

@jklmnn
Member

jklmnn commented Feb 22, 2022

Hi @defgsus, sorry for the long reply times. I invited you into a matrix room for a discussion about the future of ParkAPI. Please have a look :).
