
Add ESB ID capture #281

Open: wants to merge 4 commits into master
Conversation

blakefinney

Copied the GSIS_ID function to pick up the ESB ID, which is used for injuries and player headshots.
@blakefinney blakefinney mentioned this pull request Jan 6, 2017
Moved the function out, named parameters profile_url
@@ -181,6 +193,7 @@ def meta_from_soup_row(team, soup_row):
return {
'team': team,
'profile_id': profile_id_from_url(profile_url),
'esb_id': esb,
BurntSushi (Owner)

I don't think this is sufficient. I think you also need to add it as a member on the Player object?

Do you have code that works with this patch?
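What "adding it as a member on the Player object" could look like, as a purely hypothetical sketch. The real nflgame Player class reads many more fields, and the names and IDs below are illustrative only:

```python
# Hypothetical sketch only: the real nflgame Player class differs.
# It illustrates the shape of the requested change: if the scraped
# esb_id is written to players.json but never read back here, it is
# silently dropped when Player objects are built.
class Player(object):
    def __init__(self, data):
        self.team = data['team']
        self.profile_id = data['profile_id']
        # The new member under discussion; .get() tolerates older
        # players.json files that predate the esb_id field.
        self.esb_id = data.get('esb_id')

p = Player({'team': 'NE', 'profile_id': 1234567, 'esb_id': 'ABC000000'})
print(p.esb_id)  # ABC000000
```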

blakefinney (Author)

I initially thought I had it working in my code when it was fetching correctly.

Then I used: game.players.playerid(pid)

which fetched the player data from a particular game but was omitting the esb_id.

Working on it now.

@ochawkeye (Contributor)

Not trying to prematurely optimize this, but when we're dealing with thousands of players it's probably prudent to be aware that you'll end up hitting every profile_url twice: once to collect the gsis_id and again to collect the esb_id. Wouldn't it be gentler on NFL.com to just grab both IDs in one pass?
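The one-pass idea above could be sketched roughly like this. The "GSIS ID"/"ESB ID" markers are assumptions about what the profile page embeds, not verified markup, and the function name is invented for illustration:

```python
import re

# Sketch of fetching the profile page once and pulling both IDs from
# the same HTML. The field labels below mirror what NFL.com profile
# pages reportedly embedded at the time; treat them as assumptions.
GSIS_RE = re.compile(r'GSIS\s+ID:\s*([0-9-]+)')
ESB_RE = re.compile(r'ESB\s+ID:\s*([A-Z0-9]+)')

def ids_from_profile(html):
    """Return (gsis_id, esb_id) from one profile page's HTML,
    with None for whichever ID is missing."""
    gsis = GSIS_RE.search(html)
    esb = ESB_RE.search(html)
    return (gsis.group(1) if gsis else None,
            esb.group(1) if esb else None)

# One HTTP request would feed both captures (dummy page content):
page = "<!-- ESB ID: ADA218591 --> <!-- GSIS ID: 00-0023459 -->"
print(ids_from_profile(page))  # ('00-0023459', 'ADA218591')
```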

@blakefinney (Author)

@ochawkeye Probably best to. In hindsight, after running the update a few times I noticed it slowed significantly. Should I update the gsis_id function to fetch both IDs and feed them in that way?

@blakefinney (Author)

Improving efficiency. Make sure nflgame Player object has the "esb_id" passed through

@blakefinney blakefinney closed this Jan 6, 2017
@BurntSushi (Owner)

@ochawkeye Good catch, I missed that. Yeah, the update script is specifically designed around limiting HTTP traffic.

@blakefinney (Author)

@ochawkeye @BurntSushi I've been trying to see when the gsis_id function (here: https://github.com/blakefinney/nflgame/blob/5af4de168872493fbd315d92c3adda0ff8e9ba45/nflgame/update_players.py#L116 ) is called.

But I think it only gets called when a player isn't in nflgame yet (correct me if I'm wrong).

So it seems like the best approach would be to only include the ESB capture on a full scan? But I can't run that because I'm getting:

    File "C:/Python27/Lib/site-packages/nflgame/update_players.py", line 365, in run
      for _, schedule in nflgame.sched.games.itervalues():
    ValueError: too many values to unpack

What do you recommend?

@ochawkeye (Contributor)

A previously submitted pull request #224 and a current pull request #266 attempted to address that issue. You can implement something like those as a workaround.

@blakefinney (Author)

blakefinney commented Jan 6, 2017

@ochawkeye It's not the same issue (multiple commas in names); I've already patched that myself.

I believe it's a different one: the same as in issue #202.

@ochawkeye (Contributor)

Oh...that one. The old .itervalues() to .iteritems() one.

You can either do this:

for schedule in nflgame.sched.games.itervalues():
    # If the game is too far in the future, skip it...

or this:

for _, schedule in nflgame.sched.games.iteritems():
    # If the game is too far in the future, skip it...
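A self-contained illustration of the bug and both fixes. This uses Python 3 names (.values()/.items(); the Python 2 script used .itervalues()/.iteritems()), and the schedule data is dummy:

```python
# Dummy stand-in for nflgame.sched.games: keys are game ids, values
# are schedule dicts with more than two keys each.
games = {
    '2016090800': {'home': 'DEN', 'away': 'CAR', 'week': 1},
    '2016091100': {'home': 'ATL', 'away': 'TB', 'week': 1},
}

# .values() yields one schedule dict per step; unpacking a dict with
# more than two keys into two names iterates its keys and raises the
# reported error:
try:
    for _, schedule in games.values():
        pass
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)

# Fix 1: drop the extra name.
for schedule in games.values():
    assert 'home' in schedule

# Fix 2: iterate (key, value) pairs instead.
for _, schedule in games.items():
    assert 'home' in schedule
```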

@blakefinney (Author)

How about this, guys? It still works in a similar way to the original PR I submitted, but when reading a row for a rostered player it checks whether it's a full_scan. If it is, it goes ahead and scrapes the Player Profile page; if not, it skips it.

It'll also be getting the ESB ID at the same time as the GSIS ID when scraping a completely new player, so it shouldn't ever request the same profile page twice.

Obviously this significantly increases the runtime of the --full-scan players update, but hopefully doing that once will get every player up to date with their ESB ID.
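The flow described above might be sketched like this. The fetch is stubbed so the example is self-contained; all names are illustrative, not nflgame's actual code:

```python
# Record every simulated HTTP request so we can check none repeat.
FETCHES = []

def fetch_ids(profile_url):
    FETCHES.append(profile_url)          # one simulated HTTP request
    return ('00-0000000', 'AAA000000')   # dummy (gsis_id, esb_id)

def meta_for_row(team, profile_url, full_scan=False):
    meta = {'team': team, 'profile_url': profile_url}
    if full_scan:
        # Only a --full-scan pays the per-player request, and that one
        # request yields both IDs, so no page is fetched twice.
        meta['gsis_id'], meta['esb_id'] = fetch_ids(profile_url)
    return meta

meta_for_row('NE', 'http://example.com/p1')                  # no fetch
meta_for_row('NE', 'http://example.com/p1', full_scan=True)  # one fetch
print(len(FETCHES))  # 1
```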

@blakefinney blakefinney reopened this Jan 7, 2017
@ochawkeye (Contributor)

@blakefinney Would you mind tossing your players.json into this PR as well? Including it would save everyone having to full_scan to bring themselves up to latest(-ish).

@blakefinney (Author)

@ochawkeye I've actually been working on doing it slightly differently. One of the other features I was curious about was a player's injury status leading up to a game.

I've been working on scraping injury status from http://www.nfl.com/injuries, and that could include the ESB ID capture. On that page they list players in the HTML, but Beautiful Soup doesn't pick it up. However, they have a script tag that contains a JSON object of the injuries, which I've successfully scraped and then assigned to players via the esb_id.

I just want to test it again with this week's injury report, and may have a different PR up.

Let me know if it's worth doing. Maybe you as well, @BurntSushi.
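The script-tag approach described above can be sketched roughly like this. The variable name injuryData and the JSON shape are invented for illustration and do not reflect the real page's markup:

```python
import json
import re

# Pull a JSON payload out of an inline <script> tag, as described for
# http://www.nfl.com/injuries. The pattern assumes the page assigns
# the object to a JavaScript variable; both the name and the keys
# below are made-up placeholders.
SCRIPT_JSON_RE = re.compile(r'var\s+injuryData\s*=\s*(\{.*?\})\s*;', re.S)

def injuries_from_html(html):
    m = SCRIPT_JSON_RE.search(html)
    if m is None:
        return {}
    return json.loads(m.group(1))

html = '''
<script type="text/javascript">
  var injuryData = {"players": [{"esbId": "ADA218591", "status": "Questionable"}]};
</script>
'''
data = injuries_from_html(html)
# Each record could then be matched to a Player via its esb_id.
print(data["players"][0]["esbId"])  # ADA218591
```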
