change post date-time timezone #416
Comments
You can do, for example:
import pytz
post['time'] = post['time'].replace(tzinfo=pytz.utc).astimezone(pytz.timezone('America/Lima'))
But I'm not completely sure that all the dates the scraper extracts are in UTC, so you might want to double-check that. |
The timestamp is local time, based on the timezone of your system. So, check the timezone set on your system. |
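If the value really is naive local time, as this comment says, one way to convert it is to attach the system timezone first and only then shift it to the zone you want. A minimal sketch using only the standard library; the sample datetime stands in for post["time"], and the fixed UTC-5 offset is an assumption chosen for illustration rather than a full America/Lima zone definition:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical naive value, standing in for post["time"].
naive_local = datetime(2021, 8, 7, 10, 9, 16)

# With no argument, astimezone() treats a naive datetime as system local time
# and returns an aware datetime in that same zone.
aware_local = naive_local.astimezone()

# Shift to the target zone; a fixed UTC-5 offset is assumed here for illustration.
target = timezone(timedelta(hours=-5))
converted = aware_local.astimezone(target)

print(aware_local.isoformat(), "->", converted.isoformat())
```

Both values describe the same instant; only the displayed wall-clock time changes.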
Thank you :) |
@neon-ninja Sorry to reopen a closed issue. When pulling posts, is there any way for Facebook Scraper to get the post timestamp's timezone when scraping FB's site? Issue: I'm assuming that the facebook scraper's timestamps are pulled directly from the FB post, and the timestamps on the FB post are based on the user's IP address. If this is true, then using proxies would mean that the timestamps may not match your system's timezone. |
@TowardMyth depending on how you're scraping, you might get the post time in a slightly different way. Unauthenticated requests can actually receive a UNIX timestamp. In that case, a debug log message "Got exact timestamp from publish_time" would be printed. See #273 for example. The UNIX timestamp is defined as the number of seconds since 00:00:00 UTC on 1 January 1970. So it is the same regardless of timezone. The timestamp served by https://m.facebook.com/story.php?story_fbid=306426417518211&id=100044525645708&_rdr is 1619377200, which equates to Sun Apr 25 2021 19:00:00 GMT+0000. The scraper uses |
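The epoch arithmetic described in this comment can be checked with the standard library. A small sketch using the timestamp quoted above; the +12:00 offset is only an example zone and is not something the scraper sets:

```python
from datetime import datetime, timedelta, timezone

ts = 1619377200  # value quoted in the comment above

# A UNIX timestamp identifies one instant; only its rendering depends on the zone.
as_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
as_plus12 = as_utc.astimezone(timezone(timedelta(hours=12)))

print(as_utc.isoformat())     # 2021-04-25T19:00:00+00:00
print(as_plus12.isoformat())  # 2021-04-26T07:00:00+12:00
```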
@neon-ninja Do authenticated requests also receive a unix timestamp? If not, does it at least receive a timezone-aware datetime from FB (or just a naive non-timezone aware one)? |
Sometimes. I tested by adding a field to indicate whether a timestamp was exact or not, and the following code:
for post in get_posts("Nintendo", cookies="cookies.txt", options={"allow_extra_requests": False}):
    print(post["post_id"], post["time"], post.get("time_exact"))
And these were the results (GMT+12):
4377850938965992 2021-08-07 10:09:16 True
4377380082346411 2021-08-07 06:12:37 True
4377121662372253 2021-08-07 04:23:11 True
4371727619578324 2021-08-05 09:33:45 True
4365624383521981 2021-08-03 08:09:04 True
4365050470246039 2021-08-03 04:00:19 True
4329977513753335 2021-07-21 15:51:00 None
4314796888604731 2021-07-16 10:00:00 None
4314503055300781 2021-07-16 07:54:00 None
4311927432225010 2021-07-15 11:00:00 None
4306850432732710 2021-07-15 05:00:05 True
4294380853979668 2021-07-10 04:37:24 True
4291314330952987 2021-07-09 04:27:44 True
4290937374324016 2021-07-09 02:02:31 True
4288189624598791 2021-07-08 03:30:03 True
4284852408265846 2021-07-07 01:09:41 True
4279695485448205 2021-07-05 05:00:01 True
4274331009317986 2021-07-03 10:00:03 True
4273772932707127 2021-07-02 08:34:00 None
4270449279706159 2021-07-01 06:08:00 None
4268327196585034 2021-06-30 11:30:00 None
4268132489937838 2021-06-30 10:03:00 None
4253713764713044 2021-06-25 09:11:00 None
4244168652334222 2021-06-22 04:01:00 None
4239326416151779 2021-06-20 09:00:00 None
4230186287065792 2021-06-17 01:20:00 None
4226613690756385 2021-06-15 18:00:00 None
4226242920793462 2021-06-15 14:25:00 None
4225498680867886 2021-06-15 08:30:00 None
4222851827799238 2021-06-14 09:09:00 None
4217919734959114 2021-06-13 09:37:33 True
4217774244973663 2021-06-13 08:17:58 True
4214518608632560 2021-06-12 05:11:49 True
4214033132014441 2021-06-12 02:00:01 True
4194934713924283 2021-06-04 17:33:00 None
4193752747375813 2021-06-04 08:11:00 None
4191173794300375 2021-06-03 11:00:00 None
4188174317933656 2021-06-02 12:01:00 None
In the absence of a UNIX timestamp, the scraper tries to make a guess by parsing text similar to the examples given in https://github.com/kevinzg/facebook-scraper/blob/master/tests/test_parse_date.py. Note the lack of seconds in that case. Tweak the code a bit to:
for post in get_posts("Nintendo", pages=2, cookies="cookies.txt", options={"allow_extra_requests": False, "posts_per_page": 50}):
    print(post["post_id"], post["time"], post.get("time_exact"))
and then you get:
4377850938965992 2021-08-07 10:09:16 True
4377380082346411 2021-08-07 06:12:37 True
4377121662372253 2021-08-07 04:23:11 True
4371727619578324 2021-08-05 09:33:45 True
4365624383521981 2021-08-03 08:09:04 True
4365050470246039 2021-08-03 04:00:19 True
4329977513753335 2021-07-22 10:51:17 True
4314796888604731 2021-07-17 05:00:02 True
4314503055300781 2021-07-17 02:54:30 True
4311927432225010 2021-07-16 06:00:03 True
4306850432732710 2021-07-15 05:00:05 True
4294380853979668 2021-07-10 04:37:24 True
4291314330952987 2021-07-09 04:27:44 True
4290937374324016 2021-07-09 02:02:31 True
4288189624598791 2021-07-08 03:30:03 True
4284852408265846 2021-07-07 01:09:41 True
4279695485448205 2021-07-05 05:00:01 True
4274331009317986 2021-07-03 10:00:03 True
4273772932707127 2021-07-03 03:34:57 True
4270449279706159 2021-07-02 01:08:47 True
4268327196585034 2021-07-01 06:30:03 True
4268132489937838 2021-07-01 05:03:44 True
4253713764713044 2021-06-26 04:11:55 True
4244168652334222 2021-06-22 23:01:20 True
4239326416151779 2021-06-21 04:00:20 True
4230186287065792 2021-06-17 20:20:18 True
4226613690756385 2021-06-16 13:00:00 True
4226242920793462 2021-06-16 09:25:25 True
4225498680867886 2021-06-16 03:30:00 True
4222851827799238 2021-06-15 04:09:22 True
4217919734959114 2021-06-13 09:37:33 True
4217774244973663 2021-06-13 08:17:58 True
4214518608632560 2021-06-12 05:11:49 True
4214033132014441 2021-06-12 02:00:01 True
4194934713924283 2021-06-05 12:33:32 True
4193752747375813 2021-06-05 03:11:22 True
4191173794300375 2021-06-04 06:00:35 True
4188174317933656 2021-06-03 07:01:46 True
4188053201279101 2021-06-03 06:00:31 True
4187481031336318 2021-06-03 02:00:01 True
4168993589851729 2021-05-28 01:05:00 True
4151037621647326 2021-05-22 05:29:47 True
4148574141893674 2021-05-21 10:31:31 True
4144965658921189 2021-05-20 04:30:08 True
4141878672563221 2021-05-19 05:00:02 True
4128969113854177 2021-05-15 10:00:01 True
4126036390814116 2021-05-14 08:00:00 True
4112996902118065 2021-05-10 05:00:01 True
4106669112750844 2021-05-08 10:00:38 True
4088623351222087 2021-05-02 05:04:28 True
4085738151510607 2021-05-01 05:20:30 True
4085430424874713 2021-05-01 03:37:20 True
So maybe it's determined by whether the page includes a certain type of post |
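Comparing the two runs for the same post IDs gives a rough idea of how far the guessed times are from the exact ones. A quick sketch over three rows copied from the outputs above; it only measures the gap and does not explain where it comes from:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (post_id, guessed time from the first run, exact time from the second run)
rows = [
    ("4329977513753335", "2021-07-21 15:51:00", "2021-07-22 10:51:17"),
    ("4314796888604731", "2021-07-16 10:00:00", "2021-07-17 05:00:02"),
    ("4188174317933656", "2021-06-02 12:01:00", "2021-06-03 07:01:46"),
]

gaps = {}
for post_id, guessed, exact in rows:
    # Difference between the exact and the guessed value for the same post.
    gaps[post_id] = datetime.strptime(exact, FMT) - datetime.strptime(guessed, FMT)
    print(post_id, gaps[post_id])
```

For these rows the guessed values trail the exact ones by about 19 hours, so it is probably unwise to mix guessed and exact times without checking them first.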
@neon-ninja thanks for helping to debug. I've spent the past few hours playing around with this too, and like you, I found that sometimes, FB will return a unix timestamp whereas other times, it won't.
|
In my test above, increasing the posts_per_page improved the chance of getting unix timestamps. Do you not get the same? Which page/profile/group doesn't serve unix timestamps for you? |
Here's my command:
Results:
Interestingly, even when a UNIX timestamp is not being returned, I get the seconds. I've tried different combinations: with and without cookies; setting posts_per_page to 10/50/other numbers; pages=2/50/100. To no avail: none of the timestamps are UNIX timestamps. |
time_exact isn't part of the library, it was a one-off modification I did locally to test with. If you're getting seconds, you're getting UNIX timestamps. |
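Since time_exact is not available in the library, the seconds field is the only hint left in the returned datetime. A rough heuristic based on the observation above; looks_exact is a hypothetical helper, not part of facebook-scraper, and it will misclassify exact timestamps that happen to fall on a whole minute (for example 2021-06-16 13:00:00 in the earlier output):

```python
from datetime import datetime


def looks_exact(post_time: datetime) -> bool:
    """Heuristic: guessed times carry no seconds, so non-zero seconds suggest a UNIX timestamp."""
    return post_time.second != 0


print(looks_exact(datetime(2021, 8, 7, 10, 9, 16)))   # True
print(looks_exact(datetime(2021, 7, 21, 15, 51, 0)))  # False
```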
Further to my above comment: I am getting an error running the code I pasted above. Not sure if related?
|
That's just a warning, not an error. Probably safe to ignore. |
Thanks.
|
Sure, d4be429 should do it |
Many many many thanks! One last question if you don't mind (I'm still a beginner): is this change automatically merged into pip library (i.e. I can just do |
No. You'd need to install it from github. Like so: |
Got it to work with your |
Will |
No, it would be |
One more q: is there any difference in what is returned if you are authenticated vs. unauthenticated? I know that unauthenticated requests get UNIX timestamps, while authenticated requests are less likely to. If I only scrape without authentication going forward, is there any content/values/etc. I would miss? |
You're more likely to run into a LoginRequired exception if you scrape unauthenticated |
If I use authenticated: is there any chance that any personal data related to the authenticated cookie/FB user will be printed onto some output (ex: maybe the userid of that FB user will be in the |
Probably not |
Lame question, but how do I change the timezone used for the post's timestamp in post["time"]?
If I use it as is, the post time shows tomorrow's date lol