change post date-time timezone #416

Prabesh01 · 2021-07-31T15:12:41Z

Lame question, but how do I change the timezone used for the post timestamp returned by post["time"]?

If I use it as is, the post time shows tomorrow's date lol

kevinzg · 2021-08-01T00:44:46Z

You can do

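import pytz

# Treats the naive datetime as UTC (an assumption; check it first), then converts it to the target zone
post['time'] = post['time'].replace(tzinfo=pytz.utc).astimezone(pytz.timezone('America/Lima'))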

for example.

But I'm not completely sure that all the dates the scraper extracts are in UTC, so you might want to double-check that.

neon-ninja · 2021-08-01T21:17:18Z

The timestamp is local time, based on the timezone of your system. So, check the timezone set on your system.
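If the value really is a naive datetime in the system's local timezone, as described above, it can be made timezone-aware with the standard library alone. This is a minimal sketch under that assumption; the sample datetime and the fixed UTC-5 target offset are illustrative stand-ins, not values from the scraper.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for post["time"]: a naive datetime assumed to be in system local time.
naive_local = datetime(2021, 8, 1, 0, 44, 46)

# Calling astimezone() on a naive datetime treats it as system local time
# (Python 3.6+), so no manual tzinfo replacement is needed.
aware_utc = naive_local.astimezone(timezone.utc)

# Convert the same instant to a fixed UTC-5 offset (illustrative target zone).
target = timezone(timedelta(hours=-5))
in_target = aware_utc.astimezone(target)

print(aware_utc.isoformat())
print(in_target.isoformat())
```

The printed values depend on the timezone of the machine running the code, which is exactly the behaviour described above.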

Prabesh01 · 2021-08-02T00:29:26Z

Thank you :)

TowardMyth · 2021-08-09T18:42:06Z

@neon-ninja Sorry to reopen a closed issue. When pulling posts, is there any way for Facebook Scraper to get the post timestamp's timezone when scraping FB's site?

Issue: I'm assuming that the facebook scraper's timestamps are pulled directly from the FB post, and the timestamps on the FB post are based on the user's IP address. If this is true, then using proxies would mean that the timestamps may not match your system's timezone.

neon-ninja · 2021-08-10T00:03:05Z

@TowardMyth depending on how you're scraping, you might get the post time in a slightly different way. Unauthenticated requests can actually receive a UNIX timestamp. In that case, a debug log message "Got exact timestamp from publish_time" would be printed. See #273 for example. The UNIX timestamp is defined as the number of seconds since 00:00:00 UTC on 1 January 1970. So it is the same regardless of timezone. The timestamp served by https://m.facebook.com/story.php?story_fbid=306426417518211&id=100044525645708&_rdr is 1619377200, which equates to Sun Apr 25 2021 19:00:00 GMT+0000. The scraper uses datetime.fromtimestamp(timestamp) which will convert this to local time.
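The two interpretations of that timestamp can be compared with a short sketch. Passing tz=timezone.utc is shown here only for illustration; according to the comment above, the scraper itself calls datetime.fromtimestamp(timestamp) without it.

```python
from datetime import datetime, timezone

ts = 1619377200  # UNIX timestamp quoted above

# As described above: a naive datetime expressed in the system's local time.
local_naive = datetime.fromtimestamp(ts)

# Timezone-aware alternative: the same instant expressed in UTC.
utc_aware = datetime.fromtimestamp(ts, tz=timezone.utc)

print(local_naive)  # varies with the system timezone
print(utc_aware)    # 2021-04-25 19:00:00+00:00
```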

TowardMyth · 2021-08-10T02:54:10Z

@neon-ninja Do authenticated requests also receive a unix timestamp? If not, do they at least receive a timezone-aware datetime from FB (or just a naive, non-timezone-aware one)?

neon-ninja · 2021-08-10T03:43:51Z

Sometimes. I tested by adding a field to indicate whether a timestamp was exact or not, and the following code:

for post in get_posts("Nintendo", cookies="cookies.txt", options={"allow_extra_requests": False}):
    print(post["post_id"], post["time"], post.get("time_exact"))

And these were the results (GMT+12):

4377850938965992 2021-08-07 10:09:16 True
4377380082346411 2021-08-07 06:12:37 True
4377121662372253 2021-08-07 04:23:11 True
4371727619578324 2021-08-05 09:33:45 True
4365624383521981 2021-08-03 08:09:04 True
4365050470246039 2021-08-03 04:00:19 True
4329977513753335 2021-07-21 15:51:00 None
4314796888604731 2021-07-16 10:00:00 None
4314503055300781 2021-07-16 07:54:00 None
4311927432225010 2021-07-15 11:00:00 None
4306850432732710 2021-07-15 05:00:05 True
4294380853979668 2021-07-10 04:37:24 True
4291314330952987 2021-07-09 04:27:44 True
4290937374324016 2021-07-09 02:02:31 True
4288189624598791 2021-07-08 03:30:03 True
4284852408265846 2021-07-07 01:09:41 True
4279695485448205 2021-07-05 05:00:01 True
4274331009317986 2021-07-03 10:00:03 True
4273772932707127 2021-07-02 08:34:00 None
4270449279706159 2021-07-01 06:08:00 None
4268327196585034 2021-06-30 11:30:00 None
4268132489937838 2021-06-30 10:03:00 None
4253713764713044 2021-06-25 09:11:00 None
4244168652334222 2021-06-22 04:01:00 None
4239326416151779 2021-06-20 09:00:00 None
4230186287065792 2021-06-17 01:20:00 None
4226613690756385 2021-06-15 18:00:00 None
4226242920793462 2021-06-15 14:25:00 None
4225498680867886 2021-06-15 08:30:00 None
4222851827799238 2021-06-14 09:09:00 None
4217919734959114 2021-06-13 09:37:33 True
4217774244973663 2021-06-13 08:17:58 True
4214518608632560 2021-06-12 05:11:49 True
4214033132014441 2021-06-12 02:00:01 True
4194934713924283 2021-06-04 17:33:00 None
4193752747375813 2021-06-04 08:11:00 None
4191173794300375 2021-06-03 11:00:00 None
4188174317933656 2021-06-02 12:01:00 None

In the absence of a UNIX timestamp, the scraper tries to make a guess by parsing text similar to the examples given in https://github.com/kevinzg/facebook-scraper/blob/master/tests/test_parse_date.py. Note the lack of seconds in that case.

Tweak the code a bit to:

for post in get_posts("Nintendo", pages=2, cookies="cookies.txt", options={"allow_extra_requests": False, "posts_per_page": 50}):
    print(post["post_id"], post["time"], post.get("time_exact"))

and then you get:

4377850938965992 2021-08-07 10:09:16 True
4377380082346411 2021-08-07 06:12:37 True
4377121662372253 2021-08-07 04:23:11 True
4371727619578324 2021-08-05 09:33:45 True
4365624383521981 2021-08-03 08:09:04 True
4365050470246039 2021-08-03 04:00:19 True
4329977513753335 2021-07-22 10:51:17 True
4314796888604731 2021-07-17 05:00:02 True
4314503055300781 2021-07-17 02:54:30 True
4311927432225010 2021-07-16 06:00:03 True
4306850432732710 2021-07-15 05:00:05 True
4294380853979668 2021-07-10 04:37:24 True
4291314330952987 2021-07-09 04:27:44 True
4290937374324016 2021-07-09 02:02:31 True
4288189624598791 2021-07-08 03:30:03 True
4284852408265846 2021-07-07 01:09:41 True
4279695485448205 2021-07-05 05:00:01 True
4274331009317986 2021-07-03 10:00:03 True
4273772932707127 2021-07-03 03:34:57 True
4270449279706159 2021-07-02 01:08:47 True
4268327196585034 2021-07-01 06:30:03 True
4268132489937838 2021-07-01 05:03:44 True
4253713764713044 2021-06-26 04:11:55 True
4244168652334222 2021-06-22 23:01:20 True
4239326416151779 2021-06-21 04:00:20 True
4230186287065792 2021-06-17 20:20:18 True
4226613690756385 2021-06-16 13:00:00 True
4226242920793462 2021-06-16 09:25:25 True
4225498680867886 2021-06-16 03:30:00 True
4222851827799238 2021-06-15 04:09:22 True
4217919734959114 2021-06-13 09:37:33 True
4217774244973663 2021-06-13 08:17:58 True
4214518608632560 2021-06-12 05:11:49 True
4214033132014441 2021-06-12 02:00:01 True
4194934713924283 2021-06-05 12:33:32 True
4193752747375813 2021-06-05 03:11:22 True
4191173794300375 2021-06-04 06:00:35 True
4188174317933656 2021-06-03 07:01:46 True
4188053201279101 2021-06-03 06:00:31 True
4187481031336318 2021-06-03 02:00:01 True
4168993589851729 2021-05-28 01:05:00 True
4151037621647326 2021-05-22 05:29:47 True
4148574141893674 2021-05-21 10:31:31 True
4144965658921189 2021-05-20 04:30:08 True
4141878672563221 2021-05-19 05:00:02 True
4128969113854177 2021-05-15 10:00:01 True
4126036390814116 2021-05-14 08:00:00 True
4112996902118065 2021-05-10 05:00:01 True
4106669112750844 2021-05-08 10:00:38 True
4088623351222087 2021-05-02 05:04:28 True
4085738151510607 2021-05-01 05:20:30 True
4085430424874713 2021-05-01 03:37:20 True

So maybe it's determined by whether the page includes a certain type of post
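Comparing the two runs for posts that changed from None to True shows a steady gap between the guessed and exact times. A quick check on three post IDs copied from the listings above (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# post_id -> (guessed time from the first run, exact time from the second run)
pairs = {
    "4329977513753335": ("2021-07-21 15:51:00", "2021-07-22 10:51:17"),
    "4314796888604731": ("2021-07-16 10:00:00", "2021-07-17 05:00:02"),
    "4273772932707127": ("2021-07-02 08:34:00", "2021-07-03 03:34:57"),
}

gaps = {}
for post_id, (guessed, exact) in pairs.items():
    delta = datetime.strptime(exact, FMT) - datetime.strptime(guessed, FMT)
    # Round to whole hours; the guessed values have no seconds component.
    gaps[post_id] = round(delta.total_seconds() / 3600)
    print(post_id, delta, gaps[post_id])
```

Each gap comes out at about 19 hours, so the guessed times do not appear to be in the same GMT+12 local zone as the exact ones. That would be consistent with the text date being written for a UTC-7 zone, but nothing in this thread confirms the cause.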

TowardMyth · 2021-08-10T05:33:48Z

@neon-ninja thanks for helping to debug. I've spent the past few hours playing around with this too, and like you, I found that sometimes, FB will return a unix timestamp whereas other times, it won't.

Is there any way to force the scraper to require a unix timestamp for every post, retrying the post until it gets one?
Do you have any further insight into when unix timestamps will be returned by the scraper and when they won't? (From my experimentation it seems random, but you may know more.)

neon-ninja · 2021-08-10T05:42:45Z

In my test above, increasing the posts_per_page improved the chance of getting unix timestamps. Do you not get the same? Which page/profile/group doesn't serve unix timestamps for you?

TowardMyth · 2021-08-11T03:08:04Z

Here's my command:

for post in get_posts("nintendo", pages=2, options={"allow_extra_requests": False, "posts_per_page": 10}):
    print(post["post_id"], post["time"], post.get("time_exact"))

Results:

sys:1: UserWarning: A low page limit (<=2) might return no results, try increasing the limit
4388957874521965 2021-08-10 09:00:01 None
4377850938965992 2021-08-06 18:09:16 None
4377380082346411 2021-08-06 14:12:37 None
4377121662372253 2021-08-06 12:23:11 None
4371727619578324 2021-08-04 17:33:45 None
4365624383521981 2021-08-02 16:09:04 None
4365050470246039 2021-08-02 12:00:19 None
4329977513753335 2021-07-21 18:51:17 None
4314796888604731 2021-07-16 13:00:02 None
4314503055300781 2021-07-16 10:54:30 None
4311927432225010 2021-07-15 14:00:03 None
4306850432732710 2021-07-14 13:00:05 None

Interestingly, even when unix timestamp is not being returned, I get the seconds.

I've tried different combinations: with and without cookies; setting posts_per_page to 10/50/other numbers; pages=2/50/100, all to no avail: none of the timestamps are unix timestamps.

neon-ninja · 2021-08-11T03:15:35Z

time_exact isn't part of the library, it was a one-off modification I did locally to test with. If you're getting seconds, you're getting UNIX timestamps.

TowardMyth · 2021-08-11T03:16:02Z

Further to my above comment: I am getting an error running the code I pasted above. Not sure if related?

sys:1: UserWarning: A low page limit (<=2) might return no results, try increasing the limit
4388957874521965 2021-08-10 09:00:01 None
4377850938965992 2021-08-06 18:09:16 None
/home/xxx/.local/lib/python3.8/site-packages/facebook_scraper/facebook_scraper.py:440: UserWarning: Facebook served mbasic/noscript content unexpectedly on https://m.facebook.com/page_content_list_view/more/?page_id=119240841493711&start_cursor=%7B%22timeline_cursor%22:%22AQHRnfQjPoQy0di1sKI1zb49Fl2ip0D6sejyMvw4tMHgKUbDTIojIHkctHtInzTppBj46e3Sf0miBLIpc0p3oS368acKad9gdeTyj8r8jYioN0czrqwnrJ7zQXKR8T8LfMrW%22,%22timeline_section_cursor%22:null,%22has_next_page%22:true%7D&num_to_fetch=10&surface_type=posts_tab
  warnings.warn(
4377380082346411 2021-08-06 14:12:37 None
4377121662372253 2021-08-06 12:23:11 None
4371727619578324 2021-08-04 17:33:45 None
4365624383521981 2021-08-02 16:09:04 None
4365050470246039 2021-08-02 12:00:19 None
4329977513753335 2021-07-21 18:51:17 None
4314796888604731 2021-07-16 13:00:02 None
4314503055300781 2021-07-16 10:54:30 None
4311927432225010 2021-07-15 14:00:03 None
4306850432732710 2021-07-14 13:00:05 None

neon-ninja · 2021-08-11T03:16:55Z

That's just a warning, not an error. Probably safe to ignore.

TowardMyth · 2021-08-11T03:18:28Z

Thanks.

post["time"] returns a human-readable timestamp / a datetime.datetime object, which I think is not timezone-aware. Is there a way to get the actual unix timestamp (e.g. 1628651893) so I can manipulate it more easily?

neon-ninja · 2021-08-11T03:24:04Z

Sure, d4be429 should do it

TowardMyth · 2021-08-11T03:25:33Z

Many many many thanks! One last question if you don't mind (I'm still a beginner): is this change automatically included in the pip package (i.e. can I just do pip install facebook-scraper --upgrade to get it)?

neon-ninja · 2021-08-11T03:27:36Z

No. You'd need to install it from github. Like so: pip install git+https://github.com/kevinzg/facebook-scraper.git

TowardMyth · 2021-08-11T03:30:13Z

Got it to work with your pip install git command. Thank you!

TowardMyth · 2021-08-11T03:36:34Z

Will post["timestamp"] return anything if the scraper could not find a unix timestamp? As well, is there some way to easily check whether a unix timestamp was returned for a post?

neon-ninja · 2021-08-11T03:37:45Z

No, it would be None in that case. You can check if it is None.
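A sketch of that check, assuming (as stated above) that post["timestamp"] holds either an integer UNIX timestamp or None. The helper name exact_utc_time and the sample dictionaries are hypothetical and not part of the library.

```python
from datetime import datetime, timezone
from typing import Optional


def exact_utc_time(post: dict) -> Optional[datetime]:
    """Return a timezone-aware UTC datetime, or None if no UNIX timestamp was found."""
    ts = post.get("timestamp")
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Hypothetical sample posts, for illustration only.
posts = [
    {"post_id": "a", "timestamp": 1628651893},
    {"post_id": "b", "timestamp": None},
]
for post in posts:
    print(post["post_id"], exact_utc_time(post))
```

Comparing against None rather than relying on truthiness keeps a timestamp of 0 from being mistaken for a missing value.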

TowardMyth · 2021-08-11T03:41:00Z

One more q: is there any difference in what is returned if you are authenticated vs non-authenticated? I know that if you are non-authenticated, you get the unix timestamps, but if authenticated, then less likely to get unix timestamps. If I only scrape without authentication going forward, is there any content/values/etc I would miss?

neon-ninja · 2021-08-11T03:47:26Z

You're more likely to run into a LoginRequired exception if you scrape unauthenticated

TowardMyth · 2021-08-11T03:59:18Z

If I use authenticated: is there any chance that any personal data related to the authenticated cookie/FB user will be printed onto some output (ex: maybe the userid of that FB user will be in the posts array somewhere?)

neon-ninja · 2021-08-11T04:00:20Z

Probably not

Prabesh01 closed this as completed Aug 2, 2021
