The parler parser parses Parler HTML posts and user profiles. Parler post dumps can be found here.
Refer to here
import glob
from parler.parser.postParser import PostParser
from parler.dataType.post import Post

files = glob.glob('posts/*')
data = []
for file in files:
    post = PostParser(file).parse()
    if post is not None:
        data.append(post.convert())
print(data)
from parler.parser.profilePageParser import ProfilePageParser

file = r".\profile\00KimPossible00\posts\index.html"
timestamp = 20201124075219
profilePage = ProfilePageParser(file, timestamp)
user, posts = profilePage.parse()
print(user.convert())
print()
for post in posts:
    print(post.convert())
    print()
You should get the same results as shown in sample_output.
- Determine what type of post we are dealing with:
  - New Post
  - Echoed Post
  - Echoed Post with Reply
  - Echoed Post with Root Echo and No Reply
  - Echoed Post with Root Echo and Reply
- If new post, parse the only post as the main post; else parse the reply post as the main post.
- If not new post, parse the echoed post.
- If echoed post or echoed post with root echo and no reply:
  - Use the "Echoed by ... " line to fill out the main post info with the user and created_at.
  - Grab username from the meta information stored in the header.
  - No profile badge can be found in the post this way.
  - The comment_count, echo_count, and upvote_count belong to the echoed post.
- Else:
  - The comment_count, echo_count, and upvote_count belong to the main post.
- If Echoed Post with Root Echo and No Reply or Echoed Post with Root Echo and Reply:
  - Parse the first post for the root echo.
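
The branching above can be summarized as a small decision table. The following is only a sketch of that logic, not the parser's actual code: `PostKind`, `parse_plan`, and the dictionary keys are hypothetical names introduced for illustration. How the post type is detected from the HTML is not covered here, and treating the "Echoed by ... " line as the source of the main post for the no-reply echo types is one reading of the steps.

```python
from enum import Enum


class PostKind(Enum):
    """The five post types listed above (member names are hypothetical)."""
    NEW = "New Post"
    ECHOED = "Echoed Post"
    ECHOED_WITH_REPLY = "Echoed Post with Reply"
    ECHOED_ROOT_NO_REPLY = "Echoed Post with Root Echo and No Reply"
    ECHOED_ROOT_REPLY = "Echoed Post with Root Echo and Reply"


# Echo types without a reply: main post info comes from the "Echoed by ..." line.
NO_REPLY_ECHOES = {PostKind.ECHOED, PostKind.ECHOED_ROOT_NO_REPLY}
# Echo types whose first post is the root echo.
ROOT_ECHOES = {PostKind.ECHOED_ROOT_NO_REPLY, PostKind.ECHOED_ROOT_REPLY}


def parse_plan(kind):
    """Return which parts to parse and which post owns the counters."""
    if kind is PostKind.NEW:
        main_source = "only post"
    elif kind in NO_REPLY_ECHOES:
        main_source = "echoed-by line"
    else:
        main_source = "reply post"
    return {
        "main_source": main_source,
        "parse_echoed": kind is not PostKind.NEW,
        "parse_root_echo": kind in ROOT_ECHOES,
        # comment_count, echo_count, upvote_count belong to this post
        "counts_owner": "echoed" if kind in NO_REPLY_ECHOES else "main",
    }


if __name__ == "__main__":
    for kind in PostKind:
        print(kind.value, parse_plan(kind))
```

Keeping the rules in one function makes it easy to check each post type against the list: for example, an Echoed Post with Reply takes its main post from the reply and keeps the counters on the main post.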