Open Questions
Are we using the streaming or REST endpoints to collect data from Twitter, or can we use TweeQL?
It depends on the results of the spike (which should be a small one).
How do we store the tweets?
Probably in a NoSQL database; we can discuss this as well.
How do we filter noise (e.g., tweets that are advertisements) out of the tweet stream?
See http://www.quora.com/How-do-you-filter-out-the-noise-on-Twitter-lists
We can ask the professor for suggestions once we have gathered enough references.
Which location reference are we relying on?
Yago or GeoNames.
GeoNames provides a REST API for retrieving location suggestions. Because it is exposed over REST, it can be used from any programming language.
Yago, on the other hand, is an ontology knowledge base containing relationship information about almost anything available on the web. It can be downloaded and used from Python.
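If GeoNames is chosen, a lookup could look roughly like the sketch below. It assumes the public `searchJSON` full-text search endpoint at `http://api.geonames.org/searchJSON` with its `q`, `maxRows` and `username` parameters; GeoNames requires a registered account name, and `demo_user` is only a placeholder. The helper names `build_search_url`, `parse_suggestions` and `search_places` are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# GeoNames full-text search endpoint (JSON variant).
GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(place_name, username, max_rows=5):
    """Build a GeoNames search URL for a candidate place name."""
    params = {
        "q": place_name,       # free-text place name extracted from a tweet
        "maxRows": max_rows,   # cap on the number of suggestions returned
        "username": username,  # registered GeoNames account name (placeholder)
    }
    return GEONAMES_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_suggestions(payload):
    """Convert a decoded searchJSON response into (name, country, lat, lng) tuples."""
    # GeoNames returns lat/lng as strings, so convert them to floats.
    return [
        (g.get("name"), g.get("countryName"), float(g["lat"]), float(g["lng"]))
        for g in payload.get("geonames", [])
    ]


def search_places(place_name, username, max_rows=5):
    """Query GeoNames over HTTP and return parsed location suggestions."""
    url = build_search_url(place_name, username, max_rows)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_suggestions(json.load(resp))


if __name__ == "__main__":
    print(build_search_url("San Francisco", "demo_user"))
```

Keeping URL construction and response parsing separate from the HTTP call lets us test them offline and cache responses, which may matter if we hit the request limits of a free GeoNames account.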
Does the project support multilingual tweets (i18n or a translation API)?
No.
Do we consider friends' locations when predicting?
Yes, friends' home locations.
What data (some clean data, or tweets from users who have enabled geolocation tracking) are we going to use for comparison?
A standard dataset with geolocation information. We need to look for sources.
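Once such a dataset is found, one way to compare predictions against the ground-truth coordinates is the great-circle (haversine) distance, reporting the fraction of users predicted within some error threshold. This is an assumption about how we might score results, not a decided metric; the threshold value, the `(lat, lon)` tuple layout and the function names are hypothetical, and users without a prediction are counted as misses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(predictions, ground_truth, threshold_km):
    """Fraction of ground-truth users whose prediction lies within threshold_km.

    Both arguments map user_id -> (lat, lon). Users with no prediction count as misses.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for user, (true_lat, true_lon) in ground_truth.items():
        if user in predictions:
            pred_lat, pred_lon = predictions[user]
            if haversine_km(pred_lat, pred_lon, true_lat, true_lon) <= threshold_km:
                hits += 1
    return hits / len(ground_truth)
```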
Are we considering fuzzy matching as well?
There are out-of-the-box fuzzy matching libraries available in Python. We can try them out by trial and error and tweak them if necessary.
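As one example of an out-of-the-box option, Python's standard-library `difflib.get_close_matches` ranks candidates by `SequenceMatcher` similarity. The sketch below matches a possibly misspelled token against a small hypothetical gazetteer; the `cutoff` of 0.8 is an arbitrary starting value to tune by trial and error, and other fuzzy matching libraries may score differently.

```python
import difflib


def fuzzy_match_location(token, gazetteer, cutoff=0.8):
    """Return the gazetteer name most similar to token, or None if none reaches cutoff."""
    # Compare case-insensitively but return the gazetteer's original spelling.
    by_lower = {name.lower(): name for name in gazetteer}
    matches = difflib.get_close_matches(token.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


if __name__ == "__main__":
    places = ["Chicago", "Boston", "San Francisco"]  # hypothetical gazetteer
    print(fuzzy_match_location("chicgo", places))
```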
Are we considering n-gram matching?
Yes, we are thinking of using n-gram matching.
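A minimal sketch of word n-gram matching, assuming the location dictionary is a flat list of names: every contiguous run of 1 to `max_n` words in a tweet is looked up in a lowercase index of the dictionary, so multi-word names such as "New York City" can be found. The tokenizer regex, `max_n=3` and the function names are illustrative assumptions.

```python
import re


def word_ngrams(text, max_n=3):
    """Return lowercased word n-grams of text, longest first (max_n down to 1)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def match_ngrams(text, gazetteer, max_n=3):
    """Return gazetteer names that occur as a word n-gram of text, longest first."""
    index = {name.lower(): name for name in gazetteer}
    found = []
    for gram in word_ngrams(text, max_n):
        name = index.get(gram)
        if name is not None and name not in found:
            found.append(name)
    return found
```

Generating longer n-grams first puts the most specific match at the front of the result, which is one simple way to prefer "New York City" over "York"; a ranking step would still be needed to resolve ambiguous names.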
Are we restricting predictions to the city/state level, or do we also predict points of interest (POIs)?
POIs.
Are we providing a web interface to make settings configurable, and how are we going to display the results (for comparison)?
Yes; we are thinking of visualizing the results with D3 if time permits.
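If we go with D3, one convenient hand-off from the Python side is a GeoJSON `FeatureCollection`, which D3's geographic path generator (`d3.geoPath`) can render. The sketch below assumes predictions are held as a dict of `user_id -> (lat, lon, label)`; that layout and the function names are hypothetical. Note that GeoJSON orders coordinates as `[longitude, latitude]`.

```python
import json


def predictions_to_geojson(predictions):
    """Build a GeoJSON FeatureCollection from {user_id: (lat, lon, label)}."""
    features = []
    for user_id, (lat, lon, label) in predictions.items():
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], not [lat, lon].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"user": user_id, "label": label},
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(predictions, path):
    """Write predictions to a .geojson file for the D3 front end to load."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(predictions_to_geojson(predictions), fh)
```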
How are we leveraging hashtags?
Whether we use them or not is still open; it depends.
What are the ranking techniques that we are proposing?
We need to use the reference paper published in ACNOM_2014.
We are not considering tweets posted while users are traveling.
We are not considering multilingual tweets.
For training purposes, we will stick to standard datasets without noise.
We are considering prediction of points of interest (POIs).
Final results will be presented through a web interface with maps.
To predict location, our algorithm should consider friends' locations and the user's previous tweets.
Our proposal will likely depend mainly on finding users' social networks and assigning weights to friends and friendships. The weighting algorithms can be discussed among ourselves and validated with the professor.
We use a predefined dictionary of locations such as Yago, apply fuzzy matching to match tweets against the dictionary, and use n-gram matching to compare the dictionary with the tweets. We present the results using some form of visualization.
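The friend-weighting idea above can be sketched as a weighted vote over friends' home locations: each friend adds its weight to the score of its home location, and the highest-scoring location wins. The weighting scheme is still to be decided with the professor, so `weight_fn` is left pluggable; `mutual_weight` (favoring mutual friendships) and the friend dict layout are purely hypothetical placeholders.

```python
from collections import defaultdict


def predict_home_location(friends, weight_fn=lambda friend: 1.0):
    """Weighted vote: return the home location with the largest total friend weight.

    friends is an iterable of dicts with a "home" key; friends without one are skipped.
    Returns None when no friend has a known home location.
    """
    scores = defaultdict(float)
    for friend in friends:
        home = friend.get("home")
        if home is None:
            continue
        scores[home] += weight_fn(friend)
    if not scores:
        return None
    # max() keeps the first location seen among equal scores.
    return max(scores, key=scores.get)


def mutual_weight(friend):
    """Hypothetical weighting: mutual friendships count three times as much."""
    return 3.0 if friend.get("mutual") else 1.0
```

Ties are broken by the order in which locations are first seen; the final algorithm would also need to combine this score with evidence from the user's own tweets.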
SPOT
GLITTER
Ramachandran paper
Cheng paper (latest)