Choosing parameters for large dataset of short texts #2

bwang482 · 2018-03-01T16:05:24Z

Thanks for your great work, Joe!

Following the provided notebook, I have been trying to use hlda to infer topics on a large set (~100,000 docs) of short text documents with a vocabulary size of 15,000. Sampling is very slow: it took about 11 hours for 10 iterations (n_samples = 10).
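Here is a quick back-of-envelope check of those numbers. It only does arithmetic on the figures reported above and assumes wall-clock time grows roughly linearly with the number of iterations. The helper names are illustrative and not part of hlda.

```python
# Back-of-envelope timing from the run reported above.
# Assumption: cost scales roughly linearly with iterations (sweeps) and documents.

HOURS_OBSERVED = 11.0
ITERS_OBSERVED = 10       # n_samples used in the reported run
N_DOCS = 100_000


def seconds_per_iteration(hours=HOURS_OBSERVED, iters=ITERS_OBSERVED):
    """Average wall-clock seconds for one full Gibbs sweep over the corpus."""
    return hours * 3600 / iters


def ms_per_document(hours=HOURS_OBSERVED, iters=ITERS_OBSERVED, n_docs=N_DOCS):
    """Average milliseconds spent resampling one document in one sweep."""
    return seconds_per_iteration(hours, iters) * 1000 / n_docs


def projected_hours(target_iters, hours=HOURS_OBSERVED, iters=ITERS_OBSERVED):
    """Linear extrapolation of the observed rate to a different number of sweeps."""
    return seconds_per_iteration(hours, iters) * target_iters / 3600


if __name__ == "__main__":
    print(f"{seconds_per_iteration():.0f} s per iteration")
    print(f"{ms_per_document():.1f} ms per document per iteration")
    print(f"{projected_hours(100):.0f} h for 100 iterations")
```

At the observed rate this works out to about 66 minutes per sweep, or roughly 40 ms per document per sweep, so 100 sweeps would take on the order of 110 hours if the rate stays constant.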

From my results, as well as your demo, it seems that level 0 has only one topic, which contains all documents. That makes sense, since level 0 is the top of the hierarchy. But I want to confirm: if I want 4 levels of topics, each with different topic/cluster assignments, should I set num_levels = 5?
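To illustrate why level 0 always holds a single topic, here is a from-scratch simulation of the nested Chinese restaurant process (nCRP) prior that hLDA places on document paths. It does not call the hlda package. It assumes levels are indexed 0 to num_levels - 1, with level 0 being the root that every document passes through, which is consistent with what the demo shows. Under that assumption, num_levels = 5 gives four branching levels below the root. The function names `sample_ncrp_paths` and `topics_per_level` are hypothetical.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(n_docs, num_levels, gamma, seed=0):
    """Draw one root-to-leaf path per document from a nested CRP prior.

    A node is a tuple of child indices: () is the root at level 0, and a node
    at level d is a tuple of length d. Every path therefore starts at the root.
    """
    rng = random.Random(seed)
    visits = defaultdict(int)     # node -> number of documents routed through it
    children = defaultdict(list)  # node -> child nodes created so far
    paths = []
    for _ in range(n_docs):
        node = ()
        path = [node]
        for _level in range(1, num_levels):
            kids = children[node]
            # CRP step: existing child k has weight visits[k]; a new child has weight gamma
            weights = [visits[k] for k in kids] + [gamma]
            pick = rng.choices(range(len(weights)), weights=weights)[0]
            if pick == len(kids):
                child = node + (len(kids),)
                kids.append(child)
            else:
                child = kids[pick]
            visits[child] += 1
            path.append(child)
            node = child
        paths.append(path)
    return paths


def topics_per_level(paths, num_levels):
    """Number of distinct tree nodes (topics) used at each level."""
    return [len({path[d] for path in paths}) for d in range(num_levels)]


if __name__ == "__main__":
    paths = sample_ncrp_paths(n_docs=1000, num_levels=5, gamma=1.0)
    # Level 0 is always 1 (the shared root); levels 1-4 branch out below it.
    print(topics_per_level(paths, num_levels=5))
```

Because every path starts at `()`, the first count is always 1. The deeper counts can never decrease, since each node has exactly one parent.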

Finally, how should I choose values for alpha and gamma (or is there any intuition I can use), especially when inferring topics on a large set of short text documents?
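For some intuition, here is a small sketch of how the two hyperparameters act under the prior. It makes two assumptions. First, gamma is the nCRP concentration that controls how readily a document opens a new branch. Second, alpha acts as a symmetric Dirichlet prior over the levels of each document's path. Check the sampler's docstrings to confirm how the package actually uses them. The helpers `expected_children` and `level_proportions` are hypothetical and do not recommend any particular values.

```python
def expected_children(n_docs, gamma):
    """Expected number of distinct children of one node after n_docs documents
    pass through it, under a CRP prior with concentration gamma.

    The i-th document (0-based) opens a new child with probability gamma / (gamma + i).
    """
    return sum(gamma / (gamma + i) for i in range(n_docs))


def level_proportions(level_counts, alpha):
    """Posterior-mean level proportions for one document, assuming a symmetric
    Dirichlet(alpha) prior over its levels. level_counts[d] = words at level d.
    """
    total = sum(level_counts) + alpha * len(level_counts)
    return [(c + alpha) / total for c in level_counts]


if __name__ == "__main__":
    # gamma: under the prior, a larger gamma means more branches per node.
    for gamma in (0.1, 1.0, 10.0):
        print(gamma, round(expected_children(100_000, gamma), 1))

    # alpha: a short text with 5 words, all assigned to level 2 of a 5-level path.
    short_doc = [0, 0, 5, 0, 0]
    for alpha in (0.1, 10.0):
        print(alpha, [round(p, 2) for p in level_proportions(short_doc, alpha)])
```

With only 5 words, the prior matters a lot. With alpha = 10 the level-2 share is pulled down to about 0.27 (15/55). With alpha = 0.1 it stays near 0.93 (5.1/5.5). Short texts therefore give the word counts little room to override a large alpha.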

Thanks again.
