Question about quantizing term weights #982

Answered by MXueguang
kwang2049 asked this question in Q&A
Hi @kwang2049,
Take uniCOIL as an example.
The term weights generated by the model are floats, usually in the range 0-5.
Py/Anserini only accepts integer weights for the corpus, so we quantize the floats in the range 0-5 to integers in the range 0-255.
If the term weights are indexed without quantization, the floating-point values are rounded directly to integers.
Integers in 0-5 lose a lot of information when describing a document, compared with integers in 0-255.
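
To make the difference concrete, here is a minimal sketch comparing direct rounding with 8-bit quantization. It assumes a plain linear mapping from 0-5 onto 0-255; the `quantize` helper, its parameters, and the sample weights are hypothetical illustrations, not the exact scheme used by Py/Anserini or uniCOIL.

```python
def quantize(weight, max_weight=5.0, bits=8):
    """Linearly map a float in [0, max_weight] to an integer in [0, 2**bits - 1].

    Hypothetical helper: the linear scaling and the clipping are assumptions
    made for illustration only.
    """
    levels = (1 << bits) - 1                         # 255 for 8 bits
    clipped = min(max(weight, 0.0), max_weight)      # keep the weight in range
    return int(round(clipped / max_weight * levels))


# Made-up term weights for a single document.
weights = {"quantize": 1.2, "term": 1.4, "weight": 3.7}

# Indexing without quantization: floats are rounded straight to integers.
rounded = {term: int(round(w)) for term, w in weights.items()}

# Indexing with quantization: floats are spread over 0-255 first.
quantized = {term: quantize(w) for term, w in weights.items()}

print(rounded)    # {'quantize': 1, 'term': 1, 'weight': 4}
print(quantized)  # {'quantize': 61, 'term': 71, 'weight': 189}
```

With direct rounding, the weights 1.2 and 1.4 collapse to the same integer, so the index can no longer tell the two terms apart. After quantization they stay distinct (61 vs. 71).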

Answer selected by kwang2049

This discussion was converted from issue #981 on February 03, 2022 00:06.