Let's say you have an attribute in your data that is a blob of text. Searching through that text currently requires a full scan in egraphdb. To avoid an impractical full scan, how about creating a simple inverted index from that text and making it keyword-searchable?
Potential Steps:
Tokenize
Drop common words and retain only the useful ones. For example, keep such words in another table, which egraphdb can then load into memory for quick access.
Simple spelling correction would be useful too.
Store multiple rows {keyword, sourceid} for the same source data in the index table for a particular attribute. You could then rank matches with something like "SELECT COUNT(keyword), SUM(count), sourceid FROM xyz WHERE keyword IN ('a', 'b') GROUP BY sourceid LIMIT 10000". This is just a suggestion, not a strict rule.
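The tokenize / stopword-drop / spelling-correction steps above could be sketched roughly as follows. The stopword list, vocabulary, and edit-distance-1 correction here are illustrative assumptions, not anything egraphdb ships:

```python
import re

# Illustrative placeholders; real stopword/vocabulary tables would be
# loaded from storage as suggested above.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}
VOCABULARY = {"graph", "index", "keyword", "search", "tokenize"}

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def drop_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance (one rolling row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # prev holds the diagonal cell from the previous row.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def correct(token, vocabulary=VOCABULARY, max_distance=1):
    """Very naive spelling correction: snap the token to a vocabulary
    word within a small edit distance, else leave it untouched."""
    if token in vocabulary:
        return token
    for word in vocabulary:
        if edit_distance(token, word) <= max_distance:
            return word
    return token

tokens = drop_stopwords(tokenize("Tokenize the text and serch the graph"))
tokens = [correct(t) for t in tokens]
# -> ['tokenize', 'text', 'search', 'graph']
```

A production version would want a smarter corrector (transpositions, frequency-weighted candidates), but this shows the shape of the pipeline.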
sample table:
CREATE TABLE `egraph_lookup_rindex_base` (
  `key_data` varbinary(255) NOT NULL,
  `id` binary(8) NOT NULL,
  `count` int NOT NULL COMMENT 'number of occurrences of keyword in id',
  CONSTRAINT pkey PRIMARY KEY (`id`, `key_data`),
  KEY `key_data` (`key_data`),
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
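Assuming a table like the one above, building index rows from a tokenized document and the kind of grouped ranking query suggested earlier could be simulated like this (the row data and ordering choice are made up for illustration):

```python
from collections import Counter, defaultdict

def index_rows(source_id, tokens):
    """Turn one document's tokens into (key_data, id, count) rows
    matching the egraph_lookup_rindex_base schema."""
    return [(kw, source_id, n) for kw, n in Counter(tokens).items()]

def rank(rows, keywords, limit=10000):
    """In-memory equivalent of:
        SELECT COUNT(keyword), SUM(count), sourceid FROM xyz
        WHERE keyword IN (...) GROUP BY sourceid LIMIT <limit>
    ordered so ids matching more distinct keywords, then more total
    occurrences, come first (one plausible ranking, not the only one)."""
    matched = defaultdict(lambda: [0, 0])  # id -> [distinct matches, total count]
    for keyword, source_id, count in rows:
        if keyword in keywords:
            matched[source_id][0] += 1
            matched[source_id][1] += count
    ranked = sorted(matched.items(), key=lambda kv: (-kv[1][0], -kv[1][1]))
    return ranked[:limit]

rows = [
    ("graph", b"doc1", 3), ("search", b"doc1", 1),
    ("graph", b"doc2", 1), ("index", b"doc2", 2),
    ("search", b"doc3", 4),
]
rank(rows, {"graph", "search"})  # doc1 matches both keywords, so it ranks first
```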