Document Classification Part 2 Text Processing (N-Gram Model & TF-IDF Model) by Chan Woo Kim Explained Relentlessly Deep Learning & Natural Language Processing Medium.html
<!doctype html><html lang="en"><head><script defer src="https://cdn.optimizely.com/js/16180790160.js"></script><title data-rh="true">Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model) | by Chan Woo Kim | Explained Relentlessly: Deep Learning & Natural Language Processing | Medium</title><meta data-rh="true" charset="utf-8"/><meta data-rh="true" name="viewport" content="width=device-width,minimum-scale=1,initial-scale=1,maximum-scale=1"/><meta data-rh="true" name="theme-color" content="#000000"/><meta data-rh="true" name="twitter:app:name:iphone" content="Medium"/><meta data-rh="true" name="twitter:app:id:iphone" content="828256236"/><meta data-rh="true" property="al:ios:app_name" content="Medium"/><meta data-rh="true" property="al:ios:app_store_id" content="828256236"/><meta data-rh="true" property="al:android:package" content="com.medium.reader"/><meta data-rh="true" property="fb:app_id" content="542599432471018"/><meta data-rh="true" property="og:site_name" content="Medium"/><meta data-rh="true" property="og:type" content="article"/><meta data-rh="true" property="article:published_time" content="2018-04-19T09:58:13.362Z"/><meta data-rh="true" name="title" content="Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model) | by Chan Woo Kim | Explained Relentlessly: Deep Learning & Natural Language Processing | Medium"/><meta data-rh="true" property="og:title" content="Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model)"/><meta data-rh="true" property="twitter:title" content="Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model)"/><meta data-rh="true" name="twitter:site" content="@Medium"/><meta data-rh="true" name="twitter:app:url:iphone" content="medium://p/eaa26d16c719"/><meta data-rh="true" property="al:android:url" content="medium://p/eaa26d16c719"/><meta data-rh="true" property="al:ios:url" content="medium://p/eaa26d16c719"/><meta data-rh="true" 
property="al:android:app_name" content="Medium"/><meta data-rh="true" name="description" content="In this article I will explain some core concepts in text processing in conducting machine learning on documents to classify them into categories. This is the part 2 of a series outlined below:"/><meta data-rh="true" property="og:description" content="In this article I will explain some core concepts in text processing in conducting machine learning on documents to classify them into…"/><meta data-rh="true" property="twitter:description" content="In this article I will explain some core concepts in text processing in conducting machine learning on documents to classify them into…"/><meta data-rh="true" property="og:url" content="https://medium.com/machine-learning-intuition/document-classification-part-2-text-processing-eaa26d16c719"/><meta data-rh="true" property="al:web:url" content="https://medium.com/machine-learning-intuition/document-classification-part-2-text-processing-eaa26d16c719"/><meta data-rh="true" property="og:image" content="https://miro.medium.com/max/1200/1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg"/><meta data-rh="true" name="twitter:image:src" content="https://miro.medium.com/max/1200/1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg"/><meta data-rh="true" name="twitter:card" content="summary_large_image"/><meta data-rh="true" property="article:author" content="https://chanwkim01.medium.com"/><meta data-rh="true" name="author" content="Chan Woo Kim"/><meta data-rh="true" name="robots" content="index,follow,max-image-preview:large"/><meta data-rh="true" name="referrer" content="unsafe-url"/><meta data-rh="true" name="twitter:label1" content="Reading time"/><meta data-rh="true" name="twitter:data1" content="7 min read"/><link data-rh="true" rel="search" type="application/opensearchdescription+xml" title="Medium" href="/osd.xml"/><link data-rh="true" rel="apple-touch-icon" sizes="152x152" href="https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png"/><link data-rh="true" 
rel="apple-touch-icon" sizes="120x120" href="https://miro.medium.com/fit/c/120/120/1*sHhtYhaCe2Uc3IU0IgKwIQ.png"/><link data-rh="true" rel="apple-touch-icon" sizes="76x76" href="https://miro.medium.com/fit/c/76/76/1*sHhtYhaCe2Uc3IU0IgKwIQ.png"/><link data-rh="true" rel="apple-touch-icon" sizes="60x60" href="https://miro.medium.com/fit/c/60/60/1*sHhtYhaCe2Uc3IU0IgKwIQ.png"/><link data-rh="true" rel="mask-icon" href="https://cdn-static-1.medium.com/_/fp/icons/Medium-Avatar-500x500.svg" color="#171717"/><link data-rh="true" rel="preconnect" href="https://glyph.medium.com" crossOrigin=""/><link data-rh="true" rel="preconnect" href="https://logx.optimizely.com"/><link data-rh="true" id="glyph_preload_link" rel="preload" as="style" type="text/css" href="https://glyph.medium.com/css/unbound.css"/><link data-rh="true" id="glyph_link" rel="stylesheet" type="text/css" href="https://glyph.medium.com/css/unbound.css"/><link data-rh="true" rel="author" href="https://chanwkim01.medium.com"/><link data-rh="true" rel="canonical" href="https://medium.com/machine-learning-intuition/document-classification-part-2-text-processing-eaa26d16c719"/><link data-rh="true" rel="alternate" href="android-app://com.medium.reader/https/medium.com/p/eaa26d16c719"/><link data-rh="true" rel="icon" href="https://miro.medium.com/1*m-R_BkNf1Qjr1YbyOIJY2w.png"/><script data-rh="true" type="application/ld+json">{"@context":"http:\u002F\u002Fschema.org","@type":"NewsArticle","image":["https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F1200\u002F1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg"],"url":"https:\u002F\u002Fmedium.com\u002Fmachine-learning-intuition\u002Fdocument-classification-part-2-text-processing-eaa26d16c719","dateCreated":"2018-02-23T15:32:16.956Z","datePublished":"2018-02-23T15:32:16.956Z","dateModified":"2021-06-15T07:53:32.708Z","headline":"Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model)","name":"Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF 
Model)","description":"In this article I will explain some core concepts in text processing in conducting machine learning on documents to classify them into categories. This is the part 2 of a series outlined below:","identifier":"eaa26d16c719","author":{"@type":"Person","name":"Chan Woo Kim","url":"https:\u002F\u002Fchanwkim01.medium.com"},"creator":["Chan Woo Kim"],"publisher":{"@type":"Organization","name":"Explained Relentlessly: Deep Learning & Natural Language Processing","url":"https:\u002F\u002Fmedium.com\u002Fmachine-learning-intuition","logo":{"@type":"ImageObject","width":60,"height":60,"url":"https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F120\u002F1*1JYwCiLpV0AM-SpxKqhEFw.png"}},"mainEntityOfPage":"https:\u002F\u002Fmedium.com\u002Fmachine-learning-intuition\u002Fdocument-classification-part-2-text-processing-eaa26d16c719","isAccessibleForFree":"False","hasPart":{"@type":"WebPageElement","isAccessibleForFree":"False","cssSelector":".meteredContent"}}</script><link rel="preload" href="https://cdn.optimizely.com/js/16180790160.js" as="script"><style type="text/css" data-fela-rehydration="259" data-fela-type="STATIC">html{box-sizing:border-box}*, *:before, *:after{box-sizing:inherit}body{margin:0;padding:0;text-rendering:optimizeLegibility;-webkit-font-smoothing:antialiased;color:rgba(0,0,0,0.8);position:relative;min-height:100vh}h1, h2, h3, h4, h5, h6, dl, dd, ol, ul, menu, figure, blockquote, p, pre, form{margin:0}menu, ol, ul{padding:0;list-style:none;list-style-image:none}main{display:block}a{color:inherit;text-decoration:none}a, button, input{-webkit-tap-highlight-color:transparent}img, svg{vertical-align:middle}button{background:transparent;overflow:visible}button, input, optgroup, select, textarea{margin:0}:root{--reach-tabs:1;--reach-menu-button:1}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="KEYFRAME">@-webkit-keyframes k1{from{filter:hue-rotate(0deg)}to{filter:hue-rotate(360deg)}}@-moz-keyframes 
k1{from{filter:hue-rotate(0deg)}to{filter:hue-rotate(360deg)}}@keyframes k1{from{filter:hue-rotate(0deg)}to{filter:hue-rotate(360deg)}}@-webkit-keyframes k2{0%{opacity:0;transform:translateY(-60px)}100%{opacity:1;transform:translateY(0px)}}@-moz-keyframes k2{0%{opacity:0;transform:translateY(-60px)}100%{opacity:1;transform:translateY(0px)}}@keyframes k2{0%{opacity:0;transform:translateY(-60px)}100%{opacity:1;transform:translateY(0px)}}@-webkit-keyframes k3{0%{opacity:1;transform:translateY(0px)}100%{opacity:0;transform:translateY(-60px)}}@-moz-keyframes k3{0%{opacity:1;transform:translateY(0px)}100%{opacity:0;transform:translateY(-60px)}}@keyframes k3{0%{opacity:1;transform:translateY(0px)}100%{opacity:0;transform:translateY(-60px)}}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE">.a{font-family:medium-content-sans-serif-font, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, Cantarell, "Open Sans", "Helvetica Neue", sans-serif}.b{font-weight:400}.c{background-color:rgba(255, 255, 255, 1)}.l{height:100vh}.m{width:100vw}.n{display:flex}.o{align-items:center}.p{justify-content:center}.q{height:25px}.r{fill:rgba(41, 41, 41, 1)}.s{display:block}.t{margin-bottom:36px}.v{width:100%}.z{flex:0 0 auto}.ab{justify-self:flex-end}.ac{z-index:500}.ae{visibility:hidden}.af{min-height:115px}.ai{flex-direction:column}.aj{background-color:#84f984}.ak{display:none}.am{white-space:nowrap}.an{border-bottom:none}.ao{position:relative}.au{max-width:1192px}.av{min-width:0}.aw{height:62px}.ax{flex-direction:row}.ay{flex:1 0 auto}.az{margin-right:16px}.ba{font-family:sohne, "Helvetica Neue", Helvetica, Arial, sans-serif}.bb{font-size:14px}.bc{line-height:20px}.bd{color:rgba(0, 75, 0, 1)}.be{padding:7px 16px 9px}.bf{background:0}.bg{fill:rgba(0, 75, 0, 1)}.bh{border-color:rgba(0, 99, 0, 1)}.bm:disabled{cursor:inherit}.bn:disabled{opacity:0.3}.bo:disabled:hover{color:rgba(0, 75, 0, 1)}.bp:disabled:hover{fill:rgba(0, 75, 0, 
1)}.bq:disabled:hover{border-color:rgba(0, 99, 0, 1)}.br{border-radius:99em}.bs{border-width:1px}.bt{border-style:solid}.bu{box-sizing:border-box}.bv{display:inline-block}.bw{text-decoration:none}.bx{margin-left:0px}.by{color:rgba(0, 121, 22, 1)}.bz{font-size:inherit}.ca{border:inherit}.cb{font-family:inherit}.cc{letter-spacing:inherit}.cd{font-weight:inherit}.ce{padding:0}.cf{margin:0}.cg:disabled{cursor:default}.ch:disabled{color:rgba(163, 208, 162, 0.5)}.ci:disabled{fill:rgba(163, 208, 162, 0.5)}.cj{justify-content:space-between}.cp{align-items:flex-start}.cq{margin-bottom:0px}.cr{margin-top:-32px}.cs{flex-wrap:wrap}.cv{margin-top:32px}.cw{margin-right:24px}.cy{height:35px}.cz{width:35px}.da{margin-bottom:-3px}.db{margin-left:14px}.dc{margin-top:-3px}.dd{fill:rgba(0, 47, 0, 1)}.de{padding-top:1px}.df{height:70px}.dg{font-size:16px}.dh{line-height:24px}.di:before{margin-bottom:-10px}.dj:before{content:""}.dk:before{display:table}.dl:before{border-collapse:collapse}.dm:after{margin-top:-6px}.dn:after{content:""}.do:after{display:table}.dp:after{border-collapse:collapse}.dq{color:rgba(117, 117, 117, 1)}.dr{margin-right:12px}.ds{display:inline-flex}.dt{color:inherit}.du{fill:inherit}.dx:disabled{color:rgba(117, 117, 117, 1)}.dy:disabled{fill:rgba(117, 117, 117, 1)}.dz{margin-left:12px}.ec{left:0}.ed{opacity:0}.ee{position:fixed}.ef{right:0}.eg{top:0}.ei{height:60px}.el{height:100%}.eo{color:rgba(85, 124, 255, 1)}.ep{fill:rgba(85, 124, 255, 1)}.eq{border-color:rgba(85, 124, 255, 1)}.eu:disabled:hover{color:rgba(85, 124, 255, 1)}.ev:disabled:hover{fill:rgba(85, 124, 255, 1)}.ew:disabled:hover{border-color:rgba(85, 124, 255, 1)}.ex{margin-left:16px}.ey{padding-left:24px}.ez{padding-right:24px}.fa{margin-left:auto}.fb{margin-right:auto}.fc{max-width:728px}.fd{background:rgba(255, 255, 255, 1)}.fe{border:1px solid rgba(230, 230, 230, 1)}.ff{border-radius:4px}.fg{box-shadow:0 1px 4px rgba(230, 230, 230, 
1)}.fh{max-height:100vh}.fi{overflow-y:auto}.fj{position:absolute}.fk{top:calc(100vh + 100px)}.fl{bottom:calc(100vh + 100px)}.fm{width:10px}.fn{pointer-events:none}.fo{word-break:break-word}.fp{word-wrap:break-word}.fq:after{display:block}.fr:after{clear:both}.fs{max-width:680px}.ft{line-height:1.23}.fu{letter-spacing:0}.fv{font-style:normal}.fw{font-weight:700}.gr{margin-bottom:-0.27em}.gs{color:rgba(41, 41, 41, 1)}.gw{border-radius:50%}.gx{height:28px}.gy{width:28px}.gz{margin-left:8px}.ha{fill:rgba(61, 61, 61, 1)}.hb{margin-top:-2px}.hc{padding-left:4px}.hd{margin:0 4px}.he{margin:0 7px}.hf{align-items:flex-end}.ho{margin:0 6px 0 7px}.hp{clear:both}.hq{margin-top:33px}.hs{cursor:zoom-in}.ht{z-index:auto}.hv{max-width:100%}.hw{height:auto}.hx{margin-top:10px}.hy{text-align:center}.ib{text-decoration:underline}.ic{line-height:1.58}.id{letter-spacing:-0.004em}.ie{font-family:charter, Georgia, Cambria, "Times New Roman", Times, serif}.iz{margin-bottom:-0.46em}.bi:hover{color:rgba(0, 47, 0, 1)}.bj:hover{fill:rgba(0, 47, 0, 1)}.bk:hover{border-color:rgba(0, 47, 0, 1)}.bl:hover{cursor:pointer}.dv:hover{color:rgba(25, 25, 25, 1)}.dw:hover{fill:rgba(25, 25, 25, 1)}.er:hover{color:rgba(76, 108, 224, 1)}.es:hover{fill:rgba(76, 108, 224, 1)}.et:hover{border-color:rgba(76, 108, 224, 1)}.hu:focus{transform:scale(1.01)}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (min-width: 1080px)">.d{display:none}.w{display:flex}.at{margin:0 64px}.gn{font-size:46px}.go{margin-top:0.6em}.gp{line-height:56px}.gq{letter-spacing:-0.011em}.hm{margin-left:30px}.iv{font-size:21px}.iw{margin-top:2em}.ix{line-height:32px}.iy{letter-spacing:-0.003em}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (max-width: 1079.98px)">.e{display:none}.hl{margin-left:30px}.hz{margin-left:auto}.ia{text-align:center}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and 
(max-width: 903.98px)">.f{display:none}.hk{margin-left:30px}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (max-width: 727.98px)">.g{display:none}.u{margin-bottom:20px}.ag{box-shadow:inset 0 -1px 0 rgba(230, 230, 230, 1)}.ah{min-height:230px}.al{display:block}.ck{min-height:98px}.cl{display:flex}.cm{align-items:flex-start}.cn{flex-direction:column}.co{justify-content:flex-end}.ct{margin-bottom:28px}.cu{margin-top:0px}.cx{margin-top:28px}.ea{border-top:1px solid rgba(230, 230, 230, 1)}.eb{border-bottom:1px solid rgba(230, 230, 230, 1)}.em{align-items:center}.en{flex:1 0 auto}.gu{margin-top:32px}.gv{flex-direction:column-reverse}.hi{margin-bottom:30px}.hj{margin-left:0px}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (max-width: 551.98px)">.h{display:none}.ap{margin:0 24px}.ej{display:block}.fx{font-size:32px}.fy{margin-top:0.64em}.fz{line-height:40px}.ga{letter-spacing:-0.016em}.gt{margin-top:32px}.hg{margin-bottom:30px}.hh{margin-left:0px}.if{font-size:18px}.ig{margin-top:1.56em}.ih{line-height:28px}.ii{letter-spacing:-0.003em}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (min-width: 904px) and (max-width: 1079.98px)">.i{display:none}.x{display:flex}.as{margin:0 64px}.gj{font-size:46px}.gk{margin-top:0.6em}.gl{line-height:56px}.gm{letter-spacing:-0.011em}.ir{font-size:21px}.is{margin-top:2em}.it{line-height:32px}.iu{letter-spacing:-0.003em}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (min-width: 728px) and (max-width: 903.98px)">.j{display:none}.y{display:flex}.ar{margin:0 48px}.gf{font-size:46px}.gg{margin-top:0.6em}.gh{line-height:56px}.gi{letter-spacing:-0.011em}.in{font-size:21px}.io{margin-top:2em}.ip{line-height:32px}.iq{letter-spacing:-0.003em}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="all and (min-width: 552px) and 
(max-width: 727.98px)">.k{display:none}.aq{margin:0 24px}.ek{display:block}.gb{font-size:32px}.gc{margin-top:0.64em}.gd{line-height:40px}.ge{letter-spacing:-0.016em}.ij{font-size:18px}.ik{margin-top:1.56em}.il{line-height:28px}.im{letter-spacing:-0.003em}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="print">.hn{display:none}</style><style type="text/css" data-fela-rehydration="259" data-fela-type="RULE" media="(prefers-reduced-motion: no-preference)">.eh{animation:k3 .2s ease-in-out both}.hr{transition:transform 300ms cubic-bezier(0.2, 0, 0.2, 1)}</style></head><body><div id="root"><div class="a b c"><div class="d e f g h i j k"></div><script>document.domain = document.domain;</script><div class="s"><div class="t s u"><div class="af s ag ah"><div class="n ai aj"><div class="ak al"><div class="an s ao ac"><div class="n p"><div class="ap aq ar as at au av v"><div class="aw n o"><div class="n o ax ay"><div class="az s"><span><button class="ba b bb bc bd be bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw">Get started</button></span></div><div class="am"><div class="bx ak al"><span class="ba b bb bc by"><a href="https://rsci.app.link/?%24canonical_url=https%3A%2F%2Fmedium.com%2Fp%2Feaa26d16c719&~feature=LoOpenInAppButton&~channel=ShowPostUnderCollection&~stage=mobileNavBar&source=post_page-----eaa26d16c719--------------------------------" class="bd bg bz ca cb cc cd ce cf bl bi bj cg ch ci" rel="noopener follow">Open in app</a></span></div></div></div><a href="https://medium.com/?source=post_page-----eaa26d16c719--------------------------------" rel="noopener follow" aria-label="Homepage"><svg viewBox="0 0 1043.63 592.71" class="q bg"><g data-name="Layer 2"><g data-name="Layer 1"><path d="M588.67 296.36c0 163.67-131.78 296.35-294.33 296.35S0 460 0 296.36 131.78 0 294.34 0s294.33 132.69 294.33 296.36M911.56 296.36c0 154.06-65.89 279-147.17 279s-147.17-124.94-147.17-279 65.88-279 147.16-279 147.17 124.9 147.17 279M1043.63 
296.36c0 138-23.17 249.94-51.76 249.94s-51.75-111.91-51.75-249.94 23.17-249.94 51.75-249.94 51.76 111.9 51.76 249.94"></path></g></g></svg></a></div></div></div></div></div><div class="n p"><div class="ap aq ar as at au av v"><div class="af n o ax cj ck cl cm cn co"><div class="v n cp cj"><div class="n v"><div class="cq cr v n o ax cs ct cu cl cm cn"><div class="cv cw s cx"><a href="/machine-learning-intuition?source=post_page-----eaa26d16c719--------------------------------" rel="noopener follow" aria-label="Publication Homepage"><div class="cy cz s"><img alt="Explained Relentlessly: Deep Learning & Natural Language Processing" class="" src="https://miro.medium.com/max/70/1*1JYwCiLpV0AM-SpxKqhEFw.png" width="35" height="35"/></div></a></div></div></div><div class="w x y k h z ab o ac ae"><p class="ba b bb bc by"><span><a class="bd bg bz ca cb cc cd ce cf bl bi bj cg ch ci" rel="noopener follow" href="/m/signin?operation=login&redirect=https%3A%2F%2Fmedium.com%2Fmachine-learning-intuition%2Fdocument-classification-part-2-text-processing-eaa26d16c719&source=post_page-----eaa26d16c719---------------------nav_reg-----------">Sign in</a></span></p><div class="da db dc cw s"><span><button class="ba b bb bc bd be bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw">Get started</button></span></div><a href="https://medium.com/?source=post_page-----eaa26d16c719--------------------------------" rel="noopener follow" aria-label="Homepage"><svg viewBox="0 0 1043.63 592.71" class="q dd"><g data-name="Layer 2"><g data-name="Layer 1"><path d="M588.67 296.36c0 163.67-131.78 296.35-294.33 296.35S0 460 0 296.36 131.78 0 294.34 0s294.33 132.69 294.33 296.36M911.56 296.36c0 154.06-65.89 279-147.17 279s-147.17-124.94-147.17-279 65.88-279 147.16-279 147.17 124.9 147.17 279M1043.63 296.36c0 138-23.17 249.94-51.76 249.94s-51.75-111.91-51.75-249.94 23.17-249.94 51.75-249.94 51.76 111.9 51.76 249.94"></path></g></g></svg></a></div></div></div></div></div></div><div class="s"><div 
class="n p"><div class="ap aq ar as at au av v"><div><div class="de df n o"><div class="s"><span class="ba b dg dh di dj dk dl dm dn do dp dq"><div class="n o"><div class="dr ds ai"><span class="ba b bb bc dq"><a class="dt du bz ca cb cc cd ce cf bl dv dw cg dx dy" rel="noopener follow" href="/machine-learning-intuition/followers?source=post_page-----eaa26d16c719--------------------------------">253 Followers</a></span></div><div class="s h"></div><div class="dz n ai"><a href="/machine-learning-intuition/about?source=post_page-----eaa26d16c719--------------------------------" class="dt du bz ca cb cc cd ce cf bl dv dw cg dx dy" rel="noopener follow">About</a></div></div></span></div></div></div></div></div></div></div><div class="ea eb c ec ed ee ef eg ae ac eh"><div class="n p"><div class="ap aq ar as at au av v"><div class="ei v ej ek j i d eg ac"><div class="el n o"><div class="ak cl em en"><span><button class="ba b bb bc eo be bf ep eq er es et bl bm bn eu ev ew br bs bt bu bv bw">Get started</button></span><div class="ex ak al"><span class="ba b bb bc dq"><a href="https://rsci.app.link/?%24canonical_url=https%3A%2F%2Fmedium.com%2Fp%2Feaa26d16c719&~feature=LoOpenInAppButton&~channel=ShowPostUnderCollection&~stage=mobileNavBar&source=post_page-----eaa26d16c719--------------------------------" class="eo ep bz ca cb cc cd ce cf bl er es cg ch ci" rel="noopener follow">Open in app</a></span></div></div><a href="https://medium.com/?source=post_page-----eaa26d16c719--------------------------------" rel="noopener follow" aria-label="Homepage"><svg viewBox="0 0 1043.63 592.71" class="q r"><g data-name="Layer 2"><g data-name="Layer 1"><path d="M588.67 296.36c0 163.67-131.78 296.35-294.33 296.35S0 460 0 296.36 131.78 0 294.34 0s294.33 132.69 294.33 296.36M911.56 296.36c0 154.06-65.89 279-147.17 279s-147.17-124.94-147.17-279 65.88-279 147.16-279 147.17 124.9 147.17 279M1043.63 296.36c0 138-23.17 249.94-51.76 249.94s-51.75-111.91-51.75-249.94 23.17-249.94 51.75-249.94 
51.76 111.9 51.76 249.94"></path></g></g></svg></a></div></div></div></div></div></div><article class="meteredContent"><section class="ey ez fa fb v fc bu s"></section><span class="s"></span><div><div class="fj ec fk fl fm fn"></div><section class="fo fp fq dn fr"><div class="n p"><div class="ap aq ar as at fs av v"><div class=""><h1 id="750f" class="ft fu fv ba fw fx fy fz ga gb gc gd ge gf gg gh gi gj gk gl gm gn go gp gq gr gs">Document Classification Part 2: <strong class="cd">Text Processing (N-Gram Model & TF-IDF Model)</strong></h1><div class="cv"><div class="n cj gt gu gv"><div class="o n"><div><a href="https://chanwkim01.medium.com/?source=post_page-----eaa26d16c719--------------------------------" rel="noopener follow"><img alt="Chan Woo Kim" class="s gw gx gy" src="https://miro.medium.com/fit/c/56/56/1*cTTNRpwyB9O1a0yrYXcRtQ.jpeg" width="28" height="28"/></a></div><div class="gz v n cs"><div class="n"><div style="flex:1"><span class="ba b bb bc gs"><a href="https://chanwkim01.medium.com/?source=post_page-----eaa26d16c719--------------------------------" class="" rel="noopener follow"><p class="ba b bb bc eo">Chan Woo Kim</p></a></span></div></div><span class="ba b bb bc dq"><a class="" rel="noopener follow" href="/machine-learning-intuition/document-classification-part-2-text-processing-eaa26d16c719?source=post_page-----eaa26d16c719--------------------------------"><p class="ba b bb bc dq"><span class="hd"></span>Feb 23, 2018<span class="he">·</span>7 min read<svg class="ha hb hc" width="15" height="15" viewBox="0 0 15 15" aria-label="Member only content"><path d="M7.44 2.32c.03-.1.09-.1.12 0l1.2 3.53a.29.29 0 0 0 .26.2h3.88c.11 0 .13.04.04.1L9.8 8.33a.27.27 0 0 0-.1.29l1.2 3.53c.03.1-.01.13-.1.07l-3.14-2.18a.3.3 0 0 0-.32 0L4.2 12.22c-.1.06-.14.03-.1-.07l1.2-3.53a.27.27 0 0 0-.1-.3L2.06 6.16c-.1-.06-.07-.12.03-.12h3.89a.29.29 0 0 0 .26-.19l1.2-3.52z"></path></svg></p></a></span></div></div><div class="n hf hg hh hi hj hk hl hm hn"><div class="n o"><div 
class="bv" aria-hidden="false" aria-describedby="postFooterSocialMenu" aria-labelledby="postFooterSocialMenu"><div><div class="bv" role="tooltip" aria-hidden="false"><button class="dt du bz ca cb cc cd ce cf bl dv dw cg dx dy" aria-controls="postFooterSocialMenu" aria-expanded="false" aria-label="Share Post"><svg width="25" height="25" class="r"><g fill-rule="evenodd"><path d="M15.6 5a.42.42 0 0 0 .17-.3.42.42 0 0 0-.12-.33l-2.8-2.79a.5.5 0 0 0-.7 0l-2.8 2.8a.4.4 0 0 0-.1.32c0 .12.07.23.16.3h.02a.45.45 0 0 0 .57-.04l2-2V10c0 .28.23.5.5.5s.5-.22.5-.5V2.93l2.02 2.02c.08.07.18.12.3.13.11.01.21-.02.3-.08v.01"></path><path d="M18 7h-1.5a.5.5 0 0 0 0 1h1.6c.5 0 .9.4.9.9v10.2c0 .5-.4.9-.9.9H6.9a.9.9 0 0 1-.9-.9V8.9c0-.5.4-.9.9-.9h1.6a.5.5 0 0 0 .35-.15A.5.5 0 0 0 9 7.5a.5.5 0 0 0-.15-.35A.5.5 0 0 0 8.5 7H7a2 2 0 0 0-2 2v10c0 1.1.9 2 2 2h11a2 2 0 0 0 2-2V9a2 2 0 0 0-2-2"></path></g></svg></button></div></div></div><div class="ho s"></div><div class="s ay"></div></div></div></div></div></div></div></div><div class="hp v"><figure class="hq hp v paragraph-image"><div role="button" tabindex="0" class="hr hs ao ht v hu"><img alt="" class="v hv hw" src="https://miro.medium.com/max/9792/1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg" width="4896" height="3264" role="presentation"/></div><figcaption class="hx hy fc fa fb hz ia ba b bb bc dq">Photo by <a href="https://unsplash.com/photos/O453M2Liufs?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText" class="dt ib" target="_blank" rel="noopener ugc nofollow">Luca Bravo</a> on <a href="https://unsplash.com/search/photos/nature?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText" class="dt ib" target="_blank" rel="noopener ugc nofollow">Unsplash</a></figcaption></figure></div><div class="n p"><div class="ap aq ar as at fs av v"><p id="260f" class="ic id fv ie b if ig ih ii ij ik il im in io ip iq ir is it iu iv iw ix iy iz fo gs">In this article I will explain some core concepts in text processing in conducting machine 
learning on documents to classify them into categories. This is the part 2 of a series outlined below:</p><p id="3fcd" class="ic id fv ie b if ig ih ii ij ik il im in io ip iq ir is it iu iv iw ix iy iz fo gs"><strong class="ie fw">Part 1: Intuition & How Do We Work With Documents?</strong></p></div></div></section></div></article></div></div></div><script>window.__BUILD_ID__="main-20211022-004504-3e44848916"</script><script>window.__GRAPHQL_URI__ = "https://medium.com/_/graphql"</script><script>window.__PRELOADED_STATE__ = {"algolia":{"queries":{}},"auroraPage":{"isAuroraPageEnabled":true},"bookReader":{"assets":{},"reader":{"currentAsset":null,"currentGFI":null,"settingsPanelIsOpen":false,"settings":{"fontFamily":"CHARTER","fontScale":"M","publisherStyling":false,"textAlignment":"start","theme":"White","lineSpacing":0,"wordSpacing":0,"letterSpacing":0},"internalNavCounter":0,"currentSelection":null}},"cache":{"experimentGroupSet":true,"reason":"","group":"enabled","tags":["group-edgeCachePosts","post-eaa26d16c719","user-e29281884d6","collection-a4d68c1f6803"],"serverVariantState":"5780834a9de343a07fe9e3c3f240f8188daa49337cfd5f86596783c9a1fdd98d","middlewareEnabled":true,"cacheStatus":"DYNAMIC","vary":[]},"client":{"hydrated":false,"isUs":false,"isNativeMedium":false,"isSafariMobile":false,"isSafari":true,"routingEntity":{"type":"DEFAULT","explicit":false}},"debug":{"requestId":"940e0e00-d6dc-431f-8c7f-911e3a11e518","hybridDevServices":[],"showBookReaderDebugger":false,"originalSpanCarrier":{"ot-tracer-spanid":"33c0aa0864969dfc","ot-tracer-traceid":"469e4052aaa16930","ot-tracer-sampled":"true"}},"multiVote":{"clapsPerPost":{}},"navigation":{"branch":{"show":null,"hasRendered":null,"blockedByCTA":false},"hideGoogleOneTap":false,"hasRenderedGoogleOneTap":null,"hasRenderedAlternateUserBanner":null,"currentLocation":"https:\u002F\u002Fmedium.com\u002Fmachine-learning-intuition\u002Fdocument-classification-part-2-text-processing-eaa26d16c719","host":"medium.com","hostna
me":"medium.com","referrer":"","hasSetReferrer":false,"susiModal":{"step":null,"operation":"register"},"postRead":false,"queryString":"","currentHash":""},"tracing":{},"config":{"nodeEnv":"production","version":"main-20211022-004504-3e44848916","isTaggedVersion":false,"isMediumDotApp":false,"isMediumDotAppVariant":false,"target":"production","productName":"Medium","publicUrl":"https:\u002F\u002Fcdn-client.medium.com\u002Flite","authDomain":"medium.com","authGoogleClientId":"216296035834-k1k6qe060s2tp2a2jam4ljdcms00sttg.apps.googleusercontent.com","favicon":"production","glyphUrl":"https:\u002F\u002Fglyph.medium.com","branchKey":"key_live_ofxXr2qTrrU9NqURK8ZwEhknBxiI6KBm","lightStep":{"name":"lite-web","host":"lightstep.medium.systems","token":"ce5be895bef60919541332990ac9fef2","appVersion":"main-20211022-004504-3e44848916","disableClientReporting":false},"algolia":{"appId":"MQ57UUUQZ2","apiKeySearch":"394474ced050e3911ae2249ecc774921","indexPrefix":"medium_","host":"-dsn.algolia.net"},"recaptchaKey":"6Lfc37IUAAAAAKGGtC6rLS13R1Hrw_BqADfS1LRk","recaptcha3Key":"6Lf8R9wUAAAAABMI_85Wb8melS7Zj6ziuf99Yot5","datadog":{"applicationId":"6702d87d-a7e0-42fe-bbcb-95b469547ea0","clientToken":"pub853ea8d17ad6821d9f8f11861d23dfed","rumToken":"pubf9cc52896502b9413b68ba36fc0c7162","context":{"deployment":{"target":"production","tag":"main-20211022-004504-3e44848916","commit":"3e44848916d5fa7cf80e3016f228d434af8e3464"}},"datacenter":"us"},"googleAnalyticsCode":"UA-24232453-2","googlePay":{"apiVersion":"2","apiVersionMinor":"0","merchantId":"BCR2DN6TV7EMTGBM","merchantName":"Medium"},"signInWallCustomDomainCollectionIds":["3a8144eabfe3","336d898217ee","61061eb0c96b","138adf9c44c","819cc2aaeee0"],"mediumOwnedAndOperatedCollectionIds":["8a9336e5bb4","b7e45b22fec3","193b68bd4fba","8d6b8a439e32","54c98c43354d","3f6ecf56618","d944778ce714","92d2092dc598","ae2a65f35510","1285ba81cada","544c7006046e","fc8964313712","40187e704f1c","88d9857e584e","7b6769f2748b","bcc38c8f6edf","cef6983b292","cb8
577c9149e","444d13b52878","713d7dbc99b0","ef8e90590e66","191186aaafa0","55760f21cdc5","9dc80918cc93","bdc4052bbdba","8ccfed20cbb2"],"tierOneDomains":["medium.com","thebolditalic.com","arcdigital.media","towardsdatascience.com","uxdesign.cc","codeburst.io","psiloveyou.xyz","writingcooperative.com","entrepreneurshandbook.co","prototypr.io","betterhumans.coach.me","theascent.pub"],"topicsToFollow":["d61cf867d93f","8a146bc21b28","1eca0103fff3","4d562ee63426","aef1078a3ef5","e15e46793f8d","6158eb913466","55f1c20aba7a","3d18b94f6858","4861fee224fd","63c6f1f93ee","1d98b3a9a871","decb52b64abf","ae5d4995e225","830cded25262"],"topicToTagMappings":{"accessibility":"accessibility","addiction":"addiction","android-development":"android-development","art":"art","artificial-intelligence":"artificial-intelligence","astrology":"astrology","basic-income":"basic-income","beauty":"beauty","biotech":"biotech","blockchain":"blockchain","books":"books","business":"business","cannabis":"cannabis","cities":"cities","climate-change":"climate-change","comics":"comics","coronavirus":"coronavirus","creativity":"creativity","cryptocurrency":"cryptocurrency","culture":"culture","cybersecurity":"cybersecurity","data-science":"data-science","design":"design","digital-life":"digital-life","disability":"disability","economy":"economy","education":"education","equality":"equality","family":"family","feminism":"feminism","fiction":"fiction","film":"film","fitness":"fitness","food":"food","freelancing":"freelancing","future":"future","gadgets":"gadgets","gaming":"gaming","gun-control":"gun-control","health":"health","history":"history","humor":"humor","immigration":"immigration","ios-development":"ios-development","javascript":"javascript","justice":"justice","language":"language","leadership":"leadership","lgbtqia":"lgbtqia","lifestyle":"lifestyle","machine-learning":"machine-learning","makers":"makers","marketing":"marketing","math":"math","media":"media","mental-health":"mental-health","mindfulness":
"mindfulness","money":"money","music":"music","neuroscience":"neuroscience","nonfiction":"nonfiction","outdoors":"outdoors","parenting":"parenting","pets":"pets","philosophy":"philosophy","photography":"photography","podcasts":"podcast","poetry":"poetry","politics":"politics","privacy":"privacy","product-management":"product-management","productivity":"productivity","programming":"programming","psychedelics":"psychedelics","psychology":"psychology","race":"race","relationships":"relationships","religion":"religion","remote-work":"remote-work","san-francisco":"san-francisco","science":"science","self":"self","self-driving-cars":"self-driving-cars","sexuality":"sexuality","social-media":"social-media","society":"society","software-engineering":"software-engineering","space":"space","spirituality":"spirituality","sports":"sports","startups":"startup","style":"style","technology":"technology","transportation":"transportation","travel":"travel","true-crime":"true-crime","tv":"tv","ux":"ux","venture-capital":"venture-capital","visual-design":"visual-design","work":"work","world":"world","writing":"writing"},"defaultImages":{"avatar":{"imageId":"1*dmbNkD5D-u45r44go_cf0g.png","height":150,"width":150},"orgLogo":{"imageId":"1*OMF3fSqH8t4xBJ9-6oZDZw.png","height":106,"width":545},"postLogo":{"imageId":"1*kFrc4tBFM_tCis-2Ic87WA.png","height":810,"width":1440},"postPreviewImage":{"imageId":"1*hn4v1tCaJy7cWMyb0bpNpQ.png","height":386,"width":579}},"collectionStructuredData":{"8d6b8a439e32":{"name":"Elemental","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F980\u002F1*[email 
protected]","width":980,"height":159}}},"3f6ecf56618":{"name":"Forge","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F596\u002F1*[email protected]","width":596,"height":183}}},"ae2a65f35510":{"name":"GEN","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F264\u002F1*RdVZMdvfV3YiZTw6mX7yWA.png","width":264,"height":140}}},"88d9857e584e":{"name":"LEVEL","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F540\u002F1*JqYMhNX6KNNb2UlqGqO2WQ.png","width":540,"height":108}}},"7b6769f2748b":{"name":"Marker","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F383\u002F1*[email 
protected]","width":383,"height":92}}},"444d13b52878":{"name":"OneZero","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F540\u002F1*cw32fIqCbRWzwJaoQw6BUg.png","width":540,"height":123}}},"8ccfed20cbb2":{"name":"Zora","data":{"@type":"NewsMediaOrganization","ethicsPolicy":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Farticles\u002F360043290473","logo":{"@type":"ImageObject","url":"https:\u002F\u002Fmiro.medium.com\u002Fmax\u002F540\u002F1*tZUQqRcCCZDXjjiZ4bDvgQ.png","width":540,"height":106}}}},"embeddedPostIds":{"coronavirus":"cd3010f9d81f"},"sharedCdcMessaging":{"COVID_APPLICABLE_TAG_SLUGS":[],"COVID_APPLICABLE_TOPIC_NAMES":[],"COVID_APPLICABLE_TOPIC_NAMES_FOR_TOPIC_PAGE":[],"COVID_MESSAGES":{"tierA":{"text":"For more information on the novel coronavirus and Covid-19, visit cdc.gov.","markups":[{"start":66,"end":73,"href":"https:\u002F\u002Fwww.cdc.gov\u002Fcoronavirus\u002F2019-nCoV"}]},"tierB":{"text":"Anyone can publish on Medium per our Policies, but we don’t fact-check every story. For more info about the coronavirus, see cdc.gov.","markups":[{"start":37,"end":45,"href":"https:\u002F\u002Fhelp.medium.com\u002Fhc\u002Fen-us\u002Fcategories\u002F201931128-Policies-Safety"},{"start":125,"end":132,"href":"https:\u002F\u002Fwww.cdc.gov\u002Fcoronavirus\u002F2019-nCoV"}]},"paywall":{"text":"This article has been made free for everyone, thanks to Medium Members. For more information on the novel coronavirus and Covid-19, visit cdc.gov.","markups":[{"start":56,"end":70,"href":"https:\u002F\u002Fmedium.com\u002Fmembership"},{"start":138,"end":145,"href":"https:\u002F\u002Fwww.cdc.gov\u002Fcoronavirus\u002F2019-nCoV"}]},"unbound":{"text":"This article is free for everyone, thanks to Medium Members. 
For more information on the novel coronavirus and Covid-19, visit cdc.gov.","markups":[{"start":45,"end":59,"href":"https:\u002F\u002Fmedium.com\u002Fmembership"},{"start":127,"end":134,"href":"https:\u002F\u002Fwww.cdc.gov\u002Fcoronavirus\u002F2019-nCoV"}]}},"COVID_BANNER_POST_ID_OVERRIDE_WHITELIST":["3b31a67bff4a"]},"sharedVoteMessaging":{"TAGS":["politics","election-2020","government","us-politics","election","2020-presidential-race","trump","donald-trump","democrats","republicans","congress","republican-party","democratic-party","biden","joe-biden","maga"],"TOPICS":["politics","election"],"MESSAGE":{"text":"Find out more about the U.S. election results here.","markups":[{"start":46,"end":50,"href":"https:\u002F\u002Fcookpolitical.com\u002F2020-national-popular-vote-tracker"}]},"EXCLUDE_POSTS":["397ef29e3ca5"]},"embedPostRules":[],"recircOptions":{"v1":{"limit":3},"v2":{"limit":8}},"braintreeClientKey":"production_zjkj96jm_m56f8fqpf7ngnrd4","paypalClientId":"AXj1G4fotC2GE8KzWX9mSxCH1wmPE3nJglf4Z2ig_amnhvlMVX87otaq58niAg9iuLktVNF_1WCMnN7v","stripePublishableKey":"pk_live_7FReX44VnNIInZwrIIx6ghjl"},"session":{"xsrf":""}}</script><script>window.__APOLLO_STATE__ = {"ROOT_QUERY":{"__typename":"Query","meterPost({\"postId\":\"eaa26d16c719\",\"postMeteringOptions\":{\"referrer\":\"https:\u002F\u002Fwww.google.com\u002F\",\"sk\":null,\"source\":null}})":{"__ref":"MeteringInfo:{}"},"postResult({\"id\":\"eaa26d16c719\"})":{"__ref":"Post:eaa26d16c719"}},"MeteringInfo:{}":{"__typename":"MeteringInfo","postIds":["eaa26d16c719"],"maxUnlockCount":3,"unlocksRemaining":2},"User:e29281884d6":{"id":"e29281884d6","__typename":"User","name":"Chan Woo Kim","username":"chanwkim01","newsletterV3":{"__ref":"NewsletterV3:132a7e74e235"},"customStyleSheet":{"__ref":"CustomStyleSheet:c80fbc93cf41"},"isSuspended":false,"bio":"Deep Learning Researcher at Williams 
College","imageId":"1*cTTNRpwyB9O1a0yrYXcRtQ.jpeg","hasCompletedProfile":false,"isAuroraVisible":true,"mediumMemberAt":0,"socialStats":{"__typename":"SocialStats","followerCount":209,"followingCount":78,"collectionFollowingCount":6},"customDomainState":{"__typename":"CustomDomainState","live":{"__typename":"CustomDomain","domain":"chanwkim01.medium.com","status":"ACTIVE","isSubdomain":true}},"hasSubdomain":true,"viewerEdge":{"__ref":"UserViewerEdge:userId:e29281884d6-viewerId:lo_35617133e23a"},"bookAuthor":null,"isPartnerProgramEnrolled":true,"viewerIsUser":false,"homepagePostsConnection({\"paging\":{\"limit\":1}})":{"__typename":"PostConnection","posts":[{"__ref":"Post:a1aaccdc5b26"}]},"postSubscribeMembershipUpsellShownAt":0,"allowNotes":true,"replyToEmailBannerShownCount":0,"twitterScreenName":"","followedCollections":6,"referredMembershipCustomHeadline":"","referredMembershipCustomBody":"","atsQualifiedAt":1624797968937},"ImageMetadata:":{"id":"","__typename":"ImageMetadata"},"CustomStyleSheet:7fd29fa259f3":{"id":"7fd29fa259f3","__typename":"CustomStyleSheet","global":{"__typename":"GlobalStyles","colorPalette":{"__typename":"StyleSheetColorPalette","primary":{"__typename":"ColorValue","colorPalette":{"__typename":"ColorPalette","highlightSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FFE8F3FF","point":0},{"__typename":"ColorPoint","color":"#FFE4F1FF","point":0.1},{"__typename":"ColorPoint","color":"#FFDFEFFF","point":0.2},{"__typename":"ColorPoint","color":"#FFDBEDFF","point":0.3},{"__typename":"ColorPoint","color":"#FFD6EBFF","point":0.4},{"__typename":"ColorPoint","color":"#FFD1E9FF","point":0.5},{"__typename":"ColorPoint","color":"#FFCDE7FF","point":0.6},{"__typename":"ColorPoint","color":"#FFC8E5FF","point":0.7},{"__typename":"ColorPoint","color":"#FFC3E3FF","point":0.8},{"__typename":"ColorPoint","color":"#FFBEE1FF","point":0.9},{"__typename":"ColorPoint","color":"#FFB9DEFF","point"
:1}]},"defaultBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FF557CFF","point":0},{"__typename":"ColorPoint","color":"#FF5174F8","point":0.1},{"__typename":"ColorPoint","color":"#FF4C6CE0","point":0.2},{"__typename":"ColorPoint","color":"#FF4764C8","point":0.3},{"__typename":"ColorPoint","color":"#FF415BB1","point":0.4},{"__typename":"ColorPoint","color":"#FF3B529B","point":0.5},{"__typename":"ColorPoint","color":"#FF354884","point":0.6},{"__typename":"ColorPoint","color":"#FF2E3D6E","point":0.7},{"__typename":"ColorPoint","color":"#FF263259","point":0.8},{"__typename":"ColorPoint","color":"#FF1D2743","point":0.9},{"__typename":"ColorPoint","color":"#FF131A2D","point":1}]},"tintBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FF334fe9","colorPoints":[{"__typename":"ColorPoint","color":"#FF334FE9","point":0},{"__typename":"ColorPoint","color":"#FF4768ED","point":0.1},{"__typename":"ColorPoint","color":"#FF5C7DF2","point":0.2},{"__typename":"ColorPoint","color":"#FF7091F7","point":0.3},{"__typename":"ColorPoint","color":"#FF85A2FC","point":0.4},{"__typename":"ColorPoint","color":"#FF99B3FF","point":0.5},{"__typename":"ColorPoint","color":"#FFACC3FF","point":0.6},{"__typename":"ColorPoint","color":"#FFBFD3FF","point":0.7},{"__typename":"ColorPoint","color":"#FFD2E2FF","point":0.8},{"__typename":"ColorPoint","color":"#FFE5F1FF","point":0.9},{"__typename":"ColorPoint","color":"#FFF7FFFF","point":1}]}}},"background":null},"fonts":{"__typename":"StyleSheetFonts","font1":{"__typename":"StyleSheetFont","name":"SANS_SERIF_1"},"font2":{"__typename":"StyleSheetFont","name":"SANS_SERIF_1"},"font3":{"__typename":"StyleSheetFont","name":"SERIF_2"}}},"header":{"__typename":"HeaderStyles","backgroundColor":{"__typename":"ColorValue","colorPalette":{"__typename":"ColorPalette","tintBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FF84f984","colorPoints":[{"__
typename":"ColorPoint","color":"#FF84F984","point":0},{"__typename":"ColorPoint","color":"#FF74E876","point":0.1},{"__typename":"ColorPoint","color":"#FF64D767","point":0.2},{"__typename":"ColorPoint","color":"#FF53C659","point":0.3},{"__typename":"ColorPoint","color":"#FF41B449","point":0.4},{"__typename":"ColorPoint","color":"#FF2CA13A","point":0.5},{"__typename":"ColorPoint","color":"#FF0D8D29","point":0.6},{"__typename":"ColorPoint","color":"#FF007916","point":0.7},{"__typename":"ColorPoint","color":"#FF006300","point":0.8},{"__typename":"ColorPoint","color":"#FF004B00","point":0.9},{"__typename":"ColorPoint","color":"#FF002F00","point":1}]}},"rgb":"84f984","alpha":"ff"},"postBackgroundColor":null,"backgroundImage":null,"headerScale":"HEADER_SCALE_MEDIUM","horizontalAlignment":"START","backgroundImageDisplayMode":"IMAGE_DISPLAY_MODE_FILL","backgroundImageVerticalAlignment":"START","backgroundColorDisplayMode":"COLOR_DISPLAY_MODE_VERTICAL_GRADIENT","secondaryBackgroundColor":{"__typename":"ColorValue","rgb":"3f72ff","alpha":"ff"},"nameColor":null,"nameTreatment":"NAME_TREATMENT_LOGO","postNameTreatment":"NAME_TREATMENT_LOGO","logoImage":{"__ref":"ImageMetadata:1*1JYwCiLpV0AM-SpxKqhEFw.png"},"logoScale":"HEADER_SCALE_SMALL","taglineColor":null,"taglineTreatment":"TAGLINE_TREATMENT_SIDEBAR"},"navigation":{"__typename":"HeaderNavigation","navItems":[]},"postBody":null,"blogroll":{"__typename":"BlogrollConfiguration","visibility":"BLOGROLL_VISIBILITY_SIDEBAR"}},"CollectionViewerEdge:collectionId:a4d68c1f6803-viewerId:lo_35617133e23a":{"id":"collectionId:a4d68c1f6803-viewerId:lo_35617133e23a","__typename":"CollectionViewerEdge","isEditor":false},"ImageMetadata:1*1JYwCiLpV0AM-SpxKqhEFw.png":{"id":"1*1JYwCiLpV0AM-SpxKqhEFw.png","__typename":"ImageMetadata","originalWidth":1922,"originalHeight":1920},"ImageMetadata:1*VUcaW5MHOxVDJ-Sjz5ljbQ.png":{"id":"1*VUcaW5MHOxVDJ-Sjz5ljbQ.png","__typename":"ImageMetadata"},"Collection:a4d68c1f6803":{"id":"a4d68c1f6803","__typename":"
Collection","domain":null,"googleAnalyticsId":null,"slug":"machine-learning-intuition","colorBehavior":"ACCENT_COLOR","isAuroraVisible":true,"favicon":{"__ref":"ImageMetadata:"},"name":"Explained Relentlessly: Deep Learning & Natural Language Processing","colorPalette":{"__typename":"ColorPalette","highlightSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FFFFFFFF","point":0},{"__typename":"ColorPoint","color":"#FFE8F3E8","point":0.1},{"__typename":"ColorPoint","color":"#FFE8F3E8","point":0.2},{"__typename":"ColorPoint","color":"#FFD1E7D1","point":0.6},{"__typename":"ColorPoint","color":"#FFA3D0A2","point":1}]},"defaultBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FF1A8917","point":0},{"__typename":"ColorPoint","color":"#FF11800E","point":0.1},{"__typename":"ColorPoint","color":"#FF0F730C","point":0.2},{"__typename":"ColorPoint","color":"#FF095407","point":1}]},"tintBackgroundSpectrum":null},"customStyleSheet":{"__ref":"CustomStyleSheet:7fd29fa259f3"},"tagline":"Deep Learning & Natural Language Processing concepts explained intuitively","isAuroraEligible":true,"viewerEdge":{"__ref":"CollectionViewerEdge:collectionId:a4d68c1f6803-viewerId:lo_35617133e23a"},"logo":{"__ref":"ImageMetadata:1*1JYwCiLpV0AM-SpxKqhEFw.png"},"navItems":[],"creator":{"__ref":"User:e29281884d6"},"subscriberCount":253,"avatar":{"__ref":"ImageMetadata:1*VUcaW5MHOxVDJ-Sjz5ljbQ.png"},"newsletterV3":null,"canToggleEmail":false,"description":"A Deep Learning & Natural Language Processing blog. 
Explained with a relentless effort for intuition and clarity.","ampEnabled":false,"twitterUsername":null,"facebookPageId":null,"customDomainState":null,"ptsQualifiedAt":1624874684381},"CustomStyleSheet:c80fbc93cf41":{"id":"c80fbc93cf41","__typename":"CustomStyleSheet","blogroll":{"__typename":"BlogrollConfiguration","visibility":"BLOGROLL_VISIBILITY_HIDDEN"},"global":{"__typename":"GlobalStyles","colorPalette":{"__typename":"StyleSheetColorPalette","primary":{"__typename":"ColorValue","colorPalette":{"__typename":"ColorPalette","tintBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FF5666cf","colorPoints":[{"__typename":"ColorPoint","color":"#FF5666CF","point":0},{"__typename":"ColorPoint","color":"#FF6778D7","point":0.1},{"__typename":"ColorPoint","color":"#FF798ADF","point":0.2},{"__typename":"ColorPoint","color":"#FF8A9AE6","point":0.3},{"__typename":"ColorPoint","color":"#FF9BAAED","point":0.4},{"__typename":"ColorPoint","color":"#FFABB9F4","point":0.5},{"__typename":"ColorPoint","color":"#FFBCC7FB","point":0.6},{"__typename":"ColorPoint","color":"#FFCCD5FF","point":0.7},{"__typename":"ColorPoint","color":"#FFDCE3FF","point":0.8},{"__typename":"ColorPoint","color":"#FFECF1FF","point":0.9},{"__typename":"ColorPoint","color":"#FFFBFEFF","point":1}]},"highlightSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FFEDF2FF","point":0},{"__typename":"ColorPoint","color":"#FFEAF0FF","point":0.1},{"__typename":"ColorPoint","color":"#FFE6EEFF","point":0.2},{"__typename":"ColorPoint","color":"#FFE3ECFF","point":0.3},{"__typename":"ColorPoint","color":"#FFE0EAFF","point":0.4},{"__typename":"ColorPoint","color":"#FFDCE8FF","point":0.5},{"__typename":"ColorPoint","color":"#FFD9E6FF","point":0.6},{"__typename":"ColorPoint","color":"#FFD5E4FF","point":0.7},{"__typename":"ColorPoint","color":"#FFD2E1FF","point":0.8},{"__typename":"ColorPoint","color":"#FFCEDFFF","point":0.9},{"__typename":"
ColorPoint","color":"#FFCBDDFF","point":1}]},"defaultBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FFFFFFFF","colorPoints":[{"__typename":"ColorPoint","color":"#FF6C7EE8","point":0},{"__typename":"ColorPoint","color":"#FF6576D4","point":0.1},{"__typename":"ColorPoint","color":"#FF5E6DC1","point":0.2},{"__typename":"ColorPoint","color":"#FF5765AD","point":0.3},{"__typename":"ColorPoint","color":"#FF4F5B9A","point":0.4},{"__typename":"ColorPoint","color":"#FF475287","point":0.5},{"__typename":"ColorPoint","color":"#FF3F4874","point":0.6},{"__typename":"ColorPoint","color":"#FF363D61","point":0.7},{"__typename":"ColorPoint","color":"#FF2C324F","point":0.8},{"__typename":"ColorPoint","color":"#FF22263C","point":0.9},{"__typename":"ColorPoint","color":"#FF161928","point":1}]}}},"background":null},"fonts":{"__typename":"StyleSheetFonts","font1":{"__typename":"StyleSheetFont","name":"SANS_SERIF_1"},"font2":{"__typename":"StyleSheetFont","name":"SANS_SERIF_1"},"font3":{"__typename":"StyleSheetFont","name":"SERIF_2"}}},"header":{"__typename":"HeaderStyles","backgroundColor":{"__typename":"ColorValue","colorPalette":{"__typename":"ColorPalette","tintBackgroundSpectrum":{"__typename":"ColorSpectrum","backgroundColor":"#FF3b8b72","colorPoints":[{"__typename":"ColorPoint","color":"#FF3B8B72","point":0},{"__typename":"ColorPoint","color":"#FF539981","point":0.1},{"__typename":"ColorPoint","color":"#FF68A690","point":0.2},{"__typename":"ColorPoint","color":"#FF7BB39E","point":0.3},{"__typename":"ColorPoint","color":"#FF8EBFAC","point":0.4},{"__typename":"ColorPoint","color":"#FFA0CBBA","point":0.5},{"__typename":"ColorPoint","color":"#FFB2D7C7","point":0.6},{"__typename":"ColorPoint","color":"#FFC3E2D5","point":0.7},{"__typename":"ColorPoint","color":"#FFD4EDE2","point":0.8},{"__typename":"ColorPoint","color":"#FFE5F8EF","point":0.9},{"__typename":"ColorPoint","color":"#FFF6FFFC","point":1}]}},"rgb":"3b8b72","alpha":"ff"},"postBackgroundColor":null,"backgro
undImage":null,"headerScale":"HEADER_SCALE_MEDIUM","horizontalAlignment":"START","backgroundImageDisplayMode":"IMAGE_DISPLAY_MODE_FILL","backgroundImageVerticalAlignment":"START","backgroundColorDisplayMode":"COLOR_DISPLAY_MODE_VERTICAL_GRADIENT","secondaryBackgroundColor":{"__typename":"ColorValue","rgb":"666cdf","alpha":"ff"},"nameColor":null,"nameTreatment":"NAME_TREATMENT_TEXT","postNameTreatment":"NAME_TREATMENT_LOGO","logoImage":null,"logoScale":"HEADER_SCALE_MEDIUM","taglineColor":null,"taglineTreatment":"TAGLINE_TREATMENT_SIDEBAR"},"navigation":null},"UserViewerEdge:userId:e29281884d6-viewerId:lo_35617133e23a":{"id":"userId:e29281884d6-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","createdAt":0,"lastPostCreatedAt":0,"isFollowing":false,"isUser":false},"NewsletterV3:132a7e74e235":{"id":"132a7e74e235","__typename":"NewsletterV3","type":"NEWSLETTER_TYPE_AUTHOR","slug":"e29281884d6","name":"e29281884d6","collection":null,"user":{"__ref":"User:e29281884d6"},"description":"","promoHeadline":"","promoBody":"","replyToEmail":"","showPromo":false,"subscribersCount":2},"Post:a1aaccdc5b26":{"id":"a1aaccdc5b26","__typename":"Post"},"Paragraph:457a32e5316b_0":{"id":"457a32e5316b_0","__typename":"Paragraph","name":"750f","text":"Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF Model)","type":"H3","href":null,"layout":null,"metadata":null,"hasDropCap":null,"iframe":null,"mixtapeMetadata":null,"markups":[{"__typename":"Markup","start":32,"end":77,"type":"STRONG","href":null,"anchorType":null,"userId":null,"linkMetadata":null}],"dropCapImage":null},"Paragraph:457a32e5316b_1":{"id":"457a32e5316b_1","__typename":"Paragraph","name":"f1ec","text":"Photo by Luca Bravo on 
Unsplash","type":"IMG","href":null,"layout":"FULL_WIDTH","metadata":{"__ref":"ImageMetadata:1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg"},"hasDropCap":null,"iframe":null,"mixtapeMetadata":null,"markups":[{"__typename":"Markup","start":9,"end":19,"type":"A","href":"https:\u002F\u002Funsplash.com\u002Fphotos\u002FO453M2Liufs?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText","anchorType":"LINK","userId":null,"linkMetadata":null},{"__typename":"Markup","start":23,"end":31,"type":"A","href":"https:\u002F\u002Funsplash.com\u002Fsearch\u002Fphotos\u002Fnature?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText","anchorType":"LINK","userId":null,"linkMetadata":null}],"dropCapImage":null},"Paragraph:457a32e5316b_2":{"id":"457a32e5316b_2","__typename":"Paragraph","name":"260f","text":"In this article, I will explain some core text-processing concepts used when applying machine learning to classify documents into categories. This is part 2 of the series outlined below:","type":"P","href":null,"layout":null,"metadata":null,"hasDropCap":null,"iframe":null,"mixtapeMetadata":null,"markups":[],"dropCapImage":null},"Paragraph:457a32e5316b_3":{"id":"457a32e5316b_3","__typename":"Paragraph","name":"3fcd","text":"Part 1: Intuition & How Do We Work With
Documents?","type":"P","href":null,"layout":null,"metadata":null,"hasDropCap":null,"iframe":null,"mixtapeMetadata":null,"markups":[{"__typename":"Markup","start":0,"end":50,"type":"STRONG","href":null,"anchorType":null,"userId":null,"linkMetadata":null}],"dropCapImage":null},"ImageMetadata:1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg":{"id":"1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg","__typename":"ImageMetadata","originalHeight":3264,"originalWidth":4896,"focusPercentX":null,"focusPercentY":null,"alt":null},"Tag:naturallanguageprocessing":{"id":"naturallanguageprocessing","__typename":"Tag","displayTitle":"Naturallanguageprocessing","normalizedTagSlug":"naturallanguageprocessing"},"Tag:nlp":{"id":"nlp","__typename":"Tag","displayTitle":"NLP","normalizedTagSlug":"nlp"},"Tag:python":{"id":"python","__typename":"Tag","displayTitle":"Python","normalizedTagSlug":"python"},"Tag:machine-learning":{"id":"machine-learning","__typename":"Tag","displayTitle":"Machine Learning","normalizedTagSlug":"machine-learning"},"Tag:classification":{"id":"classification","__typename":"Tag","displayTitle":"Classification","normalizedTagSlug":"classification"},"ImageMetadata:1*vgH_tK_6C8vca9w9kjYwcQ.png":{"id":"1*vgH_tK_6C8vca9w9kjYwcQ.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"UserViewerEdge:userId:a7461b133f1d-viewerId:lo_35617133e23a":{"id":"userId:a7461b133f1d-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"User:a7461b133f1d":{"id":"a7461b133f1d","__typename":"User","name":"rhome","username":"rhome","bio":"Engineer, gravitating in science fields. 
Passionate about math, physics, computing and artificial intelligence.","imageId":"2*1bHtzIVIGmWIrooXLwHrtg.jpeg","mediumMemberAt":1561341267000,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:a7461b133f1d-viewerId:lo_35617133e23a"},"viewerIsUser":false,"newsletterV3":null,"customDomainState":null,"hasSubdomain":false,"postSubscribeMembershipUpsellShownAt":0},"Post:26d5a993692b":{"id":"26d5a993692b","__typename":"Post","title":"Automatic Differentiation","mediumUrl":"https:\u002F\u002Fmedium.com\u002F@rhome\u002Fautomatic-differentiation-26d5a993692b","previewImage":{"__ref":"ImageMetadata:1*vgH_tK_6C8vca9w9kjYwcQ.png"},"isPublished":true,"firstPublishedAt":1570969581354,"readingTime":6.997169811320754,"statusForCollection":null,"isLocked":false,"visibility":"PUBLIC","collection":null,"creator":{"__ref":"User:a7461b133f1d"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*[email protected]":{"id":"1*[email protected]","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"CollectionViewerEdge:collectionId:7219b4dc6c4c-viewerId:lo_35617133e23a":{"id":"collectionId:7219b4dc6c4c-viewerId:lo_35617133e23a","__typename":"CollectionViewerEdge","isEditor":false},"Collection:7219b4dc6c4c":{"id":"7219b4dc6c4c","__typename":"Collection","name":"Analytics Vidhya","description":"Analytics Vidhya is a community of Analytics and Data Science professionals. 
We are building the next-gen data science ecosystem https:\u002F\u002Fwww.analyticsvidhya.com","tagline":"Analytics Vidhya is a community of Analytics and Data…","domain":null,"slug":"analytics-vidhya","isAuroraEligible":true,"isAuroraVisible":false,"viewerEdge":{"__ref":"CollectionViewerEdge:collectionId:7219b4dc6c4c-viewerId:lo_35617133e23a"},"canToggleEmail":false},"UserViewerEdge:userId:f4a3547e3a22-viewerId:lo_35617133e23a":{"id":"userId:f4a3547e3a22-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"NewsletterV3:a5edf2bac5c4":{"id":"a5edf2bac5c4","__typename":"NewsletterV3","type":"NEWSLETTER_TYPE_AUTHOR","slug":"f4a3547e3a22","name":"f4a3547e3a22","collection":null,"user":{"__ref":"User:f4a3547e3a22"}},"User:f4a3547e3a22":{"id":"f4a3547e3a22","__typename":"User","name":"Manika Nagpal","username":"manikanagpal","newsletterV3":{"__ref":"NewsletterV3:a5edf2bac5c4"},"bio":"Writer at ProjectPro | UTSIP-2017| ISRO SRFP-2016| DU Innovation Project 2015-16| Physics - University of Delhi| \"Knowing is not knowing. 
\"","imageId":"1*mes2V_Bse1EADyeyrYC_Dw.jpeg","mediumMemberAt":0,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:f4a3547e3a22-viewerId:lo_35617133e23a"},"viewerIsUser":false,"customDomainState":{"__typename":"CustomDomainState","live":{"__typename":"CustomDomain","domain":"manikanagpal.medium.com"}},"hasSubdomain":true,"postSubscribeMembershipUpsellShownAt":0},"Post:d640aa2c915d":{"id":"d640aa2c915d","__typename":"Post","title":"(Part 4)The goblet of Convolutional- Neural Network(CNN)","mediumUrl":"https:\u002F\u002Fmedium.com\u002Fanalytics-vidhya\u002Fpart-4-the-goblet-of-convolutional-neural-network-cnn-d640aa2c915d","previewImage":{"__ref":"ImageMetadata:1*[email protected]"},"isPublished":true,"firstPublishedAt":1595774513846,"readingTime":8.10188679245283,"statusForCollection":"APPROVED","isLocked":false,"visibility":"PUBLIC","collection":{"__ref":"Collection:7219b4dc6c4c"},"creator":{"__ref":"User:f4a3547e3a22"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*eITEi8J4GI1dwMqCrgQGkw.png":{"id":"1*eITEi8J4GI1dwMqCrgQGkw.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"CollectionViewerEdge:collectionId:6ea408ec434d-viewerId:lo_35617133e23a":{"id":"collectionId:6ea408ec434d-viewerId:lo_35617133e23a","__typename":"CollectionViewerEdge","isEditor":false},"Collection:6ea408ec434d":{"id":"6ea408ec434d","__typename":"Collection","name":"learn data science","description":"Unpacking Data Science One Step At A Time","tagline":"Unpacking Data Science One Step At A 
Time","domain":"blog.exploratory.io","slug":"learn-dplyr","isAuroraEligible":false,"isAuroraVisible":false,"viewerEdge":{"__ref":"CollectionViewerEdge:collectionId:6ea408ec434d-viewerId:lo_35617133e23a"},"canToggleEmail":false},"UserViewerEdge:userId:1bfa80768afa-viewerId:lo_35617133e23a":{"id":"userId:1bfa80768afa-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"User:1bfa80768afa":{"id":"1bfa80768afa","__typename":"User","name":"Kan Nishida","username":"kanaugust","bio":"CEO \u002F Founder at Exploratory(https:\u002F\u002Fexploratory.io\u002F). Having fun analyzing interesting data and learning something new everyday.","imageId":"1*3EPEaiFps9Ds3GpqV8pNmA.png","mediumMemberAt":0,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:1bfa80768afa-viewerId:lo_35617133e23a"},"viewerIsUser":false,"newsletterV3":null,"customDomainState":null,"hasSubdomain":false,"postSubscribeMembershipUpsellShownAt":0},"Post:fdcf321e2d7d":{"id":"fdcf321e2d7d","__typename":"Post","title":"Quick Introduction to Logistic Regression in 
Exploratory","mediumUrl":"https:\u002F\u002Fblog.exploratory.io\u002Fquick-introduction-to-logistic-regression-in-exploratory-fdcf321e2d7d","previewImage":{"__ref":"ImageMetadata:1*eITEi8J4GI1dwMqCrgQGkw.png"},"isPublished":true,"firstPublishedAt":1485284230954,"readingTime":5.702830188679245,"statusForCollection":"APPROVED","isLocked":false,"visibility":"PUBLIC","collection":{"__ref":"Collection:6ea408ec434d"},"creator":{"__ref":"User:1bfa80768afa"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*p0RW19vl_23Cm2DajRWeHw.png":{"id":"1*p0RW19vl_23Cm2DajRWeHw.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"CollectionViewerEdge:collectionId:7f60cf5620c9-viewerId:lo_35617133e23a":{"id":"collectionId:7f60cf5620c9-viewerId:lo_35617133e23a","__typename":"CollectionViewerEdge","isEditor":false},"Collection:7f60cf5620c9":{"id":"7f60cf5620c9","__typename":"Collection","name":"Towards Data Science","description":"Your home for data science. A Medium publication sharing concepts, ideas and codes.","tagline":"A Medium publication sharing concepts, ideas and codes.","domain":"towardsdatascience.com","slug":"towards-data-science","isAuroraEligible":true,"isAuroraVisible":true,"viewerEdge":{"__ref":"CollectionViewerEdge:collectionId:7f60cf5620c9-viewerId:lo_35617133e23a"},"canToggleEmail":true},"UserViewerEdge:userId:4a56bf1f84c-viewerId:lo_35617133e23a":{"id":"userId:4a56bf1f84c-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"User:4a56bf1f84c":{"id":"4a56bf1f84c","__typename":"User","name":"Sanand Patel","username":"spx905","bio":"I live and work in Toronto, Canada. 
Please view my detailed profile at www.linkedin.com\u002Fin\u002Fsanandpatel","imageId":"1*dmbNkD5D-u45r44go_cf0g.png","mediumMemberAt":0,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:4a56bf1f84c-viewerId:lo_35617133e23a"},"viewerIsUser":false,"newsletterV3":null,"customDomainState":null,"hasSubdomain":false,"postSubscribeMembershipUpsellShownAt":0},"Post:c6e308382926":{"id":"c6e308382926","__typename":"Post","title":"My Experiments In Replacing Deep Learning Backpropagation (SGD) With A Genetic Algorithm","mediumUrl":"https:\u002F\u002Ftowardsdatascience.com\u002Fmy-experiments-in-replacing-deep-learning-backpropagation-sgd-with-a-genetic-algorithm-c6e308382926","previewImage":{"__ref":"ImageMetadata:1*p0RW19vl_23Cm2DajRWeHw.png"},"isPublished":true,"firstPublishedAt":1570577988932,"readingTime":5.712578616352201,"statusForCollection":"APPROVED","isLocked":true,"visibility":"LOCKED","collection":{"__ref":"Collection:7f60cf5620c9"},"creator":{"__ref":"User:4a56bf1f84c"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*b6yWdZ9dav_JiJz7llHK3w.png":{"id":"1*b6yWdZ9dav_JiJz7llHK3w.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"UserViewerEdge:userId:3277019e1267-viewerId:lo_35617133e23a":{"id":"userId:3277019e1267-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"User:3277019e1267":{"id":"3277019e1267","__typename":"User","name":"Alfred Sasko","username":"alfred-sasko","bio":"Principal Data 
Scientist","imageId":"2*WpBs_VX2ipR7IyykbhFdJg.jpeg","mediumMemberAt":1577644313995,"isPartnerProgramEnrolled":true,"viewerEdge":{"__ref":"UserViewerEdge:userId:3277019e1267-viewerId:lo_35617133e23a"},"viewerIsUser":false,"newsletterV3":null,"customDomainState":{"__typename":"CustomDomainState","live":{"__typename":"CustomDomain","domain":"alfred-sasko.medium.com"}},"hasSubdomain":true,"postSubscribeMembershipUpsellShownAt":0},"Post:2731ce8c0163":{"id":"2731ce8c0163","__typename":"Post","title":"Multilingual Document Classification","mediumUrl":"https:\u002F\u002Ftowardsdatascience.com\u002Fmultilingual-document-classification-2731ce8c0163","previewImage":{"__ref":"ImageMetadata:1*b6yWdZ9dav_JiJz7llHK3w.png"},"isPublished":true,"firstPublishedAt":1591198063621,"readingTime":5.470754716981133,"statusForCollection":"APPROVED","isLocked":true,"visibility":"LOCKED","collection":{"__ref":"Collection:7f60cf5620c9"},"creator":{"__ref":"User:3277019e1267"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*GI1vtYqwtrwXn4pub5Oe-g.png":{"id":"1*GI1vtYqwtrwXn4pub5Oe-g.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"UserViewerEdge:userId:ac113ce7aeda-viewerId:lo_35617133e23a":{"id":"userId:ac113ce7aeda-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"NewsletterV3:c74a76a005a2":{"id":"c74a76a005a2","__typename":"NewsletterV3","type":"NEWSLETTER_TYPE_AUTHOR","slug":"ac113ce7aeda","name":"ac113ce7aeda","collection":null,"user":{"__ref":"User:ac113ce7aeda"}},"User:ac113ce7aeda":{"id":"ac113ce7aeda","__typename":"User","name":"abhinaya rajaram","username":"abhinaya-sridhar-rajaram","newsletterV3":{"__ref":"NewsletterV3:c74a76a005a2"},"bio":"Data in Law","imageId":"1*[email 
protected]","mediumMemberAt":1607460904000,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:ac113ce7aeda-viewerId:lo_35617133e23a"},"viewerIsUser":false,"customDomainState":{"__typename":"CustomDomainState","live":{"__typename":"CustomDomain","domain":"abhinaya-sridhar-rajaram.medium.com"}},"hasSubdomain":true,"postSubscribeMembershipUpsellShownAt":0},"Post:33d39165ae6b":{"id":"33d39165ae6b","__typename":"Post","title":"Sampling Methods to deal with Class Imbalance","mediumUrl":"https:\u002F\u002Fabhinaya-sridhar-rajaram.medium.com\u002Fimbalance-in-class-33d39165ae6b","previewImage":{"__ref":"ImageMetadata:1*GI1vtYqwtrwXn4pub5Oe-g.png"},"isPublished":true,"firstPublishedAt":1624486287304,"readingTime":3.7047169811320755,"statusForCollection":null,"isLocked":false,"visibility":"PUBLIC","collection":null,"creator":{"__ref":"User:ac113ce7aeda"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*uPho9sPUsiJfQCU-94oHzA.png":{"id":"1*uPho9sPUsiJfQCU-94oHzA.png","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"CollectionViewerEdge:collectionId:e96e88f703e2-viewerId:lo_35617133e23a":{"id":"collectionId:e96e88f703e2-viewerId:lo_35617133e23a","__typename":"CollectionViewerEdge","isEditor":false},"Collection:e96e88f703e2":{"id":"e96e88f703e2","__typename":"Collection","name":"Augmented Startups","description":"Augmented Startups — YouTuber 91000+ Subscribers","tagline":"Robotics | Computer Vision | AI | 
AR","domain":null,"slug":"augmented-startups","isAuroraEligible":true,"isAuroraVisible":false,"viewerEdge":{"__ref":"CollectionViewerEdge:collectionId:e96e88f703e2-viewerId:lo_35617133e23a"},"canToggleEmail":false},"UserViewerEdge:userId:77cb35c0e61c-viewerId:lo_35617133e23a":{"id":"userId:77cb35c0e61c-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"NewsletterV3:bbde2c5ad3f9":{"id":"bbde2c5ad3f9","__typename":"NewsletterV3","type":"NEWSLETTER_TYPE_AUTHOR","slug":"77cb35c0e61c","name":"77cb35c0e61c","collection":null,"user":{"__ref":"User:77cb35c0e61c"}},"User:77cb35c0e61c":{"id":"77cb35c0e61c","__typename":"User","name":"Aditya Singh","username":"adityametier","newsletterV3":{"__ref":"NewsletterV3:bbde2c5ad3f9"},"bio":"","imageId":"1*WTswBVdDG6PtaWmuD62acw.jpeg","mediumMemberAt":0,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:77cb35c0e61c-viewerId:lo_35617133e23a"},"viewerIsUser":false,"customDomainState":null,"hasSubdomain":false,"postSubscribeMembershipUpsellShownAt":0},"Post:47df3e914cf8":{"id":"47df3e914cf8","__typename":"Post","title":"YOLOP: Single Shot Panoptic Driving 
Perception","mediumUrl":"https:\u002F\u002Fmedium.com\u002Faugmented-startups\u002Fyolop-single-shot-panoptic-driving-perception-47df3e914cf8","previewImage":{"__ref":"ImageMetadata:1*uPho9sPUsiJfQCU-94oHzA.png"},"isPublished":true,"firstPublishedAt":1631600019356,"readingTime":5.3254716981132075,"statusForCollection":"APPROVED","isLocked":false,"visibility":"PUBLIC","collection":{"__ref":"Collection:e96e88f703e2"},"creator":{"__ref":"User:77cb35c0e61c"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"ImageMetadata:1*zAHne2Liz8RpCfTgqbCwYw.gif":{"id":"1*zAHne2Liz8RpCfTgqbCwYw.gif","__typename":"ImageMetadata","focusPercentX":null,"focusPercentY":null},"UserViewerEdge:userId:9e828c14da26-viewerId:lo_35617133e23a":{"id":"userId:9e828c14da26-viewerId:lo_35617133e23a","__typename":"UserViewerEdge","isFollowing":false,"isUser":false},"User:9e828c14da26":{"id":"9e828c14da26","__typename":"User","name":"Venkatesh Chandra","username":"venkatesh.mcgill","bio":"Data Scientist","imageId":"2*mmjlMMm9OcFEVublK3Hmjw.jpeg","mediumMemberAt":0,"isPartnerProgramEnrolled":false,"viewerEdge":{"__ref":"UserViewerEdge:userId:9e828c14da26-viewerId:lo_35617133e23a"},"viewerIsUser":false,"newsletterV3":null,"customDomainState":null,"hasSubdomain":false,"postSubscribeMembershipUpsellShownAt":0},"Post:a25279fdc26c":{"id":"a25279fdc26c","__typename":"Post","title":"Masking an area in a video in OpenCV in Python— Harry Potter Invisible Cloak 
example","mediumUrl":"https:\u002F\u002Fmedium.com\u002Fanalytics-vidhya\u002Fmasking-an-area-in-a-video-in-opencv-in-python-harry-potter-invisible-cloak-example-a25279fdc26c","previewImage":{"__ref":"ImageMetadata:1*zAHne2Liz8RpCfTgqbCwYw.gif"},"isPublished":true,"firstPublishedAt":1581783635631,"readingTime":3.4301886792452834,"statusForCollection":"APPROVED","isLocked":false,"visibility":"PUBLIC","collection":{"__ref":"Collection:7219b4dc6c4c"},"creator":{"__ref":"User:9e828c14da26"},"previewContent":{"__typename":"PreviewContent","isFullContent":false}},"PostViewerEdge:postId:eaa26d16c719-viewerId:lo_35617133e23a":{"id":"postId:eaa26d16c719-viewerId:lo_35617133e23a","__typename":"PostViewerEdge","catalogsConnection":null},"Post:eaa26d16c719":{"id":"eaa26d16c719","__typename":"Post","creator":{"__ref":"User:e29281884d6"},"canonicalUrl":"","collection":{"__ref":"Collection:a4d68c1f6803"},"content({\"postMeteringOptions\":{\"referrer\":\"https:\u002F\u002Fwww.google.com\u002F\",\"sk\":null,\"source\":null}})":{"__typename":"PostContent","isLockedPreviewOnly":true,"validatedShareKey":"","bodyModel":{"__typename":"RichText","paragraphs":[{"__ref":"Paragraph:457a32e5316b_0"},{"__ref":"Paragraph:457a32e5316b_1"},{"__ref":"Paragraph:457a32e5316b_2"},{"__ref":"Paragraph:457a32e5316b_3"}],"sections":[{"__typename":"Section","name":null,"startIndex":0,"textLayout":null,"imageLayout":null,"backgroundImage":null,"videoLayout":null,"backgroundVideo":null}]}},"customStyleSheet":{"__ref":"CustomStyleSheet:7fd29fa259f3"},"firstPublishedAt":1519399936956,"isLocked":true,"isPublished":true,"isShortform":false,"layerCake":0,"primaryTopic":null,"title":"Document Classification Part 2: Text Processing (N-Gram Model & TF-IDF 
Model)","isMarkedPaywallOnly":false,"mediumUrl":"https:\u002F\u002Fmedium.com\u002Fmachine-learning-intuition\u002Fdocument-classification-part-2-text-processing-eaa26d16c719","isLimitedState":false,"inResponseToPostResult":null,"inResponseToCatalogResult":null,"visibility":"LOCKED","license":"ALL_RIGHTS_RESERVED","allowResponses":true,"newsletterId":"","sequence":null,"tags":[{"__ref":"Tag:naturallanguageprocessing"},{"__ref":"Tag:nlp"},{"__ref":"Tag:python"},{"__ref":"Tag:machine-learning"},{"__ref":"Tag:classification"}],"topics":[{"__typename":"Topic","topicId":"1eca0103fff3","name":"Machine Learning"}],"isNewsletter":false,"isPublishToEmail":false,"socialTitle":"","socialDek":"","noIndex":null,"curationStatus":null,"metaDescription":"","latestPublishedAt":1524131893362,"readingTime":6.969182389937107,"previewContent":{"__typename":"PreviewContent","subtitle":"In this article I will explain some core concepts in text processing in conducting machine learning on documents to classify them 
into…"},"previewImage":{"__ref":"ImageMetadata:1*UhbeQoqOi9AaSqNnPS0TkQ.jpeg"},"clapCount":851,"postResponses":{"__typename":"PostResponses","count":4},"isSuspended":false,"pendingCollection":null,"statusForCollection":"APPROVED","lockedSource":"LOCKED_POST_SOURCE_UGC","pinnedAt":0,"pinnedByCreatorAt":0,"curationEligibleAt":0,"responseDistribution":"NOT_DISTRIBUTED","internalLinks({\"paging\":{\"limit\":8}})":{"__typename":"InternalLinksConnection","items":[{"__ref":"Post:26d5a993692b"},{"__ref":"Post:d640aa2c915d"},{"__ref":"Post:fdcf321e2d7d"},{"__ref":"Post:c6e308382926"},{"__ref":"Post:2731ce8c0163"},{"__ref":"Post:33d39165ae6b"},{"__ref":"Post:47df3e914cf8"},{"__ref":"Post:a25279fdc26c"}]},"viewerEdge":{"__ref":"PostViewerEdge:postId:eaa26d16c719-viewerId:lo_35617133e23a"},"collaborators":[],"translationSourcePost":null,"inResponseToMediaResource":null,"audioVersionUrl":"","seoTitle":"","updatedAt":1623743612708,"shortformType":"SHORTFORM_TYPE_LINK","structuredData":"","seoDescription":"","isIndexable":true,"latestPublishedVersion":"457a32e5316b","voterCount":146,"recommenders":[],"content({})":{"__typename":"PostContent","isLockedPreviewOnly":true,"validatedShareKey":"","bodyModel":{"__typename":"RichText","paragraphs":[{"__ref":"Paragraph:457a32e5316b_0"},{"__ref":"Paragraph:457a32e5316b_1"},{"__ref":"Paragraph:457a32e5316b_2"},{"__ref":"Paragraph:457a32e5316b_3"}],"sections":[{"__typename":"Section","name":null,"startIndex":0,"textLayout":null,"imageLayout":null,"backgroundImage":null,"videoLayout":null,"backgroundVideo":null}]}}}}</script><script>window.__MIDDLEWARE_STATE__={"session":{"xsrf":"bac3496f3b78"},"cache":{"cacheStatus":"HIT"}}</script><script src="https://cdn-client.medium.com/lite/static/js/manifest.a4404b0b.js"></script><script src="https://cdn-client.medium.com/lite/static/js/36804.29338eda.js"></script><script src="https://cdn-client.medium.com/lite/static/js/main.23f53594.js"></script><script 
src="https://cdn-client.medium.com/lite/static/js/45573.4354ed57.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/instrumentation.46e170b7.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/reporting.0a3746f4.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/81144.478f446d.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/11034.d256484f.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/90192.ba099145.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/79088.e4863540.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/81645.c8a01874.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/70832.444ac173.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/63303.da52dbf3.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/80685.98eaf21e.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/50006.f237604f.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/26022.606a1a5e.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/5850.2cc3e6a0.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/92397.6c801126.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/11615.6d046961.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/82529.1e6efc63.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/5055.da1a97c1.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/79851.0c6f9f31.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/79356.4527cdd4.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/22026.dbbd9f6f.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/36851.092d71cd.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/33673.de5f47de.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/95972.996c4300.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/11366.069ea1f1.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/60519.f409c2f9.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/62182.91cdfb4e.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/35285.dc03faaf.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/76155.a4aa3f03.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/9972.269c800c.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/43642.37bf25d2.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/46463.3c01e067.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/10733.13d16b41.chunk.js"></script>
<script src="https://cdn-client.medium.com/lite/static/js/Post.015510a3.chunk.js"></script><script>window.main();</script><script defer src="https://static.cloudflareinsights.com/beacon.min.js" data-cf-beacon='{"rayId":"6a2b8458f4d20dbc","token":"0b5f665943484354a59c39c6833f7078","version":"2021.10.0","si":100}'></script>
</body></html>