Commit

re-train the model
dogancanbakir committed Sep 3, 2020
1 parent 873c452 commit 9358f83
Showing 5 changed files with 12 additions and 5 deletions.
7 changes: 7 additions & 0 deletions CHANGES.rst
@@ -3,6 +3,13 @@ Changes

.. contents::

+0.4.0 (2020-09-02)
+------------------
+
+- pretty print spider errors
+- extend site list
+- re-crawled and re-trained
+
0.3.1 (2020-07-02)
------------------

6 changes: 3 additions & 3 deletions README.rst
@@ -47,9 +47,9 @@ You can also create a classifier explicitly::
Development
-----------

-Classifier is trained on 37486 pages from 6660 domains, with 404 page ratio of about 1/3.
-With 10-fold cross-validation, PR AUC (average precision) is 0.990 ± 0.006,
-and ROC AUC is 0.994 ± 0.004.
+Classifier is trained on 198801 pages from 35995 domains, with 404 page ratio of about 1/3.
+With 10-fold cross-validation, PR AUC (average precision) is 0.991 ± 0.002,
+and ROC AUC is 0.995 ± 0.002.


Getting data for training
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

 setup(
     name='soft-404',
-    version='0.3.1',
+    version='0.4.0',
     author='Konstantin Lopuhin',
     author_email='[email protected]',
     description='A classifier for detecting soft 404 pages',
Binary file modified soft404/clf.joblib
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/test_predict.py
@@ -4,7 +4,7 @@
 def test_predict_classifier():
     clf = Soft404Classifier()
     assert clf.predict('<h1>page not found: 404 error</h1>') > 0.9
-    assert clf.predict('<h1>hi here!</h1> just a page') < 0.6
+    assert clf.predict('<h1>hi here!</h1> just a page') < 0.5


def test_predict_function():
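
The README hunk in this commit reports 10-fold cross-validation results as PR AUC (average precision) and ROC AUC, each with a ± spread. As a rough illustration of what those figures measure, here is a minimal pure-Python sketch. It is not the project's training or evaluation code: the function names `roc_auc`, `average_precision` and `summarize` are hypothetical, and reading the ± value as the standard deviation across folds is an assumption.

```python
from statistics import mean, pstdev


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    Equal scores keep their input order (the sort is stable).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


def summarize(fold_scores):
    """Return (mean, population std) across folds (assumed meaning of 'x ± y')."""
    return mean(fold_scores), pstdev(fold_scores)


if __name__ == "__main__":
    # Toy data only: 1 marks a soft-404 page, 0 a normal page.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(labels, scores), average_precision(labels, scores))
```

In a real evaluation each fold would yield one ROC AUC and one average precision on its held-out pages, and `summarize` would be applied to the ten per-fold values.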
