simplify directory structure for framework/ #50

jspaaks · 2020-07-13T10:51:11Z

Below is what I think is needed to prepare the model for use in the web interface.

A couple of observations:

I think we can ditch docker-compose (for /framework, not for /).
I've incorporated English demo dataset #47 (multi language sample data sets).
/preprocessing/Dockerfile is only there to provide hassle-free dependency management, so that people don't have to install conda, Jupyter Notebook, and so on.

Update: updated directory structure below, #50 (comment)

.
├── corpora
│   ├── en-us
│   │   └── police-aggression
│   │       ├── file1.txt
│   │       └── file2.txt
│   ├── fr
│   │   ├── baguettes
│   │   │   ├── file1.txt
│   │   │   └── file2.txt
│   │   └── croissants
│   │       └── file1.txt
│   └── nl
│       └── getuigenverhalen
│           ├── file1.txt
│           ├── file2.txt
│           └── file3.txt
├── docker-compose.yml
├── output
│   ├── README.md
│   ├── tag-1
│   │   ├── model
│   │   ├── report.log
│   │   └── vocabulary
│   ├── tag-2
│   │   ├── model
│   │   ├── report.log
│   │   └── vocabulary
│   └── tag-n
│       ├── model
│       ├── report.log
│       └── vocabulary
├── model-generation
│   ├── Dockerfile
│   └── notebooks
│       ├── corpus_reinferral.ipynb
│       ├── filepaths.ipynb
│       ├── generate_doc2vec_model.ipynb
│       ├── preprocess_corpus.ipynb
│       └── tokenizer.ipynb
└── ui
    ├── indexer
    ├── internal
    ├── testdata
    └── ui


sverhoeven · 2020-07-13T11:46:12Z

A model (aka doc2vec json) is based on a single corpus. How would you store a model for each corpus in this directory structure?
Is a vocabulary a (pre)processed corpus?
Handling different languages is not part of the milestone. I think we should only introduce a language directory if we are going to use it. Can the language dir be removed?

sverhoeven · 2020-07-13T11:47:33Z

output is a confusing directory name, as it is the input for the ui. Can output be renamed or removed?

sverhoeven · 2020-07-13T11:57:33Z

The directory structure ties in with how we want a user to use the repo, which is drafted in #41

jspaaks · 2020-07-13T12:01:41Z

> A model (aka doc2vec json) is based on a single corpus. How would you store a model for each corpus in this directory structure?

I updated the tree to have a tag directory, where a tag serves to identify a given analysis.

> Is a vocabulary a (pre)processed corpus?

Vocabulary is the combination of what is now _ids and _corpus, i.e. whatever is produced by gensim.

> Handling different languages is not part of the milestone. I think we should only introduce a language directory if we are going to use it. Can the language dir be removed?

Just to be clear, your preference is to have corpora/getuigenverhalen instead of corpora/nl/getuigenverhalen, even if nl doesn't have siblings?

sverhoeven · 2020-07-13T12:09:00Z

> > A model (aka doc2vec json) is based on a single corpus. How would you store a model for each corpus in this directory structure?
>
> I updated the tree to have a tag directory, where a tag serves to identify a given analysis.

How would an output/tag-1 output be traceable to one of the corpora? By using the corpus name in the tag name?

> > Is a vocabulary a (pre)processed corpus?
>
> Vocabulary is the combination of what is now _ids and _corpus, i.e. whatever is produced by gensim.

> > Handling different languages is not part of the milestone. I think we should only introduce a language directory if we are going to use it. Can the language dir be removed?
>
> Just to be clear, your preference is to have corpora/getuigenverhalen instead of corpora/nl/getuigenverhalen, even if nl doesn't have siblings?

Yes, corpora/getuigenverhalen is my preference.
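
For anyone who already has data in the corpora/<language>/<name> layout, the flattening to corpora/<name> discussed above could be scripted roughly as follows. This is only a sketch: the function name `flatten_language_dirs` is hypothetical and not part of the repository, and it assumes every directory directly under corpora/ is a language directory.

```python
import shutil
from pathlib import Path


def flatten_language_dirs(corpora_root):
    """Move corpora/<lang>/<name> up to corpora/<name> and drop emptied <lang> dirs.

    Hypothetical helper; assumes every directory directly under
    corpora_root is a language directory.
    """
    root = Path(corpora_root)
    language_dirs = [d for d in sorted(root.iterdir()) if d.is_dir()]
    corpus_dirs = [c for lang in language_dirs for c in sorted(lang.iterdir()) if c.is_dir()]

    # Refuse to continue if two languages hold a corpus with the same name,
    # or if a corpus shares its name with a language directory.
    names = [c.name for c in corpus_dirs]
    taken = {d.name for d in language_dirs}
    clashes = sorted({n for n in names if names.count(n) > 1 or n in taken})
    if clashes:
        raise ValueError(f"corpus names would collide: {clashes}")

    for corpus in corpus_dirs:
        shutil.move(str(corpus), str(root / corpus.name))
    for lang in language_dirs:
        if not any(lang.iterdir()):
            lang.rmdir()  # only remove language directories that are now empty
    return sorted(names)
```

The collision check runs before anything is moved, so a clash leaves the original layout untouched.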

jspaaks · 2020-07-13T15:36:53Z

(Updated directory structure after some discussion)

.
├── docker-compose.yml
├── experiments
│   ├── demo
│   │   ├── config.env
│   │   ├── corpus
│   │   │   ├── file1.txt
│   │   │   ├── file2.txt
│   │   │   └── file3.txt
│   │   ├── model
│   │   ├── report.log
│   │   └── vocabulary.json
│   ├── getuigenverhalen
│   │   ├── config.env
│   │   ├── corpus
│   │   │   ├── file1.txt
│   │   │   ├── file2.txt
│   │   │   └── file3.txt
│   │   ├── model
│   │   ├── report.log
│   │   └── vocabulary.json
│   └── README.md
├── model-generation
│   ├── Dockerfile
│   └── notebooks
│       ├── corpus_reinferral.ipynb
│       ├── filepaths.ipynb
│       ├── generate_doc2vec_model.ipynb
│       ├── preprocess_corpus.ipynb
│       └── tokenizer.ipynb
└── ui
    ├── indexer
    ├── internal
    ├── testdata
    └── ui
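
As a rough illustration of how this layout could be checked, the sketch below walks experiments/ and reports which of the entries shown in the tree are missing per experiment. The function name `find_incomplete_experiments` and the `EXPECTED_ENTRIES` constant are assumptions made for this example and do not exist in the repository. The thread does not say whether model is a file or a directory, so only existence is tested.

```python
from pathlib import Path

# Entries each experiment directory contains in the tree above.
# Hypothetical constant; only existence is checked, not file vs. directory.
EXPECTED_ENTRIES = ("config.env", "corpus", "model", "report.log", "vocabulary.json")


def find_incomplete_experiments(experiments_root):
    """Map each experiment name to the list of expected entries it is missing."""
    missing = {}
    for experiment in sorted(Path(experiments_root).iterdir()):
        if not experiment.is_dir():
            continue  # skip experiments/README.md and other loose files
        absent = [name for name in EXPECTED_ENTRIES if not (experiment / name).exists()]
        if absent:
            missing[experiment.name] = absent
    return missing
```

An experiment that has not been run yet would presumably show up with model, report.log and vocabulary.json missing, which makes the result usable as a simple "what still needs to be generated" overview.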

jspaaks · 2020-07-13T16:54:47Z

working on it in #58

jspaaks · 2020-07-15T13:39:26Z

I think this can be closed.

fdiblen · 2020-07-15T14:03:52Z

implemented in #61

jspaaks mentioned this issue Jul 13, 2020

EviDENce_doc2vec_docker_framework has hardcoded usernames #25

Closed

jspaaks added the standup label Jul 13, 2020

This was referenced Jul 13, 2020

framework's run_evidence_framework.sh generates a bunch of files, #33

Closed

merge docker-compose files #45

Closed

fdiblen removed the standup label Jul 13, 2020

fdiblen added this to the Milestone 1 milestone Jul 13, 2020

jspaaks self-assigned this Jul 13, 2020

This was referenced Jul 13, 2020

refactoring directory layout #58

Merged

running jupyter notebooks in docker #61

Merged

jspaaks added the standup label Jul 15, 2020

fdiblen closed this as completed Jul 15, 2020
