Develop 8. Next Steps 1: comparing corpora #12

drjwbaker · 2020-05-27T16:39:53Z

drjwbaker · 2020-09-25T10:15:25Z

Based on #34 (comment) move to 'Dev Tasks'.

drjwbaker · 2020-09-25T10:25:57Z

Based on #34 (comment) scope here to:

look at unusually frequent vocab
look at two sets of word lists (e.g. MDG + IAMS) for sense of difference in structure of language, emphasis, et cetera.

drjwbaker · 2020-11-12T16:13:02Z

rough outline

intro

This episode introduces potential next steps for comparing corpora. In the context of catalogue data, this is important because it:

provides an alternative point of analysis for recognising the features of the catalogue data under analysis;
can be used to compare sub-sets of catalogue data, e.g. using an exemplar sub-set to understand which linguistic features of the comparison sub-set need adjusting or repairing;
allows comparison of catalogue data with everyday speech, in order to tease out, in an evidential way, the special language that should be used in guides to cataloguing at your institution (because you will probably have a style you want, based on some exemplar cataloguing).

main body

Three parts:

Comparing wordlists. Generate word lists for the BMCSat and BL-IAMS datasets in AntConc. Rename the files to sensible names for comparison. Task: make a list of the words that appear in both top 30s and a list of the words unique to each, then use knowledge from previous episodes to say what is different about BMSat.
Keyness. What it is. What it can be used for. Use AntConc to create a keyness file. Explain negative keyness. Task: a) something basic on reading the results ("what are the five most unusually frequent verbs?") + b) one on interpreting the negative keyness results: what do they tell us BL-IAMS cataloguing is not about, and given what we know about the collection, is that a surprise? (That is, press on the idea that these results cannot be a function of frequency effects in the objects being catalogued alone.) Finally, link to the AS/JB paper section on negative keyness for more info on practical uses.

include 'it' and 'or' from part 1

Comparing concordances in AntConc. Load both BMCSat and BL-IAMS. Note the need to think about the relative sizes of the corpora (File View tab) and the spread of hits across corpora (Concordance Plot view; this can open up the need to order records logically, e.g. to show change over time). Read the concordance lists. Task: compare use of an n-gram (something 'special language' like "towards the rear" that both have).

add keyness and NER files for #12 and #13
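
For anyone wanting to sanity-check the Part 1 task outside AntConc, a rough sketch of the "shared vs unique top-N words" comparison might look like the following. It is only an illustration: the tokeniser (runs of letters), the function names and the toy strings are all assumptions, not the lesson's data or AntConc's own token definition.

```python
import re
from collections import Counter


def word_list(text, lowercase=True):
    """Count word tokens. Treating runs of letters as words is an
    assumption; AntConc's token definition is configurable."""
    if lowercase:
        text = text.lower()
    return Counter(re.findall(r"[^\W\d_]+", text))


def compare_top(counts_a, counts_b, n=30):
    """Split the top-n words of two word lists into shared and unique sets."""
    top_a = {word for word, _ in counts_a.most_common(n)}
    top_b = {word for word, _ in counts_b.most_common(n)}
    return {
        "shared": sorted(top_a & top_b),
        "only_a": sorted(top_a - top_b),
        "only_b": sorted(top_b - top_a),
    }


# Hypothetical toy strings standing in for the two catalogue datasets.
corpus_a = "A man stands behind a table. A woman sits behind the man."
corpus_b = "Letter from the Governor to the Council. Letter to the Governor."

result = compare_top(word_list(corpus_a), word_list(corpus_b), n=5)
for label, words in result.items():
    print(label, words)
```

With real data the two strings would be replaced by the contents of the renamed corpus files, and `n=30` would match the task.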

drjwbaker · 2020-11-13T15:28:07Z

Potentially, use comparing with Photo db subjects as a way of thinking about comparing between parts of the catalogue entry (so, 'description' is not in isolation)

drjwbaker · 2020-12-02T16:25:59Z

@rossi-uk Made a big update today! Are you able to work on the four remaining points at the top of the ticket? #12 (comment)

rossi-uk · 2020-12-02T17:18:34Z

@drjwbaker Thanks, yes, will do before our meeting.

rossi-uk · 2020-12-04T12:38:40Z

For the numbers in line 31, what settings should be used for the wordlist? I had unticked 'treat all as lowercase' in Tool Preferences and got 73295 vs 63100; once ticked, it gives the numbers in the lesson.

rossi-uk · 2020-12-04T14:30:17Z

Keyness section: should we mention that the selection3 file needs to be open and a word list created? After the previous section I was working with the wordlist that I had generated to compare the corpora, and that confused me. Also clarify the settings for the word list: Tool Preferences, untick 'treat all data as lower case'.

rossi-uk · 2020-12-04T14:35:23Z

Re task 2: are we suggesting that people run this with both corpora, then export an IAMS keyness txt file and a BMC keyness txt file, open them side by side in Notepad and compare there?
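
As background for reading those exported files, here is a minimal sketch of how a signed keyness score can be computed for a single word. It uses the log-likelihood statistic, which is one common keyness measure; whether it matches the measure and settings used in the lesson is an assumption, and the function name and example figures are hypothetical.

```python
import math


def log_likelihood_keyness(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness of one word.

    freq_*: occurrences of the word in each corpus.
    size_*: total tokens in each corpus (assumed to be greater than zero).
    A positive score means the word is relatively more frequent in the
    target corpus; a negative score (negative keyness) means it is
    relatively less frequent there than in the reference corpus.
    """
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)

    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    score *= 2

    # Attach the sign by comparing relative frequencies.
    if freq_target / size_target < freq_ref / size_ref:
        score = -score
    return score


# Hypothetical figures: a word seen 50 times in a 1,000-token target
# corpus and 10 times in a 1,000-token reference corpus.
print(log_likelihood_keyness(50, 1000, 10, 1000))
print(log_likelihood_keyness(10, 1000, 50, 1000))
```

Swapping the target and reference corpora flips the sign of the score, which is why it matters which corpus is loaded as the reference when comparing the two exports.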

rossi-uk · 2020-12-04T14:46:05Z

Comparing concordances section: again I get a different number of results for both corpora. What settings are we using in Tool Preferences for the wordlists? I got 3103 results for "behind" across both corpora.
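
To illustrate why hit counts can differ with the case setting, and why corpus size matters when comparing them, a small keyword-in-context sketch is given below. It is an assumption-laden illustration rather than a reproduction of AntConc's behaviour: the whole-word regex matching, the letter-run tokeniser, the function names and the sample sentence are all made up for the example.

```python
import re


def kwic(text, term, width=30, ignore_case=True):
    """Return (left context, match, right context) for each hit of term."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


def hits_per_thousand(text, term, ignore_case=True):
    """Normalise the hit count by corpus size (tokens = runs of letters)."""
    tokens = re.findall(r"[^\W\d_]+", text)
    if not tokens:
        return 0.0
    return 1000 * len(kwic(text, term, ignore_case=ignore_case)) / len(tokens)


# Hypothetical sample text, not taken from either dataset.
sample = "Behind the table stands a man; behind him a dog."
print(len(kwic(sample, "behind")))                     # case-insensitive hits
print(len(kwic(sample, "behind", ignore_case=False)))  # case-sensitive hits
print(hits_per_thousand(sample, "behind"))
```

Running the same term against each corpus separately and comparing the normalised rates, rather than raw counts, is one way to account for the two corpora being different sizes.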

drjwbaker · 2020-12-04T15:13:50Z

From meeting 4/12:

suggest a tool for opening text files
clarify what the screen should look like in keyness section (so no confusion about what is being compared)
put back in the default global/tool settings for this episode
put in sub-headings
add in brief explainer on adding own dataset
update looking *ly as potential task for Part 3 Task
roughly 45 minutes to complete

implement first 5 changes from #12 (comment)

drjwbaker added the "development" label (development of episodes before writing) May 27, 2020

drjwbaker self-assigned this May 27, 2020

drjwbaker changed the title ~~Develop 9. Next Steps 2: comparing corpora~~ Develop 9. Next Steps 1: comparing corpora Sep 25, 2020

drjwbaker changed the title ~~Develop 9. Next Steps 1: comparing corpora~~ Develop 8. Next Steps 1: comparing corpora Sep 25, 2020

drjwbaker pushed a commit that referenced this issue Nov 12, 2020

Add files via upload

ffdfacd

add keyness and NER files for #12 and #13

drjwbaker pushed a commit that referenced this issue Dec 8, 2020

Update 08-comparing.md

6665def

implement first 5 changes from #12 (comment)

drjwbaker added the "final check needed" label and removed the "development" label (development of episodes before writing) Jan 20, 2021
