Reshape introduction and welcome episodes #26

n400peanuts · 2024-09-10T12:43:06Z

This PR resolves issue #24. The introduction and welcome parts have been simplified and redundant information has been cut, which effectively merges the intro and welcome into a single episode. The episode now also opens with an exercise right away, to kick off the course on a good note.

Note that the dataset part is not yet complete, as the transformers and LLM episodes are still in development and their datasets are not finalised yet.

github-actions · 2024-09-10T12:43:21Z

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

🎯 correct output
🖼️ correct figures
❓ new warnings
‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/esciencecenter-digital-skills/Natural-language-processing/compare/md-outputs..md-outputs-PR-26

The following changes were observed in the rendered markdown documents:

 00-welcome.md                              | 63 +++++++++++++------
 01-introduction.md (gone)                  | 97 ------------------------------
 02-preprocessing.md => 01-preprocessing.md |  0
 03-embeddings.md => 02-embeddings.md       |  0
 config.yaml                                |  5 +-
 md5sum.txt                                 |  9 ++-
 6 files changed, 49 insertions(+), 125 deletions(-)

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2024-09-10 12:50:39 +0000

svenvanderburg

Nice changes! I left some small comments here and there.

You could consider moving the welcome to index.md, similar to: https://github.com/carpentries-incubator/deep-learning-intro/blob/2edad370a4e1639f57804324e398b8762cb9a11a/index.md?plain=1#L2 which is rendered like this.

You could also consider adding a prerequisites box like in that lesson, but maybe that already exists somewhere in your lesson or is planned in an issue.

svenvanderburg · 2024-09-30T15:20:20Z

episodes/00-welcome.md

-This course is designed to equip researchers in the humanities and social sciences with the foundational
-skills needed to carry over text-based research projects. 
+### What is NLP?
+Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples. 


Suggested change

Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.

Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology, especially in the field of deep learning, to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.

I feel like you should mention deep learning here, since so much of NLP research and application is actually deep learning applied to NLP.

svenvanderburg · 2024-09-30T15:22:34Z

episodes/00-welcome.md


 After following this lesson, learners will be able to:

 - Explain and differentiate what are the core topics in NLP
 - Identify what kinds of tasks NLP techniques excel at, and what are their limitations
 - Structure a typical NLP pipeline
- Extract vector representations of individual words, visualise and manipulate it
+- Extract vector representations of individual words, visualise it and manipulate it


Suggested change

- Extract vector representations of individual words, visualise it and manipulate it

- Extract vector representations of individual words, visualise and manipulate them

svenvanderburg · 2024-09-30T15:23:20Z

episodes/00-welcome.md

 - Applying a machine learning algorithm to textual data to extract and categorise names of entities (e.gs., places, people)
- Apply popular tools and libraries used to solve other tasks in NLP (such as topic modelling, and text generation)
+- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
+- Other?


Suggested change

- Other?

svenvanderburg · 2024-09-30T15:27:46Z

episodes/00-welcome.md

 ::::::

 # Welcome
-This is a hands-on introduction to Natural Language Processing (or NLP). NLP refers to a set of techniques involving the application of statistical methods, 
-with or without insights from linguistics, to understand natural (i.e, human) language for the sake of solving real-world tasks.
+This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects. 


Suggested change

This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects.

This lesson covers core concepts of Natural Language Processing (or NLP) . It will equip you with the foundational skills and knowledge needed to carry over text-based research projects. The lesson is designed specifically with researchers in the humanities and social sciences in mind, but is also applicable to other fields of research.

I think even though you focus on SSH, the lesson will still be useful for others.

Also, in Carpentries lesson material it is common to refer to a "lesson" instead of a "course".

svenvanderburg · 2024-09-30T15:31:39Z

episodes/00-welcome.md


-## Dataset
-In this lesson, we'll use N books from the [Project Gutenberg](https://www.gutenberg.org/). We will use their Plain Text UTF-8 version.
+For the episode 02: Transformers (BERT)


Suggested change

For the episode 02: Transformers (BERT)

Maybe a minor technicality: I would suggest tracking 'what still needs to be added' in issues, so that the lesson is always a viable product.

svenvanderburg · 2024-09-30T15:32:00Z

episodes/00-welcome.md

+- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
+- Other?
+
+## Datasets


I think this should be part of the setup instructions. Or do you have a good reason for not putting it in there?

svenvanderburg · 2024-09-30T15:37:38Z

episodes/00-welcome.md

-sciences. We will focus on solving a particular problem over the lesson, that is how to identify key entities in text (such as people,
-places, companies, dates and more) and labeling each one of them with the right category name. Towards the end of the lesson,
-we will cover also other types of applications (such as topic modelling, and text generation).
+This lesson provides a high-level introduction to NLP with particular emphasis on applications in the humanities and the social sciences. We will focus on solving particular problems over the lesson. Some problems deal with capturing the meaning of a word (e.g., `head`) and how this changes over time and context (e.g., `top part of the body` vs `leaders of others`), others with identifying key entities in text (such as people, places, companies, dates and more) in literary texts and labeling each one of them with the right category name. These problems are examples of useful applications in your own research, however, they also offer a window in the latest NLP advancements that are now embedded in our daily life.


I think the way this part is now, it is a bit like: this gives you an idea of the kind of problems we will tackle, but you can forget everything you just read because we will get back to it anyway. Is that what you intended?

If not, you could consider making short subsections and actually naming the problems (word2vec, named entity recognition), and mentioning the episodes in which you will cover them.

Eva Viviani added 6 commits September 10, 2024 14:10

reinstated exercise on technologies

3a480d7

merged welcome and introduction into a single episode

8bff624

reshape numbering of episodes

a5ca598

added dataset section

59ecc7f

resolve reference

77470b4

update learning objectives

f446df0

n400peanuts requested a review from svenvanderburg September 10, 2024 12:43

n400peanuts linked an issue Sep 10, 2024 that may be closed by this pull request

reshape introduction and welcome episodes #24

Open

n400peanuts added the enhancement New feature or request label Sep 10, 2024

config file updated to reflect new episode structure

9a7cf9d

github-actions bot pushed a commit that referenced this pull request Sep 10, 2024

differences for PR #26

c8ca207

svenvanderburg approved these changes Sep 30, 2024
