Reshape introduction and welcome episodes #26

Open
wants to merge 7 commits into
base: main

n400peanuts
Collaborator

This PR resolves issue #24. The introduction and welcome parts have been simplified and redundant information has been cut, which effectively merges the intro and welcome into a single episode. This episode now also introduces an exercise right away, to kick off the course on a good note.

Note that the dataset part is not complete yet, as the transformers and LLM episodes are still in development and their datasets are not finalised.

n400peanuts linked an issue Sep 10, 2024 that may be closed by this pull request
n400peanuts added the enhancement (New feature or request) label Sep 10, 2024

github-actions bot commented Sep 10, 2024

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/esciencecenter-digital-skills/Natural-language-processing/compare/md-outputs..md-outputs-PR-26

The following changes were observed in the rendered markdown documents:

 00-welcome.md                              | 63 +++++++++++++------
 01-introduction.md (gone)                  | 97 ------------------------------
 02-preprocessing.md => 01-preprocessing.md |  0
 03-embeddings.md => 02-embeddings.md       |  0
 config.yaml                                |  5 +-
 md5sum.txt                                 |  9 ++-
 6 files changed, 49 insertions(+), 125 deletions(-)
What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2024-09-10 12:50:39 +0000

github-actions bot pushed a commit that referenced this pull request Sep 10, 2024
Contributor

svenvanderburg left a comment

Nice changes! I left some small comments here and there.

You could consider moving the welcome to index.md, similar to: https://github.com/carpentries-incubator/deep-learning-intro/blob/2edad370a4e1639f57804324e398b8762cb9a11a/index.md?plain=1#L2 which is rendered like this.

You could also consider adding a prerequisites box like in that lesson, but maybe that already exists somewhere in your lesson or is planned in an issue.

This course is designed to equip researchers in the humanities and social sciences with the foundational
skills needed to carry over text-based research projects.
### What is NLP?
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.
Contributor

Suggested change
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology, especially in the field of deep learning, to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.

Contributor

I feel you should mention deep learning here, since so much NLP research and so many NLP applications are in fact deep learning applied to NLP.


After following this lesson, learners will be able to:

- Explain and differentiate what are the core topics in NLP
- Identify what kinds of tasks NLP techniques excel at, and what are their limitations
- Structure a typical NLP pipeline
- Extract vector representations of individual words, visualise and manipulate it
- Extract vector representations of individual words, visualise it and manipulate it
Contributor

Suggested change
- Extract vector representations of individual words, visualise it and manipulate it
- Extract vector representations of individual words, visualise and manipulate them

- Applying a machine learning algorithm to textual data to extract and categorise names of entities (e.gs., places, people)
- Apply popular tools and libraries used to solve other tasks in NLP (such as topic modelling, and text generation)
- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
- Other?
Contributor

Suggested change
- Other?

::::::

# Welcome
This is a hands-on introduction to Natural Language Processing (or NLP). NLP refers to a set of techniques involving the application of statistical methods,
with or without insights from linguistics, to understand natural (i.e, human) language for the sake of solving real-world tasks.
This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects.
Contributor

Suggested change
This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects.
This lesson covers core concepts of Natural Language Processing (or NLP) . It will equip you with the foundational skills and knowledge needed to carry over text-based research projects. The lesson is designed specifically with researchers in the humanities and social sciences in mind, but is also applicable to other fields of research.

I think that even though you focus on SSH (social sciences and humanities), the lesson will still be useful for others.

Contributor

Also, in Carpentries lesson material it is common to say "lesson" instead of "course".


## Dataset
In this lesson, we'll use N books from the [Project Gutenberg](https://www.gutenberg.org/). We will use their Plain Text UTF-8 version.
For the episode 02: Transformers (BERT)
Contributor

Suggested change
For the episode 02: Transformers (BERT)

Maybe a minor technicality: I would suggest tracking "what still needs to be added" in issues, so that the lesson is always a viable product.

- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
- Other?

## Datasets
Contributor

I think this should be part of the setup instructions. Or do you have a good reason for not putting it in there?

sciences. We will focus on solving a particular problem over the lesson, that is how to identify key entities in text (such as people,
places, companies, dates and more) and labeling each one of them with the right category name. Towards the end of the lesson,
we will cover also other types of applications (such as topic modelling, and text generation).
This lesson provides a high-level introduction to NLP with particular emphasis on applications in the humanities and the social sciences. We will focus on solving particular problems over the lesson. Some problems deal with capturing the meaning of a word (e.g., `head`) and how this changes over time and context (e.g., `top part of the body` vs `leaders of others`), others with identifying key entities in text (such as people, places, companies, dates and more) in literary texts and labeling each one of them with the right category name. These problems are examples of useful applications in your own research, however, they also offer a window in the latest NLP advancements that are now embedded in our daily life.
Contributor

svenvanderburg, Sep 30, 2024

I think the way this part is now, it is a bit like: this gives you an idea of the kind of problems we will tackle, but you can forget everything you just read because we will get back to it anyway. Is that what you intended?

If not, you could consider making short subsections that actually name the problems (word2vec, named entity recognition) and mention the episodes in which you will cover them.
