Reshape introduction and welcome episodes #26
base: main
Thank you!
Thank you for your pull request 😃
🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}. If you have files that automatically render output (e.g. R Markdown), then you should check for the following:
Rendered Changes
🔍 Inspect the changes: https://github.com/esciencecenter-digital-skills/Natural-language-processing/compare/md-outputs..md-outputs-PR-26
The following changes were observed in the rendered markdown documents:
What does this mean?
If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.
⏱️ Updated at 2024-09-10 12:50:39 +0000
Nice changes! I left some small comments here and there.
You could consider moving the welcome to `index.md`, similar to: https://github.com/carpentries-incubator/deep-learning-intro/blob/2edad370a4e1639f57804324e398b8762cb9a11a/index.md?plain=1#L2 which is rendered like this.
You could also consider adding a prerequisites box like in that lesson, but maybe that already exists somewhere in your lesson or is planned in an issue.
This course is designed to equip researchers in the humanities and social sciences with the foundational
skills needed to carry over text-based research projects.
### What is NLP?
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.
Natural language processing (NLP) is an area of research and application that focuses on making natural (i.e., human) language accessible to computers so that they can be used to perform useful tasks. Research in NLP is highly interdisciplinary, drawing on concepts from computer science, linguistics, logic, mathematics, psychology, etc. In the past decade, NLP has evolved significantly with advances in technology, especially in the field of deep learning, to the point that it has become embedded in our daily lives: automatic language translation or chatGPT are only some examples.
I feel like you should mention deep learning here since so much of NLP research and applications are actually deep learning applied to NLP.

After following this lesson, learners will be able to:

- Explain and differentiate what are the core topics in NLP
- Identify what kinds of tasks NLP techniques excel at, and what are their limitations
- Structure a typical NLP pipeline
- Extract vector representations of individual words, visualise and manipulate it
- Extract vector representations of individual words, visualise it and manipulate it
- Extract vector representations of individual words, visualise it and manipulate it
- Extract vector representations of individual words, visualise and manipulate them
- Applying a machine learning algorithm to textual data to extract and categorise names of entities (e.gs., places, people)
- Apply popular tools and libraries used to solve other tasks in NLP (such as topic modelling, and text generation)
- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
- Other?
- Other?
::::::

# Welcome
This is a hands-on introduction to Natural Language Processing (or NLP). NLP refers to a set of techniques involving the application of statistical methods,
with or without insights from linguistics, to understand natural (i.e, human) language for the sake of solving real-world tasks.
This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects.
This course covers core concepts of Natural Language Processing (or NLP) and it is designed to equip researchers in the humanities and social sciences with the foundational skills and knowledge needed to carry over text-based research projects.
This lesson covers core concepts of Natural Language Processing (or NLP). It will equip you with the foundational skills and knowledge needed to carry over text-based research projects. The lesson is designed specifically with researchers in the humanities and social sciences in mind, but is also applicable to other fields of research.
I think even though you focus on SSH, the lesson will still be useful for others.
Also, in Carpentries lesson material it is common to refer to `lesson` instead of `course`.

## Dataset
In this lesson, we'll use N books from the [Project Gutenberg](https://www.gutenberg.org/). We will use their Plain Text UTF-8 version.
For the episode 02: Transformers (BERT)
For the episode 02: Transformers (BERT)
Maybe a minor technicality: I would suggest tracking 'what still needs to be added' in issues, so that the lesson is always a viable product.
- Using natural language to produce a desired response from a large language model (LLM), i.e. prompt engineering
- Other?

## Datasets
I think this should be part of the setup instructions. Or do you have a good reason for not putting it in there?
sciences. We will focus on solving a particular problem over the lesson, that is how to identify key entities in text (such as people,
places, companies, dates and more) and labeling each one of them with the right category name. Towards the end of the lesson,
we will cover also other types of applications (such as topic modelling, and text generation).
This lesson provides a high-level introduction to NLP with particular emphasis on applications in the humanities and the social sciences. We will focus on solving particular problems over the lesson. Some problems deal with capturing the meaning of a word (e.g., `head`) and how this changes over time and context (e.g., `top part of the body` vs `leaders of others`), others with identifying key entities in text (such as people, places, companies, dates and more) in literary texts and labeling each one of them with the right category name. These problems are examples of useful applications in your own research, however, they also offer a window in the latest NLP advancements that are now embedded in our daily life.
I think the way this part is now, it is a bit like: this gives you an idea of the kind of problems we will tackle, but you can forget everything you just read because we will get back to it anyway. Is that what you intended?
If not, you could consider making short subsections and actually naming the problems (word2vec, named entity recognition), and mentioning the episodes in which you will actually cover them.
This PR solves issue #24. The introduction and welcome parts have been simplified and redundant info has been cut. This has effectively merged the intro and welcome into a single episode. Moreover, this episode now immediately introduces an exercise to kick off the course on a good note.
Note that the dataset part is still not completed, as transformers and LLM episodes are still in development and their datasets are not finalised yet.