Commit
Update gc4-corpus.md
PhilipMay authored Apr 7, 2024
1 parent 92d4586 commit ba78c23
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion source/projects/gc4-corpus.md
@@ -3,7 +3,7 @@ The German colossal, cleaned Common Crawl corpus.

This is a German text corpus which is based on [Common Crawl](https://commoncrawl.org/). It has been cleaned up and preprocessed and can be used for various tasks in the NLP field. For example, for the self-supervised training of language models.

-GC4 has been created by [**Philipp Reißel**](https://www.linkedin.com/in/philipp-reissel/) from [ambeRoad](https://amberoad.de/) with support from [Philip May](https://may.la/) from [Deutsche Telekom](https://www.telekom.de/).
+GC4 has been created by [**Philipp Reißel**](https://www.linkedin.com/in/philipp-reissel/) from [ambeRoad](https://amberoad.de/) with support from [Philip May](https://philipmay.org) from [Deutsche Telekom](https://www.telekom.de/).
Many thanks to [iisys](https://www.iisys.de/) (the Institute of Information Systems Hof University) for hosting this dataset.

**For [download](#download) scroll way down.**