Commit 3579d76

remove "vastly"

slobentanzer committed Feb 15, 2024
1 parent 4538fc5 commit 3579d76
Showing 24 changed files with 4,260 additions and 31 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
# Output directory containing the formatted manuscript

The [`gh-pages`](https://github.com/biocypher/biochatter-paper/tree/gh-pages) branch hosts the contents of this directory at <https://biocypher.github.io/biochatter-paper/>.
-The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/>.
+The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/>.
To redirect to the permalink for the latest manuscript version at anytime, use the link <https://biocypher.github.io/biochatter-paper/v/freeze/>.

## Files
@@ -35,4 +35,4 @@ Verifying timestamps with the `ots verify` command requires running a local bitcoin node.
## Source

The manuscripts in this directory were built from
-[`ccfa0d0db46d487019ff6a8a6ced67f18306f5e2`](https://github.com/biocypher/biochatter-paper/commit/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2).
+[`32a990970f754b4a632acac66c661aed6c8b01c6`](https://github.com/biocypher/biochatter-paper/commit/32a990970f754b4a632acac66c661aed6c8b01c6).
26 changes: 13 additions & 13 deletions index.html
@@ -124,8 +124,8 @@
<meta name="dc.date" content="2024-02-15" />
<meta name="citation_publication_date" content="2024-02-15" />
<meta property="article:published_time" content="2024-02-15" />
<meta name="dc.modified" content="2024-02-15T10:33:39+00:00" />
<meta property="article:modified_time" content="2024-02-15T10:33:39+00:00" />
<meta name="dc.modified" content="2024-02-15T10:38:49+00:00" />
<meta property="article:modified_time" content="2024-02-15T10:38:49+00:00" />
<meta name="dc.language" content="en-UK" />
<meta name="citation_language" content="en-UK" />
<meta name="dc.relation.ispartof" content="Manubot" />
@@ -169,9 +169,9 @@
<meta name="citation_fulltext_html_url" content="https://biocypher.github.io/biochatter-paper/" />
<meta name="citation_pdf_url" content="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="application/pdf" href="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/manuscript.pdf" />
<meta property="og:type" content="article" />
<meta property="twitter:card" content="summary_large_image" />
<link rel="icon" type="image/png" sizes="192x192" href="https://manubot.org/favicon-192x192.png" />
@@ -188,9 +188,9 @@ <h1 class="title">A Platform for the Biomedical Application of Large Language Models</h1>
</header>
<p><small><em>
This manuscript
-(<a href="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/">permalink</a>)
+(<a href="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/">permalink</a>)
was automatically generated
-from <a href="https://github.com/biocypher/biochatter-paper/tree/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2">biocypher/biochatter-paper@ccfa0d0</a>
+from <a href="https://github.com/biocypher/biochatter-paper/tree/32a990970f754b4a632acac66c661aed6c8b01c6">biocypher/biochatter-paper@32a9909</a>
on February 15, 2024.
</em></small></p>
<h2 id="authors">Authors</h2>
@@ -306,7 +306,7 @@ <h2 id="introduction">Introduction</h2>
In addition, biological events are context-dependent, for instance with respect to a cell type or specific disease.</p>
<p>Large Language Models (LLMs) of the current generation, in contrast, can access enormous amounts of knowledge, encoded (incomprehensibly) in their billions of parameters <span class="citation" data-cites="JIjeWPOb IzWFZmuQ 17lpGtuH5 fLS7kvml">[<a href="#ref-JIjeWPOb" role="doc-biblioref">4</a>,<a href="#ref-IzWFZmuQ" role="doc-biblioref">5</a>,<a href="#ref-17lpGtuH5" role="doc-biblioref">6</a>,<a href="#ref-fLS7kvml" role="doc-biblioref">7</a>]</span>.
Trained correctly, they can recall and combine virtually limitless knowledge from their training set.
-ChatGPT has taken the world by storm, and many biomedical researchers already use LLMs in their daily work, for general as well as research tasks <span class="citation" data-cites="viLUfCLq ae7XiPvs wo7jyZHW">[<a href="#ref-viLUfCLq" role="doc-biblioref">8</a>,<a href="#ref-ae7XiPvs" role="doc-biblioref">9</a>,<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>]</span>.
+LLMs have taken the world by storm, and many biomedical researchers already use them in their daily work, for general as well as research tasks <span class="citation" data-cites="viLUfCLq ae7XiPvs wo7jyZHW">[<a href="#ref-viLUfCLq" role="doc-biblioref">8</a>,<a href="#ref-ae7XiPvs" role="doc-biblioref">9</a>,<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>]</span>.
However, the current way of interacting with LLMs is predominantly manual, virtually non-reproducible, and their behaviour can be erratic.
For instance, they are known to confabulate: they make up facts as they go along, and, to make matters worse, are convinced — and convincing — regarding the truth of their confabulations <span class="citation" data-cites="elx4isXx wo7jyZHW">[<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>,<a href="#ref-elx4isXx" role="doc-biblioref">11</a>]</span>.
While current efforts towards Artificial General Intelligence manage to ameliorate some of the shortcomings by ensembling multiple models <span class="citation" data-cites="UEmjXz02">[<a href="#ref-UEmjXz02" role="doc-biblioref">12</a>]</span> with long-term memory stores <span class="citation" data-cites="gy4YOpGJ">[<a href="#ref-gy4YOpGJ" role="doc-biblioref">13</a>]</span>, the current generation of AI does not inspire adequate trust to be applied to biomedical problems without supervision <span class="citation" data-cites="elx4isXx">[<a href="#ref-elx4isXx" role="doc-biblioref">11</a>]</span>.
@@ -318,9 +318,9 @@
<!-- Figure 1 -->
<div id="fig:overview" class="fignos">
<figure>
<img src="images/biochatter_overview.png" alt="Figure 1: The BioChatter composable platform architecture (simplified). LLMs can facilitate many tasks in daily biomedical research practice, for instance, interpretation of experimental results or the use of a web resource (top left). BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs. This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response. In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG). A knowledge graph such as BioCypher [15] can similarly serve as knowledge resource or long-term memory extension of the model. Further, an independent LLM receives the primary response for fact-checking, which can be supplemented with context-specific information by a RAG process. The platform is composable in most aspects, allowing arbitrary extensions to other specialised models for additional tasks orchestrated by the primary LLM." />
<img src="images/biochatter_overview.png" alt="Figure 1: The BioChatter composable platform architecture (simplified). LLMs can facilitate many tasks in daily biomedical research practice, for instance interpretation of experimental results or the use of a web resource (top left). BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs. This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response. In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG). A knowledge graph such as BioCypher [15] can similarly serve as knowledge resource or long-term memory extension of the model. Further, an independent LLM receives the primary response for fact-checking, which can be supplemented with context-specific information by a RAG process. The platform is composable in most aspects, allowing arbitrary extensions to other specialised models for additional tasks orchestrated by the primary LLM." />
<figcaption aria-hidden="true"><span>Figure 1:</span> <strong>The BioChatter composable platform architecture (simplified).</strong>
-LLMs can facilitate many tasks in daily biomedical research practice, for instance, interpretation of experimental results or the use of a web resource (top left).
+LLMs can facilitate many tasks in daily biomedical research practice, for instance interpretation of experimental results or the use of a web resource (top left).
BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs.
This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response.
In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG).
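The primary/secondary-circuit flow described in this caption can be summarised in a few lines of Python. The sketch below is illustrative only: all names (`primary_circuit`, `secondary_circuit`, the stand-in `echo` model and knowledge-graph query) are assumptions for exposition, not BioChatter's actual API.

```python
# Minimal sketch of the two-circuit pattern described in the figure caption.
# All names here are hypothetical illustrations, not BioChatter's API.
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any function mapping a prompt to a response


def primary_circuit(user_input: str, history: List[str], llm: LLM) -> str:
    """Compose engineered prompts and conversation history, then query the primary LLM."""
    prompt = "\n".join(["You are a biomedical research assistant.", *history, user_input])
    return llm(prompt)


def secondary_circuit(response: str, query_kg: Callable[[str], str], checker: LLM) -> Tuple[str, str]:
    """Complement the primary response: prior-knowledge lookup plus an independent fact-check."""
    annotations = query_kg(response)  # e.g. a query against a BioCypher knowledge graph
    verdict = checker(f"Annotations: {annotations}\nFact-check this response: {response}")
    return annotations, verdict


# Toy usage with stand-in callables; a real deployment would plug in actual models.
echo: LLM = lambda prompt: f"[model output for: {prompt[-40:]}]"
answer = primary_circuit("Which pathways are enriched in my DE genes?", [], echo)
notes, check = secondary_circuit(answer, lambda r: "[KG annotations]", echo)
```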
@@ -365,7 +365,7 @@ <h3 id="question-answering-and-llm-connectivity">Question Answering and LLM Connectivity</h3>
To address this issue, we provide access to the different OpenAI models through their API, which is subject to different, more stringent data protection than the web interface <span class="citation" data-cites="C5Z1X3MG">[<a href="#ref-C5Z1X3MG" role="doc-biblioref">21</a>]</span>, most importantly by disallowing reuse of user inputs for subsequent model training.
Further, we aim to preferentially support open-source LLMs to facilitate more transparency in their application and increase data privacy by being able to run a model locally on dedicated hardware and end-user devices <span class="citation" data-cites="17E1dWalv">[<a href="#ref-17E1dWalv" role="doc-biblioref">22</a>]</span>.
By building on LangChain <span class="citation" data-cites="vKMc6EpN">[<a href="#ref-vKMc6EpN" role="doc-biblioref">16</a>]</span>, we support dozens of LLM providers, such as the Xorbits Inference and Hugging Face APIs <span class="citation" data-cites="mGEvmJGA">[<a href="#ref-mGEvmJGA" role="doc-biblioref">19</a>]</span>, which can be used to query any of the more than 100 000 open-source models on Hugging Face Hub <span class="citation" data-cites="NicesiwN">[<a href="#ref-NicesiwN" role="doc-biblioref">23</a>]</span>, for instance those on its LLM leaderboard <span class="citation" data-cites="LE2GwIqT">[<a href="#ref-LE2GwIqT" role="doc-biblioref">24</a>]</span>.
-Although OpenAI’s models currently vastly outperform any alternatives in terms of both LLM performance and API convenience, we expect many open-source developments in this area in the future <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
+Although OpenAI’s models currently outperform any alternatives in terms of both LLM performance and API convenience, we expect many open-source developments in this area in the future <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
Therefore, we support plug-and-play exchange of models to enhance biomedical AI readiness, and we implement a bespoke benchmarking framework for the biomedical application of LLMs.</p>
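As a concrete illustration of the plug-and-play exchange this paragraph describes, the sketch below swaps an OpenAI model for a self-hosted one behind a common interface. It uses LangChain package paths as they exist in the current ecosystem (`langchain_openai`, `langchain_community`); the Xinference server URL and model UID are placeholders, and BioChatter's own wrapper may differ.

```python
# Illustrative provider swap via LangChain; not BioChatter's own interface.
# Assumes `langchain-openai` and `langchain-community` are installed; the
# Xinference server URL and model UID below are placeholders.
from langchain_openai import ChatOpenAI          # proprietary API back-end
from langchain_community.llms import Xinference  # self-hosted open-source back-end


def get_llm(backend: str):
    """Return an interchangeable LLM object for the requested provider."""
    if backend == "openai":
        return ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    if backend == "xinference":
        return Xinference(server_url="http://localhost:9997", model_uid="my-model-uid")
    raise ValueError(f"unknown backend: {backend}")


llm = get_llm("openai")  # or get_llm("xinference"), with no other code changes
print(llm.invoke("Summarise the role of EGFR signalling in lung cancer."))
```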
<h3 id="prompt-engineering">Prompt Engineering</h3>
<p>An essential property of LLMs is their sensitivity to the prompt, i.e., the initial input that guides the model towards a specific task or behaviour.
@@ -650,7 +650,7 @@ <h2 class="page_break_before" id="references">References</h2>
<div class="csl-left-margin">18. </div><div class="csl-right-inline"><strong>Mixtral of Experts</strong> <div class="csl-block">Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, … William El Sayed</div> <em>arXiv</em> (2024) <a href="https://doi.org/gtc2g3">https://doi.org/gtc2g3</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2401.04088">10.48550/arxiv.2401.04088</a></div></div>
</div>
<div id="ref-mGEvmJGA" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>xorbitsai/inference</strong> <div class="csl-block">Xorbits</div> (2024-02-15) <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>GitHub - xorbitsai/inference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.</strong> <div class="csl-block">GitHub</div> <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
</div>
<div id="ref-PDhRVYjU" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">20. </div><div class="csl-right-inline"><a href="https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/">https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/</a></div>
@@ -686,7 +686,7 @@
<div class="csl-left-margin">30. </div><div class="csl-right-inline"><strong>A Survey on Large Language Model based Autonomous Agents</strong> <div class="csl-block">Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, … Ji-Rong Wen</div> <em>arXiv</em> (2023) <a href="https://doi.org/gsv93m">https://doi.org/gsv93m</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2308.11432">10.48550/arxiv.2308.11432</a></div></div>
</div>
<div id="ref-14upAJPXR" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>pytest-dev/pytest</strong> <div class="csl-block">pytest-dev</div> (2024-02-15) <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>GitHub - pytest-dev/pytest: The pytest framework makes it easy to write small tests, yet scales to support complex functional testing</strong> <div class="csl-block">GitHub</div> <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
</div>
<div id="ref-KONKs6Pw" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">32. </div><div class="csl-right-inline"><strong>Large language models encode clinical knowledge</strong> <div class="csl-block">Karan Singhal, Shekoofeh Azizi, Tao Tu, SSara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, … Vivek Natarajan</div> <em>Nature</em> (2023-07-12) <a href="https://doi.org/gsgp8c">https://doi.org/gsgp8c</a> <div class="csl-block">DOI: <a href="https://doi.org/10.1038/s41586-023-06291-2">10.1038/s41586-023-06291-2</a> · PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37438534">37438534</a> · PMCID: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10396962">PMC10396962</a></div></div>
Binary file modified manuscript.pdf
4 changes: 4 additions & 0 deletions v/32a990970f754b4a632acac66c661aed6c8b01c6/images/github.svg
4 changes: 4 additions & 0 deletions v/32a990970f754b4a632acac66c661aed6c8b01c6/images/orcid.svg
