Commit 3579d76

remove "vastly"

slobentanzer committed Feb 15, 2024
1 parent 4538fc5 commit 3579d76
Showing 24 changed files with 4,260 additions and 31 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
# Output directory containing the formatted manuscript

The [`gh-pages`](https://github.com/biocypher/biochatter-paper/tree/gh-pages) branch hosts the contents of this directory at <https://biocypher.github.io/biochatter-paper/>.
-The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/>.
+The permalink for this webpage version is <https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/>.
To redirect to the permalink for the latest manuscript version at anytime, use the link <https://biocypher.github.io/biochatter-paper/v/freeze/>.

## Files
@@ -35,4 +35,4 @@ Verifying timestamps with the `ots verify` command requires running a local bitcoin node.
## Source

The manuscripts in this directory were built from
-[`ccfa0d0db46d487019ff6a8a6ced67f18306f5e2`](https://github.com/biocypher/biochatter-paper/commit/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2).
+[`32a990970f754b4a632acac66c661aed6c8b01c6`](https://github.com/biocypher/biochatter-paper/commit/32a990970f754b4a632acac66c661aed6c8b01c6).
26 changes: 13 additions & 13 deletions index.html
@@ -124,8 +124,8 @@
<meta name="dc.date" content="2024-02-15" />
<meta name="citation_publication_date" content="2024-02-15" />
<meta property="article:published_time" content="2024-02-15" />
<meta name="dc.modified" content="2024-02-15T10:33:39+00:00" />
<meta property="article:modified_time" content="2024-02-15T10:33:39+00:00" />
<meta name="dc.modified" content="2024-02-15T10:38:49+00:00" />
<meta property="article:modified_time" content="2024-02-15T10:38:49+00:00" />
<meta name="dc.language" content="en-UK" />
<meta name="citation_language" content="en-UK" />
<meta name="dc.relation.ispartof" content="Manubot" />
@@ -169,9 +169,9 @@
<meta name="citation_fulltext_html_url" content="https://biocypher.github.io/biochatter-paper/" />
<meta name="citation_pdf_url" content="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="application/pdf" href="https://biocypher.github.io/biochatter-paper/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/manuscript.pdf" />
<link rel="alternate" type="text/html" href="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/" />
<meta name="manubot_html_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/" />
<meta name="manubot_pdf_url_versioned" content="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/manuscript.pdf" />
<meta property="og:type" content="article" />
<meta property="twitter:card" content="summary_large_image" />
<link rel="icon" type="image/png" sizes="192x192" href="https://manubot.org/favicon-192x192.png" />
@@ -188,9 +188,9 @@ <h1 class="title">A Platform for the Biomedical Application of Large Language Models</h1>
</header>
<p><small><em>
This manuscript
-(<a href="https://biocypher.github.io/biochatter-paper/v/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2/">permalink</a>)
+(<a href="https://biocypher.github.io/biochatter-paper/v/32a990970f754b4a632acac66c661aed6c8b01c6/">permalink</a>)
was automatically generated
-from <a href="https://github.com/biocypher/biochatter-paper/tree/ccfa0d0db46d487019ff6a8a6ced67f18306f5e2">biocypher/biochatter-paper@ccfa0d0</a>
+from <a href="https://github.com/biocypher/biochatter-paper/tree/32a990970f754b4a632acac66c661aed6c8b01c6">biocypher/biochatter-paper@32a9909</a>
on February 15, 2024.
</em></small></p>
<h2 id="authors">Authors</h2>
@@ -306,7 +306,7 @@ <h2 id="introduction">Introduction</h2>
In addition, biological events are context-dependent, for instance with respect to a cell type or specific disease.</p>
<p>Large Language Models (LLMs) of the current generation, in contrast, can access enormous amounts of knowledge, encoded (incomprehensibly) in their billions of parameters <span class="citation" data-cites="JIjeWPOb IzWFZmuQ 17lpGtuH5 fLS7kvml">[<a href="#ref-JIjeWPOb" role="doc-biblioref">4</a>,<a href="#ref-IzWFZmuQ" role="doc-biblioref">5</a>,<a href="#ref-17lpGtuH5" role="doc-biblioref">6</a>,<a href="#ref-fLS7kvml" role="doc-biblioref">7</a>]</span>.
Trained correctly, they can recall and combine virtually limitless knowledge from their training set.
-ChatGPT has taken the world by storm, and many biomedical researchers already use LLMs in their daily work, for general as well as research tasks <span class="citation" data-cites="viLUfCLq ae7XiPvs wo7jyZHW">[<a href="#ref-viLUfCLq" role="doc-biblioref">8</a>,<a href="#ref-ae7XiPvs" role="doc-biblioref">9</a>,<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>]</span>.
+LLMs have taken the world by storm, and many biomedical researchers already use them in their daily work, for general as well as research tasks <span class="citation" data-cites="viLUfCLq ae7XiPvs wo7jyZHW">[<a href="#ref-viLUfCLq" role="doc-biblioref">8</a>,<a href="#ref-ae7XiPvs" role="doc-biblioref">9</a>,<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>]</span>.
However, the current way of interacting with LLMs is predominantly manual, virtually non-reproducible, and their behaviour can be erratic.
For instance, they are known to confabulate: they make up facts as they go along, and, to make matters worse, are convinced — and convincing — regarding the truth of their confabulations <span class="citation" data-cites="elx4isXx wo7jyZHW">[<a href="#ref-wo7jyZHW" role="doc-biblioref">10</a>,<a href="#ref-elx4isXx" role="doc-biblioref">11</a>]</span>.
While current efforts towards Artificial General Intelligence manage to ameliorate some of the shortcomings by ensembling multiple models <span class="citation" data-cites="UEmjXz02">[<a href="#ref-UEmjXz02" role="doc-biblioref">12</a>]</span> with long-term memory stores <span class="citation" data-cites="gy4YOpGJ">[<a href="#ref-gy4YOpGJ" role="doc-biblioref">13</a>]</span>, the current generation of AI does not inspire adequate trust to be applied to biomedical problems without supervision <span class="citation" data-cites="elx4isXx">[<a href="#ref-elx4isXx" role="doc-biblioref">11</a>]</span>.
@@ -318,9 +318,9 @@
<!-- Figure 1 -->
<div id="fig:overview" class="fignos">
<figure>
<img src="images/biochatter_overview.png" alt="Figure 1: The BioChatter composable platform architecture (simplified). LLMs can facilitate many tasks in daily biomedical research practice, for instance, interpretation of experimental results or the use of a web resource (top left). BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs. This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response. In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG). A knowledge graph such as BioCypher [15] can similarly serve as knowledge resource or long-term memory extension of the model. Further, an independent LLM receives the primary response for fact-checking, which can be supplemented with context-specific information by a RAG process. The platform is composable in most aspects, allowing arbitrary extensions to other specialised models for additional tasks orchestrated by the primary LLM." />
<img src="images/biochatter_overview.png" alt="Figure 1: The BioChatter composable platform architecture (simplified). LLMs can facilitate many tasks in daily biomedical research practice, for instance interpretation of experimental results or the use of a web resource (top left). BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs. This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response. In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG). A knowledge graph such as BioCypher [15] can similarly serve as knowledge resource or long-term memory extension of the model. Further, an independent LLM receives the primary response for fact-checking, which can be supplemented with context-specific information by a RAG process. The platform is composable in most aspects, allowing arbitrary extensions to other specialised models for additional tasks orchestrated by the primary LLM." />
<figcaption aria-hidden="true"><span>Figure 1:</span> <strong>The BioChatter composable platform architecture (simplified).</strong>
-LLMs can facilitate many tasks in daily biomedical research practice, for instance, interpretation of experimental results or the use of a web resource (top left).
+LLMs can facilitate many tasks in daily biomedical research practice, for instance interpretation of experimental results or the use of a web resource (top left).
BioChatter’s main response circuit (blue) composes a number of specifically engineered prompts and passes them (and a conversation history) to the primary LLM, which generates a response for the user based on all inputs.
This response is simultaneously used to prompt the secondary circuit (orange), which fulfils auxiliary tasks to complement the primary response.
In particular, using search, the secondary circuit queries a database as a prior knowledge repository and compares annotations to the primary response, or uses the knowledge to perform Retrieval-Augmented Generation (RAG).
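The primary/secondary-circuit flow described in this caption can be summarised in a few lines of Python. The sketch below is illustrative only: all names (`primary_circuit`, `secondary_circuit`, the stand-in `echo` model and knowledge-graph query) are assumptions for exposition, not BioChatter's actual API.

```python
# Minimal sketch of the two-circuit pattern described in the figure caption.
# All names here are hypothetical illustrations, not BioChatter's API.
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any function mapping a prompt to a response


def primary_circuit(user_input: str, history: List[str], llm: LLM) -> str:
    """Compose engineered prompts and conversation history, then query the primary LLM."""
    prompt = "\n".join(["You are a biomedical research assistant.", *history, user_input])
    return llm(prompt)


def secondary_circuit(response: str, query_kg: Callable[[str], str], checker: LLM) -> Tuple[str, str]:
    """Complement the primary response: prior-knowledge lookup plus an independent fact-check."""
    annotations = query_kg(response)  # e.g. a query against a BioCypher knowledge graph
    verdict = checker(f"Annotations: {annotations}\nFact-check this response: {response}")
    return annotations, verdict


# Toy usage with stand-in callables; a real deployment would plug in actual models.
echo: LLM = lambda prompt: f"[model output for: {prompt[-40:]}]"
answer = primary_circuit("Which pathways are enriched in my DE genes?", [], echo)
notes, check = secondary_circuit(answer, lambda r: "[KG annotations]", echo)
```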
@@ -365,7 +365,7 @@ <h3 id="question-answering-and-llm-connectivity">Question Answering and LLM Connectivity</h3>
To address this issue, we provide access to the different OpenAI models through their API, which is subject to different, more stringent data protection than the web interface <span class="citation" data-cites="C5Z1X3MG">[<a href="#ref-C5Z1X3MG" role="doc-biblioref">21</a>]</span>, most importantly by disallowing reuse of user inputs for subsequent model training.
Further, we aim to preferentially support open-source LLMs to facilitate more transparency in their application and increase data privacy by being able to run a model locally on dedicated hardware and end-user devices <span class="citation" data-cites="17E1dWalv">[<a href="#ref-17E1dWalv" role="doc-biblioref">22</a>]</span>.
By building on LangChain <span class="citation" data-cites="vKMc6EpN">[<a href="#ref-vKMc6EpN" role="doc-biblioref">16</a>]</span>, we support dozens of LLM providers, such as the Xorbits Inference and Hugging Face APIs <span class="citation" data-cites="mGEvmJGA">[<a href="#ref-mGEvmJGA" role="doc-biblioref">19</a>]</span>, which can be used to query any of the more than 100 000 open-source models on Hugging Face Hub <span class="citation" data-cites="NicesiwN">[<a href="#ref-NicesiwN" role="doc-biblioref">23</a>]</span>, for instance those on its LLM leaderboard <span class="citation" data-cites="LE2GwIqT">[<a href="#ref-LE2GwIqT" role="doc-biblioref">24</a>]</span>.
-Although OpenAI’s models currently vastly outperform any alternatives in terms of both LLM performance and API convenience, we expect many open-source developments in this area in the future <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
+Although OpenAI’s models currently outperform any alternatives in terms of both LLM performance and API convenience, we expect many open-source developments in this area in the future <span class="citation" data-cites="uYvzQA7w">[<a href="#ref-uYvzQA7w" role="doc-biblioref">25</a>]</span>.
Therefore, we support plug-and-play exchange of models to enhance biomedical AI readiness, and we implement a bespoke benchmarking framework for the biomedical application of LLMs.</p>
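As a concrete illustration of the plug-and-play exchange this paragraph describes, the sketch below swaps an OpenAI model for a self-hosted one behind a common interface. It uses LangChain package paths as they exist in the current ecosystem (`langchain_openai`, `langchain_community`); the Xinference server URL and model UID are placeholders, and BioChatter's own wrapper may differ.

```python
# Illustrative provider swap via LangChain; not BioChatter's own interface.
# Assumes `langchain-openai` and `langchain-community` are installed; the
# Xinference server URL and model UID below are placeholders.
from langchain_openai import ChatOpenAI          # proprietary API back-end
from langchain_community.llms import Xinference  # self-hosted open-source back-end


def get_llm(backend: str):
    """Return an interchangeable LLM object for the requested provider."""
    if backend == "openai":
        return ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    if backend == "xinference":
        return Xinference(server_url="http://localhost:9997", model_uid="my-model-uid")
    raise ValueError(f"unknown backend: {backend}")


llm = get_llm("openai")  # or get_llm("xinference"), with no other code changes
print(llm.invoke("Summarise the role of EGFR signalling in lung cancer."))
```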
<h3 id="prompt-engineering">Prompt Engineering</h3>
<p>An essential property of LLMs is their sensitivity to the prompt, i.e., the initial input that guides the model towards a specific task or behaviour.
@@ -650,7 +650,7 @@ <h2 class="page_break_before" id="references">References</h2>
<div class="csl-left-margin">18. </div><div class="csl-right-inline"><strong>Mixtral of Experts</strong> <div class="csl-block">Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, … William El Sayed</div> <em>arXiv</em> (2024) <a href="https://doi.org/gtc2g3">https://doi.org/gtc2g3</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2401.04088">10.48550/arxiv.2401.04088</a></div></div>
</div>
<div id="ref-mGEvmJGA" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>xorbitsai/inference</strong> <div class="csl-block">Xorbits</div> (2024-02-15) <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
<div class="csl-left-margin">19. </div><div class="csl-right-inline"><strong>GitHub - xorbitsai/inference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.</strong> <div class="csl-block">GitHub</div> <a href="https://github.com/xorbitsai/inference">https://github.com/xorbitsai/inference</a></div>
</div>
<div id="ref-PDhRVYjU" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">20. </div><div class="csl-right-inline"><a href="https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/">https://www.reuters.com/technology/european-data-protection-board-discussing-ai-policy-thursday-meeting-2023-04-13/</a></div>
@@ -686,7 +686,7 @@
<div class="csl-left-margin">30. </div><div class="csl-right-inline"><strong>A Survey on Large Language Model based Autonomous Agents</strong> <div class="csl-block">Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, … Ji-Rong Wen</div> <em>arXiv</em> (2023) <a href="https://doi.org/gsv93m">https://doi.org/gsv93m</a> <div class="csl-block">DOI: <a href="https://doi.org/10.48550/arxiv.2308.11432">10.48550/arxiv.2308.11432</a></div></div>
</div>
<div id="ref-14upAJPXR" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>pytest-dev/pytest</strong> <div class="csl-block">pytest-dev</div> (2024-02-15) <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
<div class="csl-left-margin">31. </div><div class="csl-right-inline"><strong>GitHub - pytest-dev/pytest: The pytest framework makes it easy to write small tests, yet scales to support complex functional testing</strong> <div class="csl-block">GitHub</div> <a href="https://github.com/pytest-dev/pytest">https://github.com/pytest-dev/pytest</a></div>
</div>
<div id="ref-KONKs6Pw" class="csl-entry" role="doc-biblioentry">
<div class="csl-left-margin">32. </div><div class="csl-right-inline"><strong>Large language models encode clinical knowledge</strong> <div class="csl-block">Karan Singhal, Shekoofeh Azizi, Tao Tu, SSara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, … Vivek Natarajan</div> <em>Nature</em> (2023-07-12) <a href="https://doi.org/gsgp8c">https://doi.org/gsgp8c</a> <div class="csl-block">DOI: <a href="https://doi.org/10.1038/s41586-023-06291-2">10.1038/s41586-023-06291-2</a> · PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37438534">37438534</a> · PMCID: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10396962">PMC10396962</a></div></div>
Binary file modified manuscript.pdf
4 changes: 4 additions & 0 deletions v/32a990970f754b4a632acac66c661aed6c8b01c6/images/github.svg
4 changes: 4 additions & 0 deletions v/32a990970f754b4a632acac66c661aed6c8b01c6/images/orcid.svg
