
Commit

clean up comments and documentation
jacobfulano committed Jan 5, 2024
1 parent bcd10d2 commit e9e1213
Showing 2 changed files with 15 additions and 9 deletions.
examples/benchmarks/bert/src/flash_attn_triton.py (4 changes: 4 additions & 0 deletions)
@@ -17,6 +17,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+*Update: 01-04-2024*
+This version of Triton Flash Attention is being deprecated in favor of Flash Attention 2,
+which now supports ALiBi natively: https://github.com/Dao-AILab/flash-attention
*Experimental* implementation of FlashAttention in Triton.
We use the FlashAttention implementation from Phil Tillet as a starting point.
https://github.com/openai/triton/blob/master/python/tutorials/06-fused-attention.py
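The deprecation note added above points at Flash Attention 2's native ALiBi support. Below is a minimal sketch, not part of this commit, of what that looks like. It assumes flash-attn >= 2.4.0 (which added the alibi_slopes argument) and an fp16-capable CUDA GPU; the get_alibi_slopes helper is illustrative only.

# Minimal sketch, not from this repo: Flash Attention 2 with native ALiBi.
# Assumes flash-attn >= 2.4.0 (alibi_slopes argument) and a CUDA GPU.
import torch
from flash_attn import flash_attn_func

def get_alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric ALiBi slopes for a power-of-two number of heads.
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)],
                        dtype=torch.float32, device='cuda')

batch, seqlen, n_heads, head_dim = 2, 128, 8, 64
q, k, v = (torch.randn(batch, seqlen, n_heads, head_dim,
                       dtype=torch.float16, device='cuda') for _ in range(3))

# Unlike the deprecated Triton kernel, attention dropout is supported here.
out = flash_attn_func(q, k, v, dropout_p=0.1,
                      alibi_slopes=get_alibi_slopes(n_heads))
print(out.shape)  # torch.Size([2, 128, 8, 64])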
examples/benchmarks/bert/src/mosaic_bert.py (20 changes: 11 additions & 9 deletions)
@@ -1,7 +1,7 @@
# Copyright 2022 MosaicML Examples authors
# SPDX-License-Identifier: Apache-2.0

"""Implements a Mosaic BERT wrapper around a :class:`.ComposerTransformer`."""
"""Implements a MosaicBERT wrapper around a :class:`.ComposerTransformer`."""

from __future__ import annotations

@@ -31,12 +31,12 @@ def create_mosaic_bert_mlm(pretrained_model_name: str = 'bert-base-uncased',
tokenizer_name: Optional[str] = None,
gradient_checkpointing: Optional[bool] = False,
pretrained_checkpoint: Optional[str] = None):
"""Mosaic BERT masked language model based on |:hugging_face:| Transformers.
"""MosaicBERT masked language model based on |:hugging_face:| Transformers.
For more information, see
`Transformers. <https://huggingface.co/transformers/>`_.
-This function creates a Mosaic BERT, which includes several throughput
+This function creates a MosaicBERT, which includes several throughput
optimizations not available in |:hugging_face:| BERT as well as
architecture changes based on ALiBi and Gated Linear Units.
@@ -82,7 +82,7 @@ def create_mosaic_bert_mlm(pretrained_model_name: str = 'bert-base-uncased',
"vocab_size": 30522
}
-To create a Mosaic BERT model for Masked Language Model pretraining:
+To create a MosaicBERT model for Masked Language Model pretraining:
.. testcode::
@@ -145,11 +145,11 @@ def create_mosaic_bert_classification(
tokenizer_name: Optional[str] = None,
gradient_checkpointing: Optional[bool] = False,
pretrained_checkpoint: Optional[str] = None):
"""Mosaic BERT classification model based on |:hugging_face:| Transformers.
"""MosaicBERT classification model based on |:hugging_face:| Transformers.
For more information, see `Transformers. <https://huggingface.co/transformers/>`_.
-This function creates a Mosaic BERT, which includes several throughput
+This function creates a MosaicBERT, which includes several throughput
optimizations not available in |:hugging_face:| BERT as well as
architecture changes based on ALiBi and Gated Linear Units.
@@ -207,7 +207,7 @@ def create_mosaic_bert_classification(
"vocab_size": 30522
}
-To create a Mosaic BERT model for classification:
+To create a MosaicBERT model for classification:
.. testcode::
from mosaic_bert import create_mosaic_bert_classification
@@ -229,8 +229,10 @@ def create_mosaic_bert_classification(
if not model_config:
model_config = {}

-# By default, turn off attention dropout in Mosaic BERT
-# (otherwise, Flash Attention will be off by default)
+# By default, turn off attention dropout in MosaicBERT
+# Flash Attention 2 supports dropout in the attention module
+# while our previous Triton Flash Attention layer only works with
+# attention_probs_dropout_prob = 0.
if 'attention_probs_dropout_prob' not in model_config:
model_config['attention_probs_dropout_prob'] = 0.0

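To tie the dropout comment above to the two factory functions touched in this file, here is a hedged usage sketch, not part of this commit. It relies only on the signatures visible in the hunks above; the num_labels and pretrained_model_name keywords for create_mosaic_bert_classification are assumptions.

# Hedged usage sketch; run from examples/benchmarks/bert/src.
# num_labels and pretrained_model_name for the classification helper are
# assumptions not shown in the hunks above.
from mosaic_bert import (create_mosaic_bert_classification,
                         create_mosaic_bert_mlm)

# MLM pretraining model; attention_probs_dropout_prob defaults to 0.0,
# as set by the guard shown in the last hunk.
mlm_model = create_mosaic_bert_mlm(
    pretrained_model_name='bert-base-uncased')

# Classification model with attention dropout explicitly re-enabled,
# which Flash Attention 2 (unlike the old Triton kernel) can handle.
clf_model = create_mosaic_bert_classification(
    num_labels=2,
    pretrained_model_name='bert-base-uncased',
    model_config={'attention_probs_dropout_prob': 0.1})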
