Bumblebee.Text.SmolLm3 (Bumblebee v0.7.0)
View SourceSmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports dual mode reasoning, 6 languages and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.
Key features
- Instruct model optimized for hybrid reasoning
- Fully open model: open weights + full training details including public data mixture and training configs
- Long context: Trained on 64k context and supports up to 128k tokens using YARN extrapolation (not implemented in
bumblebee) - Multilingual: 6 natively supported (English, French, Spanish, German, Italian, and Portuguese)
For best results, follow the chat template.
To disable reasoning, append <think>\n\n</think> to the prompt.
For more details see: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
Architectures
:base- plain SmolLM3 without any head on top:for_causal_language_modeling- SmolLM3 with a language modeling head. The head returns logits for each token in the original sequence:for_sequence_classification- SmolLM3 with a sequence classification head. The head returns logits corresponding to possible classes:for_token_classification- SmolLM3 with a token classification head. The head returns logits for each token in the original sequence:for_question_answering- SmolLM3 with a span classification head. The head returns logits for the span start and end positions
Inputs
"input_ids"-{batch_size, sequence_length}Indices of input sequence tokens in the vocabulary.
"attention_mask"-{batch_size, sequence_length}Mask indicating which tokens to attend to. This is used to ignore padding tokens, which are added when processing a batch of sequences with different length.
"position_ids"-{batch_size, sequence_length}Indices of positions of each input sequence tokens in the position embeddings.
"attention_head_mask"-{encoder_num_blocks, encoder_num_attention_heads}Mask to nullify selected heads of the self-attention blocks in the encoder.
"input_embeddings"-{batch_size, sequence_length, hidden_size}Embedded representation of
"input_ids", which can be specified for more control over how"input_ids"are embedded than the model's internal embedding lookup. If"input_embeddings"are present, then"input_ids"will be ignored."cache"A container with cached layer results used to speed up sequential decoding (autoregression). With cache, certain hidden states are taken from the cache, rather than recomputed on every decoding pass. The cache should be treated as opaque and initialized with
Bumblebee.Text.Generation.init_cache/4.
Global layer options
:output_hidden_states- whentrue, the model output includes all hidden states:output_attentions- whentrue, the model output includes all attention weights
Configuration
:vocab_size- the vocabulary size of the token embedding. This corresponds to the number of distinct tokens that can be represented in model input and output . Defaults to128256:max_positions- the vocabulary size of the position embedding. This corresponds to the maximum sequence length that this model can process. Typically this is set to a large value just in case, such as 512, 1024 or 2048. . Defaults to65536:hidden_size- the dimensionality of hidden layers. Defaults to4096:intermediate_size- the dimensionality of intermediate layers. Defaults to11008:attention_head_size- the size of the key, value, and query projection per attention head. Defaults todiv(hidden_size, num_attention_heads):num_blocks- the number of Transformer blocks in the model. Defaults to32:num_attention_heads- the number of attention heads for each attention layer in the model. Defaults to32:num_key_value_heads- the number of key value heads for each attention layer in the model. Defaults to4:activation- the activation function. Defaults to:silu:rotary_embedding_base- base for computing rotary embedding frequency. Defaults to5000000:rotary_embedding_scaling_strategy- scaling configuration for rotary embedding. Currently the supported values are:%{type: :linear, factor: number()}%{type: :dynamic, factor: number()}%{type: :llama3, factor: number(), low_frequency_factor: number(), high_frequency_factor: number(), original_max_positions: pos_integer()}
For more details see https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases
:rotary_embedding_enabled- a list of booleans specifying whether rotary embeddings are enabled for the block that corresponds to the index. Defaults tonilwhich enables rotary embeddings for all blocks.:layer_norm_epsilon- the epsilon used by RMS normalization layers. Defaults to1.0e-12:initializer_scale- the standard deviation of the normal initializer used for initializing kernel parameters. Defaults to0.02:tie_word_embeddings- whether to tie input and output embedding weights. Defaults totrue:num_labels- the number of labels to use in the last layer for the classification task. Defaults to2:id_to_label- a map from class index to label. Defaults to%{}