Hugging Face's transformers library is the most convenient way to work with models published on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer
Pretraining Version¶
The first version of the model was trained on the largest dataset. If you take a closer look, you'll see it includes nearly 6T tokens (about 24TB). That may not seem like much: some high-end home NAS systems can easily store that amount.
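A quick back-of-the-envelope check of those figures (my own arithmetic, not from the Olmo report):

```python
# ~6T tokens stored in ~24TB of raw text works out to roughly 4 bytes per
# token, which is plausible for mostly-English web text (a token is ~3-4 chars).
tokens = 6e12
size_bytes = 24e12
bytes_per_token = size_bytes / tokens
print(f"{bytes_per_token:.1f} bytes per token")  # -> 4.0 bytes per token
```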
Another key point is that roughly 75% of the data comes from Common Crawl, an open repository of web crawl data that’s freely accessible to anyone.
# Olmo 3 7B. The model is ~15GB in size.
olmo3_pretraining = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B", revision="stage1-step999000")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-1025-7B", revision="stage1-step999000")
inputs = tokenizer(["Base Large Language Model is "], return_tensors='pt', return_token_type_ids=False)
print(f"Tokens count: {inputs['input_ids'].shape[1]} | Tokens: {inputs['input_ids']}")
Tokens count: 6 | Tokens: tensor([[ 4066, 20902, 11688, 5008, 374, 220]])
response = olmo3_pretraining.generate(**inputs, max_new_tokens=256, do_sample=True, top_k=0, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
Base Large Language Model is 13 billion parameters, and it is the 3rd most capable model in the world after the GPT-4 and Claude 2. In this article, we will see how to build and deploy a custom Llama 2 model on a Windows machine with Nvidia GPU using Azure Machine Learning. First, we need to register our account with Hugging Face to access the Llama 2 model. You can do this by going to the Hugging Face website and clicking on the "Register" button. Once you have registered, you will be able to access the Llama 2 model from the Hugging Face website. Next, we need to install the necessary packages to work with the Llama 2 model. You can do this by running the following command in your terminal: pip install llama Once the installation is complete, we can now download the Llama 2 model from the Hugging Face website. You can do this by running the following command: wget https://huggingface.co/TheBloke/llama-2-13b-1bit-GGML-GPTQ -O llama-2-13b-1bit-GGML-GPTQ This will download the Llama 2 model to your local machine. Next, we
The output has grammatical structure and uses real words/concepts, but it's a confused mixture of information that doesn't actually answer any question or serve a purpose.
The model is generating a plausible text continuation based on what might appear in its training data (articles comparing GPT models), but it's not understanding or responding to an implicit question about what base models are.
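The `generate` calls in this notebook sample with `top_p=0.7` (nucleus sampling) while disabling top-k filtering with `top_k=0`. To make that concrete, here is a minimal sketch of what top-p filtering does to a next-token distribution; the probabilities are a toy example of my own, not the model's:

```python
def top_p_filter(probs, top_p=0.7):
    # Keep the smallest set of tokens whose cumulative probability reaches
    # top_p, then renormalize; every other token gets probability 0.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Toy distribution over 4 tokens: the 0.2 and 0.1 tails are cut off at top_p=0.7,
# and the surviving 0.4/0.3 mass is renormalized.
print(top_p_filter([0.4, 0.3, 0.2, 0.1]))  # -> [0.571..., 0.428..., 0.0, 0.0]
```

Lower `top_p` makes the output more conservative; `top_p=1.0` would sample from the full distribution.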
# Let's push the model to its absolute limits: ask it to reason about math! It should fail spectacularly.
text = ["A bakery sells cupcakes in boxes of 6. Sarah bought some boxes and ate 4 cupcakes. "
"She then gave half of what remained to her friend. If she now has 10 cupcakes left, "
"how many boxes did she originally buy? Show your reasoning step by step."]
# To save you some time, the real answer is 4 boxes.
print(text)
['A bakery sells cupcakes in boxes of 6. Sarah bought some boxes and ate 4 cupcakes. She then gave half of what remained to her friend. If she now has 10 cupcakes left, how many boxes did she originally buy? Show your reasoning step by step.']
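The claimed answer is easy to verify by simulating the story forward for each candidate number of boxes:

```python
def cupcakes_left(boxes):
    # 6 cupcakes per box; Sarah eats 4, then gives away half of what remains.
    remaining = boxes * 6 - 4
    return remaining - remaining // 2  # she keeps the other half

# Find the box count that leaves exactly 10 cupcakes.
answer = next(b for b in range(1, 100) if cupcakes_left(b) == 10)
print(answer)  # -> 4
```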
inputs = tokenizer(text, return_tensors='pt', return_token_type_ids=False)
response = olmo3_pretraining.generate(**inputs, max_new_tokens=256, do_sample=True, top_k=0, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
A bakery sells cupcakes in boxes of 6. Sarah bought some boxes and ate 4 cupcakes. She then gave half of what remained to her friend. If she now has 10 cupcakes left, how many boxes did she originally buy? Show your reasoning step by step. If you want to use algebra, that’s fine. But you should also include your algebraic thinking as part of your written explanation. 8. In your opinion, is it more important for a person to be a good listener or a good talker? Why? Explain your reasoning. 9. You are working on a project with 4 classmates. Your teacher says you can have as many days as you need to finish it. After 3 days, you have completed 40% of the project. How many days will it take you to complete the project? Show your reasoning step by step. If you want to use algebra, that’s fine. But you should also include your algebraic thinking as part of your written explanation. 10. You are at a store that sells specialty soaps. Each bar of soap is 4 inches long and 1 inch wide. If a package of 4 bars of soap costs $2.25, what is the cost of 1 bar of soap? Show your reasoning step by step. If you want to use algebra, that’s fine. But you should also include your algebraic thinking as part of your written explanation. 11. A cookie recipe makes 6 dozen cookies. If the recipe uses 3 cups of flour,
Spectacular failure, right? Base models autocomplete text; they don't follow instructions.
del olmo3_pretraining  # or I will run out of memory
Midtraining Version¶
The midtraining version is based on a focused curriculum that emphasizes reasoning-intensive data to enhance capabilities. The dataset contains "only" 100B tokens and includes a substantial amount of STEM-related content.
olmo3_midtraining = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B", revision="stage2-step9000")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-1025-7B", revision="stage2-step9000")
# Let's see how the midtraining version does on the same math problem.
inputs = tokenizer(text, return_tensors='pt', return_token_type_ids=False)
response = olmo3_midtraining.generate(**inputs, max_new_tokens=256, do_sample=True, top_k=0, temperature=1.0, top_p=0.7)
from IPython.display import display, Markdown
display(Markdown(tokenizer.batch_decode(response, skip_special_tokens=True)[0]))
A bakery sells cupcakes in boxes of 6. Sarah bought some boxes and ate 4 cupcakes. She then gave half of what remained to her friend. If she now has 10 cupcakes left, how many boxes did she originally buy? Show your reasoning step by step.
Answer:
Let $ x $ = number of boxes Sarah bought.
- Total cupcakes: $ 6x $
- After eating 4 cupcakes: $ 6x - 4 $
- She gave half of the remaining to her friend: $ \frac{1}{2}(6x - 4) $
- Final cupcakes: $ \frac{1}{2}(6x - 4) $
Set up the equation:
$$
\frac{1}{2}(6x - 4) = 10
$$
Multiply both sides by 2:
$$
6x - 4 = 20
$$
Add 4 to both sides:
$$
6x = 24
$$
Divide by 6:
$$
x = 4
$$
Conclusion: Sarah originally bought 4 boxes of cupcakes.
Problem 2: Counting Squares in a Grid¶
Question:
A square grid has 5 rows and 5 columns. How many squares are in the grid?
Answer:
This is a classic "counting squares" problem. The number of squares in an $ n \times n $ grid is:
del olmo3_midtraining  # or I will run out of memory
Not bad at all, right? The problem was solved correctly. However, the model continued with the next problem, which we didn’t ask it to solve. Still, it's clear that we're getting much closer to having an implicit reasoning structure in place. We'll just need to enhance and refine it later.
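The runaway continuation is easy to trim in post-processing by cutting the decoded text at the model's self-invented follow-up question. Recent versions of transformers can also do this during generation via the `stop_strings` argument to `generate` (paired with `tokenizer=`), which I haven't used here. A minimal post-processing sketch:

```python
def truncate_at(text, stop_strings):
    # Cut the text at the earliest occurrence of any stop string.
    cut = len(text)
    for s in stop_strings:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()

print(truncate_at("Sarah originally bought 4 boxes of cupcakes.\nProblem 2: ...",
                  ["Problem 2"]))
# -> "Sarah originally bought 4 boxes of cupcakes."
```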
Long Context → Base Model¶
In this particular case, the base model is the result of the third stage: training on an additional 100B tokens with longer sequences to extend the context window.
During stages 1 and 2, the model was trained on sequences of a specific length. Since the initial sequences weren't very long, the model may have struggled to maintain attention over longer distances, meaning it hadn't yet learned to preserve coherence across extended contexts. This final training dataset directly addresses that issue.
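Context extension in RoPE-based models is typically done by rescaling the rotary position frequencies; the `beta_fast`/`beta_slow` warning we'll see when loading this checkpoint suggests a YaRN-style scheme, whose exact recipe is more involved than I'll show here. As a minimal sketch, here is the simplest variant, linear position interpolation, where positions are squeezed by the extension factor so a 4x-longer context reuses the same angle range the model saw during training (toy dimensions of my own choosing):

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    # Rotation angle for each frequency pair at (possibly rescaled) position.
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]

# With linear interpolation at scale=4, position 4096 in the extended context
# produces exactly the angles that position 1024 produced in original training.
assert rope_angles(4096, scale=4.0) == rope_angles(1024)
```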
I'm not going to try to show how the earlier checkpoints struggle with longer contexts or how this one performs significantly better; feel free to explore examples with context lengths of 20K+ tokens on your own. I'll only demonstrate that this model's output is meaningful.
# Base model does not require any special revision tag.
olmo3_base = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-1025-7B")
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
inputs = tokenizer(["Base Large Language Model is "], return_tensors='pt', return_token_type_ids=False)
response = olmo3_base.generate(**inputs, max_new_tokens=256, do_sample=True, top_k=0, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
Base Large Language Model is 34% better than SOTA in most tasks Researchers at the Allen Institute for AI have developed a large language model (LLM) that outperforms existing state-of-the-art (SOTA) models on a wide range of natural language processing (NLP) tasks. The model, called ALM, is based on the GPT-3 architecture and has been fine-tuned using a dataset of over 400 billion tokens. The ALM model was trained on a dataset of over 400 billion tokens, which is 4.5 times larger than the dataset used to train the previous SOTA model. The ALM model was also trained using a more advanced training algorithm, which allowed it to learn more complex patterns in the data. The ALM model was evaluated on a range of NLP tasks, including text classification, question answering, and machine translation. The model achieved state-of-the-art performance on all of these tasks, outperforming the previous SOTA model by a significant margin. The researchers believe that the ALM model’s superior performance is due to its larger dataset and more advanced training algorithm. They also believe that the model’s performance will continue to improve as more data becomes available. ALM is the first LLM to outperform GPT-4