GitHub Copilot - Going from Premium Request Units to Token Based Billing
The PRU-based billing model for GitHub Copilot is transforming into Token Based Billing as of June 1, 2026. Learn what this change means for your company.

For almost a year we have gotten to know the Premium Request Unit (PRU) used for billing GitHub Copilot usage: whenever you ask a model a question, you use a PRU with a multiplier attached. This goes for every prompt in the chat interface, every prompt in the CLI, and every request to Copilot Cloud Agent (CCA) to plan or implement changes for you. On top of that, CCA is also billed for the duration of its runtime (GitHub Actions minutes), as it is hosted in GitHub Actions. Find more information in the documentation →
This model has given customers great value for money: all model vendors bill by the token, which means that for a single PRU on the GitHub side you could be using millions of tokens from the model vendor. Until now we only needed to look at PRUs, which hid the users who were getting a lot of use out of each PRU (and incurring a lot of tokens in the background).
Now PRU-based billing is being converted to Token Based Billing (TBB) as of June 1st, 2026, as GitHub has seen a big increase in demand for AI features on the platform. This means that customers will start paying their actual cost per use, instead of the calculated abstraction of PRUs.
What are Premium Requests again?
Things to know about Premium Request Units:
- Multipliers for PRUs are applied depending on the model: using a powerful model with in-depth reasoning, like Claude Opus 4.5 or 4.6? That has a multiplier of 3x, so every chat turn incurs 3 PRUs instead of 1.
- Using a smaller model like GPT-5.4 mini? That needs less compute to host, which leads to a 0.33x multiplier.
- Since the start of PRUs, the base models like GPT-4o and GPT-4.1 have been included in the service for free, at a 0x multiplier.
- With “auto mode” you let GitHub Copilot select which model to use, based on the complexity of the prompt and the compute constraints GitHub sees on the hosting side. Using auto mode gives you a 10% discount on your PRU cost. A quick sketch of how these multipliers add up follows below.
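As a minimal sketch of how these multipliers translate into PRU consumption (the multiplier values are the ones quoted above; the model names and helper function are purely illustrative):

```python
# Illustrative only: multipliers as quoted in this post.
MULTIPLIERS = {
    "claude-opus": 3.0,      # powerful reasoning model
    "gpt-5.4-mini": 0.33,    # smaller model
    "gpt-4.1": 0.0,          # base model, included for free
}

AUTO_MODE_DISCOUNT = 0.10  # 10% discount when Copilot picks the model

def prus_for_chat(model: str, turns: int, auto_mode: bool = False) -> float:
    """PRUs consumed by a chat session: one request per turn times the model multiplier."""
    prus = turns * MULTIPLIERS[model]
    if auto_mode:
        prus *= 1 - AUTO_MODE_DISCOUNT
    return prus

print(prus_for_chat("claude-opus", turns=10))   # 30.0 PRUs
print(prus_for_chat("gpt-5.4-mini", turns=10))  # 3.3 PRUs
```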
Depending on which plan you have for Copilot, a number of PRUs is included in the license price:

You can configure overages on the included PRUs by creating budgets. A budget is a group of users that together get access to the dollar amount you configure for overage costs. For example, you can set up a budget of $100 for 10 users. Each PRU used over a user's monthly allotment costs $0.04, so for $100 the group of 10 users has 2,500 overage PRUs at its disposal. The harder part to control here is that a single user can use up the overage for the whole group, leaving the other 9 engineers without any PRU options (once they go over the monthly PRUs included in their plan).
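A quick back-of-the-envelope check of that overage math (the $0.04 per-PRU price is the one quoted above):

```python
# Overage budget math: how many extra PRUs does a shared budget buy?
OVERAGE_PRICE_PER_PRU = 0.04  # USD per PRU, as quoted above

def overage_prus(budget_usd: float) -> int:
    """Number of overage PRUs a budget buys for the group sharing it."""
    return int(budget_usd / OVERAGE_PRICE_PER_PRU)

print(overage_prus(100))  # 2500 PRUs shared across the whole group
```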
To learn even more about PRUs and what they are, read our blog post here: GitHub Copilot Premium Requests →
Getting a grip on your PRU overage
We have always advised our customers to keep an eye on the PRU usage of their engineers, and to use it as a conversation starter when engineers use a lot of PRUs. There is always a mix of users who make the most of their included PRUs, and users who use them all up by the 10th day of the month. We advocate looking at how users work with PRUs so that you can reach out to the different team members, understand how they utilize them, and help them get the most out of the tools they have. You don’t want to block users from using PRUs, as they are a valuable tool in their day-to-day work, but you need to have a grip on the costs and understand where the value is being added. In the worst-case scenario you find an engineer who heard they need to use Claude Opus for everything and stopped thinking about which model to use for which task. The fast mode version of Opus 4.6 is currently at a 30x multiplier, so that can drain the PRU and overage budget quite fast!
Engineers need to understand which model to use for which task. A common rule of thumb is to plan your changes in Plan mode with a more expensive model (e.g. Opus 4.7 at 7.5x) and then implement those changes in a new chat turn with a less expensive model (e.g. Claude Haiku at 0.33x or GPT-5 mini at 0x). The distinction here is that you use the more powerful model for making the plan, and the less powerful model for converting that plan into changes in the codebase. Most models are very capable of implementing those changes, because all of the hard “thinking” work has already been done by the more expensive model. The sketch below shows how much this split can save.
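As a minimal sketch of why this rule of thumb matters, here is a comparison of the two approaches using the multipliers quoted above (the chat-turn counts are made up for illustration):

```python
# Compare PRU cost: everything on Opus fast mode vs. plan on Opus, implement on Haiku.
OPUS_FAST_MULTIPLIER = 30.0   # Opus 4.6 fast mode, as quoted above
OPUS_PLAN_MULTIPLIER = 7.5    # Opus 4.7 in Plan mode, as quoted above
HAIKU_MULTIPLIER = 0.33       # Claude Haiku, as quoted above

plan_turns, implement_turns = 2, 8  # hypothetical session shape

everything_on_opus = (plan_turns + implement_turns) * OPUS_FAST_MULTIPLIER
plan_then_implement = (plan_turns * OPUS_PLAN_MULTIPLIER
                       + implement_turns * HAIKU_MULTIPLIER)

print(f"All turns on Opus fast mode:       {everything_on_opus:.1f} PRUs")   # 300.0
print(f"Plan on Opus, implement on Haiku:  {plan_then_implement:.1f} PRUs")  # 17.6
```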
To get a grip on the cost, we help customers implement a base budget with warning signals when users have consumed 75% of that budget, and then update the budget to the next tier. This gives you a regular point in time to look at the usage and get a sense of how the cost is progressing over time. You can also use the REST API for PRU usage (https://docs.github.com/en/rest/billing/usage?apiVersion=2026-03-10#get-billing-premium-request-usage-report-for-an-organization) to automate loading the information into your reporting or observability platform. You can also download the PRU usage from GitHub and feed it into our analyzer at https://xebia.github.io/github-copilot-premium-reqs-usage. This will indicate how many people will go over their PRU allotment at the end of the month, so that you can plan those costs accordingly.
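As a minimal sketch of automating that, the snippet below pulls the usage report via the REST API. Note that the endpoint path shown is an assumption based on the report name in the linked documentation; check that page for the exact path and parameters:

```python
# Sketch: pull the PRU usage report for an organization via the GitHub REST API.
# NOTE: the endpoint path below is an assumption based on the report name in
# the linked docs; verify the exact path and parameters on that page.
import os

import requests

ORG = "your-org"  # hypothetical organization slug
url = f"https://api.github.com/organizations/{ORG}/settings/billing/premium_request/usage"

response = requests.get(
    url,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",  # token with billing access
    },
)
response.raise_for_status()
usage = response.json()
print(usage)  # feed this into your reporting or observability platform
```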
Using these tools, we did find people who used up all their PRUs by the 10th of the month (and even earlier). That gave us the chance to contact those engineers and learn about their tool stack and how they use GitHub Copilot on top of it.
Incoming changes
With this blog post, GitHub has announced that starting June 1st, 2026, they will stop billing users and enterprises for Premium Request Units, and instead start using tokens to charge for the compute costs incurred by users.
All model vendors have been working on that premise for quite a while, and GitHub is following suit, as the infrastructure costs to host AI are growing with massive peaks.
A token is the unit of input and output data for a Large Language Model (LLM). LLMs work with parts of words to predict the next part of a word in a sentence; that has been the underlying method for LLMs since the beginning. This means that every token sent to a model as input consumes compute power (GPU/CPU/RAM/disk/networking) in a datacenter hosted by GitHub or another model provider. That compute power comes with a cost, and that cost is now passed all the way to the end user.
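To get a feel for what a token actually is, here is a minimal sketch using the open-source tiktoken library (the encoding shown is the one used by GPT-4-era OpenAI models; other vendors tokenize differently, so counts will vary):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-era OpenAI models;
# other model families use their own tokenizers.
enc = tiktoken.get_encoding("cl100k_base")

text = "Refactor this function to use async/await."
tokens = enc.encode(text)

print(len(tokens))         # number of tokens this prompt costs as input
print(enc.decode(tokens))  # round-trips back to the original text
```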
That means your choice of editor, model, and reasoning effort becomes really important, as those choices now have direct financial impact!
Actions to take
To reach an understanding of your users’ behavior with AI, it is important to visualize that information. To prevent unwanted costs, start by configuring a budget for the AI credits: user-level budgets are coming, and we recommend configuring those first with each user’s monthly included budget. Then set up a way for users to either request overage options, or automatically detect that they are nearing their budget and increase it. This lets you find and prevent users spending all their budget on a high-token-cost model for simple tasks, so that you can guide and educate those engineers. A minimal sketch of that detection step follows below.
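The sketch below shows that detection step; the data shape is hypothetical, and in practice you would feed it from the billing API or usage export mentioned earlier:

```python
# Flag users who are nearing their monthly AI-credit budget.
WARNING_THRESHOLD = 0.75  # warn at 75% consumption, as suggested above

# Hypothetical usage data; in practice, load this from the billing API or export.
usage = {
    "alice": {"spent_usd": 8.20, "budget_usd": 10.00},
    "bob": {"spent_usd": 2.10, "budget_usd": 10.00},
}

for user, record in usage.items():
    consumed = record["spent_usd"] / record["budget_usd"]
    if consumed >= WARNING_THRESHOLD:
        print(f"{user} has consumed {consumed:.0%} of their budget - time to reach out")
```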
How can you find out about your token usage?
The changes mean that we need to come to grips with our token usage. GitHub has always kept that information away from the end user by abstracting it into PRUs, and now we need to make the conversion and understand our token usage.
GitHub is building tooling for their customers that will show their token usage over the last few months. If you want to get started now, take a look at the AI Engineering Fluency extension that one of our team members has built: AI Engineering Fluency →
This tool looks at the local files where your Copilot interactions are stored. It supports multiple editors and CLIs:
- Visual Studio Code (+ Codium derivatives)
- Visual Studio
- GitHub Copilot CLI
- OpenCode, Crush, and other CLIs that work with your Copilot subscription
The tool has options to upload the data (opt-in per user for privacy reasons) to cloud storage so that you can analyze it on a team or department level and make sensible predictions about your token consumption.
With this tooling we’ve learned that an average chat session is somewhere in the realm of 1 to 3 million tokens, depending on the work the engineer is doing and how long the conversation keeps going.
To calculate the cost of these sessions, we need to understand the factors that define the cost calculation. It is determined by a couple of data points:
- Costs are calculated per 1 million tokens
- Input tokens are less costly than output tokens: the input tokens are part of the initial prompt, so the model does not have to do any predicting on them; the prediction happens on the output side
- Some models and vendors support caching of tokens, meaning that during a chat conversation, older chat turns can be cached, incurring less cost
So let’s take the lower number of 1 million tokens for an example calculation of the actual costs with token-based billing. Each model vendor has pricing information available; for example, the documentation for Anthropic’s models can be found here: https://platform.claude.com/docs/en/about-claude/pricing.
A summarized version can be found in the table below to help understand how the calculation is applied:

Note: there is also caching involved, with a cost per write and a cost per read, which often cost less than the input/output tokens. The cached information is not available in the Copilot data as of this writing, so it is left out of the equation.
With an average chat conversation taking up 1M tokens, you can immediately see that your model choice has an impact on the cost that will be incurred, as does your choice of editor, since the editor decides which context it sends to the model (or not!). So if your editor ships your entire repository over to the model to analyze, your input tokens will go through the roof. The same goes for reasoning effort: most models these days have a toggle that lets you set how much effort they put into reasoning about their task. That ranges from Low to Medium to High, and some models even have an Extra High option. This has an immediate impact on your token usage and thus your cost.
Of course, we need to consider input, output, and cached tokens, so let’s say that for the 1 million token average chat session we follow the 80/20 rule: 80% input tokens and 20% output tokens, with a regular model like Claude Sonnet 4.6:

And of course we can do the same calculation for Claude Opus 4.7 instead:

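As a minimal sketch of that calculation for both models (the per-million-token prices below are assumptions for illustration; use the vendor’s current price list for real numbers):

```python
# Token-based cost for a 1M-token chat session, split 80% input / 20% output.
# Prices are illustrative assumptions; check the vendor's pricing page.
PRICES_PER_MILLION = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},  # assumed Sonnet-class rates
    "claude-opus": {"input": 15.00, "output": 75.00},   # assumed Opus-class rates
}

def session_cost(model: str, total_tokens: int, input_share: float = 0.80) -> float:
    """Cost in USD of a session, ignoring cached tokens (see the note above)."""
    prices = PRICES_PER_MILLION[model]
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000

print(f"Sonnet: ${session_cost('claude-sonnet', 1_000_000):.2f}")  # $5.40
print(f"Opus:   ${session_cost('claude-opus', 1_000_000):.2f}")    # $27.00
```

The gap between the two runs makes the point: the exact same conversation can cost several times more purely based on the model picked for it.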
The extension takes these calculations into account and shows the overall costs against the actual usage of the engineer over the last 30 days:

Tooling from GitHub
To help with this, GitHub has built new tooling that will be released soon. It can help you understand how your Enterprise users are using AI credits and compare that to PRU billing today. Until then, you can use the VS Code extension above.
How can Xebia help?
We help our customers get the most value out of GitHub Copilot by upskilling engineers, guiding leadership on governance topics, and helping with these kinds of budgeting and prognosis exercises at a larger scale.
Find more information here: GitHub Partnership → or reach out to one of our sellers:
- The Netherlands: Matthias Walgers: matthias.walgers@xebia.com
- Belgium: Freddy Aben: freddy.aben@xebia.com
- DACH region: Marko Sawall: marko.sawall@xebia.com
- USA: Brooke Vorhees: brooke.vorhees@xebia.com
Written by
Rob Bos
Rob has a strong focus on ALM and DevOps, automating manual tasks and helping teams deliver value to the end user faster using DevOps techniques. He applies this to anything he comes across, whether it’s an application, infrastructure, serverless, or training environments. Additionally, Rob focuses on the management of production environments, including dashboarding and usage statistics for product owners and stakeholders, but also as part of the feedback loop to the developers. A lot of focus goes to GitHub and GitHub Actions, improving the security of applications and DevOps pipelines. Rob is a Trainer (Azure + GitHub), a Microsoft MVP, and a LinkedIn Learning Instructor.