Documentation

Function Calling

Introduction

Certain models support function calling (also called tool calling), which gives them the ability to respond to queries with function names and arguments that you can then invoke in your own application code.

To use it, pass an array of function descriptions to the tools key. If the LLM decides one or more of the available functions should be used to answer a query, it will respond with an array of the function names and their arguments to call in the tool_calls key of its response.

You can then use the data from tool_calls to invoke the named functions and get their results, which you can either provide directly to the user or pass back into subsequent LLM queries for further processing.
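
As a quick preview, the round trip typically looks like the sketch below. The handle_tool_calls helper and the handlers mapping are hypothetical names used for illustration; complete, runnable examples follow in the sections below.

Python
import json

def handle_tool_calls(message, messages, handlers):
    # `message` is the assistant message from a completion; `handlers` maps
    # function names to local callables, e.g. {"get_current_weather": ...}
    for tool_call in message.tool_calls or []:
        # Arguments arrive as a JSON-encoded string, so parse them first
        args = json.loads(tool_call.function.arguments)
        result = handlers[tool_call.function.name](**args)
        # Feed each result back as a "tool" message for a follow-up completion
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": tool_call.function.name,
            "content": result,
        })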

Supported models

The following models currently support function calling:

  • moonshotai/Kimi-K2-Instruct
  • meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
  • meta-llama/Llama-4-Scout-17B-16E-Instruct
  • meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
  • meta-llama/Llama-3.3-70B-Instruct-Turbo
  • meta-llama/Llama-3.2-3B-Instruct-Turbo
  • Qwen/Qwen2.5-7B-Instruct-Turbo
  • Qwen/Qwen2.5-72B-Instruct-Turbo
  • Qwen/Qwen3-235B-A22B-fp8-tput
  • deepseek-ai/DeepSeek-V3
  • mistralai/Mistral-Small-24B-Instruct-2501
  • arcee-ai/virtuoso-large
  • arcee-ai/virtuoso-medium-v2
  • arcee-ai/caller

Basic example

Let's say our application has access to a get_current_weather function, which takes two named arguments, location and unit:

Python
# Hypothetical function that exists in our app
get_current_weather(
  location="San Francisco, CA",
  unit="fahrenheit"
)
TypeScript
// Hypothetical function that exists in our app
getCurrentWeather({
  location: "San Francisco, CA",
  unit: "fahrenheit"
})

We can make this function available to our LLM by passing its description to the tools key alongside the user's query. Let's suppose the user asks, "What is the current temperature of New York, San Francisco and Chicago?"

Python
import json
from bytecompute import ByteCompute

client = ByteCompute()

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-Turbo",
    messages=[
      {"role": "system", "content": "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls."},
      {"role": "user", "content": "What is the current temperature of New York, San Francisco and Chicago?"},
    ],
    tools=[
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": [
                  "celsius",
                  "fahrenheit"
                ]
              }
            }
          }
        }
      }
    ]
)

print(json.dumps(response.choices[0].message.model_dump()['tool_calls'], indent=2))
TypeScript
import ByteCompute from 'bytecompute-ai';

const bytecompute = new ByteCompute();

const response = await bytecompute.chat.completions.create({
  model: "Qwen/Qwen2.5-7B-Instruct-Turbo",
  messages: [
    {
      role: "system",
      content:
      "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls.",
    },
    {
      role: "user",
      content:
      "What is the current temperature of New York, San Francisco and Chicago?",
    },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "getCurrentWeather",
        description: "Get the current weather in a given location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g. San Francisco, CA",
            },
            unit: {
              type: "string",
              enum: ["celsius", "fahrenheit"],
            },
          },
        },
      },
    },
  ],
});

console.log(
  JSON.stringify(response.choices[0].message?.tool_calls, null, 2),
);

The tool_calls key of the LLM's response will look like this:

JSON
[
  {
    "index": 0,
    "id": "call_aisak3q1px3m2lzb41ay6rwf",
    "type": "function",
    "function": {
      "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    }
  },
  {
    "index": 1,
    "id": "call_agrjihqjcb0r499vrclwrgdj",
    "type": "function",
    "function": {
      "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    }
  },
  {
    "index": 2,
    "id": "call_17s148ekr4hk8m5liicpwzkk",
    "type": "function",
    "function": {
      "arguments": "{\"location\":\"Chicago, IL\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    }
  }
]

As we can see, the LLM has given us three function calls that we can programmatically execute to answer the user's question.
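
For instance, a minimal dispatch loop over those calls, reusing the response object and the hypothetical get_current_weather function from above, might look like this:

Python
import json

results = []
for tool_call in response.choices[0].message.tool_calls:
    # Each `arguments` value is a JSON-encoded string, not a dict
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_current_weather":
        results.append(get_current_weather(**args))

print(results)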

Selecting a specific tool

By default, an LLM that's been provided with tools will automatically attempt to use the most appropriate one when generating responses.

If you'd like to force a specific tool to be used for a completion, pass a tool_choice object that names its function:

Python
import json
from bytecompute import ByteCompute

client = ByteCompute()

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      # ...
    }
  },
  {
    "type": "function",
    "function": {
      "name": "get_current_stock_price",
      # ...
    }
  }
]

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-Turbo",
    messages=[
      {"role": "user", "content": "What's the current price of Apple's stock?"},
    ],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_stock_price"}}
)

print(json.dumps(response.choices[0].message.model_dump()['tool_calls'], indent=2))
TypeScript
import ByteCompute from "bytecompute-ai";

const bytecompute = new ByteCompute();

const tools = [
  {
    type: "function",
    function: {
      name: "getCurrentWeather",
      // ...
    },
  },
  {
    type: "function",
    function: {
      name: "getCurrentStockPrice",
      // ...
    },
  },
];

const response = await bytecompute.chat.completions.create({
  model: "Qwen/Qwen2.5-7B-Instruct-Turbo",
  messages: [
    {
      role: "user",
      content: "What's the current price of Apple's stock?",
    },
  ],
  tools,
  tool_choice: { type: "function", function: { name: "getCurrentStockPrice" } },
});

console.log(
  JSON.stringify(response.choices[0].message?.tool_calls, null, 2),
);

This ensures the model will use the provided function when generating its response:

JSON
[
  {
    "index": 0,
    "id": "call_jxo8ybor16ju34abq552jymn",
    "type": "function",
    "function": {
      "arguments": "{\"ticker\":\"APPL\"}",
      "name": "get_current_stock_price"
    }
  }
]

Multi-turn example

Here's an example of passing the result of a tool call from one completion into a second, follow-up completion. Note that if the follow-up completion may itself trigger further tool calls, the tools array must be passed to it as well (as the TypeScript example below does); if you only need a single step, you can omit it from the second call:

Python
import json
from bytecompute import ByteCompute

client = ByteCompute()

# Example function to make available to the model
def get_current_weather(location, unit="fahrenheit"):
    """Get the weather for some location"""
    if "chicago" in location.lower():
        return json.dumps({"location": "Chicago", "temperature": "13", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "55", "unit": unit})
    elif "new york" in location.lower():
        return json.dumps({"location": "New York", "temperature": "11", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "celsius",
              "fahrenheit"
            ]
          }
        }
      }
    }
  }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls."},
    {"role": "user", "content": "What is the current temperature of New York, San Francisco and Chicago?"}
]
    
# Completion #1: Get the appropriate tool calls
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-Turbo",
    messages=messages,
    tools=tools,
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name == "get_current_weather":
            function_response = get_current_weather(
                location=function_args.get("location"),
                unit=function_args.get("unit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

    # Completion #2: Provide the results to get the final answer
    function_enriched_response = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct-Turbo",
        messages=messages,
    )
    print(json.dumps(function_enriched_response.choices[0].message.model_dump(), indent=2))
TypeScript
import ByteCompute from "bytecompute-ai";
import { CompletionCreateParams } from "bytecompute-ai/resources/chat/completions.mjs";

const bytecompute = new ByteCompute();

// Example function to make available to model
function getCurrentWeather({
  location,
  unit = "fahrenheit",
}: {
  location: string;
  unit: "fahrenheit" | "celsius";
}) {
  let result: { location: string; temperature: number | null; unit: string };
  if (location.toLowerCase().includes("chicago")) {
    result = {
      location: "Chicago",
      temperature: 13,
      unit,
    };
  } else if (location.toLowerCase().includes("san francisco")) {
    result = {
      location: "San Francisco",
      temperature: 55,
      unit,
    };
  } else if (location.toLowerCase().includes("new york")) {
    result = {
      location: "New York",
      temperature: 11,
      unit,
    };
  } else {
    result = {
      location,
      temperature: null,
      unit,
    };
  }

  return JSON.stringify(result);
}

const tools = [
  {
    type: "function",
    function: {
      name: "getCurrentWeather",
      description: "Get the current weather in a given location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
          },
        },
      },
    },
  },
];

const messages: CompletionCreateParams.Message[] = [
  {
    role: "system",
    content:
      "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls.",
  },
  {
    role: "user",
    content:
      "What is the current temperature of New York, San Francisco and Chicago?",
  },
];

const response = await bytecompute.chat.completions.create({
  model: "Qwen/Qwen2.5-7B-Instruct-Turbo",
  messages,
  tools,
});

if (response.choices[0].message?.tool_calls) {
  for (const toolCall of response.choices[0].message.tool_calls) {
    if (toolCall.function.name === "getCurrentWeather") {
      const args = JSON.parse(toolCall.function.arguments);
      const functionResponse = getCurrentWeather(args);

      messages.push({
        // Link this result back to the originating tool call
        tool_call_id: toolCall.id,
        role: "tool",
        content: functionResponse,
      });
    }
  }

  const functionEnrichedResponse = await bytecompute.chat.completions.create({
    model: "Qwen/Qwen2.5-7B-Instruct-Turbo",
    messages,
    tools,
  });

  console.log(
    JSON.stringify(functionEnrichedResponse.choices[0].message, null, 2),
  );
}

And here's the final output from the second call:

JSON
{
  "content": "The current temperature in New York is 11 degrees Fahrenheit, in San Francisco it is 55 degrees Fahrenheit, and in Chicago it is 13 degrees Fahrenheit.",
  "role": "assistant"
}

We've successfully used our LLM to generate three tool call descriptions, iterated over those descriptions to execute each one, and passed the results into a follow-up message to get the LLM to produce a final answer!
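
In real applications the model may need more than one round of tool calls before it can answer. One way to generalize the pattern above is to loop until the model responds without requesting any tools. Here's a minimal sketch that reuses the client, messages, tools, and get_current_weather from the multi-turn example; it's an illustration rather than a prescribed API:

Python
while True:
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct-Turbo",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    if not message.tool_calls:
        break  # the model answered directly, so we're done
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        # Dispatch to the matching local function, as in the example above
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": tool_call.function.name,
            "content": get_current_weather(**args),
        })

print(message.content)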