How AI Agents Use APIs to Interact with the World

If the LLM is the brain, APIs are the hands. To make an agent autonomous, you must grant it access to external software. This guide details how "Function Calling" bridges the gap between AI reasoning and real-world execution.

1. What is Function Calling?

Early LLMs could only generate text. If you asked ChatGPT to "check my bank balance," it would apologize and say it couldn't access the internet. Function calling (or Tool Calling) solves this.

When you initialize a Fikra model, you pass it a list of tools it is allowed to use. During its reasoning loop, if the model decides it needs data it doesn't have, it halts text generation. Instead, it outputs a strict JSON object that tells your server: "Run this API endpoint with these parameters."
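In practice, that halted output is just structured text your server has to parse. A minimal sketch of what such a tool call looks like (the exact wire format varies by provider; the field names `name` and `arguments` here are illustrative):

```python
import json

# Illustrative raw output emitted by the model when it decides to call a tool.
# Real providers wrap this differently; the shape below is an assumption.
raw_tool_call = '{"name": "get_weather", "arguments": {"city": "Nairobi"}}'

call = json.loads(raw_tool_call)
print(call["name"])       # which function the model wants to run
print(call["arguments"])  # the parameters it extracted from the conversation
```

Your backend parses this object, runs the matching function, and feeds the result back to the model.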

2. Defining the API Boundary (JSON Schema)

Models do not magically know how to use your company's proprietary software. You must teach them the API contract using JSON schema.

Here is an example of defining a tool that allows the agent to search a local SQL database for customer records:

{
  "name": "search_customer_db",
  "description": "Searches the CRM for a customer's profile using their email address.",
  "parameters": {
    "type": "object",
    "properties": {
      "email": {
        "type": "string",
        "description": "The exact email address of the client."
      }
    },
    "required": ["email"]
  }
}

The Fikra AI reads this description. If a user says "Look up [email protected]," the AI knows to trigger the `search_customer_db` tool and automatically extracts the email string to pass as a parameter.
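On the backend, handling that trigger reduces to a lookup table from tool name to real function. A minimal sketch, where `search_customer_db` is a stand-in for your actual CRM query, not a real library call:

```python
import json

def search_customer_db(email):
    # Stand-in for a real CRM/SQL lookup; returns a canned record here.
    return {"email": email, "name": "Amina Otieno", "status": "active"}

# Map tool names from your JSON schemas to concrete Python functions.
TOOL_REGISTRY = {"search_customer_db": search_customer_db}

def dispatch(tool_call_json):
    call = json.loads(tool_call_json)
    handler = TOOL_REGISTRY[call["name"]]
    # Unpack the model-extracted arguments into the real function.
    return handler(**call["arguments"])

result = dispatch(
    '{"name": "search_customer_db", "arguments": {"email": "[email protected]"}}'
)
```

The registry pattern keeps the agent's vocabulary (tool names) decoupled from your implementation details.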

3. Real-World Example: M-Pesa API Integration

In African fintech, connecting an AI agent to payment gateways is incredibly powerful. Using the Safaricom Daraja API, we can give an agent the ability to check transaction statuses.

When the agent outputs the tool call JSON, your backend Python server catches it and executes the actual HTTP request to Safaricom:

# The agent requested to check a transaction.
# Your backend intercepts and runs this code:

import requests

def execute_mpesa_check(transaction_id, access_token):
    api_url = "https://sandbox.safaricom.co.ke/mpesa/transactionstatus/v1/query"

    # Daraja requires an OAuth access token obtained beforehand;
    # the backend (never the agent) holds and refreshes these credentials.
    headers = {"Authorization": f"Bearer {access_token}"}

    # Send the request to the real-world API
    response = requests.post(api_url, headers=headers, json={
        "TransactionID": transaction_id,
        "CommandID": "TransactionStatusQuery",
        # ... other required Daraja parameters
    })

    # Return the raw JSON observation back to the Agent
    return response.json()
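Once `execute_mpesa_check` returns, the raw JSON must be packaged as a new message so the model can reason over the observation on its next turn. A minimal sketch (the `role`/`content` message shape and the tool name are illustrative assumptions; providers differ):

```python
import json

def tool_result_message(tool_name, result):
    # Wrap the API observation as a message appended to the conversation
    # before the model is called again. Field names are illustrative.
    return {
        "role": "tool",
        "name": tool_name,
        "content": json.dumps(result),
    }

msg = tool_result_message("execute_mpesa_check", {"ResultDesc": "Completed"})
```

The model then reads this message and either answers the user or requests another tool call.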

4. API Security and Guardrails

Giving an AI agent direct access to `POST`, `PUT`, or `DELETE` endpoints is dangerous without guardrails. Best practices in agent architecture dictate:

  • Human-in-the-Loop (HITL): For destructive actions (like issuing a refund or deleting a database row), the framework should pause the loop and require human confirmation before executing the API call.
  • Read-Only by Default: Start by only giving agents `GET` endpoints to retrieve context safely.
  • Strict Authentication: The agent itself does not hold API keys. The agent outputs the *intent* to use a tool, and your secure backend server handles the actual authenticated API request.
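The first two guardrails can be enforced mechanically in the dispatcher itself. A minimal sketch, assuming each tool is tagged with the HTTP verb it ultimately performs (tool names here are illustrative):

```python
# Tools tagged by the HTTP method they ultimately perform (illustrative).
TOOL_METHODS = {
    "search_customer_db": "GET",
    "issue_refund": "POST",
}

# Read-only by default: only GET calls run without human sign-off.
SAFE_METHODS = {"GET"}

def guarded_dispatch(tool_name, confirmed=False):
    method = TOOL_METHODS[tool_name]
    if method not in SAFE_METHODS and not confirmed:
        # Pause the loop: a destructive call needs human approval first.
        return {"status": "pending_approval", "tool": tool_name}
    return {"status": "executed", "tool": tool_name}
```

A read runs immediately; a refund stays queued until a human flips `confirmed=True`.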

5. Frequently Asked Questions

How do AI agents interact with external enterprise software?

AI agents interact with software using RESTful APIs or GraphQL. The agent generates a structured JSON payload containing the required parameters, and a backend framework (like Fikra Claw) executes the HTTP request to the external software.

What is API tool-calling in Large Language Models?

Tool-calling (or function calling) is a feature where an LLM is trained to detect when it needs external information. Instead of generating conversational text, it halts generation and outputs a JSON object specifying which function it wants to run and with what data.

How do I give my AI agent safe access to the internet?

You provide the agent with a 'search' tool (like the Google Search API or Brave Search API). Crucially, the agent does not browse freely; it formulates a specific search query, the backend executes the API call, and returns only the text results back into the agent's memory window.
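A search tool is declared the same way as the CRM example above. A hedged sketch of such a schema, expressed as a Python dict (the name `web_search` and its fields are illustrative, not a specific provider's API):

```python
# Illustrative tool schema for a constrained search capability.
web_search_tool = {
    "name": "web_search",
    "description": "Runs a single web search query and returns text snippets only.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The exact search query to execute.",
            }
        },
        "required": ["query"],
    },
}
```

Because the tool accepts only a query string and returns only text, the agent never holds a browser session or follows links on its own.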