Liquid <> .txt Collaboration

Faster and more accurate function calling on the edge with .txt’s structured outputs and Liquid Foundation Models

Introduction

In practical edge applications like smart home systems, industrial IoT sensors, or mobile assistants, we need models to generate function calls as accurately and quickly as possible. These devices face three critical challenges:

Limited processing power - The iPhone 15 Pro achieves 2.15 TFLOPS (FP32), which is 31 times less than a single H200 GPU
Latency requirements - Users expect near-instant responses (less than 300 ms)
Reliability demands - Function calls must work correctly the first time to meet the latency requirements.

Traditional approaches to function calling with LLMs often fail to meet these constraints. Models are too large, generation is too slow, and outputs can be inconsistent. This creates a significant barrier to deploying AI assistants that need to interact reliably with hardware and software systems at the edge.

The Solution: LFM2-350M + dotgrammar

LFM2-350M: AI That Fits on the Edge

LFM2-350M is an LLM developed by Liquid AI specifically for high-performance edge deployment with a custom lightweight architecture:

Minimal footprint: Uses less than 1 GB of RAM without quantization, and memory grows only slightly with long inputs
Responsive performance: Delivers sub-100 ms inference times on common edge hardware
High quality: Performs on par with significantly larger models on knowledge and reasoning

dotgrammar: Grammar-Based Generation

dotgrammar is a library developed by .txt for high-performance structured outputs using context-free grammars (CFGs). Complementing LFM2-350M's efficiency, dotgrammar ensures reliable function calling through:

CFG constraints: Guarantees syntactically valid outputs every time
Token-efficient function formats: Reduces generation time and improves responsiveness with Pythonic function calls
Zero runtime overhead: Enforces constraints without increasing inference latency

Let's explore how these technologies work together.

What is function calling?

Function calling is a technique where language models output structured function calls rather than natural language text. It enables AI assistants to interact with external tools, APIs, and hardware in a reliable, programmatic way.

For instance, if you want a model to control a media player, you need it to generate precise commands that your application can parse and execute:

User: "Play Spotify at high volume"
AI: play_media(device="speaker", source="spotify", volume=1)

This approach effectively makes LLMs speak your application's native language, bridging the gap between natural language understanding and actionable commands.
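To make "parse and execute" concrete, here is a minimal sketch of how an application might consume such a call using only Python's standard `ast` module. The `parse_call` and `execute` helpers, the `HANDLERS` table, and the body of `play_media` are hypothetical names introduced for illustration; they are not part of any library mentioned in this post.

```python
import ast

def parse_call(text: str):
    """Parse a Pythonic call such as f(a=1, b="x") into (name, kwargs)."""
    tree = ast.parse(text, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    if call.args:
        raise ValueError("positional arguments are not supported in this sketch")
    # literal_eval only accepts literals, so arbitrary expressions are rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs

# Hypothetical handler standing in for a real media player integration.
def play_media(device: str, source: str, volume: float) -> str:
    return f"playing {source} on {device} at volume {volume}"

# Only functions registered here can be invoked by the model.
HANDLERS = {"play_media": play_media}

def execute(text: str) -> str:
    name, kwargs = parse_call(text)
    return HANDLERS[name](**kwargs)

result = execute('play_media(device="speaker", source="spotify", volume=1)')
print(result)
```

The model's text is never passed to `eval`: the call is parsed into a syntax tree, its arguments are read as literals, and the function name is looked up in an explicit table.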

The traditional JSON approach

Most LLM providers implement function calling using JSON objects. While functional, this approach creates significant inefficiencies for edge deployment:

{
  "name": "play_media",
  "parameters": {
    "device": "speaker",
    "source": "spotify",
    "volume": 1
  }
}

This JSON representation requires 37 tokens to generate, and every generated token adds latency. Moreover, LLMs do not reliably produce well-formed nested structures such as JSON.

The Pythonic alternative

By contrast, consider the same call in a more compact Pythonic representation:

play_media(device="speaker", source="spotify", volume=1)

This representation requires just 14 tokens, a 2.6x reduction that directly translates to faster generation times without sacrificing expressivity. Because LLMs are trained on a lot of code, it’s also a more reliable and natural way to implement function calling.
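As a rough illustration of the size difference, the sketch below renders the same call in both formats and compares their lengths. The token counts quoted in this post depend on the tokenizer, which is not reproduced here, so character count is used only as a crude proxy. The helper names `to_json` and `to_pythonic` are made up for this example.

```python
import json

call = {
    "name": "play_media",
    "parameters": {"device": "speaker", "source": "spotify", "volume": 1},
}

def to_json(call: dict) -> str:
    """Render the call as an indented JSON object."""
    return json.dumps(call, indent=2)

def to_pythonic(call: dict) -> str:
    """Render the call as name(key=value, ...) with double-quoted strings."""
    args = ", ".join(
        f"{key}={json.dumps(value) if isinstance(value, str) else repr(value)}"
        for key, value in call["parameters"].items()
    )
    return f'{call["name"]}({args})'

json_text = to_json(call)
pythonic_text = to_pythonic(call)

# Character counts are only a proxy; real savings are measured in tokens.
print(len(json_text), "characters as JSON")
print(len(pythonic_text), "characters as a Pythonic call")
```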

The need for structured generation

Despite this training, LLMs can still make mistakes. These errors become especially problematic in edge applications where:

Parameters have strict constraints (e.g., volume must be between 0 and 1)
Parameter combinations must follow business logic (e.g., "speaker" and "netflix" is an invalid combination)
Latency requirements leave no room for error handling and retries
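Without constrained generation, the application has to check these rules after the model has already produced its output. The sketch below shows what such a post-hoc check might look like for the two example rules above; `validate_play_media` and `INVALID_COMBINATIONS` are hypothetical names used for illustration.

```python
# Pairs of (device, source) that the application considers invalid.
INVALID_COMBINATIONS = {("speaker", "netflix")}

def validate_play_media(device: str, source: str, volume: float) -> list[str]:
    """Return a list of rule violations; an empty list means the call is valid."""
    errors = []
    is_number = isinstance(volume, (int, float)) and not isinstance(volume, bool)
    if not is_number or not 0 <= volume <= 1:
        errors.append(f"volume must be between 0 and 1, got {volume!r}")
    if (device, source) in INVALID_COMBINATIONS:
        errors.append(f"{source!r} cannot be played on {device!r}")
    return errors

print(validate_play_media("speaker", "spotify", 1))
print(validate_play_media("speaker", "netflix", 1.5))
```

Every non-empty result here means another round of generation, which is exactly the retry cost a tight latency budget cannot absorb.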

For truly reliable function calls on the edge, we need structured generation that enforces output format in a deterministic way. This is where context-free grammars (CFGs) come in.

A context-free grammar (CFG) is a set of rules that defines exactly what combinations of characters or words are allowed, creating a "railroad track" that generation must follow.

Think of a CFG like a system for composing music with clear rules. Technically, a CFG consists of:

Terminal symbols: The actual characters that appear in the final output
Non-terminal symbols: Variables that get replaced according to rules
Production rules: Instructions for replacing non-terminals with terminals or other non-terminals

For example, this grammar ensures correct play_media function calls:

Call ::= "play_media" "(" Arguments ")"
Arguments ::= "device=\"" Device "\"" ", " "source=\"" Source "\"" ", " "volume=" Volume
Device ::= "speaker" | "tv"
Source ::= "spotify" | "netflix" | "youtube"
Volume ::= "0" | "0.1" | "0.2" | ... | "1.0"

In this example:

Terminal symbols include literal strings like "play_media", "(", and ")"
Non-terminal symbols include Call, Arguments, Device, etc.
The symbol "::=" means "can be replaced with"
The symbol "|" means "OR" (indicating a choice)
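Because this particular grammar has no recursion, it describes a finite set of strings, and we can list them all. The sketch below encodes the production rules as a Python dictionary and expands them. It is an illustration of how production rules work, not how a grammar engine is implemented, and the `Volume` rule includes only the values written out above (the elided ones are left out).

```python
from itertools import product

# Each non-terminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Keys of GRAMMAR are non-terminals, anything else is
# a terminal string that appears verbatim in the output.
GRAMMAR = {
    "Call": [["play_media", "(", "Arguments", ")"]],
    "Arguments": [[
        'device="', "Device", '"', ", ",
        'source="', "Source", '"', ", ",
        "volume=", "Volume",
    ]],
    "Device": [["speaker"], ["tv"]],
    "Source": [["spotify"], ["netflix"], ["youtube"]],
    # Only the explicitly listed values; the elided ones are omitted here.
    "Volume": [["0"], ["0.1"], ["0.2"], ["1.0"]],
}

def expand(symbol: str) -> list[str]:
    """Return every string that `symbol` can be rewritten into."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: stands for itself
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        results.extend("".join(combo) for combo in product(*parts))
    return results

LANGUAGE = set(expand("Call"))
print(len(LANGUAGE), "valid calls")
print('play_media(device="tv", source="youtube", volume=0.2)' in LANGUAGE)
print('play_media(device="radio", source="youtube", volume=0.2)' in LANGUAGE)
```

Any string outside `LANGUAGE`, such as a call with an unknown device, simply cannot be derived from the rules.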

Grammar-based structured generation provides the following guarantees:

Security: Prevents injection attacks by strictly limiting what can be generated
Reliability: Guarantees parseable output
Validation: Enforces business rules at the generation stage
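To see where these guarantees come from, it helps to look at a toy version of constrained decoding. At each step, tokens that would take the output off the "railroad track" are masked out, and the model can only choose among the rest. The sketch below is a simplified illustration under stated assumptions: a small list of valid strings stands in for the grammar, the vocabulary and scores are invented, and none of this reflects how dotgrammar is actually implemented.

```python
# A small set of valid outputs stands in for a grammar (assumption).
VALID_OUTPUTS = [
    'play_media(device="speaker", source="spotify")',
    'play_media(device="tv", source="netflix")',
    'play_media(device="tv", source="youtube")',
]

# Hypothetical toy vocabulary, including tokens that are never valid.
VOCAB = [
    "play_media(", 'device="', 'source="', "speaker", "tv",
    "spotify", "netflix", "youtube", '", ', '")', "stop_media(", "radio",
]

# Stand-in for model scores: this "model" deliberately prefers bad tokens.
TOY_SCORES = {"stop_media(": 5.0, "radio": 4.0, "netflix": 3.0, "speaker": 2.0}

def score(token: str) -> float:
    return TOY_SCORES.get(token, 0.0)

def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep the output a prefix of at least one valid string."""
    return [t for t in VOCAB if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]

def constrained_decode() -> str:
    """Greedy decoding restricted to allowed tokens at every step."""
    output = ""
    while output not in VALID_OUTPUTS:
        candidates = allowed_tokens(output)
        output += max(candidates, key=score)
    return output

unconstrained_first_token = max(VOCAB, key=score)
constrained_output = constrained_decode()
print("unconstrained first token:", unconstrained_first_token)
print("constrained output:", constrained_output)
```

Note that once "speaker" has been chosen, "netflix" is no longer an allowed token even though the toy model scores it highly: the business rule is enforced during generation rather than checked afterwards.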

Demo: Smart Home Control

Let's see how LFM2-350M and dotgrammar work together in a practical edge application: a smart home assistant running locally on a hub device.

In this use case, the model will take in user queries and perform actions to set the state of various systems at home, like dimming lights, closing blinds, and playing media. Some functions like `theater_mode()` don’t require any arguments, but others like `set_display()` need more fine-grained control.

We will implement this using .txt’s dotgrammar library, which provides latency-free grammar-based structured generation.

Example 1: Simple function with unconstrained arguments

Let’s say a user just finished watching a movie, and they want to create a note about it. The smart home might have access to a `save_note(str)` function, which takes an arbitrary string as input.

In this case, we still want to constrain the output to ensure the function call is properly formatted, but we do not want to constrain the argument. This can be achieved in dotgrammar like this:

note_grammar = """
	?start: "save_note(" UNESCAPED_STRING ")"
	%import common.UNESCAPED_STRING
"""

Example 2: One function with constrained arguments

In many cases, the arguments will be constrained. For instance, a `set_display()` function may accept only one device from a fixed list, such as 'tv' or 'projector'. In Python, we indicate this with a `Literal` type hint. In a CFG, we enforce it with pipes (“|”).

Here’s what such a grammar could look like for a function with type signature `set_display(device: Literal['tv', 'projector'])`:

display_grammar = """
	?start: "set_display(device = " device ")"
	?device: "'tv'" | "'projector'"
"""

Note the two layers of quotation marks wrapping 'tv' and 'projector': the outer double quotes indicate that everything inside should be treated as a constant in the CFG. The inner single quotes indicate that the value should be a string in the Python function call.

A similar effect can be achieved for Boolean arguments. Let’s add an argument `night_mode` to control how bright to set the display. For this new version of the function with signature `set_display(device: Literal['tv', 'projector'], night_mode: bool)`, we’ll generalize our notation a bit:

display_grammar = """
	?start: "set_display(" arguments ")"
	?arguments: "device = " device ", night_mode = " night_mode
	?device: "'tv'" | "'projector'"
	?night_mode: "True" | "False"
"""
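Since this grammar is meant to accept only four strings, we can list them and confirm that each one is a valid Python call the application can parse. The sketch below uses only the standard library and does not call dotgrammar. It assumes that quoted strings in the grammar are matched exactly, including spaces, and `parse_set_display` is a hypothetical helper.

```python
import ast
from itertools import product

# The alternatives of the `device` and `night_mode` rules.
DEVICES = ["'tv'", "'projector'"]
NIGHT_MODES = ["True", "False"]

# Every string the grammar is intended to accept (assumes exact matching).
ACCEPTED = [
    f"set_display(device = {device}, night_mode = {night_mode})"
    for device, night_mode in product(DEVICES, NIGHT_MODES)
]

def parse_set_display(text: str) -> dict:
    """Turn an accepted call string into a dict of Python values."""
    if text not in ACCEPTED:
        raise ValueError(f"not accepted by the grammar: {text!r}")
    call = ast.parse(text, mode="eval").body
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}

for text in ACCEPTED:
    print(text, "->", parse_set_display(text))
```

Because the Boolean alternatives are written as `True` and `False`, they arrive in the application as real Python booleans rather than strings.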