Python API
Note: this API is in preview and is subject to change.
- Install and import
- Model class
- GeneratorParams class
- Tokenizer class
- TokenizerStream class
- GeneratorParams class
- Generator class
Install and import
The Python API is delivered by the onnxruntime-genai Python package.
pip install onnxruntime-genai
import onnxruntime_genai
Model class
Load the model
Loads the ONNX model(s) and configuration from a folder on disk.
onnxruntime_genai.Model(model_folder: str) -> onnxruntime_genai.Model
Parameters
model_folder
: Location of model and configuration on disk
Returns
onnxruntime_genai.Model
Generate method
onnxruntime_genai.Model.generate(params: GeneratorParams) -> numpy.ndarray[int, int]
Parameters
params
: (Required) The generation parameters, created by onnxruntime_genai.GeneratorParams().
Returns
numpy.ndarray[int, int]
: A two-dimensional numpy array whose dimensions are the size of the batch passed in and the maximum length of the token sequences.
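A minimal sketch of one-shot generation with Model.generate, combined with the Tokenizer and GeneratorParams classes documented in the other sections. The function name generate_text, the "max_length" search option, and assigning the encoded prompt directly to input_ids are assumptions made for illustration. Because the API is in preview, check them against the installed version.

```python
def generate_text(model_folder, prompt, max_length=64):
    """Load a model, generate a completion for one prompt, and decode it."""
    # Imported lazily so this sketch can be loaded without the package installed.
    import onnxruntime_genai as og

    model = og.Model(model_folder)
    tokenizer = og.Tokenizer(model)

    params = og.GeneratorParams(model)
    # "max_length" is an assumed search option name.
    params.set_search_options({"max_length": max_length})
    params.input_ids = tokenizer.encode(prompt)

    # generate() returns one row of tokens per sequence in the batch.
    output_tokens = model.generate(params)
    return tokenizer.decode(output_tokens[0])
```

A call such as generate_text("path/to/model_folder", "Hello") would return the decoded tokens of the first (and only) sequence in the batch; the folder path here is a placeholder.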
GeneratorParams class
Create GeneratorParams object
onnxruntime_genai.GeneratorParams(model: onnxruntime_genai.Model) -> onnxruntime_genai.GeneratorParams
Parameters
model
: (Required) The model that was loaded by onnxruntime_genai.Model()
Returns
onnxruntime_genai.GeneratorParams
: The GeneratorParams object
Tokenizer class
Create tokenizer object
onnxruntime_genai.Tokenizer(model: onnxruntime_genai.Model) -> onnxruntime_genai.Tokenizer
Parameters
model
: (Required) The model that was loaded by onnxruntime_genai.Model()
Returns
onnxruntime_genai.Tokenizer
: The tokenizer object
Encode
onnxruntime_genai.Tokenizer.encode(text: str) -> numpy.ndarray[numpy.int32]
Parameters
text
: (Required) The text of the prompt to encode
Returns
numpy.ndarray[numpy.int32]
: an array of tokens representing the prompt
Decode
onnxruntime_genai.Tokenizer.decode(tokens: numpy.ndarray[numpy.int32]) -> str
Parameters
tokens
: (Required) A sequence of generated tokens
Returns
str
: the decoded generated tokens
Encode batch
onnxruntime_genai.Tokenizer.encode_batch(texts: list[str]) -> numpy.ndarray[int, int]
Parameters
texts
: A list of inputs
Returns
numpy.ndarray[int, int]
: The batch of tokenized strings
Decode batch
onnxruntime_genai.Tokenizer.decode_batch(tokens: [[numpy.int32]]) -> list[str]
Parameters
tokens
: (Required) A batch of token sequences to decode
Returns
list[str]
: A batch of decoded text
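The batch methods can be combined with Model.generate to process several prompts at once. The following is a sketch only: the helper name generate_batch is hypothetical, and it assumes that the array returned by encode_batch can be assigned to input_ids and that the output of generate can be passed to decode_batch unchanged.

```python
def generate_batch(model, tokenizer, params, prompts):
    """Tokenize several prompts, generate for all of them, and decode the results.

    `model`, `tokenizer`, and `params` are expected to be onnxruntime_genai
    Model, Tokenizer, and GeneratorParams objects created by the caller.
    """
    # encode_batch returns a 2-D array: one row of tokens per prompt.
    params.input_ids = tokenizer.encode_batch(prompts)

    # generate returns one row of output tokens per prompt.
    output_tokens = model.generate(params)

    # decode_batch returns one decoded string per row.
    return tokenizer.decode_batch(output_tokens)
```

The result is a list with one string per prompt, in the same order as the input list.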
Create tokenizer decoding stream
onnxruntime_genai.Tokenizer.create_stream() -> TokenizerStream
Parameters
None
Returns
onnxruntime_genai.TokenizerStream
: The tokenizer stream object
TokenizerStream class
This class accumulates the next displayable string (according to the tokenizer’s vocabulary).
Decode method
onnxruntime_genai.TokenizerStream.decode(token: int32) -> str
Parameters
token
: (Required) A token to decode
Returns
str
: If a displayable string has accumulated, this method returns it. If not, this method returns the empty string.
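Because decode returns the empty string until a displayable string has accumulated, callers typically skip empty results. A small sketch of that pattern follows; the helper name stream_text is hypothetical, and tokenizer_stream is assumed to be the object returned by Tokenizer.create_stream().

```python
def stream_text(tokenizer_stream, tokens):
    """Yield displayable text chunks as tokens are decoded one at a time."""
    for token in tokens:
        chunk = tokenizer_stream.decode(token)
        # An empty string means no displayable text has accumulated yet.
        if chunk:
            yield chunk
```

Each yielded chunk can be printed immediately, for example with print(chunk, end="", flush=True), to show output as it is produced.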
GeneratorParams class
Create a GeneratorParams object
onnxruntime_genai.GeneratorParams(model: Model) -> GeneratorParams
Input_ids member
onnxruntime_genai.GeneratorParams.input_ids = numpy.ndarray[numpy.int32, numpy.int32]
Set search options method
onnxruntime_genai.GeneratorParams.set_search_options(options: dict[str, Any])
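As an illustration of how the input_ids member and set_search_options fit together, the sketch below configures a GeneratorParams object for sampling. The helper name configure_params is hypothetical, and the option names ("max_length", "do_sample", "top_p", "temperature") are assumptions that are not listed in this document; confirm them against the search options supported by the installed version.

```python
def configure_params(params, input_ids, max_length=128, top_p=0.9, temperature=0.7):
    """Set search options and the prompt tokens on a GeneratorParams object."""
    # Option names below are assumed, not taken from this reference.
    options = {
        "max_length": max_length,
        "do_sample": True,
        "top_p": top_p,
        "temperature": temperature,
    }
    params.set_search_options(options)

    # input_ids holds the tokenized prompt(s), e.g. from Tokenizer.encode().
    params.input_ids = input_ids
    return params
```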
Generator class
Create a Generator
onnxruntime_genai.Generator(model: Model, params: GeneratorParams) -> Generator
Parameters
model
: (Required) The model to use for generation
params
: (Required) The set of parameters that control the generation
Returns
onnxruntime_genai.Generator
: The Generator object
Is generation done
onnxruntime_genai.Generator.is_done() -> bool
Returns
bool
: True when all sequences have reached the maximum length or the end of sequence.
Compute logits
Runs the model through one iteration.
onnxruntime_genai.Generator.compute_logits()
Generate next token
Using the current set of logits and the specified generator parameters, calculates the next batch of tokens, using Top P sampling.
onnxruntime_genai.Generator.generate_next_token()
Get next tokens
onnxruntime_genai.Generator.get_next_tokens() -> numpy.ndarray[numpy.int32]
Returns
numpy.ndarray[numpy.int32]
: The most recently generated tokens
Get sequence
onnxruntime_genai.Generator.get_sequence(index: int) -> numpy.ndarray[numpy.int32]
Parameters
index
: (Required) The index of the sequence in the batch to return
Returns
numpy.ndarray[numpy.int32]
: The sequence of tokens at the given index
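The Generator methods above are meant to be called in a loop, one token at a time. The following sketch shows one plausible ordering of those calls for a batch of size 1, paired with a TokenizerStream for incremental decoding. The helper name generate_streaming is hypothetical, and the single-sequence assumption is marked in the code.

```python
def generate_streaming(generator, tokenizer_stream):
    """Run the token-by-token loop and return the decoded text.

    `generator` is an onnxruntime_genai.Generator and `tokenizer_stream`
    is the object returned by Tokenizer.create_stream().
    """
    chunks = []
    while not generator.is_done():
        # Run the model for one iteration, then pick the next token.
        generator.compute_logits()
        generator.generate_next_token()

        # Assumes a batch of one sequence: take its newest token.
        new_token = generator.get_next_tokens()[0]

        # decode may return "" until a displayable string has accumulated.
        chunks.append(tokenizer_stream.decode(new_token))
    return "".join(chunks)
```

To display text as it is generated instead of collecting it, the append call can be replaced with a print of each non-empty chunk.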