Computes the total number of tokens in a text file by reading and tokenizing it in batches of lines rather than all at once, which makes it usable on very large text files. Returns a single numeric value: the total number of tokens in the file.

Usage

num_tokens_file(filename, batch_size = 1000, encoding = "cl100k_base")

Arguments

filename

character string indicating the name of the text file to read in

batch_size

integer indicating the number of lines to read in per batch (default is 1000)

encoding

character string naming the tokenizer encoding used to split the text into tokens (default is "cl100k_base"); this is not the character encoding of the file

Value

a numeric value indicating the total number of tokens in the text file
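
The batch-wise counting described above can be sketched in base R. This is an illustration of the general technique, not the package's actual implementation: the helper names `count_tokens_whitespace()` and `count_tokens_batched()` are hypothetical, and a simple whitespace split stands in for the real tokenizer selected by `encoding`, so the counts will differ from those of `num_tokens_file()`.

```r
# Stand-in tokenizer (an assumption for illustration): counts
# whitespace-separated words. The real function uses the tokenizer
# named by `encoding`, e.g. "cl100k_base".
count_tokens_whitespace <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop empty lines
  if (length(lines) == 0) return(0)
  sum(lengths(strsplit(lines, "[[:space:]]+")))
}

# Hypothetical helper: read `batch_size` lines at a time from an open
# connection and accumulate the token count, so only one batch is held
# in memory at any moment.
count_tokens_batched <- function(filename, batch_size = 1000) {
  con <- file(filename, open = "r")
  on.exit(close(con))
  total <- 0
  repeat {
    batch <- readLines(con, n = batch_size, warn = FALSE)
    if (length(batch) == 0) break        # end of file reached
    total <- total + count_tokens_whitespace(batch)
  }
  total
}

# Small demonstration on a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("one two three", "", "four five"), tmp)
n_small_batches <- count_tokens_batched(tmp, batch_size = 1)
n_one_batch <- count_tokens_batched(tmp, batch_size = 1000)
unlink(tmp)
```

In this sketch the total does not depend on `batch_size`, because each line is counted on its own; the batch size only controls how many lines are held in memory at a time. With the package attached, the corresponding call on a real file would follow the usage shown above: `num_tokens_file(filename, batch_size = 1000, encoding = "cl100k_base")`.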