Recipes

Threading multiple images

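Because most of the work happens outside Python, in file IO and the external tesseract binary, threads spend most of their time waiting rather than holding the GIL. This means a thread pool can OCR several images concurrently.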

from multiprocessing.pool import ThreadPool
from piltesseract import get_text_from_image


# A module-level pool of 10 worker threads, reused across calls.
thread_pool = ThreadPool(10)


def get_lines_from_images(image_list):
    """Gets text from a list of images.

    Uses a pool of threads to run OCR on several images at once.

    Args:
        image_list (list[Image]): The list of images to run
            OCR on.

    Returns:
        list[str]: The text from each image, in the same order
            as image_list.

    """
    return thread_pool.map(get_text_from_image, image_list)
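If you would rather not keep a module-level pool alive, the standard library's concurrent.futures.ThreadPoolExecutor offers the same map-style interface and shuts itself down when the with-block exits. The sketch below is an alternative, not part of piltesseract. fake_ocr is a hypothetical stand-in for get_text_from_image that sleeps to simulate waiting on tesseract, so the example runs without tesseract installed; in real use you would pass ocr=get_text_from_image.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ocr(image):
    """Hypothetical stand-in for get_text_from_image.

    time.sleep releases the GIL, much like a thread blocked on the
    tesseract subprocess, so it mimics the real workload.
    """
    time.sleep(0.2)
    return f"text from {image}"


def get_texts_from_images(image_list, ocr=fake_ocr, max_workers=10):
    """Run ``ocr`` on every image using a pool of threads.

    Results are returned in the same order as ``image_list``.
    """
    # Leaving the with-block waits for all tasks and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(ocr, image_list))


if __name__ == "__main__":
    images = [f"image_{i}" for i in range(10)]
    start = time.perf_counter()
    texts = get_texts_from_images(images)
    elapsed = time.perf_counter() - start
    print(texts[0])  # text from image_0
    # The ten sleeps overlap, so wall time should be near one sleep, not ten.
    print(f"{elapsed:.2f} s")
```

Creating the executor per call keeps the function self-contained at the cost of starting threads each time; for many small batches, a long-lived pool like the one above avoids that overhead.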