Khattat: Enhancing Readability and Concept Representation of Semantic Typography

1Egypt-Japan University of Science and Technology (E-JUST), 2Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), 3Swiss Federal Institute of Technology Lausanne (EPFL)
*Denotes equal contribution
AI4VA @ ECCV 2024

Examples of semantic typography generated by our method in Arabic and English. Coloured examples are post-processed with Stable Diffusion's depth-to-image model.

Abstract

Designing expressive typography that visually conveys a word's meaning while maintaining readability is a complex task; this art is known as semantic typography. It requires careful selection of an idea, the choice of an appropriate font, and a balance between creativity and legibility. We introduce an end-to-end system that turns this process into an automated pipeline. First, we use a Large Language Model (LLM) as a prompt engine to generate suitable imagery ideas for the given word, which is particularly useful for abstract concepts such as "freedom." Next, we use the pre-trained FontCLIP model to select an appropriate font automatically, based on its semantic understanding of font attributes. The system then identifies the optimal region of the word for morphing and iteratively transforms it, leveraging the prior knowledge of a pre-trained diffusion model. A key feature is our OCR-based loss function, which enhances readability and thereby allows multiple characters to be stylized simultaneously. We compare our method with other baselines, demonstrating substantial readability improvements and versatility across multiple languages and writing scripts.
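This page does not give the exact form of the OCR-based loss. A minimal sketch, assuming an OCR model that outputs a probability distribution over characters at each position of the rendered word: penalize the mean negative log-likelihood of the intended word, so that a glyph drifting toward another letter raises the loss. The helper `ocr_loss` and its dictionary-based input are hypothetical; in a real pipeline the OCR network would run on the rasterized outlines as tensors so gradients reach the control points (a CTC loss is a common choice when output length is not fixed).

```python
import math

def ocr_loss(char_probs, target, eps=1e-9):
    """Mean negative log-likelihood of `target` under per-position OCR outputs.

    char_probs: one dict per character position, mapping candidate characters
        to the probability assigned by a (hypothetical) OCR model.
    target: the word the stylized glyphs should still read as.
    """
    if len(char_probs) != len(target):
        raise ValueError("expected one distribution per target character")
    nll = 0.0
    for dist, ch in zip(char_probs, target):
        # A low probability for the intended character gives a large penalty.
        nll -= math.log(dist.get(ch, 0.0) + eps)
    return nll / len(target)

# Legible rendering: the OCR model is confident in every letter.
legible = [{"c": 0.9, "e": 0.1}, {"a": 0.95, "o": 0.05}, {"t": 0.9, "f": 0.1}]
# Over-stylized rendering: the "a" now reads more like an "o".
distorted = [{"c": 0.9, "e": 0.1}, {"a": 0.2, "o": 0.8}, {"t": 0.9, "f": 0.1}]

print(ocr_loss(legible, "cat") < ocr_loss(distorted, "cat"))  # True
```

Because the loss scores every position of the word at once, it can act as a readability constraint on several letters being stylized together, which is the property the abstract highlights.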

Method

Overview of the proposed Khattat system. The system first uses a prompt engine to obtain concrete concept and font prompts. It then selects an appropriate font with FontCLIP and identifies the region of the word best suited to the concept prompt. Over 500 iterations, it deforms the letter outlines to align with the concept, while applying regularizing loss terms that maintain readability and minimize distortions.
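Since FontCLIP places font images and text descriptions in a shared embedding space, font selection can be framed as retrieval: choose the candidate font whose embedding is most similar to the embedding of the font prompt. The sketch below assumes those embeddings are already computed; the 3-dimensional vectors and the names `FontA`/`FontB`/`FontC` are made-up placeholders rather than FontCLIP outputs, and `select_font` is a hypothetical helper, not part of FontCLIP.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_font(prompt_embedding, font_embeddings):
    """Return the font name whose embedding is closest (by cosine) to the prompt."""
    return max(font_embeddings,
               key=lambda name: cosine(prompt_embedding, font_embeddings[name]))

# Toy embeddings standing in for real model outputs (hypothetical values).
fonts = {
    "FontA": [0.9, 0.1, 0.0],
    "FontB": [0.1, 0.9, 0.2],
    "FontC": [0.0, 0.2, 0.9],
}
prompt = [0.2, 0.8, 0.3]  # stands in for an embedded font prompt, e.g. "playful, rounded"

print(select_font(prompt, fonts))  # FontB
```

Cosine similarity ignores vector length, which is the usual choice for CLIP-style embeddings since only direction in the shared space carries meaning.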


Morphing process illustration

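In Khattat, the concept signal comes from a pre-trained diffusion model and readability is enforced with an OCR model, neither of which fits in a short example. The toy loop below only illustrates the optimization structure the Method section describes: outline control points start at the original glyph and are updated for 500 iterations by gradient descent on a weighted sum of a concept term and regularizers. As a loudly stated simplification, every term here is a quadratic with an analytic gradient (pull toward a hypothetical concept-shaped target, stay close to the original glyph, preserve the original edge vectors), and the weights, learning rate, and point sets are all made up.

```python
def total_loss(p, glyph, concept, w_read, w_dist):
    """Toy objective: concept term + readability proxy + distortion penalty."""
    concept_term = sum((x - cx) ** 2 + (y - cy) ** 2
                       for (x, y), (cx, cy) in zip(p, concept))
    read_term = sum((x - gx) ** 2 + (y - gy) ** 2
                    for (x, y), (gx, gy) in zip(p, glyph))
    dist_term = 0.0
    for i in range(len(p) - 1):
        # Penalize changes to each edge vector relative to the original outline.
        ex = (p[i + 1][0] - p[i][0]) - (glyph[i + 1][0] - glyph[i][0])
        ey = (p[i + 1][1] - p[i][1]) - (glyph[i + 1][1] - glyph[i][1])
        dist_term += ex * ex + ey * ey
    return concept_term + w_read * read_term + w_dist * dist_term

def morph(glyph, concept, w_read=0.5, w_dist=0.5, lr=0.05, iters=500):
    """Gradient descent on total_loss, starting from the original glyph."""
    p = [list(pt) for pt in glyph]
    n = len(p)
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for d in range(2):
                grad[i][d] += 2 * (p[i][d] - concept[i][d])
                grad[i][d] += 2 * w_read * (p[i][d] - glyph[i][d])
        for i in range(n - 1):
            for d in range(2):
                e = (p[i + 1][d] - p[i][d]) - (glyph[i + 1][d] - glyph[i][d])
                grad[i + 1][d] += 2 * w_dist * e
                grad[i][d] -= 2 * w_dist * e
        for i in range(n):
            for d in range(2):
                p[i][d] -= lr * grad[i][d]
    return [tuple(pt) for pt in p]

glyph = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]    # toy outline control points
concept = [(0.2, 0.0), (0.5, 1.2), (1.2, 1.4)]  # toy "concept-shaped" targets

start = total_loss(glyph, glyph, concept, 0.5, 0.5)
result = morph(glyph, concept)
end = total_loss(result, glyph, concept, 0.5, 0.5)
print(end < start)  # True
```

Raising `w_read` or `w_dist` keeps the result closer to the original letterform at the cost of weaker concept alignment, which mirrors the creativity-versus-legibility trade-off the regularizers are meant to manage.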

Results

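Our method substantially improves readability over comparable methods without compromising aesthetics: relative to Word-as-Image, OCR accuracy rises from 0.35 to 0.64 in Arabic and from 0.62 to 0.78 in English, while visual-appeal ranks remain close.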

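| Language | Method | OCR Accuracy (higher is better) | Readability Avg. Rank (lower is better) | Visual Appeal Avg. Rank (lower is better) |
|---|---|---|---|---|
| Arabic (ar) | Ours | 0.64 | 1.34 | 1.71 |
| Arabic (ar) | Word-as-Image | 0.35 | 1.87 | 1.68 |
| Arabic (ar) | CLIPDraw | 0.20 | 2.79 | 2.61 |
| English (en) | Ours | 0.78 | 1.35 | 1.75 |
| English (en) | Word-as-Image | 0.62 | 1.78 | 1.71 |
| English (en) | CLIPDraw | 0.26 | 2.87 | 2.54 |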
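The two "Avg. Rank" columns come from ranking the three methods against each other, so values lie between 1 and 3 and lower is better; consistent with this, each rank column's three values sum to 6.00 (= 1 + 2 + 3) within each language. This page does not describe the study protocol, so the sketch below only shows how an average rank is computed, assuming each participant orders the methods from best (1) to worst (3). The rankings in `toy` are invented for illustration and are not the study's data; `average_ranks` is a hypothetical helper.

```python
def average_ranks(rankings):
    """rankings: list of orderings, each listing method names from best to worst."""
    positions = {}
    for order in rankings:
        for rank, method in enumerate(order, start=1):
            positions.setdefault(method, []).append(rank)
    # Plain float division so every average is reported as a float.
    return {m: sum(r) / len(r) for m, r in positions.items()}

# Three hypothetical participants (made-up data, not from the paper).
toy = [
    ["Ours", "Word-as-Image", "CLIPDraw"],
    ["Word-as-Image", "Ours", "CLIPDraw"],
    ["Ours", "CLIPDraw", "Word-as-Image"],
]
avg = average_ranks(toy)
print(avg)
print(round(sum(avg.values()), 6))  # 6.0, since every participant ranks all three methods
```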

The comparison below highlights the strength of our method's robust readability regularization, which enables effective multi-letter stylization.

Comparison

BibTeX


@inproceedings{10.1007/978-3-031-92808-6_18,
  author = {Hussein, Ahmed and Elsetohy, Alaa and Hadhoud, Sama and Bakr, Tameem and Rohaim, Yasser and AlKhamissi, Badr},
  title = {Khattat: Enhancing Readability and Concept Representation of Semantic Typography},
  year = {2025},
  isbn = {978-3-031-92807-9},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
  url = {https://doi.org/10.1007/978-3-031-92808-6_18},
  doi = {10.1007/978-3-031-92808-6_18},
  abstract = {Designing expressive typography that visually conveys a word’s meaning while maintaining readability is a complex task, known as semantic typography. It involves selecting an idea, choosing an appropriate font, and balancing creativity with legibility. We introduce an end-to-end system that automates this process. First, a Large Language Model (LLM) generates imagery ideas for the word, useful for abstract concepts like “freedom.” Then, the FontCLIP pre-trained model automatically selects a suitable font based on its semantic understanding of font attributes. The system identifies optimal regions of the word for morphing and iteratively transforms them using a pre-trained diffusion model. A key feature is our OCR-based loss function, which enhances readability and enables simultaneous stylization of multiple characters. We compare our method with other baselines, demonstrating great readability enhancement and versatility across multiple languages and writing scripts.},
  booktitle = {Computer Vision – ECCV 2024 Workshops: Milan, Italy, September 29–October 4, 2024, Proceedings, Part V},
  pages = {278--295},
  numpages = {18},
  keywords = {Semantic Typography, Multi-letter, Multilingual, OCR Loss, Large Language Models, Font Selection},
  location = {Milan, Italy}
}