Khattat: Enhancing Readability and Concept Representation of Semantic Typography

¹Egypt-Japan University of Science and Technology (E-JUST), ²Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), ³Swiss Federal Institute of Technology Lausanne (EPFL)
*Denotes equal contribution

AI4VA @ ECCV 2024

Abstract

Designing expressive typography that visually conveys a word's meaning while maintaining readability is a complex task; this art is known as semantic typography. It requires carefully selecting an idea, choosing an appropriate font, and balancing creativity with legibility. We introduce an end-to-end system that turns this process into an automated pipeline. First, we use a Large Language Model (LLM) as a prompt engine to generate suitable imagery ideas for the given word, which is particularly useful for abstract concepts like "freedom." Next, we use the pre-trained FontCLIP model to automatically select an appropriate font based on its semantic understanding of font attributes. The system then identifies the optimal region of the word for morphing and iteratively transforms it, leveraging the prior knowledge of a pre-trained diffusion model. A key feature is our OCR-based loss function, which enhances readability and allows multiple characters to be stylized simultaneously. We compare our method with other baselines, demonstrating improved readability and versatility across multiple languages and writing scripts.
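This page does not spell out the form of the OCR-based loss. As a rough illustration only, one plausible construction (an assumption, not the authors' formulation) is the mean negative log-likelihood of the target characters under a recognizer's per-position character probabilities. The function name `ocr_nll` and the dictionary-based input format are hypothetical.

```python
import math


def ocr_nll(char_probs, target):
    """Mean negative log-likelihood of `target` under per-position
    character probabilities.

    char_probs: list of dicts mapping a character to its predicted
                probability at that position (hypothetical recognizer output).
    target:     the word the rendered glyphs are supposed to spell.
    """
    if len(char_probs) != len(target):
        raise ValueError("need one probability table per target character")
    eps = 1e-12  # floor to avoid log(0) when a character gets no mass
    total = 0.0
    for probs, ch in zip(char_probs, target):
        total += -math.log(max(probs.get(ch, 0.0), eps))
    return total / len(target)


# A legible rendering gives the correct characters high probability,
# while a heavily distorted one spreads mass onto other characters.
legible = [{"c": 0.9, "e": 0.1}, {"a": 0.9, "o": 0.1}, {"t": 0.9, "l": 0.1}]
distorted = [{"c": 0.4, "e": 0.6}, {"a": 0.4, "o": 0.6}, {"t": 0.4, "l": 0.6}]

loss_legible = ocr_nll(legible, "cat")
loss_distorted = ocr_nll(distorted, "cat")
```

Because the term is averaged over every character position, it penalizes illegibility anywhere in the word, which is consistent with the abstract's point that an OCR-based loss supports stylizing several characters at once.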

Method

The methodology of the proposed Khattat system. The system first uses a prompt engine to obtain concrete concept and font prompts. It then selects an appropriate font with FontCLIP and identifies the region of the word best suited to the concept prompt. Over 500 iterations, the system deforms the letter outlines to align with the concept, while applying regularizing loss terms to maintain readability and minimize distortions.
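The iterative deformation described above can be pictured with a toy gradient-descent loop. This is a minimal sketch under strong assumptions, not the Khattat implementation: the "concept" term is a stand-in squared distance to a hypothetical target shape (the real system relies on a pre-trained diffusion model), the regularizer is a simple squared distance to the original outline, and `deform_outline`, `lr`, and `reg_weight` are illustrative names. Only the 500-iteration count comes from the text.

```python
def deform_outline(original, target, steps=500, lr=0.05, reg_weight=1.0):
    """Move outline control points toward `target` while a quadratic
    penalty keeps them close to `original`.

    Minimizes sum |p - t|^2 + reg_weight * |p - o|^2 per point by plain
    gradient descent. Both loss terms are stand-ins chosen for clarity.
    """
    points = [list(p) for p in original]
    for _ in range(steps):
        for p, o, t in zip(points, original, target):
            for k in range(2):  # x and y coordinates
                grad_concept = 2.0 * (p[k] - t[k])            # pull toward concept
                grad_reg = 2.0 * reg_weight * (p[k] - o[k])   # pull back to glyph
                p[k] -= lr * (grad_concept + grad_reg)
    return [tuple(p) for p in points]


# Toy glyph outline (unit square) and a hypothetical concept shape (larger square).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
concept = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

deformed = deform_outline(square, concept)
```

With equal weights the points settle halfway between the original outline and the concept shape. Raising `reg_weight` keeps the result closer to the original glyph, which mirrors the trade-off between concept alignment and readability that the regularizing terms are meant to control.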

Results

Method	OCR Accuracy	Readability Avg. Rank	Visual Appeal Avg. Rank
Arabic (ar)
Ours (ar)	0.64	1.34	1.71
Word-as-Image (ar)	0.35	1.87	1.68
CLIPDraw (ar)	0.20	2.79	2.61
English (en)
Ours (en)	0.78	1.35	1.75
Word-as-Image (en)	0.62	1.78	1.71
CLIPDraw (en)	0.26	2.87	2.54
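To make the table easier to read at a glance, the short script below picks the best method per metric and language. It uses only the numbers in the table. It assumes that higher OCR accuracy is better and that a lower average rank is better (rank 1 being the most preferred), which the page does not state explicitly. The names `RESULTS`, `METRICS`, and `best_methods` are illustrative.

```python
# Values copied from the results table: (OCR accuracy, readability rank, visual appeal rank).
RESULTS = {
    "ar": {
        "Ours": (0.64, 1.34, 1.71),
        "Word-as-Image": (0.35, 1.87, 1.68),
        "CLIPDraw": (0.20, 2.79, 2.61),
    },
    "en": {
        "Ours": (0.78, 1.35, 1.75),
        "Word-as-Image": (0.62, 1.78, 1.71),
        "CLIPDraw": (0.26, 2.87, 2.54),
    },
}

# Assumption: accuracy is maximized, average ranks are minimized.
METRICS = [
    ("OCR Accuracy", max),
    ("Readability Avg. Rank", min),
    ("Visual Appeal Avg. Rank", min),
]


def best_methods(results):
    """Return {language: {metric: best method}} under the stated assumptions."""
    winners = {}
    for lang, rows in results.items():
        winners[lang] = {}
        for i, (name, pick) in enumerate(METRICS):
            winners[lang][name] = pick(rows, key=lambda m: rows[m][i])
    return winners


winners = best_methods(RESULTS)

# Absolute OCR-accuracy gap between our method and Word-as-Image, per language.
ocr_gap = {
    lang: round(rows["Ours"][0] - rows["Word-as-Image"][0], 2)
    for lang, rows in RESULTS.items()
}
```

Under these assumptions, our method leads on OCR accuracy and readability rank in both languages, with an absolute OCR-accuracy gap over Word-as-Image of 0.29 in Arabic and 0.16 in English. Word-as-Image has a marginally better visual appeal rank (1.68 vs. 1.71 in Arabic, 1.71 vs. 1.75 in English).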

BibTeX

@inproceedings{10.1007/978-3-031-92808-6_18,
  author = {Hussein, Ahmed and Elsetohy, Alaa and Hadhoud, Sama and Bakr, Tameem and Rohaim, Yasser and AlKhamissi, Badr},
  title = {Khattat: Enhancing Readability and Concept Representation of Semantic Typography},
  year = {2025},
  isbn = {978-3-031-92807-9},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
  url = {https://doi.org/10.1007/978-3-031-92808-6_18},
  doi = {10.1007/978-3-031-92808-6_18},
  abstract = {Designing expressive typography that visually conveys a word’s meaning while maintaining readability is a complex task, known as semantic typography. It involves selecting an idea, choosing an appropriate font, and balancing creativity with legibility. We introduce an end-to-end system that automates this process. First, a Large Language Model (LLM) generates imagery ideas for the word, useful for abstract concepts like “freedom.” Then, the FontCLIP pre-trained model automatically selects a suitable font based on its semantic understanding of font attributes. The system identifies optimal regions of the word for morphing and iteratively transforms them using a pre-trained diffusion model. A key feature is our OCR-based loss function, which enhances readability and enables simultaneous stylization of multiple characters. We compare our method with other baselines, demonstrating great readability enhancement and versatility across multiple languages and writing scripts.},
  booktitle = {Computer Vision – ECCV 2024 Workshops: Milan, Italy, September 29–October 4, 2024, Proceedings, Part V},
  pages = {278–295},
  numpages = {18},
  keywords = {Semantic Typography, Multi-letter, Multilingual, OCR Loss, Large Language Models, Font Selection},
  location = {Milan, Italy}
}

Examples of semantic typography generated by our method in Arabic and English. Coloured examples are post-processed using Stable Diffusion's depth-to-image method.