# kanji_lookup / config.py
# Commit 63a1db6 by etrotta: "Change the vector database used and embed the embeddings within the program"
import os
# Location of the LanceDB database; override it with the DATABASE_FILE environment variable
lancedb_location = os.getenv("DATABASE_FILE", ".kanjidb")
description = """This is a Kanji image search demo. Draw or upload an image of an individual Kanji character."""
article = """
### Getting better results
- Try different brush sizes.
- Draw the character centered on the canvas, both horizontally and vertically.
- You may get better results by drawing in an external tool and uploading the image file.

Results are sorted by estimated distance from the input, but the Kanji you are searching for will rarely be the first result.
### About this project
This demo uses the "kha-white/manga-ocr-base" Vision Transformer encoder model to create embeddings of the input image, then searches a vector database (LanceDB) for the most similar characters.
The code used to create the embeddings, along with more information, is available at [etrotta/kanji_lookup on GitHub](https://github.com/etrotta/kanji_lookup).
The database contains over 10,000 characters from [The KANJIDIC project](https://www.edrdg.org/wiki/index.php/KANJIDIC_Project), each rendered in multiple fonts downloaded from Google Fonts.
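
The search step can be illustrated with a minimal sketch. It assumes Euclidean (L2) distance and replaces LanceDB with a brute-force scan over a small in-memory list. The function names `l2_distance` and `nearest_kanji` and the toy 3-dimensional vectors are hypothetical. They are not part of this project or of the LanceDB API.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory index: one (character, embedding) row per rendered font.
Index = List[Tuple[str, Sequence[float]]]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: Euclidean (L2) distance; the real demo delegates the search to LanceDB
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_kanji(query: Sequence[float], index: Index, k: int = 3) -> List[Tuple[str, float]]:
    # Keep only the closest font rendering for each character,
    # so one Kanji cannot fill several result slots
    best: Dict[str, float] = {}
    for char, embedding in index:
        distance = l2_distance(query, embedding)
        if char not in best or distance < best[char]:
            best[char] = distance
    # Sort characters by their best distance, smallest first
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:k]


# Toy 3-dimensional vectors; real embeddings come from the image encoder
index: Index = [
    ("日", [1.0, 0.0, 0.0]),
    ("日", [0.9, 0.1, 0.0]),
    ("目", [0.8, 0.3, 0.0]),
    ("月", [0.0, 1.0, 0.0]),
]
query = [0.95, 0.05, 0.0]
print(nearest_kanji(query, index))  # 日 ranks first, then 目, then 月
```

Keeping only the best-scoring font per character is one way to stop a Kanji rendered in several fonts from appearing more than once. A vector database such as LanceDB runs the same kind of nearest-neighbour query over the stored embeddings, and it can use an index to avoid scanning every row.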
"""