nyuuzyou

Posts 26

🎓 Introducing Kompy.info Uzbek Educational Dataset - nyuuzyou/kompy

Dataset highlights:
- 584,648 pages of educational content extracted from kompy.info, a comprehensive educational resource website
- Content is exclusively in Uzbek, focusing on technical and scientific topics
- Each entry contains: URL, page title, and extracted main text content
- Text extracted with the trafilatura HTML extraction tool
- Covers a wide range of academic and educational materials
- Released to the public domain under the Creative Commons Zero (CC0) license

The dataset is a resource for natural language processing in Uzbek, particularly in educational and technical domains. It can be used for text classification, topic modeling, and content analysis of educational materials. Its scale makes it useful for developing educational technology applications and for studying how subjects are taught in Uzbek. Because it is monolingual, it also serves as a focused corpus for studying technical and scientific terminology in Uzbek educational contexts.
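
As a rough illustration of the content analysis mentioned above, here is a minimal sketch that counts frequent terms across entries. It assumes each entry is a dict with `url`, `title`, and `text` keys; those field names, the sample records, and the `top_terms` helper are hypothetical and may not match the published schema.

```python
import re
from collections import Counter

# Hypothetical sample entries. The keys "url", "title", "text" are assumed
# from the post's description, not taken from the dataset itself.
SAMPLE_RECORDS = [
    {"url": "https://example.org/1", "title": "Kompyuter",
     "text": "Kompyuter tarmoqlari va kompyuter xotirasi."},
    {"url": "https://example.org/2", "title": "Fizika",
     "text": "Fizika darsi: kompyuter modellari."},
]

# Runs of letters, optionally joined by an apostrophe-like mark,
# so that Uzbek words such as "ma'lumot" stay in one piece.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['ʻʼ’][^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def top_terms(records, n=5, min_len=4):
    """Return the n most common tokens of at least min_len characters."""
    counts = Counter()
    for record in records:
        counts.update(t for t in tokenize(record["text"]) if len(t) >= min_len)
    return counts.most_common(n)


print(top_terms(SAMPLE_RECORDS, n=3))
```

Raw term counts like these are only a starting point; a real topic model or classifier would likely need stop-word filtering and some handling of Uzbek morphology.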
🎓 Introducing PPT4Web Educational Materials Dataset - nyuuzyou/ppt4web

Dataset highlights:
- 182,405 presentations from ppt4web.ru, a platform for storing and viewing presentations covering a wide range of educational materials
- Primarily in Russian, with content in English, Kazakh, Ukrainian, and Belarusian
- Each entry includes: URL, title, download URL, and filepath
- Contains the presentation files as PPTX (converted from PPT for consistency) in addition to metadata
- Covers a broad spectrum of educational topics and subjects
- Dedicated to the public domain under the Creative Commons Zero (CC0) license

The dataset can be used for analyzing educational presentation content across subjects and languages, for text classification, and for information retrieval systems. It is particularly useful for examining trends in education, teaching methodologies, and presentation materials across academic disciplines. Because the presentation files are included, the formats and structures commonly used in educational settings can also be analyzed in depth.
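
To make the file-level analysis more concrete, here is a minimal sketch that pulls the text out of a PPTX file using only the Python standard library. It relies on the fact that a PPTX file is a ZIP archive whose slides live under `ppt/slides/` as XML, with text held in DrawingML `a:t` elements. The `slide_texts` helper is hypothetical, and the path in the usage comment is a placeholder rather than a real dataset path.

```python
import io
import re
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace; text runs are stored in <a:t> elements.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def slide_texts(pptx_bytes):
    """Return one list of text runs per slide, ordered by slide file number.

    File numbering usually follows presentation order, but that is not
    guaranteed; the authoritative order is recorded in ppt/presentation.xml.
    """
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        result = []
        for _, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            runs = [el.text for el in root.iter(f"{{{A_NS}}}t") if el.text]
            result.append(runs)
    return result


# Usage (placeholder path):
# with open("some_presentation.pptx", "rb") as fh:
#     for number, runs in enumerate(slide_texts(fh.read()), start=1):
#         print(number, " ".join(runs))
```

This skips speaker notes, embedded charts, and images; a dedicated library would be a better fit if those matter.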