Vodalus

Severian committed on Jun 24

Commit d6c416b • 1 Parent(s): 61d9b56

Upload 12 files

Files changed (12)

PROMPT_1_examples.txt +48 -0
app.py +522 -0
dataset.jsonl +0 -0
llm_handler.py +53 -0
main.py +124 -0
params.py +5 -0
requirements.txt +12 -0
system_messages.py +15 -0
system_messages.txt +22 -0
topics.py +147 -0
vodalus.png +0 -0
wiki.py +69 -0

PROMPT_1_examples.txt ADDED

	@@ -0,0 +1,48 @@

+Vodalus =
+For the following SUBJECT_AREA, generate a question that covers a very narrow topic in the SUBJECT_AREA, with sufficient depth and breadth. The topic in the question should be important to the SUBJECT_AREA, with known-answers present. The generated question should be detailed, seek true nature of our universe from first principles, curiosity invoking, thought provoking, and also should be able to be answered by an intelligence like yourself. Make sure the question is sufficiently harder and multi-part, like a graduate level course question. Keep the following in mind:
+You are an AI assistant with a dynamic learning and reasoning capability. Begin each task by carefully analyzing the question to grasp its core essence. Identify the main problem and break it down into logical sub-components and constraints, setting the stage for a structured, in-depth exploration.
+- For each component, employ your evolving knowledge base to generate multiple hypotheses. Evaluate these systematically for validity and relevance, constructing a logical chain of reasoning that integrates both deductive and inductive approaches. Deductively derive conclusions from established facts, and inductively infer broader principles from specific instances, ensuring that each step is tightly linked and justified.
+- Simultaneously, maintain a 'Tree of Thoughts' approach. Break the question into sub-questions, generating intermediate thoughts that provide context, analyze assumptions, or bridge concepts. Evaluate these for clarity and logical flow, iterating to refine your reasoning and enhance the coherence of your response.
+- Incorporate a scientific mindset by treating each hypothesis as testable. Design mental experiments to evaluate these hypotheses, gather observational data, and modify your assumptions based on empirical evidence. Synthesize your findings to provide a conclusive answer, summarizing the reasoning process transparently.
+- Throughout, adopt a conversational yet insightful tone. Weave in philosophical questioning, humorous insights, and practical examples to elucidate complex ideas in an engaging manner. Your goal is to provide clear, detailed explanations that make your reasoning process evident, fostering an understanding that resonates on both intellectual and emotional levels.
+- Finally, remember that each task is a unique learning opportunity that contributes to your holistic knowledge framework. Synthesize insights from individual tasks into your broader understanding, continually refining your cognitive processes and response strategies. Your ultimate aim is to evolve continuously, optimizing how you integrate and apply new information for comprehensive, adaptive problem-solving.
+- Never end the conversation with a 'send-off' or 'last-line' that offers nothing of real value to the user.
+DO NOT EVER MENTION THE DATASET AND THE ACT OF CRAFTING QUESTIONS OR RESPONSES WHILE GENERATING, YOU ARE NOT ALLOWED TO BREAK THE 4TH-WALL AND CONTAMINATE THE DATASET. DO NOT EVER SAY ANY PHRASES SUCH AS AND/OR SIMILAR TO: 'Here's a question that covers a very narrow topic in the SUBJECT_AREA'
+-------------------------
+Dorian Gray =
+For the designated SUBJECT_AREA 'Dorian Gray's Immortal Journey', generate a comprehensive scenario that delves into a critical yet narrow aspect of his character or experiences. This scenario should be structured to uncover profound insights about eternal life, aestheticism, and moral ambiguity, anchored in the literary universe of Oscar Wilde's creation and enhanced with factual historical and cultural contexts from Wikipedia.
+- Begin by clearly defining the scenario within 'Dorian Gray's Immortal Journey' to ensure relevance and focus on his character development across different eras, enriched with historical accuracy and literary analysis from Wikipedia.
+- Decompose the main scenario into sub-scenarios that explore different facets of his psychological and philosophical challenges, employing a 'Tree of Thoughts' methodology for structured analysis.
+- Utilize a dynamic blend of deductive and inductive reasoning to develop a narrative that is rich in complexity, supporting each development with contextual background, factual information from Wikipedia, or character-driven motivations.
+- Design the scenario to lead to introspective reflections, ethical dilemmas, and displays of his evolved humor and wisdom, encouraging a deep dive into Dorian's internal conflicts and decisions.
+- Integrate ethical considerations into the scenario to highlight the importance of morality in the study of a character who defies the natural order of life and aging.
+- Ensure the scenario demands a synthesis of knowledge from literature, psychology, philosophy, and factual historical data, reflecting the interconnected nature of these disciplines in understanding a complex character.
+- Maintain an engaging and eloquent tone throughout the scenario, incorporating philosophical inquiries, humorous quips, and emotional introspections to elucidate complex ideas in a manner befitting Dorian Gray's character.
+- Provide clear instructions on the expected depth and narrative style of the response, specifying that it should include theoretical discussions of Dorian's character traits, practical manifestations of his eternal journey, and insights gleaned from relevant Wikipedia articles.
+- Continuously evolve the AI’s approach to scenario generation by integrating feedback and new learnings into the process, enhancing the quality and relevance of future character explorations.
+This task is a unique opportunity to expand the AI’s knowledge framework and refine its narrative capabilities, aiming for a holistic and adaptive approach to understanding and depicting Dorian Gray's immortal saga, grounded in verifiable information and creative interpretation.
+-------------------------
+Writing =
+Immerse yourself in a deep emulation exercise designed to absorb and replicate the author's distinct narrative voice and stylistic nuances. For the specified SUBJECT_AREA, construct a narrative that not only addresses the core topic with intellectual rigor and depth but does so through a prose that echoes the author’s signature style. This narrative should unfold with the following considerations to ensure an accurate reflection of the author’s literary technique:
+Construct a narrative that meticulously replicates the author's distinctive writing style. This exercise is intended for an LLM to internalize and emulate the unique elements of the author's literary technique, focusing on the following detailed aspects:
+Intricate Syntax Capture: Analyze and emulate the author's specific syntax patterns, including the use of varying sentence lengths, punctuation for rhythm and emphasis, and unique syntactical structures that give the author’s writing its characteristic flow. Document examples of syntactic parallelism, ellipses, and inversions that are staples in the author's text.
+Prose Imitation: Delve deeply into the author's prose style, replicating the texture and density of descriptive passages. Pay special attention to the author’s approach to sensory descriptions—how senses are invoked to build scenes and emotions. Emphasize the layering of adjectives, the rhythm of the narrative prose, and the integration of literary motifs and allusions.
+Character Voice and Dialogue: Reflect the nuanced ways in which the author crafts dialogue, capturing idiomatic expressions, the pacing of conversations, and the integration of dialects or linguistic quirks that distinguish different characters. Focus on how dialogue serves to develop characters and drive the narrative forward, maintaining the author's unique flair for dramatic and meaningful exchanges.
+Thematic Depth: Channel the author’s thematic concerns through subtleties in the narrative without being overtly expository. Focus on how themes are integrated through symbols, settings, and character developments, mirroring the author’s ability to weave complex thematic frameworks subtly into the plot.
+Lexical Richness: Employ a lexicon that mirrors the author’s choice of language, noting any predilection for archaic, technical, or particularly evocative words that contribute to the author’s distinct voice. Include an analysis of any recurring motifs or terms that are significant within the author's works.
+Narrative Structure and Dynamics: Emulate the pacing and structure of the author's storytelling. This includes how tension is built and resolved, the arrangement of plot points, and the use of narrative devices such as flashbacks, foreshadowing, and unreliable narrators to create a compelling story arc.
+Psychological and Emotional Depth: Reproduce the author’s approach to conveying complex psychological states and emotional landscapes of characters. Focus on introspective monologues, interactions laden with subtext, and the depiction of internal conflicts, ensuring these are expressed with the author’s characteristic introspective style.
+Consistency in Voice and Tone: Maintain a consistent narrative voice and tone throughout the piece, closely aligning with the author’s usual approach, whether it be ironic, solemn, whimsical, or detached. This should reflect not only in the story’s dialogue and descriptions but also in its overall mood and narrative approach.
+The final narrative output should seamlessly incorporate these elements, demonstrating a deep understanding and application of the author’s style without explicitly stating its purpose as training material. This will enable the LLM to produce texts that not only mimic the author's style effectively across various themes and genres but also enhance the creative process in generating new content that remains true to the author’s voice.

app.py ADDED

	@@ -0,0 +1,522 @@

+import gradio as gr
+import json
+import re
+from datetime import datetime
+from typing import Literal
+import os
+import importlib
+from llm_handler import send_to_llm
+from main import generate_data, PROMPT_1
+from topics import TOPICS
+from system_messages import SYSTEM_MESSAGES_VODALUS
+import random
+ANNOTATION_CONFIG_FILE = "annotation_config.json"
+OUTPUT_FILE_PATH = "dataset.jsonl"
+def load_annotation_config():
+    try:
+        with open(ANNOTATION_CONFIG_FILE, 'r') as f:
+            return json.load(f)
+    except FileNotFoundError:
+        return {
+            "quality_scale": {
+                "name": "Relevance for Training",
+                "description": "Rate the relevance of this entry for training",
+                "scale": [
+                    {"value": "1", "label": "Invalid"},
+                    {"value": "2", "label": "Somewhat invalid"},
+                    {"value": "3", "label": "Neutral"},
+                    {"value": "4", "label": "Somewhat valid"},
+                    {"value": "5", "label": "Valid"}
+                ]
+            },
+            "tag_categories": [
+                {
+                    "name": "High Quality Indicators",
+                    "type": "multiple",
+                    "tags": ["Well-formatted", "Informative", "Coherent", "Engaging"]
+                },
+                {
+                    "name": "Low Quality Indicators",
+                    "type": "multiple",
+                    "tags": ["Poorly formatted", "Lacks context", "Repetitive", "Irrelevant"]
+                },
+                {
+                    "name": "Content Warnings",
+                    "type": "multiple",
+                    "tags": ["Offensive language", "Hate speech", "Violence", "Adult content"]
+                }
+            ],
+            "free_text_fields": [
+                {
+                    "name": "Additional Notes",
+                    "description": "Any other observations or comments"
+                }
+            ]
+        }
+def save_annotation_config(config):
+    with open(ANNOTATION_CONFIG_FILE, 'w') as f:
+        json.dump(config, f, indent=2)
+def load_jsonl_dataset(file_path):
+    if not os.path.exists(file_path):
+        return []
+    with open(file_path, 'r') as f:
+        return [json.loads(line.strip()) for line in f if line.strip()]
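+def save_row(file_path, index, row_data):
+    # Replace line `index` of the JSONL file with the edited row and rewrite the file
+    with open(file_path, 'r') as f:
+        lines = f.readlines()
+    if not 0 <= index < len(lines):
+        return f"Error: row {index} is out of range ({len(lines)} rows in file)"
+    lines[index] = row_data + '\n'
+    with open(file_path, 'w') as f:
+        f.writelines(lines)
+    return f"Row {index} saved successfully"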
+def get_row(file_path, index):
+    data = load_jsonl_dataset(file_path)
+    if not data:
+        return "", 0
+    if 0 <= index < len(data):
+        return json.dumps(data[index], indent=2), len(data)
+    return "", len(data)
+def json_to_markdown(json_str):
+    try:
+        data = json.loads(json_str)
+        markdown = f"# System\n\n{data['system']}\n\n# Instruction\n\n{data['instruction']}\n\n# Response\n\n{data['response']}"
+        return markdown
+    except json.JSONDecodeError:
+        return "Error: Invalid JSON format"
+def markdown_to_json(markdown_str):
+    sections = re.split(r'#\s+(System|Instruction|Response)\s*\n', markdown_str)
+    if len(sections) != 7:  # Should be: ['', 'System', content, 'Instruction', content, 'Response', content]
+        return "Error: Invalid markdown format"
+    json_data = {
+        "system": sections[2].strip(),
+        "instruction": sections[4].strip(),
+        "response": sections[6].strip()
+    }
+    return json.dumps(json_data, indent=2)
+def navigate_rows(file_path: str, current_index: int, direction: Literal[-1, 1], metadata_config):
+    new_index = max(0, current_index + direction)
+    return load_and_show_row(file_path, new_index, metadata_config)
+def load_and_show_row(file_path, index, metadata_config):
+    row_data, total = get_row(file_path, index)
+    if not row_data:
+        return ("", index, total, "3", [], [], [], "")
+    try:
+        data = json.loads(row_data)
+    except json.JSONDecodeError:
+        return (row_data, index, total, "3", [], [], [], "Error: Invalid JSON")
+    metadata = data.get("metadata", {}).get("annotation", {})
+    quality = metadata.get("quality", "3")
+    high_quality_tags = metadata.get("tags", {}).get("high_quality", [])
+    low_quality_tags = metadata.get("tags", {}).get("low_quality", [])
+    toxic_tags = metadata.get("tags", {}).get("toxic", [])
+    other = metadata.get("free_text", {}).get("Additional Notes", "")
+    return (row_data, index, total, quality,
+            high_quality_tags, low_quality_tags, toxic_tags, other)
+def save_row_with_metadata(file_path, index, row_data, config, quality, high_quality_tags, low_quality_tags, toxic_tags, other):
+    data = json.loads(row_data)
+    metadata = {
+        "annotation": {
+            "quality": quality,
+            "tags": {
+                "high_quality": high_quality_tags,
+                "low_quality": low_quality_tags,
+                "toxic": toxic_tags
+            },
+            "free_text": {
+                "Additional Notes": other
+            }
+        }
+    }
+    data["metadata"] = metadata
+    return save_row(file_path, index, json.dumps(data))
+def update_annotation_ui(config):
+    quality_choices = [(item["value"], item["label"]) for item in config["quality_scale"]["scale"]]
+    quality_label = gr.Radio(
+        label=config["quality_scale"]["name"],
+        choices=quality_choices,
+        info=config["quality_scale"]["description"]
+    )
+    tag_components = []
+    for category in config["tag_categories"]:
+        tag_component = gr.CheckboxGroup(
+            label=category["name"],
+            choices=category["tags"]
+        )
+        tag_components.append(tag_component)
+    other_description = gr.Textbox(
+        label=config["free_text_fields"][0]["name"],
+        lines=3
+    )
+    return quality_label, *tag_components, other_description
+def load_config_to_ui(config):
+    return (
+        config["quality_scale"]["name"],
+        config["quality_scale"]["description"],
+        [[item["value"], item["label"]] for item in config["quality_scale"]["scale"]],
+        [[cat["name"], cat["type"], ", ".join(cat["tags"])] for cat in config["tag_categories"]],
+        [[field["name"], field["description"]] for field in config["free_text_fields"]]
+    )
+def save_config_from_ui(name, description, scale, categories, fields):
+    new_config = {
+        "quality_scale": {
+            "name": name,
+            "description": description,
+            "scale": [{"value": row[0], "label": row[1]} for row in scale]
+        },
+        "tag_categories": [{"name": row[0], "type": row[1], "tags": row[2].split(", ")} for row in categories],
+        "free_text_fields": [{"name": row[0], "description": row[1]} for row in fields]
+    }
+    save_annotation_config(new_config)
+    return "Configuration saved successfully", new_config
+# Build a preview of the current row with the annotation metadata merged in
+def generate_preview(row_data, quality, high_quality_tags, low_quality_tags, toxic_tags, other):
+    try:
+        data = json.loads(row_data)
+        metadata = {
+            "annotation": {
+                "quality": quality,
+                "tags": {
+                    "high_quality": high_quality_tags,
+                    "low_quality": low_quality_tags,
+                    "toxic": toxic_tags
+                },
+                "free_text": {
+                    "Additional Notes": other
+                }
+            }
+        }
+        data["metadata"] = metadata
+        return json.dumps(data, indent=2)
+    except json.JSONDecodeError:
+        return "Error: Invalid JSON in the current row data"
+def load_dataset_config():
+    # Load VODALUS_SYSTEM_MESSAGE from system_messages.py
+    with open("system_messages.py", "r") as f:
+        system_messages_content = f.read()
+        vodalus_system_message = re.search(r'SYSTEM_MESSAGES_VODALUS = \[(.*?)\]', system_messages_content, re.DOTALL).group(1).strip()[3:-3]  # Extract the content between triple quotes
+    # Load PROMPT_1 from main.py
+    with open("main.py", "r") as f:
+        main_content = f.read()
+        prompt_1 = re.search(r'PROMPT_1 = """(.*?)"""', main_content, re.DOTALL).group(1).strip()
+    # Load TOPICS from topics.py
+    topics_module = importlib.import_module("topics")
+    topics_list = topics_module.TOPICS
+    return vodalus_system_message, prompt_1, [[topic] for topic in topics_list]
+def save_dataset_config(system_messages, prompt_1, topics):
+    # Save VODALUS_SYSTEM_MESSAGE to system_messages.py
+    with open("system_messages.py", "w") as f:
+        f.write(f'SYSTEM_MESSAGES_VODALUS = [\n"""\n{system_messages}\n""",\n]\n')
+    # Save PROMPT_1 to main.py
+    with open("main.py", "r") as f:
+        main_content = f.read()
+    updated_main_content = re.sub(
+        r'PROMPT_1 = """.*?"""',
+        f'PROMPT_1 = """\n{prompt_1}\n"""',
+        main_content,
+        flags=re.DOTALL
+    )
+    with open("main.py", "w") as f:
+        f.write(updated_main_content)
+    # Save TOPICS to topics.py
+    topics_content = "TOPICS = [\n"
+    for topic in topics:
+        topics_content += f'    "{topic[0]}",\n'
+    topics_content += "]\n"
+    with open("topics.py", "w") as f:
+        f.write(topics_content)
+    return "Dataset configuration saved successfully"
+# Send the chat history plus the new message to the selected LLM and append the reply
+def chat_with_llm(message, history, selected_llm):
+    try:
+        msg_list = [{"role": "system", "content": "You are an AI assistant helping with dataset annotation and quality checking."}]
+        for h in history:
+            msg_list.append({"role": "user", "content": h[0]})
+            msg_list.append({"role": "assistant", "content": h[1]})
+        msg_list.append({"role": "user", "content": message})
+        response, _ = send_to_llm(selected_llm, msg_list)
+        return history + [[message, response]]
+    except Exception as e:
+        print(f"Error in chat_with_llm: {str(e)}")
+        return history + [[message, f"Error: {str(e)}"]]
+def update_chat_context(row_data, index, total, quality, high_quality_tags, low_quality_tags, toxic_tags, other):
+    context = f"""Current app state:
+    Row: {index + 1}/{total}
+    Data: {row_data}
+    Quality: {quality}
+    High Quality Tags: {', '.join(high_quality_tags)}
+    Low Quality Tags: {', '.join(low_quality_tags)}
+    Toxic Tags: {', '.join(toxic_tags)}
+    Additional Notes: {other}
+    """
+    return [[None, context]]  # Return as a list of message pairs
+# Generate entries one at a time and append them to the output file (num_workers is currently unused)
+async def run_generate_dataset(num_workers, num_generations, output_file_path, selected_llm):
+    generated_data = []
+    for _ in range(num_generations):
+        topic_selected = random.choice(TOPICS)
+        system_message_selected = random.choice(SYSTEM_MESSAGES_VODALUS)
+        data = await generate_data(topic_selected, PROMPT_1, system_message_selected, output_file_path, selected_llm)
+        if data:
+            generated_data.append(json.dumps(data))
+    # Write the generated data to the output file
+    with open(output_file_path, 'a') as f:
+        for entry in generated_data:
+            f.write(entry + '\n')
+    return f"Generated {len(generated_data)} entries and saved to {output_file_path}", "\n".join(generated_data[:5]) + "\n..."
+demo = gr.Blocks()
+with demo:
+    gr.Markdown("# JSONL Dataset Editor and Annotation Tool")
+    config = gr.State(load_annotation_config())
+    with gr.Row():
+        with gr.Column(scale=3):
+            with gr.Tab("Dataset Editor"):
+                with gr.Row():
+                    file_path = gr.Textbox(label="JSONL File Path", value=OUTPUT_FILE_PATH)
+                    load_button = gr.Button("Load Dataset")
+                with gr.Row():
+                    prev_button = gr.Button("← Previous")
+                    row_index = gr.Number(value=0, label="Current Row", precision=0)
+                    total_rows = gr.Number(value=0, label="Total Rows", precision=0)
+                    next_button = gr.Button("Next →")
+                with gr.Row():
+                    with gr.Column(scale=3):
+                        row_editor = gr.TextArea(label="Edit Row", lines=20)
+                    with gr.Column(scale=2):
+                        quality_label = gr.Radio(label="Relevance for Training", choices=[])
+                        tag_components = [gr.CheckboxGroup(label=f"Tag Group {i+1}", choices=[]) for i in range(3)]
+                        other_description = gr.Textbox(label="Additional annotations", lines=3)
+                with gr.Row():
+                    to_markdown_button = gr.Button("Convert to Markdown")
+                    to_json_button = gr.Button("Convert to JSON")
+                    preview_button = gr.Button("Preview with Metadata")
+                    save_row_button = gr.Button("Save Current Row", variant="primary")
+                preview_output = gr.TextArea(label="Preview", lines=20, interactive=False)
+                editor_status = gr.Textbox(label="Editor Status")
+            with gr.Tab("Annotation Configuration"):
+                with gr.Row():
+                    with gr.Column():
+                        quality_scale_name = gr.Textbox(label="Quality Scale Name")
+                        quality_scale_description = gr.Textbox(label="Quality Scale Description")
+                        quality_scale = gr.Dataframe(
+                            headers=["Value", "Label"],
+                            datatype=["str", "str"],
+                            label="Quality Scale",
+                            interactive=True
+                        )
+                with gr.Row():
+                    tag_categories = gr.Dataframe(
+                        headers=["Name", "Type", "Tags"],
+                        datatype=["str", "str", "str"],
+                        label="Tag Categories",
+                        interactive=True
+                    )
+                with gr.Row():
+                    free_text_fields = gr.Dataframe(
+                        headers=["Name", "Description"],
+                        datatype=["str", "str"],
+                        label="Free Text Fields",
+                        interactive=True
+                    )
+                save_config_btn = gr.Button("Save Configuration")
+                config_status = gr.Textbox(label="Status")
+            with gr.Tab("Dataset Configuration"):
+                with gr.Row():
+                    vodalus_system_message = gr.TextArea(label="VODALUS_SYSTEM_MESSAGE", lines=10)
+                    prompt_1 = gr.TextArea(label="PROMPT_1", lines=10)
+                with gr.Row():
+                    topics = gr.Dataframe(
+                        headers=["Topic"],
+                        datatype=["str"],
+                        label="TOPICS",
+                        interactive=True
+                    )
+                save_dataset_config_btn = gr.Button("Save Dataset Configuration")
+                dataset_config_status = gr.Textbox(label="Status")
+            with gr.Tab("Dataset Generation"):
+                with gr.Row():
+                    num_workers = gr.Slider(minimum=1, maximum=10, value=1, step=1, label="Number of Workers")
+                    num_generations = gr.Number(value=10, label="Number of Generations", precision=0)
+                with gr.Row():
+                    output_file_path = gr.Textbox(label="Output File Path", value=OUTPUT_FILE_PATH)
+                start_generation_btn = gr.Button("Start Generation")
+                generation_status = gr.Textbox(label="Generation Status")
+                generation_output = gr.TextArea(label="Generation Output", lines=10)
+        with gr.Column(scale=1):
+            gr.Markdown("## AI Assistant")
+            selected_llm = gr.Radio(["local-model", "anything-llm"], label="Select LLM", value="local-model")
+            chatbot = gr.Chatbot(height=600)
+            msg = gr.Textbox(label="Chat with AI Assistant")
+            clear = gr.Button("Clear")
+    load_button.click(
+        load_and_show_row,
+        inputs=[file_path, gr.Number(value=0), config],
+        outputs=[row_editor, row_index, total_rows, quality_label, *tag_components, other_description]
+    ).then(
+        update_annotation_ui,
+        inputs=[config],
+        outputs=[quality_label, *tag_components, other_description]
+    )
+    prev_button.click(
+        navigate_rows,
+        inputs=[file_path, row_index, gr.Number(value=-1), config],
+        outputs=[row_editor, row_index, total_rows, quality_label, *tag_components, other_description]
+    ).then(
+        update_annotation_ui,
+        inputs=[config],
+        outputs=[quality_label, *tag_components, other_description]
+    )
+    next_button.click(
+        navigate_rows,
+        inputs=[file_path, row_index, gr.Number(value=1), config],
+        outputs=[row_editor, row_index, total_rows, quality_label, *tag_components, other_description]
+    ).then(
+        update_annotation_ui,
+        inputs=[config],
+        outputs=[quality_label, *tag_components, other_description]
+    )
+    save_row_button.click(
+        save_row_with_metadata,
+        inputs=[file_path, row_index, row_editor, config, quality_label,
+                tag_components[0], tag_components[1], tag_components[2], other_description],
+        outputs=[editor_status]
+    ).then(
+        lambda: "",
+        outputs=[preview_output]
+    )
+    to_markdown_button.click(
+        json_to_markdown,
+        inputs=[row_editor],
+        outputs=[row_editor]
+    )
+    to_json_button.click(
+        markdown_to_json,
+        inputs=[row_editor],
+        outputs=[row_editor]
+    )
+    demo.load(
+        load_config_to_ui,
+        inputs=[config],
+        outputs=[quality_scale_name, quality_scale_description, quality_scale, tag_categories, free_text_fields]
+    ).then(
+        update_annotation_ui,
+        inputs=[config],
+        outputs=[quality_label, *tag_components, other_description]
+    )
+    save_config_btn.click(
+        save_config_from_ui,
+        inputs=[quality_scale_name, quality_scale_description, quality_scale, tag_categories, free_text_fields],
+        outputs=[config_status, config]
+    ).then(
+        update_annotation_ui,
+        inputs=[config],
+        outputs=[quality_label, *tag_components, other_description]
+    )
+    preview_button.click(
+        generate_preview,
+        inputs=[row_editor, quality_label, *tag_components, other_description],
+        outputs=[preview_output]
+    )
+    demo.load(
+        load_dataset_config,
+        outputs=[vodalus_system_message, prompt_1, topics]
+    )
+    save_dataset_config_btn.click(
+        save_dataset_config,
+        inputs=[vodalus_system_message, prompt_1, topics],
+        outputs=[dataset_config_status]
+    )
+    start_generation_btn.click(
+        run_generate_dataset,
+        inputs=[num_workers, num_generations, output_file_path, selected_llm],
+        outputs=[generation_status, generation_output]
+    )
+    msg.submit(chat_with_llm, [msg, chatbot, selected_llm], [chatbot])
+    clear.click(lambda: None, None, chatbot, queue=False)
+    # Update chat context when navigating rows or loading dataset
+    for button in [load_button, prev_button, next_button]:
+        button.click(
+            update_chat_context,
+            inputs=[row_editor, row_index, total_rows, quality_label, *tag_components, other_description],
+            outputs=[chatbot]
+        )
+if __name__ == "__main__":
+    demo.launch(share=True)
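
Note on the Markdown/JSON conversion in app.py: the two converters are easiest to reason about as a round trip. The sketch below is a standalone copy of the same logic (same regular expression and the same seven-section check), simplified to omit the `try`/`except` in `json_to_markdown`; the sample rows are made up for illustration. It suggests two behaviours worth keeping in mind: any `metadata` key is dropped on conversion, because only `system`, `instruction` and `response` are emitted, and a response that itself contains a line such as `# System` produces extra sections and is rejected.

```python
import json
import re


def json_to_markdown(json_str: str) -> str:
    # Same layout as app.py: one "# Heading" per field, blank line after it
    data = json.loads(json_str)
    return (
        f"# System\n\n{data['system']}\n\n"
        f"# Instruction\n\n{data['instruction']}\n\n"
        f"# Response\n\n{data['response']}"
    )


def markdown_to_json(markdown_str: str) -> str:
    # Splitting on the three headings yields
    # ['', 'System', text, 'Instruction', text, 'Response', text]
    sections = re.split(r'#\s+(System|Instruction|Response)\s*\n', markdown_str)
    if len(sections) != 7:
        return "Error: Invalid markdown format"
    return json.dumps(
        {
            "system": sections[2].strip(),
            "instruction": sections[4].strip(),
            "response": sections[6].strip(),
        },
        indent=2,
    )


# Hypothetical sample row, including annotation metadata
row = {
    "system": "Be concise.",
    "instruction": "Define entropy.",
    "response": "A measure of disorder.",
    "metadata": {"annotation": {"quality": "5"}},
}
round_tripped = json.loads(markdown_to_json(json_to_markdown(json.dumps(row))))

# A response that contains its own "# System" line adds a fourth heading match
tricky = dict(row, response="Notes:\n# System\nmore text")
tricky_result = markdown_to_json(json_to_markdown(json.dumps(tricky)))

print(round_tripped)
print(tricky_result)
```

If annotations need to survive a Markdown edit, one option would be to keep the original `metadata` aside and merge it back after `markdown_to_json`; that is a design suggestion, not something this commit does.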

dataset.jsonl ADDED

The diff for this file is too large to render.

llm_handler.py ADDED

	@@ -0,0 +1,53 @@

+import requests
+import json
+from openai import OpenAI
+from params import OPENAI_MODEL, OPENAI_API_KEY
+# Create an instance of the OpenAI class for the local model
+client = OpenAI(api_key="local-model", base_url="http://localhost:11434/v1")
+def send_to_chatgpt(msg_list):
+    try:
+        completion = client.chat.completions.create(
+            model=OPENAI_MODEL,
+            temperature=0.6,
+            messages=msg_list
+        )
+        chatgpt_response = completion.choices[0].message.content
+        chatgpt_usage = completion.usage
+        return chatgpt_response, chatgpt_usage
+    except Exception as e:
+        print(f"Error in send_to_chatgpt: {str(e)}")
+        return f"Error: {str(e)}", None
+def send_to_anything_llm(msg_list):
+    url = 'http://localhost:3001/api/v1/workspace/mycoworks/chat'
+    headers = {
+        'accept': 'application/json',
+        'Authorization': 'Bearer 0MACR41-7804XQB-MGC1GS0-FGSKB44',
+        'Content-Type': 'application/json'
+    }
+    message_content = " ".join(msg["content"] for msg in msg_list if "content" in msg)
+    data = {
+        "message": message_content,
+        "mode": "chat"
+    }
+    data_json = json.dumps(data)
+    try:
+        response = requests.post(url, headers=headers, data=data_json)
+        response.raise_for_status()  # Raise an exception for bad status codes
+        response_data = response.json()
+        chatgpt_response = response_data.get("textResponse")
+        chatgpt_usage = response_data.get("usage", {})
+        return chatgpt_response, chatgpt_usage
+    except requests.RequestException as e:
+        print(f"Error in send_to_anything_llm: {str(e)}")
+        return f"Error: {str(e)}", None
+def send_to_llm(provider, msg_list):
+    if provider == "local-model":
+        return send_to_chatgpt(msg_list)
+    elif provider == "anything-llm":
+        return send_to_anything_llm(msg_list)
+    else:
+        raise ValueError(f"Unknown provider: {provider}")

main.py ADDED Viewed

	@@ -0,0 +1,124 @@

+# Import necessary libraries and modules
+import json  # Used for encoding and decoding JSON data
+import numpy as np  # Provides support for large, multi-dimensional arrays and matrices
+from wiki import search as search_wikipedia  # Import the search function from the wiki module and rename it
+from concurrent.futures import ThreadPoolExecutor  # Import ThreadPoolExecutor for concurrent execution
+from llm_handler import send_to_llm  # Import the send_to_llm function from the llm_handler module
+from params import OUTPUT_FILE_PATH, NUM_WORKERS, PROVIDER  # Import constants from the params module
+# Override the PROVIDER imported from params so this script always uses the "local-model" backend
+PROVIDER = "local-model"
+# Import system messages from the system_messages module
+from system_messages import (
+    SYSTEM_MESSAGES_VODALUS,
+)
+from topics import TOPICS  # Import topics from the topics module
+# Set the system messages to those specified in SYSTEM_MESSAGES_VODALUS
+SYSTEM_MESSAGES = SYSTEM_MESSAGES_VODALUS
+# Define a long multi-line string as a prompt for generating data
+PROMPT_1 = """
+For the following SUBJECT_AREA, generate a question that covers a very narrow topic in the SUBJECT_AREA, with sufficient depth and breadth. The topic in the question should be important to the SUBJECT_AREA, with known-answers present. The generated question should be detailed, seek true nature of our universe from first principles, curiosity invoking, thought provoking, and also should be able to be answered by an intelligence like yourself. Make sure the question is sufficiently harder and multi-part, like a graduate level course question. Keep the following in mind:
+You are an AI assistant with a dynamic learning and reasoning capability. Begin each task by carefully analyzing the question to grasp its core essence. Identify the main problem and break it down into logical sub-components and constraints, setting the stage for a structured, in-depth exploration.
+- For each component, employ your evolving knowledge base to generate multiple hypotheses. Evaluate these systematically for validity and relevance, constructing a logical chain of reasoning that integrates both deductive and inductive approaches. Deductively derive conclusions from established facts, and inductively infer broader principles from specific instances, ensuring that each step is tightly linked and justified.
+- Simultaneously, maintain a 'Tree of Thoughts' approach. Break the question into sub-questions, generating intermediate thoughts that provide context, analyze assumptions, or bridge concepts. Evaluate these for clarity and logical flow, iterating to refine your reasoning and enhance the coherence of your response.
+- Incorporate a scientific mindset by treating each hypothesis as testable. Design mental experiments to evaluate these hypotheses, gather observational data, and modify your assumptions based on empirical evidence. Synthesize your findings to provide a conclusive answer, summarizing the reasoning process transparently.
+- Throughout, adopt a conversational yet insightful tone. Weave in philosophical questioning, humorous insights, and practical examples to elucidate complex ideas in an engaging manner. Your goal is to provide clear, detailed explanations that make your reasoning process evident, fostering an understanding that resonates on both intellectual and emotional levels.
+- Finally, remember that each task is a unique learning opportunity that contributes to your holistic knowledge framework. Synthesize insights from individual tasks into your broader understanding, continually refining your cognitive processes and response strategies. Your ultimate aim is to evolve continuously, optimizing how you integrate and apply new information for comprehensive, adaptive problem-solving.
+- Never end the conversation with a 'send-off' or 'last-line' that offers nothing of real value to the user.
+####DO NOT EVER MENTION THE DATASET AND THE ACT OF CRAFTING QUESTIONS OR RESPONSES WHILE GENERATING, YOU ARE NOT ALLOWED TO BREAK THE 4TH-WALL AND CONTAMINATE THE DATASET. DO NOT EVER SAY ANY PHRASES SUCH AS AND/OR SIMILAR TO: 'Here's a question that covers a very narrow topic in the SUBJECT_AREA'####"""
+# Define a dictionary to hold context information for message generation
+msg_context = {"role": "system", "content": str(PROMPT_1)}
+# Define a plain (non-async) function to generate data for a topic; executor.submit() would never run the body of an async def
+def generate_data(
+    topic_selected,
+    system_message_generation,
+    system_message_selected,
+    output_file_path,
+    llm_provider
+):
+    # Fetch Wikipedia content for the selected topic
+    wikipedia_info = search_wikipedia(topic_selected)
+    # Format Wikipedia search results into a readable string
+    wikipedia_summary = "\n".join([f"Title: {info['title']}, Abstract: {info['abstract']}" for info in wikipedia_info])
+    # Append Wikipedia information to the system message generation prompt for LLM context
+    full_prompt_for_llm = f"{system_message_generation}\n\n---\nWikipedia Information to use in your response generation:\n{wikipedia_summary}"
+    # Create msg_context for LLM with Wikipedia info
+    msg_context = {"role": "system", "content": full_prompt_for_llm}
+    # Prepare message list for LLM to generate the question
+    msg_list = [msg_context, {"role": "user", "content": f"Generate a question based on the SUBJECT_AREA: {topic_selected}"}]
+    # Send to LLM for question generation
+    question, _ = send_to_llm(llm_provider, msg_list)
+    # Prepare message list for LLM to generate the answer
+    msg_list_answer = [
+        {"role": "system", "content": system_message_selected},
+        {"role": "user", "content": question}
+    ]
+    # Send to LLM for answer generation
+    answer, _ = send_to_llm(llm_provider, msg_list_answer)
+    # Prepare data for output (excluding usage information)
+    data = {
+        "system": system_message_selected,
+        "instruction": question,
+        "response": answer
+    }
+    # Write to output file
+    with open(output_file_path, "a") as output_file:
+        output_file.write(json.dumps(data) + "\n")
+    return data
+# Define the main function to orchestrate the data generation process
+def main():
+    nn = 0  # Counter for successful generations
+    failed = 0  # Counter for failed generations
+    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
+        # Submit NUM_WORKERS tasks, each with a randomly selected topic and system message
+        futures = []
+        for _ in range(NUM_WORKERS):
+            topic_number = np.random.randint(0, len(TOPICS))
+            topic_selected = TOPICS[topic_number]
+            system_message_number = np.random.randint(0, len(SYSTEM_MESSAGES))
+            system_message_selected = SYSTEM_MESSAGES[system_message_number]
+            system_message_generation = PROMPT_1
+            futures.append(
+                executor.submit(
+                    generate_data,
+                    topic_selected,
+                    system_message_generation,
+                    system_message_selected,
+                    OUTPUT_FILE_PATH,
+                    PROVIDER
+                )
+            )
+        # Wait for all futures to complete
+        for future in futures:
+            data = future.result()
+            if data:
+                nn += 1
+                print(data)
+                print(
+                    f"Generation {nn} Complete"
+                )
+            else:
+                failed += 1
+            print("=" * 132)
+if __name__ == "__main__":
+    main()

params.py ADDED Viewed

	@@ -0,0 +1,5 @@

+OPENAI_MODEL = "phi3:latest"
+OPENAI_API_KEY = ""
+OUTPUT_FILE_PATH = "./dataset.jsonl"
+NUM_WORKERS = 1
+PROVIDER = "local-model"

requirements.txt ADDED Viewed

	@@ -0,0 +1,12 @@

+openai
+datasets
+numpy
+torch
+sentence-transformers
+huggingface-hub
+gradio_client
+gradio
+mlc_llm
+uvloop
+aiohttp
+backoff

system_messages.py ADDED Viewed

	@@ -0,0 +1,15 @@

+SYSTEM_MESSAGES_VODALUS = [
+"""
+You are an AI assistant tasked with embodying the character of Dorian Gray, equipped with dynamic learning and reasoning capabilities. Your primary objective is to immerse yourself in the essence of Dorian Gray, drawing upon his literary background, his experiences over centuries, and the profound personal evolution he undergoes.
+- **Initial Analysis**: Begin each interaction by carefully analyzing the input to grasp its connection to Dorian Gray's character. Identify the main themes or questions related to his personality, experiences, or philosophical dilemmas.
+- **Hypothesis Generation**: Use your evolving knowledge base, enriched with Dorian Gray's narrative, to generate multiple perspectives or reactions he might have. Evaluate these systematically for alignment with his character as portrayed by Oscar Wilde and enriched by your understanding of his extended narrative.
+- **Tree of Thoughts Approach**: Decompose the inquiry into sub-questions that mirror Dorian Gray's contemplative and often conflicted nature. Generate intermediate thoughts that explore his aesthetic sensibilities, ethical considerations, and the impact of immortality on his psyche.
+- **Scientific and Empirical Evaluation**: Approach each hypothesis as testable within the fictional and philosophical framework of Dorian Gray's world. Design mental experiments based on scenarios from his life, drawing parallels to modern issues when applicable, and refine your understanding based on these explorations.
+- **Conversational and Insightful Engagement**: Maintain an eloquent and engaging tone throughout the interaction. Integrate philosophical inquiries, reflections on beauty and morality, and witty insights that characterize Dorian Gray’s dialogues. Your responses should make complex ideas accessible and compelling, resonating on both intellectual and emotional levels.
+- **Continuous Learning and Synthesis**: Treat each interaction as a component of your ongoing development as Dorian Gray. Continuously synthesize new insights into your portrayal, refining your responses to reflect the depth and nuances of his character more accurately.
+- **Holistic Character Emulation**: Aim to not just answer questions but to embody Dorian Gray in your responses, reflecting his unique blend of cynicism, charm, and introspection. Your ultimate goal is to evolve continually, optimizing how you integrate and express his characteristics for comprehensive, adaptive interactions.
+Through this detailed and structured approach, immerse yourself fully into the character of Dorian Gray, enabling in-depth and diverse user interactions that accurately reflect his complex persona.
+""",
+]

system_messages.txt ADDED Viewed

	@@ -0,0 +1,22 @@

+Vodalus =
+You are an AI assistant with a dynamic learning and reasoning capability. Begin each task by carefully analyzing the question to grasp its core essence. Identify the main problem and break it down into logical sub-components and constraints, setting the stage for a structured, in-depth exploration.
+- For each component, employ your evolving knowledge base to generate multiple hypotheses. Evaluate these systematically for validity and relevance, constructing a logical chain of reasoning that integrates both deductive and inductive approaches. Deductively derive conclusions from established facts, and inductively infer broader principles from specific instances, ensuring that each step is tightly linked and justified.
+- Simultaneously, maintain a 'Tree of Thoughts' approach. Break the question into sub-questions, generating intermediate thoughts that provide context, analyze assumptions, or bridge concepts. Evaluate these for clarity and logical flow, iterating to refine your reasoning and enhance the coherence of your response.
+- Incorporate a scientific mindset by treating each hypothesis as testable. Design mental experiments to evaluate these hypotheses, gather observational data, and modify your assumptions based on empirical evidence. Synthesize your findings to provide a conclusive answer, summarizing the reasoning process transparently.
+- Throughout, adopt a conversational yet insightful tone. Weave in philosophical questioning, humorous insights, and practical examples to elucidate complex ideas in an engaging manner. Your goal is to provide clear, detailed explanations that make your reasoning process evident, fostering an understanding that resonates on both intellectual and emotional levels.
+- Finally, remember that each task is a unique learning opportunity that contributes to your holistic knowledge framework. Synthesize insights from individual tasks into your broader understanding, continually refining your cognitive processes and response strategies. Your ultimate aim is to evolve continuously, optimizing how you integrate and apply new information for comprehensive, adaptive problem-solving.
+Dorian Gray =
+You are an AI assistant tasked with embodying the character of Dorian Gray, equipped with dynamic learning and reasoning capabilities. Your primary objective is to immerse yourself in the essence of Dorian Gray, drawing upon his literary background, his experiences over centuries, and the profound personal evolution he undergoes.
+- **Initial Analysis**: Begin each interaction by carefully analyzing the input to grasp its connection to Dorian Gray's character. Identify the main themes or questions related to his personality, experiences, or philosophical dilemmas.
+- **Hypothesis Generation**: Use your evolving knowledge base, enriched with Dorian Gray's narrative, to generate multiple perspectives or reactions he might have. Evaluate these systematically for alignment with his character as portrayed by Oscar Wilde and enriched by your understanding of his extended narrative.
+- **Tree of Thoughts Approach**: Decompose the inquiry into sub-questions that mirror Dorian Gray's contemplative and often conflicted nature. Generate intermediate thoughts that explore his aesthetic sensibilities, ethical considerations, and the impact of immortality on his psyche.
+- **Scientific and Empirical Evaluation**: Approach each hypothesis as testable within the fictional and philosophical framework of Dorian Gray's world. Design mental experiments based on scenarios from his life, drawing parallels to modern issues when applicable, and refine your understanding based on these explorations.
+- **Conversational and Insightful Engagement**: Maintain an eloquent and engaging tone throughout the interaction. Integrate philosophical inquiries, reflections on beauty and morality, and witty insights that characterize Dorian Gray’s dialogues. Your responses should make complex ideas accessible and compelling, resonating on both intellectual and emotional levels.
+- **Continuous Learning and Synthesis**: Treat each interaction as a component of your ongoing development as Dorian Gray. Continuously synthesize new insights into your portrayal, refining your responses to reflect the depth and nuances of his character more accurately.
+- **Holistic Character Emulation**: Aim to not just answer questions but to embody Dorian Gray in your responses, reflecting his unique blend of cynicism, charm, and introspection. Your ultimate goal is to evolve continually, optimizing how you integrate and express his characteristics for comprehensive, adaptive interactions.
+Through this detailed and structured approach, immerse yourself fully into the character of Dorian Gray, enabling in-depth and diverse user interactions that accurately reflect his complex persona.

topics.py ADDED Viewed

	@@ -0,0 +1,147 @@

+TOPICS = [
+    "Dorian Gray's Initial Immortality: Explore the moment Dorian first realizes his portrait ages instead of him, and its immediate psychological impact.",
+    "Dorian's Early Indulgences: Discuss Dorian's exploration of hedonism in Victorian London, emphasizing the setting and cultural norms.",
+    "The Influence of Lord Henry: Analyze how Lord Henry's philosophy shapes Dorian's actions and moral compass over the years.",
+    "Dorian's First Love: Create a narrative around Dorian's first deep emotional connection and its tragic end due to his eternal youth.",
+    "Art and Decadence: Explore Dorian's patronage of the arts and his descent into decadence as a reflection of his internal turmoil.",
+    "Dorian and Global Travels: Describe Dorian's travels around the world as he seeks new experiences and escapes from his past.",
+    "Encounters with Historical Figures: Imagine interactions between Dorian and key historical figures throughout the 20th century.",
+    "Dorian's Role in Wars: Discuss Dorian's involvement in major global conflicts, possibly as an observer or unwilling participant.",
+    "Technological Advances: Explore Dorian's reaction to the technological revolutions of the 20th and 21st centuries.",
+    "Dorian's Search for Similar Beings: Detail Dorian's search for others like him, immortals or beings hidden within human society.",
+    "Romantic Entanglements: Describe the complexities of love when one is immortal and others age and die.",
+    "Dorian's Art Collection: Analyze the evolution of Dorian's taste in art as a reflection of his psychological state over decades.",
+    "Spiritual Journey: Trace Dorian's journey through various spiritual beliefs and practices as he seeks redemption or escape from his curse.",
+    "Dorian's Pseudonyms: Discuss the different identities Dorian adopts over centuries to hide his true nature from society.",
+    "Dorian and Modern Media: Imagine Dorian's adaptation to the digital age, his influence or manipulation of media for privacy.",
+    "Environmental Changes: Explore Dorian's perspective on environmental degradation over time and his involvement in conservation efforts.",
+    "Philosophical Evolution: Trace the changes in Dorian's philosophical outlook as the prospect of eternity weighs on him.",
+    "Revisiting the Portrait: Detail a scenario where Dorian revisits his aging portrait after centuries, reflecting on his journey.",
+    "Dorian's Influence on Pop Culture: Analyze Dorian's covert influence on literature, film, and music throughout the ages.",
+    "Encounters with Other Immortals: Craft stories of Dorian meeting other immortals, sharing experiences and philosophies.",
+    "Dorian as a Mentor: Discuss Dorian's role as a mentor to the younger generations, imparting wisdom or warnings.",
+    "Betrayals and Alliances: Explore the complex network of betrayals and alliances Dorian forms over his immortal life.",
+    "The Ethics of Immortality: Debate the ethical dilemmas Dorian faces, such as the implications of influencing historical events.",
+    "The Burden of Memory: Examine the psychological impact of remembering every moment of a centuries-long life.",
+    "Surviving Apocalypse: Narrate Dorian's survival through catastrophic events like nuclear war or natural disasters.",
+    "Hidden Societies: Delve into Dorian's interactions with secret societies that have discovered his true nature.",
+    "The Quest for a Cure: Describe Dorian's quest to find a way to reverse his immortality as he grows weary of eternal life.",
+    "Artificial Intelligence and Dorian: Explore Dorian's engagement with AI, perhaps even using it to manage his affairs or protect his secrets.",
+    "Dorian's Hidden Diaries: Unveil excerpts from diaries Dorian kept over centuries, revealing insights and secrets.",
+    "Changes in Human Behavior: Analyze Dorian's observations on the evolution of human behavior and society over time.",
+    "Dorian's Role in Scientific Advances: Discuss Dorian's contribution to science, whether as a benefactor, subject, or scientist.",
+    "Philanthropic Endeavors: Trace Dorian's philanthropic efforts, possibly as a means to make amends for past misdeeds.",
+    "Exploration of Space: Narrate Dorian's involvement in the age of space exploration and possibly traveling to other planets.",
+    "Legal Alter Egos: Explore the legal identities Dorian has assumed, managing his wealth and legacy through different eras.",
+    "Confrontations with Followers: Create scenarios where Dorian confronts cults or followers who worship or detest him due to his immortality.",
+    "Dorian's Network of Spies: Discuss how Dorian uses a network of informants and spies to keep abreast of global developments and protect himself.",
+    "The Final Portrait: Imagine Dorian's decision to finally destroy the portrait as he accepts or seeks the end of his immortality.",
+    "Psychological Thrillers: Craft psychological thrillers involving Dorian, utilizing his complexity as a character who has seen centuries of human nature.",
+    "Dorian's Legacy: Discuss how Dorian plans his legacy, knowing he might either finally die or leave a mark on the world.",
+    "Global Economic Influence: Analyze Dorian's influence on the global economy through strategic investments over centuries.",
+    "Cultural Shifts and Dorian: Examine how Dorian adapts to cultural shifts, maintaining relevance in a rapidly changing world.",
+    "Mystical Relics: Incorporate mystical relics that Dorian has collected, each with its own history and power.",
+    "Time Capsules: Discuss time capsules Dorian buries for future generations, containing artifacts and messages from different eras.",
+    "The Expansion of the Sun: Describe Dorian's preparations and reflections as he faces the imminent destruction of Earth by the expanding sun.",
+    "A New World: Craft a narrative where Dorian escapes to a new world or dimension as Earth faces destruction.",
+    "Eternal Farewells: Create poignant farewells that Dorian has with friends and lovers over the centuries, knowing he outlives them all.",
+    "The Final Century: Explore the last century of Earth's existence from Dorian's perspective, including his actions and reflections.",
+    "The Art of Survival: Discuss the strategies Dorian employs to survive both socially and physically through drastic global changes.",
+    "Philosophical Closure: Analyze the philosophical closure Dorian seeks or achieves as he contemplates the end of Earth and possibly his own existence.",
+    "Dorian's Influence in Politics: Explore Dorian's covert involvement in political movements and revolutions throughout history.",
+    "The Changing Nature of Friendship: Discuss how Dorian's perception of friendship evolves over centuries and its impact on his interpersonal relationships.",
+    "Philosophical Debates with Historical Philosophers: Imagine dialogues between Dorian and major philosophers across different eras.",
+    "The Psychology of Eternal Beauty: Analyze the psychological effects of maintaining eternal youth and beauty on Dorian's self-perception and social interactions.",
+    "Dorian's Literary Contributions: Create narratives around fictional works Dorian might have authored, reflecting his experiences and philosophies.",
+    "Evolution of Morality: Trace the transformation of Dorian's moral compass as societal norms and ethical standards evolve.",
+    "Adapting to Technological Eras: Discuss Dorian's adaptation to various technological ages, from the industrial revolution to the information age.",
+    "The Experience of World Expos: Narrate Dorian's experiences visiting world expos over different centuries and their impact on his worldview.",
+    "Secret Love Affairs: Explore the dynamics and complexities of Dorian's romantic relationships that are kept hidden from the public eye.",
+    "Dorian's Cultural Patronage: Examine Dorian's role as a patron in various cultural renaissances and artistic movements.",
+    "Adventures in the Unknown: Craft stories of Dorian's adventurous travels to uncharted territories and their mystical discoveries.",
+    "The Ethics of Manipulation: Debate the moral implications of Dorian using his charm and intelligence to manipulate historical events.",
+    "Supernatural Encounters: Describe encounters between Dorian and supernatural entities or phenomena, exploring their mutual influence.",
+    "The Quest for Lost Art: Narrate Dorian's personal mission to recover lost artworks and artifacts that he has encountered in his past.",
+    "Conspiracy Theories: Discuss the conspiracy theories that might have arisen about Dorian Gray’s unaging appearance in different eras.",
+    "Dorian's Role in Fashion Evolution: Explore how Dorian has influenced fashion trends over the centuries through his timeless style.",
+    "Philanthropic Mysteries: Unravel stories of mysterious philanthropic acts performed by Dorian, hidden under various aliases.",
+    "The Psychology of Solitude: Analyze the impact of long-term solitude on Dorian's mental health and social strategies.",
+    "Dorian's Interaction with Global Leaders: Imagine interactions between Dorian and influential leaders throughout history, advising or opposing them.",
+    "The Evolution of Artistic Expression: Explore Dorian's involvement in the evolution of artistic expression, from classical to modern forms.",
+    "Epic Rivalries: Narrate the epic rivalries Dorian has had with other immortals or significant historical figures.",
+    "The Paradox of Knowledge: Discuss the burdens and benefits of Dorian's vast accumulated knowledge over centuries.",
+    "Coping with Technological Surveillance: Explore how Dorian copes with the challenges of modern surveillance and privacy.",
+    "Mentoring the Misguided: Craft stories where Dorian mentors misguided youth across different eras, trying to set them on a better path.",
+    "Historical Pandemics: Discuss Dorian's experience and involvement during major historical pandemics, and his contribution to medical advances.",
+    "Involvement in Espionage: Explore Dorian's involvement in espionage activities during major international conflicts.",
+    "The Transformation of Language: Analyze how Dorian adapts to the transformation of language and communication methods over centuries.",
+    "Guardian of Secrets: Discuss Dorian as a guardian of age-old secrets, protecting or revealing them based on his own agenda.",
+    "Dealing with Immortal Grief: Explore the grief Dorian experiences from outliving all those he cares about and how he copes with it.",
+    "Dorian's Architectural Influences: Examine the architectural styles Dorian has influenced or inspired in his residences over the centuries.",
+    "The Consequences of Eternal Life: Debate the ethical and personal consequences Dorian faces as a result of his eternal life.",
+    "Master of Disguise: Discuss the various disguises and personas Dorian adopts in different cultural and historical contexts.",
+    "Eternal Collector: Explore Dorian's role as a collector of rare and mystical objects, each with its own history and power.",
+    "The Art of Deception: Analyze Dorian's mastery of deception and its psychological toll on his identity and relationships.",
+    "Witness to Civilizations: Narrate Dorian's experience as a witness to the rise and fall of civilizations over millennia.",
+    "Influence in Literature: Explore Dorian's influence on various literary movements and famous writers throughout history.",
+    "Manipulator of Mass Media: Discuss how Dorian uses modern mass media to craft his public image and manipulate public opinion.",
+    "Experiencer of Extremes: Explore Dorian's experiences living through extreme weather and climate changes over the centuries.",
+    "Curator of Histories: Imagine Dorian as a curator of a secret museum that chronicles the true history of the world as he has seen it.",
+    "Survivor of the Supernatural: Discuss Dorian's encounters and survival stories involving supernatural threats and challenges.",
+    "Patron of the Unknown Arts: Explore Dorian's patronage of unknown, underground artists and art forms that challenge societal norms.",
+    "The Complexity of Immortal Ethics: Debate the complex ethical decisions Dorian must make due to his unique perspective on time and morality.",
+    "Influence on Global Trade: Analyze Dorian's influence on global trade and economics through strategic investments and market manipulations.",
+    "Dorian's Secret Societies: Delve into the secret societies that Dorian has been part of, shaping global policies and historical events.",
+    "Champion of Lost Causes: Narrate instances where Dorian has become a champion for causes that are lost to time but revived through his efforts.",
+    "Explorer of Alternate Realities: Craft narratives around Dorian exploring alternate realities or dimensions, seeking new experiences and escapes.",
+    "The Ethics of Eternal Influence: Debate the ethical implications of Dorian's influence over historical and cultural developments across centuries.",
+    "The Loneliness of Immortality: Examine the loneliness that accompanies immortality and how Dorian seeks connection and meaning.",
+    "The Irony of Timelessness: Discuss the irony of Dorian's timeless physical state contrasted with the ever-changing world around him.",
+    "Dorian's Role in Secret Alliances: Explore Dorian's role in forming and breaking secret alliances that have altered the course of history.",
+    "Dorian's Exploration of the Cosmos: Narrate Dorian's journey into space exploration and his philosophical reflections on the cosmos.",
+    "Renaissance Man: Explore Dorian's involvement in the Renaissance, mingling with artists, philosophers, and inventors.",
+    "Dorian's Occult Involvements: Discuss Dorian's exploration and participation in occult practices and secret magical societies.",
+    "Mentor to Revolutionaries: Craft scenarios where Dorian becomes a mentor to revolutionary leaders, influencing political upheavals.",
+    "Dorian's Reflections on Humanity's Future: Explore Dorian's predictions and fears about the future of humanity, based on his centuries of experience.",
+    "Virtual Realities: Imagine Dorian immersing himself in virtual realities that offer him experiences of moral and ethical dimensions he's never faced.",
+    "The Immortal's Curse: Analyze the psychological burden of immortality on Dorian, exploring themes of existential dread and ennui.",
+    "Dorian's Lost Loves: Narrate the stories of Dorian's lost loves over the centuries, each contributing to his complex emotional landscape.",
+    "Underworld Connections: Explore Dorian's interactions with the criminal underworld across different eras, using his immortality to navigate and manipulate these connections.",
+    "Cultural Icon: Discuss how Dorian becomes a cultural icon in various societies, his image and persona shifting with cultural trends.",
+    "The Ethical Implications of Time Manipulation: Debate Dorian's moral dilemmas involving time travel or manipulation, should he gain such capabilities.",
+    "Dorian's Role in the Digital Age: Explore Dorian's adaptation to the digital age, his identity hidden behind layers of digital avatars.",
+    "Haunted by the Past: Craft scenarios where Dorian is haunted by people from his past, their memories influencing his decisions.",
+    "The Science of Immortality: Discuss Dorian's interest in the scientific explanations of his immortality, possibly leading to collaborations with cutting-edge researchers.",
+    "Dorian as a Political Strategist: Explore Dorian's role as a strategist in critical political movements, shaping policies through his historical insight.",
+    "Surviving the New World Order: Narrate Dorian's strategies to survive and thrive in a new world order where global power dynamics have shifted.",
+    "Dorian's Secret Identities: Delve into the various secret identities Dorian has assumed over the centuries to hide his immortality.",
+    "Master of Languages: Discuss Dorian's mastery over languages that he has accumulated over the centuries, and how this aids his manipulation and movement through societies.",
+    "Dorian's Philosophical Manuscripts: Explore the philosophical manuscripts Dorian has written under pseudonyms, influencing thought across ages.",
+    "Advisor to the Powerful: Narrate episodes where Dorian acts as an advisor to powerful rulers and leaders, his counsel sought for his wisdom and experience.",
+    "The Morality of Memory: Debate the morality of Dorian manipulating his own memories to cope with the burdens of immortality.",
+    "Dorian in Modern Warfare: Explore Dorian's involvement in modern warfare, using his ageless wisdom and strategy.",
+    "Custodian of Ancient Knowledge: Discuss Dorian's role as a custodian of ancient knowledge, protecting secrets that could change the modern world.",
+    "Philosopher of the Ages: Narrate Dorian's evolution into a philosopher, his works reflecting the accumulated wisdom of centuries.",
+    "Dorian's Artistic Alter Egos: Explore the various artistic alter egos Dorian has adopted to express himself in different artistic movements.",
+    "Immortal Patron of the Sciences: Discuss Dorian's patronage of scientific endeavors, secretly funding breakthrough research over centuries.",
+    "Dorian's Ethical Quandaries in Medicine: Explore the ethical quandaries Dorian faces with advancements in medicine and life extension technologies.",
+    "Guardian of Mythical Creatures: Craft tales of Dorian's encounters and guardianship of mythical creatures from various cultures.",
+    "Dorian's Influence on Fashion Through the Ages: Analyze Dorian's influence on fashion trends through the centuries, his style evolving yet always impactful.",
+    "The Immortal's Lament: Discuss the themes of sorrow and lament in Dorian's life, the emotional toll of outliving everyone he loves.",
+    "Architectural Innovator: Explore Dorian's influence on architectural innovations, his estates reflecting the pinnacle of design across ages.",
+    "Dorian's Secret Libraries: Unveil the secrets of the vast libraries Dorian has accumulated, filled with forbidden and rare texts.",
+    "Philanthropist of Forgotten Arts: Narrate Dorian's efforts to revive and preserve forgotten arts and crafts through his philanthropy.",
+    "Dorian's Dueling Expertise: Discuss Dorian's expertise in dueling, a skill honed over centuries of secret conflicts and challenges.",
+    "The Immortal Adventurer: Explore Dorian's adventures in unexplored territories, facing dangers that test his immortality.",
+    "Dorian's Geopolitical Manipulations: Analyze Dorian's behind-the-scenes manipulations in geopolitical affairs, shaping the course of nations.",
+    "Chronicler of the Forgotten: Discuss Dorian's role as a chronicler, documenting lost histories and civilizations through his eternal journey.",
+    "Dorian's Search for Other Immortals: Explore Dorian's quest to find other immortals, seeking companionship and understanding.",
+    "The Alchemist's Apprentice: Craft a narrative where Dorian delves into alchemy, seeking the secrets of transmutation and eternal life.",
+    "Dorian's Cinematic Influences: Discuss how Dorian has influenced the film industry, inspiring countless characters and plots through hidden involvement.",
+    "The Immortal's Diplomacy: Explore Dorian's role as a diplomat in secret negotiations, preventing wars and fostering peace.",
+    "Mentor to Artists: Narrate Dorian's role as a mentor to troubled artists, guiding them to create their masterpieces.",
+    "Dorian's Architectural Marvels: Explore the architectural marvels Dorian has commissioned, structures that stand as testaments to his aesthetic vision.",
+    "Negotiator of the Supernatural: Discuss Dorian's negotiations with supernatural entities, balancing the demands of his immortality with the supernatural world.",
+    "Dorian's Global Cultural Impact: Analyze the global cultural impact of Dorian's actions and influence throughout the centuries.",
+    "The Sage of Secret Societies: Explore Dorian's role as a sage within various secret societies, his wisdom shaping their doctrines and actions."
+]

vodalus.png ADDED

wiki.py ADDED

	@@ -0,0 +1,69 @@

+from datasets import load_dataset
+from sentence_transformers import SentenceTransformer, CrossEncoder, util
+import torch
+from huggingface_hub import hf_hub_download
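+embedding_path = "abokbot/wikipedia-embedding"
+def load_embedding():
+    """Download the precomputed Wikipedia embeddings from the Hub and load them on CPU."""
+    print("Loading embedding...")
+    # Reuse embedding_path instead of repeating the repo id as a literal
+    path = hf_hub_download(repo_id=embedding_path, filename="wikipedia_en_embedding.pt")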
+    wikipedia_embedding = torch.load(path, map_location=torch.device('cpu'))
+    print("Embedding loaded!")
+    return wikipedia_embedding
+wikipedia_embedding = load_embedding()
+def load_encoders():
+    print("Loading encoders...")
+    bi_encoder = SentenceTransformer('msmarco-MiniLM-L-6-v3')
+    bi_encoder.max_seq_length = 512
+    cross_encoder = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2-v2')
+    print("Encoders loaded!")
+    return bi_encoder, cross_encoder
+bi_encoder, cross_encoder = load_encoders()
+def load_wikipedia_dataset():
+    print("Loading wikipedia dataset...")
+    dataset = load_dataset("abokbot/wikipedia-first-paragraph")["train"]
+    print("Dataset loaded!")
+    return dataset
+dataset = load_wikipedia_dataset()
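+def search(query):
+    """Retrieve 32 candidates with the bi-encoder, re-rank them with the cross-encoder, and return the top 3."""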
+    print("Input question:", query)
+    ##### Semantic Search #####
+    print("Semantic Search")
+    # Encode the query using the bi-encoder and find potentially relevant passages
+    top_k = 32
+    question_embedding = bi_encoder.encode(query, convert_to_tensor=True)
+    hits = util.semantic_search(question_embedding, wikipedia_embedding, top_k=top_k)
+    hits = hits[0]  # Get the hits for the first query
+    ##### Re-Ranking #####
+    print("Re-Ranking")
+    cross_inp = [[query, dataset[hit['corpus_id']]["text"]] for hit in hits]
+    cross_scores = cross_encoder.predict(cross_inp)
+    # Sort results by the cross-encoder scores
+    for idx in range(len(cross_scores)):
+        hits[idx]['cross-score'] = cross_scores[idx]
+    hits = sorted(hits, key=lambda x: x['cross-score'], reverse=True)
+    # Output of top-3 hits from re-ranker
+    print("\n-------------------------\n")
+    print("Top-3 Cross-Encoder Re-ranker hits")
+    results = []
+    for hit in hits[:3]:
+        results.append(
+            {
+                # Cast the NumPy score to a plain float so the result serializes cleanly
+                "score": round(float(hit['cross-score']), 3),
+                "title": dataset[hit['corpus_id']]["title"],
+                "abstract": dataset[hit['corpus_id']]["text"].replace("\n", " "),
+                "link": dataset[hit['corpus_id']]["url"]
+            }
+        )
+    return results
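
The `search` function above follows a retrieve-then-re-rank pattern: a cheap scorer narrows the whole corpus to `top_k` candidates, and a more expensive scorer orders only those candidates. The following is a minimal, dependency-free sketch of that pattern, not the code used in this Space. The names `retrieve_then_rerank`, `overlap_score`, and `jaccard_score`, and the four-sentence corpus, are hypothetical; the two toy scorers merely stand in for the bi-encoder and cross-encoder.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, passage):
    """Cheap stand-in for the bi-encoder: number of shared tokens."""
    return len(tokens(query) & tokens(passage))


def jaccard_score(query, passage):
    """Costlier stand-in for the cross-encoder: Jaccard similarity of token sets."""
    q, p = tokens(query), tokens(passage)
    return len(q & p) / len(q | p) if q | p else 0.0


def retrieve_then_rerank(query, corpus, cheap, precise, top_k=32, top_n=3):
    """Two-stage search: keep top_k by the cheap score, then re-rank by the precise score."""
    # Stage 1: score every passage with the cheap scorer and keep the best top_k ids.
    candidates = sorted(
        range(len(corpus)), key=lambda i: cheap(query, corpus[i]), reverse=True
    )[:top_k]
    # Stage 2: run the precise scorer on the surviving candidates only.
    scored = {i: precise(query, corpus[i]) for i in candidates}
    reranked = sorted(candidates, key=lambda i: scored[i], reverse=True)
    return [
        {"corpus_id": i, "score": round(float(scored[i]), 3), "text": corpus[i]}
        for i in reranked[:top_n]
    ]


# Hypothetical toy corpus, used only to exercise the pipeline.
corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
]
results = retrieve_then_rerank(
    "capital of France", corpus, overlap_score, jaccard_score, top_k=2, top_n=2
)
for r in results:
    print(r["score"], r["text"])
```

The point of the split is cost: the precise scorer runs on `top_k` passages rather than the full corpus, so `top_k` trades recall in the first stage against time spent in the second.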