Spaces: MasonCrinr / MasonSpace

Status: Configuration error


MasonCrinr committed on Dec 17, 2023

Commit ce8fc87 (1 parent: 6f6ee27): Upload 5 files

Files changed (5)

.dummy_env +2 -0
LICENSE +21 -0
README.md +48 -12
app.py +54 -0
requirements.txt +0 -0

.dummy_env ADDED

	@@ -0,0 +1,2 @@


+HUGGINGFACEHUB_API_TOKEN = HUGGINGFACEHUB_API_TOKEN
+OPENAI_API_KEY = OPENAI_API_KEY
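`app.py` reads these two names with `os.getenv` after `dotenv.load_dotenv()`, so a missing or unedited key only surfaces later, for example as a `TypeError` when `"Bearer " + HUGGINGFACEHUB_API_TOKEN` is evaluated with `None`. A minimal fail-fast check is sketched below. The helper name `missing_keys` is hypothetical, and treating a value equal to its own key name as an unfilled placeholder is an assumption based on how `.dummy_env` is written.

```python
import os
from typing import List, Mapping, Sequence

# Variable names taken from .dummy_env.
REQUIRED_KEYS = ("HUGGINGFACEHUB_API_TOKEN", "OPENAI_API_KEY")


def missing_keys(
    required: Sequence[str] = REQUIRED_KEYS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Hypothetical helper: list required keys that are unset, blank, or still placeholders."""
    missing = []
    for key in required:
        value = env.get(key, "").strip()
        # Assumption: .dummy_env uses each key's own name as its placeholder value.
        if not value or value == key:
            missing.append(key)
    return missing
```

In `app.py` this could run right after `dotenv.load_dotenv()`, raising `SystemExit` with the returned names if the list is non-empty.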

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2023 Sartaj Bhuvaji
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,12 +1,48 @@
----
-title: MasonSpace
-emoji: 🏢
-colorFrom: indigo
-colorTo: yellow
-sdk: gradio
-sdk_version: 4.10.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Gradio App: Image to Story Generator
+This Gradio app lets you upload an image, captions it with an image-captioning model, and has an LLM write a short story from that caption. The story is then converted to audio with a text-to-speech model, so you can both read the story and listen to it.
+## Demo
+- Launching the application
+![01](https://github.com/SartajBhuvaji/Image-to-Story-Generator/assets/31826483/984ad132-14eb-4ddf-8e5a-33fe2a7c7b28)
+- Selecting and uploading an image
+![02](https://github.com/SartajBhuvaji/Image-to-Story-Generator/assets/31826483/20ef38ee-562f-4cfa-9d64-3f01e85f231b)
+- Example input image
+![beach (1)](https://github.com/SartajBhuvaji/Image-to-Story-Generator/assets/31826483/69a5b52b-c6dd-41cb-889b-486977ebf37c)
+- The generated audio story (downloadable)
+https://github.com/SartajBhuvaji/Image-to-Story-Generator/assets/31826483/1fe00f34-9716-4047-9b57-a7794524816a
+## Features
+- Upload an image.
+- Generate a story based on the content of the image.
+- Listen to the generated story as an audio file.
+## Usage
+1. Clone this repository, install the dependencies, create your `.env` file, and start the app:
+```bash
+git clone https://github.com/SartajBhuvaji/Image-to-Story-Generator.git
+cd Image-to-Story-Generator
+pip install -r requirements.txt
+cp .dummy_env .env   # then replace the placeholder values with your Hugging Face and OpenAI API keys
+python app.py
+```
+2. Open your web browser and navigate to http://localhost:7860 to access the app.
+3. Upload an image and click "Submit". You will see the generated story and be able to listen to it as audio.
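+## Tech
+- Hugging Face `transformers` and the Hugging Face Inference API
+- Image captioning: Salesforce/blip-image-captioning-base
+- LLM: OpenAI gpt-3.5-turbo via LangChain
+- Text-to-speech: espnet/kan-bayashi_ljspeech_vits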
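The lines removed at the top of README.md are the YAML front matter that Hugging Face Spaces reads its configuration from (`sdk`, `sdk_version`, `app_file`, and display settings; see the configuration reference URL in the removed text). Without that block the Space declares no SDK or entry file, which likely explains the "Configuration error" status on this Space. A rough pre-push check is sketched below. It is a simplified line parser, not a full YAML parser, and the helper names `read_front_matter` and `missing_space_keys` are hypothetical.

```python
from typing import Dict, List, Optional

# Keys declared in the front matter that this commit removed from README.md.
EXPECTED_KEYS = ("sdk", "app_file")


def read_front_matter(readme_text: str) -> Optional[Dict[str, str]]:
    """Return flat `key: value` pairs between leading '---' fences, or None if there is no block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    config: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return None  # the opening fence was never closed


def missing_space_keys(readme_text: str) -> List[str]:
    """List the expected keys absent from the README's front matter."""
    config = read_front_matter(readme_text) or {}
    return [key for key in EXPECTED_KEYS if key not in config]
```

Run against the README.md added in this commit, `missing_space_keys` returns `["sdk", "app_file"]`, because the file no longer starts with a `---` line; putting the removed block back above the new content makes it return `[]`.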

app.py ADDED

	@@ -0,0 +1,54 @@

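+import os
+import dotenv
+import gradio as gr
+import requests
+from langchain.chains import LLMChain
+from langchain.chat_models import ChatOpenAI
+from langchain.prompts import PromptTemplate
+from transformers import pipeline
+# Load HUGGINGFACEHUB_API_TOKEN and OPENAI_API_KEY from a local .env file
+dotenv.load_dotenv()
+HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
+# Image captioning model, loaded once at startup rather than on every request
+img_to_text = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
+# image to text: return a one-line caption for the image at `path`
+def imgToText(path):
+    return img_to_text(path)[0]["generated_text"]
+# LLM: expand the caption into a short story
+def generate_story(scenario):
+    template = """
+               You are a storyteller.
+               You can generate a short story based on a simple narrative; the story should be no more than 40 words.
+               CONTEXT: {scenario}
+               STORY:
+            """
+    prompt = PromptTemplate(template=template, input_variables=["scenario"])
+    # gpt-3.5-turbo is a chat model, so use ChatOpenAI instead of the completion-style OpenAI class
+    story_llm = LLMChain(llm=ChatOpenAI(model_name="gpt-3.5-turbo"), prompt=prompt, verbose=True)
+    return story_llm.predict(scenario=scenario)
+# Text to speech via the Hugging Face Inference API; returns the path of the saved audio
+def textToSpeech(story, out_path="story.flac"):
+    if not HUGGINGFACEHUB_API_TOKEN:
+        raise RuntimeError("HUGGINGFACEHUB_API_TOKEN is not set; add it to your .env file")
+    API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
+    headers = {"Authorization": f"Bearer {HUGGINGFACEHUB_API_TOKEN}"}
+    response = requests.post(API_URL, headers=headers, json={"inputs": story})
+    # Raise on HTTP errors instead of saving an error message as audio
+    response.raise_for_status()
+    with open(out_path, "wb") as f:
+        f.write(response.content)
+    return out_path
+def generate_story_and_play_audio(image_path):
+    scenario = imgToText(image_path)
+    story = generate_story(scenario)
+    # Return both outputs: the story text and the audio file path
+    return story, textToSpeech(story)
+# Gradio 4 removed gr.inputs/gr.outputs; use the top-level components instead
+iface = gr.Interface(
+    fn=generate_story_and_play_audio,
+    inputs=gr.Image(type="filepath", label="Upload an image"),
+    outputs=[
+        gr.Textbox(label="Generated Story"),
+        gr.Audio(label="Story Audio", type="filepath"),
+    ],
+)
+iface.launch()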
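`textToSpeech` writes whatever the Inference API returns into `story.flac`, so an error response (for instance while the model is still loading, which the hosted Inference API has typically reported with HTTP 503) can end up saved as a broken audio file. Below is a minimal retry sketch under stated assumptions: `post_with_retry` is a hypothetical helper, the `post` callable stands in for the `requests.post(API_URL, headers=headers, json=payload)` call, and retrying only on 503 with a fixed delay is a design choice, not documented Hugging Face behavior.

```python
import time
from typing import Callable, Protocol


class ResponseLike(Protocol):
    """The two attributes of requests.Response this sketch relies on."""
    status_code: int
    content: bytes


def post_with_retry(
    post: Callable[[], ResponseLike],
    retries: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `post` until it returns 2xx, retrying on HTTP 503 up to `retries` extra times."""
    attempt = 0
    while True:
        response = post()
        if 200 <= response.status_code < 300:
            return response.content
        # Assumption: 503 means "model still loading", so waiting may help;
        # any other status, or running out of retries, is treated as fatal.
        if response.status_code != 503 or attempt >= retries:
            raise RuntimeError(f"TTS request failed with HTTP {response.status_code}")
        attempt += 1
        sleep(delay)
```

Inside `textToSpeech`, `audio = post_with_retry(lambda: requests.post(API_URL, headers=headers, json=payload))` would replace the direct call, and `audio` would then be written to the output file. Injecting `sleep` keeps the helper testable without real waiting.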

requirements.txt ADDED

Binary file (290 bytes)
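The viewer reports `requirements.txt` as a 290-byte binary file rather than showing its contents. One common cause, not confirmed from this page, is a text file saved as UTF-16 (for example by redirecting `pip freeze` in Windows PowerShell), whose byte-order mark and null bytes make it look binary. The sketch below, using the hypothetical helpers `to_utf8` and `fix_requirements`, checks for a UTF-16 or UTF-8 byte-order mark and rewrites the file as plain UTF-8.

```python
import codecs
from pathlib import Path


def to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 (with BOM) or UTF-8-with-BOM text as plain UTF-8; leave other bytes unchanged."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        return raw.decode("utf-16").encode("utf-8")
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def fix_requirements(path: str = "requirements.txt") -> bool:
    """Rewrite `path` as UTF-8 if needed; return True when the file was changed."""
    file = Path(path)
    raw = file.read_bytes()
    fixed = to_utf8(raw)
    if fixed != raw:
        file.write_bytes(fixed)
        return True
    return False
```

Running `fix_requirements()` in the repository root before committing would leave an already-UTF-8 file untouched and return `False`.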