Merge pull request #11 from Visionatrix/optional-gemini-in-flows

Added the ability to use Gemini as an alternative to Ollama
Visionatrix · Jul 21, 2024 · d5103a6
2 parents 30402c9 + 0cea16d
commit d5103a6
Showing 8 changed files with 151 additions and 6 deletions.
diff --git a/README.md b/README.md
@@ -13,4 +13,5 @@ For any problems with Visionatrix or suggestions for improvement, go to the [mai
   - [Working modes](https://visionatrix.github.io/VixFlowsDocs/WorkingModes.html)
   - [Vix Workflows](https://visionatrix.github.io/VixFlowsDocs/VixWorkflows.html)
   - [Technical information](https://visionatrix.github.io/VixFlowsDocs/TechnicalInformation.html)
+  - [Hardware FAQ](https://visionatrix.github.io/VixFlowsDocs/HardwareFAQ.html)
   - [OpenAPI](https://visionatrix.github.io/VixFlowsDocs/swagger.html)
diff --git a/docs/Flows/MadScientist.rst b/docs/Flows/MadScientist.rst
@@ -3,7 +3,7 @@
 Mad Scientist
 =============
 
-.. note:: Requires ``Ollama`` server to be present with ``llava:7b-v1.6-vicuna-q8_0`` model.
+.. note:: This feature requires vision capabilities. You must either have the **Ollama** server running with the ``llava:7b-v1.6-vicuna-q8_0`` model, or provide a ``Gemini API key`` in the settings.
 
 There are only two required arguments:
 

diff --git a/docs/Flows/PhotoStickers2.rst b/docs/Flows/PhotoStickers2.rst
@@ -3,7 +3,7 @@
 Photo Stickers 2
 ================
 
-.. note:: Requires ``Ollama`` server to be present with ``llava:7b-v1.6-vicuna-q8_0`` model.
+.. note:: This feature requires vision capabilities. You must either have the **Ollama** server running with the ``llava:7b-v1.6-vicuna-q8_0`` model, or provide a ``Gemini API key`` in the settings.
 
 Turns a photo into 4 stickers using different prompts.
 

diff --git a/docs/HardwareFAQ.rst b/docs/HardwareFAQ.rst
@@ -0,0 +1,36 @@
+Hardware FAQ
+============
+
+First, you can take a look at the information in the `ComfyUI repository <https://github.com/comfyanonymous/ComfyUI/wiki/Which-GPU-should-I-buy-for-ComfyUI>`_.
+
+.. note:: If you are using Windows and want to avoid hassles, currently, there are no alternatives to Nvidia. PyTorch is expected to release a native version for AMD for Windows soon, but until then, Nvidia is the only option.
+
+List of GPUs by usefulness:
+
+1. Nvidia 4090 ``24 GB``
+2. AMD 7900 XTX ``24 GB``
+3. Nvidia 3090 ``24 GB``
+4. Nvidia 4080 Super ``16 GB``
+5. Nvidia 4070 Ti Super ``16 GB``
+6. Nvidia 4060 Ti ``16 GB``
+7. Nvidia 3060 ``12 GB``
+
+.. note:: You can also look at any performance tests of hardware for ComfyUI as a reference.
+
+----
+
+Q: Why are there no AMD cards other than *AMD 7900 XTX* on the list?
+
+A: *ROCm (Radeon Open Compute) "officially" supports only the AMD 7900 XTX among consumer cards.*
+
+----
+
+Q: How much RAM is needed in the system?
+
+A: *For normal operation, 32 GB is sufficient, but if you want to handle large resolutions with Supir Scaler Workflow, then 64 GB is recommended.*
+
+----
+
+Q: How to use 2 GPUs?
+
+A: *The simplest way is to run 2 workers, each assigned to its own GPU, so they can process tasks in parallel.*
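
The two-worker answer above can be sketched as follows, assuming Nvidia GPUs, where the `CUDA_VISIBLE_DEVICES` environment variable limits a process to the listed devices. The helper names `worker_envs` and `launch_workers` are hypothetical, and the worker start command is deliberately left as a parameter because it is not part of this diff.

```python
import os
import subprocess


def worker_envs(gpu_count: int) -> list[dict[str, str]]:
    """Build one environment per worker, each pinned to a single GPU index."""
    envs = []
    for gpu_index in range(gpu_count):
        env = dict(os.environ)
        # Only the listed device is visible to the child process.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
        envs.append(env)
    return envs


def launch_workers(command: list[str], gpu_count: int) -> list[subprocess.Popen]:
    """Start one copy of `command` per GPU (the command itself is an assumption)."""
    return [subprocess.Popen(command, env=env) for env in worker_envs(gpu_count)]
```

Calling `launch_workers(your_worker_command, 2)` would start two processes, the first seeing only GPU 0 and the second only GPU 1, so they can pick up tasks in parallel.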
diff --git a/docs/TechnicalInformation.rst b/docs/TechnicalInformation.rst
@@ -25,6 +25,7 @@ Visionatrix by default install and update these nodes:
  * `PuLID_ComfyUI <https://github.com/Visionatrix/PuLID_ComfyUI>`_
  * `ComfyUI_FizzNodes <https://github.com/Visionatrix/ComfyUI_FizzNodes>`_
  * `style_aligned_comfy <https://github.com/Visionatrix/style_aligned_comfy>`_
+ * `ComfyUI_Gemini_Flash <https://github.com/Visionatrix/ComfyUI_Gemini_Flash>`_
  * `ComfyUI-Visionatrix <https://github.com/Visionatrix/ComfyUI-Visionatrix>`_
 
 We are gradually expanding the list.

diff --git a/docs/index.rst b/docs/index.rst
@@ -14,6 +14,7 @@ Here will leave all docs that is not suitable for Readme file.
     VixWorkflows.rst
     ComfyUI2VixMigration.rst
     TechnicalInformation.rst
+    HardwareFAQ.rst
 
 
 Different utilities

diff --git a/flows/mad_scientist.json b/flows/mad_scientist.json
@@ -218,7 +218,10 @@
   },
   "24": {
     "inputs": {
-      "query": "Is this picture a painting, 3DCG, or photo? What is its artistic genre?",
+      "query": [
+        "44",
+        0
+      ],
       "debug": "enable",
       "url": "http://127.0.0.1:11434",
       "model": "llava:7b-v1.6-vicuna-q8_0",
@@ -249,7 +252,7 @@
   "26": {
     "inputs": {
       "text": [
-        "24",
+        "43",
         0
       ]
     },
@@ -374,5 +377,55 @@
     "_meta": {
       "title": "VixUI-WorkflowMetadata"
     }
+  },
+  "40": {
+    "inputs": {
+      "prompt": [
+        "44",
+        0
+      ],
+      "vision": true,
+      "api_key": "",
+      "proxy": "",
+      "image": [
+        "25",
+        0
+      ]
+    },
+    "class_type": "Gemini_Flash",
+    "_meta": {
+      "title": "Gemini flash"
+    }
+  },
+  "43": {
+    "inputs": {
+      "state": false,
+      "display_name": "Use Gemini for vision instead of Ollama",
+      "optional": true,
+      "advanced": true,
+      "order": 99,
+      "custom_id": "vision_provider",
+      "input_off_state": [
+        "24",
+        0
+      ],
+      "input_on_state": [
+        "40",
+        0
+      ]
+    },
+    "class_type": "VixUiCheckboxLogic",
+    "_meta": {
+      "title": "VixUI-CheckboxLogic"
+    }
+  },
+  "44": {
+    "inputs": {
+      "text": "Write a short prompt in English (less than 46 words) that defines the style (painting, 3DCG, or photography) and describes the artistic style of the image. Focus on the main subject, ignoring the background description. Only output the prompt itself."
+    },
+    "class_type": "Text Multiline (Code Compatible)",
+    "_meta": {
+      "title": "Text Multiline (Code Compatible)"
+    }
   }
 }
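
To make the rewiring in `flows/mad_scientist.json` easier to follow, here is a minimal sketch of how the new checkbox node selects the vision provider. It assumes, based only on the field names in the diff, that `VixUiCheckboxLogic` forwards `input_on_state` when `state` is true and `input_off_state` otherwise. The `resolve_source` helper is hypothetical and the node dictionary is a trimmed copy of the nodes shown above.

```python
from typing import Any

# Trimmed copy of the nodes added or changed in flows/mad_scientist.json.
FLOW: dict[str, dict[str, Any]] = {
    "24": {"inputs": {"query": ["44", 0]}},  # Ollama vision node
    "40": {"class_type": "Gemini_Flash",
           "inputs": {"prompt": ["44", 0], "image": ["25", 0]}},
    "43": {"class_type": "VixUiCheckboxLogic",
           "inputs": {"state": False,
                      "input_off_state": ["24", 0],
                      "input_on_state": ["40", 0]}},
    "26": {"inputs": {"text": ["43", 0]}},
}


def resolve_source(flow: dict[str, dict[str, Any]], link: list) -> str:
    """Follow a [node_id, output_index] link through checkbox nodes
    and return the id of the node that actually produces the value."""
    node_id = link[0]
    node = flow[node_id]
    while node.get("class_type") == "VixUiCheckboxLogic":
        inputs = node["inputs"]
        # Assumed semantics: checked -> on branch, unchecked -> off branch.
        key = "input_on_state" if inputs["state"] else "input_off_state"
        node_id = inputs[key][0]
        node = flow[node_id]
    return node_id
```

With the default `state` of `false`, node `26` still receives its text from the Ollama node `24`; setting `state` to `true` routes it from the Gemini node `40` instead. Both providers read the same prompt from node `44`.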
diff --git a/flows/photo_stickers2.json b/flows/photo_stickers2.json
@@ -319,7 +319,7 @@
   "517": {
     "inputs": {
       "text": [
-        "519",
+        "584",
         0
       ]
     },
@@ -343,7 +343,10 @@
   },
   "519": {
     "inputs": {
-      "query": "Ignore the artistic style of the picture.\n\nDescribe the person in detail, including any interesting features or characteristics, such as gender, age, facial expression, race, color, hairstyle, hair color, hat, eye color, beard. \n\nIf it is wearing glasses, describe the style of glasses.\n\nDo not describe anything else, such as background.\n\nPlease create an image generation prompt in English less than 46 words to fit the description above.",
+      "query": [
+        "585",
+        0
+      ],
       "debug": "enable",
       "url": "http://127.0.0.1:11434",
       "model": "llava:7b-v1.6-vicuna-q8_0",
@@ -610,5 +613,55 @@
     "_meta": {
       "title": "Save Image"
     }
+  },
+  "583": {
+    "inputs": {
+      "prompt": [
+        "585",
+        0
+      ],
+      "vision": true,
+      "api_key": "",
+      "proxy": "",
+      "image": [
+        "518",
+        0
+      ]
+    },
+    "class_type": "Gemini_Flash",
+    "_meta": {
+      "title": "Gemini flash"
+    }
+  },
+  "584": {
+    "inputs": {
+      "state": false,
+      "display_name": "Use Gemini for vision instead of Ollama",
+      "optional": true,
+      "advanced": true,
+      "order": 99,
+      "custom_id": "vision_provider",
+      "input_off_state": [
+        "519",
+        0
+      ],
+      "input_on_state": [
+        "583",
+        0
+      ]
+    },
+    "class_type": "VixUiCheckboxLogic",
+    "_meta": {
+      "title": "VixUI-CheckboxLogic"
+    }
+  },
+  "585": {
+    "inputs": {
+      "text": "Ignore the artistic style of the picture.\n\nDescribe the person in detail, including any interesting features or characteristics, such as gender, age, facial expression, race, color, hairstyle, hair color, hat, eye color, beard. \n\nIf it is wearing glasses, describe the style of glasses.\n\nDo not describe anything else, such as background.\n\nPlease create an image generation prompt in English less than 46 words to fit the description above."
+    },
+    "class_type": "Text Multiline (Code Compatible)",
+    "_meta": {
+      "title": "Text Multiline (Code Compatible)"
+    }
   }
 }
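
Both flows expose the same switch under the `custom_id` value `vision_provider`. As a hedged illustration, the sketch below flips that checkbox directly in a loaded flow dictionary. How Visionatrix itself applies the user's choice is not shown in this diff, so the `set_checkbox` helper is hypothetical and only demonstrates the data shape.

```python
import copy
from typing import Any


def set_checkbox(flow: dict[str, dict[str, Any]], custom_id: str,
                 state: bool) -> dict[str, dict[str, Any]]:
    """Return a copy of the flow with the matching checkbox node set to `state`."""
    updated = copy.deepcopy(flow)
    for node in updated.values():
        inputs = node.get("inputs", {})
        if (node.get("class_type") == "VixUiCheckboxLogic"
                and inputs.get("custom_id") == custom_id):
            inputs["state"] = state
            return updated
    raise KeyError(f"no checkbox node with custom_id={custom_id!r}")


# Trimmed copy of node 584 from flows/photo_stickers2.json.
sample_flow = {
    "584": {"class_type": "VixUiCheckboxLogic",
            "inputs": {"state": False,
                       "custom_id": "vision_provider",
                       "input_off_state": ["519", 0],
                       "input_on_state": ["583", 0]}},
}
gemini_flow = set_checkbox(sample_flow, "vision_provider", True)
```

The original dictionary is left untouched, which keeps the Ollama default available for comparison.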