Commit: Fix image folder bug
Fix image folder bug
1over137 committed Mar 22, 2024
1 parent 83727d2 commit dadb23e
Showing 7 changed files with 88 additions and 156 deletions.
7 changes: 2 additions & 5 deletions docs/installation.md
@@ -31,7 +31,7 @@ If you want to test the latest features, you can go to [CI artifacts page](https
Go to the [GitHub releases page](https://github.com/FreeLanguageTools/vocabsieve/releases) for standalone versions. You may have to dismiss some warnings from the browser or Windows to install it, as it is unsigned.

<details markdown=1>
<summary> Click to open instructions for advanced users </summary>
<summary> Click to open instructions to download test releases </summary>

If you want to test the latest features, you can go to the [CI artifacts page](https://nightly.link/FreeLanguageTools/vocabsieve/workflows/build-binaries/master) to obtain the latest builds, but they are not guaranteed to run. If you notice anything wrong with those builds, open an issue on GitHub. Note: ensure you are using the latest nightly build before reporting anything.
</details>
@@ -41,9 +41,6 @@ Only 64 bit Windows 10+ is supported

### MacOS

{: .warning }
MacOS support is often broken due to me not being able to test it. If you discover an issue and can help test or fix it, please reach out by opening a GitHub issue or using the chatroom.

Go to the [GitHub releases page](https://github.com/FreeLanguageTools/vocabsieve/releases) for standalone versions. You may have to dismiss some warnings from the browser or macOS to install it, as it is unsigned.

{: .important }
@@ -53,7 +50,7 @@ Open a new terminal window and type the following command
`xattr -d com.apple.quarantine /path/to/app.app` (replacing "/path/to/app.app" with the path to the VocabSieve app). This unquarantines the app and allows it to run on your Mac without being certified by Apple.
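
For example, if the app were placed in /Applications (the exact bundle name and path depend on where you put it):

```sh
# Remove the quarantine attribute so Gatekeeper allows the unsigned app to run
xattr -d com.apple.quarantine /Applications/VocabSieve.app
```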

<details markdown=1>
<summary> Click to open instructions for advanced users </summary>
<summary> Click to open instructions to download test releases </summary>

If you want to test the latest features, you can go to the [CI artifacts page](https://nightly.link/FreeLanguageTools/vocabsieve/workflows/build-binaries/master) to obtain the latest builds, but they are not guaranteed to run. If you notice anything wrong with those builds, open an issue on GitHub. Note: ensure you are using the latest nightly build before reporting anything.
</details>
66 changes: 35 additions & 31 deletions docs/resources.md
@@ -20,43 +20,24 @@ VocabSieve supports a range of different local resources, which you can use with
- JSON frequency lists (a simple list of words in JSON format; see the sample after this list)
- Sound libraries (a directory of audio files)
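
As a sketch of the expected frequency-list format (the words here are illustrative; real lists are ordered by frequency and usually distributed gzipped):

```json
["the", "be", "to", "of", "and"]
```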

## StarDict
## Dictionaries
Listed in order of the author's preference.

### Kaikki Wiktionary dumps

### Hu Zheng (StarDict author) personal website, over 100 dictionaries

The website has been dead for a while, but some of the files are archived on the Wayback Machine:

<https://web.archive.org/web/20230717122310/https://download.huzheng.org/>

### GTongue Dictionaries

<https://sites.google.com/site/gtonguedict/home/stardict-dictionaries>



## Migaku

Migaku Official MEGA Folder, 11 languages

<https://mega.nz/folder/eyYwyIgY#3q4XQ3BhdvkFg9KsPe5avw/folder/bz4ywa5A>

## Simple JSONs

### Apple Dictionaries, 41 dictionaries, some bilingual

<https://cloud.freemdict.com/index.php/s/HsC7ybBWsbZ7B4N>
High-quality parsed data from Wiktionary in various languages. Prefer these over the online Wiktionary API, as they contain more information. The English and French Wiktionaries (note: this refers to the language the entries are written in) contain a large number of entries in other languages. All the other versions also contain many entries in their own language, which can be useful as monolingual dictionaries.

Navigate to the "json" folder and download the items for your language. Note that the bilingual dictionaries listed include entries in **both** directions. For example, an English-Spanish dictionary contains both English words defined in Spanish and Spanish words defined in English. You do not need to extract the files in order to import them.
<https://kaikki.org/>

## Kaikki Wiktionary dumps
### Hu Zheng (StarDict author) personal website, over 100 dictionaries

High-quality parsed data of Wiktionary in various languages. Prefer these over the online Wiktionary API as they contain more information.
StarDict dictionaries converted by StarDict's author from various formats. They are usually of decent quality and are plaintext, which is suitable for display in VocabSieve and Anki. StarDict files need to be extracted before importing. Select the .ifo file in the extracted folder.
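
For instance, a downloaded archive (StarDict dictionaries are commonly packaged as .tar.bz2; the filename below is illustrative) can be extracted from a terminal:

```sh
tar -xjf stardict-example-2.4.2.tar.bz2
# The extracted folder contains .ifo, .idx, and .dict(.dz) files;
# select the .ifo file when importing into VocabSieve.
```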

<https://kaikki.org/>
The website has been dead for a while, but many of the files are archived on the Wayback Machine:

<https://web.archive.org/web/20230717122310/https://download.huzheng.org/>

## Lingvo DSL
### Lingvo DSL

Rutracker GoldenDict Dictionaries (Russian, English, Ukrainian)

@@ -74,6 +55,18 @@ A bunch of dictionaries for GoldenDict, organized by language. Avoid MDX format

<https://cloud.freemdict.com/index.php/s/pgKcDcbSDTCzXCs>

### Apple Dictionaries, 41 dictionaries, some bilingual

<https://cloud.freemdict.com/index.php/s/HsC7ybBWsbZ7B4N>

Navigate to the "json" folder and download the items for your language. Note that the bilingual dictionaries listed include entries in **both** directions. For example, an English-Spanish dictionary contains both English words defined in Spanish and Spanish words defined in English. You do not need to extract the files in order to import them.

### Migaku dictionaries

Migaku Official MEGA Folder, 11 languages

<https://mega.nz/folder/eyYwyIgY#3q4XQ3BhdvkFg9KsPe5avw/folder/bz4ywa5A>

## Frequency lists

Lemmatized English frequency list
@@ -85,6 +78,17 @@ Lemmatized Russian frequency list
<https://github.com/FreeLanguageTools/resources/raw/master/freq/freq_ru.json.gz>
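
These .json.gz files follow the simple-list format described above. VocabSieve imports them through its GUI; for inspecting one directly, a sketch (assuming the file is a gzipped JSON list of lemmas):

```python
import gzip
import json

# Read a gzipped JSON frequency list; entries are lemmas, most frequent first
with gzip.open("freq_ru.json.gz", "rt", encoding="utf-8") as f:
    words = json.load(f)
print(words[:10])  # the ten most frequent lemmas
```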

## Cognate data
CogNet processed data, includes all languages, may take a while to import.
CogNet data processed for VocabSieve; includes all languages and may take a while to import.

<https://github.com/FreeLanguageTools/resources/raw/master/cognates.json.gz>

## Audio folders
These need to be extracted into a folder before importing. Select the containing folder for import. Do not delete the files afterwards, as they are not copied.

Lingua Libre sound libraries

<https://lingualibre.org/datasets/>

Forvo dump in various languages (may not be as complete as the online version)

<https://github.com/FreeLanguageTools/resources/raw/master/cognates.json.gz>
<https://cloud.freemdict.com/index.php/s/pgKcDcbSDTCzXCs?path=%2F0%20Forvo%20audio>
2 changes: 1 addition & 1 deletion docs/workflows.md
@@ -17,7 +17,7 @@ VocabSieve supports a variety of different workflows. They can be broadly classi
When you see any sentence from anywhere, you can simply copy it to the clipboard. It will appear in the Sentence field right away. Then, double click on any word. A definition should appear if found. You can look up words from the Definition field too. Then, when you are satisfied with the data, click the Add Note button to send it to Anki. You can add tags just like in Anki.

{: .note}
On MacOS and Linux with Wayland, clipboard change may not be detected due to OS restrictions. A workaround is implemented to use the focus event to retrieve clipboard content, but this may not work every time. Use the "Read clipboard" button if necessary.
On MacOS and Linux with Wayland, clipboard changes may not be detected due to OS restrictions. A workaround is implemented that polls the clipboard for content, but this may not work every time. Use the "Read clipboard" button if necessary.
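
A minimal sketch of such a polling workaround, assuming PyQt5 (illustrative only, not VocabSieve's exact implementation):

```python
# Poll the clipboard on a timer instead of relying on change notifications,
# which Wayland and macOS may withhold from background applications.
from PyQt5.QtCore import QTimer
from PyQt5.QtWidgets import QApplication

app = QApplication([])
clipboard = QApplication.clipboard()
last_text = clipboard.text()

def poll_clipboard():
    global last_text
    text = clipboard.text()
    if text and text != last_text:  # only react to new content
        last_text = text
        print("New clipboard content:", text)

timer = QTimer()
timer.timeout.connect(poll_clipboard)
timer.start(50)  # check every 50 ms
app.exec_()
```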

### Browser
When you turn on the extension, you will notice that sentences are underlined in green. Whenever you click on any word, VocabSieve will receive both the whole sentence and the word under your cursor. The word will be looked up immediately too. Chances are, with lemmatization on, this is exactly the word you want. In that case, just press Ctrl/Cmd + S to save the card, and you can keep reading!
2 changes: 2 additions & 0 deletions vocabsieve/global_names.py
@@ -54,6 +54,8 @@ def _get_settings_app_title():
settings = QSettings(app_organization, app_name)
datapath = QStandardPaths.writableLocation(QStandardPaths.DataLocation)
forvopath = os.path.join(datapath, "forvo")
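# Ensure the images directory exists before it is used (the image folder fix)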
_imagepath = os.path.join(datapath, "images")
os.makedirs(_imagepath, exist_ok=True)

_today_log_name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
os.makedirs(os.path.join(datapath, "log"), exist_ok=True)
2 changes: 1 addition & 1 deletion vocabsieve/main.py
@@ -799,8 +799,8 @@ def lookup(self, target: str, no_lemma=False, trigger=LookupTrigger.double_click
if settings.value("sg2_enabled", False, type=bool):
self.definition2.lookup(target, no_lemma, rules)

self.audio_selector.lookup(target)
self.freq_widget.lookup(target, True, settings.value("freq_display", "Stars"))
self.audio_selector.lookup(target)

self.previous_word = target
self.previous_trigger = trigger
151 changes: 36 additions & 115 deletions vocabsieve/record.py
@@ -13,9 +13,6 @@
from .tools import findNotes, notesInfo
from .global_names import logger, settings

dictionaries = bidict({"Wiktionary (English)": "wikt-en",
"Google Translate": "gtrans"})


class Record():
"""Class to store user data"""
@@ -28,54 +25,19 @@ def __init__(self, parent_settings: QSettings, datapath):
check_same_thread=False)
self.c = self.conn.cursor()
self.c.execute("PRAGMA foreign_keys = ON")
self.createTables()
self.fixOld()
if not parent_settings.value("internal/db_no_definitions"):
self.dropDefinitions()
parent_settings.setValue("internal/db_no_definitions", True)
if not parent_settings.value("internal/db_new_source"):
self.fixSource()
parent_settings.setValue("internal/db_new_source", True)
self.conn.commit()
if not parent_settings.value("internal/seen_has_no_word"):
self.fixSeen()
parent_settings.setValue("internal/seen_has_no_word", True)
if not parent_settings.value("internal/timestamps_are_seconds", True):
self.fixBadTimestamps()
parent_settings.setValue("internal/timestamps_are_seconds", True)
self._createTables()
if not parent_settings.value("internal/lookup_unique_index"):
self.makeLookupUnique()
self._makeLookupUnique()
parent_settings.setValue("internal/lookup_unique_index", True)
self.conn.commit()

self.last_known_data: Optional[tuple[dict[str, WordRecord], KnownMetadata]] = None
self.last_known_data_date: float = 0.0 # 1970-01-01
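
The flag-guarded migrations in `__init__` above follow a one-time-migration pattern that can be sketched standalone (illustrative helper, assuming PyQt5's QSettings; not VocabSieve's exact code):

```python
# Run a migration exactly once, recording completion in persistent settings
from typing import Callable
from PyQt5.QtCore import QSettings

def run_once(settings: QSettings, flag: str, migration: Callable[[], None]) -> None:
    if not settings.value(flag):
        migration()
        settings.setValue(flag, True)

# e.g. run_once(settings, "internal/lookup_unique_index", record._makeLookupUnique)
```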

def fixSeen(self):
try:
self.c.execute("""
ALTER TABLE seen DROP COLUMN word
""")
self.conn.commit()
self.c.execute("VACUUM")
except Exception as e:
print(e)

def fixSource(self):
self.c.execute("""
UPDATE lookups SET source='vocabsieve'
""")
self.conn.commit()

def fixBadTimestamps(self):
"In the past some lookups were imported with millisecond timestamps"
self.c.execute("""
UPDATE lookups SET timestamp=timestamp/1000 WHERE timestamp > 1000000000000
""")

def createTables(self):
def _createTables(self):
self.c.execute("""
CREATE TABLE IF NOT EXISTS lookups (
timestamp FLOAT,
timestamp REAL,
word TEXT,
lemma TEXT,
language TEXT,
@@ -87,7 +49,7 @@ def createTables(self):
""")
self.c.execute("""
CREATE TABLE IF NOT EXISTS notes (
timestamp FLOAT,
timestamp REAL,
data TEXT,
success INTEGER,
sentence TEXT,
@@ -100,12 +62,11 @@ def createTables(self):
)
""")
self.c.execute("""
CREATE TABLE IF NOT EXISTS seen (
CREATE TABLE IF NOT EXISTS seen_new (
language TEXT,
lemma TEXT,
jd INTEGER,
source INTEGER,
FOREIGN KEY(source) REFERENCES contents(id) ON DELETE CASCADE
count INTEGER DEFAULT 1,
UNIQUE(language, lemma)
)
""")
self.c.execute("""
@@ -128,49 +89,14 @@ def createTables(self):
self.c.execute("""
CREATE UNIQUE INDEX IF NOT EXISTS modifier_index ON modifiers (language, lemma)
""")
# Non-unique index for seen_new
self.c.execute("""CREATE INDEX IF NOT EXISTS seen_index_lang ON seen_new (language)""")
# Clean up old seen table
self.c.execute("""DROP TABLE IF EXISTS seen""")
self.c.execute("""VACUUM""")
self.conn.commit()

def fixOld(self):
"""
1. In the past language name rather than code was recorded
2. In the past some dictonaries had special names.
3. Add proper columns in the notes table rather than just a json dump
"""
self.c.execute("""
SELECT DISTINCT language FROM lookups
""")
for languagename, in self.c.fetchall(): # comma unpacks a single value tuple
if not langcodes.get(languagename) and langcodes.inverse.get(languagename):
print(f"Replacing {languagename} with {langcodes.inverse[languagename]}")
self.c.execute("""
UPDATE lookups SET language=? WHERE language=?
""", (langcodes.inverse[languagename], languagename))
self.conn.commit()
self.c.execute("""
SELECT DISTINCT source FROM lookups
""")
for source, in self.c.fetchall(): # comma unpacks a single value tuple
if source in dictionaries.inverse: # pylint: disable=unsupported-membership-test
print(f"Replacing {source} with {dictionaries.inverse[source]}")
self.c.execute("""
UPDATE lookups SET source=? WHERE source=?
""", (dictionaries.inverse[source], source))
self.conn.commit()
try:
self.c.executescript("""
ALTER TABLE notes ADD COLUMN sentence TEXT;
ALTER TABLE notes ADD COLUMN word TEXT;
ALTER TABLE notes ADD COLUMN definition TEXT;
ALTER TABLE notes ADD COLUMN definition2 TEXT;
ALTER TABLE notes ADD COLUMN pronunciation TEXT;
ALTER TABLE notes ADD COLUMN image TEXT;
ALTER TABLE notes ADD COLUMN tags TEXT;
""")
except sqlite3.OperationalError:
pass
self.c.execute("VACUUM")

def makeLookupUnique(self):
def _makeLookupUnique(self):
"""
In the past, lookups were not unique, which made it very slow
to avoid inserting duplicates"""
@@ -188,20 +114,16 @@ def makeLookupUnique(self):
""")
self.conn.commit()

def dropDefinitions(self):
print('dropping definition')
try:
self.c.execute("""ALTER TABLE lookups DROP COLUMN definition""")
self.c.execute("VACUUM")
except Exception as e:
logger.error(e)

def seenContent(self, cid, name, content, language, jd):
def _seenContent(self, name, content, language):
start = time.time()
for word in content.replace("\\n", "\n").replace("\\N", "\n").split():
lemma = lem_word(word, language)
self.c.execute('INSERT INTO seen(source, language, lemma, jd) VALUES(?,?,?,?)', (cid, language, lemma, jd))
print("Lemmatized", name, "in", time.time() - start, "seconds")
self.c.execute("""
INSERT INTO seen_new(language, lemma) VALUES(?,?)
ON CONFLICT(language, lemma) DO UPDATE SET count = count + 1
""", (language, lemma))
self.conn.commit()
logger.info("Lemmatized %s in %s seconds", name, time.time() - start)
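
The upsert above makes counting idempotent per occurrence: the first time a (language, lemma) pair is seen, a row is inserted with count = 1, and every later occurrence bumps the counter. A standalone sketch of the pattern (assumes SQLite 3.24+ for ON CONFLICT ... DO UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE seen_new (language TEXT, lemma TEXT, "
          "count INTEGER DEFAULT 1, UNIQUE(language, lemma))")
for lemma in ["gato", "perro", "gato"]:  # "gato" occurs twice
    c.execute("""
        INSERT INTO seen_new(language, lemma) VALUES(?,?)
        ON CONFLICT(language, lemma) DO UPDATE SET count = count + 1
    """, ("es", lemma))
print(c.execute("SELECT lemma, count FROM seen_new").fetchall())
# -> [('gato', 2), ('perro', 1)]
```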

def importContent(self, name: str, content: str, language: str, jd: int):
start = time.time()
@@ -216,12 +138,12 @@ def importContent(self, name: str, content: str, language: str, jd: int):

self.c.execute("SELECT last_insert_rowid()")
source = self.c.fetchone()[0]
print("ID for content", name, "is", source)
self.seenContent(source, name, content, language, jd)
logger.debug("ID for content %s is %s", name, source)
self._seenContent(name, content, language)
self.conn.commit()
print("Recorded", name, "in", time.time() - start, "seconds")
logger.debug("Recorded %s in %s seconds", name, time.time() - start)
return True
print(name, "already exists")
logger.info("%s already exists", name)
return False

def getContents(self, language):
@@ -248,25 +170,24 @@ def setModifier(self, language, lemma, value):
self.conn.commit()

def rebuildSeen(self):
self.c.execute("DELETE FROM seen")
self.c.execute("DELETE FROM seen_new")
self.c.execute('SELECT id, name, content, language, jd FROM contents')
for cid, name, content, language, jd in self.c.fetchall():
print("Lemmatizing", name)
self.seenContent(cid, name, content, language, jd)
for _, name, content, language, _ in self.c.fetchall():
self._seenContent(name, content, language)
self.conn.commit()
self.c.execute("VACUUM")

def getSeen(self, language):
return self.c.execute('''
SELECT lemma, COUNT (lemma)
FROM seen
SELECT lemma, count
FROM seen_new
WHERE language=?
GROUP BY lemma''', (language,))
''', (language,))

def countSeen(self, language):
self.c.execute('''
SELECT COUNT(lemma), COUNT (DISTINCT lemma)
FROM seen
SELECT SUM (count), COUNT (DISTINCT lemma)
FROM seen_new
WHERE language=?''', (language,))
return self.c.fetchone()

Expand Down Expand Up @@ -403,9 +324,9 @@ def countNotesDay(self, day):

def purge(self):
# SQLite's DROP TABLE takes one table per statement, so drop each separately
self.c.executescript("""
DROP TABLE IF EXISTS lookups;
DROP TABLE IF EXISTS notes;
DROP TABLE IF EXISTS contents;
DROP TABLE IF EXISTS seen_new;
DROP TABLE IF EXISTS seen;
""")
self.createTables()
self._createTables()
self.c.execute("VACUUM")

def getKnownData(self) -> tuple[dict[str, WordRecord], KnownMetadata]:
Expand Down
Loading
