Commit: Fix image folder bug
Fix image folder bug
1over137 committed Mar 22, 2024
1 parent 83727d2 commit dadb23e
Showing 7 changed files with 88 additions and 156 deletions.
7 changes: 2 additions & 5 deletions docs/installation.md
@@ -31,7 +31,7 @@ If you want to test the latest features, you can go to [CI artifacts page](https
Go to the [GitHub releases page](https://github.com/FreeLanguageTools/vocabsieve/releases) for standalone versions. You may have to dismiss some warnings from the browser or Windows to install it, as it is unsigned.

<details markdown=1>
<summary> Click to open instructions for advanced users </summary>
<summary> Click to open instructions to download test releases </summary>

If you want to test the latest features, you can go to the [CI artifacts page](https://nightly.link/FreeLanguageTools/vocabsieve/workflows/build-binaries/master) to obtain the latest builds, but they are not guaranteed to run. If you notice anything wrong with those builds, open an issue on GitHub. Note: ensure you are using the latest nightly build before reporting anything.
</details>
@@ -41,9 +41,6 @@ Only 64 bit Windows 10+ is supported

### MacOS

{: .warning }
MacOS support is often broken due to me not being able to test it. If you discover an issue and can help test or fix it, please reach out by opening a GitHub issue or using the chatroom.

Go to the [GitHub releases page](https://github.com/FreeLanguageTools/vocabsieve/releases) for standalone versions. You may have to dismiss some warnings from the browser or macOS to install it, as it is unsigned.

{: .important }
@@ -53,7 +50,7 @@ Open a new terminal window and type the following command
`xattr -d com.apple.quarantine /path/to/app.app` (replacing "/path/to/app.app" with the path to the VocabSieve app). This unquarantines the app and allows it to run on your Mac without being certified by Apple.
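
For example, if the app were placed in /Applications (the exact bundle name and path depend on where you put it):

```sh
# Remove the quarantine attribute so Gatekeeper allows the unsigned app to run
xattr -d com.apple.quarantine /Applications/VocabSieve.app
```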

<details markdown=1>
<summary> Click to open instructions for advanced users </summary>
<summary> Click to open instructions to download test releases </summary>

If you want to test the latest features, you can go to the [CI artifacts page](https://nightly.link/FreeLanguageTools/vocabsieve/workflows/build-binaries/master) to obtain the latest builds, but they are not guaranteed to run. If you notice anything wrong with those builds, open an issue on GitHub. Note: ensure you are using the latest nightly build before reporting anything.
</details>
66 changes: 35 additions & 31 deletions docs/resources.md
@@ -20,43 +20,24 @@ VocabSieve supports a range of different local resources, which you can use with
- JSON frequency lists (a simple list of words in JSON format; see the sample after this list)
- Sound libraries (a directory of audio files)
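
As a sketch of the expected frequency-list format (the words here are illustrative; real lists are ordered by frequency and usually distributed gzipped):

```json
["the", "be", "to", "of", "and"]
```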

## StarDict
## Dictionaries
Listed in order of the author's preference.

### Kaikki Wiktionary dumps

### Hu Zheng (StarDict author) personal website, over 100 dictionaries

The website has been dead for a while, but some of the files are archived on the Wayback Machine:

<https://web.archive.org/web/20230717122310/https://download.huzheng.org/>

### GTongue Dictionaries

<https://sites.google.com/site/gtonguedict/home/stardict-dictionaries>



## Migaku

Migaku Official MEGA Folder, 11 languages

<https://mega.nz/folder/eyYwyIgY#3q4XQ3BhdvkFg9KsPe5avw/folder/bz4ywa5A>

## Simple JSONs

### Apple Dictionaries, 41 dictionaries, some bilingual

<https://cloud.freemdict.com/index.php/s/HsC7ybBWsbZ7B4N>
High-quality parsed data from Wiktionary in various languages. Prefer these over the online Wiktionary API, as they contain more information. The English and French Wiktionaries (note: this refers to the language the entries are written in) contain a large number of entries in other languages. All the other versions also contain many entries in their own language, which can be useful as monolingual dictionaries.

Navigate to the "json" folder and download the items for your language. Note that the bilingual dictionaries listed include entries in **both** directions. For example, an English-Spanish dictionary contains both English words defined in Spanish and Spanish words defined in English. You do not need to extract the files in order to import them.
<https://kaikki.org/>

## Kaikki Wiktionary dumps
### Hu Zheng (StarDict author) personal website, over 100 dictionaries

High-quality parsed data of Wiktionary in various languages. Prefer these over the online Wiktionary API as they contain more information.
StarDict dictionaries converted by StarDict's author from various formats. They are usually of decent quality and are plaintext, which is suitable for display in VocabSieve and Anki. StarDict files need to be extracted before importing. Select the .ifo file in the extracted folder.
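
For instance, a downloaded archive (StarDict dictionaries are commonly packaged as .tar.bz2; the filename below is illustrative) can be extracted from a terminal:

```sh
tar -xjf stardict-example-2.4.2.tar.bz2
# The extracted folder contains .ifo, .idx, and .dict(.dz) files;
# select the .ifo file when importing into VocabSieve.
```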

<https://kaikki.org/>
The website has been dead for a while, but many of the files are archived on the Wayback Machine:

<https://web.archive.org/web/20230717122310/https://download.huzheng.org/>

## Lingvo DSL
### Lingvo DSL

Rutracker GoldenDict Dictionaries (Russian, English, Ukrainian)

@@ -74,6 +55,18 @@ A bunch of dictionaries for GoldenDict, organized by language. Avoid MDX format

<https://cloud.freemdict.com/index.php/s/pgKcDcbSDTCzXCs>

### Apple Dictionaries, 41 dictionaries, some bilingual

<https://cloud.freemdict.com/index.php/s/HsC7ybBWsbZ7B4N>

Navigate to the "json" folder and download the items for your language. Note that the bilingual dictionaries listed include entries in **both** directions. For example, an English-Spanish dictionary contains both English words defined in Spanish and Spanish words defined in English. You do not need to extract the files in order to import them.

### Migaku dictionaries

Migaku Official MEGA Folder, 11 languages

<https://mega.nz/folder/eyYwyIgY#3q4XQ3BhdvkFg9KsPe5avw/folder/bz4ywa5A>

## Frequency lists

Lemmatized English frequency list
@@ -85,6 +78,17 @@ Lemmatized Russian frequency list
<https://github.com/FreeLanguageTools/resources/raw/master/freq/freq_ru.json.gz>
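
These .json.gz files follow the simple-list format described above. VocabSieve imports them through its GUI; for inspecting one directly, a sketch (assuming the file is a gzipped JSON list of lemmas):

```python
import gzip
import json

# Read a gzipped JSON frequency list; entries are lemmas, most frequent first
with gzip.open("freq_ru.json.gz", "rt", encoding="utf-8") as f:
    words = json.load(f)
print(words[:10])  # the ten most frequent lemmas
```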

## Cognate data
CogNet processed data, includes all languages, may take a while to import.
CogNet data processed for VocabSieve; includes all languages and may take a while to import.

<https://github.com/FreeLanguageTools/resources/raw/master/cognates.json.gz>

## Audio folders
These need to be extracted into a folder before importing. Select the containing folder for import. Do not delete the files afterwards, as they are not copied.

Lingua Libre sound libraries

<https://lingualibre.org/datasets/>

Forvo dump in various languages (may not be as complete as the online version)

<https://github.com/FreeLanguageTools/resources/raw/master/cognates.json.gz>
<https://cloud.freemdict.com/index.php/s/pgKcDcbSDTCzXCs?path=%2F0%20Forvo%20audio>
2 changes: 1 addition & 1 deletion docs/workflows.md
@@ -17,7 +17,7 @@ VocabSieve supports a variety of different workflows. They can be broadly classi
When you see any sentence from anywhere, you can simply copy it to the clipboard. It will appear in the Sentence field right away. Then, double click on any word. A definition should appear if found. You can look up words from the Definition field too. Then, when you are satisfied with the data, click the Add Note button to send it to Anki. You can add tags just like in Anki.

{: .note}
On MacOS and Linux with Wayland, clipboard change may not be detected due to OS restrictions. A workaround is implemented to use the focus event to retrieve clipboard content, but this may not work every time. Use the "Read clipboard" button if necessary.
On MacOS and Linux with Wayland, clipboard changes may not be detected due to OS restrictions. A workaround is implemented that polls the clipboard for content, but this may not work every time. Use the "Read clipboard" button if necessary.
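
A minimal sketch of such a polling workaround, assuming PyQt5 (illustrative only, not VocabSieve's exact implementation):

```python
# Poll the clipboard on a timer instead of relying on change notifications,
# which Wayland and macOS may withhold from background applications.
from PyQt5.QtCore import QTimer
from PyQt5.QtWidgets import QApplication

app = QApplication([])
clipboard = QApplication.clipboard()
last_text = clipboard.text()

def poll_clipboard():
    global last_text
    text = clipboard.text()
    if text and text != last_text:  # only react to new content
        last_text = text
        print("New clipboard content:", text)

timer = QTimer()
timer.timeout.connect(poll_clipboard)
timer.start(50)  # check every 50 ms
app.exec_()
```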

### Browser
When you turn on the extension, you will notice that sentences are underlined in green. Whenever you click on any word, VocabSieve will receive both the whole sentence and the word under your cursor. The word will be looked up immediately too. Chances are, with lemmatization on, this is exactly the word you want. In that case, just press Ctrl/Cmd + S to save the card, and you can keep reading!
2 changes: 2 additions & 0 deletions vocabsieve/global_names.py
@@ -54,6 +54,8 @@ def _get_settings_app_title():
settings = QSettings(app_organization, app_name)
datapath = QStandardPaths.writableLocation(QStandardPaths.DataLocation)
forvopath = os.path.join(datapath, "forvo")
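# Ensure the images directory exists before it is used (the image folder fix)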
_imagepath = os.path.join(datapath, "images")
os.makedirs(_imagepath, exist_ok=True)

_today_log_name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
os.makedirs(os.path.join(datapath, "log"), exist_ok=True)
2 changes: 1 addition & 1 deletion vocabsieve/main.py
@@ -799,8 +799,8 @@ def lookup(self, target: str, no_lemma=False, trigger=LookupTrigger.double_click
if settings.value("sg2_enabled", False, type=bool):
self.definition2.lookup(target, no_lemma, rules)

self.audio_selector.lookup(target)
self.freq_widget.lookup(target, True, settings.value("freq_display", "Stars"))
self.audio_selector.lookup(target)

self.previous_word = target
self.previous_trigger = trigger
151 changes: 36 additions & 115 deletions vocabsieve/record.py
@@ -13,9 +13,6 @@
from .tools import findNotes, notesInfo
from .global_names import logger, settings

dictionaries = bidict({"Wiktionary (English)": "wikt-en",
"Google Translate": "gtrans"})


class Record():
"""Class to store user data"""
@@ -28,54 +25,19 @@ def __init__(self, parent_settings: QSettings, datapath):
check_same_thread=False)
self.c = self.conn.cursor()
self.c.execute("PRAGMA foreign_keys = ON")
self.createTables()
self.fixOld()
if not parent_settings.value("internal/db_no_definitions"):
self.dropDefinitions()
parent_settings.setValue("internal/db_no_definitions", True)
if not parent_settings.value("internal/db_new_source"):
self.fixSource()
parent_settings.setValue("internal/db_new_source", True)
self.conn.commit()
if not parent_settings.value("internal/seen_has_no_word"):
self.fixSeen()
parent_settings.setValue("internal/seen_has_no_word", True)
if not parent_settings.value("internal/timestamps_are_seconds", True):
self.fixBadTimestamps()
parent_settings.setValue("internal/timestamps_are_seconds", True)
self._createTables()
if not parent_settings.value("internal/lookup_unique_index"):
self.makeLookupUnique()
self._makeLookupUnique()
parent_settings.setValue("internal/lookup_unique_index", True)
self.conn.commit()

self.last_known_data: Optional[tuple[dict[str, WordRecord], KnownMetadata]] = None
self.last_known_data_date: float = 0.0 # 1970-01-01
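
The flag-guarded migrations in `__init__` above follow a one-time-migration pattern that can be sketched standalone (illustrative helper, assuming PyQt5's QSettings; not VocabSieve's exact code):

```python
# Run a migration exactly once, recording completion in persistent settings
from typing import Callable
from PyQt5.QtCore import QSettings

def run_once(settings: QSettings, flag: str, migration: Callable[[], None]) -> None:
    if not settings.value(flag):
        migration()
        settings.setValue(flag, True)

# e.g. run_once(settings, "internal/lookup_unique_index", record._makeLookupUnique)
```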

def fixSeen(self):
try:
self.c.execute("""
ALTER TABLE seen DROP COLUMN word
""")
self.conn.commit()
self.c.execute("VACUUM")
except Exception as e:
print(e)

def fixSource(self):
self.c.execute("""
UPDATE lookups SET source='vocabsieve'
""")
self.conn.commit()

def fixBadTimestamps(self):
"In the past some lookups were imported with millisecond timestamps"
self.c.execute("""
UPDATE lookups SET timestamp=timestamp/1000 WHERE timestamp > 1000000000000
""")

def createTables(self):
def _createTables(self):
self.c.execute("""
CREATE TABLE IF NOT EXISTS lookups (
timestamp FLOAT,
timestamp REAL,
word TEXT,
lemma TEXT,
language TEXT,
@@ -87,7 +49,7 @@ def createTables(self):
""")
self.c.execute("""
CREATE TABLE IF NOT EXISTS notes (
timestamp FLOAT,
timestamp REAL,
data TEXT,
success INTEGER,
sentence TEXT,
@@ -100,12 +62,11 @@ def createTables(self):
)
""")
self.c.execute("""
CREATE TABLE IF NOT EXISTS seen (
CREATE TABLE IF NOT EXISTS seen_new (
language TEXT,
lemma TEXT,
jd INTEGER,
source INTEGER,
FOREIGN KEY(source) REFERENCES contents(id) ON DELETE CASCADE
count INTEGER DEFAULT 1,
UNIQUE(language, lemma)
)
""")
self.c.execute("""
@@ -128,49 +89,14 @@ def createTables(self):
self.c.execute("""
CREATE UNIQUE INDEX IF NOT EXISTS modifier_index ON modifiers (language, lemma)
""")
# Non-unique index for seen_new
self.c.execute("""CREATE INDEX IF NOT EXISTS seen_index_lang ON seen_new (language)""")
# Clean up old seen table
self.c.execute("""DROP TABLE IF EXISTS seen""")
self.c.execute("""VACUUM""")
self.conn.commit()

def fixOld(self):
"""
1. In the past language name rather than code was recorded
2. In the past some dictonaries had special names.
3. Add proper columns in the notes table rather than just a json dump
"""
self.c.execute("""
SELECT DISTINCT language FROM lookups
""")
for languagename, in self.c.fetchall(): # comma unpacks a single value tuple
if not langcodes.get(languagename) and langcodes.inverse.get(languagename):
print(f"Replacing {languagename} with {langcodes.inverse[languagename]}")
self.c.execute("""
UPDATE lookups SET language=? WHERE language=?
""", (langcodes.inverse[languagename], languagename))
self.conn.commit()
self.c.execute("""
SELECT DISTINCT source FROM lookups
""")
for source, in self.c.fetchall(): # comma unpacks a single value tuple
if source in dictionaries.inverse: # pylint: disable=unsupported-membership-test
print(f"Replacing {source} with {dictionaries.inverse[source]}")
self.c.execute("""
UPDATE lookups SET source=? WHERE source=?
""", (dictionaries.inverse[source], source))
self.conn.commit()
try:
self.c.executescript("""
ALTER TABLE notes ADD COLUMN sentence TEXT;
ALTER TABLE notes ADD COLUMN word TEXT;
ALTER TABLE notes ADD COLUMN definition TEXT;
ALTER TABLE notes ADD COLUMN definition2 TEXT;
ALTER TABLE notes ADD COLUMN pronunciation TEXT;
ALTER TABLE notes ADD COLUMN image TEXT;
ALTER TABLE notes ADD COLUMN tags TEXT;
""")
except sqlite3.OperationalError:
pass
self.c.execute("VACUUM")

def makeLookupUnique(self):
def _makeLookupUnique(self):
"""
In the past, lookups were not unique, which made it very slow
to avoid inserting duplicates"""
@@ -188,20 +114,16 @@ def makeLookupUnique(self):
""")
self.conn.commit()

def dropDefinitions(self):
print('dropping definition')
try:
self.c.execute("""ALTER TABLE lookups DROP COLUMN definition""")
self.c.execute("VACUUM")
except Exception as e:
logger.error(e)

def seenContent(self, cid, name, content, language, jd):
def _seenContent(self, name, content, language):
start = time.time()
for word in content.replace("\\n", "\n").replace("\\N", "\n").split():
lemma = lem_word(word, language)
self.c.execute('INSERT INTO seen(source, language, lemma, jd) VALUES(?,?,?,?)', (cid, language, lemma, jd))
print("Lemmatized", name, "in", time.time() - start, "seconds")
self.c.execute("""
INSERT INTO seen_new(language, lemma) VALUES(?,?)
ON CONFLICT(language, lemma) DO UPDATE SET count = count + 1
""", (language, lemma))
self.conn.commit()
logger.info("Lemmatized %s in %s seconds", name, time.time() - start)
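
The upsert above makes counting idempotent per occurrence: the first time a (language, lemma) pair is seen, a row is inserted with count = 1, and every later occurrence bumps the counter. A standalone sketch of the pattern (assumes SQLite 3.24+ for ON CONFLICT ... DO UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE seen_new (language TEXT, lemma TEXT, "
          "count INTEGER DEFAULT 1, UNIQUE(language, lemma))")
for lemma in ["gato", "perro", "gato"]:  # "gato" occurs twice
    c.execute("""
        INSERT INTO seen_new(language, lemma) VALUES(?,?)
        ON CONFLICT(language, lemma) DO UPDATE SET count = count + 1
    """, ("es", lemma))
print(c.execute("SELECT lemma, count FROM seen_new").fetchall())
# -> [('gato', 2), ('perro', 1)]
```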

def importContent(self, name: str, content: str, language: str, jd: int):
start = time.time()
@@ -216,12 +138,12 @@ def importContent(self, name: str, content: str, language: str, jd: int):

self.c.execute("SELECT last_insert_rowid()")
source = self.c.fetchone()[0]
print("ID for content", name, "is", source)
self.seenContent(source, name, content, language, jd)
logger.debug("ID for content %s is %s", name, source)
self._seenContent(name, content, language)
self.conn.commit()
print("Recorded", name, "in", time.time() - start, "seconds")
logger.debug("Recorded %s in %s seconds", name, time.time() - start)
return True
print(name, "already exists")
logger.info("%s already exists", name)
return False

def getContents(self, language):
@@ -248,25 +170,24 @@ def setModifier(self, language, lemma, value):
self.conn.commit()

def rebuildSeen(self):
self.c.execute("DELETE FROM seen")
self.c.execute("DELETE FROM seen_new")
self.c.execute('SELECT id, name, content, language, jd FROM contents')
for cid, name, content, language, jd in self.c.fetchall():
print("Lemmatizing", name)
self.seenContent(cid, name, content, language, jd)
for _, name, content, language, _ in self.c.fetchall():
self._seenContent(name, content, language)
self.conn.commit()
self.c.execute("VACUUM")

def getSeen(self, language):
return self.c.execute('''
SELECT lemma, COUNT (lemma)
FROM seen
SELECT lemma, count
FROM seen_new
WHERE language=?
GROUP BY lemma''', (language,))
''', (language,))

def countSeen(self, language):
self.c.execute('''
SELECT COUNT(lemma), COUNT (DISTINCT lemma)
FROM seen
SELECT SUM (count), COUNT (DISTINCT lemma)
FROM seen_new
WHERE language=?''', (language,))
return self.c.fetchone()

Expand Down Expand Up @@ -403,9 +324,9 @@ def countNotesDay(self, day):

def purge(self):
# SQLite's DROP TABLE takes one table per statement, so drop each separately
self.c.executescript("""
DROP TABLE IF EXISTS lookups;
DROP TABLE IF EXISTS notes;
DROP TABLE IF EXISTS contents;
DROP TABLE IF EXISTS seen_new;
DROP TABLE IF EXISTS seen;
""")
self.createTables()
self._createTables()
self.c.execute("VACUUM")

def getKnownData(self) -> tuple[dict[str, WordRecord], KnownMetadata]:
Expand Down
Loading
