[WIP] Dexie-based search index implementation #322
Conversation
Wow, such great work @poltak. As already mentioned, when testing it 2 days ago I could finish 15k documents in about 1.5 hours :) But I also ran into a rather nasty bug (reproduction steps in the quoted comment below).
Also, as we talked about before: does it make sense to include a term-level concurrency mode, so that a search request never has to wait for a page to be indexed?
Addition: while the importer is going, the RAM load seems to gradually increase, so maybe some things aren't being properly released and are clogging the RAM?
Cool, thanks for the report. Will look into it.
There's no queue used for index ops currently. Dexie seems to manage this fine, but this is another thing I should try to break and get confirmation on. Will add to the list
Dexie puts us at a much higher level than the prev IndexedDB implementation, so worrying about indexing individual terms is no longer relevant
On Feb 28, 2018, at 16:41, Oliver Sauter wrote:
Wow, such great work @poltak. As already mentioned, when testing it 2 days ago I could finish 15k documents in about 1.5 hours :)
But I also ran into a rather nasty bug:
How to reproduce:
Let the importer run
After a while it hangs, the RAM shoots to almost 1 GB, and the CPU to 240%
Only way out is restarting the extension. In the latest test even that didn't help, and I could not progress with the imports anymore.
To get it working again, I had to change the blacklist to force a recalculation.
If you guys can't reproduce, I'd be happy to hop on a Skype session and test with you what is necessary.
Also what we talked about before, does it make sense to include a term level concurrency mode, so that in any case a search request does not have to wait for a page to be indexed?
Does this mean Dexie handles prioritisation between search requests and indexing itself?
Dexie handles it at another level, which it calls transactions. These are sets of DB ops that either all happen or none happen (if something goes wrong, it's rolled back). In our ext, things like adding a new page or searching are each a single transaction. Dexie will schedule transactions one after another if they write to the same table, but apparently allows them to run in parallel if they are just reading (search only needs to read). That big 6k+ terms United States wiki article now only takes ~400ms for me to index the terms. Takes > 2s for me on the
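The scheduling behaviour described above (serialised writes, parallel reads) can be sketched in miniature. This is a toy illustration only, not Dexie's actual implementation; all names here are made up:

```javascript
// Toy sketch of the scheduling idea only; NOT Dexie's internals.
// Writes to a table queue up behind one another; reads only wait for
// pending writes, so any number of searches can run in parallel.
class TableScheduler {
    constructor() {
        this.lastWrite = Promise.resolve();
    }

    // Writes chain onto the previous write so they never overlap
    write(op) {
        const run = this.lastWrite.then(() => op());
        // Keep the chain alive even if a write fails (i.e. gets "rolled back")
        this.lastWrite = run.catch(() => {});
        return run;
    }

    // Reads wait for pending writes, but not for each other
    read(op) {
        return this.lastWrite.then(() => op());
    }
}

// Example: one indexing "transaction" followed by two parallel searches
async function demo() {
    const sched = new TableScheduler();
    const log = [];
    sched.write(async () => log.push('index page'));
    const results = await Promise.all([
        sched.read(async () => { log.push('search A'); return 'A'; }),
        sched.read(async () => { log.push('search B'); return 'B'; }),
    ]);
    return { log, results };
}
```

In real code Dexie derives the read/write grouping from the transaction mode and the tables each transaction declares; the sketch just shows why reads don't block each other.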
Just tested your updates, and I want to note here that the problem with the stuck import still persists. I thought you maybe fixed it with c71ddeb. Could you reproduce it?
Haven't looked at this yet @oliversauter. It's on the radar.
Force-pushed from c71ddeb to 1071408
Force-pushed from 89c4f37 to 712fd0a
Force-pushed from b62f514 to a51d09c
@oliversauter spent a bit of time playing with imports; managed to import ~2k pages pretty quickly at 20 concurrency, and resource overhead stayed fairly constant the whole time. Didn't hit any hang or stall. We should talk about it briefly in the call tonight if it's still a problem. I found one possibly related issue: sometimes the imports progress doesn't properly stop when pause/cancel is pressed. The UI looks like it stopped fine, but in the background it's still downloading. That isn't good, as it could continue through the entire history, leaving the user confused as to why the browser is going so slow.
Another round of reviews:
@oliversauter search oddities are expected until I get around to going over all that code this week and verifying things. There are going to be lots of little things to fix up. 6 and 5 work fine for me. 6 will be greyed out if you open the popup before the initial indexing happens (DOM load + a few hundred ms); no changes there. 5 should show the correct total count as long as there are terms entered; all look correct for me with 4k pages to search through, but I'll keep an eye out for anything wrong. Planning to look into the possibility of counts for non-terms search this week. 4 and 1, like imports, aren't really related to these changes, but I'll look into including fixes for them once all the other stuff is done. One big issue to mention is that the new index currently won't run under Firefox, due to IndexedDB/Dexie transactions not working with FF's native
@oliversauter I'll let you know when I need any more manual testing or feedback, just to save time and effort.
Looks very good John, lots of great work done here! Regarding the testing, here are some thoughts:
One other question:
Found a bug with the domain filters.
@oliversauter good find! I had totally forgotten about implementing domains filter extraction from queries (different to the UI filter). Wrote a test to confirm it indeed works on the old index but fails on the new one, then wrote the code to ensure it now passes. Domains search without terms seems pretty slow right now (seems linear in something). Thanks a lot for the tests feedback @ShishKabab!
I like your points on hardcoding here instead of calling those ID-deriving fns. The URL normalization stuff should get its own tests and be considered an unrelated side-effect to the actual index tests; don't want to call it directly here. Although note that a few of the index methods still depend on it internally (anything that accepts a URL as a param will transform it into an ID). Done in
Yeah, this was something I felt was a bit yucky. Have changed those mutation tests (now calling them "read-write ops tests") so that the test data gets reset before each of them, and they don't make any assumptions about data based on other tests. Also ensured they all do assertions on results both before and after write ops. All the outer tests are now grouped under "read ops tests", which don't need to reset data each time. 35c2599
Updated names, etc. I like the idea of the separate test data modules. It may make tests less clear, but I think most editors should be able to easily show what's in the imported test data for reference. Should the expected values live with the test data? In this case we only need to assert matching IDs, so the expected values are all shared. 633f46c
Yes, this was a big code smell. It is implementation-specific to the Dexie Page model, and should be encapsulated in the new index method implementations. d2dcb45 Nice find!
Yes, this was weird. Basically it's the post-search stage, where constant-size display data fetching for results happens. Do you remember the weird way it was implemented before, where the images were fetched from Pouch in the UI layer?
Looks very cool John :)
Ah, missed that the code was there :) I've seen the code in the mapping function now, so it looks good. Very happy to see that code get out of the UI layer!
I'd say that most of the time you expect something you have put in, so it's in the data file anyway. For expected stuff that you didn't put in, I'd say to leave those things either in the tests themselves or in the test file. Mentally it makes the most sense to me that way.
:) Ok, seems to be the case for the #tags filter too.
Found another search weirdness. How to reproduce:
@oliversauter RE that error: are you sure you weren't on
Bit confused by the search bug you reported; step 2 seems to contradict step 1:
@poltak RE the bug: it also happens to me on a complete reinstall, using 'rhammer time' as the words ('rhammer' is on purpose).
That one actually was `Symbol.asyncIterator` not being defined ;)
Yeah, I was wrong. This one is saying a missing
RE search bug from @oliversauter:
Yeah, at least for now we should stay with that behaviour.
On this branch, I don't get the error: https://github.com/digi0ps/Memex/tree/search-injection/src This is my version: https://github.com/digi0ps/Memex/tree/search-injection/src
@oliversauter These are completely different branches. If it's reproducible for you on a particular Chrome/ium version, send me that number and I'll see if I can get it to reproduce.
Yeah, aware of that; just giving a reference to where it happens and where it doesn't, so the error might be easier to track. Oh, I thought I posted my Chromium version last night in a separate comment; didn't send it off.
- we haven't updated in about a year; lots of fixes made since then apparently, including some things we've seen, like disconnected port errors
- good example is overview: every time you change the search, URL state updates and tries to update a visit
Skip imports persisting any periods in history with no data
- history extracted in week periods
- if one week has no history (or all history is deduped), an empty chunk would be stored
- this may or may not be an issue for the reading end during imports progress when an empty chunk is retrieved

Update import-item-creation generators to be consistent

Update import-state's _getChunk to be an async generator
- I don't think it really makes a diff, but it seems much more natural to use generators all the way down here, rather than yielding resolved Promises

Ensure any import items finished after pause are sent to UI
- they will be finished anyway, but the user will have no idea about them as the message never got sent
- this way the message will always send, even if it arrives after the user presses the pause btn
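The two ideas in those commits (chunked, skip-empty-period item creation, expressed as an async generator) can be sketched roughly like this. All names and data shapes here are hypothetical, not the actual import-state code:

```javascript
// Hypothetical sketch: yield import items in fixed-size chunks, skipping any
// week period whose history is empty after dedup, so no empty chunk is ever
// stored or handed to the reading end.
async function* getChunks(weeks, chunkSize = 10) {
    for (const week of weeks) {
        // Dedup stand-in: drop items flagged as duplicates
        const items = week.items.filter(item => !item.duplicate);
        if (!items.length) {
            continue; // skip periods with no data entirely
        }
        for (let i = 0; i < items.length; i += chunkSize) {
            yield items.slice(i, i + chunkSize);
        }
    }
}
```

A consumer then just does `for await (const chunk of getChunks(weeks)) { ... }`, which is the "generators all the way down" shape mentioned above, rather than pulling pre-resolved Promises.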
- bit of an overview of purpose, architecture, and responsibilities of each of the parts

Update import readme diagram
- added in new web ext API abstractions
- removed "import" prefix in front of everything
- update readme text later
Fix up incorrect class prop JSDoc typings
- class props use @type rather than @property for some reason

Refactor-out cache logic from import-state class
- cache logic handles messy allocating into chunks and storing in local storage
- now import-state is just concerned with being an interface to fetch and remove import items
- also removed import-item-creation's reliance on import-state

Remove import-conn-handler dep on import-state
- it only needs to interface with it to get estimates; afford this through progress-manager

Remove import-state's rehydration of allowedTypes state
- I really don't think this does anything; it will be init'd from progress-manager

Abstract web ext API data sources behind class
- now `import-item-creation` accesses this interface rather than directly accessing the APIs
- still some cleanup and confirmation to do with this

Remove old ext migration from imports
- lots of code removal; should have gotten it all, but may be missing things

Simplify import data source abstraction
- provides interface between web ext API data sources for `import-item-creation`

Simplify import item creator
- move all data stuff to DataSources class (root ID BM tree still generated)
- DRY out the iteration code now that DataSources provides the same API for bookmarks and history

Improve way ItemCreator is passed around ImportState
- prev was creating a new instance whenever the cache was empty
- now you can pass a custom instance to ImportStateManager and it will tell that instance to reinit its data if necessary
- also fix regression with counts doubling

Revert "Ensure any import items finished after pause sent to UI"
This reverts commit 316b496.
- it would mean that XHR errors thrown on abort would be flagged as error'd items, so removed from the import item pool
- doesn't seem to be a way to differentiate the error on the XHR's error event

Ensure import state estimate counts init'd from cache
# Conflicts:
# src/background.js
Write mocks for imports cache and item-creators
- will let me test import-state without relying on those classes (item-creator to be tested separately; cache is an interface with local storage)

Update import class inputs for ease of re-use
- all constructors now accept objects
- most have defaults for the general flow, but tests can override certain things

Add mocks for hist/bms, blacklist + exist.keys lookups
- DataSources class can be mocked as a whole for hist/bms (instead of ItemCreator)
- blacklist is a separate thing, so just mock it to return false for every item
- logability check also mocked in the same way (return true for all)

Write tests for estimate counts derivation (w/wo cache)
- painful, but finally got it working

Write tests for import item iteration and marking off
- iteration used as the main progress thing -> keep requesting new chunks to go through
- marking off happens after each item is processed; they are removed from their chunk

Update tsconfig lib to es2017
- we're making use of es2017 stuff in the rest of the codebase and compiling with babel

Set up URL list for history/bookmarks test sources
- found this project which makes it simple: https://github.com/citizenlab/test-lists
- now tests have a history size of ~280; some fun URLs in there too
- updated the removal test to be more thorough: remove lots of items from each chunk and calculate the expected changes

Add mock for URL normalization module
- messes with lots of tests
- replaced extra input param on ItemCreator that was a poor work-around

Write test for error'd import items handling
- quite a big test: marks the first item of each incoming chunk as an error, making sure those error'd items don't show up in future item reads unless errors are specified
- implemented error flagging on the mock cache
- also unified the initial state calc needed for each test (put in `beforeEach`)

Rename all classes, update readme + fix TS type errors
- classes all renamed according to new README diagram
- README text updated accordingly (and to include new `Cache` and `DataSources` classes)
- some TS type issues in the state-manager tests fixed
- still some weird TS-related issue at runtime with `checkWithBlacklist` (seems to still work tho; to look into)

Diversify import item derivation test input sources
- rewrote the existing tests inside a function to be run with custom data inputs
- found a 1000+ URL list to use as an additional source (more fun in there)
- run all import tests on more diverse combinations of bm/hist input sets

- will write tests for this part next; this makes a lot of external stuff afford being mocked
- also removes coupling with local storage (moved to conn handler; doesn't belong here)

Write mock for import ItemProcessor class
- this is the main XHR + send request to search-index part of imports
- should be N of these at any one time, where N is concurrency
- trying to figure out how to replace the `process` method with setTimeout to simulate some time

Set up imports progress manager tests
- confused with getting the fake setTimeout working with jest
- `runAllTimers` doesn't seem to be working as explained here: https://facebook.github.io/jest/docs/en/timer-mocks.html
- the following expects fail as those mock cbs are never called
- obviously I'm doing something wrong; need to look into it more

Immediately resolve import processor mock
- fake timers were not working properly for whatever reason
- now Processor.process immediately resolves instead, which allows us to test the observers being called, but it's still hard to test in-progress state

Add checking of concurrent processors to progress tests
- also updated the mock cache so it's not fixed at a chunk size of 10 (which meant concurrency > 10 did nothing in tests)

Write tests for interrupting imports progress
- again not as nice, as the timers aren't working
- basically starting off the importer, then immediately stopping and making sure all the concurrent processors are set to cancelled and none of the observer cbs are called

Write imports progress restart-after-interruption tests

Fix bug with tsc transpiling async/await instead of babel
- es6 target was telling tsc to handle our async/await code; we've already got babel to do that, though, and they don't seem happy working together sometimes
- tsc now targets ESNext, so it will ignore a lot more ES features and leave them to babel

Set up separate tsconfig for jest
- we're not using Babel after tsc for ts test modules
- hence we need to target lower for jest to run it, as babel isn't going to transpile stuff

Add skipped big (4000+) imports progress test
- takes a while to run, so skipping it
- maybe look at optional tests or something (skipped tests will never run until the test is updated)
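The pattern those progress-manager tests exercise, N concurrent processors pulling items and reporting to observer callbacks, with an immediately-resolving mock `process` step in place of the real XHR, can be sketched like this. All names here are hypothetical stand-ins, not the actual Memex classes:

```javascript
// Toy sketch of the progress-manager idea (hypothetical names): N concurrent
// "processors" drain a shared queue and report each finished item to an
// observer; flipping `state.cancelled` stops further work. The real
// ItemProcessor does an XHR + search-index request; the mock "process" step
// here just resolves immediately, as described in the commits above.
async function runImport(items, concurrency, observer, state = { cancelled: false }) {
    const queue = [...items];

    async function worker() {
        while (queue.length && !state.cancelled) {
            const item = queue.shift();
            await Promise.resolve(); // mock Processor.process: resolve immediately
            if (!state.cancelled) {
                observer.next(item);
            }
        }
    }

    // Spawn `concurrency` workers that drain the queue in parallel
    await Promise.all(Array.from({ length: concurrency }, () => worker()));
    observer.complete();
}
```

Because the mock resolves immediately, tests can assert on the observer calls without fake timers, which matches the workaround described above; the trade-off, also noted above, is that in-progress state is harder to observe.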
- make sure it works for both counts and actual progressing through the created items (no URL gets iterated twice)
- should be a Set of URL strings rather than encoded page IDs
- put a simple decode call in and removed the old unused `trimPrefix` arg
- after getting the tags/domains-filtered URLs (fast), it was doing a range lookup over the visits index and only keeping those from the prev filter (slow; worst case is linear in visit index size)
- no need to do a range lookup at all; just get the latest events for those already-filtered URLs and paginate!
- for terms search, it already performs in log time
- URLs are not unique in the visits index (compound PK on time + URL, as each page can have many visits)
- this meant the existing `.eachPrimaryKey` iteration on the query result could take quite long if there were many visits to single pages (my memex page had hundreds of visits)
- seems a hell of a lot faster just doing N parallel lookups on URL and getting the first one that passes criteria (within time filters)
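The per-URL lookup strategy above can be sketched with a plain in-memory stand-in for the visits table (the real code goes through Dexie indexes; the data shape and names here are assumptions):

```javascript
// Sketch: instead of one range iteration over the whole compound [time+url]
// visits index, do one parallel lookup per already-filtered URL and keep only
// that URL's latest visit inside the time bounds. `visits` is a hypothetical
// in-memory stand-in for the Dexie visits table.
async function latestVisits(visits, urls, { startTime = 0, endTime = Infinity } = {}) {
    const lookups = urls.map(async url => {
        const matching = visits
            .filter(v => v.url === url && v.time >= startTime && v.time <= endTime)
            .sort((a, b) => b.time - a.time);
        return matching[0]; // latest visit passing criteria, or undefined
    });
    // Drop URLs with no visit in the filtered time range
    return (await Promise.all(lookups)).filter(Boolean);
}
```

With N already-filtered URLs this does N small indexed lookups instead of scanning every visit, which is why it helps most when single pages have hundreds of visits.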
- untracked tabs aren't supported yet, but in FF will throw a bunch of "Tab not found" errors; expected but should be caught
- search response shape changed slightly in new index work to be less "Pouch-like"
- previously forced this to make Dexie work nicely with FF
- it seems to mess with a lot of other stuff in FF though, like content_script no longer being able to access local storage without the Promise rejecting for "unknown reason"
- taking it out, indexing still seems to work fine (FF 59) and all the promise-related bugs go away
- minor cleanup of some derived imports UI state too (imports UI really needs a work-over; it's gotten quite bad)
- bg script logic to handle forced recalc
- still left-over stuff for old-ext-items-specific state
- start import btn disable-state derivation simplified
- this module handles differentiating the listeners of different notifications via a tiny bit of state (only a single event shared between all notifs; switch on IDs)
- refactored all existing notif creation calls to use this module
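The "single shared event, switch on IDs" idea can be sketched as below. This is only an illustration of the pattern with made-up names, not the actual module:

```javascript
// Sketch of the notifications module idea (hypothetical names): a single
// shared click listener dispatches to per-notification handlers by ID, so
// only one listener ever needs to be registered with the browser's
// notifications.onClicked event.
const handlers = new Map();

function createNotification(id, onClick) {
    handlers.set(id, onClick);
    // In the extension, this is also where the browser notifications API
    // would be called to actually display the notification.
}

// The one function registered as the shared onClicked listener
function dispatchClick(id) {
    const handler = handlers.get(id);
    if (handler) {
        handler(id);
    }
}
```

Clicks on notifications without a registered handler are silently ignored, which keeps the single listener safe to share.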
- just suggestion text for now; will change
- links to main knowledge base page until we write an article/blog post
Force-pushed from 72eed15 to f976ca4
- rewrite new notifications module in TS to take advantage of it; seems to work well
- the reverse can be replaced by simply setting the redux reducer to prepend the details rows whenever a new item is finished
- now it should show up on any search that has at least one of: terms, domains, or tags filters defined
- also made sure not to show it when loading (doesn't flash in and out now)
- removed the old `getTotalCount` search param; it's always 'true'
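The visibility condition described above amounts to a small predicate; a minimal sketch with hypothetical names and query shape:

```javascript
// Sketch of the described condition: show the total result count only when
// the query has at least one terms, domains, or tags filter defined.
// The parameter shape here is an assumption, not the actual search params.
function shouldShowTotalCount({ terms = [], domains = [], tags = [] }) {
    return terms.length > 0 || domains.length > 0 || tags.length > 0;
}
```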
- TODO: url boost
- this was working slightly differently between the implementations (rounding up vs down); now fixed
- similar to prev.: added boosted title terms search
Force-pushed from 4185616 to af6a724
Ran a few tests and it all works pretty well :) A couple of things, and then I think we can merge it (one might be a bit bigger). The list is in order of priority:
List of all the stuff that has been/needs to be redone. No user-facing behaviour or features should change.
Data model
The data model is laid out as Dexie models in here, and the Dexie index schema is defined here
missing screenshot + favicon Blobs
Adding stuff
Deleting stuff
Tags specific
Simplified this interface - omitting unused methods
Bookmark specific
Utilities
Page
model instance or undefined
Imports
Search
result count for no-terms search (filters) (not really feasible)
Known bugs
typing 1 letter in the popup search/tags inputs results in an infinite typing loop (can't see how this is related yet...) (seems to be my browser; other ext popups doing this too)
Misc TODOs
search-index-new/index
Other things?