Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to turn on and use rtree spatial? How to use full text search? #390

Open
hoogw opened this issue May 2, 2020 · 9 comments
Open

How to turn on and use rtree spatial? How to use full text search? #390

hoogw opened this issue May 2, 2020 · 9 comments
Labels

Comments

@hoogw
Copy link

hoogw commented May 2, 2020

I know this have rtree already, but I don't know how to use it.
Can you write some usage in readme?

How to turn on and use rtree , do spatial query, like intersect, within? How to use full text search?

spatial query and full text search are 2 fundermental function I need.

@frafra
Copy link

frafra commented May 4, 2020

Look for "load extension", "FTS3", "spatialite" in the existing issues:

#68
#89
#316 (comment)
#342 (comment)

@kaizhu256
Copy link
Member

don't know about rtree, but the following full-text-search example works fine in the online-demo:

-- Create an FTS table
DROP TABLE IF EXISTS pages;
CREATE VIRTUAL TABLE pages USING fts4(title, body);

-- Insert a row with a specific docid value.
INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...');

-- Insert a row and allow FTS to assign a docid value using the same algorithm as
-- SQLite uses for ordinary tables. In this case the new docid will be 54,
-- one greater than the largest docid currently present in the table.
INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...');
INSERT INTO pages(title, body) VALUES('Upload', 'Upload SQLite src code...');

-- Example full-text-search queries
SELECT * FROM pages WHERE pages MATCH 'sqlite';
SELECT * FROM pages WHERE pages MATCH 's* code';

image

@hoogw
Copy link
Author

hoogw commented May 5, 2020

Look for "load extension", "FTS3", "spatialite" in the existing issues:

#68
#89
#316 (comment)
#342 (comment)

I looked all, find impossible, no easy way, unless, you know how to compile raw c include spatial.
For most of people, this is impossible, author or member must knows how to do it.

Can any of you provide a browser distribution with spatial extension included?
That will be very helpful for many people.

@hoogw
Copy link
Author

hoogw commented May 5, 2020

don't know about rtree, but the following full-text-search example works fine in the online-demo:

-- Create an FTS table
DROP TABLE IF EXISTS pages;
CREATE VIRTUAL TABLE pages USING fts4(title, body);

-- Insert a row with a specific docid value.
INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...');

-- Insert a row and allow FTS to assign a docid value using the same algorithm as
-- SQLite uses for ordinary tables. In this case the new docid will be 54,
-- one greater than the largest docid currently present in the table.
INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...');
INSERT INTO pages(title, body) VALUES('Upload', 'Upload SQLite src code...');

-- Example full-text-search queries
SELECT * FROM pages WHERE pages MATCH 'sqlite';
SELECT * FROM pages WHERE pages MATCH 's* code';

image

Very good. full text search use virtual table in memory, not store on physical hard disk, correct?
Does virtual table already create index automatically when you insert?
You do not need manually create index?
Everytime a new row insert will cause index update?

@hoogw
Copy link
Author

hoogw commented May 5, 2020

https://www.sqlite.org/fts3.html

I find some answer to my own question:

FTS3,4 means full text search, + version 3, 4

On c version of sqlite, virtual table no in ram(memory), still store in disk.
But what about sql.js, js version of sqlite, virtual table store in disk or memory? (not sure, but likely in disk)

The FTS3 table consumes around 2006 MB on disk compared to just 1453 MB for the ordinary table. Using the same hardware configuration used to perform the SELECT queries above, the FTS3 table took just under 31 minutes to populate, versus 25 for the ordinary table.

Does it auto update index everytime insert a row?

Yes.

To support full-text queries, FTS maintains an inverted index that maps from each unique term or word that appears in the dataset to the locations in which it appears within the table contents. For the curious, a complete description of the data structure used to store this index within the database file appears below. A feature of this data structure is that at any time the database may contain not one index b-tree, but several different b-trees that are incrementally merged as rows are inserted, updated and deleted. This technique improves performance when writing to an FTS table, but causes some overhead for full-text queries that use the index. Evaluating the special "optimize" command, an SQL statement of the form "INSERT INTO () VALUES('optimize')", causes FTS to merge all existing index b-trees into a single large b-tree containing the entire index. This can be an expensive operation, but may speed up future queries.

For example, to optimize the full-text index for an FTS table named "docs":

-- Optimize the internal structure of FTS table "docs".
INSERT INTO docs(docs) VALUES('optimize');
The statement above may appear syntactically incorrect to some. Refer to the section describing the simple fts queries for an explanation.

There is another, deprecated, method for invoking the optimize operation using a SELECT statement. New code should use statements similar to the INSERT above to optimize FTS structures.

@kaizhu256
Copy link
Member

sql.js runs entirely in memory. i'm not sure its practical to load the 2gb enron dataset w/ sql.js. my experience with chrome-browser (windows version) is that it will frequently crash when loading databases >200mb into memory.

100mb (typically a table with <500k rows) is probably the practical limit for sql.js datasets. and by practical, i mean the UX-performance is near limit in terms of human-usability:

  1. loading a 100mb dataset w/ sql.js typically takes 20-30 seconds.

  2. the sql-queries i run on 100mb dataset typically take 5-60 seconds to execute.

@hoogw
Copy link
Author

hoogw commented May 5, 2020

sql.js runs entirely in memory. i'm not sure its practical to load the 2gb enron dataset w/ sql.js. my experience with chrome-browser (windows version) is that it will frequently crash when loading databases >200mb into memory.

100mb (typically a table with <500k rows) is probably the practical limit for sql.js datasets. and by practical, i mean the UX-performance is near limit in terms of human-usability:

  1. loading a 100mb dataset w/ sql.js typically takes 20-30 seconds.
  2. the sql-queries i run on 100mb dataset typically take 5-60 seconds to execute.

Very good, to clear these fact, that give me the idea how the performance is.

I use lunr.js for full text search, looking for alternative, due to

              1) lunr.js store everything in memory, I thought sqlite could store everything in disk, 

that will avoid "out of memory" crash.

              2)lunr.js full text search is very fast, 500k rows, it take less then 1 second( the reason is lunr.js only return doc id, not the full row.  I think you said sql.js take 5-60 sec slow is because it select full row, if only select id, should be faster.

               3) you said loading 500k row, takes 30 seconds,  lunr.js need 5 - 10 seconds for initial indexing. Maybe sql.js have more overhead? not sure. 

@hoogw
Copy link
Author

hoogw commented May 12, 2020

I have compare sql.js with lunr.js in terms of full text search.

Both use in memory storage,

small to medium size data, both are same fast,

but when large data, 150k json items will crash lunr.js, not sure why.
however, same dataset, 150k json items insert as 150k row into sqlite table, no crash, full text search still same fast, no delay.

sqlite wins here.

@kaizhu256
Copy link
Member

thx for feedback on real-world performance. that's always good-to-know : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants