Overview

This is a database of Internet places. Mostly domains. Sometimes other things. Think of it as an Internet meta database. This repository contains link metadata: title, description, publish date, etc.

Acceptable link types

domains
repository links, for example https://github.com/rumca-js/Internet-Places-Database
user spaces. This might be a YouTube channel link, such as the Linus Tech Tips YouTube channel, or an X/Twitter user account

Not acceptable link types

malware sites
porn, casino, gambling etc.
analytic domains that are used for user surveillance
IT infrastructure domains, CDN domains
link shorteners. A link such as somethingsomething.lnk.to is not useful. The main domain lnk.to is acceptable, though
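
The link shortener rule above can be sketched in a few lines. This is only an illustration, not the crawler's actual filter: the `SHORTENER_DOMAINS` set and the function name `is_shortener_sublink` are hypothetical, and the set contains only the one domain mentioned above.

```python
from urllib.parse import urlsplit

# Hypothetical, illustrative set. The project's real shortener list is not shown here.
SHORTENER_DOMAINS = {"lnk.to"}


def is_shortener_sublink(url: str) -> bool:
    """Return True for a subdomain of a known shortener.

    The shortener's main domain itself (for example lnk.to) returns False,
    because the main domain is acceptable.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host.endswith("." + domain) for domain in SHORTENER_DOMAINS)
```

With this sketch, `is_shortener_sublink("https://somethingsomething.lnk.to/abc")` is true, while `is_shortener_sublink("https://lnk.to")` is false.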

Some zen rules:

Anything not obeying the law will be removed from lists
Internet operates in ... many countries, so there are many laws
If things are offensive, they do not have to be removed
If I suspect that a page is notorious, I may flag it with a tag, like "piracy", but the suspicion may not be true
If page content is obnoxious, it can, and possibly should, be demoted
I do not always follow these rules strictly

I do not have resources to verify all links

Links are captured from the Internet automatically
If any link is suspicious and should be removed, please create an Issue in this repository
Use 'votes' to see the credibility of domains
Be careful, the Internet is a dangerous space. You should know what you are doing when using this list

Sources of data

Obtained by the Django-link-archive web crawler.

Sources:

Benefit - Security

Google Search is known to be susceptible to malvertising. Predatory web pages can disguise themselves as other pages. The link displayed in Google Search does not have to be the link you will be taken to.

This local search does not require an Internet connection to operate. Once the data is downloaded, you can search the metadata offline
This local search might be faster than a search over your ISP connection, depending on drive, machine, etc.
It may be more secure. You can verify a domain, its status, and how long it has been operating before accessing the Internet

Alternative solutions

Files

The database is distributed as a set of JSON files. We do not want to store binary files in the repository. SQL files should be fine, but I am going with JSON files for now.

Note: If you have problems with git clone, you can try downloading the repository as a zip file.

Each link contains a set of attributes, like:

title
description
page rating
date of creation
date of last seen
etc.
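
A minimal sketch of searching the downloaded JSON files locally. It assumes that each file holds a JSON list of link objects and that the attributes are stored under the keys "title" and "description". The exact file layout and key names are assumptions, so check a real file before relying on them. The function names are hypothetical.

```python
import json
from pathlib import Path


def entry_matches(entry, term):
    """Case-insensitive substring match on the assumed title and description keys."""
    text = f"{entry.get('title') or ''} {entry.get('description') or ''}"
    return term.lower() in text.lower()


def search_entries(root, term):
    """Yield matching entries from every *.json file below `root`."""
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumed layout: a JSON list of objects. Anything else is skipped.
        if not isinstance(data, list):
            continue
        for entry in data:
            if isinstance(entry, dict) and entry_matches(entry, term):
                yield entry
```

For example, `list(search_entries(".", "linux"))` run from the repository root would collect every entry that mentions "linux", provided the assumptions about the layout hold.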

Page rating

Content ranking is established by the Django link archive project.

To have a good page rating, it is desirable to follow good standards:

Schema Validator
W3C Validator
Provide HTML meta information. More info in Open Graph Protocol
Provide valid title, which is concise, but not too short
Provide valid description, which is concise, but not too short
Provide valid publication date
Provide valid thumbnail, media image
Provide a valid HTTP status code. No fancy redirects or JavaScript redirects
Provide an RSS feed. Provide HTML meta information for it: https://www.petefreitag.com/blog/rss-autodiscovery/
Provide the search engine keywords meta tag
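
Some of the points above can be self-checked with the Python standard library. The sketch below is not the rating algorithm of the Django link archive project. It only reports whether a page has a title, a description, a keywords tag, an Open Graph image and an RSS autodiscovery link. The class name `MetaAudit` and the function name `audit_html` are hypothetical.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Record which metadata signals are present in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = {"title": False, "description": False,
                      "keywords": False, "og:image": False, "rss": False}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {key: (value or "") for key, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = a.get("name", "").lower()
            prop = a.get("property", "").lower()
            content = a.get("content", "").strip()
            if name in ("description", "keywords") and content:
                self.found[name] = True
            if prop == "og:image" and content:
                self.found["og:image"] = True
        elif tag == "link":
            # RSS autodiscovery: <link rel="alternate" type="application/rss+xml" href="...">
            rel = a.get("rel", "").lower().split()
            is_rss = a.get("type", "").lower() == "application/rss+xml"
            if "alternate" in rel and is_rss and a.get("href"):
                self.found["rss"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.found["title"] = True


def audit_html(html):
    """Return a dict mapping each signal name to True or False."""
    parser = MetaAudit()
    parser.feed(html)
    parser.close()
    return parser.found
```

Calling `audit_html(page_source)` on the HTML of your own page shows which of these signals are missing. It does not validate publication dates or HTTP status codes.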

Your page or domain exists alongside thousands of other pages. Keep in mind that your metadata has an impact on your recognition and page ranking.

Remember: a good page is always ranked higher.

You may wonder why I am writing about the search engine "keywords" meta field if Google does not need it. Well, I don't like Google. If we want alternative solutions to exist, it should be possible to easily find your page with simpler search engines. Provide the keywords field if you support the open web.

Releases

Binary releases are provided in the form of an SQLite database file. The tables will be similar to, or exactly the same as, those in the Django link archive project. Use an SQL viewer to see what kind of data it contains, for example the table "entries" with fields such as "title", "description", etc.

This binary release can be used directly as-is in any project you like.
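
A minimal sketch of querying a binary release with Python's built-in sqlite3 module. It relies only on the table "entries" and the fields "title" and "description" named above. The function name `find_entries` is hypothetical, and other columns are not assumed.

```python
import sqlite3


def find_entries(connection, term, limit=20):
    """Return (title, description) rows from "entries" that contain `term`.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    Note that % and _ inside `term` act as LIKE wildcards.
    """
    pattern = f"%{term}%"
    cursor = connection.execute(
        "SELECT title, description FROM entries "
        "WHERE title LIKE ? OR description LIKE ? LIMIT ?",
        (pattern, pattern, limit),
    )
    return cursor.fetchall()
```

To use it, open the downloaded release file with `sqlite3.connect(path_to_release_file)` and pass the connection to `find_entries`, for example `find_entries(connection, "linux")`.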

I plan to generate a binary release each quarter. On-demand releases might occur when necessary.

You don't like it? Fork it!

I have my own opinions, with which you do not have to agree. Most of the tags and votes are added manually. You can use this repository as a starting point to kick off your own project. Add your own tags. Create your own version of a search engine. Good luck!

Notes

Not all domains have to be stored here. I think it would be best to keep valuable domains. Certainly we do not want content farms. We do not need sites that do not contribute anything useful to society or to the reader
The distinction is not that clear-cut, but more lenient rules apply toward personal sites
I am not that interested in marking Substack or Medium sites as "personal", as I do not feel that they should be tagged as such

Demo database

Might not be working. Used for development: https://renegat0x0.ddns.net/apps/places/.
