simple-web-crawlers

This is a sample project that accompanies my technical documents on how to write web crawlers.

This repository contains small, simple sample tools. They work, but they are not intended for general use. Most of them have a single function, a single target, or a single purpose, so that you can understand them easily.

Please feel free to copy and modify all of them.

Sample tools

list-ibm-patterns

This is sample code that builds a list of contents from a web page whose list module is paginated. This tool has only one target:

IBM Developer: Code Patterns

It is very simple, and may be a good starting point for beginners learning how web crawlers are written in a Node.js environment.
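
To illustrate the paged-list idea, here is a minimal sketch of the loop such a crawler might use. It is not the code of list-ibm-patterns itself: the function name `crawlPagedList`, the `fetchPage` callback, and the `maxPages` and `delayMs` options are all assumptions made for this example. The callback is expected to download and parse one page and resolve to an array of items, returning an empty array when there are no more pages.

```javascript
// Hypothetical helper (not part of the original tools): walks a paged list
// until a page comes back empty or the page limit is reached.
// `fetchPage(pageNumber)` is an assumed callback that resolves to an array
// of items found on that page.
async function crawlPagedList(fetchPage, { maxPages = 100, delayMs = 0 } = {}) {
  const items = [];
  for (let page = 1; page <= maxPages; page++) {
    const pageItems = await fetchPage(page);
    if (!pageItems || pageItems.length === 0) break; // no more pages
    items.push(...pageItems);
    if (delayMs > 0) {
      // Pause between requests to stay polite to the target site.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return items;
}
```

Keeping the download step behind a callback means the loop can be tried with a stub, for example `crawlPagedList(async (n) => (n <= 2 ? ['item' + n] : []))`, before pointing it at a real site.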

list-ibmjp-patterns

This is almost the same as the 'list-ibm-patterns' tool, but the target site is changed to:

IBM Developer Japan: Code Patterns

Optional tools

nedb_open

This is a simple tool that counts the number of items in a specific nedb file. You can also use it to compact a nedb file.
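
To show what counting and compacting mean for a nedb file, here is a rough sketch that works on the file's text directly instead of using the nedb package (which the actual tool presumably relies on). It assumes the usual NeDB data file layout: one JSON record per line, updates appended as new lines with the same `_id`, and deletions appended as `{"$$deleted":true,"_id":...}` lines. The function names `loadLiveDocs` and `compact` are made up for this example.

```javascript
// Sketch under assumptions: replays a NeDB-style append-only data file
// (passed in as a string) and returns only the documents that are still live.
function loadLiveDocs(text) {
  const docs = new Map();
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      continue; // skip a corrupt line instead of failing the whole load
    }
    // Index bookkeeping lines are not documents.
    if (record.$$indexCreated || record.$$indexRemoved) continue;
    if (record.$$deleted) {
      docs.delete(record._id); // a later line removed this document
    } else {
      docs.set(record._id, record); // the newest version of a document wins
    }
  }
  return [...docs.values()];
}

// Compaction keeps one line per live document.
// Note: this simplified version drops index definition lines.
function compact(text) {
  return loadLiveDocs(text).map((doc) => JSON.stringify(doc)).join('\n') + '\n';
}
```

The item count is then `loadLiveDocs(text).length`, where `text` could come from `fs.readFileSync(path, 'utf8')`.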

nedb2json

This is a (somewhat) useful tool that converts a nedb file to a JSON file. For example:

node nedb2json list-ibm-patterns.nedb > list-ibm-patterns.json
node nedb2json list-ibmjp-patterns.nedb > list-ibmjp-patterns.json

I uploaded these *.json files to this GitHub repository as sample data. They may become outdated, so run each tool yourself to get the latest data.

nedb2csv

This is a (somewhat) useful tool that converts a nedb file to a CSV file. For example:

node nedb2csv list-ibm-patterns.nedb > list-ibm-patterns.csv
node nedb2csv list-ibmjp-patterns.nedb > list-ibmjp-patterns.csv
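
The part of a CSV conversion that most often goes wrong is quoting, so here is a small sketch of it. This is not the code of nedb2csv; the function names `csvEscape` and `toCsv` and the column names `title` and `url` used in the usage note are assumptions for illustration. Fields containing a comma, a double quote, or a line break are wrapped in double quotes, and embedded double quotes are doubled.

```javascript
// Hypothetical helper: formats one value as a CSV field.
function csvEscape(value) {
  if (value === null || value === undefined) return '';
  // Nested objects and arrays are stored as their JSON text.
  const text = typeof value === 'object' ? JSON.stringify(value) : String(value);
  // Quote only when needed, doubling any embedded double quotes.
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: turns an array of documents into CSV text,
// with a header row followed by one row per document.
function toCsv(docs, columns) {
  const header = columns.map(csvEscape).join(',');
  const rows = docs.map((doc) => columns.map((name) => csvEscape(doc[name])).join(','));
  return [header, ...rows].join('\n') + '\n';
}
```

For example, `process.stdout.write(toCsv(docs, ['title', 'url']))` would print the CSV so that it can be redirected to a file as in the commands above.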

