Extractor.js

Extract common information from a string.

About

Extractor.js is a small library that helps extract common information such as dates, times, emails or links from text. It also provides an easy way to add new patterns for extracting custom data.

Patterns

The following patterns are built into the library:

Date formats
Email formats
Hash tags
Web links
Mentions
Phone formats
Time formats
YouTube links

Example

Here is an example paragraph of text:

@friend I have sent you an email to your address name@web.com
on 3rd of June 2013 at 12:36pm about your web www.some-website.com.
Watch this youtu.be/5Jp9_sgJcN0 and then call me (123) 456 7890.
#video #website

The following values will be extracted:

{
  "dates": ["3rd of June 2013"],
  "emails": ["name@web.com"],
  "hashtags": ["video", "website"],
  "links": ["www.some-website.com", "youtu.be/5Jp9_sgJcN0"],
  "mentions": ["friend"],
  "phones": ["(123) 456 7890"],
  "times": ["12:36pm"],
  "youtube":[
    {
      "id": "5Jp9_sgJcN0",
      "link": "youtu.be/5Jp9_sgJcN0",
      "thumb": "http://img.youtube.com/vi/5Jp9_sgJcN0/default.jpg",
      "thumbHQ": "http://img.youtube.com/vi/5Jp9_sgJcN0/hqdefault.jpg"
    }
  ]
}
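To show how pattern-based extraction can produce a result object like the one above, here is a minimal sketch in plain JavaScript. The regular expressions and the `extract` function are hypothetical simplifications written for this example. They are not the patterns Extractor.js actually ships, and only three of the result keys are covered.

```javascript
// Hypothetical, simplified patterns (NOT the ones shipped with Extractor.js).
// (?<!\S) requires the "#" or "@" to start a word, so "name@web.com"
// is not mistaken for a mention.
const patterns = {
  emails: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  hashtags: /(?<!\S)#(\w+)/g,
  mentions: /(?<!\S)@(\w+)/g
};

// Runs every pattern over the text and collects the matches per pattern name.
function extract(text) {
  const results = {};
  for (const [name, re] of Object.entries(patterns)) {
    // Use capture group 1 when the pattern has one, otherwise the whole match.
    results[name] = Array.from(text.matchAll(re), (m) =>
      m[1] !== undefined ? m[1] : m[0]);
  }
  return results;
}

const text =
  '@friend I have sent you an email to your address name@web.com\n' +
  'on 3rd of June 2013 at 12:36pm about your web www.some-website.com.\n' +
  'Watch this youtu.be/5Jp9_sgJcN0 and then call me (123) 456 7890.\n' +
  '#video #website';

const partial = extract(text);
```

Capture group 1 drops the leading `#` or `@`, which is why the `hashtags` and `mentions` values in the example output contain bare words.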

How to use

There are two main ways to use Extractor.js:

1. Pass a text string → receive an object with results

var results = Extractor('Lorem #ipsum text...');
// results.hashtags = ['ipsum']

The result is an object with the structure shown in the example above.

YouTube results

Besides the id, link, thumb and thumbHQ values, each YouTube result also has an embed method that generates the embed code. You can customise the width and height of the <iframe>; the default dimensions are 560x315.

var yt = Extractor('Example youtu.be/5Jp9_sgJcN0 link.'),
    ytEmbed = yt.youtube[0].embed;

ytEmbed();
// <iframe width="560" height="315" src="//www.youtube.com/embed/5Jp9_sgJcN0" frameborder="0" allowfullscreen></iframe>

ytEmbed(640, 480);
// <iframe width="640" height="480" src="//www.youtube.com/embed/5Jp9_sgJcN0" frameborder="0" allowfullscreen></iframe>
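The shape of a YouTube result can be sketched as a plain object whose `embed` method builds the `<iframe>` markup shown above. `youtubeResult` is a hypothetical helper, not part of the Extractor.js API, and it only recognises `youtu.be/<id>` short links with an 11-character ID. That restriction is an assumption made to keep the sketch small.

```javascript
// Hypothetical helper: builds a result object shaped like the "youtube"
// entries above. Assumption: only youtu.be/<11-char id> links are handled.
function youtubeResult(link) {
  const match = /youtu\.be\/([\w-]{11})/.exec(link);
  if (!match) return null;
  const id = match[1];
  return {
    id: id,
    link: link,
    thumb: 'http://img.youtube.com/vi/' + id + '/default.jpg',
    thumbHQ: 'http://img.youtube.com/vi/' + id + '/hqdefault.jpg',
    // Defaults mirror the documented 560x315 dimensions.
    embed: function (width = 560, height = 315) {
      return '<iframe width="' + width + '" height="' + height +
        '" src="//www.youtube.com/embed/' + id +
        '" frameborder="0" allowfullscreen></iframe>';
    }
  };
}

const yt = youtubeResult('youtu.be/5Jp9_sgJcN0');
const detachedEmbed = yt.embed; // detached, like ytEmbed in the example above
```

Because `embed` closes over `id` instead of reading `this`, it keeps working when detached from its object, as in `ytEmbed = yt.youtube[0].embed` above.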

2. Call without arguments → receive pattern methods

var ex = Extractor();

ex.dates('Try 3rd of June 2013');
// ["3rd of June 2013"]

ex.emails('Try some@email.com');
// ["some@email.com"]

Method names match the pattern names, i.e. the keys of the result object shown above (dates, emails, hashtags and so on).

Duplicates

Pass a Boolean as the second argument to control duplicate values. By default (true), duplicates are kept; pass false to remove them.

// Default behaviour, same as passing true:
ex.mentions('Try @one @two and @one');
ex.mentions('Try @one @two and @one', true);
// ["one", "two", "one"]

ex.mentions('Try @one @two and @one', false);
// ["one", "two"]
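The duplicate handling described above amounts to an optional, order-preserving de-duplication step. Here is a minimal sketch, assuming a hypothetical `applyDuplicates` helper that is not part of the Extractor.js API:

```javascript
// Hypothetical helper mirroring the documented flag:
// true (default) keeps duplicates, false removes them.
function applyDuplicates(values, keepDuplicates = true) {
  // A Set remembers insertion order, so the first occurrence of each value wins.
  return keepDuplicates ? values : Array.from(new Set(values));
}

const mentions = ['one', 'two', 'one'];
const kept = applyDuplicates(mentions);
const unique = applyDuplicates(mentions, false);
```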

Advanced usage

Options for Extractor()

You can pass a settings object as the second argument when parsing a string directly.

var results = Extractor('Lorem ipsum...', {/* additional settings */});

filter

type: Array default: []

The returned object contains only the results of the patterns listed in filter.

var dateAndTime = Extractor('Try 1st Jun at 2:00 pm', {
        filter: ['dates', 'times']
    });
// {"dates": ["1st Jun"], "times": ["2:00 pm"]}

without

type: Array default: []

The returned object contains the results of all patterns except those listed in without.

var withoutExample = Extractor('Try 1st Jun at 2:00 pm', {
        without: ['emails', 'links', 'mentions', 'times', 'youtube']
    });
// {"dates": ["1st Jun"], "hashtags": [], "phones": []}
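One way to think about `filter` and `without` is as a selection step over the pattern names that runs before any matching. The sketch below is hypothetical (`PATTERN_NAMES` and `selectPatterns` are illustrative names). How Extractor.js behaves when both options are given is not documented here, so applying `filter` first and then `without` is an assumption.

```javascript
// Built-in pattern names, taken from the example output above.
const PATTERN_NAMES = ['dates', 'emails', 'hashtags', 'links',
  'mentions', 'phones', 'times', 'youtube'];

// Hypothetical helper: returns the pattern names to run for the given options.
// An empty filter means "all patterns"; without then removes names.
// Assumption: filter is applied first, then without.
function selectPatterns(names, options = {}) {
  const filter = options.filter || [];
  const without = options.without || [];
  return names.filter((name) =>
    (filter.length === 0 || filter.includes(name)) && !without.includes(name));
}

const filtered = selectPatterns(PATTERN_NAMES, { filter: ['dates', 'times'] });
const remaining = selectPatterns(PATTERN_NAMES, {
  without: ['emails', 'links', 'mentions', 'times', 'youtube']
});
```

With the options from the two examples above, `filtered` is `['dates', 'times']` and `remaining` is `['dates', 'hashtags', 'phones']`, matching the keys of the documented results.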

duplicates

type: Boolean default: true

Whether duplicate values are kept in the results. Set it to false to remove them.

var uniqueResults = Extractor('Try @one @two and @one', {
        duplicates: false
    }).mentions;
// ["one", "two"]

Adding a new pattern - `Extractor.addPattern()`

You can add a new pattern as follows:

// Add a "test" pattern that matches the word "test"
// and appends "1" to each match.
Extractor.addPattern({
    name: 'test',
    regexp: /\btest\b/gim,
    trim: true,
    postProcessor: function (value) {
        return value + '1';
    }
});

The pattern is then used automatically across the whole library: the next time you call Extractor(...), the results include your pattern as well. The pattern is also available as a method when you call Extractor() without arguments.

Extractor().test('test and test');
// ["test1", "test1"]

Extractor('test and #other', {
    filter: ['test', 'hashtags', 'times']
});
// {"hashtags": ["other"], "times": [], "test": ["test1"]}

`Extractor.addPattern()` accepts the following configuration options:

name

type: String default: null

Name of the new pattern. It must not be an existing pattern name and may contain only lowercase and uppercase letters.

name: 'test',

regexp

type: RegExp default: null

Regular expression that defines what your pattern matches.

regexp: /\btest\b/gim,

trim `[optional]`

type: Boolean default: true

Whether whitespace around each result value should be stripped.

trim: true,

postProcessor `[optional]`

type: Function default: null

A post-processing function that receives each result value and returns the amended value.

// Example: append '1' to each result.
postProcessor: function (value) {
    return value + '1';
}
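To make the interplay of these options concrete, here is a minimal, hypothetical registry in the spirit of `Extractor.addPattern()`. The names `registry`, `addPattern`, `run` and `methods` are local to this sketch, and the validation rules are taken from the option descriptions above. It is not the library's implementation.

```javascript
// Hypothetical pattern registry illustrating the addPattern() options
// (name, regexp, trim, postProcessor). NOT the Extractor.js source.
const registry = {};

function addPattern(config) {
  const { name, regexp, trim = true, postProcessor = null } = config;

  // name: letters only, and must not clash with an existing pattern.
  if (typeof name !== 'string' || !/^[A-Za-z]+$/.test(name)) {
    throw new Error('Pattern name may contain only letters: ' + name);
  }
  if (Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error('Pattern already exists: ' + name);
  }
  // regexp: required, defines what the pattern matches.
  if (!(regexp instanceof RegExp)) {
    throw new Error('regexp must be a RegExp');
  }
  registry[name] = { regexp, trim, postProcessor };
}

function run(name, text) {
  const { regexp, trim, postProcessor } = registry[name];
  // Work on a copy with the g flag forced on: matchAll() requires it,
  // and the caller's RegExp keeps its own lastIndex untouched.
  const flags = regexp.flags.includes('g') ? regexp.flags : regexp.flags + 'g';
  const re = new RegExp(regexp.source, flags);
  return Array.from(text.matchAll(re), (match) => {
    const value = trim ? match[0].trim() : match[0];
    return postProcessor ? postProcessor(value) : value;
  });
}

// Like calling Extractor() without arguments: one method per pattern.
function methods() {
  const api = {};
  for (const name of Object.keys(registry)) {
    api[name] = (text) => run(name, text);
  }
  return api;
}

// The "test" pattern from the example above.
addPattern({
  name: 'test',
  regexp: /\btest\b/gim,
  trim: true,
  postProcessor: (value) => value + '1'
});
```

Copying the RegExp is a design choice of this sketch: `String.prototype.matchAll` throws on a non-global regexp, and a fresh copy avoids sharing `lastIndex` state between calls.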

Development

The development environment is set up with Grunt; tests are written in Jasmine.

Licensed MIT.

Grunt

Requires Grunt ~0.4.0.

If you haven't used Grunt before, be sure to check out the Getting Started guide, as it explains how to create a Gruntfile as well as install and use Grunt plugins.

Here are some notable tasks:

default

Runs the dev tests and starts watching the source files (see the "watch" task).

build

Runs all the tests and builds the production files.

test-dev

Runs JSHint and the tests on the source files only.

test-build

Runs tests on production/build files.

watch

Watches the source files and runs the "test-dev" task whenever a change is detected.

msrch/extractorjs
