How To Write a Simple Miner
As a first simple example of writing MineMeld nodes, let's write a simple Miner.
The idea behind the Miner used in this example comes from a script by Carlito Vleia, a Systems Engineer at Palo Alto Networks. The idea is pretty simple: periodically retrieve the list of videos published in a specific YouTube channel and translate the entries into a set of indicators of type URL for the External Dynamic List feature of Palo Alto Networks PAN-OS.
Before starting to code, make sure you have read about:
- MineMeld Architecture
- MineMeld Contribution guidelines
- MineMeld Engine Architecture
- MineMeld launcher
After all that reading, spin up a full MineMeld devel environment following the developer's guide.
The proper way of retrieving the list of videos from a YouTube channel is using the YouTube API, as suggested by Luke Amery. But it would be a bit too complex for this tutorial. Instead we will use the quick and dirty way suggested by Carlito:
- Retrieve the page https://www.youtube.com/user/<channel_name>/videos
- Extract all the data-context-item-id attribute values inside the page
- Generate a URL for each of them with the format www.youtube.com/watch?v=<item>
Excerpt from the videos HTML page; note the div with the data-context-item-id attribute:
[...]
<li class="channels-content-item yt-shelf-grid-item">
<div class="yt-lockup clearfix yt-lockup-video yt-lockup-grid vve-check" data-context-item-id="FAKEVIDEOID" data-visibility-tracking="....">
...
</div>
</li>
[...]
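The three steps above can be tried offline against the excerpt before touching MineMeld. The following is a minimal sketch, assuming Python 3 and using the standard library `html.parser` instead of bs4 so that it runs without extra packages; `VideoIdParser` and `extract_video_urls` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class VideoIdParser(HTMLParser):
    """Collects data-context-item-id values found on <div> tags."""

    def __init__(self):
        super().__init__()
        self.video_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'data-context-item-id' and value:
                self.video_ids.append(value)


def extract_video_urls(html):
    """Return one watch URL per video ID found in the page."""
    parser = VideoIdParser()
    parser.feed(html)
    return ['www.youtube.com/watch?v={}'.format(v) for v in parser.video_ids]


# Sample input modeled on the excerpt above
SAMPLE = '''
<li class="channels-content-item yt-shelf-grid-item">
<div class="yt-lockup clearfix yt-lockup-video" data-context-item-id="FAKEVIDEOID" data-visibility-tracking="....">
</div>
</li>
'''

urls = extract_video_urls(SAMPLE)
print(urls)
```

The Miner itself uses bs4 for the same job; the standard library parser is used here only to keep the sketch self-contained.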
Let's start with writing the code for the new Miner.
The new Miner should be able to:
1. Read the YouTube channel name from the configuration
2. Periodically poll the URL of the channel
3. Extract video IDs from the result
4. Create an indicator of type URL for each video
5. Generate an UPDATE message for new videos
6. Generate a WITHDRAW message for removed videos
7. Answer RPC requests coming from the master and the API
To make our life a bit easier, the logic for 5, 6, 7 and most of 1 and 2 is already implemented in minemeld.ft.basepoller.BasePollerFT, the superclass used by most of the polling Miners.
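To see what items 5 and 6 amount to, it helps to think of each poll as producing a set of indicators and of the messages as the difference between two consecutive polls. The sketch below only illustrates that idea; `diff_polls` is a hypothetical helper, not code from BasePollerFT, and the real base class does more than this.

```python
def diff_polls(previous, current):
    """Compare two polls and return (updates, withdraws).

    updates: indicators seen now but not in the previous poll
    withdraws: indicators seen in the previous poll but not now
    """
    updates = sorted(current - previous)
    withdraws = sorted(previous - current)
    return updates, withdraws


first_poll = {'www.youtube.com/watch?v=A', 'www.youtube.com/watch?v=B'}
second_poll = {'www.youtube.com/watch?v=B', 'www.youtube.com/watch?v=C'}

updates, withdraws = diff_polls(first_poll, second_poll)
print('UPDATE', updates)
print('WITHDRAW', withdraws)
```

Because the base class takes care of this bookkeeping, the Miner only has to return the current list of videos at every poll.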
Create a file in the /opt/minemeld/engine/core/minemeld/ft directory and call it ytexample.py. Copy and paste the following code into it:
from __future__ import absolute_import

import logging
import requests
import bs4  # we use bs4 to parse the HTML page

from . import basepoller

LOG = logging.getLogger(__name__)


class YTExample(basepoller.BasePollerFT):
    def configure(self):
        pass

    def _process_item(self, item):
        # called on each item returned by _build_iterator
        # it should return a list of (indicator, value) pairs
        pass

    def _build_iterator(self, now):
        # called at every polling interval
        # here you should retrieve and return the list of items
        pass
Our YouTube Miner should read the channel name from the config. This can be done inside the configure method. When it is called, the configuration is already stored as a dictionary in the self.config attribute of the instance.
We also want to read from the config a polling timeout and a flag to control HTTPS certificate verification (default should be True).
The configure method of your class should look like:
    def configure(self):
        super(YTExample, self).configure()

        self.polling_timeout = self.config.get('polling_timeout', 20)
        self.verify_cert = self.config.get('verify_cert', True)

        self.channel_name = self.config.get('channel_name', None)
        if self.channel_name is None:
            raise ValueError('%s - channel name is required' % self.name)
        self.url = 'https://www.youtube.com/user/{}/videos'.format(
            self.channel_name
        )
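The defaults and the required-key check in configure can be exercised outside MineMeld with a plain dictionary. This is a minimal sketch that mirrors the logic above in a free function; `parse_config` is a hypothetical helper written for this illustration and is not part of MineMeld.

```python
def parse_config(config, node_name):
    """Apply the same defaults and validation as configure()."""
    channel_name = config.get('channel_name', None)
    if channel_name is None:
        raise ValueError('%s - channel name is required' % node_name)
    return {
        'polling_timeout': config.get('polling_timeout', 20),
        'verify_cert': config.get('verify_cert', True),
        'url': 'https://www.youtube.com/user/{}/videos'.format(channel_name),
    }


# Only channel_name is given, so both defaults apply
settings = parse_config({'channel_name': 'EEVblog'}, 'testYT')
print(settings)

# A missing channel_name is rejected
try:
    parse_config({}, 'testYT')
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Raising ValueError during configuration makes a misconfigured node fail early instead of polling a malformed URL later.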
At every polling interval, the _build_iterator method is called with the current time since epoch in milliseconds. It should return an iterable of items, for example a list. Each item is then translated into a list of indicators using the _process_item method.
    def _build_iterator(self, now):
        # builds the request and retrieves the page
        rkwargs = dict(
            stream=False,
            verify=self.verify_cert,
            timeout=self.polling_timeout
        )

        r = requests.get(
            self.url,
            **rkwargs
        )

        try:
            r.raise_for_status()
        except requests.exceptions.HTTPError:
            # log the failing response, then let the error propagate
            LOG.debug('%s - exception in request: %s %s',
                      self.name, r.status_code, r.content)
            raise

        # parse the page and keep only the video divs that carry an ID
        html_soup = bs4.BeautifulSoup(r.content, "lxml")
        result = html_soup.find_all(
            'div',
            class_='yt-lockup-video',
            attrs={
                'data-context-item-id': True
            }
        )

        return result
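How the two methods fit together during one polling cycle can be pictured with a toy class. This is only a sketch under stated assumptions: `ToyPoller` and `poll_once` are hypothetical and do not reproduce BasePollerFT, and the way `now` is computed here is an assumption about what "time since epoch in milliseconds" means in practice.

```python
import time


class ToyPoller(object):
    """Hypothetical stand-in for a poller; not MineMeld's BasePollerFT."""

    def _build_iterator(self, now):
        # A real Miner would fetch and parse a page here
        return ['id1', 'id2']

    def _process_item(self, item):
        # One item becomes a list of [indicator, value] pairs
        indicator = 'www.youtube.com/watch?v={}'.format(item)
        return [[indicator, {'type': 'URL'}]]

    def poll_once(self):
        now = int(time.time() * 1000)  # milliseconds since epoch
        indicators = []
        for item in self._build_iterator(now):
            indicators.extend(self._process_item(item))
        return indicators


result = ToyPoller().poll_once()
print(result)
```

The subclass only supplies the two hooks; scheduling and message generation stay in the base class.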
The _build_iterator method returns a list of bs4.element.Tag objects. Each object is then passed by the base class to the _process_item method. This method is responsible for translating each object into a list of indicators. In our case the _process_item method creates an indicator of type URL for each object.
    def _process_item(self, item):
        video_id = item.attrs.get('data-context-item-id', None)
        if video_id is None:
            LOG.error('%s - no data-context-item-id attribute', self.name)
            return []

        indicator = 'www.youtube.com/watch?v={}'.format(video_id)
        value = {
            'type': 'URL',
            'confidence': 100
        }

        return [[indicator, value]]
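The two possible return shapes of _process_item can be checked without bs4 or a running engine. The sketch below assumes that the only part of a Tag the method relies on is its attrs dictionary; `FakeTag` and the free function `process_item` are hypothetical stand-ins for this illustration.

```python
class FakeTag(object):
    """Hypothetical stand-in for bs4.element.Tag; only exposes .attrs."""

    def __init__(self, attrs):
        self.attrs = attrs


def process_item(item):
    """Same translation as _process_item, without logging."""
    video_id = item.attrs.get('data-context-item-id', None)
    if video_id is None:
        return []
    indicator = 'www.youtube.com/watch?v={}'.format(video_id)
    value = {'type': 'URL', 'confidence': 100}
    return [[indicator, value]]


with_id = process_item(FakeTag({'data-context-item-id': 'FAKEVIDEOID'}))
without_id = process_item(FakeTag({'class': ['yt-lockup-video']}))
print(with_id)
print(without_id)
```

Returning an empty list for an item without an ID lets the poll continue instead of failing on a single malformed entry.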
To test the new Miner, replace the content of the file /opt/minemeld/local/config/committed-config.yml with the following:
nodes:
  testYT:
    class: minemeld.ft.ytexample.YTExample
    inputs: []
    output: true
    config:
      # set the channel name to EEVblog
      channel_name: EEVblog
      # source name used in the indicators
      source_name: youtube.EEVblog
      # age out of indicators
      # disabled, removed when they disappear from the channel
      age_out:
        sudden_death: true
        default: null
and then restart the minemeld service:
$ sudo service minemeld stop
$ sudo service minemeld start
Check that all the services are running:
$ sudo -u minemeld /opt/minemeld/engine/current/bin/supervisorctl -c /opt/minemeld/local/supervisor/config/supervisord.conf status
minemeld-engine RUNNING pid 4526, uptime 0:05:39
minemeld-traced RUNNING pid 4527, uptime 0:05:39
minemeld-web RUNNING pid 4528, uptime 0:05:39
If something fails, the first place to look is the engine log file, /opt/minemeld/log/minemeld-engine.log.
Congratulations! Your new node is up and running!
To make it available on the CONFIG page of the UI, you should now create a simple prototype for it.
Create the file /opt/minemeld/local/prototypes/ytexample.yml with the following content:
description: Test prototype library

prototypes:
  YTEEVblog:
    author: Test
    description: Miner for videos of EEVblog
    class: minemeld.ft.ytexample.YTExample
    config:
      channel_name: EEVblog
      source_name: youtube.EEVblog
      age_out:
        sudden_death: true
        default: null
Refresh the UI, and then go to CONFIG. Click the browse button and search for the new prototype. It should be there.
Now you can use the prototype to instantiate the Miner from the UI.
You can also modify the prototype from the UI to support additional YouTube channels.