Added support to create timeline from FireEye HX .mans triage #937

Closed
wants to merge 6 commits into from

Conversation

garanews
Contributor

Added support to create timeline from FireEye HX .mans triage collections

Pylint on Ubuntu Bionic (18.04) (Docker) with Python 3.6
@garanews
Contributor Author

Alerts shown in Redline tool:

redline

Same alerts shown in Timesketch:

timesketch_mans

@jaegeral
Collaborator

uh that is a great thing.

@garanews
Contributor Author

> uh that is a great thing.

Thanks, any ideas or suggestions will be appreciated :)
Currently we're focusing on how to reduce processing time and memory footprint.
With a typical 20 MB .mans triage file (almost 400 MB unzipped), processing takes ~5 minutes:

[2019-07-26 08:46:39,270: INFO/ForkPoolWorker-2] Index timeline [aDwYRm6oqW7cAfoYUySh5N] to index [8266462181ed4e5f814a457c0675c781] (source: mans)
[2019-07-26 08:51:48,556: INFO/ForkPoolWorker-2] Task timesketch.lib.tasks.run_mans[87940386-835c-45fd-9d5c-4b595881e4c3] succeeded in 309.29070999799296s: '8266462181ed4e5f814a457c0675c781'
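The ~5 minute figure can be cross-checked from the two log timestamps. A minimal sketch, assuming the Celery log timestamp format shown above (the helper name `elapsed_seconds` is hypothetical, not part of Timesketch):

```python
from datetime import datetime

# Timestamp layout used in the Celery worker log lines quoted above.
CELERY_TS = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two Celery log timestamps."""
    delta = datetime.strptime(end, CELERY_TS) - datetime.strptime(start, CELERY_TS)
    return delta.total_seconds()


# "Index timeline" line vs. "succeeded" line from the log above.
elapsed = elapsed_seconds("2019-07-26 08:46:39,270", "2019-07-26 08:51:48,556")
print(f"{elapsed:.3f} s")
```

The difference between the log lines is about 309.3 seconds, in line with the `succeeded in 309.29...s` value reported by the task itself.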

Regarding memory: at the moment, with multiprocessing, the pandas DataFrame is copied multiple times, depending on how many cores you want to use.
For example, with 16 cores it can allocate ~20 GB of RAM:
image

We're trying modin https://github.com/modin-project/modin but it seems that the total processing time will increase, because it works on rows and not on columns.

I'm pretty sure that Dask https://dask.org/ will solve all my problems, but I don't know how many other users have it or would be willing to install it! ^___^

Contributor

@kiddinn kiddinn left a comment


Few comments

timesketch/lib/tasks.py (3 resolved review threads)
@garanews
Contributor Author

We just followed the same approach already used for plaso files with psort, so we don't know why you are asking us to remove comments or to import directly. We tried to keep the code consistent.
If you have the same concerns about the psort task, and everyone agrees to remove the comments there and to import instead of calling the script, we can do the same here.

@jaegeral jaegeral mentioned this pull request Aug 22, 2019
@jaegeral
Collaborator

OK, so I spent quite some time on this: I needed to start celery in a venv with Python 3.6. After that I hit a minor issue, which I have already addressed with @garanews:

    cmd, stderr=subprocess.STDOUT, encoding='utf-8')
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mans_to_es.py': 'mans_to_es.py'

I fixed this by downloading the mans_to_es.py file directly from https://github.com/LDO-CERT/mans_to_es and putting it into /usr/local/bin.
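A pre-flight check could surface this kind of missing-script problem with a clearer message before the subprocess call fails. This is only a sketch using the standard library: the helper name `resolve_tool` is hypothetical and is not part of the PR or of Timesketch.

```python
import shutil
import sys


def resolve_tool(name_or_path):
    """Return the full path of an executable, or raise if it cannot be found.

    shutil.which searches PATH for bare names and checks explicit paths directly.
    """
    resolved = shutil.which(name_or_path)
    if resolved is None:
        raise FileNotFoundError(
            f"{name_or_path!r} was not found on PATH or is not executable"
        )
    return resolved


# Usage: the running Python interpreter is used here only as a tool that is
# known to exist; in the task it would be the configured mans_to_es.py path.
print(resolve_tool(sys.executable))
```

With a check like this, a worker started without `mans_to_es.py` on its PATH would report which tool is missing instead of raising from deep inside `subprocess`.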

After that, uploading a mans file worked, and the celery worker took some time to process it.

I tested with various mans files; it worked with the smaller ones. I still need to test it with larger files, but at the moment it looks really good.

@jaegeral
Collaborator

For reference, this is how it should look in mans_to_es.log:

2019-08-22 13:21:53,706 Start parsing /tmp/aaaaaa. Push on IRTHRemote2_triage_2019-08-21 21:14:27Z index and aaaaaa timeline
2019-08-22 13:21:53,790 File extracted in /tmp/aaaaa
2019-08-22 13:21:53,790 Manifest.json parsed
2019-08-22 13:21:53,791 alert collected
2019-08-22 13:21:53,791 Filetype: ServiceItem - SKIPPED
2019-08-22 13:21:53,791 Filetype: PersistenceItem - START
2019-08-22 13:21:53,792 Opening aaaaaa [PersistenceItem]
2019-08-22 13:21:56,768 Preprocessing done
2019-08-22 13:22:13,170 Upload done
2019-08-22 13:22:13,170 Filetype: PortItem - SKIPPED
2019-08-22 13:22:13,170 Filetype: UserItem - SKIPPED
2019-08-22 13:22:13,170 Filetype: TaskItem - SKIPPED
2019-08-22 13:22:13,170 Filetype: ProcessItem - START
2019-08-22 13:22:13,170 Opening aaaaa [ProcessItem]
2019-08-22 13:22:13,199 Preprocessing done
2019-08-22 13:22:13,289 Upload done
2019-08-22 13:22:13,289 Filetype: PrefetchItem - START
2019-08-22 13:22:13,289 Opening aaaaa [PrefetchItem]
2019-08-22 13:22:13,289 Empty file - SKIPPED
2019-08-22 13:22:13,289 Filetype: RegistryItem - SKIPPED
2019-08-22 13:22:13,289 Filetype: DiskItem - SKIPPED
2019-08-22 13:22:13,289 Filetype: VolumeItem - SKIPPED
2019-08-22 13:22:13,289 Filetype: UrlHistoryItem - START
2019-08-22 13:22:13,289 Opening aaaaa [UrlHistoryItem]
2019-08-22 13:22:13,371 Preprocessing done
2019-08-22 13:22:13,661 Upload done
2019-08-22 13:22:13,661 Filetype: FileDownloadHistoryItem - START
2019-08-22 13:22:13,661 Opening aaaaa [FileDownloadHistoryItem]
2019-08-22 13:22:13,772 Preprocessing done
2019-08-22 13:22:14,086 Upload done
2019-08-22 13:22:14,086 Filetype: DnsEntryItem - SKIPPED
2019-08-22 13:22:14,086 Filetype: RouteEntryItem - SKIPPED
2019-08-22 13:22:14,086 Filetype: ArpEntryItem - SKIPPED
2019-08-22 13:22:14,086 Filetype: SystemInfoItem - SKIPPED
2019-08-22 13:22:14,087 completed
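When comparing a run against this reference log, it can help to list which item types were processed and which were skipped. A small sketch, assuming only the `Filetype: <name> - SKIPPED|START` line format visible above (the function name `summarize` is hypothetical):

```python
import re

# Matches lines such as "2019-08-22 13:21:53,791 Filetype: ServiceItem - SKIPPED".
FILETYPE_RE = re.compile(r"Filetype: (\w+) - (SKIPPED|START)$")


def summarize(log_text):
    """Return (started, skipped) lists of item types, in log order."""
    started, skipped = [], []
    for line in log_text.splitlines():
        match = FILETYPE_RE.search(line.strip())
        if match is None:
            continue  # not a "Filetype:" line
        item, status = match.groups()
        (started if status == "START" else skipped).append(item)
    return started, skipped


sample = """\
2019-08-22 13:21:53,791 Filetype: ServiceItem - SKIPPED
2019-08-22 13:21:53,791 Filetype: PersistenceItem - START
2019-08-22 13:21:56,768 Preprocessing done
2019-08-22 13:22:13,170 Filetype: PortItem - SKIPPED
"""
started, skipped = summarize(sample)
print("started:", started)
print("skipped:", skipped)
```

Note that an item type that starts and is then dropped as an empty file (like PrefetchItem above) still counts as started here, because the `Empty file - SKIPPED` line does not carry the `Filetype:` prefix.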

@kiddinn
Contributor

kiddinn commented Oct 19, 2019

is this ready for a second round of reviews? if so can you press the button on the "review panel" to indicate you are ready? (there is the circle made out of arrows next to my name, otherwise I assume you are still working on the changes)

@garanews
Contributor Author

> is this ready for a second round of reviews? if so can you press the button on the "review panel" to indicate you are ready? (there is the circle made out of arrows next to my name, otherwise I assume you are still working on the changes)

Hello,
I haven't made any changes since the first submission. After your change request I asked you the reason for it and what the best way forward was. (If you remember, you told me the "comment provides no value", but I had just copied it from an earlier part of the code, so my question was: should I remove only my comment, or the earlier one as well? Otherwise I would keep both.)
I'm still waiting for your answer before continuing.

@kiddinn
Contributor

kiddinn commented May 6, 2020

Hi

is this still applicable?

sorry for not getting sooner to this, but is this ready for another round of review?

you'll need to resolve some conflicts before you can merge the branch with master, but can you do that and then request a review from me again (you need to click the button next to my name in the review section)

Contributor

@kiddinn kiddinn left a comment


one more comment

logging.info(message.format(timeline_name, index_name, source_type))

try:
    mans_to_es_path = current_app.config['MANS_TO_ES_PATH']
Contributor


I'm not sure why this is being done...

mans_to_es seems to be a Python project, so why on earth are you requiring someone to install the tool and then use subprocess to run it, rather than just importing it as a module (have it in the requirements and then import it)?

It looks to me like all the command line tool does is parse the command line arguments and then run

            mte = MansToEs(args)
            mte.extract_mans()
            mte.parse_manifest()
            mte.parse_hits()
            mte.process()
            mte.to_elastic()
            mte.delete_temp_folder()

so you might as well just import the library and do the same here, and avoid this whole subprocess
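As a rough illustration of the import-based approach, the six calls quoted above could be wrapped in one helper. This is a sketch, not the PR's code: the constructor arguments of `MansToEs` are not shown in this thread, so a recording stub stands in for the real object, and the names `run_mans_pipeline`, `PIPELINE_STEPS` and `RecordingStub` are hypothetical.

```python
# Step names taken from the command-line entry point quoted above.
PIPELINE_STEPS = (
    "extract_mans",
    "parse_manifest",
    "parse_hits",
    "process",
    "to_elastic",
)


def run_mans_pipeline(mte):
    """Run the parsing steps in order on an already constructed object.

    Unlike the quoted sequence, cleanup sits in a finally block so the
    temp folder is removed even if a step raises (a choice of this sketch).
    """
    try:
        for step in PIPELINE_STEPS:
            getattr(mte, step)()
    finally:
        mte.delete_temp_folder()


class RecordingStub:
    """Hypothetical stand-in for MansToEs that records which steps ran."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the step methods.
        return lambda: self.calls.append(name)


stub = RecordingStub()
run_mans_pipeline(stub)
print(stub.calls)
```

In the Celery task, the stub would be replaced by a real `MansToEs` instance built from whatever arguments the library actually expects.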

Contributor Author


We tried to follow the same behavior as plaso, where psort.py is called.
If you think it is better to use the library, we can change the approach.
We thought that this way it would be easier to add features or fixes on our side without opening a PR on the timesketch repo.

@kiddinn
Contributor

kiddinn commented May 18, 2020

OK, can you resolve the conflicts that you've got and update the branch?

Regarding the question of subprocess vs import: I think the correct method here is to import, since this library is simple to import. Plaso is another beast that is hard to import, and we were planning on changing the plaso behavior to have it run in a separate container for isolation, as well as implementing a different storage backend so that plaso could output the data directly into the ES datastore instead of producing a storage file that needs to be processed with psort. These are WIP ATM.

If parsing that XML file carries security risks, so that running the parser in a separate process with some isolation (such as a separate container) would give us security benefits, then I'm all for it. Otherwise I would opt for importing the library, which is how we handle JSON lines and CSV, the other text-based files that we parse.

@garanews
Contributor Author

I'm going to close this PR and submit an updated one.
After a year of waiting this one is no longer valid, and including it in your updated version will affect other files.
