Added support to create timeline from FireEye HX .mans triage #937
Uh, that is a great thing.
Thanks, any idea or suggestion will be appreciated :)
Regarding memory: at the moment, with multiprocessing the pandas dataframe is copied multiple times, depending on how many cores you want to use. We're trying modin https://github.com/modin-project/modin but it seems that the total processing time will increase, because it works on rows and not on columns. I'm pretty sure that Dask https://dask.org/ would solve all my problems, but I don't know how many other users have it or would install it! ^___^
Few comments
We just followed the same path used for plaso files with psort, so we don't know why you are asking us to remove comments or to import directly. We tried to standardize the code.
OK, so I spent quite some time on this. I needed to start celery in a venv with Python 3.6, and after that I had some minor issues, which I have already addressed with @garanews.
I could fix them by downloading the mans_to_es.py file directly from https://github.com/LDO-CERT/mans_to_es and putting it into /usr/local/bin. After that, uploading a mans file worked, and the celery worker took some time to process it. I tested with various mans files, and it worked with the smaller ones. I still need to test it with larger files, but at the moment it looks really good.
For reference, this is how it should look in mans_to_es.log:
Is this ready for a second round of reviews? If so, can you press the button on the "review panel" to indicate you are ready? (It is the circle made of arrows next to my name; otherwise I assume you are still working on the changes.)
Hello,
Hi, is this still applicable? Sorry for not getting to this sooner. Is this ready for another round of review? You'll need to resolve some conflicts before you can merge the branch with master. Can you do that and mark me again for a review? (You need to click the button next to my name in the review section.)
one more comment
    logging.info(message.format(timeline_name, index_name, source_type))

    try:
        mans_to_es_path = current_app.config['MANS_TO_ES_PATH']
I'm not sure why this is being done...
mans_to_es seems to be a Python project, so why on earth are you requiring someone to install the tool and then use subprocess to run it, rather than just importing it as a module (have it in the requirements and then import it)?
It looks to me like all the command line tool does is parse the command line arguments and then run:
mte = MansToEs(args)
mte.extract_mans()
mte.parse_manifest()
mte.parse_hits()
mte.process()
mte.to_elastic()
mte.delete_temp_folder()
So you might as well just import the library and do the same here, and avoid this whole subprocess.
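As an illustration of the in-process approach suggested above, here is a minimal sketch. It uses only the method names quoted in this comment; StubMansToEs is a hypothetical stand-in written for this example and is not the real mans_to_es class, whose constructor arguments and import path are not shown in this thread. Running delete_temp_folder in a finally block is an assumption about the desired cleanup behavior, not something the tool is confirmed to do.

```python
# Step names taken from the call sequence quoted in the review comment.
PIPELINE_STEPS = (
    "extract_mans",
    "parse_manifest",
    "parse_hits",
    "process",
    "to_elastic",
)


def run_mans_pipeline(mte):
    """Run the import steps in order on an already constructed object.

    Cleanup is attempted even if a step raises (assumption: the temp
    folder should not be left behind after a failed import).
    """
    try:
        for step in PIPELINE_STEPS:
            getattr(mte, step)()
    finally:
        mte.delete_temp_folder()


class StubMansToEs:
    """Hypothetical stand-in for MansToEs that only records method calls."""

    def __init__(self, args):
        self.args = args
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that are not set in __init__,
        # so every pipeline step name becomes a recording function.
        def record():
            self.calls.append(name)
        return record


# Hypothetical arguments; the real tool's argument names are not shown here.
mte = StubMansToEs(args={"filename": "example.mans"})
run_mans_pipeline(mte)
```

With the real library listed in the requirements, the stub would be replaced by the imported class, and the celery task would call the pipeline function directly instead of spawning a process.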
We tried to follow the same behavior as plaso, where psort.py is called.
If you think it is better to use the library, we can change the approach.
We thought that this way it would be easier to add features or fixes on our side without opening a PR on the timesketch repo.
OK, can you resolve the conflicts that you've got and update the branch? Regarding the question of subprocess vs. import: I think the correct method here is to import, since this library is a simple import. Plaso is another beast that is hard to import, and we are planning to change the plaso behavior so that it runs in a separate container for isolation, and to implement a different storage backend so that plaso can output data directly into the ES datastore instead of producing a storage file that needs to be processed with psort. These are WIP at the moment. If parsing that XML file carries security risks, so that we would gain security benefits from using a separate process with some isolation (such as running the parser in a separate container), then I'm all for it. Otherwise I would opt for importing the library, which is how we handle JSON lines and CSV, that is, the text-based files that we parse.
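For comparison, the separate-process option mentioned above could look roughly like the sketch below. CHILD_CODE is a hypothetical stand-in for a parser and does not parse anything; parse_in_subprocess and its timeout parameter are names invented for this example. A plain child process only contains crashes and runaway runtimes; it is not a sandbox, so real isolation would still need a container or similar.

```python
import json
import subprocess
import sys

# Hypothetical child program standing in for an untrusted-file parser.
CHILD_CODE = r"""
import json, sys
path = sys.argv[1]
# A real parser would open and parse `path` here.
print(json.dumps({"source": path, "events": 0}))
"""


def parse_in_subprocess(path, timeout=60):
    """Run the stand-in parser in a separate interpreter and return its result.

    Raises subprocess.TimeoutExpired if the child runs too long and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(result.stdout)


summary = parse_in_subprocess("example.mans")
```

The trade-off is the one described in the comment: the extra process adds overhead and a second interface to maintain, which is only worth it if the input format is considered risky to parse.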
Going to close this PR and submit an "updated one".