Feature: multi-line logs #74

Closed
ventris opened this issue Dec 12, 2018 · 76 comments · Fixed by #3024
Labels
component/agent keepalive An issue or PR that will be kept alive and never marked as stale. type/feature Something new we should do
Comments

@ventris

ventris commented Dec 12, 2018

Currently, when a multi-line event is written to a logfile, promtail treats each row as its own entry and sends them separately to Loki.

It would be great if Loki could handle multi-line events such as stacktraces.

This might also be a bug, if promtail is already supposed to handle multi-line events.

@tomwilkie
Contributor

Agreed, thanks @ventris. Any idea how to do this, though? I guess we could have an option to parse the log entries in promtail and define each entry as starting with a timestamp?
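
A minimal sketch of that idea, assuming each entry starts with an ISO-like timestamp (the regex and sample lines here are hypothetical, not promtail code):

```python
import re

# Lines matching this pattern start a new entry; everything else is treated
# as a continuation of the previous entry (e.g. a stack-trace frame).
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

def group_entries(lines):
    entries, current = [], []
    for line in lines:
        if TIMESTAMP_RE.match(line):
            if current:
                entries.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries

lines = [
    "2018-12-12 10:00:00 ERROR something failed",
    "    at com.example.Foo.bar(Foo.java:42)",
    "2018-12-12 10:00:01 INFO recovered",
]
print(group_entries(lines))
```

The stack-trace frame ends up inside the first entry instead of becoming an entry of its own.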

@ventris
Author

ventris commented Dec 12, 2018

Some applications require a "special" character before the multi-line event. I know that our current WildFly setup puts #### before the multiline event.

But I think that parsing the log entries and splitting them on the timestamp is the better solution.

@ventris
Author

ventris commented Dec 12, 2018

Splunk handles this in a smart way.

Source: extract-timestamps in splunk

Maybe something like that?

One thing to take into account when only looking for timestamps is that they can also occur within a stacktrace. That is why Splunk has a variable to specify how many characters into a line it should look for the timestamp.
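
That lookahead idea can be sketched like this (the 25-character limit is an arbitrary stand-in for Splunk's configurable value):

```python
import re

TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
MAX_LOOKAHEAD = 25  # assumed limit, analogous to Splunk's configurable lookahead

def starts_new_entry(line):
    # Only a timestamp near the start of the line counts; one buried deep
    # inside a stack-trace line does not start a new entry.
    return TIMESTAMP_RE.search(line[:MAX_LOOKAHEAD]) is not None

print(starts_new_entry("2018-12-12 10:00:00 ERROR boom"))                     # True
print(starts_new_entry("    at Foo.run(scheduled for 2018-12-12 10:00:00)"))  # False
```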

@sandstrom

sandstrom commented Dec 13, 2018

Here is another multi-line scenario

Our log files are JSONL (http://jsonlines.org/) and look like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "messages" : ["Parsing new user data", "Saving user data, valid model", "Storing user data in db"] }
{ "timestamp" : "2018-12-24T18:01:00", "user-id" : "def", "request-id" : "456", "messages" : ["Updating user record", "Updating search index"] }
{ "timestamp" : "2018-12-24T18:02:00", "user-id" : "abc", "request-id" : "789", "messages" : ["Keep alive ping from client", "Updated client with cluster state details"] }

We'd like to (1) associate the key-value metadata with each log line, (2) ingest the array of log lines from the "messages" key and (3) ideally group the associated log lines (so that a log reader client could easily show them together).

If it makes ingestion easier, we could easily keep a timestamp per message in the array instead. Something like this:

{ "timestamp" : "2018-12-24T18:02", "user-id" : "abc", "request-id" : "789", "messages" : [{ "t" : "2018-12-24T18:02:10", "m" : "Keep alive ping from client" }, { "t" : "2018-12-24T18:02:10", "m" : "Updated client with cluster state details" }] }
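
For illustration, a hypothetical ingester could expand that shape into one entry per message while carrying the record-level fields along (field names are taken from the example above; this is not actual Loki behavior):

```python
import json

# One JSONL record from the example, expanded into per-message entries that
# share the record-level metadata.
record = json.loads(
    '{ "timestamp": "2018-12-24T18:01:00", "user-id": "def", '
    '"request-id": "456", "messages": ["Updating user record", "Updating search index"] }'
)

entries = [
    {
        "timestamp": record["timestamp"],
        "labels": {"user-id": record["user-id"], "request-id": record["request-id"]},
        "line": message,
    }
    for message in record["messages"]
]
for e in entries:
    print(e["timestamp"], e["labels"]["request-id"], e["line"])
```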

Also, Loki sounds awesome! 🏆

Have been looking for a lightweight alternative to the ELK stack for logging. We could easily live without full-text search inside log messages (distributed grep would be enough).

However, we'd need filtering by keywords and multi-line log handling to make it work.

@tomwilkie tomwilkie added component/agent type/feature Something new we should do labels Dec 16, 2018
@tomwilkie
Contributor

We recently merged support for having multiple entries with the same timestamp; when running the docker log parser, multi-line logs usually show up as entries with the same timestamp, and are therefore grouped together. Not a perfect solution, but helps.

@m0wfo

m0wfo commented Mar 11, 2019

When I was at Logentries we delimited on \r, \n, etc., but allowed loggers to concatenate multiple lines using the Unicode line separator. The indexer would ignore it and treat the entire payload as one entry (it was then up to the UI to optionally reformat the code onto multiple lines). That's one version of an imperfect solution anyway :)
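
A rough sketch of that scheme, with U+2028 LINE SEPARATOR as the in-payload delimiter (the sample frames are made up):

```python
# The logger joins physical lines with U+2028, so newline-delimited transports
# see a single "line"; the UI splits on U+2028 again for display.
LS = "\u2028"

frames = ["ERROR boom", "  at Foo.bar(Foo.java:1)", "  at Foo.main(Foo.java:9)"]
wire_payload = LS.join(frames)  # one entry as far as \n-delimited ingestion goes
assert "\n" not in wire_payload
print(wire_payload.split(LS))
```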

@DanCech
Contributor

DanCech commented Mar 11, 2019

My understanding is that loki doesn't treat the content of log "lines" specially, so it should be up to the process tailing the logfile to decide when a single loki log "line" should actually contain multiple newline-delimited "lines" of content (maybe it would have made more sense to call it a log entry but such is life).

Of course I may be misunderstanding the intent here, but it seems like something like a backtrace would be more useful as a single entry in loki rather than being split up.

@davidhao3300

davidhao3300 commented Mar 15, 2019

From what I've seen in other similar log tailing (fluentd concat plugin), the tailer provides options for multi-line start regexp (lines matching the regex indicate the start of a new log entry, so flush the old one out), with a timeout/max buffer size (if x lines end up in the buffer without a new log entry appearing, flush the logs out anyways).
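
The flush policy described above can be sketched as follows (the regex and the 3-line cap are illustrative; real implementations like fluentd's concat plugin also use a timeout):

```python
import re

FIRSTLINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")
MAX_BUFFERED = 3  # assumed cap: flush even if no new firstline appears

def flush_policy(lines):
    """Yield joined entries, flushing on a new firstline or a full buffer."""
    buf = []
    for line in lines:
        if buf and (FIRSTLINE_RE.match(line) or len(buf) >= MAX_BUFFERED):
            yield "\n".join(buf)
            buf = []
        buf.append(line)
    if buf:
        yield "\n".join(buf)

lines = [
    "2021-01-01 first event",
    "  continuation 1",
    "  continuation 2",
    "  continuation 3",
    "2021-01-01 second event",
]
entries = list(flush_policy(lines))
print(entries)
```

The third continuation line gets flushed on its own because the buffer cap is hit before a new firstline appears.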

I implemented support for the multi-line start regex for a hackathon, and it seems to work decently. I'll try to clean up the code over the weekend and open a PR to show as an example. I didn't implement any sort of timeout/max buffer size though.

@davidhao3300

Opened a PR for this issue

@steven-sheehy
Contributor

Note that multi-line is different depending upon the EntryParser. With CRI, it will send multiple entries with different flags to indicate if it's a partial log or a full log. See this document.
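
For reference, reassembling CRI partial lines could look roughly like this (the sample records follow the CRI logging format, where each line is `<timestamp> <stream> <P|F> <content>`; the joining logic is a sketch, not promtail's implementation):

```python
# P marks a partial line continued in the next record; F marks the final
# (or only) piece of an entry.
def reassemble_cri(lines):
    entries, pending = [], []
    for line in lines:
        timestamp, stream, flag, content = line.split(" ", 3)
        pending.append(content)
        if flag == "F":
            entries.append("".join(pending))
            pending = []
    return entries

lines = [
    "2019-05-01T00:00:30.195723061Z stdout P very long li",
    "2019-05-01T00:00:30.195723061Z stdout F ne that was split",
    "2019-05-01T00:00:31.000000000Z stdout F a short line",
]
print(reassemble_cri(lines))  # ['very long line that was split', 'a short line']
```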

@cyriltovena
Contributor

Currently investigating this.

@cyriltovena cyriltovena pinned this issue Aug 2, 2019
@cyriltovena cyriltovena unpinned this issue Aug 2, 2019
@cyriltovena
Contributor

I looked at it in #399. There would be a lot of refactoring involved to get this feature into the current pipeline, so I would love to hear more about how it is useful before getting started. Basically, how does merging lines together help: is it a visual problem, or is it to improve storage? And what is a good example of a workload emitting multiline logs?

I want to know a bit more.

@ventris
Author

ventris commented Aug 4, 2019

Java and C# stacktraces are multi-line.

One of the problems with not having access to the full event is that it makes it much harder to search for other similar events, because the line breaks can cut Java class names in half and thus do not paint a real picture of what happened in the logs on the server.

Another thing: if the example below is ingested as multiple log entries instead, it is really hard to search for, say:

  • I know that these two keywords should be present somewhere in the stacktrace. Say com.example.myproject.MyProjectServletException and javax.servlet.ServletException: Something bad happened.

If I search for them without having the stacktrace in one entry, I will just get the rows that matched my keywords and not the whole stacktrace. This makes troubleshooting Java much harder with Loki if multiline events are handled as-is, because I first need to search for the keywords and then try to find the first and last entry of the stacktrace so I can read it as a whole. And stacktraces can be several hundred lines.

I think there are other multiline events that would benefit from being in a single entry, but I have only worked with Java, so I can't speak for other languages.

Here is an example stacktrace from java:

javax.servlet.ServletException: Something bad happened
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:60)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.ExceptionHandlerFilter.doFilter(ExceptionHandlerFilter.java:28)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.OutputBufferFilter.doFilter(OutputBufferFilter.java:33)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: com.example.myproject.MyProjectServletException
    at com.example.myproject.MyServlet.doPost(MyServlet.java:169)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:30)
    ... 27 more
Caused by: org.hibernate.exception.ConstraintViolationException: could not insert: [com.example.myproject.MyEntity]
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:96)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:64)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2329)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2822)
    at org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:71)
    at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:268)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:321)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:204)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:130)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:210)
    at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:56)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:195)
    at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:50)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:93)
    at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:705)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:693)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:689)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.hibernate.context.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:344)
    at $Proxy19.save(Unknown Source)
    at com.example.myproject.MyEntityService.save(MyEntityService.java:59) <-- relevant call (see notes below)
    at com.example.myproject.MyServlet.doPost(MyServlet.java:164)
    ... 32 more
Caused by: java.sql.SQLException: Violation of unique constraint MY_ENTITY_UK_1: duplicate value(s) for column(s) MY_COLUMN in statement [...]
    at org.hsqldb.jdbc.Util.throwError(Unknown Source)
    at org.hsqldb.jdbc.jdbcPreparedStatement.executeUpdate(Unknown Source)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:57)
    ... 54 more

@steven-sheehy
Contributor

Above, plus Grafana only shows entries in descending order by time, so the stacktrace appears upside down. It also makes the startup ASCII/emoji art that a lot of Java apps print impossible to read. :)

@tlinhart
Contributor

Python stack traces are the same:

[2019-08-13 06:58:20,588] ERROR in app: Exception on /graphql [POST]
Traceback (most recent call last):
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./app/graphql/views.py", line 66, in service
    field_path_to_str(error.path, False),
AttributeError: 'Exception' object has no attribute 'path'

It's all about the context. Only the complete log record holds the whole context; no single line does.

@khauser

khauser commented Aug 22, 2019

Could you please consider the following scenario, or at least give your opinion:

Because of Loki we changed the log format of our appenders to use the Logstash JSON layout formatter in our logback configuration. Now we have a single log entry containing all the information, including the whole stacktrace (with the \n\t sequences already noted). So there is no need to join lines afterwards. That sounds good to me...

Here is a sample entry:

{"@timestamp":"2019-08-13T23:57:20.971+00:00","@version":"1","message":"Error in job timer thread occurred!","logger_name":"somecode.core.internal.job.Timer","thread_name":"JobTimer759282143","level":"ERROR","level_value":40000,"stack_trace":"somecode.core.capi.common.SystemException: java.sql.SQLException: Exception occurred while getting connection: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:783)\n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:796)\n\tat somecode.core.internal.jdbc.JDBCUtilsImpl.getConnection(JDBCUtilsImpl.java:925)\n\tat somecode.core.capi.jdbc.JDBCUtils.getConnection(JDBCUtils.java:560)\n\tat somecode.core.internal.jdbc.DBDate.currentDBDate(DBDate.java:75)\n\tat somecode.core.internal.job.Timer.run(Timer.java:102)\nCaused by: java.sql.SQLException: Exception occurred while getting connection: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:479)\n\tat oracle.ucp.util.UCPErrorHandler.throwSQLException(UCPErrorHandler.java:154)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1127)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1031)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1006)\n\tat somecode.core.internal.jdbc.JDBCDataSource.getConnection(JDBCDataSource.java:101)\n\tat 
somecode.core.internal.jdbc.JDBCDataSource.getConnection(JDBCDataSource.java:73)\n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:779)\n\t... 5 common frames omitted\nCaused by: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.ucp.util.UCPErrorHandler.newUniversalConnectionPoolException(UCPErrorHandler.java:359)\n\tat oracle.ucp.util.UCPErrorHandler.throwUniversalConnectionPoolException(UCPErrorHandler.java:58)\n\tat oracle.ucp.jdbc.DataSourceConnectionFactoryAdapter.createConnection(DataSourceConnectionFactoryAdapter.java:102)\n\tat oracle.ucp.jdbc.oracle.OracleDataSourceConnectionFactoryAdapter.createConnection(OracleDataSourceConnectionFactoryAdapter.java:82)\n\tat oracle.ucp.common.Database.createPooledConnection(Database.java:268)\n\tat oracle.ucp.common.Cluster.create(Cluster.java:234)\n\tat oracle.ucp.common.Core.growBorrowed(Core.java:634)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnectionWithoutCountingRequests(UniversalConnectionPoolImpl.java:230)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnectionAndValidate(UniversalConnectionPoolImpl.java:139)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnection(UniversalConnectionPoolImpl.java:120)\n\tat oracle.ucp.jdbc.JDBCConnectionPool.borrowConnection(JDBCConnectionPool.java:170)\n\tat oracle.ucp.jdbc.oracle.OracleJDBCConnectionPool.borrowConnection(OracleJDBCConnectionPool.java:849)\n\tat oracle.ucp.jdbc.oracle.OracleConnectionConnectionPool.borrowConnection(OracleConnectionConnectionPool.java:82)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1103)\n\t... 
10 common frames omitted\nCaused by: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:743)\n\tat oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)\n\tat oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)\n\tat oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:566)\n\tat oracle.jdbc.pool.OracleDataSource.getPhysicalConnection(OracleDataSource.java:317)\n\tat oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:241)\n\tat oracle.ucp.jdbc.DataSourceConnectionFactoryAdapter.createConnection(DataSourceConnectionFactoryAdapter.java:96)\n\t... 21 common frames omitted\nCaused by: oracle.net.ns.NetException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.net.ns.NSProtocolStream.negotiateConnection(NSProtocolStream.java:275)\n\tat oracle.net.ns.NSProtocol.connect(NSProtocol.java:264)\n\tat oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1452)\n\tat oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:496)\n\t... 27 common frames omitted\n","executionsite":"root"}

But... in Loki, and in the log file itself, the stacktrace isn't readable anymore. That's not good. At least on the Loki side, something like a configuration option (fluentd plugin plus promtail) for fields would be good (e.g. multiline_fields: [stack_trace]). Then in Grafana each multiline field could be rendered in a well-formatted column (just a simple thought).
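
For what it's worth, a field like stack_trace in such a JSON-layout entry only contains escape sequences at the transport level; any JSON decoder restores the real newlines and tabs for display (the trimmed-down entry below is illustrative):

```python
import json

# The \\n and \\t below are JSON escape sequences on the wire; json.loads
# turns them back into real newlines and tabs, giving a readable trace.
entry = json.loads(
    '{"message": "Error in job timer thread occurred!", '
    '"stack_trace": "somecode.core.capi.common.SystemException: boom\\n'
    '\\tat somecode.core.internal.job.Timer.run(Timer.java:102)"}'
)
print(entry["stack_trace"])
```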

I don't know if the one-JSON-line approach is the best one, but I just wanted to note it here. In theory it might also be displayed well. For us this would be the most needed Loki+Grafana feature, since I wouldn't make other engineers work with a one-line stacktrace.

On the other side, we also need to handle logs from older environments where multi-line events such as stacktraces come into play:

[2019-08-13 20:36:44 GMT] - [Thread-138270] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 21:06:50 GMT] - [Thread-138713] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 21:36:56 GMT] - [Thread-139154] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 22:00:12 GMT] - [main] ERROR c.i.b.w.w.WebAdapterAgent: cycle failed:
java.lang.NumberFormatException: For input string: "-db error"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at somecode.webinterface.webadapteragent.PageCache.expirePages(PageCache.java:292)
	at somecode.webinterface.webadapteragent.WebAdapterAgent.loop(WebAdapterAgent.java:144)
	at somecode.webinterface.webadapteragent.WebAdapterAgent.main(WebAdapterAgent.java:40)
[2019-08-13 22:00:13 GMT] - [main] INFO  c.i.b.w.w.WebAdapterAgent: All services are now up and running

PS: Issue #616 also seems to be related to multi-line handling.

@khauser

khauser commented Aug 23, 2019

We checked with Azure Log Analytics and with Kibana.

If a JSON-formatted log entry with a stacktrace arrives in:

  • Azure Log Analytics, the table only shows the first characters of the stacktrace, but expanding the entry gives you a well-formed one.
  • Kibana, it displays the first 5 lines of the stacktrace and can also expand the log entry. Both outputs are well-formed.

@stale

stale bot commented Sep 22, 2019

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Sep 22, 2019
@cyriltovena cyriltovena added the keepalive An issue or PR that will be kept alive and never marked as stale. label Sep 22, 2019
@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Sep 22, 2019
@pstibrany
Member

pstibrany commented Sep 25, 2019

Using the timestamp stage in promtail with logs that contain lines without a timestamp (typically stack traces) results in a broken log stream.

If we use the example from #74 (comment):

[2019-08-13 22:00:12 GMT] - [main] ERROR c.i.b.w.w.WebAdapterAgent: cycle failed:
java.lang.NumberFormatException: For input string: "-db error"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at somecode.webinterface.webadapteragent.PageCache.expirePages(PageCache.java:292)
	at somecode.webinterface.webadapteragent.WebAdapterAgent.loop(WebAdapterAgent.java:144)
	at somecode.webinterface.webadapteragent.WebAdapterAgent.main(WebAdapterAgent.java:40)
[2019-08-13 22:00:13 GMT] - [main] INFO  c.i.b.w.w.WebAdapterAgent: All services are now up and running

Then the timestamp stage will extract the correct timestamp from the first and last lines, but the other lines will be assigned the current time, which will then cause subsequent lines to be out of order and rejected by Loki.
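
One possible mitigation (a sketch, not promtail's actual behavior) is to let lines without a timestamp inherit the previous line's timestamp instead of the current time:

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\]")

def assign_timestamps(lines):
    last = None
    out = []
    for line in lines:
        m = TS_RE.match(line)
        if m:
            last = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        out.append((last, line))  # continuation lines inherit `last`
    return out

lines = [
    "[2019-08-13 22:00:12 GMT] - [main] ERROR cycle failed:",
    'java.lang.NumberFormatException: For input string: "-db error"',
    "[2019-08-13 22:00:13 GMT] - [main] INFO All services are now up and running",
]
stamped = assign_timestamps(lines)
for ts, line in stamped:
    print(ts, line[:40])
```

The exception line keeps the 22:00:12 timestamp, so the stream stays ordered.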

Now tracked in #1093

@mnadeem

mnadeem commented Jan 3, 2021

image

Unfortunately as of today this is still an issue with grafana/loki:2.1.0

@cyriltovena
Contributor

Have you tried configuring promtail to match those multi-line logs? https://grafana.com/docs/loki/latest/clients/promtail/stages/multiline/

@mnadeem

mnadeem commented Jan 4, 2021

level=error ts=2021-01-04T06:56:02.891620396Z caller=main.go:106 msg="error creating promtail" error="failed to make file target manager: invalid multiline stage config: Unknown stage type: multiline"

          - job_name: spring-boot-app
            pipeline_stages:
              - multiline:
                  firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\.\d{3}'
                  max_wait_time: 1s
            static_configs:
              - targets:
                  - localhost
                labels:
                  job: spring-boot-app
                  __path__: /app/log/spring-boot-app*.log

grafana/promtail:2.1.0

Further, I have documented the way I set up Loki here:

https://reachmnadeem.wordpress.com/2021/01/02/logging-stack-for-spring-boot-application-using-grafan-loki-on-openshift-kubernetes/

@cyriltovena
Contributor

cyriltovena commented Jan 4, 2021

It has not been released in 2.1; it will be in 2.2. You can use grafana/promtail:master-ee9c629 in the meantime. Sorry for the confusion.

@F1ch18

F1ch18 commented Jan 13, 2021

grafana/promtail:master-ee9c629 doesn't work for me. I get a CrashLoopBackOff and this in the pod logs:

goroutine 1115 [sleep]:
time.Sleep(0xee6b280)
/usr/local/go/src/runtime/time.go:188 +0xbf
github.com/hpcloud/tail/watch.(*PollingFileWatcher).ChangeEvents.func1(0xc000c432e0, 0xc0009cc1d8, 0xc000ca5740, 0x2fe6480, 0xc0019f6c30, 0xc000ca5760)
/src/loki/vendor/github.com/hpcloud/tail/watch/polling.go:68 +0x8f
created by github.com/hpcloud/tail/watch.(*PollingFileWatcher).ChangeEvents
/src/loki/vendor/github.com/hpcloud/tail/watch/polling.go:59 +0x18c

goroutine 1116 [chan receive]:
github.com/grafana/loki/pkg/promtail/targets/file.(*tailer).readLines(0xc000bfc000)
/src/loki/pkg/promtail/targets/file/tailer.go:137 +0x225
created by github.com/grafana/loki/pkg/promtail/targets/file.newTailer
/src/loki/pkg/promtail/targets/file/tailer.go:81 +0x53d

goroutine 1117 [select]:
github.com/grafana/loki/pkg/promtail/targets/file.(*tailer).updatePosition(0xc000bfc000)
/src/loki/pkg/promtail/targets/file/tailer.go:101 +0x145
created by github.com/grafana/loki/pkg/promtail/targets/file.newTailer
/src/loki/pkg/promtail/targets/file/tailer.go:82 +0x565

goroutine 1118 [select]:
github.com/grafana/loki/pkg/promtail/targets/file.(*FileTarget).run(0xc000ad6480)
/src/loki/pkg/promtail/targets/file/filetarget.go:177 +0x18a
created by github.com/grafana/loki/pkg/promtail/targets/file.NewFileTarget
/src/loki/pkg/promtail/targets/file/filetarget.go:124 +0x269

goroutine 1119 [syscall]:
syscall.Syscall6(0xe8, 0x195, 0xc00218fb6c, 0x7, 0xffffffffffffffff, 0x0, 0x0, 0xfad, 0x100000002, 0xfae)
/usr/local/go/src/syscall/asm_linux_amd64.s:41 +0x5
golang.org/x/sys/unix.EpollWait(0x195, 0xc00218fb6c, 0x7, 0x7, 0xffffffffffffffff, 0xfb3, 0x100000002, 0xfb4)
/src/loki/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:76 +0x72
gopkg.in/fsnotify%2ev1.(*fdPoller).wait(0xc000a48360, 0x100000000, 0xfbe, 0x100000002)
/src/loki/vendor/gopkg.in/fsnotify.v1/inotify_poller.go:86 +0x91
gopkg.in/fsnotify%2ev1.(*Watcher).readEvents(0xc0018ea820)
/src/loki/vendor/gopkg.in/fsnotify.v1/inotify.go:192 +0x206
created by gopkg.in/fsnotify%2ev1.NewWatcher
/src/loki/vendor/gopkg.in/fsnotify.v1/inotify.go:59 +0x1a8

@cyriltovena
Contributor

The log message is cut off; could you share the full error log? Also, have you tried a recent build? I recently fixed a panic in promtail that was unreleased.

@cmde-lusional

It has not been released in 2.1; it will be in 2.2. You can use grafana/promtail:master-ee9c629 in the meantime. Sorry for the confusion.

@cyriltovena can you tell me when 2.2 is gonna be released?

@antoniocascais

antoniocascais commented Feb 23, 2021

Hi all.

I'm also facing issues similar to @mnadeem's.

I was following the docs (https://grafana.com/docs/loki/latest/clients/promtail/stages/multiline/) and added this configuration to my promtail values file (I'm installing the promtail helm chart in my cluster):

config:
  snippets:
    pipelineStages:
      - cri: {}
      - multiline:
          # Identify timestamps as first line of a multiline block. Note the string should be in single quotes.
          firstline: '^\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\]'
          max_wait_time: 3s

But promtail doesn't start:

level=error ts=2021-02-23T10:52:53.883293805Z caller=main.go:106 msg="error creating promtail" error="failed to make file target manager: invalid multiline stage config: Unknown stage type: multiline"

Using promtail v2.1.0 and helm chart version 3.1.0:

appVersion: 2.1.0
version: 3.1.0

So, this will be fixed in v2.2.0? Do we just need to wait for now?

@triwats

triwats commented Feb 23, 2021

@antoniocascais - I think so; it's lined up for release in 2.2, but as the comment above from @cyriltovena states, you can use a different Docker tag to pull in these features. This will be experimental:

It has not been released in 2.1; it will be in 2.2. You can use grafana/promtail:master-ee9c629 in the meantime. Sorry for the confusion.

@cyriltovena
Contributor

We wanted to release 2.2 this week. I don't know if we're on track; if not, next week.

We wanted to include some read-path bug fixes in there too.

@mdegat01

mdegat01 commented Mar 29, 2021

Does this capability work when scraping the system journal, or only when scraping files? I have an application dumping tracebacks line by line into the system journal and was hoping to pull those together into one log line in Loki. I've been trying to use it, but it seems to be getting ignored. Or do you have to combine it with the output stage somehow to change the text content of the log?

@cyriltovena
Contributor

Yes it does

@mdegat01

Ah, I figured it out. There was an escape sequence at the front of the text for terminal color-coding, and that was causing my regex to fail. This is great! Much nicer to look at 👍
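
For anyone else hitting this, the gotcha can be reproduced like so (the ANSI-stripping regex is a common minimal pattern, not something promtail ships):

```python
import re

# A color escape code at the start of the line defeats an anchored
# firstline regex unless it is stripped first.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
FIRSTLINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")

raw = "\x1b[31m2021-03-29 10:00:00 ERROR boom\x1b[0m"
print(bool(FIRSTLINE_RE.match(raw)))                   # False: escape code comes first
print(bool(FIRSTLINE_RE.match(ANSI_RE.sub("", raw))))  # True after stripping
```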

@triwats

triwats commented Mar 31, 2021

Just in case anyone else is watching this space for the update: this was all included in v2.2.0, which is now released. See here: https://github.com/grafana/loki/releases/tag/v2.2.0

@omelnyk10

If someone here is still alive, maybe you can help me. Is there an option to collapse the error stack trace, so that when I need to see it I just click on the arrow sign and see a well-formatted stack trace?

periklis added a commit to periklis/loki that referenced this issue Dec 6, 2021
@Jojoooo1

Has anyone had success with multi-line logs using Logback and Spring Boot?

@jeschkies
Contributor

👋 I wrote the multiline stage. @omelnyk10 and @Jojoooo1, you both want to collapse Java stack traces?

@omelnyk10

@jeschkies yes

@Jojoooo1

Exactly @jeschkies ! With the logback default encoder!

@jeschkies
Contributor

There is an example in the documentation for configuring logback. Does that help?

@Jojoooo1

Made it work with
firstline: '^\d{4}-\d{2}-\d{2}'

periklis added a commit to periklis/loki that referenced this issue Dec 2, 2022
Update from upstream repository
@dcerecedo

Can anyone share a sample grafana-agent.yml that makes use of the multiline feature? I'm not even sure the multiline config belongs there!

I have tried lots of options without success. This is my last piece of config:

logs:
  configs:
  - clients:
    - basic_auth:
        password: ${LOKI_SECRET}
        username: ${LOKI_USERNAME}
      url: ${LOKI_URL}
    name: integrations
    positions:
      filename: /tmp/positions.yaml
    scrape_configs:
    - job_name: integrations/docker
      docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
      relabel_configs:
      - action: replace
        replacement: integrations/docker
        source_labels:
        - __meta_docker_container_id
        target_label: job
      - action: replace
        replacement: ${HOSTNAME}
        source_labels:
        - __meta_docker_container_id
        target_label: instance
      - regex: /(.*)
        source_labels:
        - __meta_docker_container_name
        target_label: container
      - source_labels:
        - __meta_docker_container_log_stream
        target_label: stream
      pipeline_stages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{7}\s[+-]\d{2}:\d{2}\s\[\w+\]'
          max_wait_time: 3s
    target_config:
      sync_period: 10s

TBH, the Grafana docs in general are hard to follow. They are modular in the sense that they are scattered across the respective config sections: you can get sample blocks, but not a full sample of the agent configuration file. Therefore, it is sometimes hard to see where you are making a mistake.

@rufreakde

Made it work with firstline: '^\d{4}-\d{2}-\d{2}'

I tried the following in my values.yaml:

snippets:
  pipelineStages:
  - cri: {}
  - multiline:
      firstline: '^\d{4}-\d{2}-\d{2}'
      max_wait_time: 3s

With helm chart:

- name: promtail
  version: 6.14.1
  repo: https://grafana.github.io/helm-charts

But unfortunately it is not working at all. All logs are still collected line by line, with no grouping on the timestamp...

Any ideas? Does it still work for you, @Jojoooo1?
