Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract void *aggregate pointer in struct route_node #2785

Merged
merged 3 commits into from
Aug 31, 2018

Conversation

donaldsharp
Copy link
Member

The struct route_node data structure has a void *aggregate pointer that is being used by rfapi and ripngd. For a full bgp feed, this is causing an extra 8 bytes per node over 1.2 million nodes in bgp and zebra alone.

Create a struct agg_node and a struct agg_table data structure that we can replace into rfapi and ripng, then work magic over those code bases to convert over to the new data structure. Finally move the aggregate pointer in to the struct agg_node data structure.

Copy link
Member

@louberger louberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something is wrong in rfapi changes:
seeing:

Program received signal SIGSEGV, Segmentation fault.
rfapiRibFTDFilterRecentPrefix (rfd=0xdc4810, it_rn=0xdc3980,
pfx_target_original=0x7fffffffb830) at rfapi/rfapi_rib.c:1808
1808 if (bgp->rfapi_cfg->rfp_cfg.download_type != RFAPI_RFP_DOWNLOAD_FULL)
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.212.el6.x86_64 json-c-0.11-13.el6.x86_64 libattr-2.4.44-7.el6.x86_64 libcap-2.16-5.5.el6.x86_64 libgcc-4.4.7-18.el6.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64
(gdb) where
#0 rfapiRibFTDFilterRecentPrefix (rfd=0xdc4810, it_rn=0xdc3980,
pfx_target_original=0x7fffffffb830) at rfapi/rfapi_rib.c:1808
#1 0x000000000048cf5a in rfapiNhlAddNodeRoutes (rn=0xdc3980, rprefix=0x7fffffffb740,
lifetime=45, removed=0, head=0x7fffffffb768, tail=0x7fffffffb760,
exclude_vnaddr=0xdc4428, rfd_rib_node=0xdc87e0, pfx_target_original=0x7fffffffb830)
at rfapi/rfapi_import.c:1555
#2 0x000000000048d820 in rfapiRouteNode2NextHopList (rn=0xdc3980, lifetime=45,
exclude_vnaddr=0xdc4428, rfd_rib_table=0xdc4810, pfx_target_original=0x7fffffffb830)
at rfapi/rfapi_import.c:1786
#3 0x0000000000491be8 in rfapi_query_inner (handle=0xdc4410, target=0xdb77f4,
l2o=, ppNextHopEntry=0x7fffffffd9e8) at rfapi/rfapi.c:1755
#4 rfapi_query (handle=0xdc4410, target=0xdb77f4, l2o=,
ppNextHopEntry=0x7fffffffd9e8) at rfapi/rfapi.c:2755
#5 0x00000000004c6a15 in rfp_query_impl (data=0xdc2630) at rfp_query.c:237
#6 0x00000000004cd4ae in rfp_dispatcher_accept (thread=)
at rfp_dispatcher.c:280
#7 0x00000000004f230f in thread_call (thread=0x7fffffffdca0) at lib/thread.c:1577
#8 0x00000000004dce3e in frr_run (master=0x7b86e0) at lib/libfrr.c:914
#9 0x000000000042378d in main (argc=19, argv=0x7fffffffded8) at bgp_main.c:432

@donaldsharp
Copy link
Member Author

Update -> I have not been able to get back to this until today. I will look at it this afternoon and see if I can fix this up.

@donaldsharp
Copy link
Member Author

Pushing changes up to fix merge error so I can work on changes on another box.

@louberger
Copy link
Member

latest crash in labn ci testing:
#0 0x000000331ec324f5 in raise () from /lib64/libc.so.6
(gdb) #0 0x000000331ec324f5 in raise () from /lib64/libc.so.6
#1 0x000000331ec33cd5 in abort () from /lib64/libc.so.6
#2 0x00000000004ec88f in core_handler (signo=11, siginfo=0x7ffc40b7b0f0,
context=0x7ffc40b7afc0) at lib/sigevent.c:249
#3
#4 rfapiRibFTDFilterRecentPrefix (rfd=0x1d1fd60, it_rn=0x1d0fdf0,
pfx_target_original=0x7ffc40b7b810) at rfapi/rfapi_rib.c:1808
#5 0x000000000048d8ba in rfapiNhlAddNodeRoutes (rn=0x1d0fdf0, rprefix=)
at rfapi/rfapi_import.c:1556
#6 0x000000000048e17c in rfapiRouteNode2NextHopList (rn=0x1d0fdf0, lifetime=)
at rfapi/rfapi_import.c:1787
#7 0x000000000049255d in rfapi_query (handle=0x1c59ec0, target=0x1d0fbc4,
l2o=, ppNextHopEntry=) at rfapi/rfapi.c:1756
#8 0x00000000004c7955 in rfp_query_impl (data=0x1d12db0) at rfp_query.c:237
#9 0x00000000004ce3ee in rfp_dispatcher (thread=)
at rfp_dispatcher.c:280
#10 0x00000000004f3b19 in thread_call (thread=0x7ffc40b7dc80)
at lib/thread.c:1578
#11 0x00000000004de58d in frr_run (master=0x173c6e0) at lib/libfrr.c:925
#12 0x0000000000423b3e in main (argc=19, argv=) at bgp_main.c:452
(gdb)

@LabN-CI
Copy link
Collaborator

LabN-CI commented Aug 30, 2018

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/2785 58570de
Date 08/30/2018
Start 11:34:33
Finish 11:57:52
Run-Time 23:19
Total 1816
Pass 1816
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2018-08-30-11:34:33.txt
Log autoscript-2018-08-30-11:35:18.log.bz2

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5111/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


Warnings Generated during build:

Checkout code: Successful with additional warnings:

Report for agg_table.c | 4 issues
===============================================
WARNING: externs should be avoided in .c files
#49: FILE: /tmp/f1-12624/agg_table.c:49:
+extern struct agg_table *agg_table_init(void)
Report for agg_table.h | 8 issues
===============================================
WARNING: please, no spaces at the start of a line
#66: FILE: /tmp/f1-12624/agg_table.h:66:
+       atable->info = data;$

WARNING: please, no spaces at the start of a line
#71: FILE: /tmp/f1-12624/agg_table.h:71:
+       return atable->info;$

CLANG Static Analyzer Summary

  • Github Pull Request 2785, comparing to Git base SHA 66a9aa8

Fixed warnings:

  • Memory error: Use-after-free in dpd/lde.c, function lde_address_list_free, line 1624
  • Memory error: Use-after-free in ib/imsg-buffer.c, function msgbuf_clear, line 199
  • Memory error: Use-after-free in ib/imsg.c, function imsg_get_fd, line 305
  • Logic error: Assigned value is garbage or undefined in ib/skiplist.c, function skiplist_insert, line 224

New warnings:

Static Analysis warning summary compared to base:

  • Fixed warnings: 4
  • New warnings: 5

5 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5111/artifact/shared/static_analysis/index.html

@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from LabN-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@FRRouting FRRouting deleted a comment from NetDEF-CI Aug 30, 2018
@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5117/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


Warnings Generated during build:

Checkout code: Successful with additional warnings:

Report for agg_table.c | 4 issues
===============================================
WARNING: externs should be avoided in .c files
#49: FILE: /tmp/f1-25460/agg_table.c:49:
+extern struct agg_table *agg_table_init(void)
Report for agg_table.h | 8 issues
===============================================
WARNING: please, no spaces at the start of a line
#66: FILE: /tmp/f1-25460/agg_table.h:66:
+       atable->info = data;$

WARNING: please, no spaces at the start of a line
#71: FILE: /tmp/f1-25460/agg_table.h:71:
+       return atable->info;$
Report for rfapi_rib.c | 2 issues
===============================================
< WARNING: suspect code indent for conditional statements (16, 20)
< #604: FILE: /tmp/f1-25460/rfapi_rib.c:604:

CLANG Static Analyzer Summary

  • Github Pull Request 2785, comparing to Git base SHA a91a5dc

No Changes in Static Analysis warnings compared to base

4 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5117/artifact/shared/static_analysis/index.html

@LabN-CI
Copy link
Collaborator

LabN-CI commented Aug 30, 2018

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/2785 93e852d
Date 08/30/2018
Start 15:31:24
Finish 15:54:35
Run-Time 23:11
Total 1813
Pass 1813
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2018-08-30-15:31:24.txt
Log autoscript-2018-08-30-15:32:08.log.bz2

For details, please contact louberger

@donaldsharp
Copy link
Member Author

On a 32 bit system with a full bgp feed:
Pre-changes:
annie# show mem
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 378 MiB
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 292 MiB

Post changes:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 364 MiB
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: 281 MiB

About 14 mb per process for a full bgp feed. So we are saving 28mb

@louberger
Copy link
Member

louberger commented Aug 30, 2018 via email

Add a abstraction for `struct route_node` and `struct route_table`
such that we can have an aggregate route_node and table.  This
is because only bgp/rfapi and ripng use the aggregate data pointer
in `struct route_node`.  For full route tables other routing
protocols and tables are paying a 8 byte overhead per node.
A full bgp table ends up being ~1.2 million routes in bgp
and zebra.  This is not an insiginificant amount of data.

So create the data structures for this replacement, but
do not replace the aggregate pointer yet.  This is because
later commits will convert rfapi and ripng over to this
new data, and finally we'll move the aggregate pointer.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Switch bgp and ripngd to use the new aggregate table and
route data structures.  This was mainly a search and replace
operation.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the aggregate pointer from the route_node into agg_node
so that people using struct route_node will see a savings
in data size.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
@donaldsharp
Copy link
Member Author

I think so, especially once we start cleaning up the struct bgp_node and cleanup the struct nexthop usage in zebra. This is significant

@LabN-CI
Copy link
Collaborator

LabN-CI commented Aug 30, 2018

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/2785 c2b2356
Date 08/30/2018
Start 18:01:24
Finish 18:25:00
Run-Time 23:36
Total 1813
Pass 1813
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2018-08-30-18:01:24.txt
Log autoscript-2018-08-30-18:02:10.log.bz2

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5123/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


CLANG Static Analyzer Summary

  • Github Pull Request 2785, comparing to Git base SHA a91a5dc

No Changes in Static Analysis warnings compared to base

4 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5123/artifact/shared/static_analysis/index.html

@louberger louberger merged commit e93c633 into FRRouting:master Aug 31, 2018
@donaldsharp donaldsharp deleted the AGGanomics branch September 13, 2018 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants