Skip to content

Latest commit

 

History

History
319 lines (299 loc) · 33.1 KB

AnchorWave_Shared.md

File metadata and controls

319 lines (299 loc) · 33.1 KB

Manisha has translated the AnchorWave output to determine regions of the B73 genome that are shared or unique in each other NAM line.

Here's what I'm going to do:
Take my list of 175 candidate B73 NLRs (from the NAM paper) and ask if each of these falls in a shared, polymorphic, or ambiguous region
against each NAM genome. This will tell me something about conservation of each NLR.

Next I will create a coordinate list of 2kb up and 2kb down from each NLR and intersect this with TE annotation to determine if these are
share TEs.
I will first use BEDtools merge to merge overlapping intervals (these are in effect, NLR clusters), followed by BEDtools intersect with TEs.

Open to view Data
V1 V2 V3 V4 V5 NLR KKNewList Distance between NLRs Cluster(3kb) 2kbflank1 2kbflank2
chr1 35063891 35068791 Zm00001eb010990 Gene NLR KK_new_list 17783294 NO 35061891 35070791
chr1 52852085 52855107 Zm00001eb015450 Gene NLR KK_new_list 46029646 NO 52850085 52857107
chr1 98884753 98889758 Zm00001eb024240 Gene NLR KK_new_list 69097442 NO 98882753 98891758
chr1 167987200 167991827 Zm00001eb030390 Gene NLR KK_new_list 16436246 NO 167985200 167993827
chr1 184428073 184439205 Zm00001eb033040 Gene NLR KK_new_list 16603702 NO 184426073 184441205
chr1 201042907 201046494 Zm00001eb037410 Gene NLR KK_new_list 25125093 NO 201040907 201048494
chr1 226171587 226177701 Zm00001eb042890 Gene NLR KK_new_list 4731839 NO 226169587 226179701
chr1 230909540 230913731 Zm00001eb044020 Gene NLR KK_new_list 66776409 NO 230907540 230915731
chr1 297690140 297701616 Zm00001eb061810 Gene NLR KK_new_list 4192227 NO 297688140 297703616
chr1 301893843 301896543 Zm00001eb063200 Gene NLR #N/A 304 CLUSTER 301891843 301898543
chr1 301896847 301904989 Zm00001eb063210 Gene NLR #N/A NO NO 301894847 301906989
chr10 1566941 1573505 Zm00001eb405270 Gene NLR KK_new_list 68196 NO 1564941 1575505
chr10 1641701 1648836 Zm00001eb405290 Gene NLR KK_new_list 123082 NO 1639701 1650836
chr10 1771918 1776181 Zm00001eb405370 Gene NLR KK_new_list 156652 NO 1769918 1778181
chr10 1932833 1934044 Zm00001eb405380 Gene NLR #N/A 460043 NO 1930833 1936044
chr10 2394087 2400185 Zm00001eb405670 Gene NLR REAL 120346 NO 2392087 2402185
chr10 2520531 2524130 Zm00001eb405700 Gene NLR KK_new_list 180163 NO 2518531 2526130
chr10 2704293 2715986 Zm00001eb405770 Gene NLR KK_new_list 106322 NO 2702293 2717986
chr10 2822308 2827757 Zm00001eb405860 Gene NLR KK_new_list 5174 NO 2820308 2829757
chr10 2832931 2844283 Zm00001eb405870 Gene NLR KK_new_list 67247 NO 2830931 2846283
chr10 2911530 3233043 Zm00001eb405880 Gene NLR KK_new_list 1637 CLUSTER 2909530 3235043
chr10 3234680 3236152 Zm00001eb405900 Gene NLR KK_new_list 11522 NO 3232680 3238152
chr10 3247674 3253625 Zm00001eb405910 Gene NLR KK_new_list 64488 NO 3245674 3255625
chr10 3318113 3319095 Zm00001eb405920 Gene NLR #N/A 14844 NO 3316113 3321095
chr10 3333939 3335714 Zm00001eb405930 Gene NLR KK_new_list 63865 NO 3331939 3337714
chr10 3399579 3404352 Zm00001eb405940 Gene NLR KK_new_list 37416 NO 3397579 3406352
chr10 3441768 3447157 Zm00001eb405960 Gene NLR KK_new_list 79174 NO 3439768 3449157
chr10 3526331 3532500 Zm00001eb405980 Gene NLR KK_new_list 1056140 NO 3524331 3534500
chr10 4588640 4593178 Zm00001eb406540 Gene NLR KK_new_list 7346 NO 4586640 4595178
chr10 4600524 4602520 Zm00001eb406550 Gene NLR #N/A 5138170 NO 4598524 4604520
chr10 9740690 9743607 Zm00001eb407930 Gene NLR KK_new_list 492 CLUSTER 9738690 9745607
chr10 9744099 9760530 Zm00001eb407940 Gene NLR KK_new_list 18368675 NO 9742099 9762530
chr10 28129205 28134407 Zm00001eb410900 Gene NLR KK_new_list 33080943 NO 28127205 28136407
chr10 61215350 61217574 Zm00001eb413380 Gene NLR #N/A 9403365 NO 61213350 61219574
chr10 70620939 70627663 Zm00001eb414490 Gene NLR KK_new_list 14696668 NO 70618939 70629663
chr10 85324331 85329344 Zm00001eb416830 Gene NLR KK_new_list 11610955 NO 85322331 85331344
chr10 96940299 96944201 Zm00001eb418840 Gene NLR KK_new_list 2429330 NO 96938299 96946201
chr10 99373531 99389974 Zm00001eb419270 Gene NLR KK_new_list 56800 NO 99371531 99391974
chr10 99446774 99453384 Zm00001eb419300 Gene NLR #N/A 3870 NO 99444774 99455384
chr10 99457254 99458927 Zm00001eb419310 Gene NLR #N/A 2863 CLUSTER 99455254 99460927
chr10 99461790 99471718 Zm00001eb419320 Gene NLR KK_new_list 328018 NO 99459790 99473718
chr10 99799736 99813696 Zm00001eb419360 Gene NLR KK_new_list 20820295 NO 99797736 99815696
chr10 120633991 120636293 Zm00001eb422890 Gene NLR KK_new_list 70 CLUSTER 120631991 120638293
chr10 120636363 120636791 Zm00001eb422900 Gene NLR #N/A 58 CLUSTER 120634363 120638791
chr10 120636849 120638732 Zm00001eb422910 Gene NLR #N/A NO NO 120634849 120640732
chr2 30418373 30422931 Zm00001eb077540 Gene NLR KK_new_list 69743611 NO 30416373 30424931
chr2 100166542 100171447 Zm00001eb087590 Gene NLR KK_new_list 16521675 NO 100164542 100173447
chr2 116693122 116696752 Zm00001eb089490 Gene NLR KK_new_list 21254330 NO 116691122 116698752
chr2 137951082 137962100 Zm00001eb091490 Gene NLR KK_new_list 92688 NO 137949082 137964100
chr2 138054788 138061170 Zm00001eb091500 Gene NLR KK_new_list 77889619 NO 138052788 138063170
chr2 215950789 215955047 Zm00001eb108350 Gene NLR KK_new_list 4230076 NO 215948789 215957047
chr2 220185123 220197261 Zm00001eb110140 Gene NLR KK_new_list 759705 NO 220183123 220199261
chr2 220956966 220959768 Zm00001eb110490 Gene NLR KK_new_list 7014141 NO 220954966 220961768
chr2 227973909 227977823 Zm00001eb112770 Gene NLR KK_new_list 3651346 NO 227971909 227979823
chr2 231629169 231633219 Zm00001eb113900 Gene NLR KK_new_list 3682546 NO 231627169 231635219
chr2 235315765 235322044 Zm00001eb115030 Gene NLR KK_new_list 8073 NO 235313765 235324044
chr2 235330117 235334226 Zm00001eb115050 Gene NLR KK_new_list 3396722 NO 235328117 235336226
chr2 238730948 238749473 Zm00001eb116510 Gene NLR KK_new_list 2618848 NO 238728948 238751473
chr2 241368321 241373187 Zm00001eb117700 Gene NLR KK_new_list 19244 NO 241366321 241375187
chr2 241392431 241398053 Zm00001eb117720 Gene NLR KK_new_list 692789 NO 241390431 241400053
chr2 242090842 242093725 Zm00001eb118040 Gene NLR KK_new_list 2792 CLUSTER 242088842 242095725
chr2 242096517 242119653 Zm00001eb118050 Gene NLR #N/A 684709 NO 242094517 242121653
chr2 242804362 242806337 Zm00001eb118300 Gene NLR #N/A NO NO 242802362 242808337
chr3 61902414 61905310 Zm00001eb131200 Gene NLR KK_new_list 52782073 NO 61900414 61907310
chr3 114687383 114691564 Zm00001eb134970 Gene NLR KK_new_list 567889 NO 114685383 114693564
chr3 115259453 115262468 Zm00001eb135090 Gene NLR #N/A 79384 NO 115257453 115264468
chr3 115341852 115348371 Zm00001eb135110 Gene NLR KK_new_list 182487 NO 115339852 115350371
chr3 115530858 115534811 Zm00001eb135130 Gene NLR #N/A 13839316 NO 115528858 115536811
chr3 129374127 129377587 Zm00001eb136790 Gene NLR KK_new_list 4575209 NO 129372127 129379587
chr3 133952796 133954515 Zm00001eb137530 Gene NLR KK_new_list 214575 NO 133950796 133956515
chr3 134169090 134170927 Zm00001eb137570 Gene NLR KK_new_list 4850105 NO 134167090 134172927
chr3 139021032 139026768 Zm00001eb138420 Gene NLR KK_new_list 54403124 NO 139019032 139028768
chr3 193429892 193432838 Zm00001eb150750 Gene NLR KK_new_list 143097 NO 193427892 193434838
chr3 193575935 193577493 Zm00001eb150770 Gene NLR KK_new_list -1491 CLUSTER 193573935 193579493
chr3 193576002 193577458 Zm00001eb150780 Gene NLR #N/A 1258787 NO 193574002 193579458
chr3 194836245 194840589 Zm00001eb151150 Gene NLR KK_new_list 22606342 NO 194834245 194842589
chr3 217446931 217477147 Zm00001eb157730 Gene NLR #N/A NO NO 217444931 217479147
chr4 1425175 1438104 Zm00001eb164570 Gene NLR KK_new_list 158725 NO 1423175 1440104
chr4 1596829 1601927 Zm00001eb164630 Gene NLR KK_new_list 566694 NO 1594829 1603927
chr4 2168621 2172051 Zm00001eb164870 Gene NLR KK_new_list 133642 NO 2166621 2174051
chr4 2305693 2307244 Zm00001eb164880 Gene NLR KK_new_list 4531 NO 2303693 2309244
chr4 2311775 2314207 Zm00001eb164890 Gene NLR #N/A 201196 NO 2309775 2316207
chr4 2515403 2516954 Zm00001eb164900 Gene NLR KK_new_list 82342 NO 2513403 2518954
chr4 2599296 2600296 Zm00001eb164910 Gene NLR #N/A 52855 NO 2597296 2602296
chr4 2653151 2654841 Zm00001eb164920 Gene NLR #N/A 68294 NO 2651151 2656841
chr4 2723135 2724832 Zm00001eb164930 Gene NLR #N/A 999 CLUSTER 2721135 2726832
chr4 2725831 2739388 Zm00001eb164940 Gene NLR KK_new_list 495405 NO 2723831 2741388
chr4 3234793 3255162 Zm00001eb165170 Gene NLR KK_new_list 4721 NO 3232793 3257162
chr4 3259883 3265271 Zm00001eb165200 Gene NLR #N/A 13664841 NO 3257883 3267271
chr4 16930112 16940894 Zm00001eb169030 Gene NLR KK_new_list 27384355 NO 16928112 16942894
chr4 44325249 44345549 Zm00001eb174770 Gene NLR KK_new_list 146565946 NO 44323249 44347549
chr4 190911495 190915285 Zm00001eb195760 Gene NLR KK_new_list 1988149 NO 190909495 190917285
chr4 192903434 192908835 Zm00001eb196580 Gene NLR KK_new_list 2482016 NO 192901434 192910835
chr4 195390851 195393586 Zm00001eb197290 Gene NLR KK_new_list 8839090 NO 195388851 195395586
chr4 204232676 204237107 Zm00001eb199520 Gene NLR #N/A 2675069 NO 204230676 204239107
chr4 206912176 206922066 Zm00001eb200120 Gene NLR KK_new_list 845029 NO 206910176 206924066
chr4 207767095 207769977 Zm00001eb200420 Gene NLR KK_new_list 1671688 NO 207765095 207771977
chr4 209441665 209452490 Zm00001eb200700 Gene NLR KK_new_list 45050 NO 209439665 209454490
chr4 209497540 209503851 Zm00001eb200710 Gene NLR KK_new_list 181358 NO 209495540 209505851
chr4 209685209 209708969 Zm00001eb200740 Gene NLR KK_new_list 51628 NO 209683209 209710969
chr4 209760597 209782508 Zm00001eb200750 Gene NLR KK_new_list 54685 NO 209758597 209784508
chr4 209837193 209843525 Zm00001eb200760 Gene NLR KK_new_list 10905736 NO 209835193 209845525
chr4 220749261 220759638 Zm00001eb202350 Gene NLR KK_new_list 3877817 NO 220747261 220761638
chr4 224637455 224643701 Zm00001eb202940 Gene NLR KK_new_list 15781441 NO 224635455 224645701
chr4 240425142 240431109 Zm00001eb205560 Gene NLR KK_new_list NO NO 240423142 240433109
chr5 21888981 21895227 Zm00001eb219900 Gene NLR #N/A -5933 CLUSTER 21886981 21897227
chr5 21889294 21890637 Zm00001eb219910 Gene NLR KK_new_list 50 CLUSTER 21887294 21892637
chr5 21890687 21891459 Zm00001eb219920 Gene NLR #N/A 618 CLUSTER 21888687 21893459
chr5 21892077 21892562 Zm00001eb219930 Gene NLR #N/A 19471661 NO 21890077 21894562
chr5 41364223 41365909 Zm00001eb224260 Gene NLR #N/A 4777 NO 41362223 41367909
chr5 41370686 41372354 Zm00001eb224270 Gene NLR KK_new_list 15861435 NO 41368686 41374354
chr5 57233789 57238852 Zm00001eb226690 Gene NLR KK_new_list 116569 NO 57231789 57240852
chr5 57355421 57359985 Zm00001eb226700 Gene NLR KK_new_list 5630 NO 57353421 57361985
chr5 57365615 57387844 Zm00001eb226710 Gene NLR KK_new_list 216990 NO 57363615 57389844
chr5 57604834 57609095 Zm00001eb226720 Gene NLR KK_new_list 179769 NO 57602834 57611095
chr5 57788864 57790637 Zm00001eb226760 Gene NLR KK_new_list 536179 NO 57786864 57792637
chr5 58326816 58330060 Zm00001eb226880 Gene NLR #N/A 731414 NO 58324816 58332060
chr5 59061474 59069065 Zm00001eb227070 Gene NLR KK_new_list 7647611 NO 59059474 59071065
chr5 66716676 66721872 Zm00001eb228790 Gene NLR KK_new_list 113625292 NO 66714676 66723872
chr5 180347164 180347727 Zm00001eb245050 Gene NLR KK_new_list 33061524 NO 180345164 180349727
chr5 213409251 213413332 Zm00001eb253770 Gene NLR KK_new_list NO NO 213407251 213415332
chr6 11507155 11511452 Zm00001eb261200 Gene NLR KK_new_list 1298451 NO 11505155 11513452
chr6 12809903 12814283 Zm00001eb261570 Gene NLR #N/A 266742 NO 12807903 12816283
chr6 13081025 13084717 Zm00001eb261610 Gene NLR #N/A 507041 NO 13079025 13086717
chr6 13591758 13592480 Zm00001eb261630 Gene NLR #N/A 15 CLUSTER 13589758 13594480
chr6 13592495 13593144 Zm00001eb261640 Gene NLR #N/A 122075 NO 13590495 13595144
chr6 13715219 13719719 Zm00001eb261660 Gene NLR KK_new_list 54430318 NO 13713219 13721719
chr6 68150037 68154834 Zm00001eb268960 Gene NLR #N/A 19998441 NO 68148037 68156834
chr6 88153275 88157186 Zm00001eb271410 Gene NLR KK_new_list 51457594 NO 88151275 88159186
chr6 139614780 139623752 Zm00001eb283180 Gene NLR #N/A 20981 NO 139612780 139625752
chr6 139644733 139655572 Zm00001eb283200 Gene NLR KK_new_list 27405575 NO 139642733 139657572
chr6 167061147 167065592 Zm00001eb291370 Gene NLR KK_new_list NO NO 167059147 167067592
chr7 2369632 2371252 Zm00001eb298790 Gene NLR #N/A 39509 NO 2367632 2373252
chr7 2410761 2412587 Zm00001eb298800 Gene NLR KK_new_list 156471 NO 2408761 2414587
chr7 2569058 2571228 Zm00001eb298830 Gene NLR KK_new_list 8286 NO 2567058 2573228
chr7 2579514 2580628 Zm00001eb298840 Gene NLR #N/A 884 CLUSTER 2577514 2582628
chr7 2581512 2632086 Zm00001eb298860 Gene NLR #N/A -37156 CLUSTER 2579512 2634086
chr7 2594930 2595683 Zm00001eb298880 Gene NLR #N/A 3374 NO 2592930 2597683
chr7 2599057 2600661 Zm00001eb298890 Gene NLR KK_new_list 23490 NO 2597057 2602661
chr7 2624151 2624891 Zm00001eb298920 Gene NLR #N/A 11847 NO 2622151 2626891
chr7 2636738 2637131 Zm00001eb298930 Gene NLR #N/A 79659 NO 2634738 2639131
chr7 2716790 2719122 Zm00001eb299040 Gene NLR #N/A 664 CLUSTER 2714790 2721122
chr7 2719786 2720319 Zm00001eb299050 Gene NLR #N/A 90995 NO 2717786 2722319
chr7 2811314 2813424 Zm00001eb299080 Gene NLR #N/A 6957 NO 2809314 2815424
chr7 2820381 2823397 Zm00001eb299090 Gene NLR #N/A 26252 NO 2818381 2825397
chr7 2849649 2851885 Zm00001eb299100 Gene NLR #N/A 108508 NO 2847649 2853885
chr7 2960393 2961094 Zm00001eb299160 Gene NLR #N/A 2141611 NO 2958393 2963094
chr7 5102705 5110265 Zm00001eb299830 Gene NLR KK_new_list -5765 CLUSTER 5100705 5112265
chr7 5104500 5105934 Zm00001eb299840 Gene NLR KK_new_list 24097449 NO 5102500 5107934
chr7 29203383 29247396 Zm00001eb304830 Gene NLR KK_new_list -43750 CLUSTER 29201383 29249396
chr7 29203646 29204114 Zm00001eb304840 Gene NLR #N/A 43478 NO 29201646 29206114
chr7 29247592 29260760 Zm00001eb304860 Gene NLR KK_new_list -3574 CLUSTER 29245592 29262760
chr7 29257186 29261106 Zm00001eb304870 Gene NLR #N/A 288303 NO 29255186 29263106
chr7 29549409 29552744 Zm00001eb304920 Gene NLR KK_new_list 64680164 NO 29547409 29554744
chr7 94232908 94252757 Zm00001eb310010 Gene NLR KK_new_list 867544 NO 94230908 94254757
chr7 95120301 95157771 Zm00001eb310060 Gene NLR KK_new_list 51095058 NO 95118301 95159771
chr7 146252829 146267516 Zm00001eb318600 Gene NLR KK_new_list 8841320 NO 146250829 146269516
chr7 155108836 155113412 Zm00001eb321430 Gene NLR KK_new_list 30719 NO 155106836 155115412
chr7 155144131 155158389 Zm00001eb321440 Gene NLR KK_new_list 4715815 NO 155142131 155160389
chr7 159874204 159899722 Zm00001eb322130 Gene NLR KK_new_list NO NO 159872204 159901722
chr8 29814265 29817242 Zm00001eb339320 Gene NLR KK_new_list 42253927 NO 29812265 29819242
chr8 72071169 72072142 Zm00001eb343880 Gene NLR #N/A 2 CLUSTER 72069169 72074142
chr8 72072144 72074447 Zm00001eb343890 Gene NLR KK_new_list 34485784 NO 72070144 72076447
chr8 106560231 106562858 Zm00001eb349330 Gene NLR KK_new_list 47724 NO 106558231 106564858
chr8 106610582 106614710 Zm00001eb349360 Gene NLR KK_new_list 28518429 NO 106608582 106616710
chr8 135133139 135142568 Zm00001eb355090 Gene NLR #N/A -8798 CLUSTER 135131139 135144568
chr8 135133770 135142391 Zm00001eb355100 Gene NLR KK_new_list 1765696 NO 135131770 135144391
chr8 136908087 136909619 Zm00001eb355630 Gene NLR KK_new_list 22666527 NO 136906087 136911619
chr8 159576146 159578040 Zm00001eb361650 Gene NLR KK_new_list 37661 NO 159574146 159580040
chr8 159615701 159617260 Zm00001eb361660 Gene NLR KK_new_list 7132087 NO 159613701 159619260
chr8 166749347 166753682 Zm00001eb363970 Gene NLR KK_new_list NO NO 166747347 166755682
chr9 2880701 2881264 Zm00001eb371700 Gene NLR KK_new_list 18158608 NO 2878701 2883264
chr9 21039872 21045940 Zm00001eb376840 Gene NLR KK_new_list 6001849 NO 21037872 21047940
chr9 27047789 27052392 Zm00001eb378630 Gene NLR KK_new_list 93548999 NO 27045789 27054392
chr9 120601391 120602364 Zm00001eb391090 Gene NLR #N/A 0 CLUSTER 120599391 120604364
chr9 120602364 120604669 Zm00001eb391100 Gene NLR KK_new_list NO NO 120600364 120606669

Manisha has classified each gene into one of three categories based on the AnchorWave alignment - 1. Shared (95% of gene falls in shared block)2. Polymorphic (95% of gene falls in polymorphic block) or 3. Ambiguous (doesn't fit the other two categories)

Each B73 vs NAM comparison looks like this:
col5 is the percent of the gene in shared block(s)
col6 is percent of gene in a B73 insertion seq
col7 will always be zero in the B73 vs files
col8 is percent of gene in unalignable block
col9 is percent missing data
col10 is a list of the Anchorwave blocks that correspond to the region
col11 is the classification of the gene

head B73_B97_gene_classification_by_full.tsv 
id_name chr start end alignable_region structural_insertion_inB73 structural_insertion_inB97 unalignable Missing_Data AW_Blocks classification
Zm00001eb000010_T001 chr1 34617 40204 1 0 0 0 0 chr1_AW_BlockID_8 shared
Zm00001eb000020_T001 chr1 41214 46762 1 0 0 0 0 chr1_AW_BlockID_8 shared
Zm00001eb000050_T001 chr1 108554 114382 0.605181880576527 0.394303363074811 0 0 0 chr1_AW_BlockID_8,chr1_AW_BlockID_9,chr1_AW_BlockID_10,chr1_AW_BlockID_11 ambiguous
Zm00001eb000060_T001 chr1 188559 189581 1 0 0 0 0 chr1_AW_BlockID_16 shared
Zm00001eb000070_T001 chr1 190192 198832 1 0 0 0 0 chr1_AW_BlockID_16 shared
Zm00001eb000080_T001 chr1 200262 203393 1 0 0 0 0 chr1_AW_BlockID_16 shared
Zm00001eb000100_T001 chr1 206619 209723 0.999355670103093 0 0 0 0 chr1_AW_BlockID_16,chr1_AW_BlockID_17,chr1_AW_BlockID_18 shared
Zm00001eb000110_T001 chr1 246422 247242 1 0 0 0 0 chr1_AW_BlockID_22 shared
Zm00001eb000120_T001 chr1 315219 315846 1 0 0 0 0 chr1_AW_BlockID_22 shared

I'm 'grepping' out all the NLR rows from each comparison (I'm 100% sure there's a better way to do this)

(base) grep -f B73_GeneIDs.txt B73_CML247_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML247_classification.txt
(base) grep -f B73_GeneIDs.txt B73_CML277_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML277_classification.txt
(base) grep -f B73_GeneIDs.txt B73_CML322_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML322_classification.txt
(base) grep -f B73_GeneIDs.txt B73_CML333_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML333_classification.txt
(base) grep -f B73_GeneIDs.txt B73_CML52_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML52_classification.txt
(base) grep -f B73_GeneIDs.txt B73_CML69_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_CML69_classification.txt
(base) grep -f B73_GeneIDs.txt B73_HP301_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_HP301_classification.txt
(base) grep -f B73_GeneIDs.txt B73_IL14H_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_IL14H_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Il14H_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Il14H_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Ki11_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Ki11_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Ki3_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Ki3_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Ky21_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Ky21_classification.txt
(base) grep -f B73_GeneIDs.txt B73_M162W_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_M162W_classification.txt
(base) grep -f B73_GeneIDs.txt B73_M37W_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_M37W_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Mo18W_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Mo18W_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Ms71_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Ms71_classification.txt
(base) grep -f B73_GeneIDs.txt B73_NC350_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_NC350_classification.txt
(base) grep -f B73_GeneIDs.txt B73_NC358_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_NC358_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Oh43_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Oh43_classification.txt
(base) grep -f B73_GeneIDs.txt B73_O7B_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Oh7B_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Oh7B_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Oh7B_classification.txt
(base) grep -f B73_GeneIDs.txt B73_P39_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_P39_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Tx303_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Tx303_classification.txt
(base) grep -f B73_GeneIDs.txt B73_Tzi8_gene_classification_by_full.tsv > NLR_classification/B73_NLRs_Tzi8_classification.txt

After this I'm going into VIM and adding a column to the end with the cultivar identifier so I can cat them into a single file

#here's an example of the VIM command
#:%s/$/ B97/g

#and here's how I concatenated everything
cat *.txt > B73_NLRs_All_classification.txt

Here's a table showing the counts of shared, polymorphic, and ambiguous NLR genes by cultivar (B73 vs NAM) when looking at
Exons Only -- Note that I am filtering the polymorphic NLRs from this group because the they're too different to do SV analysis

ambiguous polymorphic shared
B97 19 32 125
CML103 26 32 118
CML228 23 33 120
CML247 36 32 108
CML277 35 27 114
CML322 38 29 109
CML333 16 43 117
CML52 38 17 121
CML69 34 25 117
HP301 23 42 111
Il14H 35 24 117
Ki11 28 32 116
Ki3 34 27 115
Ky21 30 27 119
M162W 27 33 116
M37W 28 32 116
Mo18W 30 22 124
Ms71 28 28 120
NC350 16 40 120
NC358 34 21 121
Oh43 25 27 124
P39 33 24 119
Tx303 21 41 114
Tzi8 27 38 111

And here's a comparison when you expand beyond exons and include the entire gene sequence pluse 1kb up and downstream

ambiguous polymorphic shared
B97 76 27 73
CML103 76 30 70
CML228 79 29 68
CML247 96 27 53
CML277 94 20 62
CML322 96 24 56
CML333 85 36 55
CML52 97 13 66
CML69 93 19 64
HP301 79 37 60
Il14H 87 23 66
Ki11 82 30 64
Ki3 84 24 68
Ky21 78 23 75
M162W 77 30 69
M37W 78 30 68
Mo18W 91 16 69
Ms71 76 22 78
NC350 81 33 62
NC358 87 20 69
Oh43 74 24 78
P39 81 21 74
Tx303 71 35 70
Tzi8 84 34 58