Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a convenience function to return the largest connected component of a graph #785

Closed
ngmaclaren opened this issue Apr 18, 2023 · 17 comments · Fixed by #786
Closed

Add a convenience function to return the largest connected component of a graph #785

ngmaclaren opened this issue Apr 18, 2023 · 17 comments · Fixed by #786
Assignees

Comments

@ngmaclaren
Copy link
Contributor

What is the feature or improvement you would like to see?
A function that takes a graph with possibly many components and returns an induced subgraph that is the largest connected component of the original graph.

Use cases for the feature
I routinely need to extract the largest connected component of a graph (or largest weakly connected component of a directed graph) for further analysis. This use case comes up both in sampling graphs from an ensemble (e.g., when the sampling algorithm has generated isolated nodes that I want to ignore) and in working with empirical networks (e.g., when a data set contains information on microscopic clusters of nodes that are not critical for understanding the primary network). I have a function written that does this (modified from a StackOverflow post, link and function below), but an official version in igraph would be better if there's a general interest.

References
I wrote the below function modified from this StackOverflow post. It seems to work well enough, but it probably isn't as complete as it could be. I'd be happy try to improve it if there are some things it needs to be up to standard.

get_gcc <- function(g) {
    if(is_directed(g)) {
        comps <- components(g, mode = "weak")
    } else comps <- components(g)

    gcc_id <- which.max(comps$csize)
    vids <- V(g)[comps$membership == gcc_id]
    g <- induced_subgraph(g, vids)
    return(g)
}
@szhorvat
Copy link
Member

This request makes a lot of sense. Both the Python and Mathematica interfaces of igraph have such a function, as this is indeed a very common task.

Since this is relatively simple to implement, would you like to open a pull request? I would suggest including a mode argument as well, and writing up some documentation.

@ngmaclaren
Copy link
Contributor Author

Sure, I will give it a shot. Thanks!

@ngmaclaren
Copy link
Contributor Author

Sorry to bug you with trivial things, but I'm having some trouble going the steps outlined in CONTRIBUTING.md.

I have adjusted the function to include the mode argument and the roxygen2-style documentation. However, I don't seem to be able to run testthat::test_local() without error. I think the problem is with the testing, not the changes I made. Details are below, but the bottom line question is: do you need me to run these error messages down, or shall I open the pull request? Is there something I'm missing in the build/test process?

Details

Here's the function, which I put in my forked version of ./rigraph/R/components.R:

#' Find the largest connected component of a graph
#'
#' This function returns the largest connected component of a graph. In case of
#' a tie, the first component by vertex order is returned. Vertex IDs from
#' the original graph are not retained in the returned graph.
#' 
## #' @rdname components
## #' @family components
#' @param graph The original graph.
#' @param mode Passed to `components()`. Ignored if `graph` is undirected.
#' @returns The largest connected component of the graph.
#' @seealso [components()], [induced_subgraph()]
#' @export
#'
#' @examples
#' g <- sample_gnp(8, .2, directed = TRUE)
#' V(g)$color <- 1:N
#' plot(g)
#' plot(largest_component(g))
#' plot(largest_component(g, mode = "strong"))
largest_component <- function(graph, mode = c("weak", "strong")) {
  if (!is_igraph(graph)) {
    stop("Not a graph object")
  }
    
  if (is_directed(graph)) {
    comps <- components(graph, mode = mode)
  } else comps <- components(graph)

  lcc_id <- which.max(comps$csize)
  vids <- V(graph)[comps$membership == lcc_id]

  induced_subgraph(graph, vids)
}

I ran make igraph in my local ./rigraph directory:

tools/builddocs.sh
ℹ Updating igraph documentation
ℹ Loading igraph
Warning: Failed to load at least one DLL.
Caused by error in `getDLLRegisteredRoutines.DLLInfo()`:
! must specify DLL via a “DLLInfo” object. See getLoadedDLLs()
Writing NAMESPACE
Writing largest_component.Rd
Rscript -e 'devtools::build(path = ".")'
── R CMD build ─────────────────────────────────────────────────────────────────
✔  checking for file ‘/home/neil/Documents/testing/rigraph/DESCRIPTION’ ...
─  preparing ‘igraph’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  running ‘cleanup’
─  installing the package to build vignettes (670ms)
✔  creating vignettes (4m 32.5s)
─  cleaning src
─  running ‘cleanup’
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories (404ms)
─  building ‘igraph_1.4.99.9005.tar.gz’
   
[1] "./igraph_1.4.99.9005.tar.gz"

The documentation in ./rigraph/man/ appears to be correct, so I think the package built. However, in a clean R session I get this:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> setwd('/home/neil/Documents/testing/rigraph/')
> testthat::test_local()
Starting 2 test processes
✔ | F W S  OK | Context
⠸ [ FAIL 0 | WARN 0 | SKIP 0 | PASS 0 ] Starting up...                          

Error in `private$handle_error()`:
! testthat subprocess failed to start, stderr:
Error in `(function (command = NULL, args = character(), error_on_status = TRUE, …`:
! System command 'R' failed
---
Exit status: 1
Stdout & stderr:
* installing *source* package ‘igraph’ ...
** using staged installation
rm: missing operand
Try 'rm --help' for more information.
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/home/neil/Documents/testing/rigraph':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
ERROR: configuration failed for package ‘igraph’
* removing ‘/tmp/Rtmp0K6ohS/devtools_install_182ed56bbaea7/igraph’
---
Backtrace:
 1. global callr_startup_hook()
 2. pkgload::load_all("/home/neil/Documents/testing/rigraph/tests/testthat", …
 3. pkgbuild::compile_dll(path, quiet = quiet)
 4. pkgbuild:::install_min(path, dest = install_dir, components = "libs", args = if (need…
 5. pkgbuild::rcmd_build_tools("INSTALL", c(path, paste("--library=", dest, …
 6. pkgbuild::with_build_tools({ …
 7. base::withCallingHandlers(callr::rcmd_safe(..., env = env, spinner = FALSE, …
 8. callr::rcmd_safe(..., env = env, spinner = FALSE, show = FALSE, …
 9. callr:::run_r(options)
10. base::with(options, with_envvar(env, do.call(processx::run, c(list(bin, …
11. base::with.default(options, with_envvar(env, do.call(processx::run, …
12. base::eval(substitute(expr), data, enclos = parent.frame())
13. base::eval(substitute(expr), data, enclos = parent.frame())
14. callr:::with_envvar(env, do.call(processx::run, c(list(bin, args = real_cmdargs, …
15. base::force(code)
16. base::do.call(processx::run, c(list(bin, args = real_cmdargs, stdout_line_callba…
17. (function (command = NULL, args = character(), error_on_status = TRUE, …
18. base::throw(new_process_error(res, call = sys.call(), echo = echo, …
19. | base::signalCondition(cond)
20. (function (e) …
21. asNamespace("callr")$err$throw(e)
Caused by error:
! R session crashed with exit code 1
Run `rlang::last_trace()` to see where the error occurred.
> 

I have gcc-12.2.1-2 installed.

@krlmlr
Copy link
Contributor

krlmlr commented Apr 19, 2023

Thanks. CONTRIBUTING.md is slightly out of date, that's on me.

The following steps should work:

  • Clone the repo
  • (If cloned, run git clean -fdx to get rid of detritus)
  • Run R CMD INSTALL . in the shell or pkgload::load_all() in R
  • Run testthat::test_local() in R

Can you confirm?

@ngmaclaren
Copy link
Contributor Author

That did the trick: I ran testthat::test_local() without errors. Here are the results:

── Skipped tests  ───────────────────────────────────────────────
• getRversion() < 4.3 is TRUE (1)
• nested igraph call handling not implemented yet (1)
• No graph package (2)

[ FAIL 0 | WARN 0 | SKIP 4 | PASS 4090 ]

Now with a fresh R session I can use my function, but I can't load the man page. I've never used roxygen2 before, so maybe I did something wrong there?

Here's the session:

> pkgload::load_all()
ℹ Loading igraph
> g <- sample_gnp(100, .025)
> h <- largest_component(g)
> g
IGRAPH 0f6b634 U--- 100 113 -- Erdos-Renyi (gnp) graph
+ attr: name (g/c), type (g/c), loops (g/l), p (g/n)
+ edges from 0f6b634:
 [1]  1--12  6--14 13--22 10--24 19--29  2--31 32--34  5--35 22--36  6--37
[11]  7--39 12--39 21--39  2--42 13--42 10--45 12--47 20--47 27--47 37--49
[21] 38--50 40--51 43--51 35--53  1--55 10--55 28--55 25--56  7--57 28--57
[31] 10--58 35--58 47--59 44--60 11--61 43--61 13--62  4--63 53--63 61--64
[41] 23--66 47--66 65--67 66--67 30--68 34--68  8--69 49--69 60--69 22--70
[51] 28--70 49--70 64--70  3--72  6--74 14--75 34--75 50--75 64--75 14--76
[61] 20--77 60--77 28--78 68--78 64--79 69--80 70--80 12--81  7--82 28--82
[71] 39--83 63--83 40--84 54--84 58--84 67--84 81--85 37--86 64--86  6--87
+ ... omitted several edges
> h
IGRAPH 573f58d U--- 82 109 -- Erdos-Renyi (gnp) graph
+ attr: name (g/c), type (g/c), loops (g/l), p (g/n)
+ edges from 573f58d:
 [1]  1--10  5--12 11--15  8--17  2--22 23--24  4--25 15--26  5--27  6--29
[11] 10--29 14--29  2--31 11--31  8--34 10--36 13--36 19--36 27--37 28--38
[21] 30--39 32--39 25--40  1--42  8--42 20--42 18--43  6--44 20--44  8--45
[31] 25--45 36--46 33--47  9--48 32--48 11--49  3--50 40--50 48--51 16--53
[41] 36--53 52--54 53--54 21--55 24--55  7--56 37--56 47--56 15--57 20--57
[51] 37--57 51--57  5--59 12--60 24--60 38--60 51--60 12--61 13--62 47--62
[61] 20--63 55--63 51--64 56--65 57--65 10--66  6--67 20--67 29--68 50--68
[71] 30--69 41--69 45--69 54--69 66--70 27--71 51--71  5--72 27--72 59--72
+ ... omitted several edges
> ?largest_component
No documentation for ‘largest_component’ in specified packages and libraries:
you could try ‘??largest_component’

@krlmlr
Copy link
Contributor

krlmlr commented Apr 19, 2023

You need devtools::document() .

Would you like to improve CONTRIBUTING.md while we're at it? Perhaps @maelle can help too.

@ngmaclaren
Copy link
Contributor Author

There it is! Worked.

Sure, I can give it a try. Shall I send this pull request, then do a separate issue and pull request for CONTRIBUTING.md? I don't really know how pull requests work. I guess I send the draft changes, then you all review it on your end before accepting the changes?

@krlmlr
Copy link
Contributor

krlmlr commented Apr 19, 2023

Yes, please. Focused pull requests are best. We review and merge when good (or good enough). We're experimenting with optimistic merging, and are also happy to provide feedback if it helps you.

https://dmerej.info/blog/post/optimistic-merging/

@krlmlr
Copy link
Contributor

krlmlr commented Apr 19, 2023

  1. If not on a branch yet, please switch to a branch and bring in the changes
  2. Ensure the branch is based on the latest main
  3. Push the branch to your fork
  4. Open a PR
  5. Create a second branch from the main branch, switch to it
  6. Edit CONTRIBUTING.md
  7. Open a second PR

Does that make sense?

@ngmaclaren
Copy link
Contributor Author

Ha! Nope! But I'll figure it out. I think I skipped the branch part earlier, because locally git says I'm working on main. Can I just name my branch dev or should I name it something unique to me?

@ngmaclaren
Copy link
Contributor Author

Never mind: reading the blog post and clicking around on GitHub I think I'm tracking now. Let me try to submit this PR on my own branch like you say and then move on to the CONTRIBUTING.md issue. This will take me a few minutes at least.

@krlmlr
Copy link
Contributor

krlmlr commented Apr 19, 2023

Take your time, thanks for pushing through!

@ngmaclaren
Copy link
Contributor Author

Definitely! I use igraph pretty much every day, so hopefully this helps, even if only a little. I definitely appreciate all the time you all put into igraph.

@ngmaclaren
Copy link
Contributor Author

Ok, I think I've submitted the pull request. I made it a draft pull request to hopefully minimize any trouble if I've done something wrong. The branch should be called neil-lccfunc and the only changes should be to ./rigraph/R/components.R. To make the pull request, I wound up deleting and recreating my fork, then cloned the fresh copy. I made my changes without compiling, then committed the changes to a new branch. I pushed that branch to GitHub. GitHub prompted me to make the pull request, and that pull request was already set up to go to igraph/rigraph instead of ngmaclaren/rigraph. So that's where we're at. If this works, I'll start working on CONTRIBUTING.md.

@maelle
Copy link
Contributor

maelle commented Apr 20, 2023

If this works, I'll start working on CONTRIBUTING.md.

Thanks so much! Your feedback as a first-time contributor is extremely valuable.

Regarding pull requests, maybe https://usethis.r-lib.org/articles/pr-functions.html can be useful if you use usethis (and if you don't use usethis, maybe it can suit your workflow preferences).

@ngmaclaren
Copy link
Contributor Author

Thanks! I will check it out.

Copy link
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants