Implementing Data Structures with Traits and Generics
Reminder: please run cargo clippy and cargo fmt on all your code before submitting, and deal with all warnings (yellow text), not just errors (red text).
For this assignment, we will implement a graph data structure in Rust.
No code will be given for the assignment. Additionally, don't use any crates unless you check on Piazza first, other than serde.
The assignment is divided into three parts: (1) the data structure itself, (2) advanced functionality (including custom trait implementations related to the data structure), and (3) an application of your data structure to some target problem.
Your graph data structure should be generic in two arguments: the vertex label type V and the edge label type E. You should allow directed edges (edges in each direction) and self-loop edges; however, you should prohibit having multiple edges between the same source and target vertex.
Your Graph<V, E> should support at least the following basic methods.
It should also support more advanced search functions; more on this under "Advanced methods".
- Creating a new empty graph;
- Adding a vertex with a given vertex label;
- Adding an edge between two existing vertices with a given edge label;
- Ensuring a vertex with a given label exists (or adding it if it does not exist);
- Ensuring an edge between two existing vertices (or adding it if it does not exist);
- Removing an existing edge;
- Removing a vertex (and all associated edges);
- Getting the out-degree or in-degree (number of outgoing edges or incoming edges) of a vertex;
- A vertex iterator which iterates through all vertex labels in your graph: I recommend you use the impl Iterator syntax to help with this. Some syntax to get started: fn iter_vertices(&self) -> impl Iterator<Item = &V> + '_. The impl Iterator means you are hiding the return type, but you are asking the type checker to verify that whatever the return type is, it implements the Iterator trait. The '_ lifetime placeholder is to make the borrow checker happy, since the iterator object has to have read-only access to &self over its existence. You can also add explicit 'a lifetimes if needed for your use case.
- Two edge iterators: iter_sources and iter_targets which, given a &V, iterate through all incoming edges (with their source vertices) and all outgoing edges (with their target vertices) of that vertex, respectively. You will want the iterator to return (&E, &V), i.e. both the edge label and the source/target vertex.
- A merge_vertices function which merges vertices v1 and v2 into just v1, and moves all edges from and to v2 to be from and to v1 (respectively).
- Existence checks: has_vertex (accepting one &V argument) and has_edge (accepting two &V arguments), returning bool.
- A get_edge function to get the edge label between two vertices, or None if there is no edge.
- An invariant check fn assert_invariant(&self) which uses debug_assert! to (exhaustively) check any invariants you are assuming about your data structure.
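To make the impl Iterator and assert_invariant items above more concrete, here is a minimal sketch. It is not the required design: the VertexStore type, its fields, and the choice of invariant (a cached count of live vertices) are all assumptions made up for illustration.

```rust
/// Hypothetical minimal storage: one slot per vertex, `None` marks a removed vertex.
struct VertexStore<V> {
    labels: Vec<Option<V>>,
    /// Cached number of `Some` slots; add/remove must keep it in sync.
    live: usize,
}

impl<V> VertexStore<V> {
    fn new() -> Self {
        VertexStore { labels: Vec::new(), live: 0 }
    }

    /// Stores the label and returns the index of its slot.
    fn add(&mut self, v: V) -> usize {
        self.labels.push(Some(v));
        self.live += 1;
        self.labels.len() - 1
    }

    /// Removes the label in slot `id`, if that slot exists and is occupied.
    fn remove(&mut self, id: usize) -> Option<V> {
        let removed = self.labels.get_mut(id)?.take();
        if removed.is_some() {
            self.live -= 1;
        }
        removed
    }

    /// The concrete iterator type is hidden; `'_` ties it to the borrow of `self`.
    fn iter_vertices(&self) -> impl Iterator<Item = &V> + '_ {
        self.labels.iter().flatten()
    }

    /// Checks the one invariant this sketch assumes: the cached count is accurate.
    fn assert_invariant(&self) {
        debug_assert_eq!(self.live, self.iter_vertices().count());
    }
}
```

Calling assert_invariant at the end of every mutating method (and in your unit tests) tends to catch bookkeeping mistakes early; since it uses debug_assert!, it costs nothing in release builds.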
To avoid worrying about difficult lifetime issues, I recommend that you use identifiers to represent vertices and edges.
To do this, define new types: struct VertexIden(usize) and struct EdgeIden(usize).
Whenever a vertex or edge is added to your graph, first assign it a new identifier; then use that identifier internally to represent the vertex or edge.
For example, you could store the original labels in a Vec<Option<V>> (Option to allow vertex removal), but store the actual graph structure using HashMaps.
The type wrappers ensure that you can't accidentally try to use a vertex as an edge or vice versa; you have to do .0 to get the underlying usize value.
Make sure that most of your methods are efficient (O(1) insert and remove), but merge_vertices will necessarily be less efficient.
HashMap or a similar data structure is necessary for this.
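One possible way to lay this out is sketched below. Only VertexIden, EdgeIden, and the Vec<Option<V>> idea come from the text above; the field names and the add_vertex_raw / add_edge_raw helpers are hypothetical, and the sketch deliberately skips checks (such as whether both endpoints exist) that your real implementation would need.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct VertexIden(usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct EdgeIden(usize);

/// Hypothetical layout: labels live in vectors, structure lives in hash maps.
struct Graph<V, E> {
    vertex_labels: Vec<Option<V>>,
    edge_labels: Vec<Option<E>>,
    /// out_edges[src][dst] is the edge from src to dst, if any.
    out_edges: HashMap<VertexIden, HashMap<VertexIden, EdgeIden>>,
    /// in_edges[dst][src] mirrors out_edges so in-degree lookups are also O(1).
    in_edges: HashMap<VertexIden, HashMap<VertexIden, EdgeIden>>,
}

impl<V, E> Graph<V, E> {
    fn new() -> Self {
        Graph {
            vertex_labels: Vec::new(),
            edge_labels: Vec::new(),
            out_edges: HashMap::new(),
            in_edges: HashMap::new(),
        }
    }

    /// Assigns a fresh identifier to the label (no duplicate check in this sketch).
    fn add_vertex_raw(&mut self, label: V) -> VertexIden {
        self.vertex_labels.push(Some(label));
        VertexIden(self.vertex_labels.len() - 1)
    }

    /// Returns None if an edge src -> dst already exists (no multi-edges).
    /// Does not verify that src and dst are live vertices.
    fn add_edge_raw(&mut self, src: VertexIden, dst: VertexIden, label: E) -> Option<EdgeIden> {
        if self.out_edges.get(&src).map_or(false, |m| m.contains_key(&dst)) {
            return None;
        }
        let id = EdgeIden(self.edge_labels.len());
        self.edge_labels.push(Some(label));
        self.out_edges.entry(src).or_default().insert(dst, id);
        self.in_edges.entry(dst).or_default().insert(src, id);
        Some(id)
    }

    fn out_degree(&self, v: VertexIden) -> usize {
        self.out_edges.get(&v).map_or(0, |m| m.len())
    }

    fn in_degree(&self, v: VertexIden) -> usize {
        self.in_edges.get(&v).map_or(0, |m| m.len())
    }
}
```

Note that nothing here needs a trait bound on V or E, because only the identifiers are hashed.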
- Edit: Your graph will need ways to convert between identifiers and the original objects: so you need a way to go from V to VertexIden and from VertexIden to V. Instead of a Vec as mentioned above, you could use HashMap<VertexIden, V> and HashMap<&V, VertexIden>. Note that the latter has a reference to V as the HashMap key: to get this to work, you will need your Graph to have a lifetime, like Graph<'a, V, E>, and then you can have HashMap<&'a V, VertexIden>. Alternatively, if you want, you can do the assignment requiring Clone and use Clone when a vertex is added; in this case you can get away with HashMap<VertexIden, V> and HashMap<V, VertexIden>. See the other Edit below. Either way, I recommend starting out in your implementation by implementing the functions which go between identifiers and the original objects. So try to write your methods like get_vertex_iden(&self, &V) -> VertexIden and get_vertex(&self, VertexIden) -> &V. If you can get these two functions working (and the same thing for edges), then that should solidify the main design, and the rest of the assignment should go more smoothly.
Avoid unnecessary trait bounds. Your Graph should not require any trait bounds on V and E by default. In particular, it should be usable even if V and E don't implement Clone.
- Edit: Some clarification on this: you can require other bounds, like Eq and Hash. Additionally, if you want a slightly easier task, I am allowing you to require V: Clone and E: Clone if you prefer, but make sure that .clone() is only used sparingly. That is, you should only need to clone a vertex or edge when it is added to the graph, and not anywhere else.
Please #[derive(...)] or implement Clone, Debug, Display, Index, and IndexMut for your graph. Although it should not require any trait bounds by default, your graph will require trait bounds in order to implement these traits: for example Clone won't be implemented unless V and E implement Clone.
For Index and IndexMut, it makes the most sense to accept a (&V, &V) index, and then return a reference to the edge between those vertices.
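As a rough illustration of indexing by a (&V, &V) pair, the sketch below uses a hypothetical EdgeMap type that stores edge labels in nested hash maps keyed directly by vertex labels (not the identifier-based design recommended earlier). Like indexing a Vec out of bounds, it panics when there is no such edge.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::{Index, IndexMut};

/// Hypothetical stand-in for the graph: edges[src][dst] is the edge label.
struct EdgeMap<V, E> {
    edges: HashMap<V, HashMap<V, E>>,
}

impl<V: Eq + Hash, E> EdgeMap<V, E> {
    fn new() -> Self {
        EdgeMap { edges: HashMap::new() }
    }

    fn insert(&mut self, src: V, dst: V, label: E) {
        self.edges.entry(src).or_default().insert(dst, label);
    }
}

impl<'a, V: Eq + Hash, E> Index<(&'a V, &'a V)> for EdgeMap<V, E> {
    type Output = E;

    /// Panics if there is no edge from src to dst.
    fn index(&self, (src, dst): (&'a V, &'a V)) -> &E {
        self.edges
            .get(src)
            .and_then(|targets| targets.get(dst))
            .expect("no edge between these vertices")
    }
}

impl<'a, V: Eq + Hash, E> IndexMut<(&'a V, &'a V)> for EdgeMap<V, E> {
    /// Panics if there is no edge from src to dst.
    fn index_mut(&mut self, (src, dst): (&'a V, &'a V)) -> &mut E {
        self.edges
            .get_mut(src)
            .and_then(|targets| targets.get_mut(dst))
            .expect("no edge between these vertices")
    }
}
```

With this in place, g[(&v1, &v2)] reads an edge label and g[(&v1, &v2)] = new_label overwrites it in place.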
Avoid the use of clone and unnecessary owned arguments. That means that, for example, if you have a function fn add_vertex, it should take a v: V (to avoid cloning), but if you have add_edge, it should take v1: &V and v2: &V (and edge_label: E): it needs ownership over the edge, but not ownership over the vertices.
On the other hand, your ensure_vertex function needs v: V since it needs to insert the vertex if it doesn't exist.
Or if you want to be fancy, ensure_vertex can take a function argument f: impl FnOnce() -> V that is not called unless needed, called with a closure like move || v. (A closure that moves v out can be called only once, so it implements FnOnce but not Fn unless V is Copy.)
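A small sketch of the "fancy" variant follows. It assumes a signature that takes a borrowed key for the existence check plus an FnOnce closure that builds the owned label only when needed; the Vertices type, the key parameter, and the linear scan are illustrative simplifications, not part of the assignment.

```rust
/// Hypothetical stand-in for the graph's vertex storage.
struct Vertices<V> {
    labels: Vec<V>,
}

impl<V: PartialEq> Vertices<V> {
    /// Returns the position of `key`, inserting `make()` first if it is absent.
    /// `make` is FnOnce: it is called at most once, and only on a miss.
    /// (Linear scan for brevity; a real graph would use a hash lookup.)
    fn ensure_vertex(&mut self, key: &V, make: impl FnOnce() -> V) -> usize {
        if let Some(pos) = self.labels.iter().position(|v| v == key) {
            return pos;
        }
        self.labels.push(make());
        self.labels.len() - 1
    }
}
```

A call might look like vs.ensure_vertex(&name, || name.clone()), so the clone only happens when the vertex is actually inserted.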
For this part, write the following advanced methods and custom traits.
Use serde to derive the Serialize and Deserialize traits for your object.
Then implement two derived functions:
- fn save_to_file(&self, filename: &str) -> Result<(), String>
- fn load_from_file(filename: &str) -> Result<Self, String>
Write at least one unit test for these (see unit testing below).
We want our graph to support more interesting functionality, like using a DFS or BFS to check reachability between vertices.
For this, implement two utility structs, one for DFS and one for BFS.
Your structs will need to have, as one of the fields, a next function that gives the next items from a current item that can be used during search.
For this, we can use function traits in Rust: next will be a function implementing the Fn trait.
Here is a starting point:
    use std::collections::HashSet;
    use std::hash::Hash;

    struct DFS<T: Eq + PartialEq + Hash, F: Fn(T) -> Vec<T>> {
        next: F,
        visited: HashSet<T>,
        to_visit: Vec<T>,
    }
Implement a new function to create a new DFS.
Then, your DFS struct should implement the Iterator trait:
    impl<T: Eq + PartialEq + Hash, F: Fn(T) -> Vec<T>> Iterator for DFS<T, F> {
        ...
    }
Do the same thing for a BFS struct.
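To show the general shape of such a search iterator, here is one possible sketch of the depth-first case. It departs from the starting point in two labeled ways: the next function takes &T rather than T, and T must be Clone (cheap for a Copy identifier such as VertexIden), so that an item can be recorded as visited and also returned. The name Dfs is hypothetical. A BFS version would differ mainly in using a queue such as VecDeque in place of the stack.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Depth-first search over items of type T, driven by a successor function.
struct Dfs<T, F>
where
    T: Eq + Hash + Clone,
    F: Fn(&T) -> Vec<T>,
{
    next: F,
    visited: HashSet<T>,
    to_visit: Vec<T>, // used as a stack
}

impl<T, F> Dfs<T, F>
where
    T: Eq + Hash + Clone,
    F: Fn(&T) -> Vec<T>,
{
    fn new(start: T, next: F) -> Self {
        Dfs { next, visited: HashSet::new(), to_visit: vec![start] }
    }
}

impl<T, F> Iterator for Dfs<T, F>
where
    T: Eq + Hash + Clone,
    F: Fn(&T) -> Vec<T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Pop until we find an item we have not yielded yet.
        while let Some(item) = self.to_visit.pop() {
            // `insert` returns true only the first time an item is seen.
            if self.visited.insert(item.clone()) {
                // Parentheses are needed to call a closure stored in a field.
                self.to_visit.extend((self.next)(&item));
                return Some(item);
            }
        }
        None
    }
}
```

Because successors are pushed onto a stack, the last successor returned by the next function is explored first.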
Then, implement corresponding methods for your Graph structure, using the DFS and BFS structs.
Implement both vertex searches and edge searches, in both forwards and backwards directions.
For example, the edge search could look something like this:
    impl<V, E> Graph<V, E> {
        fn edge_bfs_forward(&self, start_vertex: &V) -> impl Iterator<Item = &V> + '_ {
            ...
        }
    }
Internally, it would use a BFS struct over VertexIden.
This may require playing around with function closure and lifetime compiler errors if you get down the wrong path. Don't give up and be patient! Definitely post to Piazza if you get stuck.
We can use traits in Rust to abstract behavior in an implementation-independent way. Now that you have the DFS functionality, implement a trait which works for a general graph-like collection:
    trait Reachable {
        type T;
        fn can_reach(&self, start: &Self::T, end: &Self::T) -> bool;
        fn distance(&self, start: &Self::T, end: &Self::T) -> Option<usize>;
    }
Implement Reachable for your Graph struct. You can use vertex reachability and ignore edge reachability for this part.
The distance should be the minimum distance (number of edges on a shortest path) from one vertex to the other.
Additionally, implement at least two derived methods for Reachable.
Some ideas:
- fn can_reach_eachother(&self, start: &Self::T, end: &Self::T) -> bool
- fn is_closer_than(&self, start: &Self::T, end: &Self::T, dist: usize) -> bool
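Derived methods can be written as default methods on the trait, implemented purely in terms of the required ones. The sketch below assumes the trait methods take &self and refer to the associated type as Self::T; the PathGraph type is a made-up toy (a directed path 0 -> 1 -> ... -> len-1) used only so the example runs, and it is not a substitute for implementing Reachable on your Graph.

```rust
trait Reachable {
    type T;

    // Required methods.
    fn can_reach(&self, start: &Self::T, end: &Self::T) -> bool;
    fn distance(&self, start: &Self::T, end: &Self::T) -> Option<usize>;

    // Derived methods: defaults built only from the required methods.
    fn can_reach_eachother(&self, a: &Self::T, b: &Self::T) -> bool {
        self.can_reach(a, b) && self.can_reach(b, a)
    }

    fn is_closer_than(&self, start: &Self::T, end: &Self::T, dist: usize) -> bool {
        matches!(self.distance(start, end), Some(d) if d < dist)
    }
}

/// Toy graph: vertices 0..len with a single directed edge i -> i + 1.
struct PathGraph {
    len: usize,
}

impl Reachable for PathGraph {
    type T = usize;

    fn can_reach(&self, start: &usize, end: &usize) -> bool {
        self.distance(start, end).is_some()
    }

    fn distance(&self, start: &usize, end: &usize) -> Option<usize> {
        if *start < self.len && *end < self.len && start <= end {
            Some(*end - *start)
        } else {
            None
        }
    }
}
```

Any type that implements the two required methods gets the derived ones for free.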
Finally, implement a trait called Summary, similar to what we saw in class (Lecture 6 part 2), that allows summarizing an object in n lines or fewer.
I.e. the core method should be fn summarize(&self, lines: usize) -> String.
Implement Summary for Graph<V, E> assuming the trait bounds V, E: Summary.
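As a sketch of how a container can implement Summary by delegating to its elements, the example below uses a hypothetical Labels<V> wrapper in place of Graph<V, E>, and a String implementation chosen only so there is something to summarize. How you split the line budget between vertices and edges in your graph is up to you.

```rust
trait Summary {
    /// Returns a description of `self` that is at most `lines` lines long.
    fn summarize(&self, lines: usize) -> String;
}

impl Summary for String {
    /// Keeps the first `lines` lines of the string.
    fn summarize(&self, lines: usize) -> String {
        self.lines().take(lines).collect::<Vec<_>>().join("\n")
    }
}

/// Hypothetical stand-in for a collection of vertex labels.
struct Labels<V>(Vec<V>);

impl<V: Summary> Summary for Labels<V> {
    /// One line per element, for at most `lines` elements.
    fn summarize(&self, lines: usize) -> String {
        self.0
            .iter()
            .take(lines)
            .map(|v| v.summarize(1))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```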
Think of a cool example application for your graph object!
Maybe you have a bunch of People objects, and you want to form
friendships between them, where the graph tracks if one person is a friend
of another.
Something more extreme would be to use your graph to implement a compiler
for a very basic language, where the graph is the control flow graph of the
language.
Or, if you like to go with something more mathematical, you could use your graph to
implement some basic graphs, like cycle graphs, path graphs, etc.
This can be a relatively short demonstration. You don't need to implement a full-fledged set of features; a small file with a proof-of-concept and a few examples in the unit tests or main function is enough.
We will be less pedantic about error handling on this assignment than on the previous one.
Your graph insertions/deletions don't need to have a Result type; instead use Option when a getter-type method could fail, and use your judgment depending on the method when a setter-type method can fail: either use assert! in the code or fall back to some default functionality (e.g. don't insert the object).
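The three styles mentioned above might look roughly like this. The Edges type and its usize vertex keys are hypothetical simplifications; which style fits which method in your graph is a judgment call.

```rust
use std::collections::HashMap;

/// Hypothetical edge storage keyed by (source, target) identifiers.
struct Edges<E> {
    by_pair: HashMap<(usize, usize), E>,
}

impl<E> Edges<E> {
    /// Getter-type method: a missing edge is reported as None.
    fn get_edge(&self, src: usize, dst: usize) -> Option<&E> {
        self.by_pair.get(&(src, dst))
    }

    /// Setter with a default fallback: if the edge exists, do nothing.
    /// Returns whether the edge was inserted.
    fn add_edge(&mut self, src: usize, dst: usize, label: E) -> bool {
        if self.by_pair.contains_key(&(src, dst)) {
            return false;
        }
        self.by_pair.insert((src, dst), label);
        true
    }

    /// Setter that treats misuse as a bug: panics if the edge is missing.
    fn remove_edge(&mut self, src: usize, dst: usize) -> E {
        assert!(
            self.by_pair.contains_key(&(src, dst)),
            "remove_edge: edge does not exist"
        );
        self.by_pair.remove(&(src, dst)).unwrap()
    }
}
```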
Sort your code into modules! The general rule is one struct or trait per file.
Smaller structs (like VertexIden etc.) or related traits can go in the same file as each other.
Remember to include all the modules in lib.rs or else they won't be compiled.
Write at least one unit test for every function or method that you implement. Traits do not have unit tests, but there should be unit tests for the types implementing that trait.
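For reference, a unit test module usually sits at the bottom of the file it tests. In the sketch below, degree_sum is a made-up helper standing in for one of your own methods, and the module name is arbitrary.

```rust
/// Hypothetical helper standing in for one of your own functions.
fn degree_sum(degrees: &[usize]) -> usize {
    degrees.iter().sum()
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod degree_sum_tests {
    use super::*;

    #[test]
    fn empty_slice_sums_to_zero() {
        assert_eq!(degree_sum(&[]), 0);
    }

    #[test]
    fn sums_all_entries() {
        assert_eq!(degree_sum(&[1, 2, 3]), 6);
    }
}
```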
This assignment is submitted via GitHub classroom. The deadline is Wednesday, April 7, 2021 at 11:59pm Eastern. (Edit: updated deadline.)