
Documentation update & fix #403

Merged
merged 11 commits into from
Jul 9, 2022
4 changes: 2 additions & 2 deletions README.md
@@ -12,7 +12,7 @@ Feathr is the feature store that has been used in production at LinkedIn for many years
Feathr lets you:

- **Define features** based on raw data sources (batch and streaming) using pythonic APIs.
- **Register and get features by names** during model training and model inference.
- **Share features** across your team and company.

Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.
@@ -151,7 +151,7 @@ Follow the [quick start Jupyter Notebook](./feathr_project/feathrcli/data/feathr

## 🚀 Roadmap

For a complete roadmap with estimated dates, please [visit this page](https://github.com/linkedin/feathr/milestones?direction=asc&sort=title&state=open).

- [x] Private Preview release
- [x] Public Preview release
2 changes: 1 addition & 1 deletion docs/concepts/feature-definition.md
@@ -110,7 +110,7 @@ Note that the `agg_func`([API doc](https://feathr.readthedocs.io/en/latest/feath
| Aggregation Type | Input Type | Description |
| --- | --- | --- |
|SUM, COUNT, MAX, MIN, AVG |Numeric|Applies the numerical operation on the numeric inputs. |
|MAX_POOLING, MIN_POOLING, AVG_POOLING | Numeric Vector | Applies the max/min/avg operation on a per-entry basis for a given collection of numbers.|
|LATEST| Any |Returns the latest non-null values within the defined time window |


6 changes: 2 additions & 4 deletions docs/concepts/feature-generation.md
@@ -48,16 +48,13 @@ client.materialize_features(settings)

Note that if you don't have features available as of `now`, you should specify a `BackfillTime` range in which features are available.

Also, for performance reasons, Feathr will submit one materialization job per step. For example, with `BackfillTime(start=datetime(2022, 2, 1), end=datetime(2022, 2, 20), step=timedelta(days=1))`, Feathr will submit 20 jobs to run in parallel for maximum performance.

More reference on the APIs:

- [BackfillTime API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.BackfillTime)
- [client.materialize_features() API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.materialize_features)
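
The job count is simply the number of `step` intervals covering the backfill range, endpoints inclusive. A minimal sketch in plain Python (datetime arithmetic only, not the Feathr API — the helper name is hypothetical):

```python
from datetime import datetime, timedelta

def backfill_steps(start: datetime, end: datetime, step: timedelta):
    """Enumerate the cutoff times a backfill would materialize, one per step."""
    steps = []
    current = start
    while current <= end:
        steps.append(current)
        current += step
    return steps

steps = backfill_steps(datetime(2022, 2, 1), datetime(2022, 2, 20), timedelta(days=1))
print(len(steps))  # 20 materialization jobs, Feb 1 through Feb 20 inclusive
```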

## Consuming features in online environment

After the materialization job finishes, we can get online features by querying the `feature table` with the corresponding `entity key` and a list of `feature names`. In the example below, we query the online features `f_location_avg_fare` and `f_location_max_fare` with the key `265` (the location ID).
@@ -67,6 +64,7 @@
```python
res = client.get_online_features('nycTaxiDemoFeature', '265', ['f_location_avg_f
```

More reference on the APIs:

- [client.get_online_features API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.get_online_features)

## Materializing Features to Offline Store
61 changes: 31 additions & 30 deletions docs/concepts/feature-join.md
@@ -8,47 +8,48 @@ parent: Feathr Concepts

## Intuitions of Frame Join

The observation dataset has 2 records, as shown below. We want to use it as the 'spine' dataset and join two features onto it:

1. Feature `page_view_count` from dataset `page_view_data`

2. Feature `like_count` from dataset `like_count_data`

In this case, the Feathr feature join will use the field `id` as the join key of the observation data, and will also consider the timestamp of each row during the join, making sure the joined feature values were collected **before** the `observe_time` of each row.

| id | observe_time | Label |
| --- | ------------ | ----- |
| 1 | 2022-01-01 | Yes |
| 1 | 2022-01-02 | Yes |
| 2 | 2022-01-02 | No |

Dataset `page_view_data` contains `page_view_count` of each user at a given time:

| UserId | log_time | page_view_count |
| ------ | ---------- | --------------- |
| 1 | 2022-01-01 | 101 |
| 1 | 2022-01-02 | 102 |
| 1 | 2022-01-03 | 103 |
| 2 | 2022-01-02 | 200 |
| 3 | 2022-01-02 | 300 |

Dataset `like_count_data` contains `like_count` of each user at a given time:

| UserId | updated_time | `like_count` |
| ------ | ------------ | ------------ |
| 1 | 2022-01-01 | 11 |
| 1 | 2022-01-02 | 12 |
| 1 | 2022-01-03 | 13 |
| 2 | 2022-01-02 | 20 |
| 3 | 2022-01-02 | 30 |

The expected joined output, a.k.a. the training dataset, would be:

| id | observe_time | Label | f_page_view_count | f_like_count |
| --- | ------------ | ----- | ----------------- | ------------ |
| 1 | 2022-01-01 | Yes | 101 | 11 |
| 1 | 2022-01-02 | Yes | 102 | 12 |
| 2 | 2022-01-02 | No | 200 | 20 |

Note: In the above example, features `f_page_view_count` and `f_like_count` are defined simply as references to the fields `page_view_count` and `like_count` respectively. Timestamps in these 3 datasets are taken into account automatically.
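
The intuition above can be sketched with pandas `merge_asof`, which performs a backward as-of join per entity — similar in spirit to Feathr's point-in-time-correct join, though this is only an illustration, not how Feathr implements it:

```python
import pandas as pd

# Observation ('spine') records and the page_view_data feature dataset
# from the tables above (like_count_data would be joined the same way).
obs = pd.DataFrame({
    "id": [1, 1, 2],
    "observe_time": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-02"]),
    "Label": ["Yes", "Yes", "No"],
})
page_view_data = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 3],
    "log_time": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-02", "2022-01-02"]
    ),
    "page_view_count": [101, 102, 103, 200, 300],
})

# direction="backward" picks, per entity, the latest feature value with a
# timestamp at or before observe_time -- no future values can leak in.
joined = pd.merge_asof(
    obs.sort_values("observe_time"),
    page_view_data.sort_values("log_time"),
    left_on="observe_time",
    right_on="log_time",
    left_by="id",
    right_by="UserId",
    direction="backward",
)
print(joined[["id", "observe_time", "Label", "page_view_count"]])
```

Note how the `2022-01-03` feature rows are never picked up: they lie after every observation time.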

@@ -83,8 +84,8 @@ The path of a dataset as the 'spine' for the to-be-created training dataset. We
2. A column representing the event time of the row. By default, Feathr will make sure the feature values joined have a timestamp earlier than it, ensuring no data leakage in the resulting training dataset.

3. Other columns will simply be passed through to the output training dataset.
The key fields from the observation data, which are used to join with the feature data.
List of feature names to be joined with the observation data. They must be pre-defined in the Python APIs.

The time information of the observation data used to compare with the feature's timestamp during the join.

5 changes: 1 addition & 4 deletions docs/concepts/point-in-time-join.md
@@ -18,10 +18,7 @@ The model will perform better during training (usually), but it will not perform

Point-in-time correctness ensures that no future data is used for training.

Point-in-time correctness can be achieved via two approaches. If your observation data has a global timestamp for all observation events, you can simply time-travel your feature dataset back to that timestamp. If your observation data has a different timestamp for each observation event, you need a point-in-time join for each event. The first approach is easier to implement but has more restrictions (the global timestamp). The second approach provides better flexibility and wastes no feature data. Feathr uses the second approach and can scale to large datasets.
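
The first approach (time travel with a global timestamp) can be sketched as a simple snapshot over a hypothetical feature dataset — an illustration only, not Feathr's implementation:

```python
import pandas as pd

# Hypothetical feature dataset: one page_view_count per (UserId, log_time).
features = pd.DataFrame({
    "UserId": [1, 1, 1, 2],
    "log_time": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-02"]
    ),
    "page_view_count": [101, 102, 103, 200],
})

def time_travel(df: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Snapshot the feature dataset as of one global timestamp:
    drop future rows, then keep the latest remaining value per entity."""
    snapshot = df[df["log_time"] <= pd.Timestamp(as_of)]
    return snapshot.sort_values("log_time").groupby("UserId", as_index=False).last()

print(time_travel(features, "2022-01-02"))
```

Because a single cutoff is applied to every entity, this only works when all observation events share that one timestamp; per-event timestamps require the point-in-time join instead.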

## Point-in-time Feature Lookup in Feathr

2 changes: 1 addition & 1 deletion docs/how-to-guides/azure-deployment.md
@@ -208,7 +208,7 @@ external_ip=$(curl -s http://whatismyip.akamai.com/)
echo "External IP is: ${external_ip}. Adding it to firewall rules"
az synapse workspace firewall-rule create --name allowAll --workspace-name $synapse_workspace_name --resource-group $resoruce_group_name --start-ip-address "$external_ip" --end-ip-address "$external_ip"

# sleep for a few seconds for the change to take effect
sleep 2
az synapse role assignment create --workspace-name $synapse_workspace_name --role "Synapse Contributor" --assignee $service_principal_name

4 changes: 2 additions & 2 deletions docs/how-to-guides/expression-language.md
@@ -18,7 +18,7 @@ If the feature transformation can't be accomplished with a short line of express

# Usage Guide

Your data transformation can be composed of one or a few smaller tasks. Divide and conquer! For each individual task, check the following sections on how to achieve them, then combine them. For example, we have a trip mileage column, but it's in string form. We want to check whether it's a long trip (> 30 miles), so we need to cast it to double and then compare with 30: `cast_double(mile_column) > 30`.
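
For intuition, here is what `cast_double(mile_column) > 30` computes, sketched in plain Python (the sample mileage values are hypothetical):

```python
# Hypothetical trip mileage values, stored as strings in the raw data.
mile_column = ["12.5", "45.0", "30.0"]

# cast to double, then compare with 30 -- the same two steps the
# expression `cast_double(mile_column) > 30` combines.
is_long_trip = [float(m) > 30 for m in mile_column]
print(is_long_trip)  # [False, True, False]
```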

## Field accessing

@@ -48,7 +48,7 @@ You can concatenate strings with `concat(str1, str2)`. For example, `concat("app

## Arithmetic Operations

For data of numeric types, you can use arithmetic operators to perform operations. Here are the supported operators: `+,-,*,/`

## Logical Operators
