
Fix image path in efficient_ai.qmd and update data_engineering.qmd #301

Closed
wants to merge 6 commits

Conversation

Sara-Khosravi
Contributor

@Sara-Khosravi Sara-Khosravi commented Jul 2, 2024

Before submitting your Pull Request, please ensure that you have carefully reviewed and completed all items on this checklist.

  1. Content

    • The chapter content is complete and covers the topic in detail.
    • All technical terms are well-defined and explained.
    • Any code snippets or algorithms are well-documented and tested.
    • The chapter follows a logical flow and structure.
  2. References & Citations

    • All references are correctly listed at the end of the chapter.
    • In-text citations are used appropriately and match the references.
    • All figures, tables, and images have proper sources and are cited correctly.
  3. Quarto Website Rendering

    • The chapter has been locally built and tested using Quarto.
    • All images, figures, and tables render properly without any glitches.
    • All images have a source or they are properly linked to external sites.
    • Any interactive elements or widgets work as intended.
    • The chapter's formatting is consistent with the rest of the book.
  4. Grammar & Style

    • The chapter has been proofread for grammar and spelling errors.
    • The writing style is consistent with the rest of the book.
    • Any jargon is clearly explained or avoided where possible.
  5. Collaboration

    • All group members have reviewed and approved the chapter.
    • Any feedback from previous reviews or discussions has been addressed.
  6. Miscellaneous

    • All external links (if any) are working and lead to the intended destinations.
    • If datasets or external resources are used, they are properly credited and linked.
    • Any necessary permissions for reused content have been obtained.
  7. Final Steps

    • The chapter is pushed to the correct branch on the repository.
    • The Pull Request is made with a clear title and description.
    • The Pull Request includes any necessary labels or tags.
    • The Pull Request mentions any stakeholders or reviewers who should take a look.

@Sara-Khosravi
Contributor Author

Sara-Khosravi commented Jul 2, 2024

@profvjreddi

Hi Professor Vijay,

I hope all is well! I've made significant updates and fixes that are crucial for our project's progress:

Fixed the image path in efficient_ai: I corrected the reference so the image renders correctly in the document.
Updated data_engineering: I refined the content for better clarity and coherence.
Rendered the site locally with Quarto to confirm the changes are reflected in the output.

Additionally, I am working on telecom outage prediction and am fully committed to applying TinyML in this domain to the best of my abilities.

Working on this book has been a truly enjoyable experience, especially given my over 7 years of industry experience. I am deeply committed to this project and look forward to our continued collaboration.

Please let me know your feedback, and I am ready to work on other chapters.

Warm regards,
Sara

@profvjreddi
Contributor

@Sara-Khosravi thanks again for these edits. In the future, could you please make sure that you modify only one file at a time, as it is easier to do merges and rollbacks if needed?

@Sara-Khosravi
Contributor Author

Sara-Khosravi commented Jul 4, 2024 via email

Contributor

@profvjreddi left a comment


Could you please look over the comments and make the small tweaks?

@@ -173,7 +173,8 @@ Another important consideration is the relationship between model complexity and

Furthermore, while benchmark datasets, such as ImageNet [@russakovsky2015imagenet], COCO [@lin2014microsoft], Visual Wake Words [@chowdhery2019visual], Google Speech Commands [@warden2018speech], etc. provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, @fig-stoves from the Dollar Street dataset shows stove images across extreme monthly incomes. Stoves have different shapes and technological levels across different regions and income levels. A model that is not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. So, if a model was trained on pictures of stoves found in wealthy countries only, it would fail to recognize stoves from poorer regions.

![Different types of stoves. Credit: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}
![Different types of stoves. Credit: Dollar Street stove images.](images/jpg/DmUyPSSW0AAChGa.jpg))
Contributor


Could you please rename the file to something like dollar_street.jpg?

@@ -173,7 +173,8 @@ Another important consideration is the relationship between model complexity and

Furthermore, while benchmark datasets, such as ImageNet [@russakovsky2015imagenet], COCO [@lin2014microsoft], Visual Wake Words [@chowdhery2019visual], Google Speech Commands [@warden2018speech], etc. provide a standardized performance metric, they might not capture the diversity and unpredictability of real-world data. Two facial recognition models with similar benchmark scores might exhibit varied competencies when faced with diverse ethnic backgrounds or challenging lighting conditions. Such disparities underscore the importance of robustness and consistency across varied data. For example, @fig-stoves from the Dollar Street dataset shows stove images across extreme monthly incomes. Stoves have different shapes and technological levels across different regions and income levels. A model that is not trained on diverse datasets might perform well on a benchmark but fail in real-world applications. So, if a model was trained on pictures of stoves found in wealthy countries only, it would fail to recognize stoves from poorer regions.

![Different types of stoves. Credit: Dollar Street stove images.](https://pbs.twimg.com/media/DmUyPSSW0AAChGa.jpg){#fig-stoves}
![Different types of stoves. Credit: Dollar Street stove images.](images/jpg/DmUyPSSW0AAChGa.jpg))
{#fig-stoves}
Contributor


For consistency, could we please put this next to the closing `]`?
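
Taken together with the rename suggested in the earlier comment, the corrected figure line would presumably look like this (a sketch; the dollar_street.jpg filename is the reviewer's suggestion, not yet in the repo):

```markdown
![Different types of stoves. Credit: Dollar Street stove images.](images/jpg/dollar_street.jpg){#fig-stoves}
```

In Quarto, the `{#fig-stoves}` attribute must immediately follow the closing parenthesis of the image link for the `@fig-stoves` cross-reference in the text to resolve.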

@@ -8,7 +8,7 @@ bibliography: data_engineering.bib
Resources: [Slides](#sec-data-engineering-resource), [Videos](#sec-data-engineering-resource), [Exercises](#sec-data-engineering-resource), [Labs](#sec-data-engineering-resource)
:::

![_DALL·E 3 Prompt: Create a rectangular illustration visualizing the concept of data engineering. Include elements such as raw data sources, data processing pipelines, storage systems, and refined datasets. Show how raw data is transformed through cleaning, processing, and storage to become valuable information that can be analyzed and used for decision-making._](images/png/cover_data_engineering.png)
Contributor


We should keep this as-is because this is verbatim what went into the DALL·E model :)

Once data is collected, thoughtful labeling through manual or AI-assisted annotation enables the creation of high-quality training datasets. Proper storage in databases, warehouses, or lakes facilitates easy access and analysis. Metadata provides contextual details about the data. Data processing transforms raw data into a clean, consistent format for machine learning model development.
Throughout this pipeline, transparency through documentation and provenance tracking is crucial for ethics, auditability, and reproducibility. Data licensing protocols also govern legal data access and use. Key challenges in data engineering include privacy risks, representation gaps, legal restrictions around proprietary data, and the need to balance competing constraints like speed versus quality.
By thoughtfully engineering high-quality training data, machine learning practitioners can develop accurate, robust, and responsible AI systems. This includes applications in embedded systems and TinyML, where resource constraints demand particularly efficient and effective data-handling practices. In the context of TinyML, data engineering practices take on a unique character. Resource-constrained devices often necessitate smaller datasets with high signal-to-noise ratios. Data collection may be limited to on-device sensors or specific environmental conditions. Crowdsourcing and synthetic data generation have become precious tools for generating specialized datasets with limited memory and processing power. Careful optimization techniques for data cleansing, feature selection, and model compression are essential for TinyML applications. By understanding these nuances, data engineers can empower the development of efficient and effective AI solutions at the edge.
## Resources {#sec-data-engineering-resource .unnumbered}

Data is the fundamental building block of AI systems. Without quality data, even the most advanced machine learning algorithms will fail. Data engineering encompasses the end-to-end process of collecting, storing, processing, and managing data to fuel the development of machine learning models. It begins with clearly defining the core problem and objectives, which guides effective data collection. Data can be sourced from diverse means, including existing datasets, web scraping, crowdsourcing, and synthetic data generation. Each approach involves tradeoffs between cost, speed, privacy, and specificity. Once data is collected, thoughtful labeling through manual or AI-assisted annotation enables the creation of high-quality training datasets. Proper storage in databases, warehouses, or lakes facilitates easy access and analysis. Metadata provides contextual details about the data. Data processing transforms raw data into a clean, consistent format for machine learning model development. Throughout this pipeline, transparency through documentation and provenance tracking is crucial for ethics, auditability, and reproducibility. Data licensing protocols also govern legal data access and use. Key challenges in data engineering include privacy risks, representation gaps, legal restrictions around proprietary data, and the need to balance competing constraints like speed versus quality. By thoughtfully engineering high-quality training data, machine learning practitioners can develop accurate, robust, and responsible AI systems, including embedded and TinyML applications.
Contributor


Does this need to be deleted based on the above text?

There seems to be some repetition.

@profvjreddi
Contributor

profvjreddi commented Jul 4, 2024 via email

@Sara-Khosravi
Contributor Author

Sara-Khosravi commented Jul 4, 2024 via email

@profvjreddi
Contributor

Looked over this and the changes are already merged in from other edits we did, so these updates are already in!
