Open imgData Collection codeBase for Object Detection, Segmentation & Classification

Sid-047/Image-DataCollection

Image Scraping

An image data collection tool for object detection, segmentation, and classification, built on web scraping of Google Images ~ Image Scraping, Peeps!

Installation

  1. Clone the Repository:
    git clone https://github.com/Sid-047/Image-DataCollection.git

Usage

  1. Navigate to the Project Directory:

    cd Image-DataCollection
  2. Install Dependencies:

    pip install -r requirements.txt

    Note: Mozilla Firefox is the recommended web browser.

    Windows

     winget install Mozilla.Firefox

    MacOS

     brew install firefox

    Linux

     sudo snap install firefox
  3. Wait, want to create a query list first?

    python queryList.py

    Here it Comes!

    Come On Start Entering the Search QueryKeyWords Yo!
    Enter 'Exit' to Finish
    
    '
    <Search Keyword Query1>
    <Search Keyword Query2>
    .
    .
    .
    <Search Keyword QueryN>
    Exit
    '
    
    The Search KeyWord Query List Yo!
    ['<Search Keyword Query1>', '<Search Keyword Query2>', ..., '<Search Keyword QueryN>']

    Now copy the printed query list; you will paste it in the next step.

  4. Enlist the Search Queries:

    #ImgScrapping.py
    q = ['<Search Keyword Query>', '<Search Keyword Query>', '<Search Keyword Query>']

    Edit this line of code, or paste the query list from the previous step.

  5. Run the Tool:

    python ImgScrapping.py
  6. Boom! That is it.

  7. But wait! What if your program crashed? No worries:

     python URLset_convo.py

    Select the correct timestamp, and you are good to go!

  8. Just one last step, downloading the images:

     python ImgDown.py

    You should now see the image files being written.
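
The prompt loop from step 3 can be sketched as follows. This is a minimal illustration based only on the sample output shown above, not the actual contents of queryList.py; the function name `collect_queries` and the choice to skip blank lines are assumptions.

```python
from typing import Iterable, List


def collect_queries(lines: Iterable[str], sentinel: str = "Exit") -> List[str]:
    """Collect search queries, one per line, until the sentinel line is seen.

    Hypothetical helper: the real queryList.py may behave differently.
    """
    queries: List[str] = []
    for raw in lines:
        text = raw.strip()
        if text == sentinel:  # exact match, mirroring "Enter 'Exit' to Finish"
            break
        if text:  # assumption: blank lines are ignored
            queries.append(text)
    return queries
```

Calling `print(collect_queries(sys.stdin))` after `import sys` would read queries from the terminal and print a Python list that can be pasted into the `q = [...]` line from step 4.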

Features

  • Automated image web scraping via Selenium.
  • Image URLs are backed up to a .txt file in real time.
  • Image files are written dynamically without overwriting existing files.
  • Threading and timeouts are used to write the image files efficiently.
  • Image URLs are scraped first; the image downloads start afterwards.
  • The query list can be generated with the built-in tool (queryList.py), one user-entered query per line.
  • If a glitch interrupts execution, fear not! The URLs stored in the .txt files can be used to start the image downloads via ImgDown.py.
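
To make the threading, timeout, and no-overwrite features concrete, here is a rough sketch of how such a downloader could be put together with the Python standard library. It is not the code in ImgDown.py: the names `unique_path`, `fetch`, and `download_all`, the `img_N.jpg` naming scheme, and the default timeout and worker count are all assumptions chosen for illustration.

```python
import concurrent.futures
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List


def unique_path(directory: Path, stem: str, suffix: str = ".jpg") -> Path:
    """Return the first stem_N.jpg path in `directory` that does not exist yet."""
    index = 0
    while (directory / f"{stem}_{index}{suffix}").exists():
        index += 1
    return directory / f"{stem}_{index}{suffix}"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL; `timeout` bounds each blocking socket operation."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str],
    directory: Path,
    stem: str = "img",
    fetcher: Callable[[str], bytes] = fetch,
    workers: int = 8,
) -> List[Path]:
    """Fetch URLs in worker threads and write each result to a fresh file."""
    directory.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetcher, url) for url in urls]
        for future in concurrent.futures.as_completed(futures):
            try:
                data = future.result()
            except Exception:
                continue  # skip URLs that fail or time out
            # Files are written in the calling thread only, so picking the
            # next free name cannot race with another writer in this process.
            target = unique_path(directory, stem)
            target.write_bytes(data)
            written.append(target)
    return written
```

In this sketch the network requests run in parallel while the file writes stay in one thread, which keeps the naming logic simple. Running it a second time against the same directory continues numbering after the existing files rather than replacing them.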

License

This project is licensed under the MIT License - see the LICENSE file for details.
