Skip to content

JanDitzen/htmltab2stata

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

htmltab2stata

version release

Loading html tables into Stata.

Table of Contents

  1. Syntax
  2. Options
  3. Description
  4. Examples
  5. How to install
  6. About

1. Syntax

The general syntax is:

htmltab2stata  , url(_url_) [tablenumber(_integer_) firstrow href]

2. Options

url(url) the url of the html website to be processed. The url has to be a downloadable html website. url can be a web address or a local html file.

tablenumber(integer) number of table within the html document. Default is 1, i.e. the first table is processed.

firstrow Use firstrow of table as variable names.

href Links enclosed in <a href=....></a> are added to the content transferred to Stata.

3. Description

htmltab2stata parses html code from websites. It detects tables enclosed with the html <table> environment and transforms the table into a Stata dataset. To do so, htmltab2stata parses the html code and uses <tr> as row identifiers and code enclosed in <td> as columns. It only transfers content which is not enclosed in < > to Stata, unless option href is used for links. Empty cells remain empty in the Stata dataset.

4. Examples

For all examples, the following table in html code saved in table.html is processed:

Content
Table 1
<table>
 <tr><th>Country</th><th>Population</th><th>GDP</th></tr>
  <tr><td>Country A</td><td>10</td><td>100</td></tr>
  <tr><td>Country B</td><td>20</td><td>5</td></tr>
  <tr><td>Country C</td><td>500</td><td>10</td></tr>
</table>
More Content
Table 2
<table>
  <tr><th>Firstname</th><th>Surname</th><th>Webpage</th></tr>
  <tr><td>Adam</td><td>Smith</td><td>none</td></tr>
  <tr><td>Allan</td><td>Richards</td><td><a href="www.google.com">webpage</a></td></tr>
  <tr><td>Richard</td><td>Johnson</td><td></td></tr>
</table>

Loading Table into Stata as a dataset:

htmltab2stata , url(table.html)

Returns:

.list

     +---------------------------------+
     |    myvar1       myvar2   myvar3 |
     |---------------------------------|
  1. |   Country   Population      GDP |
  2. | Country A           10      100 |
  3. | Country B           20        5 |
  4. | Country C          500       10 |
     +---------------------------------+

To use the first column as variable names the option firstrow is required.

htmltab2stata , url(table.html) firstrow
.list
     +----------------------------+
     |   Country   Popula~n   GDP |
     |----------------------------|
  1. | Country A         10   100 |
  2. | Country B         20     5 |
  3. | Country C        500    10 |
     +----------------------------+

To process Table 2, use the first row as variable names and add the url of the hyperlink as text to the contents:

stata htmltab2stata , url(table.html) firstrow tablenumber(2) href
. list

     +----------------------------------------------+
     | Firstn~e    Surname                  Webpage |
     |----------------------------------------------|
  1. |     Adam      Smith                     none |
  2. |    Allan   Richards   www.google.com webpage |
  3. |  Richard    Johnson                          |
     +----------------------------------------------+

5. How to install

In Stata:

net install htmltab2stata, from(https://janditzen.github.io/htmltab2stata/)

6. About

Jan Ditzen

Email: jan.ditzen@unibz.it

Web: www.jan.ditzen.net

This version: 1.0