ParsingXBRL
Some SEC filings include data in XBRL (eXtensible Business Reporting Language) format. Edgartools provides ways to extract and use this data.
XBRL can either be embedded in the filing or provided as attachments to the filing. This guide focuses on extracting XBRL from filing attachments.
Inline XBRL is also parsed, using HTMLDocument.from_html(html), and the data is available, but this parser is not yet fully developed.
To list all filings that include XBRL, pass index="xbrl" to the get_filings function.
filings = get_filings(index="xbrl")
Some forms, such as 424B2 offerings, have only the XBRL instance document as an attachment. For these filings the XBRL parser extracts only an XBRLInstance object containing the data from the instance document.
XBRLDocuments(filings[0].attachments)
Other forms, such as 10-K, have multiple XBRL attachments. These are parsed into an XBRLData object, which contains the XBRLInstance and also includes information about the labels, the presentation, the calculations, and other details that allow precise analysis of the XBRL data.
XBRLDocuments(filings[2].attachments)
The XBRL parser is integrated into edgartools, so parsing is as simple as calling filing.xbrl().
- If there is no attached XBRL, this returns None.
- If there is only the XBRL instance document, this returns an XBRLInstance object.
- If there are multiple XBRL attachments, this returns an XBRLData object.
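Because filing.xbrl() can return three different kinds of result, calling code usually needs to branch on the type. The sketch below illustrates that pattern only. The XBRLInstance and XBRLData classes defined here are empty placeholders, not the edgartools classes, and describe_xbrl is a hypothetical helper.

```python
# Placeholder classes standing in for the edgartools result types.
# They are assumptions for illustration only, not the library's classes.
class XBRLInstance:
    pass


class XBRLData:
    pass


def describe_xbrl(result):
    """Return a short description of what filing.xbrl() produced."""
    if result is None:
        return "no XBRL attached"
    if isinstance(result, XBRLData):
        return "full XBRL data"
    if isinstance(result, XBRLInstance):
        return "instance document only"
    raise TypeError(f"unexpected result type: {type(result).__name__}")


# Exercise each of the three documented cases.
print(describe_xbrl(None))
print(describe_xbrl(XBRLInstance()))
print(describe_xbrl(XBRLData()))
```

Checking for None first matters in practice, since many filings carry no XBRL attachments at all.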
The XBRLInstance object contains the facts extracted from the XBRL instance document in a DataFrame named facts.
- facts: A pandas DataFrame containing the facts extracted from the XBRL instance document.
- query_facts(concept=None, period=None, dimensions=None): Query the facts in the DataFrame.
- dimensions: Show the dimensions in the XBRL instance.
- get_all_dimensions: List all the dimensions in the XBRL instance.
- get_dimension_values: Get the values for a specific dimension, e.g. get_dimension_values("us-gaap:ProductOrServiceAxis").
The dimensions attribute shows the dimensions in the XBRL instance.
instance.dimensions
You can select a dimension by index using bracket [] notation.
instance.dimensions[0]
You can select a dimension by name using the get_dimension method or, as in the example here, bracket notation with the dimension name. Because a dimension can have several values, the object returned is a Dimension that lists all of them.
instance.dimensions["ecd:IndividualAxis"]
>>> Dimension(name='ecd:IndividualAxis', values=['aapl:DeirdreOBrienMember', 'aapl:JeffWilliamsMember'])
You can also select a single dimension value by passing a (name, value) tuple. You can then get the facts for that dimension value.
dimension_value = instance.dimensions[('srt:ConsolidationItemsAxis', 'us-gaap:CorporateNonSegmentMember')]
>>> DimensionValue(dimension='srt:ConsolidationItemsAxis', value='us-gaap:CorporateNonSegmentMember')
dimension_value.get_facts()
You can get the values for a specific dimension using the get_dimension_values method.
instance.get_dimension_values("us-gaap:ProductOrServiceAxis")
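The three ways of selecting from dimensions (by index, by name, and by a name and value tuple) can be pictured as one bracket lookup that dispatches on the key type. The following is a minimal sketch under that assumption. The Dimensions, Dimension, and DimensionValue classes here are hypothetical stand-ins written for this example and are not the edgartools implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionValue:
    dimension: str
    value: str


@dataclass(frozen=True)
class Dimension:
    name: str
    values: tuple


class Dimensions:
    """Hypothetical container; not the edgartools implementation."""

    def __init__(self, mapping):
        # mapping: dimension name -> list of member values
        self._dims = [Dimension(name, tuple(vals)) for name, vals in mapping.items()]

    def __getitem__(self, key):
        if isinstance(key, int):
            # Select by position.
            return self._dims[key]
        if isinstance(key, str):
            # Select by dimension name.
            for dim in self._dims:
                if dim.name == key:
                    return dim
            raise KeyError(key)
        if isinstance(key, tuple) and len(key) == 2:
            # Select one member of a dimension by (name, value).
            name, value = key
            if value not in self[name].values:
                raise KeyError(key)
            return DimensionValue(dimension=name, value=value)
        raise TypeError(f"unsupported key: {key!r}")


dims = Dimensions({
    "ecd:IndividualAxis": ["aapl:DeirdreOBrienMember", "aapl:JeffWilliamsMember"],
    "srt:ConsolidationItemsAxis": ["us-gaap:CorporateNonSegmentMember"],
})
print(dims[0].name)
print(dims["ecd:IndividualAxis"].values)
print(dims[("srt:ConsolidationItemsAxis", "us-gaap:CorporateNonSegmentMember")])
```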
For example, see the XBRL Instance for a 424B4. This has 8 facts in the dataframe.
You can query the underlying facts by using the query method on the facts DataFrame, with pandas syntax.
instance.facts.query("concept=='ffd:FormTp'")
Alternatively, you can use the convenience function query_facts on the XBRLInstance object.
instance.query_facts(concept='ffd:FormTp')
This is useful if the facts have dimensions:
instance.query_facts(
    dimensions={'srt:ProductOrServiceAxis': 'aapl:IPhoneMember'},
    end_date='2023-09-30',
)
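To make the idea of filtering facts by concept, date, and dimension concrete, here is a plain-Python sketch that applies each filter only when it is supplied. It is an illustration of the concept, not how edgartools implements query_facts. The filter_facts helper, the record layout, and all sample values are assumptions invented for this example.

```python
def filter_facts(facts, concept=None, end_date=None, dimensions=None):
    """Filter fact records; each filter applies only when it is given."""
    results = []
    for fact in facts:
        if concept is not None and fact["concept"] != concept:
            continue
        if end_date is not None and fact.get("end_date") != end_date:
            continue
        if dimensions:
            fact_dims = fact.get("dimensions", {})
            # Every requested axis must match the fact's member on that axis.
            if any(fact_dims.get(axis) != member for axis, member in dimensions.items()):
                continue
        results.append(fact)
    return results


# Invented placeholder records, not real filing data.
facts = [
    {"concept": "us-gaap:Revenues", "value": 100, "end_date": "2023-09-30",
     "dimensions": {"srt:ProductOrServiceAxis": "aapl:IPhoneMember"}},
    {"concept": "us-gaap:Revenues", "value": 40, "end_date": "2023-09-30",
     "dimensions": {"srt:ProductOrServiceAxis": "aapl:MacMember"}},
    {"concept": "us-gaap:Revenues", "value": 90, "end_date": "2022-09-24",
     "dimensions": {"srt:ProductOrServiceAxis": "aapl:IPhoneMember"}},
    {"concept": "ffd:FormTp", "value": "424B4", "end_date": None, "dimensions": {}},
]

iphone = filter_facts(
    facts,
    dimensions={"srt:ProductOrServiceAxis": "aapl:IPhoneMember"},
    end_date="2023-09-30",
)
print(iphone)
```

Combining a dimension filter with a date narrows the result to a single fact here, which is why dimension-aware querying is convenient when the same concept is reported for several members and periods.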
XBRLData is the container for all the XBRL data extracted from the filing. It contains the XBRLInstance as well as the labels, presentation, calculations, and other data extracted from the filing.
Importantly, it contains all the Statement objects extracted from the filing. These are shown when the XBRLData object is displayed.
- parse(instance_xml, presentation_xml, labels, calculations): Class method to parse XBRL documents from XML strings.
- extract(filing): Class method to create an XBRLData instance from a Filing object.
- get_statement(statement_name, ...): Retrieves a specific financial statement as a pandas DataFrame.
- list_statement_definitions(): Returns a list of available statement names.
- get_concept_for_label(label): Finds the concept corresponding to a given label.
- get_labels_for_concept(concept): Retrieves all labels for a given concept.
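The two label lookups in the list above run in opposite directions: one concept can carry several labels, and a label can be traced back to its concept. The sketch below shows that relationship with a plain dictionary. The labels_for_concept and concept_for_label helpers, the dictionary layout, and the sample entries are all assumptions for illustration and do not reflect how edgartools stores labels.

```python
# Labels keyed by concept, then by label role.
# The entries are illustrative placeholders, not data from a filing.
labels = {
    "us-gaap:Revenues": {"label": "Revenues", "terseLabel": "Revenue"},
    "us-gaap:NetIncomeLoss": {"label": "Net Income (Loss)", "terseLabel": "Net income"},
}


def labels_for_concept(labels, concept):
    """Return every label recorded for a concept, keyed by role."""
    return dict(labels.get(concept, {}))


def concept_for_label(labels, label):
    """Return the first concept that carries the given label, or None."""
    for concept, by_role in labels.items():
        if label in by_role.values():
            return concept
    return None


print(labels_for_concept(labels, "us-gaap:Revenues"))
print(concept_for_label(labels, "Net income"))
```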
filing = filings[2]
xbrl_data = filing.xbrl()
A Statement is a single table from the XBRL data. It can be a financial statement, such as the balance sheet, income statement, or cash flow statement, or a table with text content, such as disclosure text.
The first few statements in the XBRLData object are usually the CoverPage (also called DocumentAndEntityInformation) and the financial statements.
Statement names are not standard among companies, so the statements table allows you to access specific statements by name or index.
The XBRLData object has a statements attribute that holds a list of statements.
You can access these by index using bracket [] notation, or by name using the get_statement function.
Choose one of the statements from the list of statements in the XBRLData object using the index.
statement = xbrl_data.statements[0]
Choose one of the statements from the list of statements in the XBRLData object using its name.
statement = xbrl_data.get_statement("CONSOLIDATEDSTATEMENTOFOPERATIONS")
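Since statement names differ between companies, it can help to search the available names for a keyword before asking for a statement by name. The sketch below works on a plain list of names, such as the kind of list list_statement_definitions() is described as returning. The find_statements helper is hypothetical, and apart from CONSOLIDATEDSTATEMENTOFOPERATIONS the sample names are made up for illustration.

```python
def find_statements(names, keyword):
    """Return the statement names containing keyword, ignoring case and spaces."""
    needle = keyword.replace(" ", "").lower()
    return [name for name in names if needle in name.replace(" ", "").lower()]


# Sample names; only the operations statement name appears in this guide.
names = [
    "CoverPage",
    "CONSOLIDATEDSTATEMENTOFOPERATIONS",
    "CONSOLIDATEDBALANCESHEETS",
    "CONSOLIDATEDSTATEMENTSOFCASHFLOWS",
]

print(find_statements(names, "operations"))
print(find_statements(names, "balance sheet"))
```

A matching name found this way can then be passed to get_statement.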