Summary
Item | Description |
---|---|
Release State | General Availability |
Products | Power BI (Datasets), Power BI (Dataflows), Power Apps (Dataflows), Excel, Dynamics 365 Customer Insights |
Authentication Types Supported | Anonymous, Windows, Basic, Web API, Organizational Account |
Function Reference Documentation | Web.Page, Web.BrowserContents |
Note
Some capabilities may be present in one product but not others due to deployment schedules and host-specific capabilities.
Prerequisites
- Internet Explorer 10
Capabilities supported
- Basic
- Advanced
  - URL parts
  - Command timeout
  - HTTP request header parameters
Load Web data using Power Query Desktop
To load data from a web site with Power Query Desktop:
Select Get Data > Web in Power BI or From Web in the Data ribbon in Excel.
Choose the Basic button and enter a URL address in the text box. For example, enter https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States. Then select OK.
If the URL address you enter is invalid, a warning icon will appear next to the URL textbox.
If you need to construct a more advanced URL before you connect to the website, go to Load Web data using an advanced URL.
Select the authentication method to use for this web site. In this example, select Anonymous. Then select the level you want to apply these settings to (in this case, https://en.wikipedia.org/). Then select Connect.
The available authentication methods for this connector are:
Anonymous: Select this authentication method if the web page doesn't require any credentials.
Windows: Select this authentication method if the web page requires your Windows credentials.
Basic: Select this authentication method if the web page requires a basic user name and password.
Web API: Select this method if the web resource that you’re connecting to uses an API Key for authentication purposes.
Organizational account: Select this authentication method if the web page requires organizational account credentials.
Note
When uploading the report to the Power BI service, only the anonymous, Windows and basic authentication methods are available.
The level you select for the authentication method determines what part of a URL will have the authentication method applied to it. If you select the top-level web address, the authentication method you select here will be used for that URL address or any subaddress within that address. However, you might not want to set the top URL address to a specific authentication method because different subaddresses could require different authentication methods. For example, you might be accessing two separate folders of a single SharePoint site and want to use a different Microsoft account to access each one.
Once you've set the authentication method for a specific web site address, you won't need to select the authentication method for that URL address or any subaddress again. For example, if you select the https://en.wikipedia.org/ address in this dialog, any web page that begins with this address won't require that you select the authentication method again.
Note
If you need to change the authentication method later, go to Changing the authentication method.
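The way a saved level covers a URL and its subaddresses can be pictured as prefix matching. The following Python sketch is only an illustration: the `saved_credentials` store, the contoso.sharepoint.com addresses, and the rule that the longest matching prefix wins are all assumptions made for the example, not a description of how Power Query stores or resolves credentials.

```python
from typing import Optional

# Hypothetical store: URL prefix (the "level" chosen in the dialog) -> authentication method.
saved_credentials = {
    "https://contoso.sharepoint.com/": "Organizational account",
    "https://contoso.sharepoint.com/sites/finance/": "Windows",
    "https://en.wikipedia.org/": "Anonymous",
}


def resolve_auth(url: str, credentials: dict[str, str]) -> Optional[str]:
    """Return the method saved for the longest prefix of `url`, or None if nothing matches."""
    matches = [prefix for prefix in credentials if url.startswith(prefix)]
    if not matches:
        return None  # no saved level covers this URL, so the user would be prompted
    # Assumption: the most specific (longest) saved level takes precedence.
    return credentials[max(matches, key=len)]


method = resolve_auth("https://en.wikipedia.org/wiki/Power_BI", saved_credentials)
```

Under these assumptions, any page below https://en.wikipedia.org/ resolves to the method saved for that address, while a more specific subaddress can carry its own method.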
From the Navigator dialog, you can select a table, then either transform the data in the Power Query editor by selecting Transform Data, or load the data by selecting Load.
The right side of the Navigator dialog displays the contents of the table you select to transform or load. If you're uncertain which table contains the data you're interested in, you can select the Web View tab. The web view lets you see the entire contents of the web page, and highlights each of the tables that have been detected on that site. You can select the check box above the highlighted table to obtain the data from that table.
On the lower left side of the Navigator dialog, you can also select the Add table using examples button. This selection presents an interactive window where you can preview the content of the web page and enter sample values of the data you want to extract. For more information on using this feature, go to Get webpage data by providing examples.
Load Web data using Power Query Online
To load data from a web site with Power Query Online:
From the Get Data dialog box, select either Web page or Web API.
In most cases, you'll want to select the Web page connector. For security reasons, you'll need to use an on-premises data gateway with this connector. The Web page connector requires a gateway because HTML pages are retrieved using a browser control, which involves potential security concerns. This concern isn't an issue with the Web API connector, as it doesn't use a browser control.
In some cases, you might want to use a URL that points at either an API or a file stored on the web. In those scenarios, the Web API connector (or file-specific connectors) would allow you to move forward without using an on-premises data gateway.
Also note that if your URL points to a file, you should use the specific file connector instead of the Web page connector.
Enter a URL address in the text box. For this example, enter https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States.
Select the name of your on-premises data gateway.
Select the authentication method you'll use to connect to the web page.
The available authentication methods for this connector are:
Anonymous: Select this authentication method if the web page doesn't require any credentials.
Windows: Select this authentication method if the web page requires your Windows credentials.
Basic: Select this authentication method if the web page requires a basic user name and password.
Organizational account: Select this authentication method if the web page requires organizational account credentials.
Once you've chosen the authentication method, select Next.
From the Navigator dialog, you can select a table, then transform the data in the Power Query Editor by selecting Transform Data.
Load Web data using an advanced URL
When you select Get Data > From Web in Power Query Desktop, in most instances you'll enter URLs in the Basic setting. However, in some cases you may want to assemble a URL from its separate parts, set a timeout for the connection, or provide individualized URL header data. In this case, select the Advanced option in the From Web dialog box.
Use the URL parts section of the dialog to assemble the URL you want to use to get data. The first part of the URL in the URL parts section most likely would consist of the scheme, authority, and path of the URI (for example, http://contoso.com/products/). The second text box could include any queries or fragments that you would use to filter the information provided to the web site. If you need to add more than one part, select Add part to add another URL fragment text box. As you enter each part of the URL, the complete URL that will be used when you select OK is displayed in the URL preview box.
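As a rough illustration of how URL parts combine into the previewed URL, here is a minimal Python sketch. It assumes the parts are simply joined in the order entered; the `preview_url` helper and the `?category=bikes` and `#top` values are hypothetical and exist only for this example.

```python
from urllib.parse import urlsplit


def preview_url(parts: list[str]) -> str:
    """Join the URL parts in order, exactly as typed, to form the complete URL."""
    return "".join(parts)


url = preview_url(["http://contoso.com/products/", "?category=bikes", "#top"])

# Splitting the result shows which component each part contributed:
# scheme, authority (netloc), and path come from the first part,
# the query and fragment come from the later parts.
pieces = urlsplit(url)
```

Because the parts are joined verbatim in this sketch, separators such as `?` and `#` have to be included in the parts themselves.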
Depending on how long the POST request takes to process data, you may need to prolong the time the request continues to stay connected to the web site. The default timeout for both POST and GET is 100 seconds. If this timeout is too short, you can use the optional Command timeout in minutes to extend the number of minutes you stay connected.
You can also add specific request headers to the POST you send to the web site using the optional HTTP request header parameters drop-down box. The following table describes the request headers you can select.
Request Header | Description |
---|---|
Accept | Specifies the response media types that are acceptable. |
Accept-Charset | Indicates which character sets are acceptable in the textual response content. |
Accept-Encoding | Indicates what response content encodings are acceptable in the response. |
Accept-Language | Indicates the set of natural languages that are preferred in the response. |
Cache-Control | Indicates the caching policies, specified by directives, in client requests and server responses. |
Content-Type | Indicates the media type of the content. |
If-Modified-Since | Conditionally determines if the web content has been changed since the date specified in this field. If the content hasn't changed, the server responds with only the headers and a 304 status code. If the content has changed, the server returns the requested resource along with a status code of 200. |
Prefer | Indicates that particular server behaviors are preferred by the client, but aren't required for successful completion of the request. |
Range | Specifies one or more subranges of the selected representation data. |
Referer | Specifies a URI reference for the resource from which the target URI was obtained. |
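To make the timeout and header settings concrete, the sketch below prepares an equivalent request with Python's standard library. It is an analogy rather than what the connector does internally: `build_request` is a hypothetical helper, the header values are examples, and the request is only constructed here, not sent.

```python
from urllib.request import Request

DEFAULT_TIMEOUT_SECONDS = 100  # the default stated for both POST and GET


def build_request(url, headers=None, timeout_minutes=None):
    """Return a prepared request and the timeout, in seconds, to use when sending it."""
    request = Request(url, headers=headers or {})
    # The dialog takes the command timeout in minutes; convert it to seconds.
    timeout = DEFAULT_TIMEOUT_SECONDS if timeout_minutes is None else timeout_minutes * 60
    return request, timeout


request, timeout = build_request(
    "http://contoso.com/products/",
    headers={"Accept": "application/json", "Accept-Language": "en-US"},
    timeout_minutes=5,
)
# To send it, one would call: urllib.request.urlopen(request, timeout=timeout)
```

Leaving `timeout_minutes` unset keeps the 100-second default in this sketch.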
Import files from the web
Normally when you import a local on-premises file in Power Query Desktop, you'll use the specific file-type connector to import that file, for example, the JSON connector to import a JSON file or the CSV connector to import a CSV file. However, if you're using Power Query Desktop and the file you want to import is located on the web, you must use the Web connector to import that file. As in the local case, you'll then be presented with the table that the connector loads by default, which you can then either Load or Transform.
The following file types are supported by the Web connector:
- Access database
- CSV document
- Excel workbook
- JSON
- Text file
- HTML page
- XML tables
For example, you could use the following steps to import a JSON file on the https://contoso.com/products web site:
From the Get Data dialog box, select the Web connector.
Choose the Basic button and enter the address in the URL box, for example:
http://contoso.com/products/Example_JSON.json
Select OK.
If this is the first time you're visiting this URL, select Anonymous as the authentication type, and then select Connect.
Power Query Editor will now open with the data imported from the JSON file. Select the View tab in the Power Query Editor, then select Formula Bar to turn on the formula bar in the editor.
As you can see, the Web connector returns the web contents from the URL you supplied, and then automatically wraps the web contents in the appropriate document type specified by the URL (Json.Document in this example).
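The idea of wrapping raw web contents in a document parser chosen from the URL can be sketched in Python as follows. This is an illustration only: `wrap_contents` is a hypothetical helper, selecting the parser by file extension is an assumption about what "specified by the URL" means, and the sample bytes stand in for a real download.

```python
import csv
import io
import json
from urllib.parse import urlsplit


def wrap_contents(url: str, raw: bytes):
    """Parse downloaded bytes according to the file extension in the URL path."""
    path = urlsplit(url).path.lower()
    text = raw.decode("utf-8")
    if path.endswith(".json"):
        return json.loads(text)  # plays the role Json.Document has in the text
    if path.endswith(".csv"):
        return list(csv.reader(io.StringIO(text)))  # rows as lists of strings
    return text  # unrecognized type: leave the contents as plain text


sample = b'[{"id": 1, "name": "Widget"}]'  # stand-in for the downloaded file
products = wrap_contents("http://contoso.com/products/Example_JSON.json", sample)
```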
Handling dynamic web pages
Web pages that load their content dynamically might require special handling. If you notice sporadic errors in your web queries, it's possible that you're trying to access a dynamic web page. One common example of this type of error is:
- You refresh the site.
- You see an error (for example, "the column 'Foo' of the table wasn't found").
- You refresh the site again.
- No error occurs.
These kinds of issues are usually due to timing. Pages that load their content dynamically can sometimes be inconsistent since the content can change after the browser considers loading complete. Sometimes Web.BrowserContents downloads the HTML after all the dynamic content has loaded. Other times the changes are still in progress when it downloads the HTML, leading to sporadic errors.
The solution is to pass the WaitFor option to Web.BrowserContents, which indicates either a selector or a length of time that should be waited for before downloading the HTML.
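The waiting behavior described here can be pictured as polling until the expected content shows up or a time limit passes. The Python sketch below illustrates that idea only; it is not how Web.BrowserContents is implemented. `fetch_when_ready` is a hypothetical helper, it checks for a plain substring rather than a real CSS selector, and the fake snapshots replace a live page.

```python
import time
from typing import Callable


def fetch_when_ready(
    fetch_html: Callable[[], str],
    marker: str,
    timeout_seconds: float = 10.0,
    poll_seconds: float = 0.1,
) -> str:
    """Re-fetch until `marker` appears in the HTML or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    html = fetch_html()
    while marker not in html and time.monotonic() < deadline:
        time.sleep(poll_seconds)
        html = fetch_html()
    return html  # may still lack the marker if the timeout was reached


# Fake page whose table only appears on the third snapshot.
snapshots = iter(["<div></div>", "<div>loading</div>", "<table id='states'></table>"])
html = fetch_when_ready(lambda: next(snapshots), marker="id='states'", poll_seconds=0.01)
```

Without the wait, the first snapshot would have been returned and the table would appear to be missing, which is the kind of sporadic error described above.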
How can you tell if a page is dynamic? Usually it's pretty simple. Open the page in a browser and watch it load. If the content shows up right away, it's a regular HTML page. If it appears dynamically or changes over time, it's a dynamic page.
See also
- Extract data from a Web page by example
- Troubleshooting the Power Query Web connector