Proven uses of Web Data Extractor

I want to extract contact data of travel-related companies.

Go to New Session Dialog

Select "Source = Search Engines"

Enter travel in Keyword Box

Select what type of data you want to extract (email, phone, fax, ...)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV or line by line

Click OK button

I want to extract contact data of travel-related companies in Australia.

Repeat the previous steps but select "Engine = Australia" from the Engine Listing Dialog. You can launch this dialog by clicking the "Engines" button on the New Session - General tab.

By default, US/International engines are selected.

I want to get more data on travel-related companies in Australia.

Repeat the previous steps but use more keywords, such as:

  • travel
  • hotel
  • cruise

I want to extract all data from a website http://www.mydomain.com

Go to New Session Dialog

Select "Source = WebSite"

Enter the website URL in the Starting Address box, e.g. http://www.mydomain.com

Select depth = 0 (to spider the entire website; see more about depth here)

Select what type of data you want to extract (email, phone, fax, ...)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV or line by line 

Click OK button

I want to extract contact data for all photographers listed in the Yahoo directory's Photographers category, to send them an invitation to visit my new photographer forum.

Go to New Session Dialog

Select "Source = WebSite"

Enter the website URL in the Starting Address box, e.g. http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/

Select depth = 0; check "Stay within Full URL"
This combination of the 2 settings tells WDE to process the entire Photographers directory but no other part of the Yahoo directory.
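
One way to picture the effect of "Stay within Full URL" is as a prefix test applied to every link the spider discovers. The sketch below only illustrates that idea; it is not WDE's actual implementation, and the function name `stays_within` and the `Portrait/` subdirectory are hypothetical.

```python
def stays_within(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url lies at or below start_url (simple prefix test)."""
    # Normalise trailing slashes so ".../Photographers" and ".../Photographers/" match.
    prefix = start_url.rstrip("/") + "/"
    return (candidate_url.rstrip("/") + "/").startswith(prefix)


start = "http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/"

# A (hypothetical) subdirectory of the starting address would be processed.
print(stays_within(start, start + "Portrait/"))

# A parent directory lies outside the starting address and would be skipped.
print(stays_within(start, "http://dir.yahoo.com/Arts/Visual_Arts/"))
```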

Select what type of data you want to extract (email, phone, fax, ...)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV or line by line 

Now go to the External Site tab - select "Follow External URLs" - select the Spidering Mode (Intelligent, or you define the depth)
[ Intelligent Spidering: when you set this mode, WDE uses a special technique and processes only those pages that are likely to contain contact information (phone, fax, email). ]

Now go back to the General tab and click the OK button

I have a list of urls in a file and I want to extract data from those urls.

Go to New Session Dialog

Select "Source = URLs from File"

Enter the URL file path in the File name box. This file must be a plain text file with one URL per line, each line starting with http://.

Select Depth = 0 to extract the entire website for each URL listed in the text file, or select "process 1 page only" to spider only the specified URL.
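
If the URL list comes from another tool, it may help to clean it up before loading it into WDE. The following is a minimal sketch, assuming you simply want to drop every line that does not start with http:// and remove duplicates (the duplicate removal is an extra convenience, not a WDE requirement). The function names are hypothetical.

```python
from pathlib import Path


def clean_url_lines(lines):
    """Keep one URL per line; drop blanks, duplicates and lines not starting with http://."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url.startswith("http://"):
            continue  # blank line, or a line in the wrong format
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def write_url_file(path, urls):
    """Write the URLs as plain text, one per line."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

For example, `write_url_file("urls.txt", clean_url_lines(open("raw.txt")))` would produce a file in the stated format; both file names are placeholders.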

Select what type of data you want to extract (email, phone, fax, ...)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV or line by line 

Click OK button

I want to compile a list of offshore, banking, tax, accounting related websites that do link exchange with other sites.

Go to New Session Dialog

Select "Source = Search Engines"

Generate keywords using the following 2 lists:

  • offshore banking tax accounting
  • link exchange trade links swap link add url
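
To get a feel for what such keyword generation produces, here is a small sketch. It assumes that every term from the first list is paired with every phrase from the second, and that the second list is read as the four phrases "link exchange", "trade links", "swap link" and "add url"; both points are assumptions about the keyword generator, not documented behaviour.

```python
from itertools import product

topics = ["offshore", "banking", "tax", "accounting"]
# Assumption: the second list consists of four two-word phrases.
link_terms = ["link exchange", "trade links", "swap link", "add url"]

# Pair every topic with every link-related phrase (4 x 4 combinations).
keywords = [f"{topic} {term}" for topic, term in product(topics, link_terms)]

for keyword in keywords:
    print(keyword)
```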

Select Extract Meta Tag and Extract Email

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV  

Now go to the External Site tab. Select "Follow External URL". Select Spidering Mode = I will Select the Depth. Select "Process 1 Page Only". Select "Spider Base URL only"

Now go to the Filters - Text Filters tab. Check "page must contain following text". Enter the following strings in the box:

  • links.html
  • link.html
  • resource.html
  • add url
  • submit url
  • add your site
  • submit your site

This way, WDE will extract data only from websites that do link exchange or add URLs to their directories.

Now go back to the General tab and click the OK button.

After extraction is completed, go to the Data tab - Meta Tag list. These are the related sites that do link exchange with other sites.
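
The effect of this text filter can be sketched as a simple "contains any of these strings" test. This is only an illustration: whether WDE matches case-insensitively, and whether it requires any or all of the strings, is assumed here (any string, ignoring case). The function name `page_matches` is hypothetical.

```python
FILTER_STRINGS = [
    "links.html", "link.html", "resource.html",
    "add url", "submit url", "add your site", "submit your site",
]


def page_matches(page_source: str, needles=FILTER_STRINGS) -> bool:
    """Return True if the page contains at least one filter string (case-insensitive)."""
    text = page_source.lower()
    return any(needle in text for needle in needles)


# A page linking to a links.html page passes; a page without any filter string does not.
print(page_matches('<a href="/links.html">Partners</a>'))
print(page_matches("<p>Offshore banking services</p>"))
```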

I want to extract phone/fax numbers of real estate companies in the Toronto area of Canada.

Go to New Session Dialog

Select "Source = Search Engines"

Enter real estate in Keyword Box

Select "Engine = Canada" from the Engine Listing Dialog. You can launch this dialog by clicking the "Engines" button

Select Extract phone, fax

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV or line by line 

Now go to the Filters tab - Data Filters - Phone/Fax box. Enter
416
1416
in both boxes, so that WDE will extract only those phone/fax numbers that contain 416 or 1416.
(See more info about the phone/fax filter on the Session Details page)
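
As an illustration of what this filter does, the sketch below keeps a number only if its digits contain 416 or 1416. It assumes that punctuation and spaces are ignored when matching, which may differ from how WDE compares numbers; the function name `keep_number` is hypothetical and the sample numbers are made up.

```python
import re


def keep_number(number: str, patterns=("416", "1416")) -> bool:
    """Return True if the digits of the number contain any of the filter patterns."""
    digits = re.sub(r"\D", "", number)  # assumption: ignore spaces, dashes, brackets
    return any(pattern in digits for pattern in patterns)


samples = ["(416) 555-0123", "1-416-555-0199", "(604) 555-0142"]
kept = [number for number in samples if keep_number(number)]
print(kept)
```

Because the test is "contains", a number with 416 in the middle would also pass, so the extracted list may still need a manual check.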

Now go back to the General tab and click the OK button.

I want to build a domain list of health/medicine related websites.

Go to New Session Dialog

Select "Source = Search Engines"

Enter the following keywords:
health
medicine
and so on...

Select Extract URL (select Base URL)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - line by line 

Click OK button
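
For readers who want to see what "Base URL" means in practice, or who need to reduce an existing URL list to a domain list outside WDE, here is a minimal sketch using Python's standard `urllib.parse`. The exact format WDE writes (for example, whether a trailing slash is included) is an assumption, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_url(url: str) -> str:
    """Reduce a full URL to scheme plus host, e.g. http://www.example.com/"""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def unique_base_urls(urls):
    """Return each base URL once, in order of first appearance."""
    seen = set()
    result = []
    for url in urls:
        base = base_url(url)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result


pages = [
    "http://www.example.com/health/page.html?id=3",
    "http://www.example.com/medicine/",
    "http://www.example.org/index.html",
]
for line in unique_base_urls(pages):
    print(line)
```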

I have a URL list in a SQL database. I want to extract the URL, title, description, keywords, and plain page text (from HTML <BODY> to </BODY>) and merge them into the database.

WDE cannot access SQL databases. You need to export the URL list from the SQL database to a plain text file on disk, and use this file in WDE.
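
The export can be done with whatever tool your database provides. As a minimal sketch, the following uses Python's built-in `sqlite3` module and assumes a hypothetical table `sites` with a text column `url`; for another database engine, swap in the matching driver and adjust the query.

```python
import sqlite3


def export_urls(conn: sqlite3.Connection, out_path: str) -> int:
    """Write every http:// URL from the (hypothetical) sites table, one per line."""
    rows = conn.execute(
        "SELECT url FROM sites WHERE url LIKE 'http://%' ORDER BY rowid"
    )
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for (url,) in rows:
            out.write(url.strip() + "\n")
            count += 1
    return count
```

Calling `export_urls(sqlite3.connect("mydata.db"), "urls.txt")` would then produce a file in the one-URL-per-line format; both file names are placeholders.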

Go to New Session Dialog

Select "Source = URLs from File"

Enter the URL file path in the File name box. This file must be a plain text file with one URL per line, each line starting with http://.

Select "process 1 page only" to extract the meta tags of the specified root domain. If you need to extract the meta tags of ALL pages of each website, select depth=0

Select Extract Meta tag, Extract Body (you can set text size limit by clicking ... button)

Select "Save Data" folder, i.e. where the program will save the data

Select Save Format - CSV 

Uncheck "View - Display data in data tab" for very large URL meta tag extractions, so that WDE does not display data within the program but writes it directly to the disk file. This increases program performance.

Click OK button

After extraction is completed, you can import this CSV file (metatag.txt) into the SQL database and do further processing, queries, etc.
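
The import step depends on your database and on the exact layout of the CSV file. The sketch below is one possible approach with Python's `csv` and `sqlite3` modules. It assumes, purely for illustration, that each row holds url, title, description and keywords in that order, with no header row; check the actual column order of metatag.txt before using it. The table name `metatags` is hypothetical.

```python
import csv
import sqlite3


def import_metatags(conn: sqlite3.Connection, csv_path: str) -> int:
    """Merge CSV rows into a metatags table keyed by URL; return the number of rows read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metatags ("
        "url TEXT PRIMARY KEY, title TEXT, description TEXT, keywords TEXT)"
    )
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumed column order: url, title, description, keywords.
        rows = [row[:4] for row in csv.reader(handle) if len(row) >= 4]
    # INSERT OR REPLACE updates an existing URL instead of duplicating it.
    conn.executemany("INSERT OR REPLACE INTO metatags VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Keying the table on the URL means that re-running an extraction and importing again refreshes existing entries rather than adding duplicates.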

Country Specific Search Engine List

Intelligent Spidering Mode
(extracts more contact data without visiting entire site)

Parser options

Meta tags limit, prefixes and more