Wednesday, 29 May 2013

Limitations and Challenges in Effective Web Data Mining

Web data mining and data collection are critical processes for many business and market research firms today. Conventional Web data mining techniques rely on search engines such as Google, Yahoo, and AOL, together with keyword, directory, and topic-based searches. Since the Web's existing structure cannot by itself provide high-quality, definite, and intelligent information, systematic web data mining can help you obtain the business intelligence and relevant data you need.

Factors that affect the effectiveness of keyword-based searches include:
• Using general or broad keywords on a search engine returns millions of web pages, many of which are totally irrelevant.
• Similar or multi-variant keyword semantics may return ambiguous results. For instance, the word "panther" could refer to an animal, a sports accessory, or a movie.
• You may miss many highly relevant web pages that do not directly include the searched keyword.

The most important factor limiting deep web access is the effectiveness of search engine crawlers. Modern search engine crawlers (or bots) cannot access the entire web due to bandwidth limitations. Thousands of internet databases offer high-quality, editor-reviewed, and well-maintained information, yet are never reached by the crawlers.

Almost all search engines offer limited options for combining keywords in a query. For example, Google and Yahoo provide options such as phrase match or exact match to narrow search results, but reaching the most relevant information still demands extra effort and time. Since human behavior and preferences change over time, a web page needs to be updated frequently to reflect those trends. There is also limited room for multi-dimensional web data mining, since existing information search relies heavily on keyword-based indices rather than the real data.
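The difference between a broad keyword match and a phrase (exact) match can be sketched with a toy document set (all document text below is invented purely for illustration):

```python
# Toy illustration: broad keyword match vs. exact phrase match.
documents = [
    "The black panther is a big cat found in Asia and Africa",
    "Panther brand hockey sticks are a popular sports accessory",
    "Black Panther is a 2018 superhero movie",
    "Big cats such as leopards are elusive predators",
]

def broad_match(docs, query):
    """Return every document containing ANY token of the query."""
    tokens = query.lower().split()
    return [d for d in docs if any(t in d.lower() for t in tokens)]

def phrase_match(docs, query):
    """Return only documents containing the exact contiguous phrase."""
    return [d for d in docs if query.lower() in d.lower()]

# A broad search for "black panther movie" mixes animal, sports,
# and film pages -- three results with three different topics.
mixed = broad_match(documents, "black panther movie")

# An exact phrase narrows the intent to a single document.
exact = phrase_match(documents, "superhero movie")
```

The broad search returns three pages on unrelated topics, while the phrase search returns one, which is precisely the ambiguity problem described above.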

The limitations and challenges mentioned above have prompted a quest for ways to discover and use Web resources more efficiently and effectively. Send us any queries regarding Web data mining processes to explore the topic in more detail.


Source: http://ezinearticles.com/?Limitations-and-Challenges-in-Effective-Web-Data-Mining&id=5012994

Tuesday, 28 May 2013

Real Estate and Mortgage Data for Your Site

Turn Your Site Into a Real Estate Portal With Zillow

The new Zillow API Network turns member sites into mini real estate portals by offering fresh and provocative real estate content to keep people coming back.
Home Valuation

Search results list, Zestimate®, Rent Zestimate®, home valuations, home valuation charts, comparable houses, and market trend charts.

API calls of interest:

- GetZestimate
- GetSearchResults
- GetChart
- GetComps

Property Details

Property-level data, including historical sales price and year, taxes, beds/baths, etc.

API calls of interest

- GetDeepComps
- GetDeepSearchResults
- GetUpdatedPropertyDetails

Neighborhood Data

- Neighborhood and city affordability statistics: Zillow Home Value Index, Zestimate distribution, median single family home and condo values, average tax rates, and percentage of flips.
- Demographic data at the city and neighborhood level
- Lists of counties, cities, ZIP codes, and neighborhoods, as well as latitude and longitude data for these areas so you can put them on a map.

API calls of interest

- GetDemographics
- GetRegionChildren
- GetRegionChart

Mortgage Rates

Current mortgage rates from Zillow Mortgage Marketplace broken down by state and loan type (30 year fixed, 15 year fixed, 5/1 ARM).

API calls of interest

- GetRateSummary

Mortgage Calculators

A full suite of mortgage and real estate calculators so you can add a full calculator section on your website.

API calls of interest

- GetMonthlyPayments - Simple mortgage payment calculator
- CalculateMonthlyPaymentsAdvanced - Advanced mortgage payment calculator
- CalculateAffordability
- CalculateRefinance
- CalculateAdjustableMortgage
- CalculateMortgageTerms
- CalculateDiscountPoints
- CalculateBiWeeklyPayment
- CalculateNoCostVsTraditional
- CalculateTaxSavings
- CalculateFixedVsAdjustableRate
- CalculateClosingCostImpact
- CalculateInterestOnlyVsTraditional
- CalculateHELOC
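The math behind a basic payment calculator such as GetMonthlyPayments is the standard fixed-rate amortization formula M = P·r / (1 − (1 + r)^−n), where P is the principal, r the monthly interest rate, and n the number of payments. A local sketch of that formula only (the actual API may fold in additional costs such as taxes and insurance):

```python
def monthly_payment(principal, annual_rate, years):
    """Standard fixed-rate amortization: M = P*r / (1 - (1+r)**-n)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    if r == 0:                    # zero-interest edge case
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

# $300,000 at 6% over 30 years -> about $1,798.65 per month
payment = monthly_payment(300_000, 0.06, 30)
```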

Zillow Neighborhood Boundaries

- Neighborhood boundaries for nearly 7,000 neighborhoods and 150 cities.
- Available via a Creative Commons license.

- Learn more

Get Started

- Read the Terms of Use to make sure your integration adheres to the fine print.
- Read the Developer Guide for your choice of API to get instructions for making Web Service calls.
- Get a Zillow Web Services ID (ZWSID). You must create a Zillow account to get the ZWSID.
- Start coding — be sure to use your ZWSID to make your API calls.
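As a minimal sketch of the "start coding" step, assuming the historical Zillow Web Service endpoint layout (http://www.zillow.com/webservice/&lt;ApiName&gt;.htm with zws-id, address, and citystatezip query parameters; verify the exact schema against the Developer Guide), a GetSearchResults request URL can be built like this:

```python
from urllib.parse import urlencode

ZWS_ID = "X1-ZWz1example-12345"  # placeholder -- substitute your own ZWSID

def build_search_url(address, citystatezip, zws_id=ZWS_ID):
    """Build a GetSearchResults request URL.

    The endpoint path and parameter names are assumptions based on the
    historical Zillow Web Service documentation.
    """
    base = "http://www.zillow.com/webservice/GetSearchResults.htm"
    params = {
        "zws-id": zws_id,
        "address": address,
        "citystatezip": citystatezip,
    }
    return base + "?" + urlencode(params)

url = build_search_url("2114 Bigelow Ave", "Seattle, WA")
```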

Some things you need to know...

The Zillow API Network is a free service. Network Member requirements are summarized below; for details please see our Terms of Use.
For all APIs

- Limit queries to 1,000 per day, per API. If you need more, please request a higher call limit by filling out a request form
- Provide a Web-based user interface directly to consumers for their personal use (no caching or data storage)
- Include Zillow text, images, and links in all API input and output elements
- Make Zillow technology available for free to customers, without registration
- The license is only valid for use on a Member Web site; special permissions are required for uses such as Yahoo widgets
- Adhere to all Zillow API Network Terms of Use
- Notify Zillow of any breach of the terms and help resolve it
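One way to honor the 1,000-calls-per-day limit on the client side is a simple per-API quota counter. This is a sketch of a client-side safeguard, not part of the Zillow API itself:

```python
from datetime import date

class DailyQuota:
    """Track calls per (api_name, day) and refuse calls over the limit."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.counts = {}  # (api_name, date) -> calls made so far

    def allow(self, api_name):
        """Record one call; return False once the daily limit is reached."""
        key = (api_name, date.today())
        if self.counts.get(key, 0) >= self.limit:
            return False  # over quota -- back off or request a higher limit
        self.counts[key] = self.counts.get(key, 0) + 1
        return True

quota = DailyQuota(limit=2)  # tiny limit just to demonstrate the cutoff
results = [quota.allow("GetSearchResults") for _ in range(3)]
# first two calls pass, the third is refused
```

Keying the counter on the date means the quota resets naturally at midnight without any scheduled cleanup.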

Specific to Property Details API

- Each consumer-facing provider must have its own ZWSID
- Implement anti-scraping measures

Your Site to Zillow and Back Again

You can see below how Zillow exposes data through its various API calls. (For example, in order to get a chart, enter the address in the GetSearchResults API and then use the returned ZPID to call the GetChart API.) Click on the links to see Developer Guides for specific API calls.
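The chaining described above (search first, then reuse the returned ZPID) can be sketched against a hand-written sample response. The XML shape and the GetChart endpoint layout here are assumptions modeled on the historical GetSearchResults documentation, so check the Developer Guide for the real schema:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hand-written sample of a GetSearchResults-style response (assumed shape).
sample_response = """
<searchresults>
  <response>
    <results>
      <result>
        <zpid>48749425</zpid>
      </result>
    </results>
  </response>
</searchresults>
"""

def extract_zpid(xml_text):
    """Pull the first ZPID out of a search response."""
    root = ET.fromstring(xml_text)
    node = root.find(".//zpid")
    return node.text if node is not None else None

def build_chart_url(zpid, zws_id, unit_type="dollar"):
    """Build a GetChart request URL from a ZPID (endpoint layout assumed)."""
    base = "http://www.zillow.com/webservice/GetChart.htm"
    return base + "?" + urlencode(
        {"zws-id": zws_id, "zpid": zpid, "unit-type": unit_type}
    )

zpid = extract_zpid(sample_response)
chart_url = build_chart_url(zpid, "X1-ZWz1example-12345")
```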

Source: http://www.zillow.com/howto/api/APIOverview.htm

Scraping Property Listings

A few months ago a client of mine received an email from a web application called myhousehunt. The site was scraping property listings from the major real estate portals and populating a portion of each listing on their website. They were then allowing property seekers to build a timetable for open inspections which could be printed or emailed to the user’s phone.

When I first looked at myhousehunt I noticed it was displaying listings from REA and Domain. I thought this was unusual, as I doubted they had permission from either portal to do this. I recently visited the site and noticed they explicitly state that:

    “MyHouseHunt will search for properties currently listed on Domain.com.au”

So it appears Domain have given the green light, while REA, in typical style, have unleashed their lawyers on the scraping site.

I’m always puzzled why REA will never share their content. What threat does a site like myhousehunt pose to a real estate portal, in particular REA? None: it displays a few details of each listing with a link back to the original property on REA. All this does is strengthen the REA brand and drive traffic back to their portal, increasing those UBs they’re always talking about. Thumbs up to Domain for embracing applications like this, as it encourages greater innovation and the creation of cool and useful real estate applications.

Source: http://www.business2.com.au/2009/12/scraping-property-listings/