The influence of the keyword selection method on the effectiveness of web page classification using the boosting algorithm

Tomasz Gąciarz

Krzysztof Czajkowski

Abstract

The paper concerns the process of web page analysis. Classification is based on an analysis of both the structure and the content of pages. Various characteristics are taken into account, including, inter alia, structural, visual, textual, web and link features. The AdaBoost algorithm was applied to construct the classifiers. This paper focuses on the impact of keyword selection methods on the effectiveness of the classification process.

Keywords: web page, feature extraction, classification, AdaBoost
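The abstract names AdaBoost and keyword-based text features without giving details. As a hedged illustration only, the following Python sketch shows discrete AdaBoost with single-feature decision stumps over binary keyword-presence features. The keyword list, the whitespace tokenisation, the function names (`keyword_features`, `train_adaboost`, `predict`) and the toy data are hypothetical assumptions. They are not the authors' feature set or keyword selection method.

```python
import math

def keyword_features(text, keywords):
    """Hypothetical feature extractor: 1 if a keyword occurs in the page text, else 0."""
    tokens = set(text.lower().split())  # naive whitespace tokenisation (assumption)
    return [1 if kw in tokens else 0 for kw in keywords]

def stump(x, j, polarity):
    """Decision stump: predicts `polarity` if feature j is present, `-polarity` otherwise."""
    return polarity if x[j] else -polarity

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with one-feature stumps; labels in y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []              # list of (alpha, feature_index, polarity)
    for _ in range(rounds):
        best = None
        # pick the stump with the lowest weighted error
        for j in range(d):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # re-weight: misclassified samples gain weight, correct ones lose weight
        w = [wi * math.exp(-alpha * yi * stump(xi, j, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, j, polarity) for alpha, j, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy usage with hypothetical pages and keywords (+1 = travel, -1 = programming)
keywords = ["hotel", "cheap", "python", "code"]
pages = [("cheap flights hotel booking", 1), ("book hotel rooms cheap", 1),
         ("python tutorial code examples", -1), ("learn code python", -1)]
X = [keyword_features(text, keywords) for text, _ in pages]
y = [label for _, label in pages]
model = train_adaboost(X, y, rounds=3)
```

In this sketch the choice of `keywords` fixes the feature space that the boosted stumps can use, which is why the method used to select the keywords can affect classification effectiveness.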