Data gathered by Search Auditor

Interpretation of more than 100 parameters gathered and considered in formulas and modes by Clusteric Search Auditor

ONSITE FACTORS

HTTP_STATUS
The HTTP status returned in response to a request (e.g. 200, 404). The program divides 301 redirects into three categories:
– Within a domain (abcdefg.com -301-> www.abcdefg.com)
– Redirects to the user’s domain (abcdefg.com -301-> youranalyseddomain.com)
– Redirects to another domain (abcdefg.com -301-> gfedcba.com)

It also reports errors indicated by a lack of response (non-existent domain, timeout).

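The three redirect categories can be illustrated with a minimal Python sketch. The function names and the category labels are hypothetical, and treating "www." as part of the same domain is an assumption based on the example above; the program's actual rules are not documented here.

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lower-cased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_redirect(source_url: str, target_url: str, user_domain: str) -> str:
    """Assign a 301 redirect to one of three (hypothetical) category labels."""
    src, dst = _host(source_url), _host(target_url)
    user = user_domain.lower()
    if dst == src:
        return "within_domain"
    if dst == user or dst.endswith("." + user):
        return "to_user_domain"
    return "to_other_domain"
```

For example, `classify_redirect("http://abcdefg.com/", "http://www.abcdefg.com/", "youranalyseddomain.com")` falls into the first category.
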
PAGE_TITLE
Title (<title>) of the site.

PBN_PROBABILITY
(BETA!) The program tries to recognize private blog network (PBN) sites based on onsite factors.

INTERNAL
Total number of internal links.

EXTERNAL
Total number of outgoing links from the subpage to other domains and subdomains.
EXTERNAL = EXT_DOMAIN + EXT_SUBDOMAIN

EXT_DOMAIN
Total number of links going from the subpage to other top domains (abcdefg.com -> hijklmn.com).

EXT_DOMAIN_DF
Total number of DOFOLLOW links going from the subpage to other top domains (abcdefg.com -> hijklmn.com).

EXT_SUBDOMAIN
Total number of links going from the subpage to subdomains within the main domain (abcdefg.com -> blog.abcdefg.com or news.abcdefg.com -> sport.abcdefg.com).

EXT_UNIQ_DOM
Total number of unique top domains linked from the page.

EXT_UNIQ_DOM_DF
Total number of unique top domains linked from the page with DOFOLLOW links.
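
How the link counters above relate to each other can be sketched in Python. This is only an illustration: `top_domain` and `count_links` are hypothetical helpers, links are assumed to arrive as absolute URLs paired with a dofollow flag, and the "last two labels" rule for finding the top domain is a naive assumption that fails for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def top_domain(host: str) -> str:
    """Naive assumption: the top domain is the last two labels of the host."""
    return ".".join(host.lower().split(".")[-2:])


def count_links(page_url, links):
    """Count links on a page; links is an iterable of (href, is_dofollow) pairs."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    counts = {"INTERNAL": 0, "EXT_DOMAIN": 0, "EXT_DOMAIN_DF": 0, "EXT_SUBDOMAIN": 0}
    uniq, uniq_df = set(), set()
    for href, dofollow in links:
        host = (urlsplit(href).hostname or "").lower()
        if not host:
            continue  # skip links without a hostname
        if host == page_host:
            counts["INTERNAL"] += 1
        elif top_domain(host) == top_domain(page_host):
            counts["EXT_SUBDOMAIN"] += 1  # other subdomain of the same top domain
        else:
            counts["EXT_DOMAIN"] += 1
            uniq.add(top_domain(host))
            if dofollow:
                counts["EXT_DOMAIN_DF"] += 1
                uniq_df.add(top_domain(host))
    counts["EXTERNAL"] = counts["EXT_DOMAIN"] + counts["EXT_SUBDOMAIN"]
    counts["EXT_UNIQ_DOM"] = len(uniq)
    counts["EXT_UNIQ_DOM_DF"] = len(uniq_df)
    return counts
```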

LINK_FOUND
Information on whether the user’s link has been found on the site.

REL
The type and number of links found to the user’s domain, e.g. “DFx2, NF” means two dofollow links and one nofollow link.
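
A REL value in the format shown above can be turned into counts with a few lines of Python. This is a sketch that assumes the format is exactly a comma-separated list of uppercase codes with an optional "x<number>" multiplier; `parse_rel` is a hypothetical helper, not part of the program.

```python
import re


def parse_rel(value: str) -> dict:
    """Parse a REL value such as 'DFx2, NF' into a code -> count mapping."""
    counts = {}
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"([A-Z]+)(?:x(\d+))?", token)
        if match is None:
            raise ValueError(f"unexpected REL token: {token!r}")
        kind, times = match.group(1), match.group(2)
        # A code without a multiplier stands for a single link.
        counts[kind] = counts.get(kind, 0) + (int(times) if times else 1)
    return counts
```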

ANCHOR
Anchors of links leading to the user’s domain.

LINK_TARGET
Addresses of the subpages within the user’s domain that the found links point to.

LINK_CONTEXT
The surroundings of the link(s) leading to the user’s domain: the HTML element and the length of the text surrounding the link. For example, p:330 means a <p> (paragraph) element containing the link, with 330 characters of text in the paragraph apart from the link itself.
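
Reading a single LINK_CONTEXT value of the form "element:length" could look like the sketch below. `LinkContext` and `parse_link_context` are hypothetical names, and how the program formats several links in one value is not covered here.

```python
from typing import NamedTuple


class LinkContext(NamedTuple):
    element: str      # HTML element that contains the link, e.g. "p"
    text_length: int  # characters of surrounding text, excluding the link


def parse_link_context(value: str) -> LinkContext:
    """Parse a single 'element:length' value such as 'p:330'."""
    element, _, length = value.strip().rpartition(":")
    if not element or not length.isdigit():
        raise ValueError(f"unexpected LINK_CONTEXT value: {value!r}")
    return LinkContext(element.lower(), int(length))
```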

TEXT_LEN
Total length of all texts found on the page.

LINK_CONTEXT_DESC
The guessed context of the link(s) leading to the user’s domain, for instance “comment” for a blog page or “post”/“signature” for a forum.

JS_IN_SOURCE
Recognized popular scripts installed on the site, e.g. ADSENSE, ANALYTICS, PIWIK.
For some scripts, Link Auditor extracts account ids as well.

DUPLICATE_CONTENT
Content duplicated in other domains, determined using the Google search engine. The longest text on the site is selected, and the program then counts how many times that text occurs in domains other than the one where the link is located. Example results:
– 43800 – the text occurs 43800 times outside the given domain
– not enough text – no sufficiently long and coherent text was found on the subpage, so no meaningful conclusion about duplicated content can be drawn

PAGE_LANGUAGE
Recognized languages of the texts on the site, stated as percentages.

ENGINE
Recognized type and engine of the site, e.g. cms_blog:Wordpress.

WORDS_BLACKLISTED
“Suspicious” words typical of spam and pornography found on the site. A list of the found words is given along with their frequency.

PAGE_TOPIC
(BETA!) Link Auditor tries to guess the page topic based on a keyword model.

KEYWORDS SCORE:

At the beginning of an analysis, the user can define a set of keywords. Link Auditor searches for these keywords while evaluating onsite factors. This feature can be used both for link profile analysis and for link prospecting.

KEYWORDS_SCORE
Total keywords score for a link – based on other KEYWORDS_* parameters.

KEYWORDS_MATCH
Determines the degree of overlap between the defined keyword set and the link’s meta keywords.

KEYWORDS_IN_TITLE
Determines which of the defined keywords appear in the analyzed site’s title.

KEYWORDS_IN_DESCRIPTION
Determines which of the defined keywords appear in the description metatag of the analyzed site.

KEYWORDS_IN_HEADERS
Determines which of the defined keywords appear in the h1-h6 headers of the analyzed site.

KEYWORDS_IN_ANCHOR
Determines which of the defined keywords appear in the anchors of links leading to the user’s site.

KEYWORDS_IN_TEXT
Determines which of the defined keywords appear in the analyzed site’s content.
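
The KEYWORDS_IN_* checks can be pictured with the following Python sketch. It assumes case-insensitive whole-word matching, which may differ from what Link Auditor actually does, and the helper names are hypothetical. The scoring behind KEYWORDS_SCORE and KEYWORDS_MATCH is not described here, so it is left out.

```python
import re


def keywords_found(keywords, text):
    """Return the defined keywords that occur in text as whole words."""
    lowered = text.lower()
    found = []
    for keyword in keywords:
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(keyword)
    return found


def keyword_report(keywords, title, description, headers, text):
    """Check each page part separately; headers is a list of h1-h6 texts."""
    return {
        "KEYWORDS_IN_TITLE": keywords_found(keywords, title),
        "KEYWORDS_IN_DESCRIPTION": keywords_found(keywords, description),
        "KEYWORDS_IN_HEADERS": keywords_found(keywords, "\n".join(headers)),
        "KEYWORDS_IN_TEXT": keywords_found(keywords, text),
    }
```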

GOOGLE PARAMETERS

INDEXED
The number of indexed subpages in the link’s domain.

PAGE_INDEXED
Whether the link is indexed in Google (1 = link is in the index).

DOMAIN_PR
Value of the domain’s PageRank parameter.

PAGE_PR
Value of the link’s PageRank parameter.

CACHE_TITLE, CACHE_DATE
The program checks if Google cache is available for the link and returns its date plus title of the cached page (if found).

SAFEBROWSING
The program uses the Safe Browsing API (https://developers.google.com/safe-browsing/) to check whether the domain is on the list of sites potentially dangerous to users. Example responses:
– ok – no danger reported
– malware, phishing – the type of danger the site poses to users

MOZ.COM PARAMETERS

The program downloads the moz.com parameters available in the free-of-charge API. To download these parameters, please register at: http://moz.com/products/api/pricing and add your API credentials in the “External APIs” window.

Moz_Domain_Authority
The main domain’s Domain Authority parameter.

Moz_Domain_MozRank
The main domain’s Moz Rank parameter.

Moz_Domain_Homepage_Links
Total number of links leading to the domain’s homepage.

Moz_Page_MozRank
The Moz Rank parameter for a given link.

Moz_Page_External_Links
Number of external links to the subpage.

Moz_Page_Authority
The Authority parameter for a given link.

Moz_Page_Links
Total number of links (external and internal) pointing to a given address.

Moz_Page_Title
Title (<title>) of the site at the time of the last request made to the site by Moz’s robots.

Moz_Page_Canonical_URL
Canonical form of the address for the link.

Moz_Page_Status_Code
The HTTP status at the time of the last request made to the site by Moz’s robots.

AHREFS PARAMETERS

The program downloads Ahrefs parameters using their OpenApps API (if the user has provided access data in the “External APIs” window) or through a proxy (not advised, as it is not effective).

Ahrefs_BL
Number of links to the domain.

Ahrefs_RD
Number of domains linking to the main domain.

Ahrefs_RIP
Number of IP addresses from which links to the main domain originate.

Ahrefs_RCC
Number of IP C-classes from which links to the main domain originate.
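
The difference between the RIP and RCC counters can be shown with a small Python sketch. It assumes that a "C-class" means the /24 network an IPv4 address belongs to (its first three octets), which is the conventional meaning but is not confirmed here for Ahrefs or Majestic; `count_ips_and_c_classes` is a hypothetical helper.

```python
import ipaddress


def count_ips_and_c_classes(ips):
    """Return (unique IPv4 addresses, unique /24 networks) for linking IPs."""
    unique_ips = {ipaddress.IPv4Address(ip) for ip in ips}
    # strict=False lets ip_network() mask the host bits of each address.
    c_classes = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in unique_ips}
    return len(unique_ips), len(c_classes)
```

Many linking IPs concentrated in few C-classes would show up as a large gap between the two numbers.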

Ahrefs_Domain_Rating
The Domain Rating parameter for the main domain.

MAJESTIC SEO PARAMETERS

The program downloads Majestic SEO parameters using their API (if the user has provided access data in the “External APIs” window).

Majestic_BL
Number of links to the main domain.

Majestic_RD
Number of domains linking to the main domain.

Majestic_TF
The Trust Flow parameter of the main domain.

Majestic_CF
The Citation Flow parameter of the main domain.

Majestic_RIP
Number of IP addresses from which links to the main domain originate.

Majestic_RCC
Number of IP C-classes from which links to the main domain originate.

SOCIAL STATS

Link Auditor downloads data on the link’s/homepage’s popularity on social platforms (using Social API). For homepage stats, Link Auditor can download parameters from its own server cache (if available).

HOMEPAGE STATS:
Homepage’s popularity in social media:
FB_Likes, FB_Shares, FB_Comments, FB_Total, Google+, Twitter, StumbleUpon, Pins, LinkedIn

SUBPAGE STATS:
The program collects data on popularity of a given link in social media:
FB_Likes_subpage, FB_Shares_subpage, FB_Comments_subpage, FB_Total_subpage, Google+_subpage, Twitter_subpage, StumbleUpon_subpage, Pins_subpage, LinkedIn_subpage

ALEXA.COM PARAMETERS

ALEXA_RANK
The site’s global Alexa ranking.

ALEXA_LOCAL
The site’s local (domestic) Alexa ranking.

ALEXA_BL
Number of links to the domain that were indexed by Alexa.

WHOIS RECORDS

Link Auditor checks WHOIS records and returns domain creation/expiry dates, DNS servers, registrar name and found e-mail addresses (if available):
WHOIS_created, WHOIS_expiry, WHOIS_registrar, WHOIS_email, WHOIS_DNS

OTHER DOMAIN PARAMETERS

WEB_ARCHIVE
Date of the domain’s first indexing by web.archive.org. It determines the domain’s minimum age.

TLD
Domain of the highest level for the link (e.g. “com”, “org”).

GEO_IP
Geolocation of the main domain’s IP address (country).

IP
The main domain’s IP address.

LINKS’ RATING

ALGO_RATING
Based on the total formulas score and on additional user preferences, the program divides links into 3 groups (RATING parameter): OK (good/safe), AVG (average or slightly risky), BAD (bad, low quality or typical spam). The user preferences are:
– a selection of “critical” formulas, which cause the link to be rated as bad (BAD) regardless of the total formulas score, e.g. a link in a domain banned in Google can always be rated as bad
– a defined list of “trusted” domains, in which links are always rated as good (OK), e.g. a list of one’s own background blogs
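
The rating logic described above could be sketched as follows. This is not the program's actual algorithm: the score thresholds, the convention that a higher score is better, and the choice to check trusted domains before critical formulas are all assumptions made for the illustration, and `algo_rating` is a hypothetical function.

```python
def algo_rating(domain, formulas_score, satisfied_formulas,
                critical_formulas, trusted_domains,
                bad_below=-10, ok_from=10):
    """Return 'OK', 'AVG' or 'BAD' for a link (thresholds are hypothetical)."""
    # Assumption: a trusted domain overrides everything else.
    if domain in trusted_domains:
        return "OK"
    # Any satisfied critical formula makes the link BAD regardless of score.
    if set(critical_formulas) & set(satisfied_formulas):
        return "BAD"
    if formulas_score >= ok_from:
        return "OK"
    if formulas_score < bad_below:
        return "BAD"
    return "AVG"
```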

SITE_WIDE
At the second stage of link import, domains that include SITE-WIDE links to the analyzed site can be marked.

FORMULAS_SCORE
An assessment of the link’s quality in the form of a list of red formulas (spam) and green formulas (quality, trust) satisfied by the analyzed link. The total formulas score for the link, which is the basis for classifying it as safe or harmful, is stated as well.

API_SPAM
The API_SPAM parameter is a spam assessment of a link returned by the API.
The API considers:
– Information on onsite factors of a given domain’s various subpages gathered by the Clusteric servers
– The domain’s ranking factors
– The onsite parameters collected by the program (as long as they were selected for analysis) with particular emphasis on onsite elements related to the user’s domain
Based on the above, the AI mechanisms assess the probability of spam in the link, returning its assessment in a range from 0 (not suspicious) to 100 (almost certain spam/site of high risk).

BH_LISTS_OCCURS
Using the Clusteric API, the program can check whether and when a domain has appeared on spam lists (intended for use in link building software such as Xrumer, GSA Search Engine Ranker, Scrapebox). Example values:
– 2012-09-03 – 2014-07-14
– n/f, if there is no information on the domain’s occurrence on lists.
A domain’s occurrence on spam lists does not by itself mean that the domain is of low quality/spammy. Combining this information with additional factors (like the number of links going out of a subpage: EXTERNAL, EXT_DOMAIN) can be a good basis for conclusions on how spammy a domain or a specific subpage is.
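
A BH_LISTS_OCCURS value can be read and combined with EXT_DOMAIN along these lines. The sketch assumes the value is either "n/f" or contains exactly two ISO dates; both helper names and the threshold of 100 outgoing links are hypothetical and only illustrate the idea of combining the two signals.

```python
import re
from datetime import date
from typing import Optional, Tuple


def parse_bh_lists_occurs(value: str) -> Optional[Tuple[date, date]]:
    """Return (first seen, last seen) dates, or None for 'n/f'."""
    if value.strip().lower() == "n/f":
        return None
    found = re.findall(r"\d{4}-\d{2}-\d{2}", value)
    if len(found) != 2:
        raise ValueError(f"unexpected BH_LISTS_OCCURS value: {value!r}")
    return date.fromisoformat(found[0]), date.fromisoformat(found[1])


def worth_closer_look(occurs, ext_domain: int, ext_domain_threshold: int = 100) -> bool:
    """Flag a subpage only if it was on spam lists AND has many outgoing links."""
    return occurs is not None and ext_domain >= ext_domain_threshold
```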

DOMAIN_PROFILE_POWER
Determines how strong the domain’s link profile is (0-10 range).

DOMAIN_PROFILE_RISK
Determines how risky (manipulated/potentially unnatural) the domain’s link profile seems to be (0-10 range, 0 = very low risk, 10 = high risk).

TAGS
The user can tag/comment links using Link Auditor, which can make further analysis easier.

DATE_ADDED
The date when the link was first added to the project.

SEARCH ENGINE VISIBILITY

CLUSTERIC Search Auditor is capable of showing links’ visibility in Google and domain-level visibility data (up to 50 000 keywords per domain). This data is available with active CLUSTERIC PREMIUM access.

CLUSTERIC Search Auditor provides an estimate of monthly visits from organic search. URL-level parameters include:
SERP_VISIBLITY – estimated number of visits (monthly) from organic search
SERP_TOPIC – estimated number of visits (monthly) from organic search – by topic
SERP_SUBTOPIC – estimated number of visits (monthly) from organic search – by subtopic
SERP_KEYWORDS – estimated number of visits (monthly) from organic search – by keyword, with additional data such as SERP rankings in monitored countries, number of results and number of monthly searches.

CLUSTERIC Search Auditor is capable of providing up to 50 000 visibility rows per domain. This data includes keywords, subpages, traffic estimations and other parameters. The interpretation is as follows:
SUBPAGE – URL address of subpage visible in search results
GOOGLE_TITLE – subpage’s title as presented in Google
EST_VISITS – estimated monthly organic visits (for each row)
KEYWORD_CAT – keyword category
KEYWORD_SUBCAT – keyword subcategory
SERP_RESULTS – search results count for keyword
RANK_PL / RANK_UK / RANK_US / RANK_NL – rank in monitored country
SEARCHES_PL / SEARCHES_UK / SEARCHES_US / SEARCHES_NL – number of monthly searches (by country)
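
As an illustration of working with visibility rows, the sketch below sums EST_VISITS per keyword category. It assumes each row is available as a dictionary keyed by the column names above, which is an assumption about the export format; `visits_by_category` is a hypothetical helper.

```python
from collections import defaultdict


def visits_by_category(rows):
    """Sum estimated monthly visits per KEYWORD_CAT over visibility rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["KEYWORD_CAT"]] += row["EST_VISITS"]
    return dict(totals)
```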