[Search Engine](EN) Difference between crawling and indexing
A post about the difference between crawling and indexing in search engines.
Environment and Prerequisite
- Web
Crawling and Indexing
Crawling
Crawling: Finding web pages or other content on the web using crawlers (bots).
- Each search engine company has its own crawler bot that crawls web pages.
- Crawling can be prevented with a robots.txt file in the site root.
  - Example: https://www.google.com/robots.txt
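As a minimal sketch of how a crawler might consult robots.txt before fetching a page, the snippet below uses Python's standard urllib.robotparser module. The rules, the bot name ExampleBot, and the example.com URLs are assumptions made up for illustration; a real crawler would download the file from the site root instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a real download.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A path outside /private/ is allowed, a path inside it is not.
    print(can_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))
    print(can_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.html"))
```

Note that robots.txt is only a request: it works for crawlers that choose to honor it.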
Indexing
Indexing: Reading the content of a discovered web page or other content and saving it in the search engine in a well-organized format.
- Each search engine company indexes discovered web pages or content in a well-organized format.
- Indexing can be prevented by adding a <meta name="robots" content="noindex"> tag inside the <head></head> tag.
<head>
<meta charset="utf-8">
...
<meta name="robots" content="noindex">
...
</head>
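To check whether a page actually carries the tag shown above, an indexer-like tool could parse the HTML and look at its robots meta tags. The following is a rough sketch using Python's standard html.parser module; the class name RobotsMetaParser, the function has_noindex, and the sample page are hypothetical names chosen for this example, not part of any search engine's real pipeline.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        content = attr.get("content") or ""
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample page for demonstration.
SAMPLE_PAGE = """
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
</head>
"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```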
Caution
Issue
- Even if a page has a <meta name="robots" content="noindex"> tag, the noindex tag may not work if the page is blocked in robots.txt, because the crawler cannot fetch the page to see the tag.
Solution
- If a page has a <meta name="robots" content="noindex"> tag, remove the rule that blocks it from the robots.txt file so that the crawler can read the page and see the <meta name="robots" content="noindex"> tag.
- Related Link: https://developers.google.com/search/docs/advanced/crawling/block-indexing?hl=ko
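The conflict described in this section can be detected mechanically. The sketch below, which assumes the robots.txt rules and the page HTML are already available as strings, flags pages that declare noindex but are also disallowed in robots.txt. The names NoindexFinder and noindex_hidden_by_robots, the bot name ExampleBot, and the example.com URL are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class NoindexFinder(HTMLParser):
    """Sets found to True when a <meta name="robots"> tag lists noindex."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.found = True


def noindex_hidden_by_robots(robots_txt: str, user_agent: str,
                             url: str, html: str) -> bool:
    """Return True when the page declares noindex but robots.txt blocks
    the crawler from fetching it, so the tag may never be read."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    finder = NoindexFinder()
    finder.feed(html)
    return finder.found and not crawlable


if __name__ == "__main__":
    # Hypothetical inputs: the page asks for noindex, but is also disallowed.
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<head><meta name="robots" content="noindex"></head>'
    print(noindex_hidden_by_robots(
        robots, "ExampleBot", "https://example.com/private/a.html", page))
```

A True result means the blocking rule should be removed from robots.txt so that the noindex tag can take effect.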
Reference
- https://www.google.com/intl/en/search/howsearchworks/crawling-indexing/
- https://en.wikipedia.org/wiki/Search_engine_indexing
- https://www.geeksforgeeks.org/difference-between-crawling-and-indexing-in-search-engine-optimization-seo/
- https://developers.google.com/search/docs/advanced/crawling/block-indexing?hl=ko