Post about Iterable and Iterator


Environment and Prerequisite

  • Python


Definition of Iterable and Iterator

Iterable

  • An object capable of returning its members one at a time (this is the definition used in the Python documentation).
  • “Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.”
  • When an iterable is passed to the built-in function iter(), iter() returns an iterator for that object.
>>> a = [1,2,3]
>>> type(a)
<class 'list'>
>>> a_iterator = iter(a)
>>> type(a_iterator)
<class 'list_iterator'>

Iterator

  • An object representing a stream of data (this is the definition used in the Python documentation).
  • Repeated calls to the iterator’s __next__() method return successive items in the stream. It raises a StopIteration exception when no more data is available.
  • An iterator also has an __iter__() method that returns the iterator itself, so every iterator is also an iterable.
>>> a = [1,2,3]
>>> type(a)
<class 'list'>
>>> a_iterator = iter(a)
>>> type(a_iterator)
<class 'list_iterator'>
>>> dir(a_iterator)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']
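
The behavior described above can be checked with a minimal sketch like the following. The variable names are only illustrative; next() is the built-in function that calls the iterator's __next__() method.

```python
numbers = [1, 2, 3]
it = iter(numbers)

# Each call to next() asks the iterator for its next item.
first = next(it)
second = next(it)
third = next(it)

# Once the stream is exhausted, next() raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# An iterator's __iter__() returns the iterator itself.
same_object = iter(it) is it

print(first, second, third, exhausted, same_object)
```

A for loop performs the same steps internally: it calls iter() once and then keeps requesting items until StopIteration is raised.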


Iterator Types

Iterator Protocol

  • Python supports the concept of iteration over containers. It is implemented with the two methods below (iterator.__iter__() and iterator.__next__()), and the same methods allow user-defined classes to support iteration.
  • The documentation describes this in terms of containers, but it applies to user-defined classes as well.
  • The iterator objects themselves are required to support the following two methods, which together form the iterator protocol.

iterator.__iter__()

  • Return the iterator object itself.

iterator.__next__()

  • Return the next item from the object. If there are no further items, raise the StopIteration exception.
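
As a rough sketch of the two methods above, a user-defined iterator could look like this. The class name Countdown and its start parameter are made up for illustration and do not come from the Python documentation.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal the end of the stream with StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
result = list(countdown)     # consumes the iterator
leftover = list(countdown)   # nothing is left the second time

print(result, leftover)
```

Because the iterator keeps its own position, it can only be consumed once; the second list() call gets an empty result.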


Iteration in container

  • One method needs to be defined for container objects to provide iteration support.
  • The iterator that this method returns can be implemented with the Iterator Protocol above. (Writer’s opinion)

container.__iter__()

  • Return an iterator object. This object is required to support the Iterator Protocol described above.
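
One possible way to sketch this is a container whose __iter__() returns a separate iterator object. NumberBox and NumberBoxIterator are hypothetical names chosen for this example.

```python
class NumberBoxIterator:
    """Hypothetical iterator that walks over the items of a NumberBox."""

    def __init__(self, items):
        self._items = items
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        value = self._items[self._index]
        self._index += 1
        return value


class NumberBox:
    """Hypothetical container that supports iteration via __iter__()."""

    def __init__(self, *items):
        self._items = list(items)

    def __iter__(self):
        # Hand out a fresh iterator on every call.
        return NumberBoxIterator(self._items)


box = NumberBox(10, 20, 30)
first_pass = list(box)
second_pass = list(box)

print(first_pass, second_pass)
```

Since the container creates a new iterator each time, it can be looped over repeatedly, unlike a single iterator, which is used up after one pass.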


Additional Questions

Relation between Iterable and Iterator

  • As written above, an Iterable is any object capable of returning its members one at a time, and one way to implement that is with an Iterator.
  • To make a class Iterable, implement an __iter__() method on it that returns an Iterator. That Iterator should implement an __iter__() method that returns itself and a __next__() method that returns its next member, following the Iterator Protocol described in Iterator Types.


Difference between __iter__() and __getitem__()
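
A small sketch of one difference, based on the documentation quote in the Iterable section: a class that defines only __getitem__() with sequence semantics (integer indexes starting at 0, IndexError at the end) can still be iterated, because iter() falls back to calling __getitem__(). The class name Squares is hypothetical.

```python
class Squares:
    """Hypothetical sequence-like class that defines no __iter__() method."""

    def __init__(self, size):
        self._size = size

    def __getitem__(self, index):
        # Sequence semantics: indexes 0, 1, 2, ... then IndexError.
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index * index


squares = Squares(4)
has_iter = hasattr(squares, "__iter__")

# iter() builds an iterator that calls __getitem__(0), __getitem__(1), ...
values = list(iter(squares))

print(has_iter, values)
```

With __iter__(), the class decides which iterator to return; with __getitem__() alone, Python creates the iterator itself and stops when IndexError is raised.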


Summary and Conclusion

At first I was just curious about Iterable and Iterator, but they turned out to be a little more difficult than I expected once I started looking into them.

In short, an Iterable is an object capable of returning its members one at a time, such as a list, str, dict, file object, or an object of any class you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics. Put simply, any object that can return its members one at a time can be called an Iterable.

An Iterator is an object representing a stream of data. Following the Iterator Protocol, it is implemented with two methods (iterator.__iter__() and iterator.__next__()). An Iterator is also an Iterable because it has an __iter__() method. We can make an object Iterable by having its __iter__() method return such an Iterator.

A list returns an Iterator when its __iter__() method is called or when it is passed to the built-in function iter(). That Iterator implements both methods (iterator.__iter__() and iterator.__next__()).

a = [1, 2, 3]

print(type(a))
print(type(a.__iter__()))
print(type(iter(a)))

print()

print(dir(a))
print(dir(a.__iter__()))
print(dir(iter(a)))
<class 'list'>
<class 'list_iterator'>
<class 'list_iterator'>

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']


Reference

Summary of how to get environment variables


Environment

  • Python 3.X or higher


Get Environment Variable

Use os Module

Basic Form

  • Returns the value of the environment variable key if it exists.
  • Returns default if the environment variable key does not exist.
  • Returns None if no default is given and the environment variable key does not exist.
  • The value of an environment variable is always returned as str.
os.getenv(key, default=None)

Example

  • Set environment variables
export TEST="TEST env value"
export NUM=123
  • Get an environment variable
>>> import os
>>> os.getenv("TEST", "TEST env default value")
'TEST env value'
  • Check the return type
>>> import os
>>> type(os.getenv("NUM", "NUM env default value"))
<class 'str'>
  • Get default when key does not exist
>>> import os
>>> os.getenv("default", "TEST env default value")
'TEST env default value'
>>> type(os.getenv("default", "TEST env default value"))
<class 'str'>
>>> import os
>>> os.getenv("default", 12345)
12345
>>> type(os.getenv("default", 12345))
<class 'int'>
  • Returns None if key does not exist and no default is given
>>> import os
>>> type(os.getenv("NONE TEST"))
<class 'NoneType'>


Reference

Post about getting environment variables


Environment and Prerequisite

  • Python 3.X or higher


Get Environment Variable

Use os Module

Basic Form

  • Returns the value of the environment variable key if it exists.
  • If key does not exist, returns the default value.
  • If default is not set and the environment variable key does not exist, returns None.
  • The value of an existing environment variable is always returned as str; a default is returned as it is, whatever its type.
os.getenv(key, default=None)

Example

  • Set environment variable
export TEST="TEST env value"
export NUM=123
  • Get environment variable
>>> import os
>>> os.getenv("TEST", "TEST env default value")
'TEST env value'
  • Check return type
>>> import os
>>> type(os.getenv("NUM", "NUM env default value"))
<class 'str'>
  • Get default if key does not exist
>>> import os
>>> os.getenv("default", "TEST env default value")
'TEST env default value'
>>> type(os.getenv("default", "TEST env default value"))
<class 'str'>
>>> import os
>>> os.getenv("default", 12345)
12345
>>> type(os.getenv("default", 12345))
<class 'int'>
  • Return None if key does not exist and no default is given
>>> import os
>>> type(os.getenv("NONE TEST"))
<class 'NoneType'>
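
Because an existing environment variable always comes back as str, a value such as NUM has to be converted before it can be used as a number. The helper below is a minimal sketch of one way to do that; get_int_env is a hypothetical name, not part of the os module, and the variables are set through os.environ only so the example runs on its own.

```python
import os


def get_int_env(key, default=None):
    """Hypothetical helper: read an environment variable as int."""
    raw = os.getenv(key)
    if raw is None:
        # The variable is not set, so fall back to default as it is.
        return default
    # int() raises ValueError if the value is not a valid integer.
    return int(raw)


# Set up the environment for this example only.
os.environ["NUM"] = "123"
os.environ.pop("NONE_TEST", None)

num = get_int_env("NUM")
missing = get_int_env("NONE_TEST", 0)

print(num, type(num), missing)
```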


Reference

Post about the difference between crawling and indexing, two terms used in the context of search engines


Environment


Crawling and Indexing

Crawling

  • Crawling: The work of a crawler or bot going around the web to find web pages and content.
  • Each search engine company has its own crawling bot, which it uses to crawl web pages.
  • Crawling can be blocked with a robots.txt file in the site root.


Indexing

  • Indexing: The work of reading the web pages and content that a crawler or bot has found, then structuring that information and storing it in the search engine.
  • Each search engine structures the web pages and content it discovers and indexes them in its own way.
  • Indexing can be blocked by putting a <meta name="robots" content="noindex"> tag inside the <head></head> tag of the page source.
<head>
<meta charset="utf-8">
...
<meta name="robots" content="noindex">
...
</head>


Caution

Issue

  • Even if the <meta name="robots" content="noindex"> tag has been added, the noindex tag may not take effect when access is blocked by robots.txt, because the page itself cannot be checked.

Solution


Reference

Post about the difference between crawling and indexing in search engines


Environment and Prerequisite

  • Web


Crawling and Indexing

Crawling

  • Crawling: Finding web pages and content on the web using crawlers or bots.
  • Each search engine company has its own crawling bot, which it uses to crawl web pages.
  • Crawling can be prevented with a robots.txt file in the site root.
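
To illustrate how a robots.txt rule is read by a crawler, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the bot name ExampleBot, and the example.com URLs are assumptions made up for this example; real bots may interpret rules with their own extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks everything under /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the lines of a robots.txt file, so no network access is needed.
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")

print(allowed, blocked)
```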


Indexing

  • Indexing: Reading the content of discovered web pages and storing it in the search engine in a structured format.
  • Each search engine structures the web pages and content it discovers and indexes them in its own way.
  • Indexing can be prevented by putting a <meta name="robots" content="noindex"> tag inside the <head></head> tag.
<head>
<meta charset="utf-8">
...
<meta name="robots" content="noindex">
...
</head>
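
As a rough sketch of how a tool could detect this tag, the following uses Python's standard html.parser module. RobotsMetaParser and has_noindex are hypothetical names for this example, and the check is simplified compared with what an actual search engine does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical parser that looks for <meta name="robots" content="noindex">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content may hold several comma-separated directives.
        content = attributes.get("content") or ""
        directives = [item.strip().lower() for item in content.split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


page = '<head><meta charset="utf-8"><meta name="robots" content="noindex"></head>'
plain_page = '<head><meta charset="utf-8"></head>'

print(has_noindex(page), has_noindex(plain_page))
```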


Caution

Issue

  • Even if a page has the <meta name="robots" content="noindex"> tag, the noindex tag may not take effect when the page is blocked by robots.txt, because the crawler cannot read the page to see the tag.

Solution


Reference