

Pasted as Python by nrg ( 10 years ago )
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    # Each input line has the form "word<TAB>count"
    word, count = line.split('\t', 1)
    count = int(count)

    # Hadoop sorts the output of mapper.py, so the data arrives here in sorted order
    if current_word == word:
        current_count += count
    else:
        # The word has changed, so print the result of the previous count
        if current_word:
            print('%s\t%s' % (current_word, current_count))

        current_word = word
        current_count = count

# After the loop, print the count for the last word (skipped if there was no input)
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))
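
The reducer above relies on its input arriving sorted by word. As a rough illustration of the same idea, here is a minimal sketch that does the aggregation with itertools.groupby from the standard library. The function name reduce_counts and the sample data are hypothetical and not part of the original paste; the sketch assumes every non-blank line is well-formed "word<TAB>count".

```python
from itertools import groupby


def reduce_counts(lines):
    """Yield (word, total) pairs from 'word<TAB>count' lines sorted by word."""
    # Split each non-blank line into [word, count]
    pairs = (line.strip().split('\t', 1) for line in lines if line.strip())
    # groupby only merges *consecutive* equal keys, hence the need for sorted input
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


# Hypothetical sample of already-sorted mapper output
sample = ['a\t1\n', 'a\t1\n', 'b\t1\n', 'c\t2\n', 'c\t1\n']
for word, total in reduce_counts(sample):
    print('%s\t%s' % (word, total))
```

If the input is not sorted, the same word shows up in several separate groups and is printed more than once. This is why the sort step between mapper and reducer matters.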

