python - Why is InfluxDB performance so slow?
I am storing data in InfluxDB, and it is quite confusing that Influx is 4-5 times slower than MySQL. As a test, I tried inserting 10,000 rows into MySQL and into InfluxDB; the stats are below.
MySQL:    real 6m39s       user 2.956s    sys 0.504s
InfluxDB: real 6m17.193s   user 11.860s   sys 0.328s
My code for Influx is given below; I used the same pattern to store the data in MySQL.
#!/usr/bin/env python
# coding: utf-8
import time
import csv
import sys
import datetime
import calendar
import pytz
from influxdb import client as influxdb
from datetime import datetime

host = 'localhost'
port = 8086
user = "admin"
password = "admin"
db_name = "testdatabase"
db = influxdb.InfluxDBClient(database=db_name)

def read_data():
    with open(file) as f:
        reader = f.readlines()[4:]  # skip the 4 header lines
        for line in reader:
            yield (line.strip().split(','))

fmt = '%Y-%m-%d %H:%M:%S'
file = '/home/rob/mycsvfile.csv'

csvToInflux = read_data()
body = []
for metric in csvToInflux:
    # strip the surrounding quotes from the timestamp column
    timestamp = datetime.strptime(metric[0][1:len(metric[0]) - 1], fmt)
    new_value = float(metric[1])  # computed but not used below
    body.append({
        'measurement': 'mytable1',
        'time': timestamp,
        'fields': {
            'col1': metric[1],
            'col2': metric[2],
            'col3': metric[3],
            'col4': metric[4],
            'col5': metric[5],
            'col6': metric[6],
            'col7': metric[7],
            'col8': metric[8],
            'col9': metric[9]
        }
    })
    # note: this is inside the loop, so the growing batch
    # is re-sent to InfluxDB on every iteration
    db.write_points(body)
Can anyone give me an idea of how to improve this? I think it might be due to caching: is the cache option off by default in InfluxDB? And can someone guide me on batch processing in Influx? I tried searching Google but couldn't solve the problem. I am a newbie to InfluxDB, just trying to make it faster. Any tips appreciated.
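For reference, the key change is to move the write_points call out of the loop so the whole batch is written once. A minimal sketch of that change, reusing the names from the code above (the dict comprehension for the fields and the float conversion are my assumptions, not the original code):

body = []
for metric in csvToInflux:
    timestamp = datetime.strptime(metric[0][1:len(metric[0]) - 1], fmt)
    body.append({
        'measurement': 'mytable1',
        'time': timestamp,
        # assumption: store the columns as floats rather than strings
        'fields': {'col%d' % i: float(metric[i]) for i in range(1, 10)}
    })

db.write_points(body)  # one write for the whole batch, not one per row

The timings below show the difference this makes.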
Inserting one by one into InfluxDB is slow; you should do it in batches. For example, with a CSV of 10,000 lines, inserting one by one:
with open('/tmp/blah.csv') as f:
    lines = f.readlines()

import influxdb

inf = influxdb.InfluxDBClient('localhost', 8086, 'root', 'root', 'example1')

for line in lines:
    parts = line.split(',')
    json_body = [{
        'measurement': 'one_by_one',
        'time': parts[0],
        'fields': {
            'my_value': int(parts[1].strip())
        }
    }]
    inf.write_points(json_body)  # one HTTP request per line
This gives me a result of:
└─ $ ▶ time python influx_one.py

real    1m43.655s
user    0m19.547s
sys     0m3.266s
And with a small change that inserts all the lines of the CSV in one go:
json_body = []
for line in lines:
    parts = line.split(',')
    json_body.append({
        'measurement': 'one_batch',
        'time': parts[0],
        'fields': {
            'my_value': int(parts[1].strip())
        }
    })

inf.write_points(json_body)  # a single request for all 10,000 points
The result is much better:
└─ $ ▶ time python influx_good.py

real    0m2.693s
user    0m1.797s
sys     0m0.734s
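If the file is too large to hold every point in memory at once, the same batching idea works in fixed-size chunks. A minimal sketch, reusing the inf client from above (the 10,000-point chunk size is an arbitrary choice):

def write_in_chunks(inf, points, chunk_size=10000):
    # send the points in fixed-size batches instead of one giant request
    for i in range(0, len(points), chunk_size):
        inf.write_points(points[i:i + chunk_size])

write_in_chunks(inf, json_body)

Recent versions of the influxdb-python client can also do the chunking internally via the batch_size argument to write_points, e.g. inf.write_points(json_body, batch_size=10000).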