python - Why is InfluxDB performance so slow?
I am storing data in InfluxDB, and it is quite confusing that Influx is 4-5 times slower than MySQL. As a test, I tried inserting 10,000 rows into MySQL and into InfluxDB; the stats are below.
MySQL:    real 6m39s       user 2.956s    sys 0.504s
InfluxDB: real 6m17.193s   user 11.860s   sys 0.328s
My code for Influx is given below; I used the same pattern to store the data in MySQL.
#!/usr/bin/env python
# coding: utf-8
import time
import csv
import sys
import datetime
import calendar
import pytz
from influxdb import client as influxdb
from datetime import datetime

host = 'localhost'
port = 8086
user = "admin"
password = "admin"
db_name = "testdatabase"
db = influxdb.InfluxDBClient(database=db_name)

def read_data():
    with open(file) as f:
        reader = f.readlines()[4:]  # skip the 4 header lines
        for line in reader:
            yield (line.strip().split(','))

fmt = '%Y-%m-%d %H:%M:%S'
file = '/home/rob/mycsvfile.csv'

csvToInflux = read_data()
body = []
for metric in csvToInflux:
    # strip the surrounding quotes from the timestamp column
    timestamp = datetime.strptime(metric[0][1:len(metric[0]) - 1], fmt)
    new_value = float(metric[1])  # computed but not used below
    body.append({
        'measurement': 'mytable1',
        'time': timestamp,
        'fields': {
            'col1': metric[1],
            'col2': metric[2],
            'col3': metric[3],
            'col4': metric[4],
            'col5': metric[5],
            'col6': metric[6],
            'col7': metric[7],
            'col8': metric[8],
            'col9': metric[9]
        }
    })
    # note: this is inside the loop, so the growing batch
    # is re-sent to InfluxDB on every iteration
    db.write_points(body)
Can anyone give me an idea of how to improve this? I think it might be due to caching: is the cache option off by default in InfluxDB? And can someone guide me on batch processing in Influx? I tried searching Google but couldn't solve the problem. I am a newbie to InfluxDB, just trying to make it faster. Any tips appreciated.
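For reference, the key change is to move the write_points call out of the loop so the whole batch is written once. A minimal sketch of that change, reusing the names from the code above (the dict comprehension for the fields and the float conversion are my assumptions, not the original code):

body = []
for metric in csvToInflux:
    timestamp = datetime.strptime(metric[0][1:len(metric[0]) - 1], fmt)
    body.append({
        'measurement': 'mytable1',
        'time': timestamp,
        # assumption: store the columns as floats rather than strings
        'fields': {'col%d' % i: float(metric[i]) for i in range(1, 10)}
    })

db.write_points(body)  # one write for the whole batch, not one per row

The timings below show the difference this makes.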
Inserting one by one into InfluxDB is slow; you should do it in batches. For example, with a CSV of 10,000 lines, inserting one by one:
with open('/tmp/blah.csv') as f:
    lines = f.readlines()

import influxdb

inf = influxdb.InfluxDBClient('localhost', 8086, 'root', 'root', 'example1')

for line in lines:
    parts = line.split(',')
    json_body = [{
        'measurement': 'one_by_one',
        'time': parts[0],
        'fields': {
            'my_value': int(parts[1].strip())
        }
    }]
    inf.write_points(json_body)  # one HTTP request per line
This gives me a result of:
└─ $ ▶ time python influx_one.py

real    1m43.655s
user    0m19.547s
sys     0m3.266s
And with a small change that inserts all the lines of the CSV in one go:
json_body = []
for line in lines:
    parts = line.split(',')
    json_body.append({
        'measurement': 'one_batch',
        'time': parts[0],
        'fields': {
            'my_value': int(parts[1].strip())
        }
    })

inf.write_points(json_body)  # a single request for all 10,000 points
The result is much better:
└─ $ ▶ time python influx_good.py

real    0m2.693s
user    0m1.797s
sys     0m0.734s
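If the file is too large to hold every point in memory at once, the same batching idea works in fixed-size chunks. A minimal sketch, reusing the inf client from above (the 10,000-point chunk size is an arbitrary choice):

def write_in_chunks(inf, points, chunk_size=10000):
    # send the points in fixed-size batches instead of one giant request
    for i in range(0, len(points), chunk_size):
        inf.write_points(points[i:i + chunk_size])

write_in_chunks(inf, json_body)

Recent versions of the influxdb-python client can also do the chunking internally via the batch_size argument to write_points, e.g. inf.write_points(json_body, batch_size=10000).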