Reading Data From a Txt and Storing Numbers to an Array
11. Reading and Writing Data Files: ndarrays
By Bernd Klein. Last modified: 01 February 2022.
There are many ways to read from and write to data files in NumPy. This chapter discusses the different ways and the corresponding functions:
- savetxt
- loadtxt
- tofile
- fromfile
- save
- load
- genfromtxt
Saving textfiles with savetxt
The first two functions we will cover are savetxt and loadtxt.
In the following simple example, we define an array x and save it as a text file with savetxt:
    import numpy as np

    x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], np.int32)
    np.savetxt("test.txt", x)

The file "test.txt" is a text file and its content looks like this:

    ~/Dropbox/notebooks/numpy$ more test.txt
    1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
    4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
    7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

Attention: The above output was created on the Linux command prompt!
It's also possible to write the array in a special format, for instance with three decimal places, or as integers padded to a fixed width if the number has fewer than four digits (the example below pads with leading zeros). For this purpose we assign a format string to the third parameter 'fmt'. We saw in our first example that the default delimiter is a blank. We can change this behaviour by assigning a string to the parameter "delimiter". In most cases this string will consist of a single character, but it can also be a sequence of characters, like a smiley " :-) ":
    np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
    np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

The newly created files look like this:

    ~/Dropbox/notebooks/numpy$ more test2.txt
    1.000,2.000,3.000
    4.000,5.000,6.000
    7.000,8.000,9.000
    ~/Dropbox/notebooks/numpy$ more test3.txt
    0001 :-) 0002 :-) 0003
    0004 :-) 0005 :-) 0006
    0007 :-) 0008 :-) 0009

The complete syntax of savetxt looks like this:
savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')
| Parameter | Meaning |
|---|---|
| X | array_like Data to be saved to a text file. |
| fmt | str or sequence of strs, optional. A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case 'delimiter' is ignored. For complex 'X', the legal options for 'fmt' are: a) a single specifier, "fmt='%.4e'", resulting in numbers formatted like "' (%s+%sj)' % (fmt, fmt)" b) a full string specifying every real and imaginary part, e.g. "' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej'" for 3 columns c) a list of specifiers, one per column; in this case, the real and imaginary part must have separate specifiers, e.g. "['%.3e + %.3ej', '(%.15e%+.15ej)']" for 2 columns |
| delimiter | A string used for separating the columns. |
| newline | A string (e.g. "\n", "\r\n" or ",\n") which will terminate a line instead of the default line ending |
| header | A string that will be written at the beginning of the file. |
| footer | A string that will be written at the end of the file. |
| comments | A string that will be prepended to the 'header' and 'footer' strings, to mark them as comments. The hash sign '#' is used as the default. |
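
The header, footer and comments parameters from the table above can be tried out with a short sketch. The file name, the column labels and the use of a temporary directory are assumptions made for illustration only; they are not part of the original example.

```python
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmpdir:
    # "with_header.txt" is a hypothetical file name used only in this sketch
    path = os.path.join(tmpdir, "with_header.txt")
    np.savetxt(
        path,
        data,
        fmt="%d",
        delimiter=";",
        header="a;b;c",        # written before the data
        footer="end of data",  # written after the data
        comments="# ",         # prefix for the header and footer lines
    )
    with open(path) as fh:
        written_lines = fh.read().splitlines()
    # loadtxt ignores lines that start with '#' by default
    reloaded = np.loadtxt(path, delimiter=";")

print(written_lines)
print(reloaded)
```

Because the header and footer are marked as comments, loadtxt can read the file back without any extra arguments for skipping them.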
Loading Textfiles with loadtxt
We will now read in the file "test.txt", which we wrote in the previous subchapter:
    y = np.loadtxt("test.txt")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

    y = np.loadtxt("test2.txt", delimiter=",")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

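Nothing new happens if we read in the file in which we used a smiley as the separator (NumPy 1.23 and later accept only single-character delimiters in loadtxt, so this call needs an older version):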
    y = np.loadtxt("test3.txt", delimiter=" :-) ")
    print(y)

OUTPUT:

    [[ 1.  2.  3.]
     [ 4.  5.  6.]
     [ 7.  8.  9.]]

It's also possible to choose the columns by index:
    y = np.loadtxt("test3.txt", delimiter=" :-) ", usecols=(0, 2))
    print(y)

OUTPUT:

    [[ 1.  3.]
     [ 4.  6.]
     [ 7.  9.]]

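
Column selection combines well with two other loadtxt parameters, skiprows and unpack. The following is a minimal sketch that reads from an in-memory text buffer instead of a file; the sample data and its header line are assumptions made for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data with one header line, held in memory
text = StringIO("x,y,z\n1,2,3\n4,5,6\n7,8,9\n")

# skiprows=1 skips the header line, usecols picks columns 0 and 2,
# unpack=True returns one array per selected column
first, last = np.loadtxt(
    text, delimiter=",", skiprows=1, usecols=(0, 2), unpack=True
)

print(first)
print(last)
```

With unpack=True the result is transposed, so each selected column can be assigned to its own variable.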
In our next example we will read in the file "times_and_temperatures.txt", which we created in the chapter on generators of our Python tutorial. Every line contains a time in the format "hh:mm:ss" and a random temperature between 10.0 and 25.0 degrees. We have to convert the time string into a float. The time will be given in minutes, with the seconds expressed as a decimal fraction of a minute. We first define a function which converts "hh:mm:ss" into minutes:
    def time2float_minutes(time):
        # loadtxt may pass the value as a byte string
        if isinstance(time, bytes):
            time = time.decode()
        t = time.split(":")
        # 0.05 / 3 equals 1 / 60, i.e. seconds are converted to minutes
        minutes = float(t[0]) * 60 + float(t[1]) + float(t[2]) * 0.05 / 3
        return minutes

    for t in ["06:00:10", "06:27:45", "12:59:59"]:
        print(time2float_minutes(t))

OUTPUT:

    360.1666666666667
    387.75
    779.9833333333333

You might have noticed that we check whether time is of type bytes. The reason for this is the use of our function time2float_minutes in loadtxt in the following example. The keyword parameter converters takes a dictionary which can hold a function for a column (the key of the dictionary is the index of the column) to convert the string data of this column into a float. The string data can arrive as a byte string. That is why we had to convert it into a Unicode string in our function:
    y = np.loadtxt("times_and_temperatures.txt",
                   converters={0: time2float_minutes})
    print(y)

OUTPUT:

    [[  360.     20.1]
     [  361.5    16.1]
     [  363.     16.9]
     ...,
     [ 1375.5    22.5]
     [ 1377.     11.1]
     [ 1378.5    15.2]]

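
Since "times_and_temperatures.txt" comes from another chapter, the converter mechanism can also be tried out without that file. The sketch below feeds loadtxt three lines from an in-memory buffer; the sample lines are an assumption about the file layout (a time, whitespace, a temperature), and the converter is a slightly condensed variant of the function above.

```python
from io import StringIO

import numpy as np


def time2float_minutes(time):
    # Depending on the NumPy version, the converter receives bytes or str
    if isinstance(time, bytes):
        time = time.decode()
    hours, minutes, seconds = (float(part) for part in time.split(":"))
    return hours * 60 + minutes + seconds / 60


# Assumed layout of the data: "hh:mm:ss temperature"
sample = StringIO("06:00:00 20.1\n06:01:30 16.1\n06:03:00 16.9\n")

# Column 0 is passed through the converter, column 1 is parsed as a float
y = np.loadtxt(sample, converters={0: time2float_minutes})
print(y)
```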
A different delimiter can be selected with the delimiter parameter, e.g. delimiter=";" to use ";" instead of whitespace.
tofile
tofile is a method that writes the content of an array to a file, either in binary format (the default) or as text.
A.tofile(fid, sep="", format="%s")
The data of the ndarray A is always written in 'C' order, regardless of the order of A.
The data file written by this method can be reloaded with the function fromfile().
| Parameter | Meaning |
|---|---|
| fid | can be either an open file object, or a string containing a filename. |
| sep | The string 'sep' defines the separator between array items for text output. If it is empty (''), a binary file is written, equivalent to file.write(a.tobytes()). |
| format | Format string for text file output. Each entry in the array is formatted to text by first converting it to the closest Python type, and then using 'format' % item. |
Remark:
Information on endianness and precision is lost. Therefore it may not be a good idea to use the function to archive data or to transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.
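
The text mode mentioned in the remark can be sketched as follows. The file name and the temporary directory are assumptions made for illustration; the calls themselves only use the sep and format parameters described in the table above.

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmpdir:
    # "as_text.txt" is a hypothetical file name used only in this sketch
    path = os.path.join(tmpdir, "as_text.txt")
    # A non-empty sep switches tofile from binary to text output
    a.tofile(path, sep=",", format="%d")
    with open(path) as fh:
        text = fh.read()
    # The same separator has to be given when reading the text back
    back = np.fromfile(path, dtype=np.int32, sep=",")

print(text)
print(back)
```

Note that the items are written in one flat sequence, so the two-dimensional shape of the array is not stored in the file.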
    dt = np.dtype([('time', [('min', int), ('sec', int)]), ('temp', float)])
    x = np.zeros((1,), dtype=dt)
    x['time']['min'] = 10
    x['temp'] = 98.25
    print(x)
    # the with statement makes sure the file is closed after writing
    with open("test6.txt", "bw") as fh:
        x.tofile(fh)

OUTPUT:

    [((10, 0), 98.25)]

fromfile
fromfile reads in data which has been written with the tofile function. It's possible to read binary data if the data type is known. It's also possible to parse simply formatted text files. The data from the file is turned into an array.
The general syntax looks like this:
numpy.fromfile(file, dtype=float, count=-1, sep='')
| Parameter | Meaning |
|---|---|
| file | 'file' can be either a file object or the name of the file to read. |
| dtype | defines the data type of the array, which will be constructed from the file data. For binary files, it is used to determine the size and byte-order of the items in the file. |
| count | defines the number of items, which will be read. -1 means all items will be read. |
| sep | The string 'sep' defines the separator between the items, if the file is a text file. If it is empty (''), the file will be treated as a binary file. A space (" ") in a separator matches zero or more whitespace characters. A separator consisting solely of spaces has to match at least one whitespace. |
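
The effect of dtype and count on a binary file can be illustrated with a small sketch. The file name and the temporary directory are assumptions made for illustration; the dtype strings "<i4" and "<i8" denote little-endian 32-bit and 64-bit integers.

```python
import os
import tempfile

import numpy as np

a = np.arange(10, dtype="<i4")  # ten little-endian 32-bit integers

with tempfile.TemporaryDirectory() as tmpdir:
    # "numbers.bin" is a hypothetical file name used only in this sketch
    path = os.path.join(tmpdir, "numbers.bin")
    a.tofile(path)

    # count limits the number of items that are read
    first_three = np.fromfile(path, dtype="<i4", count=3)
    # count=-1 (the default) reads all items
    everything = np.fromfile(path, dtype="<i4")
    # The file stores no type information: a wrong dtype is not detected,
    # the 40 bytes are simply reinterpreted as five 64-bit integers
    misread = np.fromfile(path, dtype="<i8")

print(first_three)
print(everything)
print(misread)
```

The last call shows why the data type has to be known in advance: fromfile cannot tell that the bytes were written as 32-bit integers.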
    # test4.txt contains int32 values (it is written in the next example),
    # so reading it with the structured dtype dt reinterprets the raw bytes
    fh = open("test4.txt", "rb")
    np.fromfile(fh, dtype=dt)

OUTPUT:

    array([((4294967296, 12884901890), 1.0609978957e-313),
           ((30064771078, 38654705672), 2.33419537056e-313),
           ((55834574860, 64424509454), 3.60739284543e-313),
           ((81604378642, 90194313236), 4.8805903203e-313),
           ((107374182424, 115964117018), 6.1537877952e-313),
           ((133143986206, 141733920800), 7.42698527006e-313),
           ((158913789988, 167503724582), 8.70018274493e-313),
           ((184683593770, 193273528364), 9.9733802198e-313)],
          dtype=[('time', [('min', '<i8'), ('sec', '<i8')]), ('temp', '<f8')])

    import numpy as np
    import os

    # platform dependent: difference between Linux and Windows
    #data = np.arange(50, dtype=np.int)
    data = np.arange(50, dtype=np.int32)
    data.tofile("test4.txt")
    fh = open("test4.txt", "rb")
    # skip the first 32 items: 4 bytes * 32 = 128 bytes
    fh.seek(128, os.SEEK_SET)
    x = np.fromfile(fh, dtype=np.int32)
    print(x)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]

Attention:
It can cause problems to use tofile and fromfile for data storage, because the binary files generated are not platform independent. There is no byte-order or data-type information saved by tofile. Data can be stored in the platform independent .npy format using save and load instead.
Best Practice to Load and Save Data
The recommended way to store and load data with NumPy in Python is to use load and save. We also use a temporary file in the following example:
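
What "no data-type information" means in practice can be seen in a minimal sketch. The file name and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

m = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    # "raw.bin" is a hypothetical file name used only in this sketch
    path = os.path.join(tmpdir, "raw.bin")
    m.tofile(path)
    # The file holds nothing but the raw items: 6 items * 4 bytes each
    raw_size = os.path.getsize(path)
    # dtype has to be supplied by the reader, and the result is flat
    flat = np.fromfile(path, dtype=np.float32)

# The original shape has to be restored by hand
restored = flat.reshape(2, 3)

print(raw_size)
print(flat.shape)
print(restored)
```

Both the data type and the shape have to be known by whoever reads the file, which is exactly the bookkeeping that the .npy format takes over.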
    import numpy as np
    from tempfile import TemporaryFile

    print(x)
    outfile = TemporaryFile()
    x = np.arange(10)
    np.save(outfile, x)
    outfile.seek(0)  # Only needed here to simulate closing & reopening file
    np.load(outfile)

OUTPUT:

    [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

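
The same round trip also works with a named file. The sketch below is an illustration under assumed names (the file name "matrix" and the temporary directory are not part of the original example); it shows that save appends the extension ".npy" to a file name that lacks it, and that load restores both shape and data type.

```python
import os
import tempfile

import numpy as np

m = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    # "matrix" is a hypothetical file name; np.save appends ".npy" to it
    path = os.path.join(tmpdir, "matrix")
    np.save(path, m)
    files = os.listdir(tmpdir)
    loaded = np.load(path + ".npy")

print(files)
print(loaded.dtype, loaded.shape)
print(loaded)
```

No dtype or shape has to be passed to load, because the .npy file stores this information in its header.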
and yet another way: genfromtxt
There is yet another way to read tabular input from a file to create arrays: genfromtxt. As the name implies, the input file is supposed to be a text file. The text file can also be compressed: genfromtxt can process the archive formats gzip and bzip2. The type of the archive is determined by the extension of the file, i.e. '.gz' for gzip and '.bz2' for bzip2.
genfromtxt is slower than loadtxt, but it is capable of coping with missing data. It processes the file data in two passes. First it converts the lines of the file into strings. Then it converts the strings into the requested data type. loadtxt, on the other hand, works in one go, which is why it is faster.
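
How genfromtxt copes with missing data can be sketched with an in-memory buffer. The sample data is an assumption made for illustration; filling_values is the genfromtxt parameter that sets the replacement for missing entries.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data: the second and third rows have an empty field
csv_text = "1,2,3\n4,,6\n7,8,\n"

# By default, missing float entries become nan
with_nan = np.genfromtxt(StringIO(csv_text), delimiter=",")

# filling_values replaces missing entries with a chosen value instead
filled = np.genfromtxt(StringIO(csv_text), delimiter=",", filling_values=-1)

print(with_nan)
print(filled)
```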
recfromcsv(fname, **kwargs)
This is not really another way to read in CSV data. 'recfromcsv' is basically a shortcut for
np.genfromtxt(filename, delimiter=",", dtype=None)
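
A minimal sketch of this genfromtxt call, applied to an in-memory CSV text, might look as follows. The sample data is an assumption made for illustration, and names=True and encoding="utf-8" are additions to the call shown above: the first takes the field names from the header line, the second makes string columns come back as Python strings.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV data with a header line
csv_text = StringIO("city,temp\nBerlin,20.1\nParis,16.9\n")

# dtype=None lets genfromtxt determine the type of each column,
# names=True takes the field names from the first line
records = np.genfromtxt(
    csv_text, delimiter=",", dtype=None, names=True, encoding="utf-8"
)

print(records.dtype.names)
print(records["city"])
print(records["temp"])
```

The result is a structured array, so each column can be accessed by its field name and keeps its own data type.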
Source: https://python-course.eu/numerical-programming/reading-and-writing-data-files-ndarrays.php