How do I append one string to another in Python?
I want an efficient way to append one string to another in Python.
var1 = "foo"
var2 = "bar"
var3 = var1 + var2
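Is there a good built-in method to use?
If you only have one reference to a string and you concatenate another string to the end, CPython now special-cases this and tries to extend the string in place.
The end result is that the operation is amortized O(n).
For example: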
s = ""
for i in range(n):
    s += str(i)
This used to be O(n^2), but now it is O(n).
From the source (bytesobject.c):
void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}
/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/
int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1; /* invalidate cached hash value */
    return 0;
}
It's easy to verify empirically.
$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
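The commands above use Python 2's xrange. As a rough sketch of the same experiment on Python 3, the timeit module can be driven from a script. The helper names concat_loop and time_concat are made up for this illustration, and the numbers will differ by machine and interpreter.

```python
import timeit


def concat_loop(n):
    """Build a string of n 'a' characters with repeated +=."""
    s = ""
    for _ in range(n):
        s += "a"
    return s


def time_concat(n, repeat=3, number=5):
    """Return the best observed time in seconds for one concat_loop(n) call."""
    timer = timeit.Timer(lambda: concat_loop(n))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # If += is amortized O(n), each tenfold increase in n should cost
    # roughly ten times as long, not a hundred times.
    for n in (10, 100, 1000, 10000, 100000):
        print(f"n={n:>7}: {time_concat(n) * 1e6:12.1f} usec per call")
```

Comparing consecutive rows of the output gives a quick feel for whether the growth is closer to linear or quadratic on a given interpreter.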
It's important, however, to note that this optimisation isn't part of the Python spec. It's only in the CPython implementation as far as I know. The same empirical testing on PyPy or Jython, for example, might show the older O(n**2) performance.
$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop
So far so good, but then,
$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop
Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings, but performs poorly for larger strings.
Don't prematurely optimize. If you have no reason to believe there's a speed bottleneck caused by string concatenations, then just stick with + and +=:
s = 'foo'
s += 'bar'
s += 'baz'
That said, if you're aiming for something like Java's StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:
l = []
l.append('foo')
l.append('bar')
l.append('baz')
s = ''.join(l)
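The same list-then-join idiom applied to the earlier str(i) loop might look like the sketch below. The function names build_digits and build_digits_short are hypothetical, chosen only for this example. Because join does not depend on the in-place resize trick, this form should not rely on a CPython-specific optimisation.

```python
def build_digits(n):
    """Concatenate str(0), str(1), ..., str(n - 1) using a list and str.join."""
    parts = []
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)


def build_digits_short(n):
    """Same result, passing a generator expression straight to str.join."""
    return "".join(str(i) for i in range(n))


print(build_digits(12))        # 01234567891011
print(build_digits_short(12))  # 01234567891011
```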
Don't.
That is, for most cases you are better off generating the whole string in one go rather than appending to an existing string.
For example, don't do: obj1.name + ":" + str(obj1.count)
Instead: use "%s:%d" % (obj1.name, obj1.count)
That will be easier to read and more efficient.
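A small runnable sketch of that advice follows. The Item class and its values are invented here purely to give obj1 a name and a count; str.format is shown next to the % operator as an equivalent spelling.

```python
class Item:
    """Hypothetical stand-in for whatever object obj1 refers to."""

    def __init__(self, name, count):
        self.name = name
        self.count = count


obj1 = Item("widget", 3)

# Piecewise concatenation: needs an explicit str() around the integer.
concatenated = obj1.name + ":" + str(obj1.count)

# One-shot formatting: the template shows the final shape of the string.
percent_style = "%s:%d" % (obj1.name, obj1.count)
format_style = "{}:{}".format(obj1.name, obj1.count)

print(concatenated)   # widget:3
print(percent_style)  # widget:3
print(format_style)   # widget:3
```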
str1 = "Hello"
str2 = "World"
newstr = " ".join((str1, str2))
That joins str1 and str2 with a space as the separator. You can also do "".join((str1, str2, ...)) to join without a separator. str.join() takes a single iterable, so you have to put the strings in a list or a tuple.
That's about as efficient as it gets for a builtin method.
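To make the single-iterable requirement concrete, here is a short sketch. The variable names are illustrative only. It also shows a common surprise: a lone string is itself an iterable of characters.

```python
str1, str2, str3 = "Hello", "big", "World"

# One iterable argument (tuple or list) works.
joined_tuple = " ".join((str1, str2, str3))
joined_list = "".join([str1, str2, str3])

# Several positional arguments do not: join accepts exactly one argument.
try:
    "".join(str1, str2)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# A single string is iterated character by character.
dashed = "-".join("abc")

print(joined_tuple)       # Hello big World
print(joined_list)        # HellobigWorld
print(raised_type_error)  # True
print(dashed)             # a-b-c
```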
If you need to do many append operations to build a large string, you can use StringIO or cStringIO (io.StringIO in Python 3). The interface is like a file, i.e. you call write to append text to it.
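A minimal sketch of the file-like approach, assuming Python 3 where the class lives in the io module. The function name build_report and its input are hypothetical.

```python
import io


def build_report(lines):
    """Append each line plus a newline to an in-memory text buffer."""
    buffer = io.StringIO()
    for line in lines:
        buffer.write(line)   # write() appends at the current position
        buffer.write("\n")
    return buffer.getvalue()  # the accumulated text as one str


print(build_report(["foo", "bar", "baz"]), end="")
```

One reason to prefer this over a list is that code which already writes to file objects can target the buffer without changes.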
If you're just appending two strings, then just use +.
It really depends on your application. If you're looping through hundreds of words and want to append them all into a list, .join() is better. But if you're putting together a long sentence, you're better off using +=.
Python 3.6 gives us f-strings, which are a delight:
var1 = "foo"
var2 = "bar"
var3 = f"{var1}{var2}"
print(var3) # prints foobar
You can do most anything inside the curly braces
print(f"1 + 1 == {1 + 1}") # prints 1 + 1 == 2
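As a further sketch of what can go inside the braces, f-strings also accept format specifiers after a colon and conversion flags such as !r. The variable names below are arbitrary.

```python
name = "foo"
count = 3
ratio = 2 / 3

# :03d zero-pads the integer, :.2f rounds to two decimals, !r applies repr().
line = f"{name}{count:03d} ratio={ratio:.2f} upper={name.upper()!r}"

print(line)  # foo003 ratio=0.67 upper='FOO'
```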
Basically, no difference. The only consistent trend is that Python seems to be getting slower with every version... :(
List
%%timeit
x = []
for i in range(100000000): # xrange on Python 2.7
    x.append('a')
x = ''.join(x)
Python 2.7
1 loop, best of 3: 7.34 s per loop
Python 3.4
1 loop, best of 3: 7.99 s per loop
Python 3.5
1 loop, best of 3: 8.48 s per loop
Python 3.6
1 loop, best of 3: 9.93 s per loop
String
%%timeit
x = ''
for i in range(100000000): # xrange on Python 2.7
    x += 'a'
Python 2.7:
1 loop, best of 3: 7.41 s per loop
Python 3.4
1 loop, best of 3: 9.08 s per loop
Python 3.5
1 loop, best of 3: 8.82 s per loop
Python 3.6
1 loop, best of 3: 9.24 s per loop
a='foo'
b='baaz'
a.__add__(b)
out: 'foobaaz'
Append strings with the __add__ method:
str1 = "Hello"
str2 = " World"
st = str1.__add__(str2)
print(st)
Output
Hello World
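For two str operands, calling __add__ directly gives the same result as the + operator, so the operator form is usually the more readable choice. A quick check, with operator.add included as the functional spelling of +:

```python
import operator

a = "foo"
b = "baaz"

via_operator = a + b               # the usual spelling
via_dunder = a.__add__(b)          # same result as + for two str operands
via_function = operator.add(a, b)  # functional form of +

print(via_operator == via_dunder == via_function)  # True
```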
Reference URL: https://stackoverflow.com/questions/4435169/how-do-i-append-one-string-to-another-in-python