Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add takestring! and make String(::Memory) not truncate#54372

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

jakobnissen
Copy link
Contributor

@jakobnissen jakobnissen commented May 6, 2024

This function creates a string from a byte vector, truncating the vector and reusing its memory where possible.

This also removes the truncating behaviour of String(::Memory) - see #54369

Still needs to be done

  • Make tests pass
  • Comb through Base to find places where take_string!(::Memory) should be used instead of String(::Memory) for efficiency [edit: couldn't find any]

Closes #54369

@jakobnissen jakobnissen marked this pull request as ready for review May 6, 2024 09:28
@jakobnissen jakobnissen added the status:awaiting review PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated. label May 6, 2024
@nhz2
Copy link
Contributor

nhz2 commented May 6, 2024

Also, check for uses of StringMemory for example:

slug = Base.StringMemory(len)
chars = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for i = 1:len
slug[i] = chars[(Libc.rand() % length(chars)) + 1]
end
return String(slug)

#53962

base/strings/string.jl Outdated Show resolved Hide resolved
base/strings/string.jl Outdated Show resolved Hide resolved
@oscardssmith oscardssmith requested a review from vtjnash May 6, 2024 13:11
@stevengj
Copy link
Member

stevengj commented May 6, 2024

The name makes me think that take_string!(iobuffer) should be a replacement/synonym for String(take!(iobuffer))?

@oscardssmith oscardssmith added the status:triage This should be discussed on a triage call label May 7, 2024
@oscardssmith
Copy link
Member

I would be in favor of that. Every time I have used an IOBuffer I have had to go to the docs to figure out how to get the data out of it correctly.

@jariji
Copy link
Contributor

jariji commented May 7, 2024

So what's the specification of String(_): does it truncate or not? Or does it depend on the specific argument type? An "it depends" is not a very nice property for generic programming.

@jakobnissen
Copy link
Contributor Author

It does not truncate - except when the argument is a Vector{UInt8}. Yes, that's bad and in my opinion it shouldn't have been like that. But it's documented, and therefore can't change. I made this PR to make sure more exceptions aren't added.

@stevengj
Copy link
Member

stevengj commented May 9, 2024

Do we need an underscore here? takestring! is pretty readable

base/strings/string.jl Outdated Show resolved Hide resolved
@oscardssmith
Copy link
Member

oscardssmith commented May 9, 2024

Triage had a long talk about this, #54369, #54156 and #54273. The conclusions we came to are

  1. String(Memory) should copy
  2. String(Array) should truncate the array but not the Memory
  3. triage was right last time wrt view and reshape of Memory
  4. using the an Array aliased with another Array that is turned into a String is UB.
  5. take_string! is a much better API for IOBuffer et al.

@oscardssmith oscardssmith added domain:arrays [a, r, r, a, y, s] domain:strings "Strings!" and removed status:triage This should be discussed on a triage call status:awaiting review PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated. labels May 9, 2024
@nhz2 nhz2 mentioned this pull request May 9, 2024
@stevengj
Copy link
Member

stevengj commented May 9, 2024

The style guide says not to use underscores "when readable" without, which I think applies here.

@jakobnissen jakobnissen changed the title Add take_string! and make String(::Memory) not truncate Add takestring! and make String(::Memory) not truncate May 10, 2024
@topolarity
Copy link
Member

topolarity commented May 10, 2024

  1. using the an Array aliased with another Array that is turned into a String is UB.

Can this be refined to "mutating an Array after its alias has been turned into a String is UB"?

edit: For the record , I don't think it's right for us to put this contract on an API that's not marked unsafe. I expect that this 'contract' is broken much more often than the UB is actually triggered though, which is probably what's saving us in practice.

@oscardssmith
Copy link
Member

Yes. observing it is not great but not UB. the only UB is if you mutate it.

@nhz2
Copy link
Contributor

nhz2 commented May 10, 2024

takestring! can mutate the bytes in the underlying memory of a Vector input, it is not just making a String view.

@jakobnissen
Copy link
Contributor Author

jakobnissen commented May 10, 2024

After triage, this PR now contains the following changes:

  • Add takestring!. This function can be used with IOBuffer and Vector{UInt8}, returning a String. It resets the input to its initial state. This means it empties vectors, and empties IOBuffer if and only if the buffer is writable. This behaviour of not empying the input if the input is not writable is slightly weird, but matches take!
  • Add the internal methods Base.unsafe_takestring! and Base.unsafe_takestring, which are similar to takestring!, except they leave the argument (IOBuffer or Memory) in an unusable corrupt state. They are useful for the common pattern when code creates a buffer, takes a string from the buffer and then returns the string without touching the buffer again.
  • String(::Memory) now no longer truncates the memory
  • String(::Vector{UInt8}) now no longer truncates the underlying memory of the string. It still empties the vector, and assigns a new (empty) memory to the vector just like before.
  • The many different calls to String(take!(::IOBuffer)) and String(_unsafe_take!(::IOBuffer)) has been replaced with takestring! and the two internal methods Base.unsafe_takestring! and Base.unsafe_takestring). Not all calls to String(take!(...)) has been replaced, as there are many, many of such calls in the test suite and no need to change them.
  • Replace most uses of String(take!(::IOBuffer)) in Base and the stdlibs
  • Add tests

@jakobnissen jakobnissen removed the needs compat annotation Add !!! compat "Julia x.y" to the docstring label May 27, 2024
@jakobnissen
Copy link
Contributor Author

Rebased. Test failures look unrelated, but it's suspicious that they now failed twice on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s] domain:strings "Strings!" status:awaiting review PR is complete and seems ready to merge. Has tests and news/compat if needed. CI failures unrelated.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

String(v::Memory{UInt8}) doesn't make a copy
8 participants