Wikipedia:Reference desk/Computing
Welcome to the computing section of the Wikipedia reference desk.
Main page: Help searching Wikipedia
How can I get my question answered?
- Select the section of the desk that best fits the general topic of your question (see the navigation column to the right).
- Post your question to only one section, providing a short header that gives the topic of your question.
- Type '~~~~' (that is, four tilde characters) at the end – this signs and dates your contribution so we know who wrote what and when.
- Don't post personal contact information – it will be removed. Any answers will be provided here.
- Please be as specific as possible, and include all relevant context – the usefulness of answers may depend on the context.
- Note:
- We don't answer (and may remove) questions that require medical diagnosis or legal advice.
- We don't answer requests for opinions, predictions or debate.
- We don't do your homework for you, though we'll help you past the stuck point.
- We don't conduct original research or provide a free source of ideas, but we'll help you find information you need.
How do I answer a question?
Main page: Wikipedia:Reference desk/Guidelines
- The best answers address the question directly, and back up facts with wikilinks and links to sources. Do not edit others' comments and do not give any medical or legal advice.
July 31
Semitic roots and LLM tokenisation
I was recently sent an abstract about failures of LLMs to correctly answer questions about the Quran in Arabic. That got me pondering. As I understand the tokenizers used in LLMs, they identify relatively frequent character sequences as tokens. That is a good match for languages that work (mostly) with a word stem and various prefixes and suffixes for grammatical markers. But is this a good match for languages like Hebrew or Arabic that use multiliteral roots and modify words by injecting extra characters in between the root consonants? Moved from language desk, where it found no takers. --Stephan Schulz (talk) 12:43, 31 July 2025 (UTC)
- @Stephan Schulz Even if a language had special cases meaning that sequences of characters make less sense for tokenization, more exposure in training should make up for it by providing more combinations and relations between tokens to look at. We know that training was mainly done on content from the Internet, and there is a lot more English- and Chinese-language content in the training set than Arabic, Hebrew, Malay, and Georgian. On the other hand, I think it would need to be researched properly, but changing the method of tokenization for those languages could help, as you say.
- I found a couple of papers on the subject, this one seems the most relevant (and also recent): https://aclanthology.org/2025.findings-acl.1151/ -- they present a "pre-processing method for subword tokenizers" called Splinter, specifically to help with these languages.
- As Quranic Arabic differs from the varieties of Arabic used today, that might also present issues. The amount of Quranic Arabic in a model's training set would be much smaller than the amount of modern Arabic found online and in modern books. A model not using a method like Splinter might need to answer from a translation.
- From my own experience, practically: I do sometimes ask LLMs to infer in English, and then translate it into the language I actually need. That is more reliable, though I have only tried it with Arabic (Arabizi specifically) rarely. Komonzia (talk) 17:41, 9 August 2025 (UTC)
- Thanks, especially for the source! It seems that I'm not the only one with the same thought (and the others are a lot more competent in the area than I am ;-). --Stephan Schulz (talk) 20:02, 9 August 2025 (UTC)
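The mismatch raised in this thread can be seen with a toy comparison. The sketch below is not a real tokenizer and is not the Splinter method from the linked paper; it only assumes that subword tokenizers favour frequent contiguous character runs. The word lists are illustrative assumptions (simplified Latin transliterations of forms built on the Arabic root k-t-b), not data from the thread.

```python
def is_subsequence(root, word):
    """True if the letters of root occur in word in order, gaps allowed."""
    letters = iter(word)
    return all(ch in letters for ch in root)


def longest_common_substring(words):
    """Longest contiguous string shared by every word (brute force)."""
    shortest = min(words, key=len)
    best = ""
    for start in range(len(shortest)):
        for end in range(start + 1, len(shortest) + 1):
            candidate = shortest[start:end]
            if len(candidate) > len(best) and all(candidate in w for w in words):
                best = candidate
    return best


# Hypothetical example data: stem plus suffixes versus root plus vowel patterns.
ENGLISH = ["walk", "walks", "walked", "walking", "walker"]
ARABIC_KTB = ["kataba", "kutiba", "kitab", "katib", "maktub", "maktaba"]

if __name__ == "__main__":
    print("shared contiguous piece (English):", longest_common_substring(ENGLISH))
    print("shared contiguous piece (k-t-b):", longest_common_substring(ARABIC_KTB))
    print("root present as subsequence:", all(is_subsequence("ktb", w) for w in ARABIC_KTB))
    print("root present contiguously:", any("ktb" in w for w in ARABIC_KTB))
```

In the English list the whole stem survives as one contiguous piece, while in the transliterated list the root is present in every word only as a non-contiguous subsequence, which is the kind of structure a contiguous-substring vocabulary cannot represent as a single token.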
August 4
C: Drive on Windows 11 Dell Filling Up
I have a two-part question about a problem with my Dell Windows 11 desktop computer (bought in late 2022). About two days ago, I tried to save a Word document to a folder on my C: drive, and got an error message saying that the disk was full. I am in general familiar with this message, and have encountered various sorts of storage exhaustion on various sorts of computer equipment for more than fifty-five years. My C: drive is a 216 GB solid-state drive. Normally when I restart the computer, a view of the This PC folder shows more than 30 GB free. After I got the error message, a view of the This PC folder did show that there was no free storage on the C: drive. I have observed that after the computer has been up and running for a few days, free storage on the C: drive drops, sometimes to less than 10 GB.
I deleted a few files that I knew I didn't need, which left a few megabytes free, less than 1%, but enough to save the Word document, and saved the Word document. I then restarted the computer, and it showed that I had about 30 GB free. It also showed that I have more than 900 GB free on an external 4 TB drive, but that is not the issue. I know that I can offload seldom-used files from my C: drive to my E: drive, but I don't want to do that.
So I have one short-to-medium-term question, and one medium-to-long-term question. First, what is causing my C: drive to fill up while the computer is running? It appears to be filling up the free space with some sort of crud that is cleared on restart. Is there a script or utility that I can use to free up the crud without restarting the computer? Second, is there a way, short of buying a new computer with a larger solid-state C: drive, that I can make more C: storage available? Is it feasible, with a three-year-old computer, to have the C: drive replaced with a larger internal solid-state device? I know that if I come upon a thousand dollars that I don't need for food and rent, I can buy a new computer, and I like electronic equipment, but maybe I would rather travel to somewhere in that case. If I move some of my frequently used user data to a large thumb drive, how much will I degrade my performance?
To repeat the first question, what is causing my C: drive to fill up with crud that goes away? Is there a way to clear the crud short of restarting the computer? Robert McClenon (talk) 19:18, 4 August 2025 (UTC)
- It may be the Windows swap file. Ruslik_Zero 19:42, 4 August 2025 (UTC)
- I forgot to mention pagefile.sys, which is growing, but is not growing enough to explain all of what I am observing. It starts at about 12 GB. It promptly grows to 16 GB. I have seen it expand to 24 GB, and sometimes to 28 GB. I didn't happen to check its size when the free storage went to zero. If this happens again, I will check it. However, its growth accounts, as a guess, for about half of the loss of free space. So, yes, thank you, the pagefile is part of the problem, but I don't think it is all of the problem. Robert McClenon (talk) 21:13, 4 August 2025 (UTC)
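One way to narrow down what is growing between restarts is to record free space and the size of a few suspect folders at intervals and compare the numbers. The following is a minimal sketch using only the Python standard library; the helper names `free_gib` and `dir_size_bytes` are made up for this example, and which folders to measure is an assumption left to the reader (the thread does not establish that any particular folder is the culprit).

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Free space on the volume that holds path, in GiB (1024**3 bytes)."""
    return shutil.disk_usage(path).free / 1024**3


def dir_size_bytes(root):
    """Sum of file sizes under root; unreadable entries are skipped."""
    total = 0
    for folder, _subdirs, files in os.walk(root, onerror=lambda err: None):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(folder, name))
            except OSError:
                pass  # file vanished or access was denied
    return total


if __name__ == "__main__":
    # Assumed drive letter; adjust to the volume being investigated.
    drive = "C:\\" if os.name == "nt" else "/"
    print(f"free on {drive}: {free_gib(drive):.1f} GiB")
    # To measure a folder, call for example:
    #   dir_size_bytes(tempfile.gettempdir())
```

Running it shortly after a restart and again when free space is low gives two sets of figures to compare; a folder whose size accounts for the difference is a candidate, and if none does, the growth is likely in files the script cannot read.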
- I've no experience with recent Windows versions and in particular don't know about its use of swap space, but I'll add a few comments.
- Accounting for storage space is sometimes a bit of a black art. The drive has space visible to the operating system, but may also have some hidden reserve: spare blocks that can be used when other blocks fail. This visible space may not be entirely allocated to filesystems. Your OS drive has multiple partitions, each with its own filesystem, so your C "drive" is less than the entire solid state drive. Some of the filesystem is used for overhead and therefore unavailable to store files. Some space may be reserved for the OS, making it unavailable to users. Drive space is allocated to files in blocks, so if the size of a file is not a multiple of the block size, space is wasted. On the other hand, there are sparse files: files with entire blocks filled with zeros, which may be omitted from the drive. This means that the sum of all file sizes differs from the drive space taken by all files. Details depend on the filesystem in use. Finally, there's the difference between gigabytes (GB, 1000³ = 1,000,000,000 bytes) and gibibytes (GiB, 1024³ = 1,073,741,824 bytes). Tools to analyse file system usage often confuse these matters, each in their own way, leading to conflicting results.
- And keeping some space unused is good. It allows wear levelling to do its job, prolonging drive life, and gives more opportunity to clear unused blocks, improving write speed. On spinning drives, there's also a reduction of fragmentation. It's more useful than the 4% extra storage space on a drive that's 99% full compared to one that's 95% full.
- I don't know how much drive space Windows needs for itself and a basic set of software, but I read that it's a lot, a substantial fraction of your 216 GB. Which is actually pretty small for a computer that's only 3 years old. A somewhat decent computer should last at least 10 years nowadays. You can swap a larger drive in. It's not so hard, in particular in most desktops, but you'll have to reinstall your operating system. PiusImpavidus (talk) 09:13, 5 August 2025 (UTC)
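Two of the accounting effects described in this comment, decimal versus binary units and whole-block allocation, are simple arithmetic. The sketch below works them through; the 4096-byte block size and the 256 GB example capacity are assumptions chosen for illustration, not figures taken from the asker's machine.

```python
GB = 1000**3   # gigabyte, decimal: 1,000,000,000 bytes
GIB = 1024**3  # gibibyte, binary: 1,073,741,824 bytes


def gb_to_gib(gigabytes):
    """Convert a capacity quoted in decimal GB to binary GiB."""
    return gigabytes * GB / GIB


def allocated_bytes(file_size, block_size=4096):
    """Space a file occupies when storage is handed out in whole blocks.

    block_size=4096 is an assumed value; the real one depends on the filesystem.
    """
    blocks = -(-file_size // block_size)  # ceiling division on integers
    return blocks * block_size


if __name__ == "__main__":
    print(f"256 GB is about {gb_to_gib(256):.1f} GiB")
    for size in (1, 4096, 4097):
        used = allocated_bytes(size)
        print(f"{size} byte file occupies {used} bytes ({used - size} wasted)")
```

The same number of bytes looks about 7% smaller when reported in GiB, and a one-byte file still costs a full block, which is why sums of file sizes and reported drive usage rarely agree.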
- You can try the Windows Disk Cleanup utility, especially its option for cleaning up system files. Installed Windows updates can take a lot of space. Ruslik_Zero 20:41, 5 August 2025 (UTC)
- Thank you, User:PiusImpavidus. I see that at least 59.0 GB of my C: drive is being used for file system directories that are used by Windows. I infer that you are also saying that some of the 216 GB is being used in ways that do not show up as file system directories.
- You say that it should be possible to swap out the 216 GB drive for a larger drive. I haven't opened the cabinet of a desktop computer since the 1990s. So my question is how you are recommending that I resize my C: drive. I don't want to open the cabinet. Should I ask the dealer whether I can take it back to their shop to have the C: drive replaced, or will they try to sell me a new computer? Should I look for a third-party electronic repair shop that will swap the C: drive? Robert McClenon (talk) 03:52, 6 August 2025 (UTC)
- Some of the physical device in your computer isn't part of the C "drive" and some of that C "drive" may not be available for files. Depending on partitioning, it may be possible that some parts are currently unused and could be assigned to C, but I don't expect that will be significant.
- Many sellers (in particular the big ones) only sell, they don't repair. A decent third-party electronic repair shop should be able to swap the hard drive. Just visit them, explain your problem and ask what they can do.
- I suggest you take precautions to avoid loss or theft of your data. PiusImpavidus (talk) 09:13, 7 August 2025 (UTC)
- Thank you, User:PiusImpavidus. You say to take precautions to avoid loss or theft of my data. Do you have any specific suggestions for avoiding theft of my data? Do you have any specific suggestions for avoiding loss of my data? My defense against loss of my data is that I back the C: drive up periodically to the 4 TB external drive. I have not backed my C: drive up to the Microsoft Cloud. They don't need my data if I know how to copy data to a 4 TB external drive. Robert McClenon (talk) 18:07, 7 August 2025 (UTC)
- Backups on your external 4 TB drive are excellent. Regarding theft, if you've got some really sensitive files on that internal drive (like state secrets), wipe them before you hand your computer over to some repair shop. Just in case. Keep the backups at home. PiusImpavidus (talk) 19:13, 8 August 2025 (UTC)
August 6
Microsoft 365 Restarts for No Obvious Reason
I have a Dell desktop computer running Windows 11, and often have a lot of Word documents open, and four Access databases open. I have autorecovery saving enabled. Sometimes when I have been away for a few hours (or have been in bed for the night), I sit down at my computer and discover that Word has some, but not all, of my Word documents open, and some of them are shown as being recovered files, and that Access is open to a blank screen. If I had had Excel open, it will also be open to a blank screen. I assume that this means that the Microsoft 365 suite or Microsoft Office or whatever it is called now has restarted itself. This is a nuisance rather than a serious problem, because I have to recover all of the changes to the Word documents from the autosaves, and any changes to any Excel spreadsheets from the Excel autosaves.
So I have a two-part question. First, what is causing Microsoft 365 to restart itself? I have the Event Viewer, but it isn't helping me because I don't know how to look for the relevant events. Second, what can I do to minimize these restarts? I know that maybe the answer to that depends on the answer to the first question. Robert McClenon (talk) 04:03, 6 August 2025 (UTC)
- First off, check whether it is Office or Windows that is restarting. In Task Manager, select "More details" (if it doesn't show them already), then on the side open the "Performance" tab. Under CPU it should list "Up time". If that is lower than expected, it was your computer rebooting and reopening Office, not Office restarting. You can also check with Event Viewer: under "Critical" there is the "Kernel-Power" error if your system restarted without a shutdown. Under "Information" there is "Kernel-Boot", which should tell you when the system booted if it rebooted "expectedly", e.g. from a Windows update. That should sort out which of the two it is, and give us a clue on what to troubleshoot. Hope that helps, Rmvandijk (talk) 09:46, 6 August 2025 (UTC)
- Thank you, User:Rmvandijk. I said that I had Event Viewer. I didn't say that I knew how to use it. Is there a manual for how to filter its output so that I don't have to read through thousands of entries whose meanings are only known to those who know their meanings? Where do I look for system restarts, rather than for thousands of other events? Robert McClenon (talk) 22:17, 6 August 2025 (UTC)
- Hello User:Robert McClenon, this might be a bit difficult to explain without pictures, but I'll try. If you open Event Viewer, you get 3 columns: the left one shows the folder structure, the middle the events, and the right the actions. We only need the middle one. It should show several boxes, and the only one we are interested in is the "Summary of Administrative Events". This shows a list of "Event Types" such as "Critical" and "Information" (by default all are collapsed). If you open them, you can see the actual events. The "Source" shows the descriptions "Kernel-Power" and "Kernel-Boot" I mentioned earlier. By double-clicking on a source/event ID you get more information showing in the centre column. The top box then shows the events and the date each was logged. The box below that shows a description in the "General" tab. You shouldn't need the "Details" tab. To go back to the overview, there's a little blue left arrow in the top left. Hope this helps! Rmvandijk (talk) 09:04, 8 August 2025 (UTC)
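As an alternative to clicking through Event Viewer, the same log can be filtered from a script. The sketch below shells out to the Windows `wevtutil` tool to fetch the newest System-log entries for a given event ID. The event IDs used (41 for Kernel-Power after an unclean restart, 1074 for a restart or shutdown requested by a process or user) are not mentioned in the thread and are offered here as commonly cited starting points; the helper names are made up for this example.

```python
import subprocess
import sys


def build_query(event_id, count=5):
    """Command line asking wevtutil for the newest `count` System events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return ["wevtutil", "qe", "System", f"/q:{xpath}",
            f"/c:{count}", "/rd:true", "/f:text"]


def recent_events(event_id, count=5):
    """Run the query and return its text output; only meaningful on Windows."""
    result = subprocess.run(build_query(event_id, count),
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    if sys.platform == "win32":
        print(recent_events(41))    # Kernel-Power: restarted without a clean shutdown
        print(recent_events(1074))  # restart or shutdown requested by a process or user
    else:
        # Not on Windows: just show the command that would be run.
        print(" ".join(build_query(41)))
```

If the timestamps of these events line up with the times Office was found reopened, the computer itself restarted; if nothing is logged around those times, the cause is more likely within Office.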
- Thank you, User:Rmvandijk - I got the picture just fine. Where can I look up an explanation of what the RestartManager is and what to make of warnings issued by it, that it has been unable to restart each of the programs in Microsoft 365? Robert McClenon (talk) 19:31, 10 August 2025 (UTC)
August 7
Why do websites such as Reddit or LinkedIn display the approximate age (e.g. "two months ago") of a post instead of its exact date and time?
Thanks. Apokrif (talk) 01:12, 7 August 2025 (UTC)
- You'd be better off asking there, but I would suspect they see it as an unnecessary waste of resources to constantly update all the dates and times. Shantavira|feed me 08:28, 7 August 2025 (UTC)
- I don't see how the resources would be conserved by doing that; the easy thing would be to just leave the timestamp and let users do the math if they want to. No math to do at all. Next easiest would be to do it precisely: ("Comment is 4 days, 17 hours old"). Hardest would be what they're doing, which is taking the math and then converting it to something conversational ("About two months ago"). As a user, I like what they're doing: it gives you an immediate sense of whether the conversation is stale or not and you can find the exact timestamp if that's important to you. Matt Deres (talk) 14:01, 7 August 2025 (UTC)
- "you can find the exact timestamp": how? Apokrif (talk) 15:41, 7 August 2025 (UTC)
- Generally, hovering your pointer over the "one day ago" (or whatever). Reddit gives the timestamp down to the second. Not sure about LinkedIn; I don't use it much. They might not do it. Matt Deres (talk) 17:20, 7 August 2025 (UTC)
- Is the info available on mobile? Apokrif (talk) 23:15, 7 August 2025 (UTC)
- @Apokrif Generally, because a relative date is easier information for a user to consume/mentally process, than the exact dates and times. The exact dates and times require more mental calculations to process and often have a level of detail that is too high for most use cases. —TheDJ (talk • contribs) 14:26, 7 August 2025 (UTC)
- I agree, that's surely what they would say. E.g. looking at Reddit, you can find the timestamp in the HTML as e.g.
<time datetime="2020-04-02T12:17:17.182Z" title="Thursday, April 2, 2020 at 12:17:17 PM UTC" class="text-neutral-content-weak text-12">5y ago</time>
- This "service" is something that became common some time ago (15 years?). Unnecessarily taking away potentially important information from the first glance, not helping in any way, if you ask me.
- Icek~enwiki (talk) 17:18, 7 August 2025 (UTC)
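For anyone who wants the exact time programmatically, the `datetime` attribute of a `time` element like the one quoted above can be read with the Python standard library. This is a minimal sketch: the class and function names are invented for the example, the sample string is the snippet from this thread, and nothing here is claimed about how stable Reddit's markup is.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeTagParser(HTMLParser):
    """Collects the datetime attribute of every <time> element it sees."""

    def __init__(self):
        super().__init__()
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        if tag != "time":
            return
        value = dict(attrs).get("datetime")
        if value:
            # Older Python versions reject a trailing "Z", so spell out UTC.
            if value.endswith("Z"):
                value = value[:-1] + "+00:00"
            self.timestamps.append(datetime.fromisoformat(value))


def extract_timestamps(html):
    parser = TimeTagParser()
    parser.feed(html)
    return parser.timestamps


SNIPPET = ('<time datetime="2020-04-02T12:17:17.182Z" '
           'title="Thursday, April 2, 2020 at 12:17:17 PM UTC">5y ago</time>')

if __name__ == "__main__":
    for stamp in extract_timestamps(SNIPPET):
        print(stamp.isoformat())
```

The visible text ("5y ago") is ignored entirely; only the machine-readable attribute is used, so the full precision of the original timestamp is recovered.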
- So why does Mediawiki software (as used for Wikimedia projects) do otherwise? 😉 Apokrif (talk) 23:16, 7 August 2025 (UTC)
- @Apokrif Mediawiki does still do proper UX research with their own users and the development process is more open. If you hover over the timestamp next to your message, you can see they still felt the need to put the relative time somewhere, but it was more practical (probably to make archiving easier too) to put the absolute timestamp at the very front. Komonzia (talk) 17:47, 9 August 2025 (UTC)
- They do that specifically for me. When I look at pages like those, usually referred there through some googling results, I'm interested in how old some information is - and it's much, much easier to comprehend and compare something like '2 weeks' vs. '1 yr old' than 'March 17, 2024' vs. 'May 12, 2013'. Just because the most significant difference can be one of the least visible (a single digit at the end may mean more than a word and a number spanning half of the date). --CiaPan (talk) 18:05, 7 August 2025 (UTC)
- My complaint would be that it stays "1 week ago" for a whole week; with an error of up to 100%, it does not seem to be so useful. It could be like "2025-07-31 02:15Z" or at least "1.1 weeks ago" for a little bit more information. Icek~enwiki (talk) 19:24, 7 August 2025 (UTC)
- It also loses accuracy over time, progressing from a resolution of minutes to hours to days to months to "about 1 year ago". Imagine future research of, say, people's attitudes during the covid pandemic. When exactly were these posts written? Oh, about 1 decade ago. Thanks for the information. Card Zero (talk) 00:48, 8 August 2025 (UTC)
- Agree to all that, but both of you are viewing this from the wrong lens. It's about letting people know how active or stale a thread or conversation is. On August 8, 2025, a post from August 7, 2023 might initially appear to still be active because people are generally inattentive. But saying it's from "two years ago" is easy to read and unambiguous. The difference between a week old thread and a two-week old thread is not material to that purpose. (Also, I'm sure different sites use different methods and they've certainly changed over time, but Reddit seems to use days until a post is a month old. At least currently. I only did a cursory glance, but found no "week" nomenclature.) Matt Deres (talk) 12:34, 8 August 2025 (UTC)
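The trade-off argued over in this sub-thread is easy to reproduce. Below is a minimal sketch of a relative-age formatter; it is an assumption about the general technique (truncate to the largest whole unit), not the actual logic of Reddit, LinkedIn or any other site, and the unit sizes (30-day months, 365-day years) are simplifications.

```python
from datetime import datetime, timedelta, timezone

# Largest unit first; month and year lengths are deliberate approximations.
UNITS = [
    ("year", 365 * 86400),
    ("month", 30 * 86400),
    ("day", 86400),
    ("hour", 3600),
    ("minute", 60),
]


def relative_age(posted, now):
    """Describe how long ago `posted` was, truncated to one whole unit."""
    seconds = int((now - posted).total_seconds())
    for name, size in UNITS:
        count = seconds // size
        if count >= 1:
            plural = "" if count == 1 else "s"
            return f"{count} {name}{plural} ago"
    return "just now"


if __name__ == "__main__":
    posted = datetime(2023, 8, 7, 12, 0, tzinfo=timezone.utc)
    for days in (0.5, 4.7, 45, 366, 729):
        print(days, "days ->", relative_age(posted, posted + timedelta(days=days)))
```

Because the count is truncated, a post 366 days old and one 729 days old are both reported as "1 year ago", which is the loss of resolution complained about above, while the coarse label still answers the "is this thread stale?" question at a glance.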
- There was a recent UK story about an itinerant murderer who left numerous reviews of scenic locations on Google maps. One of these was close to the site where he did the murder, but the interesting matter of whether it was written on the same day is unknown, since the date was stated in terms of "about N months ago". Card Zero (talk) 00:42, 8 August 2025 (UTC)
- Google did not provide more info? Apokrif (talk) 02:23, 8 August 2025 (UTC)
- Perhaps to the cops, but it didn't reach the news sites. Besides this is more of a curiosity than something that could be used in evidence, since from what I saw his reviews just discussed painters and nature reserves and fishing spots, and used the word "tranquil" a lot. Card Zero (talk) 03:52, 8 August 2025 (UTC)
- I've wondered about this, too. It's not just Reddit and LinkedIn, it's virtually all modern software. It's quite annoying. I regularly find myself knowing the exact date of something I'm looking for in a long list, but if it's less than a week old, first I have to do math to determine that it's, say, 3 days ago.
- I honestly don't know if this is a meaningless fashion trend that all modern programmers copy each other on, or if they actually think people will find it useful, or if some majority of users actually do find it useful. I certainly don't, but I appear to be in some kind of minority. —scs (talk) 15:22, 9 August 2025 (UTC)
- Decisions like this aren't usually the programmers' unless the programmer is also a UX designer. Those decisions are based on (sometimes shoddy, assumption-laden) research, or sometimes the idea spreads to other software (because the UX designer thinks it's a good idea or 'the done thing') without anyone researching whether that website's particular audience prefers one option or the other (in this case, precise versus relative timestamps). In cases where the programmer still had their wits about them and knew better, you can hover over a relative time to see the exact time (I prefer to see both). Komonzia (talk) 17:17, 9 August 2025 (UTC)
- It's possible that most people do prefer natural language relative times to precise absolute timestamps. But then, most people prefer TikTok to Reddit, so it's not clear why UX should conform to what most people like, as opposed to being good. Card Zero (talk) 12:18, 10 August 2025 (UTC)